Input description: A plaintext message T or encrypted text E, and a key k.
Problem description: Encode T using k giving E, or decode E using k back to T.
Discussion: Cryptography has grown substantially in importance in recent years, as computer networks have made confidential documents more vulnerable to prying eyes. Cryptography is a way to increase security by making messages difficult to read if they fall into the wrong hands. Although the discipline of cryptography is at least two thousand years old, its algorithmic and mathematical foundations have recently solidified to the point where there can now be talk of provably secure cryptosystems.
There are three classes of cryptosystems everyone should be aware of:
The key issue in selecting a cryptosystem is identifying your paranoia level, i.e. deciding how much security you need. Who are you trying to stop from reading your stuff: your grandmother, local thieves, the Mafia, or the NSA? If you can use an accepted implementation of RSA, such as PGP discussed below, you should feel safe against just about anybody.
If there is an implementation of DES on your machine, that will likely be good enough for most applications. For example, I use DES to encrypt my final exam each semester, and it proved more than sufficient the time an ambitious student broke into my office looking for it. If the NSA had been breaking in, the story might have been different, although even here it is important to understand that the most serious security holes are human, not algorithmic. Making sure your password is long enough, hard to guess, and not written down is far more important than obsessing about the encryption algorithm.
Simple ciphers like the Caesar shift are fun and easy to program. For this reason, it is perhaps healthy to use them for applications needing only a casual level of security (such as hiding the punchlines of jokes). Since they are easy to break, they should never be used for serious security applications.
One thing that you should never do is mess around with developing your own novel cryptosystem. The security of DES and RSA is accepted largely because these systems have both survived over twenty years of public scrutiny. Over this period, many other cryptosystems have been proposed, proven vulnerable to attack, and then abandoned. This is not a field for amateurs. If you are charged with implementing a cryptosystem, carefully study a respected program such as PGP (discussed below) to see how its author, Philip Zimmermann, skillfully handled such issues as key selection and key distribution. A cryptosystem is as strong as its weakest link.
There are several problems related to cryptography that arise often in practice:
A more efficient if less reliable method is to use a checksum, a simple mathematical function that hashes a long text down to a simple number or digit, and then transmit the checksum along with the text. The checksum can be recomputed on the receiving end and bells set off if the computed checksum is not identical to what was received. Perhaps the simplest checksum scheme just adds up the byte or character values and takes the sum modulo some constant, say . Unfortunately, an error transposing two or more characters would go undetected under such a scheme, since addition is commutative.
Cyclic-redundancy check (CRC) provides a more powerful method for computing checksums and is used in most communications systems and internally in computers to validate disk drive transfers. These codes compute the remainder in the ratio of two polynomials, the numerator of which is a function of the input text. The design of these polynomials involves considerable mathematical sophistication and ensures that all reasonable errors are detected. The details of efficient computation are complicated enough that we recommend that you start from an existing implementation, described below.
Given a file, I can compute a checksum for it, and then encrypt this checksum using my own private key. I send you the file and the encrypted checksum. You can now edit the file, but to fool the judge you must also edit the encrypted checksum such that it can be decrypted to the correct checksum. With a suitably good checksum function, designing a file that yields the same checksum becomes an insurmountable problem.
What we need is some way for me to convince you that I know my key, without actually telling you the key. One such method to do so is for you to send me a random number or text, and I use my key to encrypt your text and send it back to you. If you then decrypt this text and compare it to the message you sent me, you gain confidence that I am who I say I am. We can repeat this exercise on several random texts until sufficient authentication has been agreed upon. If we use a secure enough cryptosystem, we can be confident that an eavesdropper will not be able to deduce my key even given several plain and encrypted texts.
Such authentication protocols of back-and-forth messages often involve the use of randomness to frustrate eavesdroppers. Different protocols satisfy particular needs and constraints about who has to know what. It is important to do some reading before attempting to design your own protocols. References are provided below.
Implementations: The USENET FAQ (frequently asked questions) file on cryptography provides a wealth of information, including pointers to implementations. Check it out at ftp://rtfm.mit.edu/pub/usenet/news.answers/cryptography-faq/.
Distributing cryptographic software is complicated by United States export restrictions, which make it illegal to export encryption software. PGP (Pretty Good Privacy) is such a good implementation of RSA that its author Philip Zimmerman was charged with export violations by federal authorities. PGP may be obtained from the Electronic Frontier Foundation (EFF) at http://www.eff.org/pub/Net_info/Tools/Crypto/PGP/.
A good discussion on checksums and cyclic-redundancy codes, with implementations in C, appear in [BR95]. The code for these algorithms is printed in the text and is available on disk for a modest fee.
The Stanford Graphbase (see Section ) uses checksums to ensure that data files remain unmodified from the original distribution. Algorithm 536 [Kno79] of the Collected Algorithms of the ACM is an encryption function for passwords, written in Fortran. See Section for further information.
Notes: Kahn [Kah67] presents the fascinating history of cryptography from ancient times to 1967 and is particularly noteworthy in light of the secretive nature of the subject. More recent and more technical works on cryptography include Denning [Den82] and Schneier [Sch94], the latter of which provides a through overview of different cryptographic algorithms, including implementations for sale. Rawlins [Raw92] provides a good introduction to cryptographic algorithms, from Caesar shift to public key to zero-knowledge proofs. An algorithm for breaking simple substitution ciphers appears in [PR79].
Expositions on the RSA algorithm [RSA78] include [CLR90]. The RSA Laboratories home page http://www.rsa.com/rsalabs/ is very informative. See [Sta95] for an excellent guide to PGP and its underlying algorithms.
The history of DES is well presented in [Sch94]. Particularly controversial was the decision by the NSA to limit key length to 56 bits, presumably short enough to be cracked by special-purpose computers costing on the order of several million dollars. Despite some theoretical progress in breaking DES analytically [BS93], the most significant threat remains special-purpose hardware.
MD5 [Riv92] is the secure hashing function used by PGP to compute digital signatures. Expositions include [Sch94, Sta95].
Related Problems: Factoring and primality testing (see page ), Text compression (see page )).