Encryption is used to protect the confidentiality and integrity of information and provide authentication between the client and the server. Web technology uses two kinds of encryption mechanisms together: single (private) key encryption and public key encryption. The main subject of this book is the security of networks by the means of firewalls and proxy servers; it will not focus on the protection of data by encryption. However, we will briefly go over the basics of encryption technology, and how it is used in the World Wide Web.
SINGLE KEY CRYPTOGRAPHY
In single key encryptionalso known as symmetric encryptionthe same key is used to both encrypt and decrypt the data. The encryption algorithm uses a key to produce a sequence of data so that, given the same key, the data can be decrypted into its original form. The key has to be kept secret in order to protect the data. If the key gets compromised, the data can be decrypted. If the key is lost, the data cannot be decrypted. If the key is wrong or corrupt, the decryption will produce garbled data.
Single key encryption can be used to encrypt files and messages to protect them from unauthorized eyes. However, to use single key encryption between two parties to encrypt the messages, the keys have to be established beforehand. Both parties must have the same secret key in order to establish secure communication.
Single key encryption can be represented with the following formula, where the message
M is encrypted with the secret key KS:
M
encrypted = Encrypt(KS, M)
and can be decrypted with the same key:
M
= Decrypt(KS, Mencrypted)
Example.
A very simple example of single key encryption might be an encoding that advances the alphabetic characters by a certain number (the key). Lets say the key is 3; in that case, "A" becomes "D," "B" becomes "E," and so on. Wrapping occurs at the end of the alphabet, so that "X" becomes "A," "Y" becomes "B," and "Z" becomes "C." The same can be applied to digits. Now, two people, Rob and Mike, want to exchange encrypted messages. They agree on the key "3," and can now send messages to each other, incomprehensible to other people. Rob encrypts the message to Mike by incrementing the characters and digits by three. The original message
HI MIKE LETS MEET AT THE CAFETERIA AT 10AM. ROB.
Mike decrypts it by using the same key "3," but doing the reverse transformation: decrementing each character by 3.
Naturally, real-life encryption algorithms are much more complex. The encryption in the above example is easy to breakthat is, it is easy to figure out what the key and the contents of the message are without knowing the key. It demonstrates the following weaknessesall of which are important aspects in stronger algorithms:
Only alphanumeric characters are encryptedpunctuation and word and line breaks are left in tact. This provides several hints of the structure of the message:
The two first words are probably a greeting; one might guess that "KL" probably stands for "HI."
"43DP" is probably some sort of numeric expression, since there are no other numbers appearing in the message. It might be a time, where "DP" stands for "AM" or "PM," or it could mean a distance, such as "MI" or "KM."
The word in the end is probably the sending persons name.
One of the strategies for breaking encrypted messages is to guess what the few first bytes of the message might be and start to reverse the encryption algorithm from then on.
Note that the HTTP protocol request and response headers contain quite few predictable fields, with predictable values. This is the case for many other applications as well, such as electronic forms used for data entry. In other words, predictable data is not limited to texts written by humans; in fact, the predictability can be even higher with protocols and applications that are well known by the cracker.
Real-life encryption algorithms treat all the data in the message equally, as a stream of binary data. They make no distinction between alphanumeric characters and others.
Our algorithm works the same throughout the message. Once we have figured out that "KL" stands for "HI" and noticed that both characters are shifted by 3 in the alphabet, it doesnt take long to realize that the whole message is encrypted in this fashion.
Advanced algorithms change throughout the message, so that the data that has been encrypted before has an affect on how the following data will be encrypted. For example, the next character might be encrypted by the key, plus the sum of some of the previous characters. This way, in order to break the encryption of a portion in the middle of the message, the whole code so far must have been broken.
We will not go to more detail on this subject in this book. To fully understand the strengths that encryption algorithms must have, and tactics that may be used to break encryption, books on that specific field should be studied.
PUBLIC KEY CRYPTOGRAPHY
Public key encryptionalso known as asymmetric encryptionuses two keys: a public key and a private key. The keys are coupledthe key pairs are generated together, and if one of them is lost, the system becomes useless. Data is encrypted with one of the keys, and it is decrypted with the other. Furthermore, something encrypted with one of the keys
cannot be decrypted with the same key. Public key cryptography is based on complex mathematical algorithms which are next to impossible to reverse, even when the key used for encryption is known. On the other hand, knowing the corresponding second key, the decryption will happen smoothly and fairly fast.
Public key cryptography is used so that all the parties wishing to engage in encrypted communication generate a key pair for themselves. One of the keys is made public; the other one is kept private. A party wishing to send an encrypted message to someone else simply fetches and uses the public key of that party. The resulting message is unreadable to anyone except the holder of the corresponding private keythe intended recipient of the message.
The beauty of public key cryptography is that the public keys can be freely distributed. Encrypted communication can be established between any two parties once the public keys are shared between them. No secret information needs to be preestablished between the parties. Anyone can have access to the public keys, but they will still not be able to eavesdrop on the encrypted communication.
Formally, the public key
Kpublic is used to encrypt the message M:
M
encrypted = Kpublic(M)
and the result can be decrypted by applying the private key:
M
= Kprivate(Mencrypted)
Example.
Lets say Rob wants to send Mike another secure message, this time using public key cryptography. Rob will use Mikes public key (which is available to anyone) to encrypt the message. Note that the message cannot be decrypted by anyone else, since the public key cannot be used to decrypt it. Only Mike has the corresponding private key that will decrypt messages encrypted with the public key.
Note, that the public key
can be used to decrypt any messages encrypted by the private key. That is, a message M encrypted with the private key Kprivate
M
encrypted = Kprivate(M)
can be decrypted with the public key
Kpublic
M
= Kpublic(Mencrypted)
Well explain the uses of this important feature later in this chapter when we discuss authentication.
Public key encryption and decryption are computationally
expensive operations; hence they are slow. Even advanced workstation computers can only perform just a few operations per second which involve the private key, and about 50100 operations with the public key. Therefore, in practice, public key cryptography is not used to encrypt entire messages. Instead, it is used only to establish the encryption key for the data, and then single key encryption (which is faster) is used for encrypting the data itself.
In other words, Rob would not have encrypted the message he sent to Mike using Mikes public key. Instead, he would have picked a random key, used that as a secret key for single key encryption, and then encrypted this key using Mikes public key. Then he would send both the encrypted key, as well as the encrypted message itself, to Mike. Mike would then first decrypt the key using his private key and then use the result to decrypt the actual message.
Formally, the message is encrypted using a single key algorithm:
M
encrypted = Encrypt(KS, M)
and the key is then encrypted using public key of the recipient:
K
Sencrypted = Kpublic(KS)
Both the encrypted message
Mencrypted and the encrypted secret key KSencrypted will be sent to the recipient. The recipient will start by decrypting the secret key with his or her private key:
K
S = Kprivate(KSencrypted)
after which the message can be decrypted with the resulting secret key:
M
= Decrypt(KS, Mencrypted)
AUTHENTICATION WITH PUBLIC KEY CRYPTOGRAPHY
In the above example, the sender of the message is not authenticated. That is, when Mike receives the message and it says that it is from Rob, Mike has really no way of verifying it. Anybody could have used Mikes public key and constructed that message. However, public key encryption can be used to provide authentication as well. This is accomplished such that after the message has been encrypted with the recipients public key, it will be reencrypted with the senders private key. The recipient will first decrypt the message with the (claimed) senders public key (which is available to everybody), and then the recipients private key. This way, the recipient knows that the message was really sent by the claimed sender since no one else would have been able to construct a message that would decrypt with the senders public key (unless the corresponding private key were compromised).
Again, as mentioned earlier, entire messages are not really encrypted and decrypted with the private and public keys. In our example, the encryption would be done only on the key used to encrypt the actual message. In other words, after Rob has encrypted the message data
M using a fast single key algorithm with a randomly generated key KS:
M
encrypted = Encrypt(KS, M)
he would then encrypt the key
KS with Mikes public key KMikepublic, and then with his own private key KRobprivate:
K
Sencrypted = KRobprivate(KMikepublic(KS))
Now, Mike will start by decrypting the message first with Robs public key, then his own private key:
K
Sencrypted = KRobpublic(KSencrypted)
K
S = KMikeprivate(KSencrypted)
or, expressed in a single formula:
K
S = KMikeprivate(KRobpublic(KSencrypted))
If the message is spoofedthat is, not really sent by Robthe first decryption will fail, or produce corrupt data. This will be noticed either immediately by integrity checks built into the decryption algorithm, or later when the message fails to decrypt (or yields garbled data). Otherwise, Mike will proceed to decrypt the actual message with the secret key given as a result from the above decryption:
M
= Decrypt(KS, Mencrypted)
MESSAGE DIGEST (HASH) ALGORITHMS
Message digest (hash) algorithms are mathematical algorithms that take any amount of data as their input and produce a fixed-size result that is a "signature," or a "fingerprint" of the data. Such a fingerprint is an extremely compressed form of the source data. The compression is not reversiblethat is, it is not possible to take the message digest and turn it back to the original data. However, it can be used to verify with very high probability that the data is the same as the data used to generate the message digest in the first place. Even changing one character in the message will change its message digest. Message digest schemes also notice if the data is rearranged, if bits are transposed, or even if a 1 is added to a byte and subtracted from the next. In practice, its next to impossible to tamper with the data in any way without changing the value of the message digest.
Sidebar
A "fingerprint" is in fact a very good analogy for a message digest. A fingerprint cannot be used to determine what the person looks like, what he knows, or to clone that person. However, it can be used to uniquely identify that person with extremely low margin for error. In the same way, a message digest does not contain all the information in the original message; however, it can be used to verify that the message is [with a high probability] the same as the original message from which the digest was calculated.
Message digests can be used to verify the integrity of datathat the data has not been modified or tampered with, whether intentionally or by accident. A message digest is simply an advanced form of a checksum of the data. Since the message digest is with very high probability unique to a given piece of data, it is very hard to come up with another piece of data that would have the same message digest. In other words, it makes intentional (malicious) modification of data very hard, if not impossible, to do without its being noticed.
Note that message digests are equally useful for unencrypted data. Sometimes the data is not private in a sense that it would require it to be encrypted. However, at the same time it may be extremely important that the data is accurate and not tampered with. An example might be a stock quote: it is essentially public information so there is no need for encryption, but its integrity should be verified so it cannot be maliciously altered, potentially causing financial losses.
Message digest algorithms, such as MD5 and SHA, play an important role in public-key-cryptography-based messaging. Combined with single key and public key encryption, message digests provide for stronger authentication and integrity of the data.
Lets go back to our example when Rob sends an encrypted message to Mike. Our last version was one where we had a secret key
KS to encrypt the message M, and then we encrypted the key itself with double public key encryption, using Mikes public key and Robs private key. This mechanism would not necessarily notice if the data had been tampered with. It would simply come out corrupt. Lets now add a message digest algorithm, say MD5, into the picture; before the data is encrypted, Rob will calculate the message digest D:
D
= MD5(M)
The message will be encrypted just as before using the secret key
KS, but its the digest attached to the message that will be encrypted. After decryption, the message digest will be verified, and if tampering has occurred, it will be noticed.
The MD5 Algorithm
MD5 is a mathematical algorithm that produces a 128-bit (16-byte) signature, or a "fingerprint," for any piece of data that the algorithm is applied to. Furthermore, any such fingerprint is with very high probability unique to that piece of data, that is, it is very hard to come up with another piece of data that would have the same MD5 signature.
MD5 signatures can be used to verify the integrity of data, that the data has not been modified or tampered with, whether intentionally or by accident.
The amount of data given to MD5 does not matter; it can be applied to a single character as well as several megabytes of data, such as an entire encyclopedia. The result is always 128 bits.
The MD5 algorithm is irreversible; given just the MD5 signature there is no way to recover the data that was used to calculate that given MD5 signature. That is, you cannot "decrypt" an MD5 signature and get back the original data.
Therefore, MD5 signatures are used such that MD5 is applied to the data that is being verified, and then the two MD5 signatures are compared to each other. If they match, the data has not been modified [1] .
There are several other algorithms, such as
SHA, that perform a task similar to MD5 and that are cryptographically stronger (harder to "break").
CERTIFICATES
Public keys may be distributed freely without the risk of eavesdropping on the encrypted communication between the two parties of the secure session. However, it does not provide authentication by itself. That is, a malicious user Bob could generate his own key pair and pose as Rob, presenting his own public key as Robs. Mike could mistakenly trust that the public key is Robs and believe that he is really sending the message to Rob. Instead, he is sending it to Bob, encrypted with a key that Bob can decipher using his private key.
Certificates solve this problem. A certificate is a piece of data that associates identity with a public key. This data is digitally signed by a well-known authority, such as RSA or VeriSign.
Basically, the well-known authority has its own public and private keys,
Kpublicauthority and Kprivateauthority, respectively. The private key is well guarded. The public key is well known and trusted. It may actually be built into the software.
A user wishing to get a certificate will first generate his or her key pair,
Kpublicuser and Kprivateuser. The public key is sent to the certifying authority, along with the users information, user_info [2]. The certifying authority will calculate a hash of the users public key and associated information:
Digest
= Hash(Kpublicuser + user_info)
The digest is then encrypted with the authoritys private key:
Signature
= Kprivateauthority(Digest)
This encrypted piece of data is included as part of the issued certificate:
Certificate
= { Kpublicuser + user_info + Signature }
Now, someone wishing to authenticate a user or other entity will get the entitys public key, along with the certificate. The public key is verified by calculating the hash of the public key and other information in the certificate:
Digest
1 = Hash(Kpublicuser + user_info)
Then, the encrypted signature is decrypted with the certifying authoritys well-known public key:
Digest
2 = Kpublicauthority(Signature)
If the two digests
Digest1 and Digest2 match, the entitys public key is considered valid. Basically, the certifying authority testifies that the public key really belongs to the user, or other entity, indicated in the certificate.
This was a simplified overview of the theory of how certificates work. In practice, there may be subtle differences from the way outlined above.
SUMMARY
This chapter provided a brief overview of the most important aspects of public key encryption. After this, you will know the basic terminology and theory behind encryption-based security. While encryption techniques provide confidentiality, integrity, and authentication of data while its in the wire, there are other aspects of security that are equally important. The next chapter focuses on the aspects of making the internal network itself more secure.
Endnotes
1. With high mathematical probability. Theoretically, it is possible to have two pieces of data that yield the exact same MD5 signature. However, for practical purposes, this is extremely unlikely.
2. In this context, a "user" may actually be any entity that has a key pair, such as a secure Web server, or any party of secure communication. It is not limited to actual people.
REPORTS
Analyize In-Line NAC strategies and products.
ANALYTICS Plan and design your enterprise blade server deployments
InformationWeek U.S. IT Salary Survey 2008
Salaries for business technology professionals are falling. Here's what you need to know in order to make good hiring decisions and personal career choices. Download Today