Protecting Your Data: It's Not Your Father's Encryption
Data loss is big news these days, and stories about companies losing sensitive data are increasingly common. According to the Privacy Rights Clearinghouse, a non-profit consumer advocacy organization, more than 900 data loss incidents were publicly disclosed between January 2005 and February 2008, exposing the sensitive information of over 218 million people. The exposed information included names, addresses, phone numbers, driver's license numbers, bank account numbers, Social Security numbers, and credit card numbers.
Laptop computers are the biggest single source of data loss. They almost always contain sensitive information and they're lost with alarming regularity: roughly one in ten laptops end up lost or stolen each year. But laptops aren't the only cause of data loss. Other causes have included lost backup tapes, lost CDs, lost USB drives, lost memory sticks, erroneously sent e-mails, erroneously sent postal mail and accidental posting of sensitive data to a public web site. Almost any way in which it's possible to lose sensitive data has happened to someone.
At the same time, tighter data security and privacy laws are making the loss of this data more and more expensive. The public relations disaster that can follow the disclosure that sensitive data has been lost can be the single biggest cost of a high-profile data loss.
There are two ways to deal with the loss of sensitive data: you can try to stop the loss of sensitive data, or you can protect the data so that its loss won't cause any damage. If you can manage to do either of these then you won't need to worry about the headaches that accompany dealing with data loss incidents.
It is probably impossible to stop the loss of laptops and the many other ways in which data is lost, but it may be possible to protect data in a way that keeps the cost of a loss as low as possible. This can be done with encryption, which uses complicated algorithms to scramble data so that it looks like random bits instead of useful information. And because the loss of encrypted data usually doesn't lead to the same embarrassing public disclosure, encryption may appear to be a good solution to the problem of data loss. On the other hand, encryption has historically earned a reputation for being both difficult to use and expensive to support. Encryption may once have been a cure worse than the disease, but new encryption technologies have made it a much more practical option than it was just a few years ago.
So don't be put off by what you might have heard in the past. The new encryption technologies that are now available are much easier to use than the technologies that were available even a few years ago. In particular, there are new technologies that now make it very practical to solve the problem of data loss and greatly reduce the chances that you'll end up on the list of data breaches maintained on Privacy Rights Clearinghouse's web site.
Using encryption can be tricky because of conflicting requirements. The data needs to be protected from unauthorized access, but it also needs to be easily accessible to authorized users. So sensitive data has to be virtually impossible for some users to get but very easy for others to get, and it's very hard to do both of these well at the same time. The technology that makes this possible is what security vendors call "key management." There are many interesting developments in this area, but they're beyond the scope of this article.
The Levels of Data Encryption
As shown in Figure 1, encryption can take place at four different levels, and these levels form a "stack" just like the more familiar stacks used in networking and communication protocols. In such a model, data is typically passed only from one layer to an adjacent layer, although there are always exceptions. Encryption tends to get more expensive at the higher levels, while the coverage it provides decreases. So while the lowest level might protect all of the data on a magnetic tape, the highest level might protect only sensitive data like a credit card number, leaving all other information unprotected. And while it's relatively cheap to encrypt backup tapes, a system that encrypts just credit card numbers can require many changes to the way data is handled, and this can make such an undertaking expensive. The new technologies described below, however, make encrypting at the higher levels much more practical than it once was, and this makes realistic some strategies for protecting sensitive data that were once too hard and expensive to implement.
Figure 1. The four levels of data encryption.
The lowest level of encryption is encryption of physical storage. This includes encrypting your disk and tape storage so that anything going into storage gets encrypted and anything coming out of storage gets decrypted. Then, if an encrypted disk or tape is lost or stolen, the stored data is protected from unauthorized access. But even when this is done, the data loses the protection provided by the encryption as soon as it leaves the physical storage.
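To make the storage-level idea concrete, here is a minimal Python sketch of a hypothetical `EncryptedTape` that encrypts everything written to it and decrypts everything read from it. Python's standard library has no AES, so the sketch assumes a stand-in keystream built from HMAC-SHA256 in counter mode; it illustrates the data flow only and is not a substitute for a vetted storage cipher (real tape and disk products typically use AES).

```python
import hashlib
import hmac
import os


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # Illustrative stand-in for a real cipher: HMAC-SHA256(key, nonce || counter).
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return bytes(out[:length])


class EncryptedTape:
    """Hypothetical storage device: encrypts on write, decrypts on read."""

    def __init__(self, key: bytes) -> None:
        self._key = key
        self.raw_records = []  # what someone holding the lost tape would see

    def write(self, data: bytes) -> int:
        nonce = os.urandom(16)  # fresh random nonce for every record
        ks = _keystream(self._key, nonce, len(data))
        self.raw_records.append(nonce + bytes(d ^ k for d, k in zip(data, ks)))
        return len(self.raw_records) - 1

    def read(self, index: int) -> bytes:
        record = self.raw_records[index]
        nonce, body = record[:16], record[16:]
        ks = _keystream(self._key, nonce, len(body))
        return bytes(c ^ k for c, k in zip(body, ks))


tape = EncryptedTape(key=os.urandom(32))
rec = tape.write(b"Alice,4111111111111111")
print(tape.raw_records[rec].hex())  # ciphertext only
print(tape.read(rec))               # plaintext: protection ends once data leaves storage
```

The last line is the point of the paragraph above: `read` hands back plaintext, and from then on the storage layer protects nothing.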
Most of the work in key management is now focused on supporting encryption at the physical storage level because that's where the most pain is currently felt. This is mainly due to the issues with backup tapes, which need to be stored encrypted yet be readily available for disaster recovery or e-discovery purposes, but the need for better key management is gradually becoming felt in other areas as encryption becomes more commonly used for compliance reasons.
The next level up is encryption at the operating system level. Products that encrypt files or folders often work at this level. A file's owner might mark it as needing encryption and then rely on the operating system to decrypt the file when an authorized user opens it and to encrypt it again when it is closed. This can be done fairly transparently, but it also limits the useful scope of the encryption, because the data loses the protection provided by the operating system once it leaves the operating system's control. So if Alice has an encrypted file that she wants to send to Bob, the data will probably lose any protection provided by this level of encryption once it leaves Alice's computer. This approach may be a great way to protect the sensitive data on a laptop, but it may not protect the same data once it's written to a CD or copied to a USB drive. It's a good solution for some problems, but it's not a perfect one.
The next level up is database encryption. Most database products provide some version of this to protect the sensitive data that they store. One way to do this is to encrypt the entire database, so that any data written to the database gets encrypted and any data read from the database gets decrypted. Another way is to encrypt just the individual columns of a database that hold sensitive information. If not all of the information in a database is sensitive, then it's more efficient to encrypt just the sensitive parts. In a database of customer information, for example, there may be no reason to encrypt the customers' names, but their credit card numbers will need to be protected, and this can be done by encrypting just the column of the database where the credit card numbers are stored.
And just as with physical storage encryption or operating system level encryption, the protection provided by database encryption is lost when the data leaves the database. So although a database might protect all of the credit card numbers stored in it, a credit card number that an application reads from the database is unprotected once it leaves the database.
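Column-level encryption, and the way its protection ends at the database boundary, can be sketched with Python's built-in `sqlite3` module. Stock SQLite has no built-in column encryption, so in this sketch hypothetical `encrypt_field`/`decrypt_field` helpers (using an HMAC-SHA256 keystream as a stand-in cipher, not AES) play the part of the database's own encryption feature: customer names stay in the clear, card numbers are stored encrypted, and the value the application reads back is plaintext.

```python
import hashlib
import hmac
import os
import sqlite3


def _xor_stream(key: bytes, nonce: bytes, data: bytes) -> bytes:
    # Stand-in cipher: XOR with an HMAC-SHA256 keystream.
    ks = bytearray()
    counter = 0
    while len(ks) < len(data):
        ks += hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return bytes(d ^ k for d, k in zip(data, ks))


def encrypt_field(key: bytes, value: str) -> bytes:
    nonce = os.urandom(16)
    return nonce + _xor_stream(key, nonce, value.encode())


def decrypt_field(key: bytes, blob: bytes) -> str:
    return _xor_stream(key, blob[:16], blob[16:]).decode()


key = os.urandom(32)
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customers (name TEXT, card_number BLOB)")
# Only the sensitive column is encrypted; the name is stored in the clear.
db.execute(
    "INSERT INTO customers VALUES (?, ?)",
    ("Alice", encrypt_field(key, "4111111111111111")),
)

name, stored = db.execute("SELECT name, card_number FROM customers").fetchone()
print(name, stored.hex())  # what the database holds
card = decrypt_field(key, stored)
print(card)                # what the application holds: plaintext again
```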
The top level of the data encryption stack is the application level. At this level, individual applications protect the data that they use. In this case, sensitive data is provided to an application unencrypted, is encrypted while it's being protected by the application, and then is decrypted after the application finishes with it. Application-level encryption is much more limited in scope than the other levels in the encryption stack: it protects only a small subset of the data and leaves the rest unprotected, but because it focuses on the sensitive data, that's not really a problem. Encrypting your e-mail is an example of application-level encryption. If you attach a document to an e-mail message and then send the message encrypted, the document is protected until it's opened by the recipient. Once the recipient decrypts the e-mail and saves the attachment, the protection provided by the encryption is lost.
Application-level encryption has also traditionally been the most difficult and expensive to use, but that's no longer the case. New technologies finally make it practical, and that makes protecting sensitive data much easier than it was in the past. It's also why you've probably heard the big security vendors talking about the promise of "data-centric security" in the past year or two: a model in which you shift your focus from protecting your network to protecting the data in your network. In this model, you don't rely on firewalls and other technologies to keep hackers out of your network. Instead, you protect the sensitive data itself and rely on the fact that any data hackers might get access to is encrypted. This makes the data useless to them, so its loss causes no damage to your organization. Full data-centric security isn't possible yet, but newer technologies make the first steps toward it possible today.
The Promise of Application-level Encryption
Suppose that you could encrypt all of your sensitive data so that it would be protected no matter where it was. Instead of storing an account number, you could store an encrypted version of it. And if every application that needs the account number is able to decrypt it and use it, the account number is protected no matter where it is stored.
The sensitive data on laptops, for example, typically doesn't originate on the laptops. Instead, it usually comes from a database, and the protection provided by the database is lost when the data leaves it. This leaves the data unprotected on the laptop, which can lead to its accidental disclosure if the laptop is lost. And even if the laptop's hard drive is encrypted, the protection provided by the hard drive encryption is lost when sensitive data is copied to a CD. If sensitive data could be encrypted at the application level, however, it would be protected no matter where it went, and its protection would no longer depend on any of the lower levels in the encryption stack.
If sensitive data were always encrypted, then you wouldn't have to worry about it being exposed after it was read from a backup tape. And you wouldn't have to worry about it being in the clear when it was moved from one computer to another. And you wouldn't have to worry about it being in the clear after it was read from a database. In each case, the encrypted data would just be used transparently in place of the unencrypted data. This is the dream of data-centric security, and although we're not quite there yet, there are now technologies that let us take the first step in this direction.
One obstacle to encrypting data is the way applications handle it. Databases, in particular, often use the data stored in them to sort and search. So if a column in a database contains credit card numbers, for example, it's easy to sort the data in the database by credit card number or to use a credit card number to look up other customer information.
Databases also expect data in a particular format, and as Figure 2 shows, encrypted data is often in an entirely different format from the unencrypted data. In addition to being a different length, the encrypted data is made up of different characters than the unencrypted data. This combination can cause many applications to crash or return errors instead of handling the encrypted data gracefully, even for an operation as simple as storing a value in a database.
Figure 2. The format of unencrypted data vs. AES-encrypted data.
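The format problem in Figure 2 is easy to reproduce. The Python sketch below encrypts a 16-digit card number with a randomized stand-in cipher (an HMAC-SHA256 keystream with a random nonce, not AES, which the standard library lacks) and Base64-encodes the result, as is commonly done to store binary ciphertext as text. A hypothetical validator of the sort an existing application might apply then rejects the ciphertext.

```python
import base64
import hashlib
import hmac
import os


def toy_encrypt(key: bytes, plaintext: str) -> str:
    # Randomized stand-in cipher: 16-byte nonce + XOR with an HMAC-SHA256 keystream.
    nonce = os.urandom(16)
    data = plaintext.encode()
    ks = bytearray()
    counter = 0
    while len(ks) < len(data):
        ks += hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    ct = bytes(d ^ k for d, k in zip(data, ks))
    return base64.b64encode(nonce + ct).decode("ascii")


def looks_like_card_number(value: str) -> bool:
    """Hypothetical check an existing application might perform."""
    return len(value) == 16 and value.isdigit()


card = "4111111111111111"
token = toy_encrypt(os.urandom(32), card)
print(card, looks_like_card_number(card))    # 16 digits, accepted
print(token, looks_like_card_number(token))  # 44 characters of Base64, rejected
```

Real AES output differs in detail (block padding, for example, can make it longer still), but the effect on an application expecting 16 digits is the same.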
Encryption can also make it difficult to use database entries for searching and sorting. This is because an encryption algorithm typically uses some random data each time it encrypts something. So if you encrypt the same information twice, you'll get back two different encrypted versions of it, because a different random value was used each time. This means that encrypting a credit card number and searching for the result won't find the stored entry, because the stored version was produced with a different random value. It also makes using encrypted database entries somewhat tricky, because simply decrypting and then re-encrypting a single element can require re-sorting the entire database. This happens, for example, if you encrypt one of the values used as an index for searching and sorting. So if you want to use customers' credit card numbers to look them up in a database, encrypting those credit card numbers can cause problems.
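The search problem can be shown the same way. In this Python sketch (hypothetical helper names, an HMAC-SHA256 keystream standing in for a real randomized cipher, and the standard `sqlite3` module), encrypting the same card number twice gives two different ciphertexts, so a lookup that encrypts the search value and compares it with the stored column finds nothing, even though both ciphertexts decrypt to the same number.

```python
import hashlib
import hmac
import os
import sqlite3


def _xor_stream(key: bytes, nonce: bytes, data: bytes) -> bytes:
    # Stand-in cipher: XOR with an HMAC-SHA256 keystream.
    ks = bytearray()
    counter = 0
    while len(ks) < len(data):
        ks += hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return bytes(d ^ k for d, k in zip(data, ks))


def encrypt_field(key: bytes, value: str) -> bytes:
    nonce = os.urandom(16)  # new random value on every call
    return nonce + _xor_stream(key, nonce, value.encode())


def decrypt_field(key: bytes, blob: bytes) -> str:
    return _xor_stream(key, blob[:16], blob[16:]).decode()


key = os.urandom(32)
first = encrypt_field(key, "4111111111111111")
second = encrypt_field(key, "4111111111111111")
print(first == second)  # different ciphertexts for the same card number

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customers (name TEXT, card_number BLOB)")
db.execute("INSERT INTO customers VALUES (?, ?)", ("Alice", first))

# Look up the customer by encrypting the search value: no row matches.
match = db.execute(
    "SELECT name FROM customers WHERE card_number = ?", (second,)
).fetchone()
print(match)
```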
The problems caused by changing the format of data as well as the use of encrypted data for searching and sorting are easily overcome by using a new encryption technology that goes by the somewhat cryptic name "FFSEM."
What is FFSEM?
"FFSEM" is an abbreviation of the more unwieldy term "Feistel finite set encryption mode," which refers to one of the building blocks used to create it: a Feistel network. This structure is named after its inventor, former IBM cryptographer Horst Feistel, whose early work in cryptography led to the venerable Data Encryption Standard (DES), the very first standardized encryption algorithm.
FFSEM is a clever use of the structure that Feistel invented: it uses that structure to create an encryption algorithm that keeps the format of the data the same as it was before encryption. So data encrypted with FFSEM has exactly the same format as it had before it was encrypted, and it can easily be handled by the same applications that handle the unencrypted data. This is shown in Figure 3, where the encrypted data differs from the unencrypted data yet still has the same format.
Figure 3. The format of unencrypted data vs. FFSEM-encrypted data.
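To show how a Feistel network can preserve format, here is a toy format-preserving cipher in Python for even-length digit strings. It is not FFSEM as submitted to NIST: FFSEM is built on AES, while this sketch assumes HMAC-SHA256 as the round function, an arbitrary `ROUNDS = 10`, and modular addition on the two decimal halves. What it shares with FFSEM is the key property: a 16-digit number encrypts to another 16-digit number, and because each Feistel round can be undone using the same round function, decryption just runs the rounds backwards.

```python
import hashlib
import hmac

ROUNDS = 10  # assumption for this sketch, not taken from any specification


def _round(key: bytes, rnd: int, half: int, width: int) -> int:
    # Round function: HMAC-SHA256 of (round number, half), reduced mod 10**width.
    msg = f"{rnd}:{half:0{width}d}".encode()
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    return int.from_bytes(digest, "big") % 10**width


def _split(digits: str) -> tuple[int, int, int]:
    if not (digits.isascii() and digits.isdigit()) or len(digits) % 2:
        raise ValueError("this sketch handles only even-length digit strings")
    width = len(digits) // 2
    return int(digits[:width]), int(digits[width:]), width


def fpe_encrypt(key: bytes, digits: str) -> str:
    left, right, width = _split(digits)
    for rnd in range(ROUNDS):
        # New left = old right; new right = old left + F(old right), mod 10**width.
        left, right = right, (left + _round(key, rnd, right, width)) % 10**width
    return f"{left:0{width}d}{right:0{width}d}"


def fpe_decrypt(key: bytes, digits: str) -> str:
    left, right, width = _split(digits)
    for rnd in reversed(range(ROUNDS)):
        # Undo one round: the old right half is the current left half.
        left, right = (right - _round(key, rnd, left, width)) % 10**width, left
    return f"{left:0{width}d}{right:0{width}d}"


key = b"demo key, not a real secret"
card = "4111111111111111"
enc = fpe_encrypt(key, card)
print(enc)                    # another 16-digit string
print(fpe_decrypt(key, enc))  # 4111111111111111
```

Because the output always has the same length and alphabet as the input, the ciphertext fits anywhere the original value did. A real format-preserving scheme also has to handle odd lengths and very small domains, which this sketch does not attempt.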
FFSEM can be traced back to the work of cryptographers John Black and Phil Rogaway, who first suggested the underlying technique in 2002. The technique remained relatively unknown until recently, when the Payment Card Industry Data Security Standard (PCI DSS) mandated additional protection of credit card numbers. As the problems of encrypting credit card numbers with existing techniques became apparent, interest in the technology was renewed. It has now been submitted to the National Institute of Standards and Technology (NIST), where it's under consideration as a new mode of operation of AES, the encryption algorithm that NIST chose to replace the older DES.
The most important property of FFSEM is that it preserves the format of data. This lets existing applications handle encrypted data as easily as they handle unencrypted data, which makes application-level encryption practical to implement. This, in turn, makes it practical to protect data in a way that can dramatically reduce the damage caused by data loss. If a CD containing data encrypted with FFSEM is lost, for example, it won't be a cause for concern, because only authorized users will be able to decrypt the sensitive data. And if data encrypted with FFSEM is accidentally mailed to millions of customers, it won't actually reveal any sensitive information. So this new technology lets you protect data without having to change the dozens of existing applications that process it. It's the best of both worlds.
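The "no changes to existing applications" point can be sketched in Python with a compact toy Feistel cipher (assumptions: HMAC-SHA256 round function, an arbitrary 10 rounds, even-length digit strings only; this is not the FFSEM specification). A hypothetical existing `customers` table whose `card` column accepts only 16 digits stores the encrypted value without any schema change, and because this kind of encryption is deterministic, an indexed equality lookup works by encrypting the search value first.

```python
import hashlib
import hmac
import sqlite3

ROUNDS = 10  # assumption for this sketch


def _f(key: bytes, rnd: int, half: int, width: int) -> int:
    # Round function: HMAC-SHA256 of (round, half), reduced mod 10**width.
    msg = f"{rnd}:{half:0{width}d}".encode()
    return int.from_bytes(hmac.new(key, msg, hashlib.sha256).digest(), "big") % 10**width


def fpe(key: bytes, digits: str, decrypt: bool = False) -> str:
    """Toy Feistel cipher on an even-length digit string (not FFSEM itself)."""
    width = len(digits) // 2
    mod = 10**width
    a, b = int(digits[:width]), int(digits[width:])
    for rnd in (reversed(range(ROUNDS)) if decrypt else range(ROUNDS)):
        if decrypt:
            a, b = (b - _f(key, rnd, a, width)) % mod, a
        else:
            a, b = b, (a + _f(key, rnd, b, width)) % mod
    return f"{a:0{width}d}{b:0{width}d}"


key = b"demo key, not a real secret"
db = sqlite3.connect(":memory:")
# A hypothetical existing schema that insists on exactly 16 digits.
db.execute(
    "CREATE TABLE customers (name TEXT, "
    "card TEXT CHECK (length(card) = 16 AND card NOT GLOB '*[^0-9]*'))"
)
db.execute("CREATE INDEX idx_card ON customers (card)")
db.execute("INSERT INTO customers VALUES (?, ?)", ("Alice", fpe(key, "4111111111111111")))

# Deterministic encryption: encrypt the search value, then compare as usual.
row = db.execute(
    "SELECT name, card FROM customers WHERE card = ?",
    (fpe(key, "4111111111111111"),),
).fetchone()
print(row[0], fpe(key, row[1], decrypt=True))
```

Only equality lookups carry over in this sketch: sorting on the `card` column orders rows by ciphertext, not by the original card numbers.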
The Bottom Line
So although data loss is now a serious problem, there are strategies that can help dramatically reduce it, as well as technologies that make these strategies realistic to implement. And although it may be several years until full data-centric security is available on a wide scale, the first steps towards it may be enough to help stop the massive data losses that we see today. It may be impossible to stop laptops from being lost or stolen or to stop sensitive data from being accidentally posted to public web sites, but by encrypting data in a clever way we can limit the damage that such incidents cause while keeping the problems of handling encrypted data to a minimum.
About the Author
Luther Martin is a Solution Architect at Voltage Security.