Openness vs. Privacy: The Important Role Data Redaction Plays In Data Privacy


David Hill

June 17, 2010

Network Computing

Most people know that data privacy is a hot topic for all enterprises, both private and public, and data redaction often plays an important role in these efforts. That said, redaction is a term that some IT organizations have never heard of. Even those that have would probably be hard-pressed to define it or explain its importance to their organizations. But that situation is changing quickly as organizations realize that redaction offers a solution that balances the need for data openness with the need for data privacy.

First, a disclosure: This article is a modified version of an introduction commissioned for a vendor white paper. However, I feel that redaction is important to a larger audience. This is a general piece, and vendors are touched upon only in the "Competitive Landscape" section.

Understand first that the complexities of the "data explosion era" are not only about storing more data, but also about managing its use. Proper information management means ensuring not only that the right people, and only the right people, have access to information, but also that they are able to use it for legitimate purposes. Potential conflicts arise because keeping information available and open to those who are entitled to it also requires restrictions that prevent access by others who are not entitled to it because of privacy and confidentiality requirements.

Take two illustrations from the U.S. government: The intent of the Freedom of Information Act (FOIA) is to make government organizations more accountable for their actions by making information about those actions more available on demand. On the other hand, the Health Insurance Portability and Accountability Act (HIPAA) is designed to prevent the unauthorized disclosure of individuals' personal healthcare information. However, though FOIA is about openness, individuals are not entitled to access sensitive personal information or national security information embedded in documents that could otherwise be made publicly available. On the HIPAA side, physicians or other healthcare professionals with a legitimate need to know should not be prevented from accessing electronic health records for valid and beneficial purposes, but they do not need to see billing information that is unrelated to providing care for patients.

So what can technology do to enable or enhance the complex balancing act between openness and privacy? The first line of defense consists of access controls that manage who is entitled to see what information. Another possible solution is data loss prevention (DLP) software, which can restrict the transmission (such as through e-mail) of sensitive information from an authorized user to an unauthorized one. Encryption is also an important means of ensuring data confidentiality. But these approaches, with some exceptions, tend to be blunt instruments that too often restrict access more than is really necessary. For example, a DLP solution can reject the transmission of an e-mail if a document attached to it contains a name associated with a social security number. That may sound sensible, but what if the intended recipient needs other information contained in that document for legitimate purposes and has no interest in or need for the name and social security number?

Here's where the value of redaction becomes clear. So what is redaction? Redaction has more than one meaning, but here we are concerned with the business or legal definition, in which redaction is the process of removing sensitive information, usually through the liberal use of black marking pens or whiteout fluid for paper documents and their electronic equivalents for digital documents.

Though by definition redaction does remove sensitive information, it is about not "throwing the information baby out with the privileged information bathwater." For technologically-enabled redaction to work properly, an IT solution must answer the question of how to "black out" or obscure confidential information while retaining non-confidential information.
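To make the "black out while retaining" idea concrete, here is a minimal Python sketch. It is an illustration only, not how any particular product works: it assumes the sensitive content is a U.S. social security number written in the 123-45-6789 form, and the function name and the sample sentence are invented for the example.

```python
import re

# Assumption for this sketch: SSNs appear in the form 123-45-6789.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def redact_ssns(text: str, label: str = "[REDACTED]") -> str:
    """Return a copy of text with every SSN-shaped string replaced by label."""
    return SSN_PATTERN.sub(label, text)


# Hypothetical document content, invented for illustration.
document = "Order 1182 for Jane Doe (SSN 123-45-6789) ships on Friday."
print(redact_ssns(document))
```

Unlike a rule that blocks the whole document, the order number and shipping date remain readable; only the matched number is obscured.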

So why is redaction so important? Consider two of the primary benefits it can provide:

  • Meet governmental regulatory compliance requirements, including those invoked in data privacy laws, without restricting the legitimate use of non-confidential information that is otherwise commingled with confidential information -- thus avoiding sanctions, penalties and costs associated with addressing a data breach after the fact, or embarrassing public exposure.

  • Share information with customers, partners, and other third parties without having to fear that they may be inappropriately exposed to sensitive information. This enables people to get the information they need to do their jobs or for other proper purposes. Note that this information is not necessarily subject to regulatory compliance but it can encompass data an enterprise wants to share in only a limited form, such as customer order or financial data, and intellectual property.

Automated software-based redaction must perform the electronic equivalent of physically redacting sensitive text in a document with a black pen (or whiteout). Manual redaction has a long history: during World War II, for example, soldiers' letters home were censored to prevent them from inadvertently revealing military intelligence. That censorship was performed by hand and was very primitive compared with today's requirement to manage redaction across vast volumes of ESI (electronically stored information). These early physical processes did not scale and posed the additional risk of inadvertent admissions, among other shortcomings.

In order to work properly, a modern software-based redaction solution must have characteristics that include the following:

  • No data may be lost -- Even though a redacted copy with the sensitive information removed needs to be made available as appropriate, the original un-redacted version must be saved in its original form in a secure place, or it must be possible to reconstitute it with the proper links.

  • The redactions must be justifiable -- Rather than simply masking text with no marking, a generic label, such as the words "Social Security Number," may be inserted in place of the redacted text. This not only makes the document easier to read but, in effect, provides the underlying reason for the redaction (although a link might be necessary for further explanation).

  • The solution must scale to large numbers of documents -- To address the growing amount of electronically stored information, the solution must be capable of automating the process to tag suggested redactions, but also still allow for manual review (to accept or reject suggested redactions) as well as to make further redactions deemed appropriate.  This approach is designed to deliver the highest rates of accuracy.

This is only a sampling of the general characteristics that software-based redaction must have.
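The three characteristics listed above can be sketched together in a few lines of Python. This is a toy under stated assumptions, not a description of any vendor's product: the two detection rules, the `Suggestion` type, and the function names are all hypothetical, and the "manual review" is simulated by a simple filter.

```python
import re
from dataclasses import dataclass

# Hypothetical detection rules (label -> pattern); real tools detect far more.
RULES = {
    "Social Security Number": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "E-mail Address": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}


@dataclass(frozen=True)
class Suggestion:
    start: int
    end: int
    label: str


def suggest_redactions(text):
    """Automated pass: tag candidate spans, sorted by position in the text."""
    found = [
        Suggestion(m.start(), m.end(), label)
        for label, pattern in RULES.items()
        for m in pattern.finditer(text)
    ]
    return sorted(found, key=lambda s: s.start)


def apply_redactions(text, accepted):
    """Build a redacted copy with each accepted span replaced by its label.

    Assumes the accepted spans do not overlap. The input string is never
    modified, so the un-redacted original can be stored separately.
    """
    parts, cursor = [], 0
    for s in sorted(accepted, key=lambda s: s.start):
        parts.append(text[cursor:s.start])
        parts.append(f"[{s.label}]")
        cursor = s.end
    parts.append(text[cursor:])
    return "".join(parts)


original = "Contact Jane Doe at jane@example.com; SSN 123-45-6789."
suggestions = suggest_redactions(original)
# Simulated manual review: the reviewer rejects the e-mail suggestion.
accepted = [s for s in suggestions if s.label != "E-mail Address"]
redacted = apply_redactions(original, accepted)
print(redacted)
```

The design choice worth noting is the separation of suggesting from applying: the automated pass only proposes spans, a reviewer decides which to keep, and the label left in the text states why each span was removed.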

Competitive Landscape
A number of companies offer standalone software-based data redaction solutions, including, but not necessarily limited to, the following: Appligent Document Solutions (Redax), CSI (Intellidact), Extract Systems (IDShield), EDAC Systems, Inc. (VeriDact), IBM (Optim Data Redaction), Informative Graphics (Redact-It), and OnStream Systems (RapidRedact). With the exception of IBM, the players in the data redaction market are smaller companies. That could change, though, if other larger vendors see an advantage in either acquiring one of the smaller companies or developing the technology on their own.

Mesabi Musings
Many companies assume that software redaction is a nice-to-have rather than a must-have solution. Sadly, those organizations are living on borrowed time and the clock is ticking. Why? In the United States, the December 1, 2006 modifications to the Federal Rules of Civil Procedure altered how ESI is managed in eDiscovery processes. That, followed by data privacy laws in many U.S. states and in the European Union, is rapidly changing the attitudes of organizations (including enterprises and governmental entities) which mistakenly believed that they actually "own" all their data and have the total ability to grant or deny access or use of selected information at all times. In short, that era has ended.

And as part of that process, enterprises must now add to the basic characteristics of data protection -- preservation, availability, responsiveness, and confidentiality -- the what and where of data. That means that they must know "what" information they have and "where" it is. Our concern from a redaction perspective is the "what" characteristic. Knowing "what" data you have in a generic sense, such as an e-mail repository, is not enough. Organizations must know at a fine-grained level -- such as the ability to look at the content inside an individual e-mail or attachment -- exactly what they have in order to meet compliance and eDiscovery requirements. (Of course, they also have to know when to redact.) This is the level where the benefits of redaction shine.
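As a rough illustration of what "fine-grained" means here, the following Python sketch looks inside a single e-mail and reports which parts contain social-security-number-shaped strings. It uses the standard library's `email` package; the pattern, the `inventory_message` function, and the sample message are assumptions made up for the example, and a real inventory would need to handle many more formats than plain text.

```python
import re
from email.message import EmailMessage

# Assumption for this sketch: SSNs appear in the form 123-45-6789.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def inventory_message(msg):
    """Count SSN-shaped strings in each text part (body and attachments)."""
    findings = {}
    for part in msg.walk():
        # Skip multipart containers and non-text parts in this simple sketch.
        if part.get_content_maintype() != "text":
            continue
        name = part.get_filename() or "body"
        findings[name] = len(SSN_PATTERN.findall(part.get_content()))
    return findings


# Hypothetical message, built in memory for illustration.
msg = EmailMessage()
msg["Subject"] = "Quarterly report"
msg.set_content("Report attached.")
msg.add_attachment("Jane Doe, 123-45-6789\n", filename="staff.txt")
print(inventory_message(msg))
```

Knowing only that a repository holds e-mail would miss this; the per-part result shows that the body is clean while the attachment is a candidate for redaction.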

Can enterprises ignore the inevitable? Of course, but the old "ignorance of the law is no excuse" chestnut applies liberally and likely painfully. In other words, to borrow from the old Fram oil filter ads, "you can pay now or you can pay later." Studied ignorance is also unnecessary, given the number of redaction solutions available. With that in mind, we suggest that this is a great time for organizations without software redaction solutions to examine and consider them.
