Data Classification

By J.J. Stapleton

Data classification is the practice of assigning information into predefined groups where each group has a common risk and corresponding security controls. This allows common control implementation and documentation that can be audited. The auditor can use a checklist approach to verify that the appropriate data protection controls are in place based on data classification requirements. The U.S. federal government, Department of Defense (DoD), and intelligence agencies have established their own data classification programs addressing national security needs.

For example, the Bell-LaPadula (BLP) [10] security model formalizes a security level as the pair (clearance, set of categories). Clearances are predefined groups assigned to individuals and objects, such as unclassified, confidential, secret, and top secret. Categories are organizational in nature, denoting membership in an organization, division, department, or office. Thus an individual might have a secret clearance and belong to categories A, B, and C, whereas another individual might have a top-secret clearance and belong to categories A and C. In general, access is permitted only when the individual's security level is compatible with the object's security level. However, the BLP model does not address data integrity, and the security levels are a bit too rigid for the private sector.
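The BLP security level can be sketched in a few lines of Python. This is a minimal illustration, not the model's full definition: it reads "compatible" as the BLP simple security property (a subject may read an object only if the subject's clearance is at least the object's and the subject's categories include all of the object's), and it omits the model's write rule. The names Clearance, SecurityLevel, dominates, and may_read are assumptions made for this example.

```python
from dataclasses import dataclass
from enum import IntEnum


class Clearance(IntEnum):
    """Ordered clearances; a higher value means a higher clearance."""
    UNCLASSIFIED = 0
    CONFIDENTIAL = 1
    SECRET = 2
    TOP_SECRET = 3


@dataclass(frozen=True)
class SecurityLevel:
    """A BLP security level: the pair (clearance, set of categories)."""
    clearance: Clearance
    categories: frozenset

    def dominates(self, other: "SecurityLevel") -> bool:
        # Clearance must be at least as high, and the category set
        # must be a superset of the other level's categories.
        return (self.clearance >= other.clearance
                and self.categories >= other.categories)


def may_read(subject: SecurityLevel, obj: SecurityLevel) -> bool:
    """Assumed reading of "compatible": the subject's level dominates the object's."""
    return subject.dominates(obj)


# The two individuals from the text, plus a hypothetical object.
first = SecurityLevel(Clearance.SECRET, frozenset({"A", "B", "C"}))
second = SecurityLevel(Clearance.TOP_SECRET, frozenset({"A", "C"}))
report = SecurityLevel(Clearance.SECRET, frozenset({"B"}))

print(may_read(first, report))   # True
print(may_read(second, report))  # False: higher clearance, but no category B
```

The second individual is refused despite the higher clearance, which shows why the categories matter as much as the clearance.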

Data Groups

Regarding the security levels, most organizations have classification groups that are similar to, but not necessarily the same as, those of federal governments, much less those of each other. The similarity is partially due to replication of what seems to work, plus the simple fact that many private sector security professionals have military backgrounds. In general, too many data classification groups can overly complicate operational controls, and too few groups do not provide reasonable protection. Any organization has at least two groups: public and nonpublic information. Examples of public information might include uniform resource locators (URLs) to access Web sites for online services, telephone numbers to call service agents, and physical addresses of retail stores, bank branches, or offices. Nonpublic information is anything else not intended for general public consumption. However, "nonpublic" information is too broad a category to be meaningful, as the source of the information must be considered. Organizations have at least three data origins: corporate, employees, and customers.

  • Corporate data are information about the organization, such as financial data, intellectual property, strategic plans, policy, practices, procedures, and offices. Some data are accessible by all employees, and some are limited to management, senior management, or specific job roles. Other data are shared with outside groups, such as government agencies, business partners, or service providers. Government agencies might include the Internal Revenue Service (IRS), the Securities and Exchange Commission (SEC), and the Federal Financial Institutions Examination Council (FFIEC), as examples. There is often a difference between raw data and processed data, where the latter have been analyzed, sanitized, or summarized for external consumption.
  • Employee data are information about individuals who work for the organization, which includes health-care information, salary, bonuses, benefits, and family and contact information. Employees include full-time and part-time staff, contractors, and consultants. Typically an organization's human resources (HR) department handles much of the employee financial and family data, and benefits can overlap with HR and external health-care providers. Some of the employee data are addressed by privacy laws. Employees also share their employment information such as the employer name, office address, e-mail address, and telephone numbers with friends, family, businesses, and government. Social media networks have also increased employee exposure to outsiders beyond family and friends.
  • Customer data are information about companies or individuals who provide revenue to the organization, such as account, transactional, and contact information. Customers are purchasers of goods or services provided by the organization, and can also be employees. Much of the customer data are addressed by privacy laws. Customers typically share some of their data, such as account and contact information, with other providers.

However, not all of the data elements represent the same risk. Risk is often defined as the impact of the vulnerability times the probability of occurrence. For data confidentiality, the related vulnerability is unauthorized disclosure of information, but the impact depends on the sensitivity of each data element. Further, the probability of occurrence depends on the controls relating to the data state: data in transit, data in process, or data in storage. Regarding impact, disclosure of sensitive information (or loss of data integrity) can be summarized as follows:

  • Revenue loss: This may occur when the exploited vulnerability results in direct loss of funds including stealing physical money, illegitimate money transfers, or fraudulent payments such as counterfeit money, checks, credit cards, or debit cards. For example, a disclosed safe combination allows money to be stolen, a disclosed password allows wire transfers, or a disclosed PIN allows fraudulent withdrawals.
  • Resource loss: This may occur when the exploited vulnerability adversely affects the infrastructure, causing loss of services or personnel and unanticipated costs to recognize, diagnose, monitor, and fix problems. For example, a disclosed password allows routers, firewalls, servers, or other network appliances to be compromised.
  • Reputational loss: This may occur when the exploitation becomes public knowledge, affecting the status and confidence of the organization. The collateral damage to the organization might include losing new or existing customers, affecting business partners or business deals, or lowering public stock prices. Such disclosures might be from printed hard copy or electronic formats such as reports, e-mails, or attachments.
  • Legal liability: This may occur when the exploitation affects customers or business partners who initiate a lawsuit resulting in unanticipated costs in legal fees or hours. The discovery process alone may require hundreds to thousands of hours. Lawsuits sometimes require assistance from outside subject matter experts or other legal firms. And once the case goes to court, there may be additional settlement fees.
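The risk definition given before this list (impact times probability of occurrence) can be worked through with a small sketch. All numbers below are hypothetical and unitless, chosen only to show the arithmetic; the Impact class, the PROBABILITY table, and the risk function are assumptions for illustration, with impact broken out by the four loss types above and probability keyed by data state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Impact:
    """Hypothetical impact scores for one data element, by loss type."""
    revenue: int
    resource: int
    reputational: int
    legal: int

    def total(self) -> int:
        return self.revenue + self.resource + self.reputational + self.legal


# Hypothetical probability of unauthorized disclosure per data state,
# reflecting how strong the controls are assumed to be in each state.
PROBABILITY = {
    "in_transit": 0.5,
    "in_process": 0.125,
    "in_storage": 0.25,
}


def risk(impact: Impact, state: str) -> float:
    """Risk = impact of the vulnerability x probability of occurrence."""
    return impact.total() * PROBABILITY[state]


pin = Impact(revenue=8, resource=1, reputational=4, legal=3)
store_address = Impact(revenue=0, resource=0, reputational=0, legal=0)

print(risk(pin, "in_transit"))            # 8.0
print(risk(pin, "in_storage"))            # 4.0
print(risk(store_address, "in_transit"))  # 0.0
```

The same data element carries a different risk in each data state, and a public element such as a store address carries none, which is the argument for classifying by sensitivity rather than treating all nonpublic data alike.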

Corporations often use the terms proprietary and confidential data to protect their assets. Proprietary data are then information that has significant value to the corporation but that has limited distribution to employees, and may include distribution to outside groups such as existing or new business partners or government agencies. However, proprietary data tends to be ephemeral, with a shorter shelf life than confidential data. For example, plans for a new application might be proprietary until such time as the application has been announced and publicly released. Conversely, application algorithms or food recipes, which are more valuable for a longer period, might be confidential. Thus proprietary data and confidential data deserve their own classifications.

As discussed in Chapter 3 (Authentication), there is a significant distinction between authentication data and other nonpublic information. In general, unauthorized disclosure of authentication data allows illicit access to corporate data, employee data, or customer data. Therefore, authentication data deserves its own classification. Further, since authentication data are inextricably linked to user identifiers (ID) such as account numbers, user names, nicknames, and similar data, identification data are distinct from other nonpublic information. However, authentication data needs stronger controls than identification data; thus identification data deserves its own classification.

Customer information is not necessarily unique to each organization, as the same individual is likely a customer of more than one service organization. For example, cardholders use the same credit card at multiple merchants; bank customers may have accounts at more than one financial institution; and organizations have many business partners. However, customer information includes identifiers, such as account numbers, and authentication data, such as passwords, so these data elements would be included in identification data and authentication data. Other customer information can be included as confidential data. However, for audit purposes, the ability to "tag" data as customer information regardless of its data classification is beneficial.

As discussed in Chapter 7 (Key Management), cryptographic keys must be securely administered throughout their life cycle, which consists of key generation, key distribution, key usage, key backup and recovery, key revocation, key termination, and possibly key archive. Therefore, cryptographic keys deserve their own classification.

Table 2.1 presents a list of possible categories. Handling cryptography data with the same controls as authentication data puts keys at risk of compromise. An individual's password or PIN must be known by at least the person being authenticated, whereas symmetric and asymmetric private keys should never be known by anyone; otherwise, the key is considered to be compromised. Further, the password or PIN is typically stored using one-way functions to avoid unauthorized disclosure; however, keys must be recoverable and cannot be stored in such a manner, as this would make them useless. While it is true that both authentication data and keys must be encrypted during transmission, that is only one of the three attributes compared here that the two share; thus, again, cryptography data and authentication data must have separate classification groups.
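The one-way storage of a password or PIN can be sketched with the Python standard library. The text says only "one-way functions"; the choice of salted PBKDF2-HMAC-SHA256, the iteration count, and the function names hash_password and verify_password are assumptions made for this example. Only the authentication-data side is shown.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed work factor, for illustration only


def hash_password(password, salt=None):
    """Return (salt, digest); the digest cannot be reversed to the password."""
    if salt is None:
        salt = os.urandom(16)  # fresh random salt per stored password
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, digest


def verify_password(candidate, salt, stored_digest):
    """Recompute the one-way result and compare in constant time."""
    _, digest = hash_password(candidate, salt)
    return hmac.compare_digest(digest, stored_digest)


salt, stored = hash_password("correct horse")
print(verify_password("correct horse", salt, stored))  # True
print(verify_password("wrong guess", salt, stored))    # False
```

Verification works by recomputing and comparing, so the stored value never needs to be recovered. A key stored this way could never be used again to encrypt or decrypt, which is why keys need different handling.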

Managing authentication data the same as identification data may not be practical. Identification data are used by applications as the primary search field for employee and customer records and to process customer data. While it is technically possible to store identification data using one-way functions, the one-way result is essentially an alias for the original identifier. For example, if the identifier is an e-mail address or account number and the one-way result is an arbitrary string of alphanumeric characters, the new string becomes a token for the original identifier. Refer to Section 2.3.3 (Data Tokenization) for details. In general, if the identifier without sufficient authentication can be used to perpetrate identity theft or fraud, then it must be protected. Extreme care must be taken to avoid situations where the same data element is reused sometimes as an identifier and other times as authentication data. Regardless, it is prudent to encrypt identification data in transit similar to authentication data and cryptography data.
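The observation that a one-way result becomes a token for the original identifier can be illustrated as follows. This is a sketch under stated assumptions, not the method of Section 2.3.3, which is not reproduced here: it uses a keyed one-way function (HMAC-SHA256), and the key value, the tokenize and lookup helpers, and the sample record are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical key for the demo; a real key would itself be
# cryptography data and managed accordingly.
TOKEN_KEY = b"hypothetical-demo-key"


def tokenize(identifier):
    """Map an identifier to an arbitrary-looking string via a keyed one-way function."""
    normalized = identifier.strip().lower()
    return hmac.new(
        TOKEN_KEY, normalized.encode("utf-8"), hashlib.sha256
    ).hexdigest()


# Records are indexed by the token instead of the e-mail address, so
# the token now serves as the primary search field.
records = {tokenize("alice@example.com"): {"name": "Alice"}}


def lookup(identifier):
    return records.get(tokenize(identifier))


print(lookup("Alice@Example.com"))  # {'name': 'Alice'}
print(lookup("bob@example.com"))    # None
```

The application still searches on a stable value derived from the identifier, so the one-way result works as an alias for it.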

As discussed previously, many of the controls for confidential data are the same as for cryptography, authentication, and identification data; however, there are significant differences. Attempting to manage all confidential data at higher security levels such as dual control with split knowledge is problematic and overly expensive, yet lowering security controls for cryptographic keys and authentication data to the lowest common denominator puts an organization at significant risk. The controls for identification data and confidential data also differ, for example in monitoring with data loss prevention (DLP), data pattern recognition, and data sanitization methods.

Data Tagging

Data elements have attributes that include a data field name, format, and length. For example, the data field names "Account Number," "Acct Num," or "Card No" might mean the same data field across different systems. The data field format might be "N" for numeric, but the lengths might be different, such as "16" for a credit card number, "19" for the maximum card number length per ISO 7812 [113], or "VAR" for a variable-length field. Maintaining consistency across multiple applications, platforms, operating systems, and programming languages is a constant challenge to developers. Data tagging can be another managed field attribute or an explicit tag such as a database column or an XML field.
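The field attributes described above (name, format, and length) can be sketched as a small record type. The FieldSpec class and its methods are hypothetical, and two readings are assumed for the sake of the example: the stated length is treated as a maximum, and a length of None stands for "VAR".

```python
from dataclasses import dataclass
from typing import Optional


def _norm(name):
    """Compare field names ignoring case and whitespace."""
    return "".join(name.lower().split())


@dataclass(frozen=True)
class FieldSpec:
    canonical_name: str
    aliases: frozenset          # names used for the same field on other systems
    fmt: str                    # "N" means numeric
    max_length: Optional[int]   # None means variable length ("VAR")

    def matches_name(self, name):
        known = self.aliases | {self.canonical_name}
        return _norm(name) in {_norm(n) for n in known}

    def validate(self, value):
        if self.fmt == "N" and not (value.isascii() and value.isdigit()):
            return False
        return self.max_length is None or len(value) <= self.max_length


# Maximum card number length of 19 per ISO 7812, as cited in the text.
ACCOUNT_NUMBER = FieldSpec(
    canonical_name="Account Number",
    aliases=frozenset({"Acct Num", "Card No"}),
    fmt="N",
    max_length=19,
)

print(ACCOUNT_NUMBER.matches_name("acct num"))       # True
print(ACCOUNT_NUMBER.validate("4111111111111111"))   # True
print(ACCOUNT_NUMBER.validate("4111-1111"))          # False
```

Keeping the aliases in one definition is one way to hold the attributes consistent across systems that name the same field differently.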

Similar to data field attributes and analogous to the BLP categories, data can be "tagged" to reflect its security classification and other associated information. For example, the PCI DSS [190] defines two data types: cardholder data and sensitive authentication data. Cardholder data are defined as Primary Account Number (PAN), Cardholder Name, Service Code, and Expiration Date. Data fields that contain PCI data can be tagged as "PCI" to manage them appropriately. However, data fields that contain the PAN might contain other account numbers that are not within the PCI scope, so tagging a field as "PCI" can be misleading if not managed properly. Likewise, tagging the field as "identification" or "confidential" would not necessarily be sufficient for PCI governance. Sensitive authentication data (SAD) are defined as Full Magnetic Stripe Data, CAV2/CVC2/CVV2/CID, and PIN/PIN Block. Ironically, PCI DSS v2.0 mentions passwords in thirty of its requirements and testing procedures, but it does not include passwords in sensitive authentication data, since they are not part of the regular payment system. However, cardholders often log onto online systems to purchase goods and services using their credit cards, so arguably customer passwords should be included in the PCI-sensitive authentication data. Regardless, an authentication data field might contain a wide variety of authentication data elements that might not always relate to PCI, so again the ability to distinguish PCI data from other authentication data elements would be beneficial for PCI governance.

Another interesting data tag might be "privacy" to recognize data elements that fall under international, federal, or state privacy laws. However, since many data fields are applicable to many regulations, multiple tags would be appropriate. For example, an account number is governed by several regulations including privacy laws, PCI (as discussed previously), Gramm-Leach-Bliley Act (GLB), and others. If the individual data fields are appropriately tagged, then resources that transmit, process, or store such data inherit the risk and, correspondingly, the controls to reduce risk. For example, an application that processes PCI data is subject to PCI compliance. Alternatively, attempting to manage tagged resources without tagging data fields is problematic, as changes to inputs or outputs affect the resources.
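The inheritance of tags from data fields to resources can be sketched as a set union. The field names, resource names, and tag assignments below are hypothetical examples and not a statement of PCI scope; the point is only that a resource's obligations are derived from the fields it handles instead of being assigned by hand.

```python
# Hypothetical tags per data field: one classification plus any
# regulatory or origin tags that apply.
FIELD_TAGS = {
    "pan": {"identification", "PCI", "privacy", "customer"},
    "cardholder_name": {"confidential", "PCI", "privacy", "customer"},
    "pin_block": {"authentication", "PCI", "customer"},
    "store_address": {"public"},
}

# Hypothetical resources and the fields each transmits, processes, or stores.
RESOURCE_FIELDS = {
    "payment_app": ["pan", "pin_block"],
    "store_locator": ["store_address"],
}


def inherited_tags(resource):
    """A resource inherits the union of the tags on its data fields."""
    tags = set()
    for field_name in RESOURCE_FIELDS[resource]:
        tags |= FIELD_TAGS[field_name]
    return tags


def in_pci_scope(resource):
    return "PCI" in inherited_tags(resource)


print(in_pci_scope("payment_app"))    # True
print(in_pci_scope("store_locator"))  # False

# A change to a resource's inputs changes what it inherits.
RESOURCE_FIELDS["store_locator"].append("pan")
print(in_pci_scope("store_locator"))  # True
```

Because the resource tags are computed from the field tags, adding a new input updates the resource's scope automatically. Tagging the resource directly would have left it stale.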

About the Author

Jeff Stapleton is the author of Security without Obscurity: A Guide to Confidentiality, Authentication, and Integrity. The traditional view of information security includes three cornerstones: confidentiality, integrity, and availability; however, the author asserts that authentication is the third keystone. As the field has become more complex, novices and professionals need a reliable reference that outlines the basics. Rather than focusing on compliance or policies and procedures, Jeff takes a top-down approach. Providing insight from Jeff's experience developing dozens of standards, the book provides an understanding of how to approach information security from the bedrock principles of confidentiality, integrity, and authentication.

© Copyright 2014 Auerbach Publications