Data is the oil of the 21st century. Information, knowledge, and data are closely related concepts, each with its own role in relation to the others. In this article I will treat them as one [1].

Why is the classification of data important?

It is inefficient to treat all data the same way when implementing protection for the data assets of an organization. Securing everything with few or no safeguards (controls) means sensitive data might get exposed. Securing everything at a high-security level is expensive and restricts access to noncritical data.

Classifying data helps determine how much effort, resources, and money are allocated to protecting data and controlling access to it. The protection level of data should be classified with input from legal, regulatory, business, and technical experts.

Categorization is the process of organizing objects into groups with similarities. The similarities can be value, sensitivity, risk, vulnerability, or damage upon loss. Classification is used to provide mechanisms for processing, storing, or transferring data and also defines how data is removed or destroyed.

Declassification is required once an object no longer matches the similarities of other objects in the same category. Without declassification, resources are wasted.
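A periodic review that finds declassification candidates can be sketched in Python. This is only an illustration: the three-step scale, the DataAsset record, and its assigned/assessed fields are assumptions made for this example and do not come from any standard.

```python
from dataclasses import dataclass
from enum import IntEnum


class Level(IntEnum):
    # Hypothetical scale; a higher value means stricter protection.
    PUBLIC = 0
    SENSITIVE = 1
    CONFIDENTIAL = 2


@dataclass
class DataAsset:
    name: str
    assigned: Level  # label currently attached to the asset
    assessed: Level  # level the latest review says the asset needs


def declassification_candidates(assets):
    """Return assets whose current label is stricter than their assessed need."""
    return [a for a in assets if a.assigned > a.assessed]


assets = [
    DataAsset("published press release", Level.CONFIDENTIAL, Level.PUBLIC),
    DataAsset("customer contracts", Level.CONFIDENTIAL, Level.CONFIDENTIAL),
]

for asset in declassification_candidates(assets):
    print(f"{asset.name}: {asset.assigned.name} -> {asset.assessed.name}")
```

Only the first asset is reported, because its label is stricter than the review says it needs to be. Lowering that label frees the resources that were spent on protecting it.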

The benefits of using data classification schemes:

  • It assists in identifying assets that are most critical and valuable to an organization
  • It demonstrates an organization's commitment to protecting valuable resources and assets
  • It helps in the selection of protection mechanisms
  • It is often required for regulatory and legal compliance
  • It assists in defining access levels for authorized users

The two commonly used classification schemes are government/military and commercial business/private sector classification levels.

Government and Military Classification Scheme

There are five levels of classification in this scheme. This scheme varies from country to country. [2]

High Top Secret
  Secret
  Confidential
  Sensitive but unclassified
Low Unclassified

Top Secret is the highest level of classification. The disclosure of top-secret data will cause grave damage to national security. Data in this category is provided to users only on a need-to-know basis. Even users with top-secret clearance do not get access to a piece of data unless they need to know it.

Secret is used for data of a restricted category. Disclosure of secret data will have a significant effect on national security.

Confidential is used for all data between Secret and Sensitive But Unclassified. The disclosure of confidential data will have a noticeable effect on national security.

Sensitive But Unclassified (SBU) is used for data that is for internal use or office use only. SBU is often used for data whose disclosure could violate the privacy rights of individuals.

Unclassified is data that is neither sensitive nor classified.

Top Secret, Secret, and Confidential data are commonly called classified data. Revealing even the classification label of classified data to unauthorized individuals is often already considered a violation of the protection level.
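The interplay of clearance and need-to-know can be sketched as a small access check. This is a simplified illustration, not a description of any real system: the function names are hypothetical, and the sketch assumes that need-to-know is enforced for every classified level, not only for Top Secret.

```python
from enum import IntEnum


class Classification(IntEnum):
    # The five levels from the scheme above, ordered from low to high.
    UNCLASSIFIED = 0
    SENSITIVE_BUT_UNCLASSIFIED = 1
    CONFIDENTIAL = 2
    SECRET = 3
    TOP_SECRET = 4


def is_classified(label):
    """Top Secret, Secret, and Confidential count as classified."""
    return label >= Classification.CONFIDENTIAL


def may_access(clearance, label, need_to_know):
    """Check clearance first, then need-to-know for classified data."""
    if clearance < label:
        return False
    # Assumption: need-to-know is required for all classified levels.
    return need_to_know or not is_classified(label)


# A top-secret clearance alone is not enough without need-to-know.
print(may_access(Classification.TOP_SECRET, Classification.TOP_SECRET, False))
print(may_access(Classification.TOP_SECRET, Classification.TOP_SECRET, True))
```

Using an ordered enumeration makes the "clearance must be at least as high as the label" rule a single comparison.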

Commercial Business and Private Sector Classification Levels

There are only three levels in this classification scheme, with Confidential and Private sharing the top level. The scheme can be adapted to a specific use case. The University of California, Berkeley, for example, categorizes data into four levels [3].

High Confidential/Private
  Sensitive
Low Public

Confidential is used for data that is extremely sensitive and for internal use only. Disclosure of data in this category has a significant negative impact on the organization. Another term for this level of classification is proprietary. If proprietary data is disclosed, it can affect the competitive advantage of a company.

Private is used for data that is of a private or personal nature. It is intended for internal use only. The disclosure of private data will have a negative impact on the company or individuals.

Sensitive is used for data that is not intended for the public.

Public is the lowest classification level. It is used for all data that does not match the higher levels.
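One way to put these levels to work is to map each one to a set of handling rules. The sketch below is an assumption-laden illustration: the two yes/no questions in classify() and every value in POLICY are hypothetical and would have to come from an organization's own security policy.

```python
from dataclasses import dataclass
from enum import IntEnum


class Level(IntEnum):
    PUBLIC = 0
    SENSITIVE = 1
    CONFIDENTIAL_PRIVATE = 2


@dataclass(frozen=True)
class Handling:
    encrypt_at_rest: bool
    external_sharing: bool
    disposal: str


# Hypothetical handling rules per level; not taken from any standard.
POLICY = {
    Level.PUBLIC: Handling(False, True, "delete"),
    Level.SENSITIVE: Handling(True, False, "delete"),
    Level.CONFIDENTIAL_PRIVATE: Handling(True, False, "secure wipe"),
}


def classify(extremely_sensitive_or_personal, not_for_public):
    """Pick the highest matching level; Public is the fallback."""
    if extremely_sensitive_or_personal:
        return Level.CONFIDENTIAL_PRIVATE
    if not_for_public:
        return Level.SENSITIVE
    return Level.PUBLIC


level = classify(extremely_sensitive_or_personal=False, not_for_public=True)
print(level.name, POLICY[level])
```

The fallback in classify() mirrors the rule that Public covers everything that does not match a higher level.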

Ownership of Data

Ownership has to be considered in the context of data classification because it can override all other forms of access control. Any operating system or application in which files or other types of objects can be assigned to an owner has to be taken into consideration, and a security policy has to be enforced to help categorize the content. An owner has full capabilities and privileges over the objects they own. The ability to take ownership is often granted only to the most powerful user in an operating system (e.g., the Linux root user or the Windows Administrator) or to the creator of an object or file (e.g., in Google Docs).
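How ownership can override other access rules is easier to see in a small model. The following sketch is hypothetical and does not reproduce the behavior of any particular operating system: the ProtectedObject class, the reader list, and the ADMIN account name are assumptions made for illustration.

```python
from dataclasses import dataclass, field

ADMIN = "root"  # hypothetical name of the most privileged account


@dataclass
class ProtectedObject:
    owner: str
    readers: set = field(default_factory=set)

    def can_read(self, user):
        # Ownership is checked first and overrides the reader list.
        return user == self.owner or user in self.readers

    def take_ownership(self, user):
        # Assumption: only the administrator may take ownership.
        if user != ADMIN:
            raise PermissionError(f"{user} may not take ownership")
        self.owner = user


doc = ProtectedObject(owner="alice", readers={"bob"})
print(doc.can_read("alice"), doc.can_read("bob"), doc.can_read("carol"))
```

Because the owner check comes before any other rule, a change of owner changes who has full access, which is why a classification policy has to account for who can own or take over an object.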