Get a free assessment and gain valuable insights into your unstructured data to increase efficiency, minimize risk, and cut costs across your organization.
Every business has at least one thing in common: a lot of data. Whatever your industry, you handle information daily, accumulating records and files of many different types. One of the trickier categories to deal with is unstructured data. Structured data is formatted and held in tools like databases; unstructured data, by contrast, usually lives in individual files stored in a variety of locations. As a result, it is difficult to search, organize, and analyze. For example, unstructured data can look like:
- Live chat records
- Customer-generated content
- Ebooks and whitepapers
- Email correspondence
- Internal communications
- Media and multimedia (images, audio, video)
- Medical records
- Mobile data
- Social media content
- Text files
- Web server logs
- Website content
Unstructured data encompasses many important types of information that are often both valuable and prone to security risks. That makes sensitive data discovery, the process of searching for and identifying information that should be restricted from unauthorized access, vital. Without it, your business could run afoul of government regulations and compromise the privacy and security of your stakeholders. And once sensitive data is identified, how do you sort and organize your unstructured data so that you continue to know what is sensitive and what is not?
Data classification levels are the answer to this question. Using these practices, you can establish an appropriate system and mitigate much of the risk of storing unstructured data.
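To make the discovery step more concrete, here is a minimal Python sketch of scanning text for a few sensitive patterns. The pattern set, the `find_sensitive` name, and the sample text are all assumptions made for illustration; real discovery tools rely on far more robust detection than a handful of regular expressions.

```python
import re

# Hypothetical, simplified patterns for illustration only.
# They target common US formats and will miss many real-world variants.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "phone": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}


def find_sensitive(text: str) -> dict:
    """Return each pattern name that matched, mapped to the matched strings."""
    hits = {}
    for name, pattern in PATTERNS.items():
        matches = pattern.findall(text)
        if matches:
            hits[name] = matches
    return hits


sample = "Contact jane.doe@example.com or 555-123-4567. SSN on file: 123-45-6789."
print(find_sensitive(sample))
```

A file that produces any hits is a candidate for a higher classification level; a file with none still needs review, since pattern matching alone cannot recognize every kind of sensitive content.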
Data classification is a way of defining and categorizing unstructured data for a business. Rather than leaving files stored haphazardly and unsecured, data classification gives you a framework to sort, organize, and store unstructured data in an appropriate system.
Imagine that you have just moved from one apartment to another. When packing, you didn’t label or organize any of the boxes. Once you’ve arrived at your new apartment, it’s time to unpack. But how do you know which boxes go in which room? And how can you hand the right boxes to the right person? After all, setting up your kitchen to work for you is important. If everyone is pitching in to help unpack, your brother might end up with a box full of cooking utensils and no idea how to arrange them to suit your workflow.
There are two problems that surface when you start looking at data classification:
- How should you “pack” or store data? What data should be organized together?
- Who should be allowed to access each “box” or be able to use which pieces of data?
Data classification can help answer these questions, giving greater access to and control over unstructured data.
What Are Data Classification Standards?
Data classification standards are the definitions of the data categories that a business will use. There are different sets of standards that can be set, depending on the business and associated data that needs to be classified. For example, one of the most common standards of classification is based on how sensitive the data is—essentially, who should (or shouldn’t) be allowed to access it. Classification standards based on sensitivity levels often use three or four levels.
What Are the 3 Data Classification Levels?
When data is classified using three levels, the levels are typically:
- Low sensitivity data: This is information that is meant for anyone to access and use. For example, your business’s social media pages are filled with low sensitivity data. The public is welcome to engage with and consume this level of information, as nothing being shared is a security threat.
- Medium sensitivity data: Data that falls into this category should be considered for internal use only. While it doesn’t include highly confidential information, it still encompasses data that could be potentially harmful for unauthorized people to access. For example, inter-office emails or memos could fall into this category.
- High sensitivity data: This classification level encompasses business-critical data and customer-specific details. If this data were compromised, it could have serious business impacts. High sensitivity data could include:
  - Personally Identifiable Information (PII) – any data that identifies an individual. For example, their name, address, social security number, or phone number.
  - Protected Health Information (PHI) – any data that contains identifiable health information. For example, your medical records or billing information between you and your physician.
  - Nonpublic Personal Information (NPI) – any data with personally identifiable financial information. For example, a bank account number or how much someone’s house is worth.
  - Material Non-Public Information (MNPI) – any company data that has not been released to the public and could influence share price. For example, information about upcoming acquisitions or mergers.
  - Confidential, Regulated, and High-Risk Business Information – any data that is particularly sensitive to a business. For example, a patent or other proprietary intellectual property. This can overlap with the other categories, as it includes similar data.
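As a rough illustration of the three levels, the sketch below assigns a sensitivity level from a set of category tags. The `Sensitivity` enum, the tag names, and the rule that anything not explicitly marked public defaults to medium are assumptions made for this example, not an established standard.

```python
from enum import IntEnum


class Sensitivity(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


# Hypothetical tag names for the high sensitivity categories listed above.
HIGH_CATEGORIES = {"PII", "PHI", "NPI", "MNPI", "CONFIDENTIAL_BUSINESS"}


def classify(categories: set, is_public: bool = False) -> Sensitivity:
    """Pick a level; any high sensitivity category overrides everything else."""
    if categories & HIGH_CATEGORIES:
        return Sensitivity.HIGH
    if is_public:
        return Sensitivity.LOW
    # Assumption: content not cleared for the public is treated as internal.
    return Sensitivity.MEDIUM


print(classify({"PHI"}).name)                  # a medical record
print(classify(set(), is_public=True).name)    # a social media post
print(classify(set()).name)                    # an inter-office memo
```

Checking the high sensitivity categories first means a file cannot be marked public by mistake once it is known to contain something like PII.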
What Are the 4 Data Classification Levels?
Another way to establish data classification standards is to use four levels of classification:
- Public data: Just like low sensitivity data, this classification level is used for information that can be viewed, accessed and used by anyone.
- Internal data: This level covers information that is used only within the business. It corresponds to the medium sensitivity level.
- Confidential data: Typically, this data is meant for internal use and is often limited to specific teams, individuals, or departments due to the confidential nature. This classification level can contain either medium or high sensitivity data.
- Restricted data: This is the highest level of sensitivity, and this info is carefully controlled. Access is only given to individuals who need the data to do their jobs.
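The relationship between the four levels and the three sensitivity levels can be written down as a simple lookup. This is only a sketch of the mapping described in the list above; the `Level` enum and the `candidate_levels` helper are names invented for illustration.

```python
from enum import Enum


class Level(Enum):
    PUBLIC = "public"
    INTERNAL = "internal"
    CONFIDENTIAL = "confidential"
    RESTRICTED = "restricted"


# Each four-level label mapped to the sensitivity levels it can hold,
# following the descriptions in the list above.
LEVEL_TO_SENSITIVITY = {
    Level.PUBLIC: {"low"},
    Level.INTERNAL: {"medium"},
    Level.CONFIDENTIAL: {"medium", "high"},
    Level.RESTRICTED: {"high"},
}


def candidate_levels(sensitivity: str) -> list:
    """Return the four-level labels that can hold data of this sensitivity."""
    return [
        level
        for level, allowed in LEVEL_TO_SENSITIVITY.items()
        if sensitivity in allowed
    ]


print([level.value for level in candidate_levels("high")])
```

Because confidential data can be either medium or high sensitivity, a sensitivity level alone does not always determine the label; someone still has to decide how narrowly access should be limited.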
What Is a Data Classification Framework?
A data classification framework is another way to describe a specific approach to setting data classification levels. This is the same thing as data classification standards. There are data classification frameworks that are used for different industries or by entities like the United States government.
Because this practice is so important, there are many examples of data classification in use. Three of them are:
GDPR data classification levels
The European Union General Data Protection Regulation (GDPR) came into effect in 2018, impacting privacy and data protection practices globally. GDPR does not itself prescribe classification levels, but organizations subject to it commonly use the four data classification levels: public data, internal data, confidential data, and restricted data. GDPR also requires that personal data be kept no longer than is necessary for the purpose it was collected for, so it is vital to understand what types of unstructured data your business actually possesses.
US government data classification levels
The U.S. government offers an excellent example of data classification levels with its three-level method. The three levels are:
- Confidential – the lowest level of sensitivity, although this is still defined as information whose unauthorized disclosure would cause “damage” to national security.
- Secret – the second highest classification. The unauthorized release of these details would cause “serious damage” to national security.
- Top secret – the highest level, for sensitive information that would cause “exceptionally grave damage” to national security if unauthorized parties accessed and used it.
These levels parallel the three data classification levels discussed above, although the sensitivity of the data is higher overall.
HIPAA data classification levels
The Health Insurance Portability and Accountability Act (HIPAA) is legislation aimed at protecting individuals’ health information. HIPAA classification guidelines don’t establish specific levels. Rather, data is grouped according to its level of sensitivity, and it is up to the entity holding the data to determine the classification levels it will use.
The most important reason to have a data classification policy is to ensure data security and privacy. Both will suffer if time and attention are not dedicated to getting your classification levels right. Not knowing where your data lives or how it needs to be protected leads to risks like:
- Data being lost in data silos
- Data being mishandled or accessed by unauthorized people
- Incurring fines and penalties for improper data handling and storage
- Potential lawsuits due to breached customer data
The best place to start when creating a data classification policy is by asking a few questions:
- What types of data are you collecting, processing, and storing?
The types of data you handle determine which rules apply to you, and every industry has specific regulations around the data that is used. It’s important to take stock of which regulations you will have to comply with in order to set a data classification policy that is right for you.
- What data do you already have and who will be conducting the data discovery process?
Before you start setting up your policy, you need to know what information is already held by your company and who will be in charge of discovering your current data situation.
- Who should be able to access specific information?
Think through how sensitive your data is and use the categories listed above to guide your process. Remember to consider whether each piece of data is intended for public, internal, confidential, or restricted access.
With these questions answered, it’s time to start developing rules for consistent and reliable data classification. Luckily, there are data classification tools that can assist you with this process.
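As one possible starting point for such rules, the sketch below turns the four access categories into an access check. The `Document` and `User` classes, their fields, and the specific rules are hypothetical; a real policy would follow your own regulations and organizational structure.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    name: str
    level: str  # "public", "internal", "confidential", or "restricted"
    allowed_teams: set = field(default_factory=set)
    allowed_users: set = field(default_factory=set)


@dataclass
class User:
    name: str
    is_employee: bool
    team: str = ""


def can_access(user: User, doc: Document) -> bool:
    """Apply one illustrative rule per classification level."""
    if doc.level == "public":
        return True  # anyone may view public data
    if not user.is_employee:
        return False  # everything else is internal at minimum
    if doc.level == "internal":
        return True
    if doc.level == "confidential":
        # Limited to specific teams or named individuals.
        return user.team in doc.allowed_teams or user.name in doc.allowed_users
    if doc.level == "restricted":
        # Only individuals who need the data to do their jobs.
        return user.name in doc.allowed_users
    raise ValueError(f"Unknown classification level: {doc.level}")


payroll = Document("payroll.xlsx", "restricted", allowed_users={"dana"})
print(can_access(User("dana", True, "finance"), payroll))
print(can_access(User("lee", True, "finance"), payroll))
```

Raising an error for an unrecognized level, rather than quietly granting access, keeps unlabeled or mislabeled files from slipping through the policy.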
DryvIQ empowers companies with the tools they need to classify, migrate, and manage unstructured data. We specialize in helping you uncover hidden risks and sensitive data with our advanced A.I.-driven platform that can analyze, classify, label, and catalog unstructured data. Not only that, but this work will be accomplished with accuracy, speed, and scale. Not sure how this could work for your business? Our free scanning assessment will provide content insights around data discovery and classification, sensitivity findings with vulnerability and risk by category, as well as recommendations for next steps. Get started today with DryvIQ to discover and secure your sensitive unstructured data.