Best Practices for Managing Your Hidden Unstructured Data Risk
Data risk management is a growing challenge for today’s organizations. Data volumes are growing exponentially, with over 1,000 petabytes created every day at current estimates. Understanding this data is difficult because of both its extreme volume and the variety of places where it resides: typically a combination of enterprise systems, cloud and on-premises repositories, and employee systems and devices, as well as third-party SaaS solutions and partner ecosystems.
But there’s a bigger challenge: 80% of the enterprise data that workers use daily lives within documents, spreadsheets, and other file formats. The unstructured data within these files exposes organizations to substantial compliance, privacy, and legal risks. From corporate secrets to the personally identifiable information (PII) of customers and the social security numbers of employees, risk resides deep within unstructured files stored in many places throughout every enterprise.
It is almost impossible for IT teams to correctly identify, categorize, and secure unstructured data at this scale. As a result, organizations need a new approach and toolkit to manage this operational unstructured data risk.
Enterprises can take the initiative to tackle this growing challenge by following the three steps below to continuously identify, quantify, and control risk within all types of unstructured data.
Identify and Understand the Unstructured Data Risk
The first step in any governance exercise is to ensure access to all the sources of data and potential risks. This need might sound simplistic, but managing access to all the CRM, ERP, ECM, and other enterprise data sources frequently poses significant challenges for IT departments. Yet without access to the complete set of data sources, any effort to manage unstructured data risk is compromised.
Once access to the numerous data sources within the organization is in place, the gravity of the risk-management challenge becomes clear. The sheer volume of files, the variety of file types, and the velocity of new files entering the business mean this is an operation that cannot be done by human hand alone. There is simply too much data.
Without a comprehensive understanding of enterprise data, organizations face a host of adverse outcomes that together constitute data risk. If you don’t have complete control over your data, you cannot be sure you are fully compliant with data privacy regulations. It also means that critical data may not be adequately secured, exposing the business to damage from data theft, corruption, or loss.
Technology can undoubtedly help. But effectively identifying and understanding the risk sitting within unstructured corporate data requires a series of distinct steps.
To better understand the risk within data, we first need to determine the type of data we are working with. Automated risk management solutions can automatically discover data properties such as file type, size, location (both the storage system and the associated hierarchy), and user permissions. The discovery phase of this process determines the basic profile of the data before passing control to the classification engine.
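As a rough illustration of the discovery phase, the sketch below walks a directory tree and records the properties mentioned above for each file. It is a minimal example under stated assumptions, not how any particular product works: the `FileProfile` class and `discover` function are hypothetical names, the file extension is used as a crude stand-in for file type, and permissions are read as a POSIX-style mode string rather than from an enterprise access-control system.

```python
import os
import stat
from dataclasses import dataclass
from pathlib import Path


@dataclass
class FileProfile:
    """Basic profile of one file (hypothetical structure for illustration)."""
    path: str          # full location, including the folder hierarchy
    extension: str     # crude stand-in for the file type
    size_bytes: int
    permissions: str   # POSIX-style mode string such as "-rw-r--r--"


def discover(root: str) -> list[FileProfile]:
    """Walk a directory tree and record a basic profile for every file."""
    profiles = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = Path(dirpath) / name
            info = full.stat()
            profiles.append(FileProfile(
                path=str(full),
                extension=full.suffix.lower(),
                size_bytes=info.st_size,
                permissions=stat.filemode(info.st_mode),
            ))
    return profiles
```

The resulting list of profiles is the kind of input a classification step could then work through file by file.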
Data classification automates the identification of risk by using pre-trained artificial intelligence (AI) models to review and compare your unstructured data to known data types using advanced pattern matching. Classification can identify key characteristics of unstructured data, such as:
- The type of document, such as a W-2 or an invoice
- Whether a document includes PII, such as dates of birth, social security numbers, or banking information
- Which language the document is written in
Ultimately, a classification engine investigates unstructured data to determine if it contains sensitive information. Moreover, it does this at high speed, on large volumes of files, without the need for human intervention. Classification then identifies data that includes sensitive information and flags it for further analysis.
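To make the idea of classification more concrete, here is a deliberately simplified sketch. It swaps the pre-trained AI models described above for plain regular expressions and keyword lookups, so it shows only the shape of the output (document type, PII types found, and a sensitivity flag), not the accuracy of a real engine. The patterns, keyword lists, and the `classify` function are all assumptions made for illustration, and language detection is left out.

```python
import re

# Hypothetical, simplified patterns. Real detectors need validation and context.
PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "date_of_birth": re.compile(
        r"\b(?:DOB|date of birth)\b[:\s]*\d{1,2}/\d{1,2}/\d{4}",
        re.IGNORECASE,
    ),
}

# Hypothetical keyword hints for guessing the document type.
DOC_TYPE_KEYWORDS = {
    "w-2": ("wage and tax statement",),
    "invoice": ("invoice number", "amount due"),
}


def classify(text: str) -> dict:
    """Return a guessed document type and the PII types found in the text."""
    lowered = text.lower()
    doc_type = "unknown"
    for label, keywords in DOC_TYPE_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            doc_type = label
            break
    pii_types = sorted(
        name for name, pattern in PII_PATTERNS.items() if pattern.search(text)
    )
    # A file is flagged for further analysis if any PII pattern matched.
    return {"doc_type": doc_type, "pii_types": pii_types, "sensitive": bool(pii_types)}
```

Rule-based matching like this tends to produce false positives (any nine-digit number in the right shape looks like a social security number), which is one reason trained models are used in practice.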
Many data classification solutions apply very basic risk labels to analyzed data. While knowing how much data is “low, medium, or high” risk might be interesting, it is not particularly useful. Advanced data risk management tools go far beyond this to calculate financial risk for each file or document.
This procedure uses detailed mappings of risk profiles, which define the risk of data being lost or exposed for each file type, measured against industry-specific regulations and the specific context or usage of the file. The process examines how different kinds of risk manifest themselves and how their risk levels vary across scenarios. For example, financial data should be restricted and have a high-risk profile unless it occurs in a shareholder financial report, in which case the risk is lower.
This task is riddled with nuances and permutations that would take human operators vast amounts of time. Yet, an AI can perform this task with ease, providing organizations with detailed information to balance their financial risk with the cost to control, manage, and mitigate it.
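One way to picture a context-aware financial risk estimate is the small sketch below. It is only an assumed model: the estimate is the product of a per-record cost, a context multiplier, the number of records, and a probability of exposure. Every figure, key, and name here (`BASE_COST_PER_RECORD`, `CONTEXT_MULTIPLIER`, `estimate_financial_risk`) is an invented placeholder rather than an industry benchmark or a vendor’s actual mapping.

```python
# All figures below are invented placeholders, not benchmarks.
BASE_COST_PER_RECORD = {
    "ssn": 150.0,
    "financial": 100.0,
}

# Context can lower (or raise) the risk of the same data type.
# Example: financial data inside a published shareholder report.
CONTEXT_MULTIPLIER = {
    ("financial", "shareholder_report"): 0.1,
}


def estimate_financial_risk(
    data_type: str,
    context: str,
    record_count: int,
    exposure_probability: float,
) -> float:
    """Estimate the expected loss for one file, in currency units."""
    base = BASE_COST_PER_RECORD.get(data_type, 0.0)
    multiplier = CONTEXT_MULTIPLIER.get((data_type, context), 1.0)
    return base * multiplier * record_count * exposure_probability
```

With these placeholder values, the same financial data yields a much lower estimate inside a shareholder report than inside an internal spreadsheet, mirroring the example above.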
Identifying the levels of risk hidden in enterprise unstructured data is a huge step forward for many organizations. However, the risk still exists until organizations take action to guard against breaches and misuse. Data risk management solutions can intelligently apply specific mitigation actions to individual files based on their risk profile. Automation using configured workflows can trigger risk management actions such as:
- Flagging or re-classification of data
- Transfer to a new storage location
- Quarantine of data for a defined period
- Redaction of sensitive content, or creation of secondary, redacted versions of files
- Change of file permissions and/or provisioning
- Archiving or deletion of data and files
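A configured workflow of this kind can be sketched as a simple rule table that maps a file’s risk profile to a list of actions. The example below only plans actions and does not touch any files. The thresholds, the action names, and the `FileRisk` and `plan_actions` names are assumptions for illustration, not settings from a real product.

```python
from dataclasses import dataclass, field


@dataclass
class FileRisk:
    """Risk profile of one file (hypothetical structure for illustration)."""
    path: str
    estimated_risk: float                       # e.g. output of a quantification step
    pii_types: list[str] = field(default_factory=list)


# Hypothetical thresholds, in the same currency units as estimated_risk.
QUARANTINE_THRESHOLD = 10_000.0
REVIEW_THRESHOLD = 1_000.0


def plan_actions(item: FileRisk) -> list[str]:
    """Return the names of the mitigation actions to trigger for one file."""
    actions = []
    if item.estimated_risk >= QUARANTINE_THRESHOLD:
        # Highest-risk files are isolated and locked down.
        actions.extend(["quarantine", "restrict_permissions"])
    elif item.estimated_risk >= REVIEW_THRESHOLD:
        actions.append("flag_for_review")
    if item.pii_types:
        # Any file containing PII also gets a redacted secondary version.
        actions.append("create_redacted_copy")
    return actions
```

Keeping the planning step separate from the code that actually moves, redacts, or deletes files makes it easier to review what a workflow would do before it runs at scale.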
Actions can be applied automatically and at scale at the file level, but of equal importance is the analysis of the overall data risk profile of the organization. This analysis can identify any widespread issues around security and data management policies. Additionally, it may inform infrastructure-focused storage and security decisions to improve enterprise-wide data security and protect data privacy.
It’s also worth noting that risk management is not a “one and done” project. Rather, it is a continual exercise in risk discovery, classification, quantification, and management. Any enterprise’s oversight and control of risk needs continual re-assessment and action.
Proactive Control is Critical
Data risk is traditionally defined as the exposure to loss of value or reputation caused by an organization’s inability to secure its data assets. But with the rapid evolution of data protection regulations like the GDPR and CCPA, a more relevant definition of data risk needs to include exposure due to data loss, business downtime, regulatory non-compliance, and loss of reputation due to poor data security and data privacy practices.
In today’s world, data risk is financial risk.
For the IT leaders and data owners tasked with data risk management, the focus is now shifting from gathering information and creating reports to continual governance, oversight, and proactive lowering of risk. Automated data risk management solutions provide comprehensive coverage in these areas, delivering the control and confidence that modern enterprises need. Proactive data risk management is not a “nice to have” in 2022; it is necessary, makes commercial sense, and demands immediate attention.
So, what are you waiting for?