Discover Your Unstructured Data, Minimize Your Business Risk


Few risks in business are so overwhelming that they cannot be mitigated – or avoided entirely – if you see them coming. Unfortunately, the truly insidious threats are the unknown ones operating outside of your ability to plan.

This logic applies to everything from shuttle launches to scuba diving, but there are few places where it seems as applicable as the world of data management. The stakes might not be as high as an explosion or a tragic accident, but compliance breaches and the exposure of corporate secrets can be devastating in their own way.

There are several reasons why data suffers so severely from the spread of unknown risk, from the increasing number of data protection laws to the sheer quantity of information that enterprises generate and store. However, the most prominent of these reasons is that around 80% of the data stored on organizations’ servers is unstructured — and unstructured data is incredibly challenging to manage and protect.

Unstructured data comprises the mass of documents, spreadsheets, images, presentation decks, and other files that workers use daily. In essence, it’s everything that doesn’t fit neatly into the fields and tables of a database or dedicated application. Often, this data lacks a clear owner, has no audit trail of access and edits, and is not stored and secured appropriately.

The absence of structure and consistent formatting has historically rendered unstructured data extremely difficult – verging on impossible – to search and classify efficiently. As a result, it represents a vast unknown even though it resides within the boundaries of an organization’s network.

Fortunately, advanced data management platforms and developments in artificial intelligence (AI) and machine learning (ML) mean that organizations can finally shine a light on unstructured data and eliminate as many unknowns as possible during the discovery process.

Discovering Unstructured Data’s Unknown Dangers

The motivation to discover unstructured data extends far beyond tidying up the network for the sake of the IT team. Many risks can lie in unstructured data, and until organizations gain insight into this data, there’s very little they can do to tackle them.

Perhaps the most prominent and commonplace risk associated with unstructured data is related to regulations protecting the data of customers stored by enterprises. From Europe’s GDPR to California’s CCPA, there is an ever-increasing number of laws detailing how customer and partner data must be stored and processed.

Keeping tabs on this is manageable when it’s confined to customer databases and invoicing software (i.e., when it’s structured data). However, things change when personally identifiable information (PII) such as addresses or banking details is shared through email or copied into a document. At that point, the data becomes unstructured and thus more susceptible to being stored in inappropriate areas of an organization’s network or retained for longer than is allowed.

The consequences of this can be enormous. For example, the EU’s GDPR allows regulators to fine companies up to 4% of their annual global turnover or €20 million, whichever is higher, while the CCPA’s statutory damages for data breaches were set at up to $750 per consumer per incident. These numbers can rapidly stack up for large organizations. But, of course, the consequences of a breach often reach beyond simple fines. Reputational damage and business disruption usually come hand-in-hand with penalties.
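To see how quickly these ceilings stack up, here is a rough back-of-the-envelope sketch. It assumes GDPR's upper fine tier (4% of annual worldwide turnover or €20 million, whichever is higher) and a CCPA statutory-damages ceiling of $750 per consumer per incident. The function names are hypothetical, and the figures are illustrative assumptions rather than legal guidance, so check the current statutory amounts before relying on them.

```python
def gdpr_max_fine_eur(annual_turnover_eur: float) -> float:
    """Assumed upper tier: 4% of annual turnover or EUR 20 million, whichever is higher."""
    return max(annual_turnover_eur * 4 / 100, 20_000_000)


def ccpa_max_damages_usd(affected_consumers: int, per_consumer_usd: float = 750) -> float:
    """Assumed ceiling: a fixed amount per affected consumer for a single incident."""
    return affected_consumers * per_consumer_usd
```

Under these assumptions, a company with €2 billion in turnover faces a GDPR ceiling of €80 million, and a single incident affecting 100,000 consumers carries a ceiling of $75 million.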

Beyond more conventional breaches, there are also questions of organizational security. For example, most businesses have operational secrets that range from financial figures to reports on products still in the prototyping phase. Naturally, this information needs to be kept secure at all costs. Still, if an organization doesn’t have visibility into its unstructured data, it’s hard to confidently declare that this is indeed the case.

Beyond the Basics of Data Discovery

Uncovering the documents and fragments of data that hold PII or corporate secrets depends on the ability to look beyond the basics of an organization’s files. For example, while it’s helpful to see a breakdown of the file types stored in various network elements, this doesn’t provide the level of insight needed to say the data is no longer unknown.

Moving beyond the basics is where the incredible advances in AI and ML are delivering value today. AI-powered data management platforms can provide much more detailed – and much more actionable – results.

AI classification of unstructured data can deliver visibility into:

  • Document type (resume, W-2, invoice, standard government forms)
  • Which users have access to the document
  • Whether the document contains PII
  • The language used in a document
  • Whether a document contains sensitive information
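As a rough illustration of what a classification result covering the attributes above might look like, the sketch below defines a hypothetical DocumentProfile record and a deliberately simple PII check. The regular expressions stand in for the AI models a real platform would use. They only catch email addresses and US Social Security number patterns, and every name here is an assumption made for the example.

```python
import re
from dataclasses import dataclass, field

# Simplistic stand-ins for ML-based detection (illustrative only).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def find_pii_kinds(text: str) -> list[str]:
    """Return the kinds of PII pattern found in the text, in a fixed order."""
    kinds = []
    if EMAIL_RE.search(text):
        kinds.append("email")
    if SSN_RE.search(text):
        kinds.append("us_ssn")
    return kinds


@dataclass
class DocumentProfile:
    """Hypothetical record holding the attributes a classifier might report."""
    path: str
    doc_type: str                  # e.g. "resume", "W-2", "invoice"
    users_with_access: list[str]
    language: str
    pii_kinds: list[str] = field(default_factory=list)

    @property
    def contains_pii(self) -> bool:
        return bool(self.pii_kinds)
```

A profile built this way, for example DocumentProfile("hr/offer.docx", "resume", ["alice"], "en", find_pii_kinds(text)), gives a governance tool one record per file to filter and act on.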

Once all these attributes are discovered and classified, it becomes possible to act on this information with an automated data governance tool. The unknowns have become known, and if the organization finds any risks, it can begin planning to mitigate them.

Ongoing, Proactive Discovery

There is a tendency for organizations to view data discovery as a project. Like a fire drill or office re-shuffle, it’s a task they complete now and then, but once it’s complete, they forget about it until the next time it appears on the calendar. Or there’s a fire.

Discovering unstructured data, however, is not a one-and-done process. It’s certainly not something an organization wants to be rushing in the wake of a data breach or discovery lawsuit. Instead, organizations need to tackle discovery proactively and continuously.

Advanced data management platforms with deep AI capabilities can run in the background 24/7 to automatically monitor new files added to the network for sensitive data. In essence, this continuous monitoring keeps an organization from having to deal with unknown risks, as the AI can automatically spot issues before they cause harm.
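The idea of continuous monitoring can be sketched in a few lines. The example below is not how any particular platform works. It assumes a hypothetical scan_new_files helper that walks a directory, looks only at files modified since the previous scan, and flags those matching a single US Social Security number pattern. A real system would react to file events and apply far richer detection.

```python
import os
import re
from pathlib import Path

# One illustrative pattern; a real platform would use trained classifiers.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def scan_new_files(root: str, last_scan_ts: float) -> list[str]:
    """Return paths of files modified after last_scan_ts that match the pattern."""
    flagged = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                if path.stat().st_mtime <= last_scan_ts:
                    continue  # unchanged since the previous scan
                text = path.read_text(encoding="utf-8", errors="ignore")
            except OSError:
                continue  # unreadable or vanished file; skip it
            if SSN_RE.search(text):
                flagged.append(str(path))
    return sorted(flagged)
```

Calling this on a schedule, and passing the timestamp of the previous run, keeps each pass limited to what has changed rather than rescanning the whole store.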

This task is not only vitally important but also uniquely suited to the abilities of powerful AI platforms. The growth of regulations and the complexity of modern data management mean that discovering unstructured data is a task beyond any individual, or even an entire IT team.

The need is clear: modern organizations have an incredible volume and velocity of data. Estimates put the amount of data in a typical enterprise’s various systems at upwards of 300TB, and this is growing by more than 40% each year. There is simply no practical way for an organization of any scale to shine a light on all its unstructured data manually.
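To put that growth rate in perspective, the short sketch below compounds the figures quoted above (a 300TB starting point and 40% annual growth). The function name is hypothetical, and the projection assumes the growth rate stays constant, which is a simplification.

```python
def projected_data_tb(start_tb: float, annual_growth: float, years: int) -> float:
    """Compound growth: start_tb * (1 + annual_growth) ** years."""
    return start_tb * (1 + annual_growth) ** years
```

At that pace, 300TB becomes roughly 823TB after three years, so the volume to be discovered more than doubles well before a periodic, project-style review would come around again.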

Minimizing Risk Begins with Discovering Unstructured Data

As much as organizations might not think about the potential security breaches and compliance issues lurking in their network, ignorance is no defense against hackers or auditors. Instead, information is the most effective tool for tackling risk, and the only way to gain this information is for an enterprise to intelligently discover unstructured data.

The only effective tools for discovery on any practical scale lie in modern solutions that leverage AI and other innovative technologies to monitor and discover unstructured data risk continuously. As a result, advanced data management platforms are rapidly becoming a must-have asset for organizations looking to avoid the perils of the unknown.
