
Discovery and classification: essentials for data governance

Michael Queenan at Nephos Technologies outlines what you need to know about data discovery and classification in order to achieve good governance

Effective data governance is fast becoming a prerequisite for any business focused on data's potential to transform processes, decision-making and performance. Without it, the integrity and accuracy of data cannot be guaranteed, strategic missteps become more likely, and businesses are at greater risk of falling foul of regulations such as GDPR and CCPA.

In the rush to exploit data assets and minimise time to value, however, many businesses are failing to lay the foundations for success. In particular, the approach they take to data discovery and classification can make or break their investment in data governance.

No matter what the objectives behind a data governance strategy or project, organisations always need to begin by considering this vital question: How can you efficiently use, manage or protect your data assets unless you know what and where they are?

Without doubt, this is a logical starting point, yet it is surprising how many organisations lose track of these details. The growth of outsourced cloud computing and ‘as-a-Service’ tools, together with the general complexity of modern technology infrastructure, is making the task more difficult.

As a result, large companies in particular are at constant risk of losing track of valuable datasets: where they reside, who has access to them and how they should be protected. When that happens, it is much harder to focus on good governance, let alone derive business value from data.

Knowledge is power

The data discovery process is an essential first step on the road to effective governance. It requires software that can connect to data sources of any type and identify data assets wherever they reside.

Without this capability, data governance projects can be seriously undermined from the outset. For instance, a security or privacy breach relating to an unsecured data asset brings a range of potentially serious governance and regulatory implications – even though the risks are avoidable.
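As a rough illustration of what "connect to data sources of any type" implies, the sketch below assumes a pluggable connector design: each source type gets its own connector, and every connector reports what it finds into one shared inventory. The `Connector` protocol, the `FileSystemConnector` class and the file-suffix filter are all hypothetical examples, not features of any particular discovery product; a real tool would add connectors for databases, object stores and SaaS platforms.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Iterable, Iterator, Protocol


@dataclass(frozen=True)
class Asset:
    source: str      # which connector found the asset
    location: str    # where the asset resides
    size_bytes: int


class Connector(Protocol):
    """Hypothetical interface every source-specific connector implements."""
    name: str

    def discover(self) -> Iterator[Asset]: ...


class FileSystemConnector:
    """Hypothetical connector that inventories data files under a root directory."""

    def __init__(self, root: str, suffixes=(".csv", ".json", ".parquet")):
        self.name = f"fs:{root}"
        self.root = Path(root)
        self.suffixes = suffixes

    def discover(self) -> Iterator[Asset]:
        # Walk the whole tree; keep only files whose extension looks like data.
        for path in self.root.rglob("*"):
            if path.is_file() and path.suffix.lower() in self.suffixes:
                yield Asset(self.name, str(path), path.stat().st_size)


def build_inventory(connectors: Iterable[Connector]) -> list[Asset]:
    """Run every connector and merge the results into one sorted inventory."""
    inventory: list[Asset] = []
    for connector in connectors:
        inventory.extend(connector.discover())
    return sorted(inventory, key=lambda asset: asset.location)
```

The design choice worth noting is that the inventory does not care what kind of source an asset came from; supporting a new source type means writing one more connector rather than changing the governance process.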

Just as important are the data classification processes organisations use to identify each asset accurately and, as a result, apply the appropriate level of protection. Ideally, organisations will apply intuitive classification types based on well-understood, clearly defined rule sets, such as GDPR and CCPA sensitivity.
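To make "classification types based on defined rule sets" concrete, here is a minimal sketch of rule-based column classification. It assumes each rule is a regular expression tested against sample values, and that a column earns a label only when most of its non-empty samples match (the 80% threshold is an arbitrary assumption). The rule names and patterns are hypothetical and far narrower than any real GDPR or CCPA rule set.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    label: str            # classification tag applied on a match
    pattern: re.Pattern   # regex tested against each sample value


# Hypothetical, deliberately small rule set for illustration only.
RULES = [
    Rule("email_address", re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")),
    Rule("uk_postcode", re.compile(r"^[A-Z]{1,2}\d[A-Z\d]? ?\d[A-Z]{2}$", re.IGNORECASE)),
    Rule("ip_address", re.compile(r"^(\d{1,3}\.){3}\d{1,3}$")),
]


def classify_column(values, threshold=0.8):
    """Return labels of rules matching at least `threshold` of non-empty samples."""
    samples = [v.strip() for v in values if v and v.strip()]
    if not samples:
        return []
    labels = []
    for rule in RULES:
        hits = sum(1 for v in samples if rule.pattern.match(v))
        if hits / len(samples) >= threshold:
            labels.append(rule.label)
    return labels
```

The threshold trades false positives against false negatives: a single stray email address in a free-text column will not label the whole column, but a column that is mostly email addresses will be.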

The same goes for correctly distinguishing Personal Information (PI) from Personally Identifiable Information (PII). This is often where the process goes wrong, mainly because organisations don’t understand the difference between the two.

PI data doesn’t identify a specific person and isn’t generally the cause of governance violations, yet some discovery tools fail to filter it out. This can be a major headache for users, who often have no choice but to isolate the more sensitive PII data by hand.
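The filtering step described above can be sketched as a partition over classifier output, following the article's distinction. The sketch assumes an earlier classification pass has already tagged each column with labels, and treats any column carrying a direct-identifier label as PII and any other labelled column as PI. Both the label names and that rule are hypothetical simplifications; they are not drawn from the text of GDPR or CCPA.

```python
# Hypothetical labels treated as direct identifiers (PII in the article's sense).
DIRECT_IDENTIFIERS = {
    "full_name",
    "email_address",
    "passport_number",
    "national_insurance_number",
}


def split_pi_pii(column_labels):
    """Partition columns into PII and other personal information (PI).

    column_labels maps a column name to the set of labels a classifier
    assigned to it. Columns with no labels are left out of both lists.
    """
    pii, pi = [], []
    for column, labels in column_labels.items():
        if labels & DIRECT_IDENTIFIERS:
            pii.append(column)   # at least one direct identifier present
        elif labels:
            pi.append(column)    # personal, but not directly identifying
    return sorted(pii), sorted(pi)
```

Automating this split is what spares users the manual isolation step: the PII list becomes the short, high-priority worklist, while PI columns are kept visible rather than mixed in with it.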

In technology terms, there are also major differences between the tools used to identify PI and PII datasets. The analogy is the difference between an Uzi and a sniper rifle: scattergun versus precision. Getting these investments wrong can be both costly and time-consuming.

Targeting cyber-security and data protection spend

Only when this work has been done should organisations decide what level of data protection and security to apply to each dataset.

At this stage, some teams revert to the scattergun approach, applying security technologies such as least-privilege management to every dataset they own, irrespective of its classification.

While zero trust can be an effective route to better security and protection, it can quickly become a very expensive option when implemented across the board. Instead, the focus should be on applying it to the most important data.

Other companies apply zero trust in batches, based on their perception of which business functions create and use the most sensitive data; finance and HR are good examples.

The problem here is that this approach may miss other important data sources, and in modern organisations, where there can be thousands of systems within an overall architecture, this can result in potentially serious governance blind spots.
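The blind-spot argument can be shown with a small sketch that compares two ways of scoping expensive controls: by business function, or by classification. It assumes every dataset already carries an owning function and a sensitivity level from a hypothetical four-level scheme. The sketch only decides *which* datasets are in scope; it does not implement any zero-trust controls.

```python
from enum import IntEnum


class Sensitivity(IntEnum):
    """Hypothetical four-level classification scheme."""
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


def classification_scope(datasets, min_level=Sensitivity.RESTRICTED):
    """Datasets at or above min_level, whichever business function owns them."""
    return sorted(name for name, (_owner, level) in datasets.items() if level >= min_level)


def department_scope(datasets, functions):
    """Datasets owned by the chosen functions, whatever their classification."""
    return sorted(name for name, (owner, _level) in datasets.items() if owner in functions)


def blind_spots(datasets, functions, min_level=Sensitivity.RESTRICTED):
    """Sensitive datasets that a department-by-department rollout would miss."""
    covered = set(department_scope(datasets, functions))
    return [name for name in classification_scope(datasets, min_level) if name not in covered]


if __name__ == "__main__":
    # Hypothetical inventory: dataset name -> (owning function, classification).
    inventory = {
        "payroll": ("HR", Sensitivity.RESTRICTED),
        "ledger": ("Finance", Sensitivity.CONFIDENTIAL),
        "crm_contacts": ("Marketing", Sensitivity.RESTRICTED),
        "press_releases": ("Marketing", Sensitivity.PUBLIC),
    }
    print(blind_spots(inventory, {"HR", "Finance"}))  # ['crm_contacts']
```

In this toy inventory, securing HR and Finance first covers the payroll data but misses a restricted customer contact list owned by Marketing, while the classification-based scope also avoids spending on the public press releases.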

Effective discovery and classification, by contrast, give organisations a much clearer view of where to allocate their security and data protection budgets. As part of a governance strategy, this kind of mature approach is key: it delivers robust compliance alongside the insight and intelligence that modern businesses require from their data.

Michael Queenan is CEO and co-founder of Nephos Technologies

All rights reserved Teiss Recruitment Ltd.