GDPR - Data Discovery

GDPR Compliance Step-by-Step: Part 2 – Data Discovery

In truth, this should be called Data Discovery & Asset Management, because there’s absolutely no point having one without the other. Nor should these things not already be part of your standard practices.

It’s 2018 and I can think of very few businesses who don’t have data as some of their most critical assets. No businesses bothering to read my blog anyway. So if data assets are that critical, why don’t you already KNOW where all of your personal data is? Why don’t you already have a record of who has access to it, and what they are doing with it?

Assuming yours is like 99.9% of business out there, you don’t have these mappings, and the reasons are as myriad as the business themselves. And maybe it didn’t matter that much up to now, but now it does. Very much so.

Under GDPR you are responsible for:

  1. Determining your lawful basis for processing for each of your separate business processes (both internal and client facing);
  2. Implementation of data subject rights in-line with 1. (erasure, portability etc.);
  3. Data minimisation (during collection and data retention);
  4. Data confidentiality, integrity, and availability (all to defensible levels);
  5. ‘Legitimising’ all transfers of, and responsibilities for, data to third parties

…and so on.

How exactly can you perform any of these things if you don’t KNOW what you have and where it is?

So assuming you agree with me so far and you need to get started, you have 3 choices:

  1. Run a series of interviews and questionnaires with all unique departmental stakeholders to manually track this stuff;
  2. Run some form of data discovery technology to find the data on end systems / databases / other file stores etc.; or
  3. Do both at the same time

Clearly 3. is the best option because a) you’ll need to involve the departmental stakeholders at some point anyway, and b) no manual process will ever find the stuff you had no idea was even there (which will be a lot).

While I’m not going to go into step-by-step instructions on how to run the two main processes above – which are bespoke to each business – I will provide sufficient guidance for you to ask the right people the right questions:

Interviews and Questionnaires

At a bare minimum, you will need to collect this information from all the non-IT departmental verticals (HR, Sales, Finance etc.):

  1. All categories of data received (name, phone number, home address etc.) both ‘direct’ and ‘ancillary’;
    i. Direct Personal Data (DPD, my term) is any data that by itself can determine the data subject – e.g. email address, mobile phone number, passport number etc.
    ii. Ancillary Personal Data (APD, again, my term) is any data that in and of itself cannot be used to determine the data subject, but a combination of APDs may do so. Or the loss of this data along with the related DPD would make the breach significantly worse – e.g. salary, disciplinary actions, bonus payments, disability etc.;
  2. In which application, database, file store etc. this data is kept – not looking for mappings down to the asset tag, just a general understanding;
  3. Data retention related to each business process – e.g. all data categories related to the ‘x’ process must be retained for ‘y’ years post termination;
  4. List of third parties/third countries – if applicable, from whom do you receive the relevant categories of data, and to whom do you share them;
  5. Other – Ideally you should also be able to determine, per business process, whether; a) you are the controller or processor for each data category, and b) the data category is ‘mandatory’ (process won’t work without it) or ‘non-mandatory (nice-to-have)

Assuming the non-IT stakeholders cannot answer the below questions, you will need to speak to those who can (from IT, InfoSec, relevant third parties etc.):

  1. Location of data sources – physical location, including end-system (e.g. host name of server, AWS instance ID, etc.), data centre, address and country;
  2. Format of data – e.g. ‘structured’ (databases, spreadsheets, tables etc.), and ‘unstructured’ (Word documents, PDFs, photos, etc.)
  3. Data ownership – someone has to be responsible;
  4. Security controls – e.g. encryption / anonymisation / pseudonymisation, and everything related to ISO 27001/NIST et al;
  5. Network and data flow diagrams – a ‘baseline’ of how personal data should flow in your environment

Data Discovery

I’m really not sure how the phrase ‘data discovery’ got lumped together with ‘business intelligence’, but that’s the way it seems on Google at least. And no matter which data discovery technology vendor you ask, they can all provide everything you need for GDPR compliance. If you believe them.

For those of you who remember the PCI DSS v1.0 way back in December 2004, you will also remember the disgusting land-grab of every vendor whose technology or managed services fulfilled one of the 12 requirements. You will also remember the even more disgusting price-hikes by those technology vendors, especially by Tripwire, whom Visa was dumb enough to list as an ‘example’ of file integrity monitoring.

So guess what’s happening now? Because every data protection regulation requires you to know what data you have, data discovery vendors are crawling all over each other to push their wares down your throat.

But all data discovery tools are not the same, and unless you can fully define what results YOU need from one, stick to the manual process until you do. These are a few of the questions to which you need answers:

  1. Do I need a permanent solution, or can a one-time, consultant-led, discovery exercise suffice for now?;
  2. Should the solution be installed locally on self-installed server(s), an appliance, cloud-based?;
  3. Can the solution perform discovery on the end-systems, databases, AND traffic ‘on-the-wire’?;
  4. How will the solution manage encrypted files / databases / network protocols?;
  5. Does the solution need to accommodate Cloud-based environments?;
  6. Do I need an agent, or agent-less solution?;
  7. Should the solution be able to perform network discovery (NMAP-esque) and some semblance of asset management to guarantee coverage?;
  8. Do you need data-flow mapping as well?;
  9. Should the solution be integrated with some form of compliance management tool?;
  10. Is this solution to be self-managed or outsourced?

…and the list goes on.

As I’ve stated many times over the almost 5 years I’ve been blogging; “No technology can fix a broken process, it can only make a good process better”. Data discovery is no different, and its many and very significant benefits will only be realised when you ask the right questions.

Don’t know the right questions? Find someone who does.

[If you liked this article, please share! Want more like it, subscribe!]

4 thoughts on “GDPR Compliance Step-by-Step: Part 2 – Data Discovery

  1. Thanks Froud, I love your down to earth approach. Another great article, and cant wait for the next one.

  2. Would it be possible to give a list of some of these data discovery solutions? Of course I am not looking for your recommendation but rather a starting point in regards to some names.
    I have found some lists but they seem to be more for Business Intelligence, or KPIs.

Leave a Reply

Your email address will not be published. Required fields are marked *