The impact of GDPR on test data

Since the introduction of the General Data Protection Regulation (GDPR), a lot has been said about the use of production data for testing software applications, systems and processes. For a while now, we’ve been looking for answers to a range of questions that we and others have raised on this topic. Ultimately, we were able to narrow them down to five key questions:

1. Can you use copies of production databases for software development or testing purposes?

The GDPR only provides a framework for specific issues like this one. It cannot be read as a rulebook that states in black and white what is and is not allowed. What the GDPR does state is that you must handle personal data with care, and it defines principles that you must take into account. The general view is that production data cannot simply be copied into test environments, and often doing so is not necessary at all. Where it is necessary, the GDPR requires that sufficient and appropriate measures are taken in the field of access and security.

Basically, most people agree that you shouldn’t use copies of production data for testing purposes. An edge case may be software that is on the verge of production release and has to run against production data to prove that it works in the real-life environment. It is therefore important that you can justify in which phase you do this and how you meet the due care requirements, without making it a routine or using more data than needed. Proper weighing and accountability are therefore more important than a static ‘may’ or ‘may not’.

Large and complex environments make it more difficult to anonymize test data or to secure it in a more generic sense. Ultimately, the GDPR calls for a weighing exercise: the risks are weighed against the impact on the data subject and the organization if things go wrong. When organizations find that their IT landscape is so complex that they cannot apply data anonymization, they will have to be able to demonstrate this complexity and show that the utmost care has been taken. Given current technical developments, this requires a very strong business case. The reality is that doing nothing is not an option.

2. Permission to use personal data

Although consent is a valid ground for processing personal data, there are practical objections to its use. Consent seems relatively easy, but there is a lot involved. The consent of the data subject must be specific. In short, the person who gives permission must be able to estimate, if not fully understand, what they are giving permission for. Think of which tests (performance, application, functional, etc.) the data will be used for and for what period of time. Another practical objection is that it is technically difficult to select and manage the group of people who have agreed and to place only this group in a test environment or database.

Furthermore, an organization has a healthy interest in good data management practices, and these should not include asking customers for permission to use their personal data for testing and development purposes.

3. Is a waiver an option under the GDPR?

A waiver means that production data is temporarily used instead of masked or anonymized data for a specific software project. Again, this is not a solution that the GDPR specifically discusses. The GDPR is a regulation that ensures that data is carefully processed and protected. It does not provide for temporary exceptions, so a waiver, even a temporary one, has no basis under the GDPR, and the principles of data protection continue to apply to the data.

4. Is there a way to declare a set of data anonymized?

As stated before, the GDPR does not give an answer on whether or how you can anonymize test data. This security measure will have to be designed by the organization itself. There is, however, an opinion of the Article 29 Working Party (WP29), the predecessor of the European Data Protection Board. This opinion (Opinion 05/2014 on Anonymisation Techniques) explains a number of techniques, such as anonymization and pseudonymization of data, and the conditions they must meet. It was written in 2014, but it is still referenced and used.
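To make the difference between the two kinds of technique more concrete, here is a minimal Python sketch. It is an illustration only, not a method prescribed by the GDPR or by the opinion: the key, the field names (`customer_id`, `birth_year`) and both helper functions are hypothetical. The first helper pseudonymizes an identifier with a keyed hash, the second generalizes an exact value into a band.

```python
import hashlib
import hmac

# Hypothetical secret; it would normally be kept outside the test environment.
SECRET_KEY = b"replace-with-a-real-secret"


def pseudonymise(value: str, key: bytes = SECRET_KEY) -> str:
    """Replace an identifier with a keyed hash (HMAC-SHA256).

    The same input always yields the same token, so joins between tables
    keep working. Whoever holds the key can recompute the mapping, so the
    result is pseudonymised data, not anonymous data.
    """
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def generalise_birth_year(year: int, band: int = 10) -> str:
    """Generalise an exact birth year to a band such as '1980-1989'."""
    start = (year // band) * band
    return f"{start}-{start + band - 1}"


# Hypothetical production record and its masked counterpart.
record = {"customer_id": "C-10042", "birth_year": 1984}
masked = {
    "customer_id": pseudonymise(record["customer_id"]),
    "birth_year": generalise_birth_year(record["birth_year"]),
}
print(masked)
```

Because the keyed hash can be recomputed by anyone who holds the key, the masked identifier should still be treated as personal data. The generalized birth year loses detail for good, which is closer to what anonymization aims for.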

Before you choose one or more of these techniques, you must first determine what the purpose of anonymization is. Is the goal to produce a test data set that is completely anonymous? Or is the goal to take measures to comply with the privacy principle of ‘data anonymization’? Or, perhaps, to limit the impact of a data breach, since non-anonymized data is easier to trace back to a person? If the goal is a data set that is completely anonymized, this can be a tough challenge because of, for example, spontaneous recognition cases: records that someone recognizes because a value stands out. These are difficult to prevent, however good the anonymization techniques are. An example would be the highest salary in a data set. This particular example is easy to solve with data generation, but often there are several such cases.
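The highest-salary example can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the function name `replace_top_salaries`, the sample figures and the choice to draw replacement values between the median and the highest remaining salary are all hypothetical, and real data sets usually contain more than one kind of outlier.

```python
import random


def replace_top_salaries(salaries, top_n=1, seed=0):
    """Replace the top_n highest salaries with generated values.

    Replacement values are drawn between the (upper) median and the highest
    salary that is allowed to remain, so no row stands out as the maximum.
    """
    if len(salaries) <= top_n:
        raise ValueError("need more rows than values to replace")
    rng = random.Random(seed)  # fixed seed keeps the test data reproducible
    ranked = sorted(salaries)
    cutoff = ranked[-(top_n + 1)]   # highest value that stays in the data
    low = ranked[len(ranked) // 2]  # (upper) median as the lower bound
    return [s if s <= cutoff else round(rng.uniform(low, cutoff)) for s in salaries]


# Hypothetical salary column with one easily recognised value.
salaries = [32000, 41000, 38500, 250000, 45000, 36000]
masked = replace_top_salaries(salaries)
print(masked)
```

Only the outlier is replaced, so the rest of the column keeps its original distribution and stays representative for testing.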

In short, how much weight linkage attacks (re-identification by combining the data with another source) and spontaneous recognition cases carry depends on the purpose of the anonymization. The WP29 also discusses spontaneous recognition cases and notes that the ‘context’ matters: in what context can the user or recipient of the data use (or misuse) this data? If the recipient can easily discover spontaneous recognition cases and use this information, then choices will have to be made to limit this risk.
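One common way to estimate exposure to linkage attacks is to count how many rows share the same combination of indirectly identifying fields, in the style of a k-anonymity check. The Python sketch below assumes hypothetical column names (`postcode`, `birth_band`, `gender`) and an arbitrary threshold of `k=3`. It flags risky combinations and is not a guarantee of anonymity.

```python
from collections import Counter


def risky_groups(rows, quasi_identifiers, k=3):
    """Return quasi-identifier combinations shared by fewer than k rows."""
    counts = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return {combo: n for combo, n in counts.items() if n < k}


# Hypothetical masked test data; direct identifiers are already removed.
rows = [
    {"postcode": "9711", "birth_band": "1980-1989", "gender": "F"},
    {"postcode": "9711", "birth_band": "1980-1989", "gender": "F"},
    {"postcode": "9711", "birth_band": "1980-1989", "gender": "F"},
    {"postcode": "9712", "birth_band": "1950-1959", "gender": "M"},
]

flagged = risky_groups(rows, ["postcode", "birth_band", "gender"], k=3)
print(flagged)
```

A combination that occurs only once, like the last row here, is a candidate for further generalization or for replacement with generated data, depending on who receives the data set and in what context.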

Ultimately, the GDPR enables organizations to strike the right balance of interests. With the GDPR at hand, decisions can be made on how to keep test data representative while maintaining due diligence standards and preventing unauthorized access to data. The goal therefore does not have to be a completely anonymous data set.

5. Processor vs controller

The basic principle of the GDPR is that the data controller is responsible for personal data. So when a data controller asks a third party to store personal data in, for example, its data center, the data controller remains ultimately responsible. The responsibility does not simply shift to the data processor. There is some nuance, however, because the data processor (the data center in this example) must carry out the work carefully. The processor is responsible for securing and delivering its service in a reasonable manner. The data processor does have a security obligation, and such obligations will be included in an agreement between the data controller and the data processor.

Conclusion

Based on the above, the question of to what extent organizations are obliged to anonymize test data under the GDPR starts to be answered. Although no ready-made answer will be found in the GDPR, it is now the industry standard that test data must be anonymized. Equally, an organization must consider whether this should be the case in every phase of the process. If an organization can take measures in the field of data anonymization, then these measures must be taken. If data anonymization is no longer possible in a certain final phase of the test process, then there will have to be good reasons. Fortunately, there are plenty of examples showing that suitable test data can still be obtained after anonymization.

This article was written with senior IT/Privacy Lawyer Marie-José Bonthuis of Privacy1. Marie-José (CIPP/E, CIPM, CIPT, FIP) and her colleagues help various clients in the field of data reuse for (scientific) research, privacy assessments, etc. Her specialties: privacy in chains, blockchain and data science, anonymization, synthetic data.