The impact of GDPR on test data
Since the introduction of the General Data Protection Regulation (GDPR), a lot has been said about the use of production data for testing software applications, systems, and processes. For a while now, we have been looking for answers to a range of questions that we and others have raised on this topic. Ultimately, we narrowed them down to five key questions:
1. Can you use copies of production data, which by implication contain personally identifiable information (PII), for software development and testing?
2. If you request permission from your customers to use their personal data for software development and testing purposes by including it in your terms & conditions, is that allowed?
3. What about a waiver or temporary exemption? To what extent can a waiver be given for these purposes?
4. Is there a way to declare a set of data anonymized?
5. Can you place the responsibility for the loss of data with the processor via a processing agreement?
1. Can you use copies of production databases for software development or testing purposes?
The GDPR only provides the framework for such specific issues. It cannot, therefore, be read as a book that states in black and white what is and is not allowed. What the GDPR does state is that you must handle personal data with care, and it defines principles that you must take into account. The general view is that production data cannot simply be used in test environments, and often it is not necessary at all. And if it is necessary, the GDPR indicates that sufficient appropriate measures must be taken in the field of access and security.
Basically, most people agree that you shouldn’t use copies of production data for testing purposes. An edge case may be software that is on the verge of a production release and has to run against production data to prove itself in a real-life environment. It is therefore important that you can properly justify in which phase production data is used and how you meet the due care requirements, without making it a routine or using more data than needed. Proper weighing and accountability are therefore more important than a static ‘may’ or ‘may not’.
Large and complex environments make it more difficult to anonymize test data or to secure it in a more generic sense. Ultimately, the GDPR is a matter of weighing: the risks are weighed against the impact on the data subject and the organization if things go wrong. When organizations find that their IT landscape is so complex that they cannot apply data anonymization, they will have to be able to demonstrate this complexity and show that the utmost care has been taken. Given current technical developments, that requires a very strong business case. The reality is that doing nothing is not an option.
2. Permission to use personal data
An organization should have a healthy interest in good data management practices for GDPR compliance, and these practices should not include asking customers for permission to use their personal data for testing and development purposes.
3. Is a waiver an option under the GDPR?
A waiver means that, for a specific software project, production data is temporarily used instead of masked or anonymized data. Again, this is not a solution that is specifically discussed in the GDPR. The GDPR is a regulation that ensures that data is carefully processed and protected, and it does not provide for temporary exceptions. A waiver, even a temporary one, therefore has no basis under the GDPR, and the principles of data protection continue to apply to the data.
4. Is there a way to declare a set of data anonymized?
As stated before, the GDPR does not give an answer on whether or how you can anonymize test data. This security measure has to be designed by the organization itself. There is, however, an opinion of the Article 29 Working Party (WP29), the predecessor of the European Data Protection Board. In this opinion, the working party explains a number of techniques, such as anonymization and pseudonymization of data, and the conditions they must meet. The opinion was written in 2014, but it is still referenced and used.
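To make the difference between the two techniques more concrete, here is a minimal sketch of pseudonymization using a keyed hash. It is only an illustration under stated assumptions: the key, the field names, and the helper functions are hypothetical and are not taken from the opinion or from any specific tool.

```python
import hashlib
import hmac

# Hypothetical secret key; in practice it would be kept outside the test environment.
SECRET_KEY = b"replace-with-a-real-secret"


def pseudonymize(value: str, key: bytes = SECRET_KEY) -> str:
    """Return a stable pseudonym for a value using a keyed hash (HMAC-SHA256)."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]  # shortened only for readability


def pseudonymize_record(record: dict, fields: tuple) -> dict:
    """Return a copy of the record with the named fields replaced by pseudonyms."""
    masked = dict(record)
    for field in fields:
        if masked.get(field) is not None:
            masked[field] = pseudonymize(str(masked[field]))
    return masked


# Hypothetical customer record, for illustration only.
customer = {"name": "Jane Doe", "email": "jane@example.com", "city": "Utrecht"}
masked = pseudonymize_record(customer, ("name", "email"))
```

Because the same input always yields the same pseudonym, relations between tables stay intact, which is useful for testing. Anyone holding the key can recompute the mapping, however, which is why pseudonymized data is generally still treated as personal data rather than as anonymous data.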
Before choosing one or more of these techniques, you must first determine what the purpose of anonymization is. Is the goal a test data set that is completely anonymous? Or is the goal to take measures to comply with the privacy principle of ‘data minimization’? Or is it perhaps an attempt to limit the damage of a data breach, since non-anonymized data is easier to trace back to a person? If the goal is a data set that is completely anonymized, this can be a tough challenge because of, for example, spontaneous recognition cases: records that someone who knows the context can recognize at a glance. These are difficult to prevent, however good the anonymization techniques are. An example would be the highest salary in the data set. This particular example is easy to solve with data generation, but there are often several such cases.
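The highest-salary example can be sketched as follows. This is a simplified illustration of data generation, assuming a plain list of salaries and a uniform draw between two percentile bounds; the function name, the percentile choice, and the sample figures are all hypothetical.

```python
import random


def generate_salaries(real_salaries, seed=0):
    """Generate synthetic salaries within the rough range of the real ones,
    without reproducing a single recognizable top value."""
    rng = random.Random(seed)  # fixed seed so test data is reproducible
    ordered = sorted(real_salaries)
    # Bounds are taken at roughly the 5th and 95th percentile, so a unique
    # highest salary is never used as the upper bound.
    low = ordered[int(0.05 * (len(ordered) - 1))]
    high = ordered[int(0.95 * (len(ordered) - 1))]
    # Round to the nearest 100 to avoid oddly precise values.
    return [round(rng.uniform(low, high), -2) for _ in real_salaries]


# Hypothetical salaries; the last one is the kind of outlier colleagues would recognize.
real = [32000, 35000, 38000, 41000, 45000, 52000, 61000, 75000, 98000, 450000]
synthetic = generate_salaries(real)
```

The generated column keeps the same number of rows and a plausible range, but the one value that would make a record recognizable no longer appears. A real data set would usually need a more faithful distribution than a uniform draw, and the same check for every other column with recognizable extremes.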
In short, how much weight linkage attacks (re-identifying people by combining the data set with other data) and spontaneous recognition cases deserve depends on the purpose for which anonymization techniques are used. The WP29 opinion also discusses spontaneous recognition cases, for which the ‘context’ matters as well: in what context can the user or recipient of the data use (or misuse) this data? If the recipient can easily discover spontaneous recognition cases and use this information, then choices will have to be made to limit this risk.
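One way to get a first impression of linkage risk is to count how many records share the same combination of indirectly identifying attributes. The sketch below does this in the style of a k-anonymity check. The column names, the sample rows, and the threshold k are assumptions chosen for illustration, not a prescribed method.

```python
from collections import Counter


def risky_groups(rows, quasi_identifiers, k=2):
    """Return the quasi-identifier combinations shared by fewer than k rows."""
    counts = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return {combo: n for combo, n in counts.items() if n < k}


# Hypothetical test data set without names, but with indirectly identifying columns.
rows = [
    {"postcode": "3511", "birth_year": 1980, "gender": "F"},
    {"postcode": "3511", "birth_year": 1980, "gender": "F"},
    {"postcode": "3512", "birth_year": 1975, "gender": "M"},
    {"postcode": "3512", "birth_year": 1975, "gender": "M"},
    {"postcode": "9999", "birth_year": 1931, "gender": "M"},
]
flagged = risky_groups(rows, ("postcode", "birth_year", "gender"), k=2)
```

Here only the last row is flagged, because its combination of postcode, birth year, and gender is unique. Such rows are the ones most exposed to linkage with other data sources and are candidates for generalization, suppression, or replacement with generated data.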
Ultimately, the GDPR enables organizations to strike the right balance of interests. With the GDPR at hand, decisions can be made on how to keep test data representative while maintaining due diligence standards and preventing unauthorized access to data. The goal therefore does not have to be a completely anonymous data set.
5. Processor vs controller
The basic principle of the GDPR is that the data controller is responsible for personal data. So when a data controller asks a third party to store personal data in, for example, its data center, the data controller remains ultimately responsible. The responsibility does not simply shift to the data processor. There is some nuance, however, because the data processor (the data center in this example) must carry out the work carefully: it is responsible for delivering and securing the service in a reasonable manner. The data processor has a security obligation, and such obligations are included in an agreement between the data controller and the data processor.
Based on the above, the question of to what extent organizations are obliged to anonymize test data under the GDPR starts to be answered. Although a ready-made answer will not be found in the GDPR, it is now the industry standard that test data must be anonymized. An organization must still consider, for each phase of the process, whether this should be the case. If organizations can take measures in the field of data anonymization, then these measures must be taken. If data anonymization is no longer possible in a certain final phase of the test process, then there will have to be good reasons for that. Fortunately, there are plenty of examples showing that suitable test data can still be obtained after anonymization.
Interested? Please feel free to contact us!
This article was written in cooperation with senior IT/Privacy Lawyer Marie-José Bonthuis of Privacy1. Marie-José (CIPP/E, CIPM, CIPT, FIP) and her colleagues help various clients in the field of data reuse for (scientific) research, privacy assessments, etc. Her specialties are privacy in chains, blockchain, data science, anonymization, and synthetic data.