Test data masking
Protect privacy sensitive data in non-production databases, comply with legislation and prevent data leaks in QA environments
Nowadays, more and more organizations use dozens of databases and applications for their processes. It's common to copy those databases for purposes other than the primary process: most organizations create multiple copies of their production databases for development, testing, acceptance, training, outsourcing, etc. Many of these copies contain privacy sensitive personal data or business-critical, sensitive corporate data. But how do you deal with this? In this solution article we inform you about test data masking in a broad sense.
1. Data protection with GDPR
Copying your database means that you now have to secure not one database but, for example, ten. That's why most governments have enacted data privacy laws to protect customers and citizens from misuse of their data. If you don't protect privacy sensitive data, you risk the following:
- Not complying with data privacy laws such as the European Union's General Data Protection Regulation (GDPR)
- Exposure of privacy sensitive data to unauthorized users
- Image loss because of bad publicity when data is leaked
- Customers that terminate their relationship because of a lack of trust in security
Privacy sensitive data
When is data personal or privacy sensitive? A name, for example, is personal but not privacy sensitive. The city you live in is not privacy sensitive either; it is public information. But the fact that you have a huge debt or a disease makes your data privacy sensitive. In this example, by separating name, city, disease and debt from each other, the data can no longer be traced back to a specific person and is therefore no longer privacy sensitive.
2. Data masking definition
Several terms are used interchangeably with data masking, such as data anonymization or data obfuscation. For convenience, we use the term data masking.
Data masking meaning
Data masking is the process of hiding personal or privacy sensitive data. The main reason is to ensure that the data cannot be traced back to a specific person. There are different methods for masking data. The method you choose depends on the type of data you want to mask.
Scrambled data in testing
Anonymizing or scrambling production data in non-production databases is becoming more and more common. You still have your full database with ‘normal’ data, but all privacy sensitive data is modified so that it cannot be linked to the original individual.
3. Data masking methods
When you have determined what data should be masked or anonymized, you can choose your data masking technique. A common method is to shuffle data like first name and last name, so you get new first name / last name combinations. Another method is to blank a column that you don’t need for testing. That way, the privacy sensitive data and all its risks are literally removed. Scrambling is another commonly used method to make data unrecognizable: it replaces letters with x and digits with 1.
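The three methods described above (shuffling, blanking and scrambling) can be sketched in a few lines of Python. This is a minimal illustration, not how any particular masking product works: the function names, the column names and the sample rows are all made up for the example, and rows are represented as plain dictionaries rather than database records.

```python
import random


def shuffle_column(rows, column, seed=None):
    """Redistribute the values of one column across all rows."""
    rng = random.Random(seed)
    values = [row[column] for row in rows]
    rng.shuffle(values)
    return [{**row, column: value} for row, value in zip(rows, values)]


def blank_column(rows, column):
    """Empty a column that is not needed for testing."""
    return [{**row, column: None} for row in rows]


def scramble_value(value):
    """Replace every letter with 'x' and every digit with '1'; keep the rest."""
    return "".join(
        "1" if ch.isdigit() else "x" if ch.isalpha() else ch
        for ch in value
    )


def scramble_column(rows, column):
    """Apply scramble_value to one column of every row."""
    return [{**row, column: scramble_value(row[column])} for row in rows]


# Made-up sample rows, for illustration only.
rows = [
    {"first_name": "Anna", "last_name": "Jansen", "note": "called twice", "account": "NL01 BANK 0123"},
    {"first_name": "Pieter", "last_name": "de Vries", "note": "prefers email", "account": "NL02 BANK 4567"},
    {"first_name": "Sanne", "last_name": "Bakker", "note": "new customer", "account": "NL03 BANK 8901"},
]

masked = shuffle_column(rows, "first_name", seed=1)  # new first/last name pairs
masked = blank_column(masked, "note")                # column not needed for testing
masked = scramble_column(masked, "account")          # unrecognizable, same shape

for row in masked:
    print(row)

print(scramble_value("NL01 BANK 0123"))  # xx11 xxxx 1111
```

Note that a shuffle can leave some rows with their original value, especially in small tables, and that scrambling keeps the length and layout of a value but not its validity (a scrambled bank account will no longer pass a checksum).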
4. Data masking examples
When you start applying data scrambling rules to test data, you’ll end up with representative but unrecognizable test data. Many techniques can be used, as shown above. An example of scrambled versus production data could look like this:
Data scrambling example
|ID||First Name||Last Name||Bank account|
As you can see, the scrambled data looks as representative as production data. This is a very simple test data set, but it shows the possibilities of scrambling test data.
4.1. Data masking best practices
There are several best practices that are usable when you want to mask data. There are also several levels of best practices:
On a data level
By the data level, we mean which data you should mask. When have you masked enough data to be compliant while keeping the test data as representative as possible, so that the test organization can still use it?
What’s important is that you know where data is stored. If you know where and how data is stored, you’re able to deploy data masking rules. An important best practice on a data level is: do something with date of birth and postal area. Research shows that if these remain the same, the combination is often enough to identify a person.
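One way to "do something" with date of birth and postal area is to shift the date by a random number of days and to keep only the leading part of the postal code. The sketch below assumes that approach; the 180-day window, the two-character prefix and the function names are illustrative choices, and whether they are sufficient for compliance is something to agree with your CISO or DPO.

```python
import random
from datetime import date, timedelta


def shift_date_of_birth(dob, max_days=180, rng=random):
    """Move a date of birth by a random, non-zero number of days.

    The age stays roughly realistic, but the exact date changes.
    """
    offsets = [d for d in range(-max_days, max_days + 1) if d != 0]
    return dob + timedelta(days=rng.choice(offsets))


def generalize_postal_code(postal_code, keep=2):
    """Keep the first `keep` characters; scramble the rest (letters to x, digits to 1)."""
    prefix, rest = postal_code[:keep], postal_code[keep:]
    scrambled = "".join(
        "1" if ch.isdigit() else "x" if ch.isalpha() else ch
        for ch in rest
    )
    return prefix + scrambled


# Made-up values, for illustration only.
print(shift_date_of_birth(date(1980, 5, 17)))
print(generalize_postal_code("1234 AB"))  # 1211 xx
```

Keeping the prefix of the postal code preserves the region for tests that depend on it, while the shifted date keeps age-based logic (for example, an age check) working for most records.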
On an organization level
On an organization level, you should discuss where data masking is executed. It is important that this happens in an environment that is as secure as possible, preferably in a staging area, for example.
4.2. Data masking tips
There are several tips to give, but maybe the most important one is: start simple. We see many organizations letting the data masking project grow too big too soon. Just start in a simple manner and improve your data masking rules along the way. It may well turn into a big project, but doing nothing is even worse. So even if your first masking run isn’t 100% perfect, it is better than nothing!
Some other tips: start by analysing where data is stored. And start discussing the masking rules with your CISO (Chief Information Security Officer) or DPO (Data Protection Officer). Tell them that replacing data with only ‘xxxxxx’ isn’t going to help the business, and find out where you have common ground. And if you’d like some help, don’t hesitate to contact us.
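A simple starting point for analysing where data is stored is to scan the column names of a database for words that hint at personal data. The sketch below uses an in-memory SQLite database purely so that it is self-contained; the keyword list, table names and function name are assumptions for the example, and other databases expose their catalog through different views. Column names are only a hint, so the result is a list of candidates to review, not a complete inventory.

```python
import sqlite3

# Illustrative keyword list; extend it for your own naming conventions.
SENSITIVE_KEYWORDS = ("name", "birth", "postal", "zip", "account", "email", "phone")


def find_candidate_columns(conn):
    """Return (table, column) pairs whose column name hints at personal data."""
    candidates = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'"
    ).fetchall()
    for (table,) in tables:
        # Each PRAGMA table_info row is (cid, name, type, notnull, dflt_value, pk).
        for column in conn.execute(f'PRAGMA table_info("{table}")'):
            column_name = column[1]
            if any(keyword in column_name.lower() for keyword in SENSITIVE_KEYWORDS):
                candidates.append((table, column_name))
    return candidates


# Made-up schema, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (id INTEGER, first_name TEXT, date_of_birth TEXT, postal_code TEXT)"
)
conn.execute("CREATE TABLE product (id INTEGER, title TEXT, price REAL)")

found = find_candidate_columns(conn)
for table, column in found:
    print(f"{table}.{column}")
```

A list like this gives you something concrete to take into the discussion with your CISO or DPO about which columns need a masking rule.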
5. Data masking in your database
DATPROF follows the software lifecycle of the database vendors
|Oracle||Version 11.2 and above|
|Microsoft SQL Server||Version 2008 (not for Runtime)|
|DB2 LUW||10.5 and above|
|DB2 for i||7.2 (Generate for Runtime only)|
* Check the PowerShell module remarks
Want to try it yourself?
With our 14-day free trial you can try DATPROF Privacy yourself. Mask, scramble or blank your own database and see how easy it can be to anonymize your test data.