Protect privacy sensitive data and personally identifiable information with data masking in non-production databases, comply with legislation, and prevent data leaks in QA environments
More and more organizations use dozens of databases and applications for their processes, and it’s common to copy those databases for uses other than the primary process. Most organizations create multiple copies of their production databases for purposes such as development, testing, acceptance, training and outsourcing. Many of these databases contain personally identifiable information or business-critical and privacy sensitive data. How do you deal with this? This solution article covers test data masking in a broad sense.
1. Data protection with GDPR
Copying your database means that you now have to secure not one database but, for example, ten. That’s why most governments have enacted data privacy laws to protect customers and citizens from wrongdoing. If you don’t protect this personally identifiable information, you risk the following:
- Not complying with data privacy regulations and European Union directive concerning data security
- Exposure of privacy sensitive data to unauthorized users
- Image loss because of bad publicity when data is leaked
- Customers that terminate their relationship because of a lack of trust in security
Privacy sensitive data
When is personally identifiable information personal, and when is it privacy sensitive? A name, for example, is personal but not privacy sensitive. The city you live in is not privacy sensitive either; it is public information. But the fact that you have a huge debt or a disease makes your data privacy sensitive. In this example, once name, city, disease and debt are separated, the data can no longer be traced back to a specific person and is therefore no longer privacy sensitive.
2. Data masking definition
Several terms are used interchangeably with data masking, such as data anonymization and data obfuscation. For convenience, we use the term data masking.
Data masking meaning
Data masking is the process of hiding personal or privacy sensitive data. The main reason is to ensure that the data cannot be traced back to a specific person. There are different methods for masking data; which one you choose depends on the type of data you want to mask.
Scrambled data in testing
Anonymizing or scrambling production data in non-production databases is becoming increasingly common. You still have your full database with ‘normal’ data, but all privacy sensitive data is modified so that it cannot be linked to the original individual.
3. Masking methods
Once you have determined which personally identifiable information should be masked or anonymized, you can choose your data masking technique within DATPROF Privacy. A common method is to shuffle data such as first names and last names, so you get new first name / last name combinations. Another method is to blank a column that you don’t need for testing; that way, privacy sensitive data and all its risks are literally removed. Scrambling is another commonly used method to make data unrecognizable: it replaces letters with x and digits with 1.
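To make the three methods concrete, here is a minimal Python sketch of shuffling, blanking and scrambling on an in-memory list of records. It is an illustration only and does not show how DATPROF Privacy implements these functions; the helper names (`shuffle_column`, `blank_column`, `scramble_column`) and the sample records are hypothetical.

```python
import random


def shuffle_column(rows, column, seed=None):
    """Redistribute the values of one column across all rows."""
    rng = random.Random(seed)
    values = [row[column] for row in rows]
    rng.shuffle(values)
    # Build new dicts so the original rows are left untouched.
    return [{**row, column: value} for row, value in zip(rows, values)]


def blank_column(rows, column):
    """Remove the content of a column that is not needed for testing."""
    return [{**row, column: None} for row in rows]


def scramble_value(value):
    """Replace every letter with 'x' and every digit with '1'."""
    result = []
    for ch in value:
        if ch.isdigit():
            result.append("1")
        elif ch.isalpha():
            result.append("x")
        else:
            result.append(ch)  # keep spaces and punctuation so the format survives
    return "".join(result)


def scramble_column(rows, column):
    """Apply scramble_value to one column of every row."""
    return [{**row, column: scramble_value(row[column])} for row in rows]


# Hypothetical sample records, invented for this illustration.
customers = [
    {"id": 1, "first_name": "Alice", "last_name": "Jansen", "phone": "06-1234", "account": "NL01 BANK 0123"},
    {"id": 2, "first_name": "Bram", "last_name": "de Vries", "phone": "06-5678", "account": "NL02 BANK 0456"},
    {"id": 3, "first_name": "Chloe", "last_name": "Bakker", "phone": "06-9012", "account": "NL03 BANK 0789"},
]

masked = shuffle_column(customers, "first_name", seed=42)
masked = blank_column(masked, "phone")
masked = scramble_column(masked, "account")
```

Note that a plain shuffle can leave some values in their original row, especially in small tables, so it is usually combined with other rules rather than relied on by itself.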
Synthetic data generation
Another masking method is generating synthetic data: privacy sensitive information is replaced with synthetically generated data. The big advantage of this approach is that the schemas and structures of your original data are preserved. With deterministic masking, the same original value is consistently replaced with the same generated value, regardless of which database or system the data is in.
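One possible way to get this kind of consistency is to derive the replacement from a keyed hash of the original value. The sketch below assumes that approach; it is not a description of how DATPROF Privacy works internally. The key, the list of synthetic names and the function name `deterministic_mask` are all hypothetical.

```python
import hashlib
import hmac

# Assumption: a secret key shared by every masking run; keep it out of source control.
SECRET_KEY = b"replace-with-a-secret-key"

# Hypothetical pool of synthetic replacement values.
SYNTHETIC_FIRST_NAMES = ["Emma", "Noah", "Liam", "Olivia", "Mila", "Lucas", "Sara", "Finn"]


def deterministic_mask(value, candidates, key=SECRET_KEY):
    """Map an original value to a synthetic one; the same input always gives the same output."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).digest()
    index = int.from_bytes(digest[:8], "big") % len(candidates)
    return candidates[index]


# The same original name masked in two different systems yields the same result.
crm_name = deterministic_mask("Johanna", SYNTHETIC_FIRST_NAMES)
billing_name = deterministic_mask("Johanna", SYNTHETIC_FIRST_NAMES)
```

Because the mapping depends only on the key and the original value, records that refer to the same person stay linked across masked databases. With a small candidate pool, different originals will sometimes map to the same synthetic value, so a realistic pool would need to be much larger.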
Tutorial: how to mask data
Where should you start a project? What things should you keep in mind?
4. Masking examples
When you apply data scrambling rules to your data, you end up with representative but unrecognizable data. Many techniques can be used, as shown above. An example of scrambled versus original data could look like this:
Data scrambling example
|ID||First Name||Last Name||Bank account|
As you can see, the scrambled data looks as representative as production data. This is a very simple test data set, but it shows the possibilities of scrambling test data.
4.1. Data masking best practices
There are several best practices you can apply when you want to mask data, and they exist on several levels:
On a data level
The data level concerns which data (personally identifiable information) you should mask. How much do you need to mask to become compliant while keeping the test data representative enough for the test organization to use?
It’s important to know where data is stored. If you know where and how data is stored, you’re able to deploy data masking rules. An important best practice on a data level is: do something with date of birth and postal area. If these remain the same, research shows that individuals remain fairly easy to identify.
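The article does not prescribe what to do with these two fields. One common option is to generalize them, as in the minimal sketch below; the function names and the choice to keep the birth year and the first two postal code characters are assumptions made for illustration.

```python
from datetime import date


def generalize_birth_date(birth_date):
    """Keep the birth year so ages stay realistic, but reset month and day."""
    return date(birth_date.year, 1, 1)


def generalize_postal_code(postal_code, keep=2):
    """Keep the first `keep` characters; replace later digits with 0 and letters with A."""
    result = []
    for position, ch in enumerate(postal_code):
        if position < keep:
            result.append(ch)
        elif ch.isdigit():
            result.append("0")
        elif ch.isalpha():
            result.append("A")
        else:
            result.append(ch)  # keep spaces so the format stays valid
    return "".join(result)


masked_birth_date = generalize_birth_date(date(1985, 7, 23))  # date(1985, 1, 1)
masked_postal_code = generalize_postal_code("1234 AB")        # "1200 AA"
```

Generalizing keeps the values usable in tests (valid dates, valid postal code format) while making the combination of the two far less specific to one person.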
On an organization level, you can discuss where data masking is executed. It is important that this happens as securely as possible, preferably in a staging area, for example.
There are several tips to give, but maybe the most important one is: start simple. We see many organizations letting the data masking project grow out of proportion. Start in a simple manner and improve your data masking rules along the way. It may well turn into a big project, but doing nothing is even worse. So even if your first masking run isn’t 100% perfect, it is better than nothing!
Some other tips: start by analysing where data is stored, and start discussing the masking rules with your CISO (Chief Information Security Officer) or DPO (Data Protection Officer). Tell them that replacing data with only ‘xxxxxx’ isn’t going to help the business, and find out where common ground can be found. And if you’d like some help, don’t hesitate to contact us.
Compliancy project plan
A masking plan is critical to a successful anonymization project and that’s what this document is designed to help you with. Download the whitepaper for free!
5. Supported databases
DATPROF follows the software lifecycle of the database vendors
|Oracle||Version 11.2 and above|
|Microsoft SQL Server||Version 2008
|DB2 LUW||10.5 and above|
|DB2 for i||7.2, 7.3|
|PostgreSQL||9.5, 9.6, 10.5, 11, 11.2, 11.6, 12, 12.1|
* Check the PowerShell module remarks
Mask privacy sensitive data and generate synthetic test data with DATPROF Privacy. Try 14 days for free. No credit card required.