Test Data Generation
As an additional function for data masking
Using privacy-sensitive production data for software testing is not only old-fashioned, it also conflicts with privacy laws and regulations such as the GDPR and HIPAA. But for the best software tests, you need test data that is representative, right? So how do you make sure your data is representative, yet not traceable to a natural person?
Data masking with synthetic data
With data masking you can apply masking rules such as shuffle and blank, but sometimes that is not enough to be sure that the data can no longer be traced to a person. And if you apply too many masking rules, the data might no longer be representative. In these cases you can use synthetically generated test data instead: you replace privacy-sensitive data such as names, email addresses and bank account numbers with synthetic values. This also helps you align your test data with your test cases.
Synthetic data is also called fake data, dummy data or example data. We refer to all of these as synthetic data. By synthetically generated test data we mean test data that is:
- derived from a seed file;
- randomly generated; or
- generated based on logic.
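The three approaches listed above can be sketched in a few lines of Python. This is a minimal illustration only, not how DATPROF Privacy implements them: the seed data, the function names and the email rule are all assumptions made for the example.

```python
import csv
import io
import random
import string

# Hypothetical seed data; in practice this would be a CSV file on disk.
SEED_CSV = "last_name\nJansen\nDe Vries\nBakker\n"


def from_seed(rng, seed_text, column):
    """Seed file: pick a value from one column of the seed data."""
    rows = list(csv.DictReader(io.StringIO(seed_text)))
    return rng.choice(rows)[column]


def random_string(rng, length=8):
    """Random: build a string of random uppercase letters."""
    return "".join(rng.choice(string.ascii_uppercase) for _ in range(length))


def email_from_name(first_name, last_name, domain="example.com"):
    """Logic: derive a value from other values using a fixed rule."""
    local = f"{first_name}.{last_name}".lower().replace(" ", "")
    return f"{local}@{domain}"


rng = random.Random(42)  # fixed seed so repeated runs give the same data
last_name = from_seed(rng, SEED_CSV, "last_name")
code = random_string(rng)
email = email_from_name("Anna", last_name)
```

Passing a seeded `random.Random` instance around, rather than using the module-level functions, keeps the generated data reproducible between test runs.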
With DATPROF Privacy you can generate synthetic test data using the following generation functions:
- Random string
- Random date/time
- Random number
- Random decimal number
- Sequential numbers
- Male First name
- Female First name
- Last name
- Country Code
- BSN (Dutch Social Security Number)
- Currency Code
- Currency Symbol
- Random value from seed file (Pick values from a custom CSV seed file)
- Regular expression (Generate values based on a regular expression)
- Weighted list (Generate values based on a distribution, for example 40% male, 60% female)
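To make two of these generators concrete, here is a hedged Python sketch of a weighted list and of a synthetic BSN. It is an illustration under stated assumptions, not DATPROF Privacy's implementation: the function names are hypothetical, and the BSN check covers only the well-known "11-test" (9×d1 + 8×d2 + ... + 2×d8 − 1×d9 must be divisible by 11).

```python
import random


def weighted_value(rng, distribution):
    """Pick one value according to a {value: weight} distribution."""
    values = list(distribution)
    weights = list(distribution.values())
    return rng.choices(values, weights=weights, k=1)[0]


# Weights for the first eight digits; the ninth digit counts as -1.
BSN_WEIGHTS = (9, 8, 7, 6, 5, 4, 3, 2)


def is_valid_bsn(bsn):
    """Return True if a 9-digit string passes the 11-test."""
    if len(bsn) != 9 or not bsn.isdigit():
        return False
    digits = [int(c) for c in bsn]
    total = sum(w * d for w, d in zip(BSN_WEIGHTS, digits)) - digits[8]
    return total % 11 == 0


def synthetic_bsn(rng):
    """Generate a random 9-digit string that passes the 11-test."""
    while True:
        digits = [rng.randint(0, 9) for _ in range(8)]
        check = sum(w * d for w, d in zip(BSN_WEIGHTS, digits)) % 11
        if check < 10:  # a remainder of 10 cannot be a single digit, so retry
            return "".join(map(str, digits)) + str(check)


rng = random.Random(0)
gender = weighted_value(rng, {"male": 40, "female": 60})
bsn = synthetic_bsn(rng)
```

Because the generated numbers pass the same checksum as real ones, they can flow through input validation in the application under test without belonging to an actual person by design.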
How to generate test data for your database
When you’ve decided to use synthetically generated data for testing, you’ll need to know how to generate data that fits your database. With DATPROF Privacy that is straightforward. Once DATPROF Privacy is connected to your database, you add a generation function to your masking template, just like any other masking function, and it generates data for the chosen column in your database.
A great advantage of this approach is that all relationships between the tables remain unchanged. Your data structure remains functionally and technically consistent, but you use synthetic data instead of privacy-sensitive production data.
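The idea that relationships survive when only the sensitive column is overwritten can be shown with a small, self-contained sketch. It uses Python's built-in `sqlite3` module and an in-memory database purely for illustration; the table names, column names and the list of synthetic names are assumptions, and this is not how DATPROF Privacy itself works internally.

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, last_name TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),
        amount REAL
    );
    INSERT INTO customers VALUES (1, 'Real Name A'), (2, 'Real Name B');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.0);
""")

# Hypothetical pool of synthetic values (could also come from a seed file).
SYNTHETIC_NAMES = ["Jansen", "De Vries", "Bakker", "Visser"]
rng = random.Random(7)


def orders_per_customer(connection):
    """Count orders per customer through the foreign-key join."""
    return connection.execute(
        "SELECT c.id, COUNT(o.id) FROM customers c "
        "JOIN orders o ON o.customer_id = c.id "
        "GROUP BY c.id ORDER BY c.id"
    ).fetchall()


before = orders_per_customer(conn)

# Overwrite only the sensitive column; primary and foreign keys stay untouched.
ids = [row[0] for row in conn.execute("SELECT id FROM customers").fetchall()]
for customer_id in ids:
    conn.execute(
        "UPDATE customers SET last_name = ? WHERE id = ?",
        (rng.choice(SYNTHETIC_NAMES), customer_id),
    )
conn.commit()

after = orders_per_customer(conn)
names = [row[0] for row in conn.execute("SELECT last_name FROM customers")]
```

Here `before` and `after` are identical, because the join depends only on the key columns, while every `last_name` now holds a synthetic value.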
Of course we also support the generation of test data over a chain of systems.
The following database technologies are supported natively by the complete DATPROF suite: Oracle, Microsoft SQL Server, PostgreSQL, DB2 iSeries, DB2 LUW, EDB Postgres. We also offer non-native support, so if your database type is not on the list, don’t worry – just let us know.