Using production data for testing
Should you do it or not?
October 17, 2019 | Nynke Hogeveen
Many organizations have a test or QA environment that is connected to test/QA data sources, such as a database with test data. Some of these test databases contain fake data, made up by QA engineers. This fake data is produced either by hand or by self-built scripts. Yes, this seems pretty outdated, but it still happens a lot. However, this method causes problems: many production issues are due to the lack of real(istic) test data. The fake data doesn’t contain every data issue present in production, which results in poor or even useless test results.
Testing with production data
To create software of the highest quality possible, you’ll need to keep the test environment as “in-sync” as possible with production. That’s why many QA teams copy complete production data to the QA data sources to catch more (preferably all) issues. But there are a few things to consider regarding this method:
- Does the data contain privacy sensitive information? If so, you need to mask, filter or simply remove this data due to privacy regulations.
- Can the test environment handle that much data? If not, you need to split the data into smaller, manageable parts.
- What happens if you need a new copy of production and it overwrites the earlier changes? Will it break your tests? You would need some sort of refresh option.
- Are there dependencies between data? Then you’d need to test all possible circumstances or settings.
The points above show that testing with (a copy of) production data is not as easy as it sounds. In fact, it can be very risky to just copy production data to your test environment because of privacy sensitive information. Storage and database license costs can also become a serious issue. If you make multiple copies of production (one copy for every test team), the size gets out of control quickly and the bill runs high.
But does that mean you can’t use production data for testing? Luckily not!
Production data in test / development environments
It is pretty simple to use production data in your test environment, as long as you take privacy sensitive information and sizing into account. For these two problems, there are very good solutions: masking privacy sensitive data with DATPROF Privacy and subsetting data with DATPROF Subset.
Mask production data
With DATPROF Privacy you can easily make your test data anonymous. Masking or scrambling the data ensures that sensitive data can’t be traced back to a person anymore. For example, you can shuffle first and last names, blank fields, generate new SSNs and bank account numbers, create your own masking rules and much more. It also makes sure that data stays consistent across multiple applications and databases.
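To make these masking ideas concrete, here is a minimal Python sketch of three of them: shuffling a column, blanking a field, and replacing a value consistently. It is not how DATPROF Privacy works internally. The function names, the column names (last_name, email, ssn) and the secret key are all assumptions made up for illustration.

```python
import hashlib
import hmac
import random


def shuffle_column(rows, column, seed=None):
    """Return copies of the rows with one column's values shuffled among them."""
    values = [row[column] for row in rows]
    # Note: on small data sets a value can end up back in its original row.
    random.Random(seed).shuffle(values)
    return [{**row, column: value} for row, value in zip(rows, values)]


def blank_column(rows, column):
    """Return copies of the rows with one column emptied."""
    return [{**row, column: ""} for row in rows]


def pseudonymize(value, secret, length=12):
    """Replace a value with a keyed hash.

    The same input and secret always give the same output, so the
    replacement stays consistent across tables and databases.
    The result is a hex string, not a valid SSN format.
    """
    digest = hmac.new(secret.encode("utf-8"), value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:length]


def mask_customers(rows, secret, seed=None):
    """Apply the three hypothetical masking rules to a list of customer dicts."""
    masked = shuffle_column(rows, "last_name", seed)
    masked = blank_column(masked, "email")
    return [{**row, "ssn": pseudonymize(row["ssn"], secret)} for row in masked]
```

Because pseudonymize is deterministic, two databases masked with the same secret can still be joined on the masked value, which is the consistency property described above. A real masking tool would also need to handle formats, uniqueness constraints and much larger volumes.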
Subset production data
With its patented algorithm, DATPROF Subset extracts specific selections (even less than 1%) out of a production database. You can specify and filter which data you want made available in your subset. You can add extra filters, transform data with column expressions and add extra dependencies or custom foreign keys. This way the subset contains all the issues present in production, but storage isn’t a problem anymore. With subsets you can give every test team a test data set of its own.
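The core idea of subsetting can be sketched in a few lines of Python: filter a parent table, then keep only the child rows whose foreign key points at a kept parent, so the subset stays referentially intact. This is a simplified illustration, not DATPROF Subset's patented algorithm. The table layout (customers, orders), the key names and the filter are hypothetical.

```python
def subset_with_children(parents, children, parent_key, foreign_key, keep):
    """Filter the parent rows, then keep only child rows that reference them."""
    kept_parents = [p for p in parents if keep(p)]
    kept_ids = {p[parent_key] for p in kept_parents}
    kept_children = [c for c in children if c[foreign_key] in kept_ids]
    return kept_parents, kept_children


# Hypothetical "production" data: 1,000 customers with 5 orders each.
customers = [
    {"id": i, "country": "NL" if i % 100 == 0 else "US"}
    for i in range(1, 1001)
]
orders = [
    {"order_id": n, "customer_id": (n % 1000) + 1}
    for n in range(5000)
]

# Keep only the customers that match the filter, plus their orders.
subset_customers, subset_orders = subset_with_children(
    customers,
    orders,
    parent_key="id",
    foreign_key="customer_id",
    keep=lambda customer: customer["country"] == "NL",
)
```

In this made-up data set the filter keeps 10 of the 1,000 customers (1%) and only the orders that belong to them. A real schema has many tables and dependency chains, so the same step has to be repeated along every foreign key.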
Fake data won’t help you create high quality software. It doesn’t contain the production data issues you want to discover. So should you use production data for testing? Absolutely, but only on the condition that you mask and subset your data before you use it for testing. Otherwise you’ll run into trouble with privacy regulations and/or storage problems.
Do you want to know how DATPROF can turn things around at your organization? Contact us without obligation. We’re happy to help!