Using production data for testing
Sharing our considerations
Many organizations have a test or QA environment that is connected to a test/QA data source: the database with test data. Some of these test databases contain fake data made up by QA engineers, produced either by hand or by self-built scripts. This may seem outdated, but it still happens a lot. The method causes problems, though: many production issues are due to the lack of real(istic) test data. Dummy data doesn’t contain every data issue present in production, which may result in bad or even useless test results.
Testing with production data
To ensure software of the highest quality possible, you’ll need to keep the test environment as “in-sync” as possible with production. That’s why many QA teams copy complete production data to the QA data sources to catch more (preferably all) issues. But there are a few things to consider regarding working with production data:
- Does the data contain privacy-sensitive information? If so, you need to mask, filter or simply remove this data to comply with privacy regulations.
- Can the test environment handle that much data? If not, you need to split the data up or work with a smaller selection of it.
- What happens when you need a new copy of production and it overwrites the earlier changes? Will it break your tests? You would need some sort of refresh option.
- Are there dependencies between data? Then you’d need to test all possible circumstances or settings.
The points above show that testing with (a copy of) production data is not as easy as it sounds. In fact, it can be very risky to simply copy production data to your test and dev environments, because it may contain privacy-sensitive information. Storage and database license costs can also become a serious issue. If you make multiple copies of production (one copy for every test team), the size gets out of control quickly and the bill runs high.
But does that mean you can’t use production data for testing?
It is pretty simple to use production data within your test environment, as long as you take compliance and sizing into account. For both of these problems, there is a very good solution: masking privacy-sensitive data with DATPROF Privacy and subsetting data with DATPROF Subset.
Mask production data
You don’t want to risk a fine for breaking privacy laws like the GDPR. With DATPROF Privacy you can easily make your test data anonymous. By masking or scrambling sensitive data, the software ensures it can no longer be traced to a person. For example, you can shuffle first and last names, blank fields, generate new SSNs and bank account numbers, create your own masking rules and much more. It also makes sure that data stays consistent across multiple applications and databases.
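As a rough illustration of the masking techniques named above (shuffling, blanking and generating replacement values), here is a minimal Python sketch. It does not show how DATPROF Privacy works internally; the function names, the sample rows and the demo key are all assumptions made up for this example.

```python
import hashlib
import hmac
import random

# Hypothetical demo key; a real key would be kept out of source control.
SECRET_KEY = b"demo-key"


def shuffle_column(rows, column, seed=0):
    """Return copies of the rows with one column's values shuffled between rows."""
    values = [row[column] for row in rows]
    random.Random(seed).shuffle(values)
    return [{**row, column: value} for row, value in zip(rows, values)]


def blank_column(rows, column):
    """Return copies of the rows with one column set to None."""
    return [{**row, column: None} for row in rows]


def pseudonymize_ssn(ssn):
    """Map an SSN to a stable, SSN-shaped replacement using a keyed hash.

    The same input and key always give the same output, so the masked value
    stays consistent wherever the original value appeared.
    """
    digest = hmac.new(SECRET_KEY, ssn.encode("utf-8"), hashlib.sha256).hexdigest()
    digits = f"{int(digest, 16) % 10**9:09d}"
    return f"{digits[:3]}-{digits[3:5]}-{digits[5:]}"


if __name__ == "__main__":
    # Made-up sample rows, not real data.
    rows = [
        {"first_name": "Ann", "email": "ann@example.com", "ssn": "123-45-6789"},
        {"first_name": "Bob", "email": "bob@example.com", "ssn": "987-65-4321"},
        {"first_name": "Cas", "email": "cas@example.com", "ssn": "555-12-3456"},
    ]
    masked = blank_column(shuffle_column(rows, "first_name"), "email")
    masked = [{**row, "ssn": pseudonymize_ssn(row["ssn"])} for row in masked]
    for row in masked:
        print(row)
```

The keyed hash is one simple way to get the cross-database consistency mentioned above: if two systems mask the same SSN with the same key, they get the same replacement, so joins between them still line up. The replacement is only shaped like an SSN; the sketch does not check it against any official numbering rules.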
Subset production data
With its patented algorithm, DATPROF Subset extracts specific selections (even less than 1%) out of a production database. You can specify and filter which data you want made available in your subset. You can add extra filters, transform data with column expressions and add extra dependencies or custom foreign keys. This way the subset can still contain the data issues present in production, but storage isn’t a problem anymore. With subsets you can provide every test team with a test data set of its own. Plus, working with small subsets instead of full copies is great for performance and refresh time.
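To make the idea of a subset concrete, here is a minimal Python sketch that filters one table and then follows a foreign key, so that the extracted rows still belong together. It is not DATPROF Subset's patented algorithm; the tables, columns and the `country` filter are hypothetical and only serve the example.

```python
import sqlite3


def build_source():
    """Create a tiny in-memory stand-in for a production database."""
    con = sqlite3.connect(":memory:")
    con.executescript(
        """
        CREATE TABLE customers (id INTEGER PRIMARY KEY, country TEXT);
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY,
            customer_id INTEGER REFERENCES customers(id),
            total REAL
        );
        """
    )
    con.executemany(
        "INSERT INTO customers VALUES (?, ?)",
        [(1, "NL"), (2, "DE"), (3, "NL")],
    )
    con.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(10, 1, 9.5), (11, 2, 20.0), (12, 3, 5.0), (13, 1, 7.25)],
    )
    return con


def extract_subset(source, country):
    """Select customers by a filter, then only the orders that reference them."""
    customers = source.execute(
        "SELECT id, country FROM customers WHERE country = ? ORDER BY id",
        (country,),
    ).fetchall()
    ids = [row[0] for row in customers]
    if not ids:
        return {"customers": [], "orders": []}
    placeholders = ",".join("?" * len(ids))
    orders = source.execute(
        "SELECT id, customer_id, total FROM orders "
        f"WHERE customer_id IN ({placeholders}) ORDER BY id",
        ids,
    ).fetchall()
    return {"customers": customers, "orders": orders}


if __name__ == "__main__":
    subset = extract_subset(build_source(), "NL")
    print(subset["customers"])
    print(subset["orders"])
```

Following the foreign key from the filtered parent rows to their child rows is what keeps the subset referentially intact: every order in the result points at a customer that is also in the result.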
Fake data won’t help you create high-quality software. It doesn’t contain the production data issues you want to discover. So should you use production data for testing? Absolutely, but on the condition that you mask and subset your data before you use it for test and dev. Otherwise you’ll get into trouble with privacy regulations and/or storage problems. In short: manage your systems, their security and automation for optimal test data management (TDM).
Do you want to know how this approach with DATPROF tools and techniques can turn things around at your organization? Contact us without any obligation. We’re here to help and answer all of your questions!
What is production data?
Production data is information that is persistently stored and used to conduct day-to-day business tasks and processes.
Can I use production data for testing?
Yes, you can, but only if you mask the privacy-sensitive data to comply with privacy regulations like GDPR, PCI and HIPAA.
Why not just generate synthetic data?
Fake data won’t help you create high quality software. It doesn’t contain production data issues you want to discover.
Start with better Test Data Management today
Extract small reusable subsets from large complex databases and speed up your testing.