5 Reasons to subset your test database

People have several reasons to subset their test database. Some have to do with speed, others with test data quality and performance. Before we dive into the most common reasons, we need a shared understanding of what we mean by a test data subset: a smaller, referentially intact data set extracted from a production / live database and loaded into a non-prod environment.
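To make ‘referentially intact’ concrete, here is a minimal sketch in Python using the built-in sqlite3 module. The two-table schema (customers, orders), the sample rows and the helper names are hypothetical and only serve as illustration; this is not how any particular subsetting tool works. The idea is simply that child rows are copied only when their parent row is part of the subset, so no foreign key points to a missing record.

```python
import sqlite3

# Hypothetical mini schema: orders reference customers through a foreign key.
SCHEMA = """
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, region TEXT);
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(id),
    amount REAL
);
"""

def make_source():
    """Build a tiny in-memory stand-in for a 'production' database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", [
        (1, "Alice", "EU"), (2, "Bob", "US"), (3, "Carol", "EU"), (4, "Dave", "APAC"),
    ])
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", [
        (10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.0), (13, 3, 99.0), (14, 4, 5.0),
    ])
    conn.commit()
    return conn

def extract_subset(source, region):
    """Copy the customers of one region plus only the orders that belong to them."""
    target = sqlite3.connect(":memory:")
    target.execute("PRAGMA foreign_keys = ON")  # let SQLite reject orphaned rows
    target.executescript(SCHEMA)

    # Parents first: the selected customers.
    customers = source.execute(
        "SELECT id, name, region FROM customers WHERE region = ?", (region,)
    ).fetchall()
    target.executemany("INSERT INTO customers VALUES (?, ?, ?)", customers)

    # Children second: only orders whose customer is in the subset.
    orders = source.execute(
        "SELECT o.id, o.customer_id, o.amount "
        "FROM orders o JOIN customers c ON c.id = o.customer_id "
        "WHERE c.region = ?", (region,)
    ).fetchall()
    target.executemany("INSERT INTO orders VALUES (?, ?, ?)", orders)
    target.commit()
    return target

def orphan_count(conn):
    """Count orders whose customer is missing; 0 means referentially intact."""
    return conn.execute(
        "SELECT COUNT(*) FROM orders o "
        "LEFT JOIN customers c ON c.id = o.customer_id "
        "WHERE c.id IS NULL"
    ).fetchone()[0]
```

Calling extract_subset(make_source(), "EU") gives a smaller database in which every order still has its customer. A real data model with hundreds of tables needs the same parent-before-child logic across every relationship, which is where dedicated tooling comes in.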

So let’s continue with the top 5 reasons for data subsetting.

1. Non-prod environments grow 3 times faster than production

Many organizations decide that the ‘lower’ environments, such as development, test and acceptance, shouldn’t grow anymore. In many cases non-prod is simply given limited storage space. Because of this, you’ll need to use your data storage and infrastructure more efficiently.

The need for storage is increasing, especially with trends like the Internet of Things and big data. Today, when production grows by 1 terabyte of data, non-prod grows by 3 terabytes, because we copy the database to the acceptance, test and development environments. To manage and reduce the data in non-production, you can start a data subsetting project.
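The arithmetic behind this is simple, and a small Python sketch shows it. The function name and the 10% subset size used in the note below are assumptions for illustration only; your own subset ratio will depend on your data model and test needs.

```python
def non_prod_growth_tb(prod_growth_tb, copies=3, subset_fraction=None):
    """Estimate non-prod storage growth for a given production growth.

    copies: number of non-prod environments fed from production
            (acceptance, test and development in the example above).
    subset_fraction: share of production copied per environment;
            None means every environment gets a full copy.
    """
    per_copy = prod_growth_tb
    if subset_fraction is not None:
        per_copy = prod_growth_tb * subset_fraction
    return copies * per_copy
```

With full copies, non_prod_growth_tb(1.0) returns 3.0: the 3 terabytes mentioned above. With a hypothetical 10% subset, non_prod_growth_tb(1.0, subset_fraction=0.1) comes out at roughly 0.3 terabyte for the same three environments.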

2. Generated test data doesn’t yield valid test cases

Some test teams generate or manually create their own test cases or test data. Synthetic test data has pros and cons compared to ‘real’ data. The pro is that it is very useful when developing a new function or adding new products to an application. The con is that generating test data with the same variation as a production database, with all of its history, requires a lot of creativity. Think of telephone numbers that have changed, bank account numbers, and so on. And manually creating test data for a data model with over 500 tables is nearly impossible. You’ll want your highly educated developers and testers to do something more useful.

The most important reason for organizations to choose subsetted data over synthetic test data is that they need to be able to trust their test data. Production-like data (or rather, a selection of it) is much more reliable than ‘fake data’.

3. It is too intensive to generate data for a data model with over 1,000 tables

Having a large data model, for example over 1,000 tables, is a great reason to start using a subsetting technology. Why? Because creating useful test data for such a data model is, to say the least, challenging.

Generating synthetic test data is feasible when you have fewer than 200 tables. We wouldn’t recommend it as a stand-alone solution, but at this size it is possible. With more than 500 tables, generation can still be done, but creating useful test data gets more difficult. As the number of tables grows, generating high-quality test data becomes nearly impossible. Theoretically it can probably be done, but your results won’t be credible. For organizations with large data models, subsetting technology can make the difference. And the test data is useful!

4. Test automation demands proper test data

Lately, more clients have been asking us to create test data subsets for test automation. Many teams already use automated testing or have started thinking about it. Implementing it is a step towards a more mature test organization.

Many teams choose an automation tool, implement it and start using it. Without resorting to the old saying about a fool with a tool, they later discover that they don’t know how to provide data that can be used for test automation. But they need data. So they fall back on generated test data (with all its disadvantages) or they use a full-sized copy of production, which is often highly inefficient due to its size. The ideal solution in this situation is to use a subset for automation: less test data, more (and quicker!) results.


5. Shorten the idle time of batch processes

As a last reason, many organizations test batch processes. In general, a batch process can take up to 24 hours or even longer. One of the reasons for this long run time is the use of a full-sized copy of production to test the batch. Using a test data subset here has an immediate effect: the batch has less data to process, so creating a subset of production will shorten the run significantly.

Maybe you will recognize some of these reasons, maybe you have others. Either way, we hope this blog will help you and your organization! If you have any questions, don’t hesitate to contact us.


Watch this technical product demonstration and learn how to subset your non-production databases (manually or automated) with the help of DATPROF Subset.
