5 reasons to subset your test database

People subset their test databases for various reasons. Some relate to speed, others to specific test data criteria or to performance. Before we explore the most common reasons, let’s establish a shared understanding of what we mean by ‘test data subsets’: a test data subset is a smaller, referentially intact dataset that is extracted from a live production database and loaded into a non-production environment.
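To make the definition concrete, here is a minimal sketch of extracting a subset that keeps its referential integrity. It assumes a hypothetical two-table schema (customers and orders) in SQLite. The table names, the helper functions and the selection rule (pick a few customers, then take only their orders) are illustrative assumptions, not a description of any particular subsetting tool.

```python
import sqlite3

# Hypothetical schema, used only for this illustration.
SCHEMA = """
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(id),
    amount REAL
);
"""


def build_source() -> sqlite3.Connection:
    """Create a small in-memory stand-in for a production database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?)",
        [(i, f"customer-{i}") for i in range(1, 101)],
    )
    # 500 orders spread evenly over the 100 customers.
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(i, (i % 100) + 1, 10.0 * i) for i in range(1, 501)],
    )
    conn.commit()
    return conn


def extract_subset(source: sqlite3.Connection, customer_ids) -> sqlite3.Connection:
    """Copy the chosen customers and only their orders into a new database."""
    ids = list(customer_ids)
    marks = ",".join("?" for _ in ids)
    target = sqlite3.connect(":memory:")
    target.execute("PRAGMA foreign_keys = ON")  # enforce references on insert
    target.executescript(SCHEMA)
    parents = source.execute(
        f"SELECT id, name FROM customers WHERE id IN ({marks})", ids
    ).fetchall()
    children = source.execute(
        f"SELECT id, customer_id, amount FROM orders "
        f"WHERE customer_id IN ({marks})",
        ids,
    ).fetchall()
    # Parents first, then children, so every order finds its customer.
    target.executemany("INSERT INTO customers VALUES (?, ?)", parents)
    target.executemany("INSERT INTO orders VALUES (?, ?, ?)", children)
    target.commit()
    return target


def count(conn: sqlite3.Connection, table: str) -> int:
    """Return the number of rows in a table."""
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


if __name__ == "__main__":
    source = build_source()
    subset = extract_subset(source, [1, 2, 3])
    print(count(subset, "customers"), count(subset, "orders"))
    # An empty list means no order points at a missing customer.
    print(subset.execute("PRAGMA foreign_key_check").fetchall())
```

Real data models have many more tables and longer chains of relationships, but the principle is the same: select the starting rows, then follow the foreign keys so that every row in the subset still has the rows it refers to.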

Now, let’s proceed with the top 5 reasons for data subsetting.

1. Non-production environments grow 3 times faster than production

Many organizations have decided that ‘lower’ environments such as development, testing, and acceptance should no longer expand, and they often cap the storage space available to non-production environments. As a result, it becomes essential to use your data storage and infrastructure more efficiently.

The need for storage is on the rise, particularly with trends like the ‘internet of things’ and big data. Currently, when production data grows by 1 terabyte, non-production databases expand by 3 terabytes because we replicate the database across acceptance, testing, and development environments. To effectively manage and reduce data in non-production environments, you can initiate a data subsetting project.
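The arithmetic behind this is simple enough to sketch. The function below multiplies production growth by the number of full copies. The three environments come from the paragraph above, while the 10% subset fraction is a purely hypothetical figure chosen for illustration.

```python
def non_production_tb(production_tb: float, environments: int = 3,
                      subset_fraction: float = 1.0) -> float:
    """Storage needed when each environment holds a copy of production.

    subset_fraction = 1.0 means full copies; 0.1 would mean each
    environment holds a subset one tenth the size of production.
    """
    if not 0.0 <= subset_fraction <= 1.0:
        raise ValueError("subset_fraction must be between 0 and 1")
    return production_tb * environments * subset_fraction


if __name__ == "__main__":
    growth_tb = 1.0  # production grows by 1 terabyte
    # Full copies in acceptance, testing, and development.
    print(f"{non_production_tb(growth_tb):.1f} TB with full copies")
    # Hypothetical 10% subsets in the same three environments.
    print(f"{non_production_tb(growth_tb, subset_fraction=0.1):.1f} TB with subsets")
```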

2. Generated test data doesn’t always yield valid test cases

Some test teams opt to generate or manually create their own test cases or test data. Synthetic test data has advantages and disadvantages compared to ‘real’ data. The advantage is that it is very useful when developing new functions or adding new products to an application. The disadvantage is that it is hard to generate test data with the same variations as a production database, including all its historical changes, such as changed telephone numbers and bank account numbers. Moreover, manually creating test data for a data model with over 500 tables is nearly impossible, and highly educated developers and testers are better used on other tasks.

The most compelling reason for organizations to choose subsetted data over synthetic test data is the need for trustworthy test data. Production-like data, or a selected subset of it, is far more reliable than ‘fake data’.

3. It’s incredibly challenging to generate data for a data model with over 1,000 tables

Having a large data model, with, for example, over 1,000 tables, is a compelling reason to consider using subsetting technology. Why? Because creating useful test data for such a complex data model is, to say the least, arduous.

Generating synthetic test data is feasible when you have fewer than 200 tables. While we wouldn’t recommend it as a stand-alone solution, it’s possible with this level of complexity. However, when you have more than 500 tables, data generation becomes increasingly difficult. As the number of tables grows, generating high-quality test data becomes nearly impossible. Theoretically, it may be achievable, but the results are not trustworthy. For organizations with large data models, subsetting technology can be a game-changer, providing valuable and credible test data.

4. Test automation demands proper test data

Recently, more clients have been requesting test data subsets for test automation. Many teams are already using or considering automated testing, which represents a step forward toward a more mature testing organization.

In many cases, teams choose an automation tool, implement it, and start using it, only to realize later that they do not know how to provide suitable data for test automation. They need data, so they often resort to generated test data (despite its disadvantages) or to a full-sized copy of the production data, which is highly inefficient because of its size. The better approach in this situation is to use a subset for automation: less test data means faster, more efficient test runs.


5. Reducing idle time in batch processes

Another compelling reason why many organizations subset their test data is to shorten lengthy batch processing times. Batch processes can take up to 24 hours or even longer, and one of the contributing factors is that they are tested against a full-sized copy of production data. Running them against a subset of production data instead can have an immediate impact and significantly reduce processing times.

You might recognize some of these reasons, or perhaps you have other motivations. Regardless, we hope this blog has been helpful to you and your organization! If you have any questions, please don’t hesitate to reach out to us.
