What is test data?

Many organizations use a copy of their production data for testing. Healthcare organizations, insurance companies, financial institutions, government institutions and other corporate organizations all need test data to develop software and test its quality. But in most cases their production data contains personal and sensitive information, and the databases are often huge and therefore inconvenient for testing.

So test data is needed in these cases, but what is it and how is it created?

The definition of test data

Data used for testing purposes: that’s the short definition. A slightly more detailed description is given by the International Software Testing Qualifications Board (ISTQB): “Data created or selected to satisfy the execution preconditions and inputs to execute one or more test cases.”

Test methods get a lot of attention, but how to handle the data you need for them is addressed far less often. That is actually quite strange, since software development and testing stand or fall with carefully prepared test data. To test a software application effectively, you need a good, representative data set. The ideal test set identifies all the application errors with the smallest possible data set. In short, you need a relatively small test data set that is realistic, valid and versatile.

The creation of test data

Test data can be 1) created manually, 2) generated with data generation tools or 3) retrieved from an existing production environment. The data set can consist of synthetic (fake) data, but preferably it consists of representative (real) data. The latter results in the best software quality, which is ultimately what we all want.

So beware of dummy data, such as the output of a random name generator or a credit card number generator. These generators provide sample data that offers no challenges to the software being tested. Of course, synthetic data can still be used to enrich and/or mask your test data.
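When synthetic values are used to mask real data, one common approach is to make the replacement deterministic: the same real value always becomes the same fake value, so relationships between records survive the masking. The sketch below assumes a simple list of customer records; the field names, the small `FAKE_NAMES` pool and the `mask_name`/`mask_records` helpers are hypothetical and not taken from any particular masking product.

```python
import hashlib
from typing import Dict, List

# Hypothetical pool of synthetic names used as replacements.
FAKE_NAMES = ["Alex", "Robin", "Sam", "Jordan", "Casey", "Morgan"]


def mask_name(real_name: str, salt: str = "test-env") -> str:
    """Map a real name to a synthetic one, deterministically.

    Hashing salt + name means the same real name always gets the same
    fake name, so repeated values stay consistent across records.
    """
    digest = hashlib.sha256((salt + real_name).encode("utf-8")).digest()
    index = int.from_bytes(digest[:4], "big") % len(FAKE_NAMES)
    return FAKE_NAMES[index]


def mask_records(records: List[Dict]) -> List[Dict]:
    """Return copies of the records with only the 'name' field masked."""
    return [{**record, "name": mask_name(record["name"])} for record in records]


production = [
    {"id": 1, "name": "Alice Jansen", "city": "Utrecht"},
    {"id": 2, "name": "Bob de Vries", "city": "Utrecht"},
    {"id": 3, "name": "Alice Jansen", "city": "Delft"},
]

for row in mask_records(production):
    print(row)  # rows 1 and 3 receive the same fake name
```

With a pool this small, different real names can collide on the same fake name; a realistic setup would use a much larger pool or a format-preserving technique, and would keep the salt secret.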

Test data preparation

Test data preparation is a very time-consuming phase of software testing. IBM’s research in 2016 showed that 30-60% of a tester’s time is spent searching for, maintaining and generating test data. The main reasons for this are:

  1. Testing teams do not have access to the data sources
  2. Delay in giving production data access to the testers by developers
  3. Large volumes of data
  4. Data dependencies/combinations
  5. Long refreshment times

1. Testing teams do not have access to the data sources
Especially now that the GDPR is in place, access to data sources is restricted: only a few employees can access them. The advantage is that the risk of a data breach is reduced. The disadvantage is that test teams depend on others, which leads to long waiting times.

2. Delay in giving production data access to the testers by developers
Not every organization works Agile yet. In many organizations, multiple teams work on the same databases. Besides causing conflicts, this means the data set changes often and no longer contains the right (up-to-date) data when it is the next team’s turn.

3. Large volumes of data
Compiling test data from a production database is like searching for a needle in a haystack. You need the special cases to perform good tests, and they are hard to find when you have to dig through dozens of terabytes.
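One way to avoid searching the whole haystack by hand is to describe the special cases as rules and let a single pass over the data keep one example of each. The sketch below is a minimal illustration under assumed conditions: the `production_rows` generator stands in for a large table, and the rules in `SPECIAL_CASES` and the `pick_special_cases` helper are hypothetical examples, not a real tool’s API.

```python
from typing import Callable, Dict, Iterable, Iterator

Row = dict
Rule = Callable[[Row], bool]

# Hypothetical special-case rules; real rules come from the application's requirements.
SPECIAL_CASES: Dict[str, Rule] = {
    "missing_email": lambda r: not r["email"],
    "negative_balance": lambda r: r["balance"] < 0,
    "zero_balance": lambda r: r["balance"] == 0,
    "long_name": lambda r: len(r["name"]) > 30,
}


def production_rows(n: int) -> Iterator[Row]:
    """Stand-in for a large production table (synthetic demo data)."""
    for i in range(n):
        yield {
            "id": i,
            "name": "X" * 40 if i == 123_457 else f"Customer {i}",
            "email": "" if i % 50_000 == 0 else f"c{i}@example.com",
            "balance": (i % 1000) - 10,
        }


def pick_special_cases(rows: Iterable[Row], cases: Dict[str, Rule]) -> Dict[str, Row]:
    """Scan the rows once and keep the first row that matches each rule."""
    found: Dict[str, Row] = {}
    for row in rows:
        for label, matches in cases.items():
            if label not in found and matches(row):
                found[label] = row
        if len(found) == len(cases):
            break  # every special case is covered; skip the rest of the data
    return found


subset = pick_special_cases(production_rows(200_000), SPECIAL_CASES)
for label, row in subset.items():
    print(label, row["id"])
```

The scan stops as soon as every rule has a match, so on this demo data it reads about 123,000 rows instead of all 200,000. Against a real database you would more likely express such rules as SQL queries than scan rows in application code.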

4. Data dependencies/combinations
Most data values depend on other data values to be valid; an order, for example, has to refer to an existing customer. When preparing test data, you have to respect these dependencies, which makes the job a lot more complex and therefore time-consuming.
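A minimal sketch of why dependencies matter when extracting a subset, assuming a hypothetical schema with `customers` and `orders` tables linked by a foreign key. It uses Python’s built-in `sqlite3` module; the table names, columns and the `build_demo_db`/`subset_orders` helpers are illustrative only, and real databases usually have many more and deeper dependency chains.

```python
import sqlite3
from typing import List, Tuple


def build_demo_db() -> sqlite3.Connection:
    """Create a tiny in-memory 'production' database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL REFERENCES customers(id),
            status TEXT
        );
        INSERT INTO customers VALUES (1, 'Ann'), (2, 'Ben'), (3, 'Cas');
        INSERT INTO orders VALUES
            (10, 1, 'paid'), (11, 2, 'refunded'), (12, 3, 'paid'), (13, 2, 'paid');
        """
    )
    return conn


def subset_orders(conn: sqlite3.Connection, status: str) -> Tuple[List[tuple], List[tuple]]:
    """Select the orders of interest plus every customer they depend on."""
    orders = conn.execute(
        "SELECT id, customer_id, status FROM orders WHERE status = ? ORDER BY id",
        (status,),
    ).fetchall()
    # Follow the foreign key: without these parent rows, the selected orders
    # would refer to customers that do not exist in the test database.
    customer_ids = sorted({customer_id for _, customer_id, _ in orders})
    if not customer_ids:
        return orders, []
    placeholders = ",".join("?" for _ in customer_ids)
    customers = conn.execute(
        f"SELECT id, name FROM customers WHERE id IN ({placeholders}) ORDER BY id",
        customer_ids,
    ).fetchall()
    return orders, customers


conn = build_demo_db()
orders, customers = subset_orders(conn, "refunded")
print(orders)
print(customers)
```

Here `subset_orders(conn, "refunded")` returns order 11 together with customer 2 (Ben). A naive subset that copied only the orders would leave order 11 pointing at a customer that does not exist in the test database.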

5. Long refreshment times
Most testing teams cannot refresh the test data themselves, so they have to ask the DBA for a refresh. Some teams wait days or even weeks before the refresh is done, and some never get a refreshed test data set at all.

Test data management

Because test data management (TDM) can be complex and expensive, some organizations stick to old habits. Their test teams (have to) accept that:

  • Test data isn’t refreshed often (or ever);
  • It doesn’t contain all the data quality issues present in production;
  • A high percentage of bugs/faults in test cases is related to the data.

That is a shame, and totally unnecessary: test data management doesn’t have to be complex, and it pays for itself. It saves you a lot of time and money, and it ensures good tests and therefore high-quality software.
