What is test data?

Let’s talk about test data; there are some important skills everyone should learn. Healthcare organizations, insurance companies, financial institutions, government institutions and corporate organizations all need data to develop software and applications and to test their quality. But in most cases their production data contains personal, privacy-sensitive information, and the databases are often huge and therefore inconvenient for testing. That’s where test data comes in. But what is it and how is it created?

The definition of test data

“Data used for testing purposes.”

That’s the short definition. A slightly more detailed description is given by the International Software Testing Qualifications Board (ISTQB):

“Data created or selected to satisfy the execution preconditions and inputs to execute one or more test cases.”

A lot of attention goes to testing methods like security testing, performance testing or regression testing. Agile testing and test automation are also hot topics these days. But how to handle the data you need for testing software (automated or not) is addressed less often. That is actually quite strange, since software development and testing stand or fall on carefully prepared data cases. You can’t use just any data or just a random test case. In order to test a software application effectively, you’ll need a good and representative data set. The ideal test set identifies all the application errors with the smallest possible data set. In short, you need a relatively small (test) data set that is realistic, valid and versatile.
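The idea of a small set with good coverage can be sketched in a few lines of Python. The sketch below assumes you already know which test conditions each candidate row exercises; the row ids, the condition names and the function `pick_minimal_rows` are hypothetical and only serve as illustration. It uses a greedy heuristic, which gives a small set but not necessarily the smallest one.

```python
from typing import Dict, List, Set


def pick_minimal_rows(coverage: Dict[str, Set[str]]) -> List[str]:
    """Greedily pick rows until every test condition is covered.

    `coverage` maps a row id to the set of test conditions that row exercises.
    Greedy selection is a heuristic: small result, no guaranteed minimum.
    """
    uncovered: Set[str] = set().union(*coverage.values()) if coverage else set()
    chosen: List[str] = []
    while uncovered:
        # Take the row that covers the most still-uncovered conditions.
        # Sorting the ids first makes ties resolve the same way on every run.
        best = max(sorted(coverage), key=lambda row: len(coverage[row] & uncovered))
        chosen.append(best)
        uncovered -= coverage[best]
    return chosen


# Hypothetical candidate rows and the conditions they exercise.
candidate_rows = {
    "cust_001": {"adult", "domestic_address"},
    "cust_002": {"minor", "domestic_address"},
    "cust_003": {"adult", "foreign_address", "missing_phone"},
    "cust_004": {"minor", "missing_phone"},
}

selected = pick_minimal_rows(candidate_rows)
print(selected)
```

In this made-up example two of the four rows are enough to exercise all five conditions, which is the kind of reduction a compact test set aims for.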

How to create test data

Data can be created 1) manually, 2) by using data generation tools or 3) by retrieving it from an existing production environment. The data set can consist of synthetic (fake) data, but preferably it consists of representative (real) data with good coverage of the test cases (for security reasons this data should of course be masked). This will provide the best software quality, and that is ultimately what we all want.
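As a rough illustration of masking real data, the Python sketch below replaces names and email addresses with stable fake values. It is a minimal example under stated assumptions, not a description of any particular masking product: the key `MASKING_KEY`, the name list, the record layout and the function names are all hypothetical. A keyed hash (HMAC) is used so that the same input always maps to the same masked value, which keeps related records consistent with each other.

```python
import hashlib
import hmac

# Hypothetical secret; in practice it would live outside the code base.
MASKING_KEY = b"example-key-not-for-real-use"

# Hypothetical pool of replacement names.
FAKE_NAMES = ["Alex", "Sam", "Robin", "Jamie", "Chris", "Kim"]


def _digest(value: str) -> bytes:
    """Keyed hash of the input, so the mapping cannot be rebuilt without the key."""
    return hmac.new(MASKING_KEY, value.encode("utf-8"), hashlib.sha256).digest()


def mask_name(name: str) -> str:
    """Replace a real name with a fake one; same input gives the same output."""
    return FAKE_NAMES[_digest(name)[0] % len(FAKE_NAMES)]


def mask_email(email: str) -> str:
    """Keep the shape of an email address but replace local part and domain."""
    token = _digest(email).hex()[:10]
    return f"user_{token}@example.test"


def mask_record(record: dict) -> dict:
    """Return a copy of the record with the sensitive fields masked."""
    masked = dict(record)
    masked["name"] = mask_name(record["name"])
    masked["email"] = mask_email(record["email"])
    return masked


# Hypothetical production-like record.
original = {"id": 42, "name": "Johanna Jansen", "email": "j.jansen@company.example", "balance": -12.5}
masked = mask_record(original)
print(masked)
```

Non-sensitive fields such as `id` and `balance` are left untouched here, so special cases like a negative balance survive the masking step.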

“The ideal test data identifies all the application errors with the smallest possible data set.”

So beware of dummy data generated by, for example, a random name generator or a credit card number generator. These generators provide sample data that offers no challenges to the software being tested. Of course, synthetic data can still be used to enrich and/or mask your test database.

Test data preparation in software testing

The preparation of data for testing is a very time-consuming phase in software testing. IBM’s research in 2016 showed that 30-60% of a tester’s time is spent on searching, maintaining and generating data for testing and development. The main reasons for this are the following:

  1. Testing teams do not have access to the data sources
  2. Delays in developers giving testers access to production data
  3. Large volumes of data
  4. Data dependencies/combinations
  5. Long refresh times

1. Testing teams do not have access to the data sources
Especially with the GDPR, PCI, HIPAA and other data security regulations in place, access to data sources is limited. As a result only a few employees are able to access the data sources. The advantage of this policy is that the chance of a data breach is reduced. The disadvantage is that test teams are dependent on others and that long waiting times arise.

2. Delays in developers giving testers access to production data
Agile is not yet being used everywhere. In many organizations multiple teams and users work on the same project and thus on the same databases. Besides causing conflicts, this means the data set changes often and no longer contains the right (up-to-date) data when it’s the next team’s turn to test the application.

3. Large volumes of data
Compiling data from a production database is like searching for a needle in a haystack. You need the special cases to perform good tests, and they are hard to find when you have to dig through dozens of terabytes.

4. Data dependencies/combinations
Most data values depend on other data values in order to be recognized. These dependencies make preparing the test cases a lot more complex and therefore more time-consuming.
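A small Python sketch may make such a dependency concrete. It assumes a simple, hypothetical model in which every order refers to a customer through `customer_id`; the tables, field names and the function `subset_with_dependencies` are illustrative only. When you select a few interesting orders, the customers they depend on have to come along, otherwise the subset is no longer usable.

```python
from typing import Dict, List, Set

# Hypothetical tables: each order depends on a customer via customer_id.
customers = [
    {"id": 1, "country": "NL"},
    {"id": 2, "country": "DE"},
    {"id": 3, "country": "US"},
]
orders = [
    {"id": 10, "customer_id": 1, "total": 25.0},
    {"id": 11, "customer_id": 2, "total": 0.0},
    {"id": 12, "customer_id": 2, "total": 310.0},
    {"id": 13, "customer_id": 3, "total": -5.0},
]


def subset_with_dependencies(
    customers: List[dict], orders: List[dict], wanted_order_ids: Set[int]
) -> Dict[str, List[dict]]:
    """Pick the wanted orders plus every customer those orders refer to."""
    picked_orders = [o for o in orders if o["id"] in wanted_order_ids]
    # Collect the parent rows the picked orders depend on.
    needed_customer_ids = {o["customer_id"] for o in picked_orders}
    picked_customers = [c for c in customers if c["id"] in needed_customer_ids]
    return {"customers": picked_customers, "orders": picked_orders}


# Select two special cases: a zero-value order and a negative-value order.
subset = subset_with_dependencies(customers, orders, {11, 13})
print(subset)
```

With more tables and longer chains of references the same idea applies, but the number of lookups grows quickly, which is where much of the preparation time goes.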

5. Long refresh times
Most testing teams do not have the facility to refresh the test database themselves. That means they have to ask the DBA for a refresh. Some teams have to wait days or even weeks before this refresh is done.

How to prepare test data for testing: Test Data Management (TDM)

Because TDM can be complex and expensive, some organizations stick to old habits. The test teams (have to) accept that:

  • Data isn’t refreshed often (or ever);
  • It doesn’t contain all the data quality issues present in production;
  • A high percentage of bugs/faults in test cases is related to the data.

That is a shame and totally unnecessary, because TDM doesn’t have to be complex and it pays for itself. Simple techniques help you save a lot of time and money. In addition, they ensure good tests and therefore high-quality software.

The TDM Solution

Check how you could be solving your TDM bottleneck with test data automation! Create an easy to use process for your data needs.

FAQ

What is the definition of test data?

Short: “Data used for testing purposes.” A slightly more detailed description is given by the International Software Testing Qualifications Board (ISTQB): “Data created or selected to satisfy the execution preconditions and inputs to execute one or more test cases.”

How is test data created?

Data can be created 1) manually, 2) by using data generation tools or 3) by retrieving it from an existing production environment.

What does the ideal test data do?

The ideal test data identifies all the application errors with the smallest possible data set.
