Test data management
There’s an ever-growing need for software development to be better, faster, and cheaper. With rising end-user demand and unrelenting competition, implementing a proper testing strategy is critical. An effective testing strategy has a range of components, one of which is Test Data Management (TDM).
Test data management is one of the big challenges we face in software development and QA. Test data needs to be highly available and easy to refresh, because that improves the quality and ultimately the time to market of your software. Legislation like the GDPR is another reason why TDM is a challenge. So how do you manage your test data?
What is test data management?
First, let us establish a shared understanding of the term: TDM is creating, managing, and provisioning realistic test data for non-production purposes, like training, testing, development, or QA. It ensures that the testing teams get test data of the right quality, in a suitable quantity, in the proper environment, in the correct format, and at the appropriate time. In other words:
“The right test data in the right place at the right time”
Test data management involves a number of activities, including identifying and selecting the appropriate data for testing, preparing the data for use in testing, and managing and storing the data throughout the testing process. The goal of test data management is to ensure that the data used for testing is accurate, relevant, and up-to-date, and that it properly reflects the real-world conditions in which the software or system will be used. This can help to improve the quality of testing, and ultimately the quality of the software or system being tested.
The need for proper test data
Almost every developer is convinced that you have to test a new product first to know whether it lives up to expectations, rather than ruin the company’s name by releasing unstable software. For that reason, test drivers drive endless laps in new concept cars. In the same way, software testers try out the latest versions of new applications.
An application needs fuel to work, just as a car does. An application is built for processing information/data. No data means no processing. This means there is a need for test data: the fuel of an application in a test environment. Years back, test data was limited to a few sample input files or a few rows of data in the database. Today, companies depend on robust sets of test data with unique combinations that yield high coverage to drive the testing.
Test data nevertheless gets surprisingly little attention, because testers tend to gather data as needed to suit the test cases being executed. The application is in a test environment, and data is present in that environment, so the tests can be executed. If only it were as simple as that…
To make a final comparison with a car: if you pour diesel into a petrol-powered car, you probably won’t get far. On top of that, if you don’t know you made that mistake, you will probably end up taking the engine apart to find out why it is not working, only to discover that the wrong fuel was the cause. Test data can be just like that. To assess whether the result of any given test is correct, you need to be absolutely sure that the input given to the application is valid.
Software testing variables
One can say testing is made up of three variables:
1) test object
2) environment
3) test data
If you want the testing process to run smoothly, meaning that the only defects found are those in the software under test, you need to control and manage the other two variables. Complicating factors will arise during test execution when you are not in control of test data.
To be in control of data in any given environment, test data management is a necessity. Test data is not limited to one object, environment, or testing type; it affects all the applications and processes in your IT landscape. It is therefore necessary to think about test data management and lay down a policy.
Test data management strategies
TDM encompasses data generation, data masking, scripting, provisioning, and cloning. The automation of these activities will enhance the data management process and make it more efficient. A possible way to do this is to link the test data to a particular test and feed it into automation software that provides data in the expected format.
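Linking test data to a particular test could look roughly like the Python sketch below. It is an illustration under stated assumptions, not a description of any specific tool: the catalog, the test name, the sample records, and the `provision` function are all hypothetical.

```python
import copy
import json

# Hypothetical catalog that links each test to the data set it needs.
TEST_DATA_CATALOG = {
    "test_checkout_with_discount": {
        "customers": [{"id": 1, "tier": "gold"}],
        "orders": [{"id": 10, "customer_id": 1, "total": 120.0}],
    },
}


def provision(test_name, fmt="dict"):
    """Return a fresh copy of the data linked to `test_name`.

    A deep copy is handed out so that one test run cannot pollute
    the data seen by the next run.
    """
    dataset = copy.deepcopy(TEST_DATA_CATALOG[test_name])
    if fmt == "dict":
        return dataset
    if fmt == "json":
        # Serialize for tools that expect JSON input.
        return json.dumps(dataset, sort_keys=True)
    raise ValueError(f"unsupported format: {fmt}")
```

A test would call `provision("test_checkout_with_discount")` in its setup step and could change the returned records freely, since every call starts again from the catalog entry.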
How to: test data management
Not all environments require all activities, but the following components are the basics for most TDM strategies.
Data discovery
Data discovery helps to determine where privacy-sensitive information is located in your database. It also helps in discovering any data anomalies or pollution within your database. Data insight helps select the cases you need for your tests.
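A very simplified form of discovery can be sketched in Python as pattern matching on column values. The two patterns, the 80% threshold, and the `discover_sensitive_columns` function are assumptions chosen for illustration; real discovery tooling uses far richer rules.

```python
import re

# Hypothetical detection patterns, deliberately loose.
PATTERNS = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "iban": re.compile(r"^[A-Z]{2}\d{2}[A-Z0-9]{11,30}$"),
}


def discover_sensitive_columns(rows, threshold=0.8):
    """Return {column: pattern_name} for columns in which at least
    `threshold` of the non-empty values match one of the patterns."""
    findings = {}
    if not rows:
        return findings
    for column in rows[0]:
        values = [
            str(row[column])
            for row in rows
            if row.get(column) not in (None, "")
        ]
        if not values:
            continue
        for name, pattern in PATTERNS.items():
            hits = sum(1 for value in values if pattern.match(value))
            if hits / len(values) >= threshold:
                findings[column] = name
                break
    return findings
```

The threshold is there because real columns are rarely clean: a contact column in which most, but not all, values are e-mail addresses should still be flagged.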
Data masking
One of the most urgent components of TDM is the anonymization or obfuscation of privacy-sensitive data. Most data protection rules and regulations prohibit organizations from using personally identifiable information for testing.
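As a minimal sketch of one possible masking technique, the Python below replaces names and e-mail addresses with keyed-hash pseudonyms. The key, the field names, and the `mask_value` and `mask_customer` helpers are assumptions made for illustration, and other masking rules (shuffling, fixed replacement values, and so on) are equally possible.

```python
import hashlib
import hmac

# Hypothetical secret; in practice it would be kept outside the code.
SECRET_KEY = b"replace-me"


def mask_value(value, prefix):
    """Replace `value` with a pseudonym derived from a keyed hash.

    The same input always yields the same output, so a value that
    appears in several tables is masked consistently.
    """
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256)
    return f"{prefix}_{digest.hexdigest()[:10]}"


def mask_customer(row):
    """Return a copy of `row` with the sensitive fields masked."""
    masked = dict(row)
    masked["name"] = mask_value(row["name"], "name")
    masked["email"] = mask_value(row["email"], "user") + "@example.test"
    return masked
```

Non-sensitive fields such as an `id` pass through unchanged, which keeps keys and relations between tables intact.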
Synthetic data generation
Instead of using masking rules, you can replace existing (privacy-sensitive) data with synthetic data produced by generation rules. This is useful for names, dates, IBANs, SSNs, etc.
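Generation rules of this kind can be sketched in Python as follows. The name lists, the date range, the Dutch-style account layout, and the `generate_customer` function are illustrative assumptions. The check digits follow the standard mod-97 IBAN calculation, so the generated IBANs pass a syntactic check but do not refer to real accounts.

```python
import random
import string
from datetime import date, timedelta

# Hypothetical value lists for the name rule.
FIRST_NAMES = ["Alex", "Sam", "Robin", "Kim"]
LAST_NAMES = ["Jansen", "Smith", "Meyer", "Rossi"]


def iban_check_digits(country, bban):
    """Compute the two IBAN check digits with the mod-97 rule."""
    rearranged = bban + country + "00"
    # Letters map to 10..35; digits map to themselves.
    numeric = "".join(str(int(ch, 36)) for ch in rearranged)
    return f"{98 - int(numeric) % 97:02d}"


def generate_customer(rng):
    """Generate one synthetic customer from simple generation rules."""
    bank = "".join(rng.choice(string.ascii_uppercase) for _ in range(4))
    account = "".join(rng.choice(string.digits) for _ in range(10))
    bban = bank + account
    birth = date(1950, 1, 1) + timedelta(days=rng.randrange(20000))
    return {
        "name": f"{rng.choice(FIRST_NAMES)} {rng.choice(LAST_NAMES)}",
        "birth_date": birth.isoformat(),
        "iban": "NL" + iban_check_digits("NL", bban) + bban,
    }
```

Passing a seeded generator, for example `generate_customer(random.Random(42))`, makes the output reproducible, which helps when a failing test has to be rerun with the same data.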
Data subsetting
With subsets it becomes much easier to give every team its own test database. The need for data storage decreases and idle times are significantly reduced.
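The core idea of subsetting, copying a selection of rows together with the rows that depend on them, can be sketched in Python with SQLite. The two-table schema and the `create_schema` and `copy_subset` helpers are hypothetical, and a real database has many more relations to follow.

```python
import sqlite3


def create_schema(conn):
    """Create a hypothetical two-table schema: customers and their orders."""
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, "
        "customer_id INTEGER REFERENCES customers(id), total REAL)"
    )


def copy_subset(source, target, customer_ids):
    """Copy the chosen customers and only their orders into `target`.

    Selecting child rows through the parent keys keeps the subset
    referentially intact: no order is copied without its customer.
    """
    ids = list(customer_ids)
    marks = ",".join("?" for _ in ids)
    customers = source.execute(
        f"SELECT id, name FROM customers WHERE id IN ({marks})", ids
    ).fetchall()
    orders = source.execute(
        "SELECT id, customer_id, total FROM orders "
        f"WHERE customer_id IN ({marks})",
        ids,
    ).fetchall()
    target.executemany("INSERT INTO customers VALUES (?, ?)", customers)
    target.executemany("INSERT INTO orders VALUES (?, ?, ?)", orders)
    target.commit()
    return len(customers), len(orders)
```

Both connections can be in-memory databases created with `sqlite3.connect(":memory:")` to try this out. The function returns the number of customers and orders copied.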
Test data provisioning
Both software quality teams and DBA teams benefit greatly from easy test data distribution. A TDM portal makes it possible to self-refresh test data sets at the push of a button. The only waiting time is the actual (technical) data processing time.
Test data automation
If you want to automate your test process, you’ll need automated test data. And for automated test data, you’ll need a test data automation tool: one that provides ready-made data and integrates with other software testing tools.
Benefits of test data management
There are several reasons to start with TDM. Frequently heard reasons are 1) we need to do something with data anonymization or synthetic data generation because of privacy laws and 2) we want to go to market faster, but large environments are holding us back. Test data management has more benefits.
Find data-related bugs
The importance of test data obfuscation lies in the fact that 15% of all bugs found are data related. These bugs are caused by, for example, data quality issues. Masking data helps you keep these data-related issues in your test data set, so that the bugs are found and solved before you go to production.
Shorter time-to-market
The need to go to market faster is a point of discussion in many organizations. They are therefore looking at methods like DevOps, Agile, and Continuous Testing. But in many cases the technical infrastructure holds them back. The soft side of agile is “pretty” easy; the hard part is the infrastructure. In that respect we all still live in a waterfall era, because test databases cannot cope with the number of agile teams.
Faster data refresh
Nowadays it sometimes takes one or even two weeks before a test database is refreshed. With today’s fast software delivery, that should be unacceptable. To make things worse, a single refresh involves more than three people. So a lot of time is wasted in your software delivery process.
Test data management software
Being in control of test data is becoming more important. With the help of subsetting technology, you can deploy smaller sets of test data to an environment. These are flexible, and sizing is no longer an issue. With a good test data management platform, you can easily give every test team its own (masked) test data set, which the team can refresh on demand. This approach is not only great for efficiency and performance, but is also the strategy that will help your business grow and become industry-leading.
The DATPROF software suite consists of several products that allow its customers to realize test data management solutions. The heart of the suite is formed by DATPROF Runtime. This is the test data provisioning platform where execution of DATPROF templates takes place. In a typical test data management implementation, the most frequently used tools are:
- DATPROF Analyze for the purpose of analyzing and profiling a data source;
- DATPROF Privacy for the purpose of modeling masking templates;
- DATPROF Subset for the purpose of modeling subset templates;
- DATPROF Runtime for the purpose of running generated code, templates, and the distribution of datasets.
The patented DATPROF suite is designed to minimize effort (hours) during each stage of the lifecycle. This translates directly into high implementation speed and ease of use during maintenance.
Start today
Discover & Learn
Discover your data and gain analytics of the data quality by profiling and analyzing your application databases.
Mask & Generate
Enable test teams with high quality masked production data and synthetically generated data for compliance.
Subset & Reduce
Subset the right amount of test data and reduce the storage costs and wait times for new test environments.
Provision & Automate
Provide each team with the right test data using the self service portal or automate test data with the built-in API.
FAQ
What is test data management?
TDM is the process of getting or creating realistic test data for non-production purposes. It ensures that the software testing teams get test data of the right quality, in a suitable quantity, in the proper environment, in the correct format, and at the appropriate time: the right test data in the right place at the right time.
How does test data management work?
It is the full process of making sure that test data is easily accessible and readily available. One part is privacy: regulations may mean that test data has to be protected before it can be used.
Another part is how to make test data easily available when databases are as large as they are nowadays. How can you create small, agile sets of test data in support of your software delivery process?