Test data best practices
How to handle test data in modern software development
Over the past decade we have had many conversations with organizations about test data management, and many of them face similar problems. That gave us the idea to share some best practices for dealing with test data quickly and smoothly, which is especially useful in modern software development.
5 test data best practices
Discover and understand your data
It is crystal clear that there is a massive lack of database knowledge. In IT everyone knows a lot about ‘code’, but databases are just those trivial things in which data is stored. Talking about databases is not cool – a stark contrast to everyone claiming that data is the new gold.
A big improvement would be to close this gap in data(base) knowledge. Sure, everyone understands that databases store data. But do you also know how and where? Customer data such as first and last names is logically stored in a customer table. But where are the address details? And what about dates of birth and bank account numbers? Sometimes these are stored in the same customer table, but just as often this information is stored in another table (too). Next to the tables there are also things like foreign keys and indexes: the data model. Sometimes the data model is neatly formalized, but in other cases it is not even available in the database. In short: much information is unknown to organizations that want to start managing their test data. So the first best practice is: know your data and your data model.
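As a starting point for getting to know a data model, the database itself can be asked which tables, columns and foreign keys it has. The sketch below is only an illustration: it assumes a SQLite database, and the customer and address tables, the describe_schema function and the build_demo_db helper are made up for this example. Other database systems expose similar metadata through their own catalog views.

```python
import sqlite3


def describe_schema(conn):
    """Return {table: {"columns": [...], "foreign_keys": [...]}} for a SQLite database."""
    schema = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        columns = [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]
        # PRAGMA foreign_key_list rows: (id, seq, table, from, to, on_update, on_delete, match)
        foreign_keys = [
            (row[3], row[2], row[4])
            for row in conn.execute(f'PRAGMA foreign_key_list("{table}")')
        ]
        schema[table] = {"columns": columns, "foreign_keys": foreign_keys}
    return schema


def build_demo_db():
    """Hypothetical two-table model: address details live outside the customer table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customer (
            id INTEGER PRIMARY KEY,
            first_name TEXT,
            last_name TEXT,
            date_of_birth TEXT
        );
        CREATE TABLE address (
            id INTEGER PRIMARY KEY,
            customer_id INTEGER REFERENCES customer(id),
            street TEXT,
            city TEXT
        );
        """
    )
    return conn


if __name__ == "__main__":
    for table, info in describe_schema(build_demo_db()).items():
        print(table, info["columns"], info["foreign_keys"])
```

A listing like this only shows relationships that are declared in the database. If the data model is enforced in application code instead, the foreign key list will be empty and the relationships have to be discovered another way.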
Protect privacy-sensitive data
Once you understand what information is stored inside your database, it is time to take protective measures. Almost every country in the world has data protection legislation in place to make sure that customer data is protected. In general, collected customer data may only be used for the purpose it was originally collected for. Testing and development work is not included by default, which means that this customer data may not be used for it. But you need production(-like) data for reliable testing, don’t you? There’s a simple solution: mask or anonymize the data. Make sure that personal, privacy-sensitive data is anonymized and cannot be traced back to a natural person.
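To make the idea of masking concrete, here is a minimal sketch of one possible approach: replacing names with values picked from a fixed list, reducing the date of birth to the year, and swapping the bank account number for a generated test value. The column names, the replacement lists and the mask_customer function are assumptions for illustration only. Note that keyed, repeatable replacement like this is closer to pseudonymization than to full anonymization; whether it is sufficient depends on the legislation that applies to you.

```python
import hashlib
import hmac

# Hypothetical replacement values; a real setup would use much larger lists.
FIRST_NAMES = ["Alex", "Sam", "Robin", "Kim", "Jamie"]
LAST_NAMES = ["Jansen", "Smith", "Garcia", "Rossi", "Novak"]


def _digest(secret: bytes, value: str) -> bytes:
    """Keyed hash, so the same input always maps to the same masked value."""
    return hmac.new(secret, value.encode("utf-8"), hashlib.sha256).digest()


def mask_customer(row: dict, secret: bytes) -> dict:
    """Return a copy of a customer row with the sensitive fields replaced."""
    masked = dict(row)
    name_digest = _digest(secret, row["first_name"] + "|" + row["last_name"])
    masked["first_name"] = FIRST_NAMES[name_digest[0] % len(FIRST_NAMES)]
    masked["last_name"] = LAST_NAMES[name_digest[1] % len(LAST_NAMES)]
    # Keep only the year of birth so age-based test cases still work.
    masked["date_of_birth"] = row["date_of_birth"][:4] + "-01-01"
    # Replace the account number with a clearly fake, fixed-width value.
    number = int.from_bytes(_digest(secret, row["bank_account"])[:8], "big") % 10**10
    masked["bank_account"] = f"TEST{number:010d}"
    return masked


if __name__ == "__main__":
    original = {
        "id": 7,
        "first_name": "Johanna",
        "last_name": "Voorbeeld",
        "date_of_birth": "1984-06-23",
        "bank_account": "NL00BANK0123456789",
    }
    print(mask_customer(original, secret=b"keep-this-out-of-test-environments"))
```

Because the replacement is deterministic for a given secret, the same customer is masked the same way in every table and every run, which keeps related records consistent. The secret itself should never be available in the test environment.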
Ask the DBA for help
Despite all the multidisciplinary teams of today, we unfortunately still too often see organizations that have given little or no thought to a proper test data team with at least one database engineer or administrator. This does not help in solving your test data management problems. If you want to discover and fix errors early in the software development process, you not only have to test the application's code, but you also need representative test data: data that corresponds to production. Software teams often think that they can estimate the data they need themselves. But as outlined in the first best practice, we regularly encounter situations where the knowledge to do so is lacking.
If a test team doesn’t have a database administrator (DBA), this role is fulfilled by another department, and this directly causes many delays and problems. Requesting a new database (a production copy or something else) immediately creates challenges. On the one hand this is because DBAs have other (primary) tasks, on the other hand because they are often involved too late in the software delivery process. The latter often means that a database has to be produced at short notice, while the database administrator has no clue what has happened and what is expected. As a result, teams get data(bases) that often do not meet the requirements at first, resulting in even more delays.
When using test data solutions, the DBA still has an important role to play, since a database should be prepared for the software delivery process. This preparation should actually be an automated process, but it is often started by the DBA and not by the software team, for valid reasons. Again, our experience is that involving database engineers at an early stage provides enormous benefits when implementing test data management solutions. When database administrators take a closer look at our test data tools, and especially once they know how they work, they quickly see the benefits. One benefit for database administrators is that they no longer have to set up databases in test environments at a moment's notice. And more importantly for them, they know that our tools handle databases well. The advantage for the software development teams is that, once our tools have been set up, they can provision test data automatically.
In short, involve database administrators or similar roles early in the software development process, and make sure that test data can be obtained automatically instead of through a lengthy acquisition process.
Use micro databases
If all previously mentioned practices are linked together and combined with micro databases, a huge step can be made in the speed and effectiveness of testing and test data. Before such micro databases can be realized, there needs to be an understanding of the data in the databases and how it can be linked to test cases. In addition, a bridge must be built with the DBA to realize automation.
When it’s clear which data should be used for testing (because there is more understanding), these specific cases can be extracted from a production database with subset technology. If only this selection of test cases is used for testing, a very minimal “micro” database remains. From experience we know that customers often start with a 50% copy, then refine this to 25%, and ultimately end up with environments of less than 1% of production.
Testing with these incredibly reduced databases has a huge impact on the speed at which these database environments can be made available.
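The subsetting idea can be sketched as follows: select the customers that the test cases need and copy them, together with the rows that depend on them, into a new and much smaller database. This is a simplified illustration and not how any particular subsetting tool works. It assumes SQLite, a made-up customer/orders schema and a non-empty list of hand-picked customer ids. Real subsetting has to follow every foreign key in the data model.

```python
import sqlite3


def create_schema(conn):
    """Hypothetical two-table schema shared by the source and the subset."""
    conn.executescript(
        """
        CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL REFERENCES customer(id),
            amount REAL
        );
        """
    )


def build_source(n_customers=1000):
    """Stand-in for a production database: every customer has two orders."""
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    conn.executemany(
        "INSERT INTO customer VALUES (?, ?)",
        [(i, f"customer-{i}") for i in range(1, n_customers + 1)],
    )
    conn.executemany(
        "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
        [(i, 10.0 * k) for i in range(1, n_customers + 1) for k in (1, 2)],
    )
    conn.commit()
    return conn


def extract_subset(source, customer_ids):
    """Copy the selected customers and their orders into a new database."""
    ids = list(customer_ids)
    marks = ",".join("?" for _ in ids)
    target = sqlite3.connect(":memory:")
    target.execute("PRAGMA foreign_keys = ON")  # reject orders without a customer
    create_schema(target)
    customers = source.execute(
        f"SELECT id, name FROM customer WHERE id IN ({marks})", ids
    ).fetchall()
    orders = source.execute(
        f"SELECT id, customer_id, amount FROM orders WHERE customer_id IN ({marks})", ids
    ).fetchall()
    # Parents first, then children, so the foreign key check passes.
    target.executemany("INSERT INTO customer VALUES (?, ?)", customers)
    target.executemany("INSERT INTO orders VALUES (?, ?, ?)", orders)
    target.commit()
    return target


if __name__ == "__main__":
    micro = extract_subset(build_source(), [3, 42, 917])
    print(micro.execute("SELECT COUNT(*) FROM customer").fetchone()[0], "customers")
    print(micro.execute("SELECT COUNT(*) FROM orders").fetchone()[0], "orders")
```

In this toy example three of the thousand customers end up in the micro database, which is well under 1% of the source, while every order still points to an existing customer.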
Ensure easy test data distribution
Once you’ve reduced the test databases, you can be like Oprah Winfrey: “You get a database, you get a database, everyone gets a database.” This would never be possible if you were working with full copies of production – imagine the storage costs! But with a mature test data management solution in place, dreams become reality.
With the realization of micro databases it is possible to give every team their own database to use for test and/or development. This also solves a lot of frustration; no more test teams corrupting the dataset that another team had carefully put together, leaving the data useless for them.
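Handing every team its own copy can be illustrated with a few lines of code. The sketch below assumes SQLite and uses the backup method of Python's sqlite3 module to clone one micro database per team. The team names, the customer table and the provision_team_databases function are invented for this example, and a real setup would provision into each team's own environment rather than into memory.

```python
import sqlite3


def build_micro_db():
    """Hypothetical micro database holding a handful of test customers."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO customer VALUES (?, ?)",
        [(1, "customer-1"), (2, "customer-2"), (3, "customer-3")],
    )
    conn.commit()
    return conn


def provision_team_databases(micro_db, team_names):
    """Give every team an independent copy of the micro database."""
    team_dbs = {}
    for team in team_names:
        copy = sqlite3.connect(":memory:")
        micro_db.backup(copy)  # full copy of the source into the new connection
        team_dbs[team] = copy
    return team_dbs


if __name__ == "__main__":
    dbs = provision_team_databases(build_micro_db(), ["team_a", "team_b"])
    # Team A wipes its data; team B's copy is unaffected.
    dbs["team_a"].execute("DELETE FROM customer")
    dbs["team_a"].commit()
    for team, db in dbs.items():
        print(team, db.execute("SELECT COUNT(*) FROM customer").fetchone()[0])
```

Because each copy is independent, one team changing or deleting data has no effect on the dataset another team is working with.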
So this last best practice is: provide small “micro” databases to teams to speed up the test and development processes and stop wasting time on waiting. Waiting is SO last century.