How to save thousands of dollars on test data storage
April 24, 2020 | Nynke Hogeveen
Many organizations store a lot of data in their non-production environments. Whether that test data lives in the cloud or on-prem, it costs them money unnecessarily. The easiest way to save the most money is to store less data in your non-production environments. That may sound impossible. You need all this data in your development, testing and acceptance databases, right? Wrong!
High storage in DTAP
A typical situation is shown in picture 1: the production database is copied in full to the development, testing and acceptance databases. This method is often preferred because it is easy. Another frequently heard argument for this approach is that only full copies of production contain all test cases.
Picture 1. Using copies of production in non-production environments
The problem with this method is that non-production environments grow three times as fast as production. With three full copies, every 1TB that production grows adds at least 3TB to non-production. There’s not much you can do about the size of your production data. That’s why you have to manage and reduce the data in your non-production environments to save money.
Subsets of production
Instead of using copies of production for development, testing and acceptance, you can give every team a test data subset of, for example, 10% of the production data. Data subsetting means extracting smaller, referentially intact sets of data from a production database or a so-called ‘master test data set’ into a non-production environment. Referentially intact means that every row in the subset still points to rows that are also in the subset, for example every order still has its customer.
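To make ‘referentially intact’ concrete, here is a minimal sketch in Python using the standard-library `sqlite3` module. It is not how DATPROF Subset works internally; the `customers` and `orders` tables, the helper names and the ‘every 10th customer’ selection rule are all assumptions made up for illustration. The idea is simply to pick the parent rows first and then copy only the child rows that belong to them.

```python
import sqlite3

# Hypothetical two-table schema: every order must reference an existing customer.
SCHEMA = """
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(id),
    amount REAL NOT NULL
);
"""


def make_source() -> sqlite3.Connection:
    """Build a toy 'master test data set': 100 customers with 3 orders each."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO customers (id, name) VALUES (?, ?)",
        [(i, f"customer-{i}") for i in range(1, 101)],
    )
    conn.executemany(
        "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
        [(c, 10.0 * n) for c in range(1, 101) for n in range(1, 4)],
    )
    conn.commit()
    return conn


def extract_subset(source: sqlite3.Connection, every_nth: int = 10) -> sqlite3.Connection:
    """Copy every n-th customer plus only the orders that belong to those customers."""
    target = sqlite3.connect(":memory:")
    target.execute("PRAGMA foreign_keys = ON")  # let SQLite reject orphaned orders
    target.executescript(SCHEMA)

    # Step 1: select the parent rows that define the subset.
    parents = source.execute(
        "SELECT id, name FROM customers WHERE id % ? = 0", (every_nth,)
    ).fetchall()
    target.executemany("INSERT INTO customers (id, name) VALUES (?, ?)", parents)

    # Step 2: follow the foreign key and copy only the matching child rows.
    children = source.execute(
        "SELECT id, customer_id, amount FROM orders "
        "WHERE customer_id IN (SELECT id FROM customers WHERE id % ? = 0)",
        (every_nth,),
    ).fetchall()
    target.executemany(
        "INSERT INTO orders (id, customer_id, amount) VALUES (?, ?, ?)", children
    )
    target.commit()
    return target


def count_orphans(conn: sqlite3.Connection) -> int:
    """Number of orders whose customer is missing from the same database."""
    return conn.execute(
        "SELECT COUNT(*) FROM orders o "
        "LEFT JOIN customers c ON c.id = o.customer_id WHERE c.id IS NULL"
    ).fetchone()[0]


if __name__ == "__main__":
    subset = extract_subset(make_source(), every_nth=10)
    print(subset.execute("SELECT COUNT(*) FROM customers").fetchone()[0], "customers")
    print(subset.execute("SELECT COUNT(*) FROM orders").fetchone()[0], "orders")
    print(count_orphans(subset), "orphaned orders")
```

With two tables this is trivial. In a real schema with many tables and relationships in both directions, working out which rows to follow is the hard part, and that is where a dedicated subsetting tool earns its keep.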
Masked test data
If your data contains Personally Identifiable Information (PII), you are obliged to mask it before using it for development and testing, in order to comply with privacy rules and regulations such as GDPR, PCI DSS and HIPAA. That’s why organizations typically make a copy of production, mask this copy and use the masked data set as ‘Master Test Data’.
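As a rough illustration of what masking can look like, the sketch below replaces PII fields with keyed, deterministic pseudonyms using Python’s standard `hmac` and `hashlib` modules. This is only one possible technique and not a description of any particular product; the field names, the `SECRET_KEY` value and the helper functions are hypothetical. Whether a given technique is sufficient for your regulations is something to check with your privacy officer.

```python
import hashlib
import hmac

# Hypothetical key: in practice it would be kept outside the test environments.
SECRET_KEY = b"replace-with-a-real-secret"


def mask_value(value: str, prefix: str) -> str:
    """Replace a value with a keyed pseudonym; the same input always gives the same output."""
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"{prefix}-{digest[:12]}"


def mask_record(record: dict, pii_fields: set) -> dict:
    """Return a copy of the record with only the PII fields masked."""
    return {
        key: mask_value(str(value), key) if key in pii_fields else value
        for key, value in record.items()
    }


if __name__ == "__main__":
    # Made-up example record, not real data.
    customer = {"id": 42, "name": "Jane Example", "email": "jane@example.com", "country": "NL"}
    print(mask_record(customer, pii_fields={"name", "email"}))
```

Because the pseudonyms are deterministic, the same customer gets the same masked value in every table, so joins on masked columns keep working after masking.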
Also read: Data Masking
Picture 2. Mask and subset test data to save on storage
Picture 2 shows how you can transform the typical method in picture 1 into a money-saving approach that also respects privacy rules and regulations. Only one full copy of production is made. This full copy is masked and used as ‘Master Test Data’. From this masked data set, small subsets are extracted to the development, testing and acceptance databases.
Imagine you are using 10% subsets for development, testing and acceptance. In this situation you’d only need 130% (one full copy and 3 subsets of 10%) of the production data in your non-production environments instead of 300%. So if you have 50TB in production, you’d now have 65TB in non-production instead of 150TB. Do the math…
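If you want to run the numbers for your own situation, the calculation fits in a few lines of Python. The function names and parameters below are made up for this example; the figures are the ones from the paragraph above.

```python
def full_copies_tb(production_tb: float, environments: int = 3) -> float:
    """Non-production storage when every environment holds a full copy of production."""
    return production_tb * environments


def master_plus_subsets_tb(
    production_tb: float, environments: int = 3, subset_percent: int = 10
) -> float:
    """Non-production storage with one masked full copy plus one subset per environment."""
    total_percent = 100 + environments * subset_percent  # e.g. 100 + 3 * 10 = 130
    return production_tb * total_percent / 100


if __name__ == "__main__":
    production = 50  # TB in production, as in the example above
    before = full_copies_tb(production)
    after = master_plus_subsets_tb(production)
    print(f"Full copies:          {before} TB")          # 150 TB
    print(f"Master + 10% subsets: {after} TB")           # 65.0 TB
    print(f"Saved:                {before - after} TB")  # 85.0 TB
```

Try other values for `subset_percent` or `environments` to see how the savings change with smaller subsets or more teams.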
Specific test cases
You may think: “Subsets sound great, but what about my specific test cases?” Well, creating subsets is not just picking some random data out of production. With the help of DATPROF Subset and its patented algorithm, all relationships within the schemas are preserved. You and your dev/test teams decide which test cases you need for a certain test and the software does the rest. At the click of a button you have your own referentially intact subset.
Also read: How to subset test data?
Shorten the time-to-market
For even more speed and convenience in your CI/CD pipeline and a shorter time-to-market, you can automate this entire subsetting process and give every dev or test team access to their own (masked) subset. They can refresh their own test data set whenever they want, without bothering other teams or waiting for the DBA to perform the refresh. Much shorter waiting times and no teams getting in each other’s way: in addition to saving a lot of money with subsets, you also get happy test and dev teams!
Start your 14-day free trial
Join the growing number of DATPROF users by subsetting your non-production databases with DATPROF Subset. No credit card required.