Test data in the cloud

How to make it affordable

April 14, 2020 | Bert Nienhuis

Many organizations are investigating the seemingly endless possibilities of cloud computing. For almost every letter of the alphabet there seems to be an “as a Service” variant available! From storage, hardware, databases and networking to software, almost everything can be bought from platforms like Microsoft Azure and Amazon Web Services (AWS). Within minutes you can spin up large, fast virtual machines and destroy them even faster, but how can you make this affordable?

An expensive opportunity

Migrating large organizations with on-premises systems to the cloud can be quite difficult. Even comparing the pricing options of the various instance types is not easy. Do you need a D2s v3 or an E2s v3 on Azure, and how does either compare to an r5d.large on AWS? In Azure alone there are more than 200 different instance configurations available. Broadly speaking, the price depends on the number of CPU cores, the amount of memory and the maximum disk capacity. Storage costs typically depend on the size, the type (SSD or HDD) and the replication options you select.

When you move to AWS or Azure, it’s not only your production data that goes into the cloud, but also your test and development environments. Most organizations that migrate to the cloud are already using Agile/DevOps methodologies. This ‘new’ way of working requires a different approach to handling test data. Each team should have its own test data and should not depend on other teams that use the same test data. The cloud can provide as many environments as you require, but even in the cloud, full copies of production instances are extremely costly. Maybe you don’t require expensive disaster recovery or replication features, but the overall configuration should probably match the production specification as closely as possible.

Work with the right amount of test data

Fortunately, working with full-size copies of production as test data is a thing of the past. Nowadays we can extract specific subsets, anonymized where needed, from large, complex production databases for testing, development and training.

The real benefit of working with compact, high-quality and safe test data is that you can truly unlock the full potential of the cloud. Working with the right amount of test data can easily save thousands of dollars, since your development and test instances require less CPU power, less memory and less storage. Your Dev and QA teams become more efficient by controlling their own test data, and they can test more in less time. This leads to better-quality software that can be released faster, for the ultimate benefit of the business.

Data subsetting

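Test data subsetting means extracting a smaller, referentially intact set of data from a ‘production’ database into a non-production environment. Referentially intact means that every row in the subset that points to another row (for example, an order that points to a customer) still finds that row in the subset.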
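
To make the idea concrete, here is a minimal sketch in Python using the built-in sqlite3 module. It is an illustration only, not how any particular subsetting tool works: the two-table schema (customers and orders), the generated sample rows and the function names build_production and extract_subset are all assumptions made up for this example. The sketch picks a limited number of customers and then copies only the orders that belong to those customers, so that no order in the test database points to a missing customer.

```python
import sqlite3

# Hypothetical schema for illustration: orders reference customers.
SCHEMA = """
CREATE TABLE customers (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE orders (
    id          INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(id),
    amount      REAL NOT NULL
);
"""


def build_production() -> sqlite3.Connection:
    """Create a stand-in 'production' database with generated sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?)",
        [(i, f"customer-{i}") for i in range(1, 101)],
    )
    # 500 orders spread evenly over the 100 customers.
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(i, (i % 100) + 1, 10.0 * i) for i in range(1, 501)],
    )
    conn.commit()
    return conn


def extract_subset(prod: sqlite3.Connection, customer_limit: int) -> sqlite3.Connection:
    """Copy a limited set of customers plus only their orders into a new database."""
    test = sqlite3.connect(":memory:")
    # Let SQLite reject any row that would break a foreign key.
    test.execute("PRAGMA foreign_keys = ON")
    test.executescript(SCHEMA)

    # Step 1: choose the parent rows (here simply the first N customers).
    customers = prod.execute(
        "SELECT id, name FROM customers ORDER BY id LIMIT ?", (customer_limit,)
    ).fetchall()
    test.executemany("INSERT INTO customers VALUES (?, ?)", customers)

    # Step 2: copy only the child rows that reference the chosen parents.
    # A long id list would need batching; this is fine for a small sketch.
    ids = [row[0] for row in customers]
    placeholders = ",".join("?" * len(ids))
    orders = prod.execute(
        "SELECT id, customer_id, amount FROM orders "
        f"WHERE customer_id IN ({placeholders})",
        ids,
    ).fetchall()
    test.executemany("INSERT INTO orders VALUES (?, ?, ?)", orders)
    test.commit()
    return test


if __name__ == "__main__":
    subset = extract_subset(build_production(), customer_limit=10)
    n_customers = subset.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
    n_orders = subset.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
    print(f"subset contains {n_customers} customers and {n_orders} orders")
```

A real production database has many more tables and longer chains of relationships, so the same parent-then-child selection has to be repeated along every foreign key. The principle stays the same: select the rows you want, then follow the references so that nothing in the subset is left dangling.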

