Cloud Test Data Management
Many organizations are investigating the possibilities of cloud computing. For almost every letter of the alphabet there seems to be an “as a Service” variant available! From storage, hardware, databases and networking to software, almost everything can be bought from platforms like Microsoft Azure and Amazon Web Services (AWS). Within minutes you can spin up large, fast virtual machines and destroy them even faster, but how can you make this affordable and how do you manage your test data in the cloud? Before we go into that, let’s go over the terminology first.
Working in the cloud sometimes differs from working with on-premise solutions. For test data management, the main difference is that you can always connect to an on-premise database, whereas in ‘the cloud’ that is not always possible.
In general, we distinguish two different types of ‘cloud’:
- A database in the cloud
- SaaS, cloud or hosted applications
Databases in the cloud
If only the database is deployed in the cloud, DATPROF can manage the test data. We natively support MySQL, SQL Server, Oracle, PostgreSQL, MariaDB, EDB and DB2, and we also support Azure SQL and Aurora. The main difference between cloud and on-premise databases is the connection string.
SaaS, cloud or hosted applications
It gets more interesting with applications that are hosted in the cloud or delivered as SaaS. From a test data management perspective, we see two kinds of hosted/SaaS applications:
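To illustrate how small that difference can be, here is a minimal sketch that builds a PostgreSQL connection URL for an on-premise server and for a cloud-hosted one. This is a generic illustration, not DATPROF configuration: the helper `postgres_url`, the hostnames, the database name and the credentials are all hypothetical.

```python
from urllib.parse import quote


def postgres_url(host: str, database: str, user: str, password: str,
                 port: int = 5432, require_ssl: bool = False) -> str:
    """Build a PostgreSQL connection URL in the standard URI format."""
    # Percent-encode the credentials so special characters stay valid in a URL.
    credentials = f"{quote(user, safe='')}:{quote(password, safe='')}"
    url = f"postgresql://{credentials}@{host}:{port}/{database}"
    if require_ssl:
        # Cloud endpoints are usually reached over an encrypted connection.
        url += "?sslmode=require"
    return url


# Hypothetical on-premise server inside the company network.
on_prem = postgres_url("db01.internal.example", "crm_test", "tester", "s3cret!")

# Hypothetical cloud endpoint: same database, different host, SSL required.
cloud = postgres_url("crm-test.cluster-example.eu-west-1.rds.amazonaws.com",
                     "crm_test", "tester", "s3cret!", require_ssl=True)

print(on_prem)
print(cloud)
```

Apart from the host name and the SSL option, the two URLs are identical, which is why the tooling can treat both databases the same way.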
- Some applications are available in the cloud and you can still connect to the database under the application
- The application is available in the cloud, but you cannot connect to the database (e.g. Salesforce)
Software vendors refer to both kinds as SaaS applications, but we differentiate between them: if you can connect to the database, DATPROF can manage your test data. In that case, from our perspective, there is not much difference between a database in the cloud and an application in the cloud.
[Figure: DATPROF supported cloud databases]
The second kind of cloud or SaaS application is more challenging. Our preferred approach is to connect directly to the database to manage the test data, and these suppliers do not allow that. There are alternative routes, however. We can extract the data to another database, run the test data processes (subsetting and/or masking) there, and inject the data back. This makes it possible to manage your test data in these environments as well.
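The extract, process and inject route can be sketched as follows. This is a simplified illustration under stated assumptions, not DATPROF's implementation: the list `saas_records` stands in for data exported from a SaaS application, an in-memory SQLite database plays the role of the staging database, and the function names and the masking rule are hypothetical. A real setup would use the vendor's own export and import mechanisms.

```python
import hashlib
import sqlite3

# Hypothetical stand-in for records exported from a SaaS application.
saas_records = [
    {"id": 1, "name": "Alice Jansen", "email": "alice@example.com", "country": "NL"},
    {"id": 2, "name": "Bob Smith", "email": "bob@example.com", "country": "US"},
    {"id": 3, "name": "Carla de Vries", "email": "carla@example.com", "country": "NL"},
]


def extract(records, conn):
    """Step 1: load the exported records into a staging database."""
    conn.execute(
        "CREATE TABLE contacts (id INTEGER PRIMARY KEY, name TEXT, email TEXT, country TEXT)"
    )
    conn.executemany(
        "INSERT INTO contacts VALUES (:id, :name, :email, :country)", records
    )


def subset(conn, country):
    """Step 2a: keep only the rows needed for testing."""
    conn.execute("DELETE FROM contacts WHERE country <> ?", (country,))


def mask(conn):
    """Step 2b: replace personal data with generated values.

    Illustrative only: hashing an email is not a vetted anonymization method.
    """
    rows = conn.execute("SELECT id, email FROM contacts").fetchall()
    for row_id, email in rows:
        token = hashlib.sha256(email.encode("utf-8")).hexdigest()[:8]
        conn.execute(
            "UPDATE contacts SET name = ?, email = ? WHERE id = ?",
            (f"Contact {token}", f"{token}@masked.invalid", row_id),
        )


def inject(conn):
    """Step 3: read the processed rows back out, ready to load into the test environment."""
    cur = conn.execute("SELECT id, name, email, country FROM contacts ORDER BY id")
    columns = [col[0] for col in cur.description]
    return [dict(zip(columns, row)) for row in cur]


staging = sqlite3.connect(":memory:")
extract(saas_records, staging)
subset(staging, "NL")
mask(staging)
test_records = inject(staging)

for record in test_records:
    print(record)
```

The key design point is that all processing happens in the staging database, so the SaaS application only needs to support exporting and importing data.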
Cloud data expensive?
Migrating large organizations with on-premise systems to the cloud can be quite difficult. Even comparing the pricing options of the various instance types is not easy. Do you need a D2s v3 or an E2s v3, and how does it compare to an r5d.large? In Azure alone there are more than 200 different instance configurations available. Broadly speaking, the price depends on the number of CPU cores, the amount of memory and the maximum disk capacity. Storage costs typically depend on the size, the type (SSD or HDD) and the replication options you select.
When you move to AWS or Azure, it’s not only your production data that goes into the cloud, but also your test and development environments. Most organizations that migrate to the cloud are already using Agile/DevOps methodologies. This ‘new’ way of working requires a different approach to handling test data. Each team should have its own test data and should not depend on other teams that use the same test data. The cloud can provide as many environments as you require, but even in the cloud, full copies of production instances are extremely costly. Maybe you don’t need expensive disaster recovery or replication features, but the overall configuration should probably match the production specification as closely as possible.
The right amount of test data
Fortunately, working with full-size copies of production as test data is a thing of the past. Nowadays we can extract specific subsets, anonymized if required, from large, complex production databases for testing, development and training.
The real benefit of working with compact, good quality and safe test data is that you can truly unlock the full potential of the cloud. Working with the right amount of test data can easily save thousands of dollars since your development and test instances require less CPU power, less memory and less storage. Your Dev and QA teams become more efficient by controlling their own test data and can test more in less time. This leads to better quality software that can be released faster for the ultimate benefit of the business.
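A rough way to reason about these savings is to price an environment by its CPU cores, memory and storage. The sketch below does that for a full-size copy and a subset-sized environment. All rates and instance sizes are made-up assumptions chosen for easy arithmetic, not actual Azure or AWS prices, and the names `Instance` and `monthly_cost` are hypothetical.

```python
from dataclasses import dataclass

HOURS_PER_MONTH = 730  # common approximation for one month of uptime

# Hypothetical rates, for illustration only. Check your provider's price list.
RATE_PER_VCPU_HOUR = 0.04
RATE_PER_GIB_MEMORY_HOUR = 0.005
RATE_PER_GIB_STORAGE_MONTH = 0.10


@dataclass
class Instance:
    name: str
    vcpus: int
    memory_gib: int
    storage_gib: int


def monthly_cost(instance: Instance) -> float:
    """Estimate the monthly cost from cores, memory and storage."""
    compute_per_hour = (instance.vcpus * RATE_PER_VCPU_HOUR
                        + instance.memory_gib * RATE_PER_GIB_MEMORY_HOUR)
    storage_per_month = instance.storage_gib * RATE_PER_GIB_STORAGE_MONTH
    return compute_per_hour * HOURS_PER_MONTH + storage_per_month


# Hypothetical environment sizes: a full production copy versus a subset.
full_copy = Instance("full production copy", vcpus=16, memory_gib=128, storage_gib=2000)
subset_env = Instance("subset environment", vcpus=2, memory_gib=16, storage_gib=100)

full_cost = monthly_cost(full_copy)
subset_cost = monthly_cost(subset_env)

print(f"{full_copy.name}: {full_cost:.2f} per month")
print(f"{subset_env.name}: {subset_cost:.2f} per month")
print(f"difference per environment: {full_cost - subset_cost:.2f} per month")
```

Because each team gets its own environment, the difference is multiplied by the number of test and development environments you run.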
Test data subsetting is extracting a smaller, referentially intact set of data from a ‘production’ database to a non-production environment.
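What “referentially intact” means can be shown with a minimal sketch: when a subset of parent rows is selected, only the child rows that point to those parents are copied, so no foreign key ends up dangling. The example uses in-memory SQLite databases and a hypothetical two-table schema (`customers` and `orders`). It illustrates the idea only and does not show how DATPROF performs subsetting.

```python
import sqlite3

# Hypothetical 'production' database with a parent and a child table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, region TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),
        total REAL
    );
    INSERT INTO customers VALUES (1, 'A', 'EU'), (2, 'B', 'US'), (3, 'C', 'EU');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 2, 40.0), (12, 3, 15.5), (13, 1, 9.99);
""")


def build_subset(conn, region):
    """Copy one region's customers and only their orders into a separate database."""
    # A second in-memory database plays the role of the non-production environment.
    conn.execute("ATTACH DATABASE ':memory:' AS testdb")
    conn.execute(
        "CREATE TABLE testdb.customers (id INTEGER PRIMARY KEY, name TEXT, region TEXT)"
    )
    conn.execute(
        "CREATE TABLE testdb.orders (id INTEGER PRIMARY KEY, "
        "customer_id INTEGER REFERENCES customers(id), total REAL)"
    )
    # Select the parent rows first ...
    conn.execute(
        "INSERT INTO testdb.customers SELECT * FROM main.customers WHERE region = ?",
        (region,),
    )
    # ... then only the child rows whose parent made it into the subset.
    conn.execute(
        "INSERT INTO testdb.orders "
        "SELECT o.* FROM main.orders AS o "
        "WHERE o.customer_id IN (SELECT id FROM testdb.customers)"
    )
    conn.commit()


build_subset(conn, "EU")

customer_count = conn.execute("SELECT COUNT(*) FROM testdb.customers").fetchone()[0]
order_count = conn.execute("SELECT COUNT(*) FROM testdb.orders").fetchone()[0]

# Referential integrity check: count orders without a matching customer.
orphan_count = conn.execute(
    "SELECT COUNT(*) FROM testdb.orders AS o "
    "LEFT JOIN testdb.customers AS c ON c.id = o.customer_id "
    "WHERE c.id IS NULL"
).fetchone()[0]

print(customer_count, order_count, orphan_count)
```

Real production schemas have many more tables and relationships, so the order in which tables are filled has to follow the foreign key dependencies.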