Unlocking Cost Savings in Data Storage

Many organizations find themselves grappling with excessive data storage costs in their non-production environments. Whether it’s test data residing in the cloud or on-premises servers, these expenses can quickly add up. But what if we told you that the most effective way to save thousands of dollars is by storing less data in your non-production environments?

At first, it might sound counterintuitive. After all, you rely on this data for development, testing, and acceptance purposes, don’t you? The truth is, you can achieve significant cost savings without sacrificing the data you need.


High Storage Usage in DTAP Environments

Many organizations face a common challenge in their DTAP (Development, Testing, Acceptance, and Production) environments: the production database is replicated in full to every other stage of the DTAP pipeline, as shown in Picture 1. This approach is favored for two main reasons.

1. Simplicity: Copying the production database to each subsequent stage is a straightforward and easy-to-implement method. It simplifies the deployment process.

2. Test Coverage: A commonly cited argument in favor of this approach is that only full copies of the production database contain all possible test cases. This ensures comprehensive testing of applications and systems.


However, there are trade-offs to consider with this approach. High storage usage is a significant concern, as maintaining full copies of the production database across multiple environments can be resource-intensive and expensive.

In Picture 1, you can see the visual representation of this data replication across DTAP stages.

Picture 1. Using copies of production in non-production environments

Addressing the Challenge of Rapid Data Growth in Non-Production Environments

One common challenge faced by organizations is the rapid growth of data in their non-production environments, which often outpaces the growth in their production databases. For instance, when development, testing and acceptance each hold a full copy, every 1TB added to production adds at least 3TB to your non-production databases. Managing this multiplied growth is crucial for cost control.

The size of your production data is largely beyond your control. Therefore, the key to saving money lies in effectively managing and reducing data in your non-production environments.

To tackle the storage issue while maintaining comprehensive testing, organizations frequently employ data optimization techniques such as database virtualization and data subsetting. These techniques offer a practical approach to curbing storage requirements in non-production environments.

1. Database Virtualization: Database virtualization involves creating virtual or lightweight representations of databases in non-production environments. This approach significantly reduces the storage footprint while maintaining essential data for testing. It’s an effective way to strike a balance between test coverage and storage efficiency.

2. Data Subsetting: Data subsetting entails extracting only the necessary subsets of data required for specific testing scenarios. By focusing on relevant portions of the data, organizations can minimize storage requirements without compromising the quality of testing.

Implementing these techniques allows you to optimize your DTAP environments for cost-effectiveness while ensuring that testing remains thorough and comprehensive. By proactively managing non-production data growth, you can save valuable resources and maintain the agility of your development and testing processes.

How does subsetting work?

Instead of using copies of production for development, testing and acceptance, you can give every team a test data subset of, for example, 10% of the production data. Data subsetting means extracting smaller, referentially intact sets of data from a production database or a so-called ‘master test data set’ to a non-production environment. Referentially intact means that every row in the subset still points to related rows that are also in the subset, so no foreign key is left dangling.
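To make the idea concrete, here is a minimal sketch of a referentially intact extraction using Python's built-in sqlite3 module. The two-table schema, the sample rows and the function names build_source and extract_subset are assumptions made for illustration only. This is not how DATPROF Subset works internally; it only shows the principle of picking a slice of parent rows and then pulling the child rows that reference them.

```python
import sqlite3

# Hypothetical two-table schema standing in for a production database.
SCHEMA = """
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(id),
    amount REAL NOT NULL
);
"""


def build_source() -> sqlite3.Connection:
    """Create an in-memory 'production' database with made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO customers (id, name) VALUES (?, ?)",
        [(i, f"customer-{i}") for i in range(1, 101)],
    )
    conn.executemany(
        "INSERT INTO orders (id, customer_id, amount) VALUES (?, ?, ?)",
        [(i, (i % 100) + 1, 10.0 * i) for i in range(1, 301)],
    )
    conn.commit()
    return conn


def extract_subset(source: sqlite3.Connection, fraction: float = 0.10) -> sqlite3.Connection:
    """Copy a fraction of the customers, plus only the orders that belong to them."""
    total = source.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
    keep = max(1, round(total * fraction))

    # Step 1: choose the parent rows that define the subset.
    customers = source.execute(
        "SELECT id, name FROM customers ORDER BY id LIMIT ?", (keep,)
    ).fetchall()
    ids = [row[0] for row in customers]

    # Step 2: follow the foreign key and take only the matching child rows.
    marks = ",".join("?" for _ in ids)
    orders = source.execute(
        f"SELECT id, customer_id, amount FROM orders WHERE customer_id IN ({marks})",
        ids,
    ).fetchall()

    # Step 3: load into the target with foreign key checks switched on,
    # so a dangling reference would raise an error instead of slipping through.
    target = sqlite3.connect(":memory:")
    target.execute("PRAGMA foreign_keys = ON")
    target.executescript(SCHEMA)
    target.executemany("INSERT INTO customers (id, name) VALUES (?, ?)", customers)
    target.executemany(
        "INSERT INTO orders (id, customer_id, amount) VALUES (?, ?, ?)", orders
    )
    target.commit()
    return target


if __name__ == "__main__":
    subset = extract_subset(build_source(), fraction=0.10)
    print(subset.execute("SELECT COUNT(*) FROM customers").fetchone()[0], "customers")
    print(subset.execute("SELECT COUNT(*) FROM orders").fetchone()[0], "orders")
```

A real schema has many more tables and relationships, so the parent-to-child walk has to be repeated along every foreign key, which is the part a dedicated subsetting tool automates.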

How does virtualization work?

Database virtualization works by creating virtual copies of your databases, which are distinct from the actual databases, providing users with a secure and isolated environment for various tasks.

Here’s how it works:

1. Virtual Copies: Database virtualization creates replicas of your databases that are separate from the original. These virtual copies mimic the structure and data of the source database, allowing users to interact with them as if they were working with the real database.

2. Isolation: Users can make changes, conduct tests, and run simulations on these virtual copies without affecting the source database. This isolation ensures that any modifications or experiments remain contained within the virtual environment.

3. Snapshotting: You can capture a snapshot of the data at a specific point in time. This snapshot represents a separate version of the database, frozen in time. It allows you to work with a static dataset for analysis or testing.

4. Rollback: If needed, you can easily roll back to a previous version of the virtual database snapshot. This is particularly useful when testing or experiments lead to unexpected results, and you want to return to a known state.

5. Efficiency: Virtual copies are typically much smaller in size than the original databases. This reduces storage requirements and enhances efficiency when working with these copies. Smaller datasets also improve performance during testing and development tasks.

Database virtualization streamlines various aspects of database management and testing by providing a flexible, safe, and efficient way to work with data without impacting the actual production database. It’s a valuable tool for database administrators and developers looking to optimize their workflows and ensure data integrity.
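The ideas of virtual copies, isolation, snapshotting and rollback can be illustrated with a toy copy-on-write model. The sketch below is an assumption about the general principle, not a description of any specific product: the class name VirtualCopy and its methods are hypothetical, and a plain Python dictionary stands in for the source database. Each virtual copy shares the base data and stores only its own changes.

```python
from types import MappingProxyType


class VirtualCopy:
    """Toy copy-on-write view over a shared base dataset (illustrative only)."""

    def __init__(self, base: dict):
        # Read-only view: the virtual copy can never write to the shared base.
        self._base = MappingProxyType(base)
        self._delta = {}        # rows written by this copy only
        self._deleted = set()   # keys this copy has deleted
        self._snapshots = {}    # snapshot name -> saved (delta, deleted) state

    def get(self, key):
        """Read a row: local changes win, otherwise fall through to the base."""
        if key in self._deleted:
            raise KeyError(key)
        if key in self._delta:
            return self._delta[key]
        return self._base[key]

    def put(self, key, value):
        """Write a row locally; the shared base is left untouched."""
        self._deleted.discard(key)
        self._delta[key] = value

    def delete(self, key):
        """Hide a row in this copy only."""
        self._delta.pop(key, None)
        self._deleted.add(key)

    def snapshot(self, name: str):
        """Freeze the current local state under a name (values assumed immutable)."""
        self._snapshots[name] = (dict(self._delta), set(self._deleted))

    def rollback(self, name: str):
        """Return to a previously captured state."""
        delta, deleted = self._snapshots[name]
        self._delta, self._deleted = dict(delta), set(deleted)

    def stored_rows(self) -> int:
        """Rows this copy physically stores, as opposed to rows it can see."""
        return len(self._delta)


if __name__ == "__main__":
    base = {1: "alice", 2: "bob", 3: "carol"}
    dev = VirtualCopy(base)
    dev.snapshot("clean")
    dev.put(2, "BOB")
    dev.delete(3)
    print(dev.get(2), base[2], dev.stored_rows())
    dev.rollback("clean")
    print(dev.get(2), dev.get(3), dev.stored_rows())
```

The storage saving comes from stored_rows staying small: a copy that changes a handful of rows stores only those rows, however large the shared base is.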

Mask your data, too!

If your data contains Personally Identifiable Information (PII), you are obliged to mask it before using it for development and testing, in order to comply with privacy rules and regulations like GDPR, PCI DSS and HIPAA. That’s why organizations typically make a copy of production, mask this copy and use the masked data set as ‘Master Test Data’.
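As a rough illustration of what masking a record can look like, the sketch below replaces PII fields with keyed pseudonyms using Python's standard hmac and hashlib modules. Everything in it is an assumption for demonstration purposes: the field names, the SECRET_KEY placeholder and the functions mask_value and mask_record are hypothetical, and this simple technique is not by itself a guarantee of compliance with any regulation. Because the same input always yields the same pseudonym, values that match across tables before masking still match afterwards.

```python
import hashlib
import hmac

# Hypothetical placeholder; a real key would come from a secret store.
SECRET_KEY = b"replace-with-a-real-secret"


def mask_value(value: str, prefix: str) -> str:
    """Return a deterministic pseudonym for a sensitive value."""
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"{prefix}-{digest[:10]}"


def mask_record(record: dict) -> dict:
    """Mask the PII fields of one record and leave the other fields as they are."""
    masked = dict(record)
    masked["name"] = mask_value(record["name"], "name")
    masked["email"] = mask_value(record["email"], "user") + "@example.invalid"
    return masked


if __name__ == "__main__":
    row = {"id": 42, "name": "Jane Doe", "email": "jane@corp.test", "country": "NL"}
    print(mask_record(row))
```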

Picture 2. Mask and subset test data to save on storage

Picture 2 shows how you can transform the typical method in Picture 1 into a money-saving approach that also respects privacy rules and regulations. Only one full copy of production is made. This full copy is masked and used as ‘Master Test Data’. From this masked data set, small subsets are extracted to (virtual) development, testing and acceptance databases.

Imagine you are using 10% subsets for development, testing and acceptance. In this situation you’d only need 130% (one full copy and 3 subsets of 10%) of the production data in your non-production environments instead of 300%. So if you have 50TB in production, you’d now have 65TB in non-production instead of 150TB.
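The arithmetic above can be written as a small helper so you can plug in your own numbers. The function name non_production_storage_tb and its parameters are hypothetical, and the model is deliberately simple: it assumes either one full copy per environment, or one full masked master copy plus one equally sized subset per environment.

```python
from typing import Optional


def non_production_storage_tb(
    production_tb: float,
    environments: int = 3,
    subset_fraction: Optional[float] = None,
) -> float:
    """Estimate non-production storage in TB.

    subset_fraction=None models a full production copy in every environment.
    Otherwise it models one full master copy plus one subset per environment.
    """
    if subset_fraction is None:
        return production_tb * environments
    return production_tb * (1 + environments * subset_fraction)


if __name__ == "__main__":
    full_copies = non_production_storage_tb(50)
    with_subsets = non_production_storage_tb(50, subset_fraction=0.10)
    print(f"full copies: {full_copies:.0f} TB, master plus subsets: {with_subsets:.0f} TB")
    print(f"saved: {full_copies - with_subsets:.0f} TB")
```

With the figures from the example (50TB in production, three environments, 10% subsets) this reproduces the 150TB and 65TB totals, a difference of 85TB.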

On top of that, you can virtualize these subsets instead of copying them, which reduces the footprint even further.

Do the math…

Specific test cases

You may think: “Great, these subsets, but what about my specific test cases?” Well, creating subsets is not just picking some random data out of production. With the help of DATPROF Subset and its patented algorithm, all relationships within the schemas are preserved. You and your dev/test teams decide which test cases you need for a certain test and the software does the rest. At the click of a button you have your own referentially intact subset.

Shorten the time-to-market

To expedite your CI/CD pipeline and achieve a faster time-to-market, consider automating the subsetting process. This allows each development or testing team to access their personalized masked (and virtualized) subset, offering the flexibility to refresh their test data independently without relying on others or DBAs for data updates. The result? Reduced wait times and a harmonious work environment where teams operate seamlessly. In addition to substantial cost savings through subsetting, you’ll also enjoy the added benefit of content and motivated development and testing teams.

Book a demo

Schedule a product demonstration with one of our TDM experts.

