The 5 biggest disadvantages of data virtualization
April 29, 2019 | Maarten Urbach
Nowadays, data virtualization is portrayed as THE new development in the DevOps world. Established names such as Delphix offer this “solution”. It promises that developers and QA engineers can quickly set up new, safe test environments: something that is very attractive in a world where we want to bring software to production faster than ever.
This “hype” around data virtualization means that we increasingly encounter customers who try to combine their test data management with data virtualization, sometimes at any cost. To make a well-considered choice, we believe it is important to also highlight the other side of the coin and point out the disadvantages of data virtualization. We wrote this blog as a service to our customers: while we encourage innovation, hypes do not benefit every industry and every user. There are some important points you should keep in mind.
Although significant time is saved in setting up “test” environments using “virtual databases” (VDBs), a lot of time is still lost during the process. Most of the time, QA engineers are busy finding and preparing test data. Searching a large database of several million, if not billions, of records is not going to shorten this time. Recent research shows that we spend 46% of our time on analyzing, searching for and finding test data. And what actually happens under the hood in a typical organization is that a 100% copy of production is put in place and then delivered to the various teams as VDBs.
Deltas are another interesting USP of data virtualization. The deltas within the VDBs are quite interesting, but the deltas against the original source are a lot more complicated. With data virtualization you first have to place a copy of production on a server of the data virtualization supplier. But with rapid development, the production data changes regularly. Many suppliers of data virtualization, such as Delphix, indicate that they can update the “production duplicate” by keeping track of deltas at the source. That could turn out to be very problematic: we don’t know of any organization that simply allows a connection from test to prod. In short: this will not work. And these refreshes cost a huge amount of time, sometimes weeks. You lose that time with every refresh, and that hurts.
A complete copy of production is ultimately deposited on the “data virtualization” server. It is compressed, so you do save on storage. After this has taken place, 1, 2, 3 or up to 10 teams start working with a so-called VDB. All these teams work on their VDB and query the same server. The number and intensity of these processes (depending on the number of QA teams) will generate particularly high network traffic, with all the associated costs.
4. Single point of failure
What happens to your business when the Delphix server is down? Then all VDBs on your platform are down, access to the data sources is no longer possible, and no one can do their job. Your entire operational system is down, and your test data management and data governance are gone. In short: there are only a few environments left that you can fall back on for data. This is quite a risk for the delivery of your products.
5. Batch processing
If you want to run large batch processes against a large database on a “data virtualization” server, this will lead to major conflicts with your colleagues. Your team members’ virtual databases depend on that same server, so they won’t like this…
DATPROF’S TEST DATA AVAILABILITY SOLUTION
Another approach, such as the one DATPROF offers, works as follows: we use subset technology to create small, filled databases from the source environment. To achieve this, we first need a source environment. The source can be a prod environment, but more often it is a production duplicate, such as a fallback or an acceptance environment. Once this source has been identified, we extract test data from it and place it in a test data environment. This sounds very simple, but of course this method of test data generation has its challenges too.
It takes time to create a first subset. In our experience this varies from 15 minutes to a few hours, and sometimes 15 hours or more. Fortunately, in many cases we can still considerably reduce the number of hours. Once you have it, this subset gives you a huge advantage: you can restore a 200 GB subsetted test database (instead of the original 20+ TB) in a few minutes, using backup and recovery processes or snapshots. This saves you an enormous amount of time, and the subset contains the right test cases.
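To make the idea of subsetting a bit more concrete, here is a minimal sketch in Python using an in-memory SQLite database. It is not how DATPROF Subset is implemented; the `customers` and `orders` tables, the `country` filter and the function names are all hypothetical and only illustrate the principle: pick a slice of a start table, then follow the foreign key so that the related rows come along and the subset stays consistent.

```python
import sqlite3

# Hypothetical two-table schema, used for both source and target.
SCHEMA = """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, country TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),
        amount REAL
    );
"""


def build_source() -> sqlite3.Connection:
    """Create a tiny stand-in for the (much larger) source environment."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?)",
        [(1, "NL"), (2, "DE"), (3, "NL")],
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(10, 1, 25.0), (11, 2, 40.0), (12, 3, 15.0), (13, 2, 5.0)],
    )
    conn.commit()
    return conn


def extract_subset(source: sqlite3.Connection, country: str) -> sqlite3.Connection:
    """Copy customers from one country plus only the orders that reference them."""
    target = sqlite3.connect(":memory:")
    target.executescript(SCHEMA)

    # Step 1: select the slice of the start table.
    customers = source.execute(
        "SELECT id, country FROM customers WHERE country = ?", (country,)
    ).fetchall()
    target.executemany("INSERT INTO customers VALUES (?, ?)", customers)

    # Step 2: follow the foreign key orders.customer_id -> customers.id.
    ids = [row[0] for row in customers]
    if ids:
        placeholders = ",".join("?" * len(ids))
        orders = source.execute(
            "SELECT id, customer_id, amount FROM orders "
            f"WHERE customer_id IN ({placeholders})",
            ids,
        ).fetchall()
        target.executemany("INSERT INTO orders VALUES (?, ?, ?)", orders)

    target.commit()
    return target
```

Calling `extract_subset(build_source(), "NL")` yields a target database that contains only the Dutch customers and their orders. A real data model has many more tables and relationships, which is exactly why the foreign keys need to be known.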
Unknown data model
To make a good subset, you need to know your data model, including its foreign keys. We regularly come across applications where these keys are not stored in the data model itself. They are then stored in an application table, or they aren’t saved at all. However, they are necessary for successfully creating a subset. In that case we import the foreign keys, for example when they are stored in an application table. We can also use our foreign key discovery module. The keys are then known in DATPROF Subset and we can create a subset.
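As an illustration of what discovering undeclared foreign keys can look like, the sketch below applies a simple and common heuristic: a column is a candidate foreign key if all of its non-null values occur in a unique column of another table. This is an assumption made for illustration and not a description of DATPROF's discovery module; the function name and the in-memory table layout (a dict of column name to list of values) are hypothetical.

```python
from typing import Dict, List, Tuple

# A table is modelled as: column name -> list of values (None means NULL).
Table = Dict[str, list]


def discover_foreign_keys(
    tables: Dict[str, Table],
) -> List[Tuple[str, str, str, str]]:
    """Return (child_table, child_column, parent_table, parent_column) candidates."""
    # Collect columns that could act as a key: no NULLs and no duplicates.
    key_columns = []
    for table_name, columns in tables.items():
        for column_name, values in columns.items():
            non_null = [v for v in values if v is not None]
            if non_null and len(set(non_null)) == len(values):
                key_columns.append((table_name, column_name, set(non_null)))

    # A column is a candidate foreign key if its values are contained
    # in a key column of a different table.
    candidates = []
    for child_table, columns in tables.items():
        for child_column, values in columns.items():
            child_values = {v for v in values if v is not None}
            if not child_values:
                continue
            for parent_table, parent_column, key_values in key_columns:
                if parent_table != child_table and child_values <= key_values:
                    candidates.append(
                        (child_table, child_column, parent_table, parent_column)
                    )
    return candidates


example = {
    "customers": {"id": [1, 2, 3], "country": ["NL", "DE", "NL"]},
    "orders": {
        "id": [10, 11, 12, 13],
        "customer_id": [1, 2, 3, 2],
        "amount": [25.0, 40.0, 15.0, 5.0],
    },
}
print(discover_foreign_keys(example))
```

A heuristic like this produces candidates, not certainties: two unrelated columns can contain overlapping numbers by coincidence, so the results still need to be reviewed by someone who knows the application.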
Which solution do you prefer?
In summary, both solutions have their advantages and disadvantages. We hope that this comparative article is helpful in the search for the right data management solution! If you want to know more, don’t hesitate to contact us.
What is data virtualization?
Data virtualization is an approach that creates and distributes virtual copies of data for different purposes. The data itself is not copied; only changes to the data are saved.
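The principle of "only changes are saved" can be sketched in a few lines of Python. This is a simplified illustration under our own assumptions, not the implementation of any vendor's product; the `VirtualCopy` class and its methods are hypothetical. Every virtual copy reads from one shared base data set and keeps its own changes in a private delta.

```python
class VirtualCopy:
    """A virtual copy: reads fall through to the shared base, writes stay local."""

    _DELETED = object()  # marker for rows deleted in this copy only

    def __init__(self, base: dict):
        self._base = base   # shared and never modified by this copy
        self._delta = {}    # only this copy's changes are stored here

    def get(self, key):
        if key in self._delta:
            value = self._delta[key]
            if value is self._DELETED:
                raise KeyError(key)
            return value
        return self._base[key]

    def put(self, key, value):
        self._delta[key] = value

    def delete(self, key):
        self._delta[key] = self._DELETED

    def delta_size(self) -> int:
        return len(self._delta)


# One shared "production duplicate", two teams with their own virtual copy.
base = {1: "Alice", 2: "Bob"}
team_a = VirtualCopy(base)
team_b = VirtualCopy(base)

team_a.put(2, "Robert")   # team A changes a record
team_a.delete(1)          # and deletes another one

print(team_a.get(2))      # team A sees its own change
print(team_b.get(2))      # team B still sees the base value
```

The sketch also shows the flip side discussed in this article: every read that is not in a copy's delta goes to the one shared base, so all copies depend on that base being available.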
What is the greatest advantage of data virtualization?
The data itself is not copied; only changes to the data are saved. This saves hardware and infrastructure costs.
What is the greatest disadvantage of data virtualization?
A disadvantage of data virtualization is often the complexity of the implementation, and the fact that you still work with all of the data instead of only the relevant data.
Start with better Test Data Management today
Extract small reusable subsets from large complex databases and speed up your testing.