The 5 biggest disadvantages of data virtualization
April 29, 2019 | Maarten Urbach
Nowadays, data virtualization is portrayed as THE new development in the DevOps world. Established names such as Delphix offer this “solution”. It promises developers and QA engineers the ability to quickly set up new, safe test environments: something that is very attractive in a world where we want to bring software to production faster than ever.
This “hype” around data virtualization means that we increasingly encounter customers who try to combine their test data management with data virtualization. To make a well-considered choice in this regard, we believe it is important to also highlight the other side of the coin and point out the disadvantages of data virtualization.
Although setting up “test” environments with “virtual databases” (VDBs) saves significant time, a lot of time is still lost. QA engineers spend most of their time finding and preparing test data. Searching a large database of several million, if not billions, of records is not going to shorten this time. Recent research shows that we spend 46% of our time searching for and finding test data. And what actually happens under the hood is that a 100% copy of production is set up and then delivered to the various teams as VDBs.
Deltas are another interesting USP of data virtualization. The deltas within the VDBs are quite interesting, but the deltas against the original source are a lot more complicated. With data virtualization you first have to place a copy of production on a server of the data virtualization supplier. But with rapid delivery to production, the production data changes regularly. Many suppliers of data virtualization, such as Delphix, indicate that they can update the “copy of production” by keeping track of deltas at the source. That could turn out to be very problematic: we do not know of any organization that simply allows a connection from test to production. In short: this will not work. Refreshing the copy is also a huge time sink. It sometimes takes weeks, and losing that time on every refresh hurts.
A complete copy of production is ultimately placed on the “data virtualization” server. It is compressed, so you do save on storage. After that, anywhere from 1 to 10 teams start working, each with a so-called VDB, and all of them query the same server. The number and weight of these queries (depending on the number of QA teams) will generate particularly high network traffic, with all the costs that entails.
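To make the idea of one shared copy plus per-team deltas concrete, here is a minimal sketch in Python. It assumes a simple copy-on-write design; the class names `SharedBase` and `VirtualDatabase` are hypothetical and do not describe how Delphix or any other product is actually implemented. The point is only that every read a team has not overridden locally ends up at the same shared server.

```python
# Illustrative copy-on-write sketch; all names here are hypothetical.

_DELETED = object()  # marker for rows a team has deleted in its own VDB


class SharedBase:
    """The single copy of production held on the virtualization server."""

    def __init__(self, rows):
        self._rows = dict(rows)
        self.reads = 0  # counts how often teams hit the shared server

    def get(self, key):
        self.reads += 1
        return self._rows.get(key)


class VirtualDatabase:
    """A team's view: private changes (the delta) layered on the shared base."""

    def __init__(self, base):
        self._base = base
        self._delta = {}

    def get(self, key):
        # Rows changed by this team are served from its own delta.
        if key in self._delta:
            value = self._delta[key]
            return None if value is _DELETED else value
        # Everything else is a read against the shared server.
        return self._base.get(key)

    def put(self, key, value):
        self._delta[key] = value

    def delete(self, key):
        self._delta[key] = _DELETED
```

In this sketch, two teams created from the same `SharedBase` do not see each other's changes, but `SharedBase.reads` grows with every unmodified row any team touches, which is the shared load the paragraph above describes.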
4. Single point of failure
What happens when the Delphix Server is down? Then all VDBs are down and no one can do their job anymore! In short: there are only a few environments that you can fall back on. This is quite a risk.
5. Batch processing
If you want to run large batch processes against a large database on a “data virtualization” server, this will lead to major conflicts with your colleagues. Your team members’ virtual databases will not like this…
DATPROF’s test data availability solution
Another approach to data virtualization, as offered by DATPROF, works as follows: we use subset technology to create small, filled databases from a source environment. To achieve this, we first need a source environment. The source can be a production environment, but more often it is a copy of production, such as a fallback or an acceptance environment. Once this source has been identified, we extract test data from it and place it in a test data environment. This sounds very simple, but of course this method of test data generation also has its challenges.
It takes time to realize a first subset. In our experience this varies from 15 minutes to a few hours, and sometimes 15 hours or more. Fortunately, in many cases we can still reduce the number of hours considerably. Once it exists, the subset gives you a huge advantage: you can restore a 200 GB subsetted test database (instead of the original 20+ TB) in a few minutes, using backup and recovery processes or snapshots. This saves you an enormous amount of time, and the database contains the right test cases.
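The subsetting idea itself can be sketched in a few lines of Python. This is a minimal illustration using the standard `sqlite3` module and a made-up two-table schema (`customers` and `orders`); it is not DATPROF Subset, and the function names `build_source` and `extract_subset` are hypothetical. It selects a handful of parent rows and follows the foreign key to pull only the child rows that belong to them.

```python
import sqlite3

# Hypothetical example schema: orders reference customers via customer_id.
SCHEMA = """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),
        amount REAL
    );
"""


def build_source():
    """Create a small in-memory stand-in for the source environment."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?)",
        [(i, f"customer-{i}") for i in range(1, 101)],
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(i, (i % 100) + 1, float(i)) for i in range(1, 501)],
    )
    conn.commit()
    return conn


def extract_subset(source, customer_ids):
    """Copy the chosen customers and only their orders into a new database."""
    target = sqlite3.connect(":memory:")
    target.executescript(SCHEMA)
    marks = ",".join("?" * len(customer_ids))

    customers = source.execute(
        f"SELECT id, name FROM customers WHERE id IN ({marks})", customer_ids
    ).fetchall()
    # Follow the foreign key so the subset stays referentially consistent.
    orders = source.execute(
        f"SELECT id, customer_id, amount FROM orders "
        f"WHERE customer_id IN ({marks})",
        customer_ids,
    ).fetchall()

    target.executemany("INSERT INTO customers VALUES (?, ?)", customers)
    target.executemany("INSERT INTO orders VALUES (?, ?, ?)", orders)
    target.commit()
    return target
```

Calling `extract_subset(build_source(), [1, 2, 3])` yields a target database with three customers and only the orders that reference them. A real subset has to walk many tables and relationships, which is why the foreign keys discussed next matter so much.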
Unknown data model
To make a good subset, foreign keys are needed. We regularly come across applications where these keys are not defined in the data model itself. They are stored in an application table instead, or they are not stored at all. However, they are necessary to create a subset successfully. In that case we import the foreign keys, for example when they are stored in an application table, or we use our foreign key discovery module. The keys are then known in DATPROF Subset and we can create the subset.
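As an illustration of what discovering undeclared foreign keys can look like, the sketch below uses a naive value-containment heuristic in Python: a column is reported as a candidate foreign key when all of its non-null values occur in another table's primary key. This is an assumption made for the example and not a description of DATPROF's foreign key discovery module; the function name `discover_foreign_keys` and the sample data are hypothetical.

```python
def discover_foreign_keys(tables, primary_keys):
    """Return candidate (child_table, child_column, parent_table, parent_key).

    tables:       {table_name: {column_name: [values, ...]}}
    primary_keys: {table_name: primary_key_column}
    """
    candidates = []
    for parent, pk_col in primary_keys.items():
        pk_values = set(tables[parent][pk_col])
        for child, columns in tables.items():
            for col, values in columns.items():
                if child == parent and col == pk_col:
                    continue  # a key trivially contains itself
                non_null = {v for v in values if v is not None}
                # Candidate if every value the column uses exists in the key.
                if non_null and non_null <= pk_values:
                    candidates.append((child, col, parent, pk_col))
    return candidates


# Hypothetical sample data with no declared foreign keys.
sample_tables = {
    "customers": {"id": [1, 2, 3], "name": ["Ann", "Bob", "Cy"]},
    "orders": {
        "id": [10, 11, 12],
        "customer_id": [1, 1, 3],
        "amount": [9.5, 20.5, 3.5],
    },
}
sample_keys = {"customers": "id", "orders": "id"}
```

With this sample data, `discover_foreign_keys(sample_tables, sample_keys)` reports `orders.customer_id` as a candidate reference to `customers.id`. A heuristic like this can produce false positives (for example, any small integer column), so candidates should be reviewed before they are used to drive a subset.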
Which solution do you prefer?
In summary, both solutions have their advantages and disadvantages. We hope that this comparative article is helpful in the search for the right solution! If you want to know more, don’t hesitate to contact us.