How to subset test data

Generate test data with the help of subsets

What is data subsetting?

Subsetting is copying a part of the data from one database (source) in to another database (target). Therefore, you need a source to provide the data. This is typically a production database or a golden copy; a bigger dataset. The target database is typically a development or test environment.

Next you’ll need a classification of tables. To do this, DATPROF Subset will import metadata; the table definitions and foreign keys. The classification determines what the tool will do with these tables. Possible classifications are Full, Subset, Empty and Unused. Full tables will be copied entirely, subset tables will be subsetted, empty tables will be left empty and unused tables will remain untouched.

Classification of tables

  • Full: these are typically domain tables and such
  • Subset: these are mostly tables containing transactional or process data
  • Empty: these tables mostly contain logging or the data is not necessary in the target environment
  • Unused: tables containing environment specific data fall into this category, i.e. user tables

So you have to determine what to do with the tables. What will be your start table? The start table is the beginning of your subset. This table will be filtered using a startfilter. This filter determines what data will be part of the subset. Normally this table contains functional relevant data, i.e. personal data or insurance policy.

Subset process

Based upon the classification DATPROF Subset will generate a subset process. This process can be visualized as a process model. This model shows the order in which the tables will be subsetted. This also gives insight to fix any errors you might have made in classifying the tables.

Having done the above you are ready to start the first subset-run. DATPROF Subset will now start copying data. First the target environment will be truncated. Then the copying of data will commence. The start table will be filtered an subsequently the next tables in the process will filtered based on the data in the first table.

Process and data model

DATPROF Subset can visualize the data model and a process model. The data model is generated by the metadata or manual adding FK’s. The colors of the tables depend on the classification of the tables.

A path is determined through the database based on the DATPROF Subset algorithm and the classifications of the tables. This will create a process model. The visualization shows the subset sequence. The visualization gives an insight of the relationships between the tables and which table is responsible for filling another table.

How to subset your test data?

When using DATPROF Subset you will notice that it is possible to start a run in different scenarios. This tutorial explains the difference between those scenarios and will help you decide when to use which scenario.

(Re)Create

The recreate scenario simply copies the source tables and foreign keys to the target environment. In case a source table already exists, Subset will recreate this table. But keep in mind, this scenario does not add the triggers, constraints and indexes! All existing data in the target will be lost. This scenario is especially interesting for simple ‘table only’ subsets which do not require application logic.

(Re)Fill

In case you have already created the DDL of the source environment in a target environment or you just have a clean target environment, the refill scenario enables you to empty (not delete!) the target environment and fill it with your subset. Using this scenario, DATPROF Subset expects the presence of the objects in the target database and in contradiction to the recreate scenario, leaves all the application logic such as triggers and constraints alone.

Append

The append scenario is usable to append new data to a table. This scenario does not take care of unique data and therefore duplicate data can be created. This may cause problems with application constraints but can especially be useful to ‘blow up’ your database to facilitate performance tests. This scenario is mostly used in situations where you would like to test whether a database architecture still performs when your existing data doubles.

Unique Append

The unique append scenario basically does the same as the normal Append scenario with one major difference; it does take care of uniqueness of data! Using the unique append only unique data will be added and existing data within the target database will simply be skipped. This scenario is often used to create a subset from different perspectives, for example a customer based subset and a product based subset. This scenario may also be used to add a specific test case to an already existing subset.

Conclusion

DATPROF Subset offers a variety of different scenarios to create a proper subset, choose your scenario carefully as the recreate might destroy the data which should have been appended. Part of our DATPROF Subset Training (which can be done on request) is an in-depth explanation of all the different scenarios with real world examples while also giving you the opportunity to experience the difference hands-on!

Book a demo

Schedule a product demonstration with one of our TDM experts.

Book a demo

"*" indicates required fields

TDM Platform

The right test data in the right place at the right time. Masked, generated, subsetted, virtualized and automated at the push of a button.