Test data generation – the basics

This article describes the basics of subsetting. What is subsetting? Where should you start? What should you keep in mind?


In essence, subsetting is copying part of the data from one database (the source) into another database (the target). You therefore need a source to provide the data. This is typically a production database or a golden copy, that is, a bigger dataset. The target database is typically a development or test environment.

DATPROF Subset then needs a classification of the tables. To support this, DATPROF Subset imports metadata: the table definitions and foreign keys. The classification determines what the tool will do with each table. The possible classifications are Full, Subset, Empty and Unused. Full tables will be copied entirely, Subset tables will be subsetted, Empty tables will be left empty, and Unused tables will remain untouched.

So, you have to decide what to do with each table, and which table will be your Start table. The Start table is the beginning of your subset. This table will be filtered using a start filter, which determines what data will be part of the subset. Normally this table contains functionally relevant data, e.g. person data or insurance policies.

To give you some direction for the classification of tables:

  • Full: these are typically domain tables and such
  • Subset: these are mostly tables containing transactional or process data
  • Empty: these tables mostly contain logging, or data that is not needed in the target environment
  • Unused: tables containing environment-specific data fall into this category, e.g. user tables
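The four classifications can be thought of as a simple mapping from table name to action. The sketch below is only an illustration of that idea in Python, not how DATPROF Subset stores or applies a classification. The table names (country_codes, person, policy, audit_log, app_users) and the helper plan() are hypothetical.

```python
from enum import Enum


class Classification(Enum):
    FULL = "Full"
    SUBSET = "Subset"
    EMPTY = "Empty"
    UNUSED = "Unused"


# What each classification means for a table, as described in the text.
ACTIONS = {
    Classification.FULL: "copy all rows",
    Classification.SUBSET: "copy only the rows that belong to the subset",
    Classification.EMPTY: "leave the table empty in the target",
    Classification.UNUSED: "do not touch the table",
}

# Hypothetical tables, classified along the lines of the guidance above.
table_classification = {
    "country_codes": Classification.FULL,   # domain table
    "person": Classification.SUBSET,        # functional data (start table)
    "policy": Classification.SUBSET,        # transactional data
    "audit_log": Classification.EMPTY,      # logging
    "app_users": Classification.UNUSED,     # environment-specific data
}


def plan(tables):
    """Return the action that follows from each table's classification."""
    return {name: ACTIONS[classification] for name, classification in tables.items()}


for table, action in plan(table_classification).items():
    print(f"{table}: {action}")
```

Writing the classification down in this form makes it easy to review with the team before the first run.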

Based on the classification, DATPROF Subset will generate a subset process. This process can be visualized as a process model, which shows the order in which the tables will be subsetted. The model also helps you spot and fix any errors you might have made in classifying the tables.
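To get a feel for how an order can follow from the start table and the foreign keys, here is a minimal Python sketch. It assumes a breadth-first walk over the foreign-key relations between Subset tables; this is a simplification for illustration and not necessarily the algorithm DATPROF Subset uses. The function subset_order() and the table names are hypothetical.

```python
from collections import deque


def subset_order(start_table, foreign_keys, subset_tables):
    """Walk the foreign keys breadth-first, starting at the start table.

    foreign_keys is a list of (child_table, parent_table) pairs.
    Returns the processing order and the Subset tables that cannot be
    reached from the start table (a hint that a classification or a
    relation may be wrong).
    """
    if start_table not in subset_tables:
        raise ValueError("the start table must be classified as Subset")

    # Treat every foreign key between two Subset tables as a two-way link.
    neighbours = {table: set() for table in subset_tables}
    for child, parent in foreign_keys:
        if child in neighbours and parent in neighbours:
            neighbours[child].add(parent)
            neighbours[parent].add(child)

    order, seen, queue = [], {start_table}, deque([start_table])
    while queue:
        table = queue.popleft()
        order.append(table)
        for nxt in sorted(neighbours[table]):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)

    unreachable = sorted(set(subset_tables) - seen)
    return order, unreachable


order, unreachable = subset_order(
    "person",
    [("policy", "person"), ("claim", "policy")],
    {"person", "policy", "claim", "invoice"},
)
print(order)        # ['person', 'policy', 'claim']
print(unreachable)  # ['invoice']
```

In this example the invoice table is classified as Subset but has no relation to the start table, which is the kind of classification error a process model makes visible.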

Having done the above, you are ready to start the first subset run. DATPROF Subset will now start copying data. First, the target environment will be truncated. Then the copying of data will commence. The start table will be filtered, and subsequently the next tables in the process will be filtered based on the data in the first table.
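The steps of a run can be mimicked on a small scale with two in-memory SQLite databases. The sketch below is a hand-written illustration of the principle (truncate, filter the start table, filter the next table on the selected keys), not DATPROF Subset's implementation. The person/policy schema, the city filter, and the functions make_db() and run_subset() are all assumptions made for the example.

```python
import sqlite3


def make_db(rows_person=(), rows_policy=()):
    """Create an in-memory database with a hypothetical person/policy schema."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
        CREATE TABLE policy (id INTEGER PRIMARY KEY,
                             person_id INTEGER REFERENCES person(id),
                             product TEXT);
    """)
    db.executemany("INSERT INTO person VALUES (?, ?, ?)", rows_person)
    db.executemany("INSERT INTO policy VALUES (?, ?, ?)", rows_policy)
    db.commit()
    return db


def run_subset(source, target, city):
    # Step 1: empty the target tables (SQLite has no TRUNCATE, so DELETE is used).
    target.execute("DELETE FROM policy")
    target.execute("DELETE FROM person")

    # Step 2: apply the start filter to the start table.
    persons = source.execute(
        "SELECT id, name, city FROM person WHERE city = ?", (city,)
    ).fetchall()
    target.executemany("INSERT INTO person VALUES (?, ?, ?)", persons)

    # Step 3: filter the next table on the keys selected from the start table.
    ids = [row[0] for row in persons]
    if ids:
        placeholders = ",".join("?" * len(ids))
        policies = source.execute(
            "SELECT id, person_id, product FROM policy "
            f"WHERE person_id IN ({placeholders})",
            ids,
        ).fetchall()
        target.executemany("INSERT INTO policy VALUES (?, ?, ?)", policies)
    target.commit()


source = make_db(
    [(1, "Ann", "Leeds"), (2, "Bob", "York"), (3, "Cy", "Leeds")],
    [(10, 1, "car"), (11, 2, "home"), (12, 3, "car"), (13, 3, "life")],
)
target = make_db([(99, "Old test row", "Nowhere")])  # stale data from an earlier run
run_subset(source, target, "Leeds")
print(target.execute("SELECT id FROM person ORDER BY id").fetchall())  # [(1,), (3,)]
print(target.execute("SELECT id FROM policy ORDER BY id").fetchall())  # [(10,), (12,), (13,)]
```

Because the policy rows are selected through the person keys, the target ends up referentially consistent: every copied policy has its person.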




