DATPROF makes synthetic test data available

DATPROF now also creates synthetic test data

15 APRIL, 2016 – HARALD KIKKERS

For several years now, DATPROF Subset and DATPROF Privacy have enabled our users to generate reduced and anonymous test sets from production data. In some cases it is desirable to fill certain parts of a test set with synthetic data, that is, fictional or invented data that is not derived from production data. In our case it is generated by a rule-based, automated process.

This page first describes how test sets are generated from production databases. It then describes the generation of synthetic test data, followed by descriptions of and download links for a number of free-to-use files with synthetic data (use at your own risk).

Generation of anonymous test sets from production databases

With DATPROF Subset and DATPROF Privacy, presentable test sets can be generated from production databases in a structured and controlled manner (please refer to the diagram below for a possible architecture for generating anonymous test sets from a production database).

In certain cases, synthetic data is needed to supplement or fill the test sets, or to replace sensitive data in test sets that are derived from production databases.
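The two uses mentioned here, replacing sensitive values and supplementing a test set, can be illustrated with a minimal Python sketch. It is not how DATPROF Privacy or DATPROF Integrate work internally; the column names, the value pool and the helper functions `replace_sensitive` and `supplement` are all hypothetical and only serve to make the idea concrete.

```python
import random

# Hypothetical pool of invented values; not taken from any DATPROF file.
SYNTHETIC_NAMES = ["Alex Example", "Robin Sample", "Sam Placeholder"]


def replace_sensitive(rows, column, pool, rng):
    """Return copies of the rows with `column` overwritten by values from `pool`."""
    return [{**row, column: rng.choice(pool)} for row in rows]


def supplement(rows, target_size, make_row):
    """Append invented rows until the set holds `target_size` rows."""
    extra = [make_row(i) for i in range(len(rows), target_size)]
    return rows + extra


rng = random.Random(2016)  # fixed seed so the run is repeatable

# Stand-in for a small subset derived from production (assumed structure).
subset = [
    {"id": 1, "name": "<production value>", "city": "Groningen"},
    {"id": 2, "name": "<production value>", "city": "Utrecht"},
]

# Step 1: overwrite the sensitive column with invented values.
masked = replace_sensitive(subset, "name", SYNTHETIC_NAMES, rng)

# Step 2: fill the set up to five rows with fully invented records.
test_set = supplement(
    masked,
    target_size=5,
    make_row=lambda i: {
        "id": i + 1,
        "name": rng.choice(SYNTHETIC_NAMES),
        "city": "Testville",
    },
)
```

The original rows are copied rather than modified, so the source subset stays untouched while the test set contains only invented names.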

Generation of synthetic data

DATPROF Integrate is used for automatic generation of the synthetic data. Specific meta-data templates contain the desired data structures and rules the synthetic data must comply with. These templates can be customized and expanded.

The generation can be fed with raw text files, such as lists of random names, last names, street names, place names, etc. Parameters can be used to control volumes and distributions (refer to the diagram below for the generation of synthetic data).
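As a rough illustration of list-fed, parameter-driven generation, the following Python sketch combines values from seed lists into invented records. It is an assumption-laden stand-in, not the DATPROF Integrate template mechanism: the seed lists are hard-coded here instead of being read from text files, and the function `generate_patients`, its parameters and the field names are hypothetical.

```python
import random

# Hypothetical seed lists; in practice these would be read from raw text files.
FIRST_NAMES = ["Anna", "Bram", "Carla", "Daan"]
LAST_NAMES = ["Jansen", "de Vries", "Bakker"]
STREETS = ["Stationsweg", "Dorpsstraat", "Kerklaan"]
PLACES = ["Groningen", "Utrecht", "Zwolle"]


def generate_patients(count, gender_weights, seed=0):
    """Generate `count` invented records; gender is drawn using the given weights."""
    rng = random.Random(seed)  # seeded so the same parameters give the same data
    genders = list(gender_weights)
    weights = [gender_weights[g] for g in genders]
    records = []
    for patient_id in range(1, count + 1):
        records.append({
            "patient_id": patient_id,
            "first_name": rng.choice(FIRST_NAMES),
            "last_name": rng.choice(LAST_NAMES),
            "street": rng.choice(STREETS),
            "house_number": rng.randint(1, 200),
            "place": rng.choice(PLACES),
            # Weighted draw: the parameter steers the distribution of this field.
            "gender": rng.choices(genders, weights=weights, k=1)[0],
        })
    return records


patients = generate_patients(count=1000, gender_weights={"F": 0.5, "M": 0.5}, seed=42)
```

Here `count` plays the role of a volume parameter and `gender_weights` the role of a distribution parameter; a fixed `seed` makes a generated set reproducible between runs.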

Fictional medical institute

The download files published on this site were generated with DATPROF Integrate. The environment consists of two databases of a fictional medical institute. One database (PAS, the Patient Planning System) contains the planning for patients, and the other database (HCS, the Health Care System) contains the patient status and the results of the treatment (see the screenshot below with the data structures).

The download files

Two zip files can be downloaded via the links below. The terms and conditions at the bottom of this page apply to the use of these files.

The first zip file contains the following:

  • txt (approx. 10,000 fictional name and address records)
  • ctl (Oracle CTL loader template)
  • cre (Oracle CREATE TABLE template)

Download link PAS_PAT_PATIENT: PAS_PAT_PATIENT.ZIP 

The second zip file contains the following:

  • txt (approx. 45,000 fictional care records)
  • ctl (Oracle CTL loader template)
  • cre (Oracle CREATE TABLE template)

Download link HCS_CAR_CARE_RECORD: HCS_CAR_CARE_RECORD.ZIP

Terms and Conditions for the use of download files with synthetic data

The files with synthetic data that DATPROF provides via the download links on this page may be used freely. Use of the files is entirely at the user's own risk. The data may contain values or combinations of values that also occur in reality. DATPROF accepts no liability for any damage arising from the use of this data in any way or form.
