15 April, 2016 by Harald Kikkers

DATPROF makes synthetic test data available

DATPROF now also creates synthetic test data

DATPROF Subset and DATPROF Privacy have enabled our users to generate reduced and anonymous test sets from production data for several years now. In some cases it is desirable to fill certain parts of test sets with synthetic data, in other words, data that is not derived from production data. It is fictional or invented data, in our case generated by a rules-based, automatic process.

This page first describes how test sets are generated from production databases. Next, it describes the generation of synthetic test data, followed by descriptions and download links for a number of free-to-use files with synthetic data (use at your own risk).

Generation of anonymous test sets from production databases

With DATPROF Subset and DATPROF Privacy, presentable test sets can be generated from production databases in a structured and controlled manner. The diagram below shows a possible architecture for generating anonymous test sets from a production database.

Test data architecture: subsetting and masking
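The two steps in this architecture, reducing the data and then anonymizing it, can be illustrated with a minimal Python sketch. This is not how DATPROF Subset or DATPROF Privacy work internally; the function names, the `last_name` column, the list of replacement names, and the "keep every n-th row" rule are all assumptions made for illustration.

```python
import hashlib

# Hypothetical replacement values; any list of fictional names would do.
FAKE_LAST_NAMES = ["Jansen", "Smith", "Nielsen", "Garcia", "Meyer"]


def mask_name(real_name, secret="demo-secret"):
    """Map a real name to a fictional one, deterministically."""
    digest = hashlib.sha256((secret + real_name).encode("utf-8")).digest()
    return FAKE_LAST_NAMES[digest[0] % len(FAKE_LAST_NAMES)]


def subset_and_mask(rows, keep_every=10):
    """Keep every n-th row (the subset) and mask the name column."""
    subset = rows[::keep_every]
    return [{**row, "last_name": mask_name(row["last_name"])} for row in subset]


# Stand-in for a production table with 100 rows.
production = [{"id": i, "last_name": f"Person{i}"} for i in range(100)]
test_set = subset_and_mask(production)
print(len(test_set))
```

Because the masking is deterministic, the same real name always maps to the same fictional name, which keeps related records consistent with each other after masking.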

In certain cases, synthetic data is needed to supplement or fill the test sets, or to replace sensitive data in test sets that are derived from production databases.

Generation of synthetic data

DATPROF Integrate is used to generate the synthetic data automatically. Specific metadata templates contain the desired data structures and the rules that the synthetic data must comply with. These templates can be customized and expanded.

The generator can be fed with raw text files, such as lists of first names, last names, street names, and place names. Parameters control the quantities and distributions of the generated data (refer to the diagram below for the generation of synthetic data).

Synthetic test data
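As a rough illustration of rules-based generation from raw value lists, the following Python sketch builds fictional patient records. It does not reproduce DATPROF Integrate or its templates; the value lists, the field names, and the `count` and `female_share` parameters are assumptions chosen to show how a quantity and a distribution could be parameterized.

```python
import random

# Stand-ins for the raw text files (one value per line in a real file).
FIRST_NAMES = ["Anna", "Lars", "Maria", "Peter"]
LAST_NAMES = ["Jansen", "Nielsen", "Smith", "Garcia"]
STREETS = ["Main Street", "Church Lane", "Station Road"]
CITIES = ["Springfield", "Riverton", "Lakeside"]


def generate_patients(count, female_share=0.5, seed=42):
    """Generate `count` fictional patient records.

    `count` controls the number of records and `female_share` the
    distribution of one attribute; both are illustrative parameters.
    """
    rng = random.Random(seed)  # fixed seed makes the output reproducible
    records = []
    for patient_id in range(1, count + 1):
        records.append({
            "patient_id": patient_id,
            "first_name": rng.choice(FIRST_NAMES),
            "last_name": rng.choice(LAST_NAMES),
            "gender": "F" if rng.random() < female_share else "M",
            "street": f"{rng.choice(STREETS)} {rng.randint(1, 200)}",
            "city": rng.choice(CITIES),
        })
    return records


patients = generate_patients(count=1000, female_share=0.6)
```

A fixed seed is used so that the same parameters always yield the same test set, which makes test runs repeatable.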

Fictional medical institute

The download files published on this site were generated with DATPROF Integrate. The environment includes two databases of a fictional medical institute. One database (PAS – Patient Planning System) contains the planning for patients, and the other database (HCS – Health Care System) contains the patient status and the results of the treatment (see the screenshot below for the data structures).

Synthetic test data

The download files

You can download two zip files with the following hyperlinks. The terms and conditions at the bottom of this page apply to the use of these files.

The first zip file contains the following:

  • txt (approx. 10,000 fictional name and address records)
  • ctl (Oracle CTL loader-template)
  • cre (Oracle CREATE TABLE template)

Download link PAS_PAT_PATIENT: PAS_PAT_PATIENT.ZIP 

The second zip file contains the following:

  • txt (approx. 45,000 fictional care records)
  • ctl (Oracle CTL loader-template)
  • cre (Oracle CREATE TABLE template)

Download link HCS_CAR_CARE_RECORD: HCS_CAR_CARE_RECORD.ZIP
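Before loading a downloaded text file into a database, it can help to check the number of records and whether every line has the same number of fields. The Python sketch below does this for delimited text. The semicolon delimiter and the sample lines are assumptions, not the actual file layout; an Oracle SQL*Loader control file normally names the field terminator, so the .ctl file is the place to look for the real one.

```python
import csv
import io


def summarize(text, delimiter=";"):
    """Count records and distinct field counts in delimited text.

    The delimiter is an assumption; check the .ctl file for the real one.
    """
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    rows = [row for row in reader if row]  # skip blank lines
    return len(rows), sorted({len(row) for row in rows})


# Made-up sample lines, not taken from the download files.
sample = "1;Jansen;Main Street 1\n2;Smith;Church Lane 7\n"
print(summarize(sample))
```

If the second value in the result lists more than one field count, some lines have a different number of fields than others and are worth inspecting before the load.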

Terms and Conditions for the use of download files with synthetic data

The files with synthetic data that DATPROF provides via the download links on this page may be used freely. The use of the files is entirely at the user's own risk. The data may contain values or combinations of values that also occur in reality. DATPROF accepts no liability for any damage arising from the use of this data in any way or form.

Written by Harald Kikkers

Created and bootstrapped DATPROF as an entrepreneur. Always searching for new techniques and concepts to improve the use of data. Started playing the guitar, into BBQing and loving Denmark for many reasons.
