At DATPROF, our development team keeps a close eye on new and emerging technologies—always looking for ways to make test data provisioning faster, more compact, and more secure.
One of the most talked-about technologies right now is AI. It’s a powerful, all-encompassing technology that promises to reshape industries at an incredible pace.
The question on everyone’s mind is: ‘How exactly will it impact my field?’
Within the test data community, one of the questions that is surfacing is: is AI the best way to generate test data?
In this article, I’ll attempt to contribute to the answer.

Bert Nienhuis – Chief Product Officer
The most ambitious uses of AI in test data generation
There are several ways AI is being applied in the world of test data. In my opinion, the two most ambitious approaches are:
- Training an AI model on production data and using that model to generate test data.
- Using generative AI models—such as large language models—to directly generate synthetic test data.

Current best practices in AI for test data
In this article, I’ll focus on the first use case: training a model on production data and using it to generate test data. I’ve based this exploration on publicly available documentation from two prominent vendors offering this solution.

Today, several companies provide tools that claim to train on and generate “tabular data.” But are these solutions truly ready for enterprise-level use? Or even practical at all?
Based on public benchmarks, we can start to understand how feasible and scalable these solutions really are. I’ve chosen not to name the companies directly so we can focus on the content of the examples:
Case 1:
One provider allows you to train a model on production data.
In one test, training two tables—one with 5,000 rows and another linked table with 1,037,854 rows—took 15 hours using 64 CPUs and 256 GB of RAM.
When scaled down to 12 CPUs and 128 GB of RAM, the training time ballooned to 90 hours.
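Taken at face value, these two runs consume a similar amount of total compute, which suggests the workload parallelizes reasonably well but does not get cheaper on smaller hardware. A quick back-of-envelope check, using only the figures from the benchmark above:

```python
# Does cutting the CPU count from 64 to 12 keep total work (CPU-hours) constant?
# (CPU count, wall-clock hours) are taken directly from the Case 1 benchmark.
runs = {
    "64 CPUs / 256 GB RAM": (64, 15),
    "12 CPUs / 128 GB RAM": (12, 90),
}

cpu_hours = {label: cpus * hours for label, (cpus, hours) in runs.items()}
for label, total in cpu_hours.items():
    print(f"{label}: {total} CPU-hours")
```

At 960 versus 1,080 CPU-hours, the smaller machine actually spends about 12% more total compute, on top of taking nearly four days of wall-clock time.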
Case 2:
Another vendor provides benchmarks for various AI models across datasets of different sizes. Under the “Large Datasets” category, they report:
- A 743MB file with 4.9 million rows and 42 columns took 6 hours to train.
- A 154MB file with 1.4 million rows and 15 columns required 3 hours.
- A 311MB file with 27,000 rows and 1,300 columns took 26 hours.
These figures reflect only the training time. Data generation time would be additional—though likely faster, it still adds overhead.
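Normalizing those figures makes the scaling behavior easier to see. The rows, columns, and hours below come straight from the vendor’s table; the throughput numbers are simply derived from them:

```python
# Implied training throughput for the Case 2 benchmarks
# (rows, columns, training hours are the vendor's published figures).
benchmarks = [
    (4_900_000, 42, 6),
    (1_400_000, 15, 3),
    (27_000, 1_300, 26),
]

throughput = []
for rows, cols, hours in benchmarks:
    cells_per_hour = rows * cols / hours
    throughput.append(cells_per_hour)
    print(f"{rows:>9,} rows x {cols:>5} cols: {cells_per_hour:>12,.0f} cells/hour")
```

The wide 1,300-column table trains at roughly 25 times fewer cells per hour than the 42-column one, which suggests that column count, not just raw data volume, drives training cost in this tool.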
The verdict: there’s still a long way to go
So, will AI revolutionize test data management? Based on what I’ve seen so far, we’re not there yet. At this point, I wouldn’t say AI is the best way to generate test data.
For now, AI’s role in test data management is more supportive than foundational.
For example, with DATPROF Privacy, synthetic test data can be generated directly in the database based on specific requirements and rules. In a recent benchmark on average hardware, we generated 100 million rows for an Oracle table with five columns in just 17 minutes.
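For a rough sense of scale, compare that rule-based figure with the fastest AI training run from Case 2. This is not an apples-to-apples comparison: training time is not generation time, and the tables have different schemas, so treat it only as an order-of-magnitude illustration:

```python
# Order-of-magnitude comparison (not apples-to-apples: training vs. generation,
# and the tables differ in width). Figures are from the benchmarks above.
rule_based = 100_000_000 / 17        # rows/minute: 100M rows in 17 minutes
ai_training = 4_900_000 / (6 * 60)   # rows/minute: 4.9M rows trained in 6 hours

print(f"rule-based generation: {rule_based:,.0f} rows/minute")
print(f"AI model training:     {ai_training:,.0f} rows/minute")
print(f"ratio: roughly {rule_based / ai_training:,.0f}x")
```

Even granting the AI tool its training time as if it were generation time, the gap is on the order of hundreds of times.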
Keep in mind, most enterprise environments involve multiple production systems, large databases, and thousands of tables. While AI-generated data might offer value for smaller or niche datasets, it’s not yet scalable enough to replace established methods like data masking, subsetting, or rule-based generation.
It can be useful for tackling specific challenges, such as analyzing small, complex datasets or accelerating parts of test data workflows, but I would not advise replacing the core techniques that enterprises rely on.
Interesting sources
- Abhaya. (2024, November 14). AI-Driven Test Automation: A Comprehensive Guide to Strategically Scaling for Large Applications. Medium. https://medium.com/%40abhaykhs/ai-driven-test-automation-a-comprehensive-guide-to-strategically-scaling-for-large-applications-50e727125f8b
About Bert
I write for test managers and test teams about new developments in the test data industry. Want to stay updated?
Hit the subscribe button 👉