In a previous blog post, I discussed the concept of generating test data by training AI on production data. While it's an intriguing technology, it's important to recognize that generating entire databases at scale using AI is often a step too far, especially given the existence of far more efficient techniques. 

In recent years, Generative AI (GenAI) has evolved rapidly, with large language models (LLMs) now widely used to create synthetic data. But how suitable are these models for generating test data? In this article, I'll explore the capabilities and limitations of GenAI in the context of test data generation. 

Bert Nienhuis – Chief Product Officer

Why use GenAI for test data? 

One of the biggest advantages of GenAI is the ability to generate data using natural language prompts. 

For example: 

“Generate 100 rows with ID, first name, last name, email, and birthdate in CSV format.” 

In about two minutes, you'll receive a well-structured CSV dataset with exactly those columns. 

This ease of use makes GenAI an attractive tool. However, there are three key challenges to consider when using GenAI for large-scale test data generation: 

  • Performance at scale 
  • Cost at scale 
  • Prompt engineering complexity 

Three challenges

The first challenge is performance at scale. Although LLMs are getting faster, generating large volumes of data remains relatively slow. For example, GPT-4o, one of the faster models, processes around 193 tokens per second. In our earlier example, each row is roughly 20 tokens, so generating 1 million rows would require about 20 million tokens, taking approximately 28 hours. 

Even if you self-host an LLM with powerful hardware—say 2000 tokens/second—generating 1 million rows would still take nearly 3 hours. 
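To make these figures easy to verify, here is the same back-of-the-envelope calculation in a few lines of Python. The token counts and throughputs are simply the assumptions stated above, not benchmarks:

```python
# Back-of-the-envelope arithmetic for the throughput figures above.
# Assumptions from the text: ~20 tokens per row, 1 million rows.
tokens_per_row = 20
rows = 1_000_000
total_tokens = tokens_per_row * rows  # 20 million tokens

def hours_at(tokens_per_second: float) -> float:
    """Convert a generation throughput into wall-clock hours."""
    return total_tokens / tokens_per_second / 3600

print(f"GPT-4o (~193 tok/s):      {hours_at(193):.1f} hours")   # ~28.8 hours
print(f"Self-hosted (2000 tok/s): {hours_at(2000):.1f} hours")  # ~2.8 hours
```

Even an order-of-magnitude faster self-hosted setup stays in the hours range, which is the core of the scaling problem.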

In comparison, DATPROF Privacy generated 100 million synthetic rows in an Oracle database using mid-range hardware in just 17 minutes. That’s 100x more data in a fraction of the time. 

Cost at scale is the second challenge. Generating large datasets using commercial LLM APIs can become prohibitively expensive. 

  • Generating 1 million rows with 5 columns using GPT-4o: ~$200 
  • Using GPT-4.5 for the same: ~$2,000 
  • For a wider table (15 columns, ~50 tokens/row), generating 10 million rows: 
      • GPT-4o: ~$5,000 
      • GPT-4.5: ~$75,000 

Alternatively, running an LLM on-premise requires a major hardware investment. A server with 4× NVIDIA A100 80GB GPUs alone costs upwards of €60,000—just for the GPUs. 

And keep in mind: many organizations require far more than just a few million rows.

Another challenge is prompt engineering. LLMs are typically controlled via system and user prompts. A simple request like: 

“Generate 1,000 rows of customer data with name, address, and email.” 

…is easy enough. 

But as data requirements become more complex—think data types, formats, domain-specific rules, date ranges, dependencies between attributes—the prompts must become equally complex. You’ll find yourself encoding what is essentially a rule-based system into natural language, which quickly becomes inefficient and unmanageable. 

For structured, large-scale test data, you ideally want a system that explicitly defines the constraints your generated data must meet, rather than relying on an increasingly complicated prompt. 
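As a contrast, here is a minimal sketch of what an explicit constraint definition can look like. The column-spec format below is entirely hypothetical (it is not DATPROF's actual configuration format), but it shows the idea: rules such as date ranges and value patterns are stated once, declaratively, instead of being re-encoded in an ever-growing prompt.

```python
import random
from datetime import date, timedelta

# Hypothetical declarative column spec: each constraint is explicit and
# machine-checkable, rather than buried in natural-language prompt text.
COLUMNS = {
    "id":        {"type": "sequence", "start": 1},
    "email":     {"type": "pattern", "pattern": "user{id}@example.com"},
    "birthdate": {"type": "date", "min": date(1950, 1, 1), "max": date(2005, 12, 31)},
}

def generate_rows(n: int) -> list[dict]:
    """Generate n rows that satisfy the declared constraints by construction."""
    span = (COLUMNS["birthdate"]["max"] - COLUMNS["birthdate"]["min"]).days
    return [
        {
            "id": i,
            "email": COLUMNS["email"]["pattern"].format(id=i),
            "birthdate": COLUMNS["birthdate"]["min"] + timedelta(days=random.randrange(span)),
        }
        for i in range(COLUMNS["id"]["start"], COLUMNS["id"]["start"] + n)
    ]

rows = generate_rows(5)
# Every row provably satisfies the date-range constraint -- no prompt tuning needed.
assert all(COLUMNS["birthdate"]["min"] <= r["birthdate"] <= COLUMNS["birthdate"]["max"] for r in rows)
```

The key design difference is that constraints here are data, not prose: they can be validated, versioned, and applied to millions of rows without an LLM in the loop.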

How can GenAI effectively support enterprise test data management?

At scale, using LLMs alone to generate test data offers little benefit compared to traditional rule-based generators. It’s slower, costlier, and often less reliable. However, that doesn’t mean GenAI has no role to play. 

At DATPROF, we see GenAI as a productivity tool—an assistant that can streamline and simplify specific tasks for test data engineers. 

Imagine combining the best of both worlds: 

  • The intelligence and flexibility of GenAI 
  • The speed, control, and maintainability of rule-based test data generators 

That’s where the real value lies. 

For example, GenAI can be used to: 

  • Generate seed data like 100 unique Japanese first names, which are then used by a rule-based generator to create millions of customer records 
  • Write complex SQL filters to exclude specific records from anonymization 
  • Assist in setting up a data generation project 
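The first of these patterns, seed data, can be sketched as follows. The three names are placeholders standing in for a GenAI-generated seed list of 100 names; the point is that the LLM is called once to produce the seed, while a fast, deterministic loop produces the volume:

```python
import random

# Seed list: produced once by GenAI (placeholder names shown here),
# then reused by a rule-based generator -- no LLM in the hot loop.
SEED_FIRST_NAMES = ["Haruto", "Yui", "Sakura"]  # in practice: ~100 GenAI-generated names

def generate_customers(n: int, seed: int = 42) -> list[dict]:
    """Expand the seed list into n customer records, reproducibly."""
    rng = random.Random(seed)  # fixed seed keeps test data deterministic
    return [
        {"id": i, "first_name": rng.choice(SEED_FIRST_NAMES)}
        for i in range(1, n + 1)
    ]

customers = generate_customers(1_000_000)  # seconds on a laptop, not hours
```

The combination gives you GenAI's flexibility (realistic, locale-specific names on demand) with the throughput and reproducibility of rule-based generation.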

In these scenarios, GenAI acts as a smart assistant—augmenting rather than replacing existing tools. 

My conclusion 

GenAI is not a silver bullet for test data generation—especially not at scale. But when used strategically, it can significantly enhance productivity and flexibility for test data professionals. 

Used in tandem with proven test data software, GenAI can help teams move faster, reduce manual effort, and improve test data quality, particularly as part of a hybrid workflow. 

The future of test data management isn't GenAI on its own; it's GenAI working alongside subsetting, masking, and rule-based synthetic data generation. 

About Bert

I write for test managers and test teams about new developments in the test data industry. Want to stay updated?

Hit the subscribe button 👉

Thanks for reading, good luck and until next time 👋