We’re thrilled to unveil a major new capability in DATPROF Runtime 4.14: file masking, starting with support for Parquet files.
As organizations continue to move toward modern, distributed data architectures, securing data across diverse sources becomes more critical than ever. This release bridges the gap between structured databases and the ever-growing world of file-based data by extending our powerful masking engine to file formats, starting with Parquet.

Bert Nienhuis – Chief Product Officer
1) Deterministic & Conditional Masking
- Deterministic Masking: Keep data consistent across systems. For example, a masked first name in a database will match its counterpart in a file.
- Conditional Masking: Apply complex rules to mask data based on dynamic conditions or business logic.
A common use case for conditional masking: mask phone numbers only for users outside the EU, or redact emails only if the opt-in flag is false. This is now easily configurable using masking conditions in the UI.
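To make the two ideas concrete, here is a minimal Python sketch. It is an illustration only and not how DATPROF Runtime implements masking: the key, the name pool, the column names, and both functions are hypothetical. It derives a replacement name from a keyed hash, so the same input always yields the same output, and it redacts the email only when the opt-in flag is false.

```python
import hashlib
import hmac

# Hypothetical secret; a real setup would load this from secure configuration.
SECRET_KEY = b"example-masking-key"

# Hypothetical pool of replacement first names.
FIRST_NAMES = ["Alex", "Sam", "Robin", "Jamie", "Morgan", "Taylor"]


def mask_first_name(original: str) -> str:
    """Deterministic masking: the same input always maps to the same name."""
    digest = hmac.new(SECRET_KEY, original.encode("utf-8"), hashlib.sha256).digest()
    index = int.from_bytes(digest[:8], "big") % len(FIRST_NAMES)
    return FIRST_NAMES[index]


def mask_row(row: dict) -> dict:
    """Conditional masking: redact the email only when opt_in is false."""
    masked = dict(row)  # copy so the original row is left untouched
    masked["first_name"] = mask_first_name(row["first_name"])
    if not row["opt_in"]:
        masked["email"] = "redacted@example.com"
    return masked


if __name__ == "__main__":
    print(mask_row({"first_name": "Alice", "email": "alice@example.com", "opt_in": False}))
```

Because the replacement depends only on the key and the original value, a database row and a file record that both contain "Alice" end up with the same masked name, provided both runs use the same key.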
2) Initial format support: Parquet
We’re starting strong with Parquet, a widely adopted columnar storage format. CSV, JSON, and other formats are coming soon! Parquet was chosen first due to its growing popularity in data lakes, analytics pipelines, and cloud-native ecosystems. Its columnar format also allows highly efficient read/write operations, making it ideal for masking large datasets.
3) High-performance file processing
Our new file masking engine is designed to handle gigabytes of data in minutes. Whether you’re working with large Parquet datasets or entire directories, processing is fast, scalable, and reliable.
4) Pattern-based file selection
Define filename patterns (e.g., customer_*.parquet) to apply masking templates to multiple files at once, simplifying bulk data operations. For instance, if your data lake writes daily snapshots as sales_20250601.parquet, sales_20250602.parquet, etc., a single pattern like sales_*.parquet lets you mask all of them without manual selection.
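As a rough illustration of how such a pattern picks files, the sketch below uses shell-style wildcards from Python's standard library. Whether Runtime's pattern syntax matches shell-style globbing in every detail is an assumption, and `select_files` is a hypothetical helper.

```python
from fnmatch import fnmatchcase


def select_files(filenames, pattern):
    """Return the filenames that match a shell-style wildcard pattern."""
    # fnmatchcase is case-sensitive on every platform, so results are predictable.
    return [name for name in filenames if fnmatchcase(name, pattern)]


files = [
    "sales_20250601.parquet",
    "sales_20250602.parquet",
    "customer_eu.parquet",
    "sales_notes.txt",
]
selected = select_files(files, "sales_*.parquet")

if __name__ == "__main__":
    print(selected)
```

Only the two daily sales snapshots are selected; the customer file and the text file are left out.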
5) Translation files
Maintain data consistency across multiple systems using translation files, enabling deterministic masking even when datasets are processed separately.
Example:
Original,Masked
12345,AB123
67890,CD456
These files ensure that the same original values are always masked the same way across systems, which is useful in test environments where relational consistency matters.
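The lookup idea behind a translation file can be sketched in a few lines of Python. This assumes a plain two-column CSV like the example above. The helper names are hypothetical, and leaving unmapped values unchanged is an assumption about behavior, not documented Runtime semantics.

```python
import csv
import io

# Same sample data as the translation file example, held in memory for the sketch.
TRANSLATION_CSV = """Original,Masked
12345,AB123
67890,CD456
"""


def load_translation(text: str) -> dict:
    """Read a two-column translation file into an original -> masked mapping."""
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    return {row["Original"]: row["Masked"] for row in reader}


def apply_translation(values, table):
    """Replace each value that has a mapping; unmapped values pass through (assumption)."""
    return [table.get(value, value) for value in values]


table = load_translation(TRANSLATION_CSV)

if __name__ == "__main__":
    print(apply_translation(["12345", "99999", "67890"], table))
```

Applying the same table to a second dataset processed later gives the same masked values, which is what keeps keys aligned across systems.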
6) Workflow integration for DevOps
You can now embed file masking seamlessly into CI/CD pipelines or workflows, ensuring that test data is masked automatically during deployment. This makes it easy to maintain compliance and automation in modern DevOps practices. Use the Runtime CLI or REST APIs to integrate masking tasks into your GitLab, Jenkins, or Azure DevOps pipelines.
7) Extensive function library & custom logic
Access 40+ built-in masking functions, from simple replacements to advanced data generators. And if that’s not enough, you can write custom expressions using SQL syntax to create highly specific masking rules. Custom expression example: concat(substr(FIRST_NAME, 1, 1), '.', LAST_NAME, '@datprof.com').
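To show what the example expression produces, here is a Python equivalent. It is only a sketch for reading the expression: `mask_email` is a hypothetical name, and NULL handling in SQL may differ from how Python treats empty strings.

```python
def mask_email(first_name: str, last_name: str) -> str:
    """Build an address from the first initial, a dot, the last name, and a fixed domain."""
    # SQL substr(FIRST_NAME, 1, 1) is 1-based, so it returns the first character.
    initial = first_name[:1]
    return initial + "." + last_name + "@datprof.com"


if __name__ == "__main__":
    print(mask_email("Jane", "Doe"))  # J.Doe@datprof.com
```

For a row with FIRST_NAME "Jane" and LAST_NAME "Doe", the expression yields J.Doe@datprof.com.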
Why this matters
The ability to mask files, starting with Parquet, directly within DATPROF Runtime opens the door to fully integrated, format-agnostic data protection. Whether you’re preparing test data, managing a data lake, or working in a hybrid environment, this release is a great step toward secure, compliant, and efficient test data management.

Start masking your Parquet files today and get ready for even broader file format support coming soon!