Scalable data redaction: Data protection-compliant redaction of sensitive documents!


IRI DarkShield finds and protects PII in structured and unstructured data, even in dark data!

With the exponential growth of data volumes in areas such as research, software testing, data science, advanced analytics and artificial intelligence, the vulnerability to data breaches is also increasing. Today, companies process billions of data records, extensive log and event data, and huge archives of unstructured documents. Traditional data protection measures, manual redaction processes, or selective masking solutions are neither scalable nor operationally manageable for these requirements. Data protection in big data environments is therefore primarily a technical scaling and integration issue.

IRI DarkShield was developed precisely for this context: it is a high-performance redaction and data masking engine for large, heterogeneous data landscapes. The focus is on the automated, reproducible and rule-based removal or masking of sensitive content from structured, semi-structured and unstructured data, regardless of format, source or storage location.

Technical detection and classification of sensitive data!

At its core, DarkShield combines several detection technologies to achieve a high hit rate with a low false positive rate. These include:

  1. Rule-based pattern matching and regular expressions
  2. Configurable dictionaries and reference lists
  3. NLP methods for context-sensitive identification of personal data
  4. Machine learning-based methods for recognising complex or variable data patterns

These technologies enable the identification of sensitive content such as PII, PHI, financial data or proprietary information, even in large, unstructured text and document collections. Recognition is designed to be independent of language, data source or file type and can be expanded to suit specific organisational requirements.
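To illustrate the first two detection methods, the following is a minimal Python sketch of pattern matching combined with a dictionary lookup. It is not DarkShield's API or implementation; the names `Finding`, `PATTERNS`, `NAME_DICTIONARY` and `find_sensitive`, as well as the two example patterns, are assumptions chosen purely for illustration.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    kind: str   # label of the detected data class
    start: int  # start offset in the text
    end: int    # end offset (exclusive)
    text: str   # the matched value


# Hypothetical example patterns; a real rule set would be far more extensive.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

# Hypothetical reference list of known first names (lower-cased).
NAME_DICTIONARY = {"alice", "bob"}

WORD = re.compile(r"[A-Za-z]+")


def find_sensitive(text: str) -> list[Finding]:
    findings = []
    # 1. Rule-based pattern matching with regular expressions.
    for kind, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            findings.append(Finding(kind, m.start(), m.end(), m.group()))
    # 2. Dictionary lookup, skipping words already covered by a pattern hit.
    covered = [(f.start, f.end) for f in findings]
    for m in WORD.finditer(text):
        inside_hit = any(s <= m.start() < e for s, e in covered)
        if m.group().lower() in NAME_DICTIONARY and not inside_hit:
            findings.append(Finding("NAME", m.start(), m.end(), m.group()))
    return sorted(findings, key=lambda f: f.start)
```

Calling `find_sensitive("Alice wrote to bob@example.com, SSN 123-45-6789.")` reports a name, an e-mail address and a social security number, in text order. The NLP and machine learning methods from the list are beyond the scope of such a sketch.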

Scalable processing and performance: DarkShield is designed for high data volumes and parallel processing. Optimised scan engines, multi-threading and horizontal scaling across multiple computing nodes enable even very large data sets to be processed efficiently. The architecture supports load balancing scenarios via REST and Java APIs and can be operated in distributed environments such as Hadoop or cloud infrastructures.

This technical scalability is crucial to prevent data protection measures from becoming a bottleneck in ETL, archiving or analysis processes. At the same time, processing remains deterministic and reproducible – a key aspect for audits and regulatory compliance.
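The combination of parallel processing and reproducible results can be sketched in a few lines of Python. This is a generic illustration under stated assumptions, not DarkShield code: `redact`, `redact_all` and the single SSN pattern are hypothetical, and a thread pool stands in for the multi-node scaling described above.

```python
import re
from concurrent.futures import ThreadPoolExecutor

# Hypothetical single rule used only for this illustration.
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def redact(document: str) -> str:
    # A pure function: the same input always yields the same output.
    return SSN.sub("[REDACTED]", document)


def redact_all(documents: list[str], workers: int = 4) -> list[str]:
    # Executor.map returns results in input order, regardless of which
    # worker finishes first, so the output does not depend on scheduling.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(redact, documents))
```

Because each document is processed independently by a pure function and the results are collected in input order, the output is identical whether one worker or many are used. For CPU-bound workloads in Python, a process pool or a distributed framework would take the place of the threads shown here.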

Broad format and source support: A key technical feature of DarkShield is its comprehensive support for a wide variety of data formats. These include relational and file-based structures as well as modern big data formats and classic Office documents, including:

  1. Structured and semi-structured formats such as Parquet, JSON, XML, CSV, EDI
  2. Unstructured content such as PDFs, Word and Excel documents, log files
  3. Image and scan formats (e.g. TIFF, scanned PDFs) using OCR

Processing can be carried out across local file systems, cloud storage (e.g. S3, Azure), Hadoop environments or hybrid architectures.

Rule-based masking and redaction logic!

DarkShield allows fine-grained control of masking and redaction logic. Organisations can define:

  1. Which data types are to be redacted
  2. How masking is performed (e.g. replacement, tokenisation, blacking out)
  3. Whether data is processed irreversibly or contextually
  4. How different sets of rules are applied depending on the target system, purpose or user group

These sets of rules are versionable, reusable and can be rolled out consistently across different systems and processes.
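A rule set of this kind can be pictured as a mapping from data type to masking function. The following Python sketch is an assumption-laden illustration, not DarkShield's rule format: `RULES`, `apply_rules`, the three masking functions and the demo key are all hypothetical, and findings are assumed to be non-overlapping `(kind, start, end)` tuples.

```python
import hashlib
import hmac

# Hypothetical demo key; a real deployment would manage keys securely.
SECRET_KEY = b"demo-key-change-me"


def replace(value: str) -> str:
    # Replacement: substitute a fixed placeholder.
    return "[REDACTED]"


def black_out(value: str) -> str:
    # Blacking out: overwrite every character, preserving the length.
    return "X" * len(value)


def tokenise(value: str) -> str:
    # Tokenisation: a keyed hash maps equal inputs to equal tokens,
    # so joins across data sets remain possible.
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256)
    return "TOK_" + digest.hexdigest()[:12]


# Which data type is masked, and how.
RULES = {"NAME": replace, "US_SSN": black_out, "EMAIL": tokenise}


def apply_rules(text, findings, rules=RULES):
    # findings: non-overlapping (kind, start, end) tuples.
    parts, cursor = [], 0
    for kind, start, end in sorted(findings, key=lambda f: f[1]):
        parts.append(text[cursor:start])               # untouched text
        parts.append(rules[kind](text[start:end]))     # masked value
        cursor = end
    parts.append(text[cursor:])
    return "".join(parts)
```

Keeping the rules in a plain data structure is what makes them versionable and reusable: a different target system or user group simply receives a different mapping.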

Automation, orchestration and integration!

DarkShield offers extensive automation and integration options for production use. Using the scheduler built into IRI Workbench, or the CLI and API interfaces, redaction can be embedded as a fixed step in:

  1. ETL and ELT pipelines
  2. Backup and archiving processes
  3. DevOps and CI/CD workflows
  4. Data science and AI training pipelines

This makes data protection a continuous, technically controlled process rather than a one-off manual measure.

Compliance, auditability and governance: In addition to performance and scalability, DarkShield explicitly addresses governance and compliance requirements. Audit trails, rule-based processing and reproducible results support compliance with data protection laws such as GDPR, HIPAA, CCPA and other international regulations. At the same time, the usability of the data for analysis, testing and AI purposes is maintained.

Strategic significance: Overall, IRI DarkShield positions itself not as an isolated redaction tool, but as a technical platform for scalable data protection in data-driven organisations. Through a combination of powerful detection, high processing speed, broad format support, automation and deep integration into existing data architectures, DarkShield enables the secure, compliant use of large amounts of data – even in complex big data, analytics and AI scenarios.

Efficiency meets experience: For more than four decades, our software solutions have supported companies in data management and data protection – technologically leading, reliable in production use and applicable across all industries.

In use since 1978: Numerous well-known companies, service providers, financial institutions and state and federal authorities are among our long-standing customers.

Maximum compatibility: Our software supports both classic mainframe platforms (Fujitsu BS2000/OSD, IBM z/OS, z/VSE, z/Linux) and modern open system environments such as Linux, UNIX derivatives and Windows.

Source: JET-Software GmbH
Press release from 20 Jan. 2026 about the software DarkShield