eDiscovery data processing is the set of technical steps that transform raw electronically stored information (ESI) into a structured, reviewable dataset. It encompasses data collection, culling, deduplication, format normalization, and ingestion into a data processing or review platform, with the goal of producing a defensible, cost-efficient document set for legal review or production.
In the context of litigation, regulatory response, or internal investigation, eDiscovery data processing refers to the workflow that takes collected ESI through a series of technical transformations before attorneys begin review. ESI, as described in Rule 34(a) of the Federal Rules of Civil Procedure (FRCP), includes information stored in any medium from which it can be obtained. Processing sits between collection and review in the Electronic Discovery Reference Model (EDRM) and is one of the highest-leverage points for controlling both cost and risk.
Processing is not a single action. It involves multiple coordinated steps, each of which affects the quality, completeness, and defensibility of the final production. Organizations that treat processing as an afterthought often face inflated review costs, missed documents, and challenges to their production methodology.
According to EDRM and Gartner research on information governance, the volume of enterprise data subject to legal hold has grown dramatically with the adoption of cloud collaboration tools, messaging platforms, and distributed work environments. The implications for cost and risk are direct.
Processing decisions made early in a matter compound throughout the lifecycle. A well-configured data processing platform establishes the foundation for everything that follows.
The table below maps the core stages of an eDiscovery workflow, the key activities at each stage, and the primary goal each stage serves.
| Stage | Key Activities | Primary Goal |
|---|---|---|
| Data Collection | Identify custodians, collect from endpoints, cloud apps, email, digital communications | Preserve and capture all potentially relevant ESI |
| Processing | Deduplication, filtering, format normalization, culling by date/custodian | Reduce volume; prepare data for review |
| Ingestion | Load into data processing platform or review tool with metadata intact | Enable search, tagging, and linear/predictive review |
| Review | Attorney review, privilege log, responsiveness decisions, TAR/CAL workflows | Identify responsive, privileged, and producible documents |
| Production | Bates numbering, redactions, load file generation, format conversion | Deliver compliant production set to requesting party |
Data collection should be scoped precisely before any technical work begins. This requires identifying custodians, data sources (email servers, cloud storage, collaboration tools, endpoints), and relevant date ranges. Where required, collection should be forensically sound, preserving metadata and documenting chain of custody.
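As one way to picture the chain-of-custody documentation described above, the following minimal Python sketch hashes each collected file and records who it came from and when it was collected. The function names and the manifest fields (`custodian`, `source`, `collected_at`) are illustrative assumptions, not a standard schema or any particular platform's format.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large collections do not exhaust memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def custody_record(path: Path, custodian: str, source: str) -> dict:
    """Build one manifest entry for a collected file (hypothetical schema)."""
    return {
        "file_name": path.name,
        "size_bytes": path.stat().st_size,
        "sha256": sha256_of_file(path),
        "custodian": custodian,
        "source": source,
        "collected_at": datetime.now(timezone.utc).isoformat(),
    }
```

Re-hashing a file later and comparing the result to the recorded `sha256` value is a simple way to show that the file has not changed since collection.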
A key consideration is the growing volume of digital communications data. Platforms such as Slack, Microsoft Teams, Google Chat, and enterprise social tools generate significant volumes of potentially relevant ESI, which must be handled by eDiscovery processing tools capable of preserving threading, reactions, attachments, and user context.
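To illustrate what preserving threading can mean in practice, here is a minimal Python sketch that groups chat messages into threads and orders each thread by time. The `ChatMessage` fields are hypothetical and do not reflect the export schema of Slack, Teams, or Google Chat; a real export would need to be mapped onto a structure like this first.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class ChatMessage:
    message_id: str
    thread_id: Optional[str]  # None means the message starts its own thread
    author: str
    sent_at: datetime
    text: str


def group_by_thread(messages):
    """Return {thread root id: messages in chronological order}."""
    threads = defaultdict(list)
    for msg in messages:
        # A message with no thread_id is treated as the root of its own thread.
        root = msg.thread_id or msg.message_id
        threads[root].append(msg)
    for thread_messages in threads.values():
        thread_messages.sort(key=lambda m: m.sent_at)
    return dict(threads)
```

Keeping the author and timestamp on every message, rather than flattening a thread into one block of text, lets a reviewer read a reply in context and still attribute each statement to its sender.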
Once data is collected, processing begins with volume reduction. Common techniques include deduplication, keyword filtering, and culling by date range or custodian.
Understanding how these decisions affect your final population is critical. Key processing metrics, such as native file count, processed file count, exception rate, and deduplication ratio, provide transparency and support defensibility.
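The relationship between volume reduction and the metrics above can be sketched in a few lines of Python. This example culls by date range, removes exact duplicates by content hash, and reports counts along the way. The `Doc` structure and the definition of `dedup_ratio` (the share of in-range documents removed as duplicates) are assumptions made for illustration. Production tools often hash selected metadata fields together with the body instead of raw bytes, and may define the ratio differently.

```python
import hashlib
from dataclasses import dataclass
from datetime import date


@dataclass
class Doc:
    doc_id: str
    custodian: str
    sent: date
    content: bytes


def cull_and_dedupe(docs, start: date, end: date):
    """Apply a date filter, then drop exact duplicates; return (kept, metrics)."""
    seen_hashes = set()
    kept = []
    in_range = 0
    for doc in docs:
        if not (start <= doc.sent <= end):
            continue  # outside the relevant date range
        in_range += 1
        digest = hashlib.sha256(doc.content).hexdigest()
        if digest in seen_hashes:
            continue  # exact duplicate of a document already kept
        seen_hashes.add(digest)
        kept.append(doc)
    metrics = {
        "native_count": len(docs),
        "after_date_filter": in_range,
        "processed_count": len(kept),
        # Assumed definition: fraction of in-range documents removed as duplicates.
        "dedup_ratio": 1 - len(kept) / in_range if in_range else 0.0,
    }
    return kept, metrics
```

Logging the metrics dictionary for every processing run gives a simple audit trail of how each filter changed the review population.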
Processed data must be converted into formats compatible with the review platform. This typically involves generating extracted-text files, creating TIFF or PDF images where required, and building load files (DAT, OPT, or similar) with all associated metadata fields.
Metadata preservation is critical at this stage. Fields such as sent date, author, recipient, file path, and custodian assignment must be accurately mapped to review platform fields to support meaningful search and filtering.
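As a rough illustration of metadata mapping and load file generation, the Python sketch below maps source metadata keys to review platform field names and emits rows in a Concordance-style DAT layout. The delimiters shown (ASCII 20 between fields, þ around values, ® in place of line breaks) are the ones conventionally associated with that format, but they should be confirmed against the receiving platform's or requesting party's specification. The `FIELD_MAP` names are hypothetical.

```python
FIELD_SEP = "\x14"       # ASCII 20, conventional DAT field separator
QUOTE = "\u00fe"         # þ (ASCII 254), conventional text qualifier
NEWLINE_SUB = "\u00ae"   # ® (ASCII 174), stands in for line breaks inside a value

# Hypothetical mapping: source metadata key -> review platform field name.
FIELD_MAP = {
    "sent_date": "DateSent",
    "author": "Author",
    "recipient": "To",
    "file_path": "FilePath",
    "custodian": "Custodian",
}


def dat_line(values) -> str:
    """Quote each value and join the values with the field separator."""
    cleaned = (
        str(v).replace("\r\n", NEWLINE_SUB).replace("\n", NEWLINE_SUB)
        for v in values
    )
    return FIELD_SEP.join(f"{QUOTE}{v}{QUOTE}" for v in cleaned)


def build_dat(records) -> str:
    """Return DAT text: a header row, then one row per record."""
    header = dat_line(["DocID", *FIELD_MAP.values()])
    rows = [
        dat_line([rec["doc_id"], *(rec.get(src, "") for src in FIELD_MAP)])
        for rec in records
    ]
    return "\n".join([header, *rows]) + "\n"
```

Driving both the header and the rows from a single mapping keeps the column order consistent, and a missing source field becomes an empty value instead of shifting later columns.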
With processed data loaded into the review environment, attorneys can apply filters, run keyword searches, apply technology-assisted review (TAR) or continuous active learning (CAL) workflows, and code documents for responsiveness and privilege. Production then involves applying Bates numbering, redacting privileged content, and delivering compliant load files to the requesting party.
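Bates numbering lends itself to a short example. The Python sketch below assigns each document a contiguous range of page-level Bates numbers. The prefix `ABC`, the seven-digit padding, and the function name are illustrative assumptions; actual formats are typically set by a production protocol or agreement between the parties.

```python
def bates_ranges(page_counts, prefix="ABC", start=1, width=7):
    """Yield (doc_id, begin_bates, end_bates) for each (doc_id, pages) pair."""
    next_number = start
    for doc_id, pages in page_counts:
        if pages < 1:
            raise ValueError(f"{doc_id}: page count must be at least 1")
        begin = f"{prefix}{next_number:0{width}d}"
        end = f"{prefix}{next_number + pages - 1:0{width}d}"
        yield doc_id, begin, end
        next_number += pages  # the next document starts on the following number


for doc_id, begin, end in bates_ranges([("DOC001", 3), ("DOC002", 1)]):
    print(doc_id, begin, end)
# DOC001 ABC0000001 ABC0000003
# DOC002 ABC0000004 ABC0000004
```

Because numbers are assigned sequentially with no gaps, the begin and end values for each document can be written straight into the production load file.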
A financial services firm receives a regulatory information request requiring production of all internal communications related to a trading desk over a 36-month period. The data processing team collects from email, Teams, and a legacy archiving system. After deduplication and date filtering, the review population is reduced by 60 percent before attorney review begins. Load files are generated to the regulator's specified format with required metadata fields.
A corporate compliance team conducts an internal investigation following an employee whistleblower complaint. Time is a factor. Using a configured data processing platform, the team collects from targeted custodians, applies keyword filters, and loads data for review within 48 hours. Processing logs are preserved as part of the investigation record.
A technology company faces class-action litigation. A significant portion of relevant ESI resides in Slack and Google Chat. The processing workflow must normalize these sources into a reviewable format while preserving thread structure, user attribution, and timestamps. The processing platform's handling of digital communications helps ensure that reviewers see conversations in context, not as fragmented individual messages.
Effective eDiscovery data processing requires the right combination of workflow design, technology, and governance discipline. Whether you are managing litigation response, regulatory inquiries, or internal investigations, the decisions made during processing determine the defensibility and efficiency of everything that follows.
If your organization is evaluating how to strengthen its processing workflows, connect with the Onna team to explore how Onna's platform supports end-to-end eDiscovery data processing, from data collections through production. You can also schedule a demo to see the platform in action.