The complete guide to modern eDiscovery

The evolution of eDiscovery over the years has been nothing short of transformative. From reams of paper stuffed into bankers' boxes, we quickly moved to data housed on hard drives, and then to internal servers. Fast forward less than twenty years, and more than half of all corporate data now lives in the cloud, dispersed across a multitude of collaboration, communication, and content applications (think: instant communication tools like Slack, document management platforms like Google Docs, and issue tracking and project management tools like Jira).

The role of eDiscovery has also broadened significantly in this time. It’s no longer merely a supporting actor in litigation but has become an indispensable player in regulatory activities, records management, data privacy, and information governance.

However, one challenge remains consistent: the overwhelming volume and diversity of data involved in eDiscovery. Even before the global surge in online activity triggered by the pandemic, organizations and legal professionals were wrestling with massive amounts of enterprise data spread across disparate locations — whether it was in file cabinets, traditional email servers, or on Jane from HR's desk.

Complicating the issue further, over 90% of all newly generated enterprise data is unstructured, meaning it lacks any predefined format or model, and the volume of unstructured data is growing by a staggering 55 to 65 percent annually.

Adding to the complexity, court decisions and rules surrounding this data create new precedents, prompting organizations to rethink their approach to discovery altogether and placing the responsibility on legal professionals to tackle the search, collection, analysis, review, and production of such data, all within the confines of tight budgets and limited headcount. A walk in the park, isn't it?

This guide aims not to teach you the basics of eDiscovery but to empower legal professionals like yourself to effectively navigate this ever-changing practice. The goal is to help you reduce costs, gain efficiency, and mitigate risks by emphasizing two fundamental principles of legal operations: optimizing processes and making smart investments in technology.

So, let’s begin.

Disrupting eDiscovery: Collaboration tools create new challenges

Traditionally, the process of discovery followed a simple, if time-consuming and laborious, trajectory. To inspect an employee's files, you would directly comb through their file cabinet or hard drive. When it was time to gather that data, identifying the custodians and the document locations was straightforward.

However, everything changed in 2006 when the United States Supreme Court amended the Federal Rules of Civil Procedure (FRCP) to include electronic records. At the time, discoverable electronically stored information (ESI) primarily consisted of emails and electronic documents that were not too different from their paper counterparts. Little did we know the extent to which these electronic sources would grow and develop, not to mention how fast.

Although cloud computing was already on the rise before 2020, the onset of the coronavirus pandemic kicked its growth into overdrive. Organizations rapidly adopted cloud-based apps such as Slack, Zoom, and Microsoft Teams to maintain productivity in an increasingly remote work environment — a trend that continued even after the pandemic.

Today, the collaboration data landscape bears little resemblance to that of a few years ago. Large companies with over 2,000 employees now depend on an average of 211 apps, spanning everything from collaboration, communication, and content to cloud platforms, developer tools, and security tools. For context, this figure stood at 163 apps just four years ago.

While a diverse suite of best-of-breed tools promotes agile communication and collaboration, it also creates data silos. One team may use Zoom for video conferencing, while another prefers Google Meet. Some departments might rely on Google Drive for file sharing, others on Dropbox. Slack may be the messaging tool of choice for one team, while another opts for Teams. The list goes on.

Looking through this lens, it's not just that there's a vast amount of data spread across multiple repositories. This data is also being generated, shared, and modified at a staggering rate. A recent study by IDC projects that the Global DataSphere, which gauges the volume of new data created, captured, replicated, and consumed each year, will more than double between 2022 and 2026. This rapid growth intensifies the pressure on enterprise organizations to manage and safeguard data while finding ways to activate it for business purposes.

The added complexity of generative AI

We cannot ignore the impact of generative artificial intelligence (gen AI) on this matter. While AI has been a part of eDiscovery in the form of natural language processing and sentiment analysis for over a decade, the wider adoption of generative AI technologies, such as ChatGPT, is set to further expand the enterprise data pool.

Gen AI models can produce text, images, audio, and even video content, mimicking human-level creativity and realism. However, the issue of data ownership becomes more complex with gen AI. Traditional ownership hinges on the notion that a creator or entity creates something and then holds certain rights to it. But who owns data generated by an AI model? Is it the company that owns the AI software? The person or team using the software to generate data? Or possibly the entity that trained the AI model, given that its outputs result directly from its training?

While there are many questions still to be answered around the use of AI and the impact it will have on eDiscovery, all of this is just one piece of a larger puzzle. These technologies not only produce vast amounts of new data but also introduce a wide variety of data types, the large majority of which are unstructured.

The unstructured data challenge

Unlike structured data, which is highly organized and easily processed by machines, unstructured data (often dubbed ‘user-generated data’) is difficult to process and analyze with conventional data tools and methods.

This type of information is unique, in that it doesn’t conform to a predefined data model or organization. This lack of structure adds complexity to its processing and analysis, setting it apart from structured data.

Unstructured data sprawls across different types, such as:

1. Collaborative data

This category encompasses data produced by online collaboration platforms such as Slack or Microsoft Teams, productivity suites like Google Workspace, and project management and ticketing tools like Jira or Zendesk. These applications facilitate coordination, resource sharing, progress tracking, and instant feedback, typically through features like task management, file sharing, screen sharing, and version control.

2. Communication-driven data

This type stems from publicly accessible platforms such as Twitter, Facebook, and Instagram, email services like Gmail and Outlook, and communication tools like Slack, Teams, and Zoom. These platforms offer numerous channels for exchanging information, including images, emojis, GIFs, audio, video, and more.

3. Content-focused data

Data from applications centered on content creation, editing, organization, and sharing falls under this category. Examples include document editing tools like Google Docs, file storage and sharing services like Box and Dropbox, and wikis such as Confluence and Notion.

As mentioned earlier, unstructured data is rapidly becoming more prevalent: it accounts for over 90% of newly generated enterprise data, and its volume is expanding at an annual rate of 55 to 65 percent. But what does this mean for eDiscovery?

Given that large enterprises use roughly 211 apps on average, a substantial volume of unstructured data emerges that requires effective management. From Slack threads brimming with custom emojis and GIFs to Zoom meetings enriched with breakout rooms and digital whiteboards, wikis in Confluence, and even chat logs on ChatGPT, this data is difficult to search and analyze. Manually assessing it is strenuous and time-consuming, and often impractical without the right strategy.

While all of this data doesn't change the actual process of eDiscovery, it does change the approach. Why? Because in a world increasingly dominated by digital information and collaboration technologies, discovery can challenge almost any organization's business processes.

What is modern eDiscovery?

While "modern eDiscovery" might suggest something new, it's worth noting that this concept has been maturing alongside the pulse of technology for years now. Just as changing aesthetics redefine "modern home decor" (we’re looking at you, shag rugs), "modern eDiscovery" has undergone its own set of changes, consistently keeping pace with tech advancements and the relentless growth of data.

You can think of modern eDiscovery as an enhancement of conventional eDiscovery processes, achieved by incorporating advanced technologies like artificial intelligence (AI), machine learning (ML), and natural language processing (NLP). Its goal is to refine the discovery process, improve accuracy, and minimize expenses, as these technologies become critical in handling the rapidly expanding volume of electronic data.

However, to succeed as a litigator in today's climate, it's essential to also understand the data sources where evidence resides, including the various data types and metadata within each source.

But before we delve into the application of modern eDiscovery, let's examine how these new data types influence court proceedings and gain a deeper understanding of the current state of the courts.

The verdict on modern eDiscovery: 5 insights from the courts

Courts are increasingly encountering cases where incomplete data sets result from improper data collection or processing, highlighting the fact that a review is only as good as the data set behind it. In other words, "garbage in, garbage out." Here are a few of the most notable cases:

Red Wolf Energy Trading, LLC v. BIA Capital Mgmt.

Courts have increasingly called out and penalized attorneys for failing to properly preserve and produce ESI during eDiscovery. For example, the court in Red Wolf Energy Trading, LLC v. BIA Capital Management, LLC issued case-terminating sanctions against the defendant for delaying and failing to produce key documents, including numerous Slack messages containing “smoking gun” communications.

Takeaway: This case emphasizes the importance of data governance in litigation. The defendant had entrusted the storage of its data to a third party. However, its lack of control and governance over this data resulted in significant delays and failures in producing critical documents during the discovery phase. The court's reaction underscores that responsibility for data within a party's possession, custody, or control cannot be delegated, even if a third party stores it. A more stringent data governance process, including monitoring of data storage and regular checks that the data is easily retrievable, would have mitigated this issue.

DR Distribs., LLC v. 21 Century Smoking, Inc.

The court in DR Distributors, LLC v. 21 Century Smoking, Inc. sanctioned the defendants and their former defense counsel $2.5 million for a series of missteps, including failing to arrange for the collection of relevant and responsive web email, neglecting to ensure that auto-delete was suspended on email accounts, and retaining an eDiscovery vendor that had previously been accused of incompetence.

Takeaway: This case highlights the importance of early planning and due diligence in eDiscovery. The defendants failed to collect relevant information, allowed automatic deletion of potentially important emails, and hired a questionable eDiscovery vendor. A better understanding of their ESI, information governance, record retention, and legal hold processes from the outset would have avoided these missteps, as would audits and stringent checks on vendor competence, data collection, and auto-delete policies.

Nichols v. Noom Inc.

In Nichols v. Noom, the judge denied the plaintiffs’ motion for reconsideration over the production of hyperlinked documents, citing proportionality and Rule 1 concerns: “To start, the Court does not agree that a hyperlinked document is an attachment. When a person creates a document or email with attachments, the person is providing the attachment as a necessary part of the communication. When a person creates a document or email with a hyperlink, the hyperlinked document/information may or may not be necessary to the communication.”

Takeaway: This case is a lesson on the importance of considering all aspects of ESI in a discovery plan. The plaintiffs failed to get the court's approval on the production of hyperlinked documents, which they deemed necessary. A comprehensive ESI protocol that considers all potentially relevant data — including hyperlinks — and anticipates possible objections or roadblocks would have yielded better results.

Drips Holdings, LLC v. Teledrip LLC

In Drips Holdings, LLC v. Teledrip LLC, after the legal hold was issued, the defendants altered Teledrip’s Slack retention settings from indefinite to seven days, which led to the deletion of all relevant communications on the platform. In response, the plaintiff filed a motion on the spoliation of evidence, and the court issued a mandatory adverse jury instruction.

Takeaway: This case underscores the importance of consistent and credible data retention policies, particularly in light of potential litigation. The defendants altered their retention settings after the legal hold was issued, which led to the deletion of relevant communications. Establishing and adhering to robust retention policies prior to litigation, and refraining from altering these policies in a way that appears evasive, would have prevented this issue.

In re Keurig Green Mountain Single-Serve Coffee Antitrust Litig.

The federal magistrate judge presiding over In re Keurig Green Mountain Single-Serve Coffee Antitrust Litigation held that Keurig did not preserve relevant ESI on over two dozen laptop computers, nine of which were lost outright. However, the plaintiffs failed to establish that the failure to preserve data was intentional, so Keurig avoided the most severe sanctions.

Takeaway: In this case, the lack of effective information governance led to a failure to preserve relevant data on several laptops. While Keurig was spared the most severe sanctions, it still had to bear substantial costs. The lesson here is that maintaining an effective information governance program can not only preserve crucial evidence but also avoid costly sanctions. Regular audits of data preservation and storage policies, tracking of equipment, and more stringent control over data can mitigate such issues.

These cases underline the importance of effective eDiscovery and information governance in legal proceedings. Missteps in data preservation — such as insufficient data governance, inadequate early planning, failure to account for all forms of ESI, alterations to data retention policies, and mishandling of pertinent data — can lead to severe penalties, sanctions, and unfavorable judgments.

The consistent lesson is clear: when it comes to eDiscovery, proper preparation, foresight, and strict adherence to established policies and procedures are paramount.

“It is no longer amateur hour. It is way too late in the day for lawyers to expect to catch a break on e-discovery compliance because it is technically complex and resource-demanding.”

- Donald R. Lundberg

Shifting the pendulum: Modernizing your approach to eDiscovery

The truth is that eDiscovery challenges are merely the tip of the iceberg in the larger problem of unstructured data. It may not come as a surprise that only 23.9% of companies consider themselves data-driven, as revealed by the 2023 Data and Analytics Leadership Executive Survey conducted by NewVantage Partners. Even fewer, just 20.6%, believe they have successfully nurtured a data-centric culture within their organization.

As apps continue to proliferate, new data types emerge, and information becomes more isolated, this problem will grow even more complex. While it's natural to focus on individual data collections and investigations, it's important to take a step back and address the underlying cause. How well do your people, processes, and technology measure up?

If you're ready to take a proactive approach to data management or strengthen your current method, consider these four fundamental strategies:

1. Deeply understand your data environments

Modernizing your eDiscovery approach requires a comprehensive understanding of where your data originates and how it is stored. Achieving this involves a three-step process:

Step 1: Create a data map through cross-functional collaboration

To lay the foundation for successful eDiscovery, it is essential to know the exact locations of your data. This includes not only traditional sources like company servers but also emerging platforms such as cloud storage, social media, mobile devices, IoT devices, and third-party applications.

By developing a data map, you can gain a detailed understanding of where your data is located, its characteristics, and how it is used. This knowledge allows you to be better prepared for future litigation and ensures that no potential evidence is overlooked during the discovery process.

Your data map doesn't have to be anything fancy — a simple list or spreadsheet is a great place to start. Begin by classifying your data based on relevance and sensitivity, and then document how it moves within and outside your organization.
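As one illustration, a data map can start life as a few structured rows exported to CSV. The sketch below is a minimal Python example under stated assumptions: the `DataSource` fields, the relevance and sensitivity labels, and the sample systems are hypothetical placeholders rather than a standard schema.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class DataSource:
    # One row of a hypothetical data map; the column names are illustrative only.
    system: str       # application or repository, e.g. "Slack"
    data_type: str    # what the system holds
    owner: str        # accountable department or person
    relevance: str    # likely litigation relevance: "high", "medium", or "low"
    sensitivity: str  # e.g. "internal" or "confidential"
    flows_to: str     # where the data moves, inside or outside the organization


RELEVANCE_RANK = {"high": 0, "medium": 1, "low": 2}


def data_map_to_csv(sources: list[DataSource]) -> str:
    """Render the data map as CSV text that any spreadsheet tool can open."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(DataSource)])
    writer.writeheader()
    for source in sources:
        writer.writerow(asdict(source))
    return buffer.getvalue()


data_map = [
    DataSource("Jira", "tickets, comments", "Engineering", "medium", "internal", "Slack notifications"),
    DataSource("Slack", "messages, shared files", "IT", "high", "confidential", "Google Drive"),
    DataSource("Dropbox", "documents", "Sales", "medium", "confidential", "external customers"),
]

# List the most litigation-relevant sources first (sorted() keeps ties in their original order).
prioritized = sorted(data_map, key=lambda s: RELEVANCE_RANK[s.relevance])
print(data_map_to_csv(prioritized))
```

Starting from a flat file keeps the barrier low; the same rows can later be moved into a dedicated data-mapping or governance tool without changing what is recorded.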

Tip: Regularly review and update your data map to keep up with changes and additions in your business platforms.

Step 2: Understand data formats and types

Data comes in different forms: structured, unstructured, and semi-structured, and each type may require a different approach during eDiscovery. From emails and documents to threaded conversations and social media posts, understanding the nuances of each data type is pivotal to ensure precise and efficient processing, review, and analysis.
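Threaded conversations are a good example of semi-structured data: each message is a record, but its meaning depends on the thread it belongs to. The Python sketch below groups replies under the message that started the thread so a reviewer sees them in context. The export format is a simplified, hypothetical one in which threads are one level deep and each reply points directly to its root message; real platform exports differ and should be checked against the vendor's documentation.

```python
from collections import defaultdict

# Hypothetical, simplified chat export. A reply carries "thread_parent",
# the id of the message that started its thread.
messages = [
    {"id": "m1", "user": "alice", "ts": 1700000000, "text": "Are we shipping Friday?"},
    {"id": "m2", "user": "bob", "ts": 1700000300, "text": "Not unless QA signs off.", "thread_parent": "m1"},
    {"id": "m3", "user": "carol", "ts": 1700000100, "text": "Lunch?"},
    {"id": "m4", "user": "alice", "ts": 1700000600, "text": "Then let's hold it.", "thread_parent": "m1"},
]


def group_threads(msgs: list[dict]) -> dict[str, list[dict]]:
    """Map each thread's root id to its messages in chronological order."""
    threads = defaultdict(list)
    for msg in msgs:
        root = msg.get("thread_parent", msg["id"])  # top-level messages are their own root
        threads[root].append(msg)
    return {root: sorted(items, key=lambda m: m["ts"]) for root, items in threads.items()}


def to_review_rows(msgs: list[dict]) -> list[dict]:
    """Flatten threads into review rows that keep the thread id and reply position."""
    rows = []
    for root, items in group_threads(msgs).items():
        for position, msg in enumerate(items):
            rows.append({"thread": root, "position": position, "user": msg["user"], "text": msg["text"]})
    return rows


for row in to_review_rows(messages):
    print(row)
```

Reviewing "Then let's hold it." on its own would be ambiguous; keeping the thread id and position preserves the context that makes it meaningful.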

Step 3: Decode metadata

Metadata, often referred to as "data about data," is an essential yet often overlooked aspect of eDiscovery. Metadata provides valuable contextual information, such as the file's creation date, the creator's identity, and the last access or modification time. A deep understanding of the metadata associated with your data sources enhances the eDiscovery process and strengthens your case strategy.
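To make this concrete, the Python sketch below collects the file-system metadata that is portably available for a local file, plus a SHA-256 hash that can later help show the file was not altered. It is a minimal illustration, not a forensic collection tool, and `file_metadata` is a hypothetical helper name. As the comments note, attributes such as a document's author or original creation date generally live inside the file or the source application rather than in portable file-system metadata.

```python
import datetime as dt
import hashlib
import os
import tempfile
from pathlib import Path


def _utc_iso(timestamp: float) -> str:
    """Convert a POSIX timestamp to an ISO 8601 string in UTC."""
    return dt.datetime.fromtimestamp(timestamp, tz=dt.timezone.utc).isoformat()


def file_metadata(path: str) -> dict:
    """Return basic file-system metadata and a SHA-256 hash for one file.

    stat() is called before the file is read, because reading can update
    the last-access time on some file systems. Author and creation date
    are not portably exposed by the file system and must be collected from
    the document itself or from the source application.
    """
    p = Path(path)
    st = p.stat()
    digest = hashlib.sha256()
    with p.open("rb") as fh:
        # Hash in chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return {
        "name": p.name,
        "size_bytes": st.st_size,
        "modified_utc": _utc_iso(st.st_mtime),
        "accessed_utc": _utc_iso(st.st_atime),
        "sha256": digest.hexdigest(),
    }


# Usage: write a small temporary file, capture its metadata, then clean up.
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(b"hello")
meta = file_metadata(tmp.name)
os.remove(tmp.name)
print(meta)
```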

2. Foster collaboration across departments

Effective eDiscovery cannot happen in a vacuum; it requires joint efforts between legal, IT, compliance, and business departments. The foundation of eDiscovery, or the left side of the Electronic Discovery Reference Model (EDRM), really comes down to one thing: being able to find and access the information you need quickly and accurately. In other words, it depends on a strong information governance (IG) program.

However, to build this foundation, it's essential to involve more than just legal and IT. One approach to aligning stakeholders from different departments is to form a team where everyone has a seat at the table. This helps create well-rounded information governance policies and prevents inconsistencies among departments down the road.

3. Implement and maintain proper information governance

Information governance (IG) is a comprehensive approach to managing an organization's data throughout its lifecycle. With conversations taking place across multiple platforms, such as emails, Slack messages, and project management tools, a holistic view of technologies is necessary to understand the context. However, without fundamental IG principles, investments in data and privacy protection technologies may fall short.

The approach to information governance can vary depending on whether an industry is regulated or non-regulated, considering the nature of the data and compliance requirements. In non-regulated industries in particular, attitudes toward data management tend to fall somewhere along a spectrum between two extremes: proactive and reactive governance.

Figure: Proactive vs. reactive governance

If you want to lean toward proactive governance, creating an IG policy is an excellent place to start.

What is an IG policy?

An IG policy outlines how an organization manages and secures data to ensure regulatory compliance and mitigate risks. It should specify retention schedules, access controls, data destruction policies, and more.
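A retention schedule and a legal hold interact in a simple but critical way: the hold always wins. The Python sketch below encodes that rule. The categories, retention periods, and custodian-based hold are hypothetical placeholders; real holds are usually scoped by matter, custodian, data source, and date range, and retention periods should come from counsel and applicable regulations.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical retention schedule: record category -> how long to keep it.
# The periods are placeholders, not recommendations.
RETENTION_SCHEDULE = {
    "chat": timedelta(days=365),
    "email": timedelta(days=3 * 365),
    "contract": timedelta(days=7 * 365),
}


@dataclass
class Record:
    record_id: str
    category: str
    custodian: str
    created: date


def eligible_for_deletion(record: Record, today: date, custodians_on_hold: set[str]) -> bool:
    """Allow destruction only when retention has expired and no legal hold applies."""
    if record.custodian in custodians_on_hold:
        return False  # a legal hold overrides the retention schedule
    period = RETENTION_SCHEDULE.get(record.category)
    if period is None:
        return False  # unclassified records are kept until someone classifies them
    return today >= record.created + period


holds = {"jdoe"}
today = date(2024, 1, 1)
records = [
    Record("r1", "chat", "asmith", date(2020, 1, 1)),   # expired, no hold
    Record("r2", "chat", "jdoe", date(2020, 1, 1)),     # expired, but on hold
    Record("r3", "email", "asmith", date(2023, 6, 1)),  # still within retention
]
for r in records:
    print(r.record_id, eligible_for_deletion(r, today, holds))
```

Defaulting to "keep" for held or unclassified records is a deliberate design choice: wrongly keeping a record is usually far cheaper than wrongly destroying one.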

These policies must be regularly reviewed and updated to keep up with changes in regulations and technology. Routine audits help ensure policy adherence and maintain the high standard of information governance required for effective eDiscovery.
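Part of a routine audit can be automated by comparing each application's actual retention settings against the policy and any active legal holds, which is the kind of mismatch behind the auto-delete and retention problems in the DR Distributors and Drips cases. The Python sketch below assumes the settings have already been exported into a dictionary; the application names, day counts, and the `audit_retention` helper are hypothetical, and `None` stands for "keep indefinitely."

```python
# Hypothetical snapshot of retention settings exported from each app's admin
# console, in days; None means "keep indefinitely".
app_settings = {
    "Slack": 7,
    "Google Drive": None,
    "Jira": 730,
    "Zoom": 30,
}
policy_minimum_days = {"Slack": 365, "Google Drive": 1095, "Jira": 365, "Zoom": 90}
apps_under_hold = {"Slack"}


def audit_retention(settings: dict, minimums: dict, held_apps: set) -> list[str]:
    """Return human-readable findings for settings that violate policy or a hold."""
    findings = []
    for app, days in settings.items():
        if days is None:
            continue  # indefinite retention cannot fall short of a minimum or a hold
        if app in held_apps:
            # Any automatic deletion on a system under hold is a preservation risk.
            findings.append(f"{app}: auto-deletes after {days} days while under legal hold")
        elif app in minimums and days < minimums[app]:
            findings.append(f"{app}: keeps data {days} days, policy requires {minimums[app]}")
    return findings


for finding in audit_retention(app_settings, policy_minimum_days, apps_under_hold):
    print(finding)
```

Running a check like this on a schedule, and recording its output, also leaves a paper trail showing that retention settings were monitored.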

Once the "what" (the rules to be followed) is established, it's time to determine the "how" (the system that implements and manages those rules), also known as an information governance framework.

What is an IG framework?

An IG framework refers to an organization's comprehensive approach to managing and protecting its information. It encompasses not only policies but also procedures, roles, metrics, and tools used for information management.

Fortunately, the Information Governance Reference Model (IGRM), an extension of the EDRM, provides a valuable roadmap for conceptualizing and designing an IG framework while emphasizing interdepartmental collaboration.

4. Invest in advanced technology

In the ever-evolving eDiscovery landscape, investing in advanced technology is no longer a luxury but a necessity. AI and machine learning-powered eDiscovery tools can accelerate data analysis, identify patterns and relationships, and filter out irrelevant data, significantly reducing the time and cost involved in the eDiscovery process.

Cloud-based eDiscovery solutions offer scalability, agility, and improved cost management. However, investing in technology is an ongoing commitment to continuous learning, adaptation, and upgrades as the technology landscape evolves.

Consider the following key factors when making technology investments:

In sum: modernizing your eDiscovery approach is crucial for enhancing efficiency, accuracy, and cost-effectiveness within your organization. This process revolves around gaining a thorough understanding of your data landscape, fostering interdepartmental collaboration, enforcing robust information governance, and investing in innovative technology.

Remember that the journey doesn't end with implementation; it requires regular auditing, adaptation, and upgrades to keep pace with the evolving data landscape, regulatory changes, and technological advancements. By staying proactive, your organization will not only be prepared for future litigation but also enjoy optimized business operations, reduced risk, and improved compliance.

About Onna

Connect. Find. Act.

When faced with litigation, internal investigations, audits, and more, the task of producing data quickly becomes arduous when it’s scattered and siloed across your cloud apps.

Onna is a data management platform that aids corporate legal and IT teams in extracting valuable insights from their unstructured data. By employing AI and machine learning, Onna automatically classifies and categorizes information, simplifying data organization and eliminating unnecessary duplication.

Enterprise companies utilize Onna to streamline eDiscovery processes by promptly identifying, collecting, preserving, and searching data from popular enterprise applications like Slack, Google Workspace, O365, Confluence, and Jira, all within a unified interface.

Sound like the solution you’re looking for? See it in action.
