Your internal ESI playbook: A critical checklist for managing cloud data in eDiscovery
Given the volume and diversity of data involved in today's litigation and investigations, particularly unstructured data, an internal playbook can help your team navigate the process of identifying responsive electronic information.
This internal playbook is designed to help you and your team work through a wide range of considerations before you begin to identify, preserve, and collect electronically stored information (ESI) from cloud applications. It will guide you in framing your strategy for identifying, preserving, collecting, and exporting data while meeting your legal and regulatory obligations. It can also serve as a springboard as you prepare to agree on an ESI protocol as part of the discovery planning required by Federal Rule of Civil Procedure 26(f) and applicable state and local rules.
The key is to remember that while your ESI playbook must be comprehensive, it must also be flexible, because no two cases are alike. Ask yourself the following questions as you manage discoverable data and update your ESI protocol to include cloud apps. These questions are extensive but not exhaustive, and you may find that they lead to further inquiries.
Additionally, developing your ESI playbook is not a one-time task. As technology evolves and new tools are added to your tech stack, reassess your guidelines regularly. This proactive approach helps you meet imminent deadlines and lets you make informed decisions in advance, thoughtfully and without the pressure of time constraints.
Step 1. Identification
- Which types of cloud applications (e.g., collaboration platforms, email services, repositories) are likely to be relevant to the matter?
- Who is involved in the matter? Consider the parties, any witnesses, and data custodians.
- How is each application used within the organization, and how does this usage impact the relevance of the data?
- Which data sources are likely to contain only irrelevant data that should be excluded?
- Do you have visibility into the data for early case assessment (ECA) and before review? Is there a method for federated searching or otherwise gaining that visibility?
- How can you estimate the volume of data relevant to the matter early in the process?
- How will you ensure that the scope of collection is proportional to the needs of the case?
- Do you have a data map to help you identify and understand the requisite sources of information considered for collection?
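To make the scoping questions above concrete, here is a minimal Python sketch, assuming you keep a simple inventory of data sources per custodian. The `DataSource` fields, the `scope_sources` helper, and the sample volumes are hypothetical illustrations, not features of any eDiscovery tool. The sketch keeps only in-scope custodians, drops sources already judged to hold only irrelevant data, and totals the estimated volume to inform early proportionality discussions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataSource:
    """One row in a hypothetical data-source inventory."""
    app: str                      # e.g., "Slack", "Google Drive"
    custodian: str                # person whose data the source holds
    est_gb: float                 # estimated volume, e.g., from admin reports
    likely_relevant: bool = True  # False if judged to hold only irrelevant data


def scope_sources(inventory, custodians):
    """Keep sources for in-scope custodians, drop excluded ones, and total their size."""
    selected = [s for s in inventory if s.custodian in custodians and s.likely_relevant]
    total_gb = sum(s.est_gb for s in selected)
    return selected, total_gb


inventory = [
    DataSource("Slack", "alice", 12.5),
    DataSource("Google Drive", "alice", 40.0),
    DataSource("Zoom", "bob", 3.0),                           # bob is not a custodian here
    DataSource("Jira", "carol", 1.2, likely_relevant=False),  # excluded as irrelevant
]
selected, total_gb = scope_sources(inventory, {"alice", "carol"})
for s in selected:
    print(s.app, s.custodian)
print(f"Estimated volume: {total_gb:.1f} GB")  # Estimated volume: 52.5 GB
```

In practice the relevance flag is a legal judgment recorded by your team; the code only applies it consistently.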
Step 2. Preservation
- What steps have you already taken to preserve the information? Are you aware of the existing retention settings for each cloud application?
- Have you considered external channels (e.g., channels shared with other organizations) and how both your retention settings and the external party's settings will affect collection?
- What other automated measures could you implement to preserve information?
- How will you ensure the integrity of both the data and its metadata during preservation?
- How will you address the preservation of information stored online or in the custody of third parties?
- What tools and methods are available for preserving data from these cloud applications? Should you use in-place preservation (IPP) or collect the data?
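One common way to demonstrate that preserved or collected data has not changed is to record a cryptographic hash of each file and re-verify it later. The sketch below uses Python's standard `hashlib` (SHA-256); the function names and the manifest layout are assumptions for illustration. Note that a content hash covers only the file's bytes, so file-system metadata such as timestamps must be captured and documented separately.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large exports need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(root: Path, manifest: dict) -> list:
    """Return paths that are now missing or whose contents changed."""
    problems = []
    for rel, expected in manifest.items():
        p = root / rel
        if not p.is_file() or sha256_of(p) != expected:
            problems.append(rel)
    return problems


# Usage: hash a sample export, then detect a later modification.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "export.json").write_text('{"msg": "hello"}')
    manifest = build_manifest(root)
    before = verify_manifest(root, manifest)  # [] means unchanged
    (root / "export.json").write_text('{"msg": "edited"}')
    after = verify_manifest(root, manifest)   # ['export.json'] means changed
print(before, after)
```

Storing the manifest separately from the data (and recording when and by whom it was generated) is what lets it support a chain-of-custody narrative.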
Step 3. Collection / Processing
- For each data source, which data collection method or tool offers the best balance of efficiency, thoroughness, and reliability?
- Are there limitations to data collection from specific sources? If so, how will you address these limitations?
- Where and how will you store collected data during the matter's pendency?
- What tools and techniques are necessary for collecting data from each relevant application? Are there any platform APIs that can assist when standard tools are insufficient?
- What types of data will be collected (e.g., emails, chats), and what will be excluded (e.g., deleted items, edited messages)?
- How will you manage parent-child relationships between documents and their attachments?
- How will you address modern (linked) attachments, that is, files shared as links to cloud storage rather than attached as copies?
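A minimal sketch, assuming a hypothetical `Item` model in which each collected item lists its attachments as children (embedded files, or linked files you have separately retrieved). It assigns sequential control numbers so every attachment sits directly after its parent and records each child's parent number, which is one common way to keep document families together. The numbering scheme (e.g., `DOC000001`) is illustrative only.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Item:
    """A collected item; children are its attachments (embedded or retrieved linked files)."""
    name: str
    children: List["Item"] = field(default_factory=list)


def assign_family_ids(parents, prefix: str = "DOC", start: int = 1):
    """Number items depth-first so each attachment follows its parent.

    Returns rows of (control_number, name, parent_control_number or None).
    """
    rows: List[Tuple[str, str, Optional[str]]] = []
    counter = start

    def visit(item: Item, parent_id: Optional[str]) -> None:
        nonlocal counter
        control = f"{prefix}{counter:06d}"
        counter += 1
        rows.append((control, item.name, parent_id))
        for child in item.children:  # nested attachments (e.g., files in a ZIP) recurse
            visit(child, control)

    for parent in parents:
        visit(parent, None)
    return rows


email = Item("RE: Q3 forecast.eml", [Item("forecast.xlsx"), Item("budget.docx (linked)")])
rows = assign_family_ids([email, Item("notes.txt")])
for row in rows:
    print(row)
```

Which version of a linked file to collect (as it existed when shared, or its current version) is a separate decision that this sketch does not address.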
Step 4. Handling metadata
Metadata is data that provides information about other data. In the context of ESI, metadata offers valuable details about the creation, modification, and usage of electronic files. Allowing metadata to be lost or modified can constitute spoliation, with its associated sanctions.
The metadata available differs from one cloud application to the next, as does the way each application stores data. Understanding these distinctions for each application is essential to determining what can be captured and how it will affect subsequent review.
As you consider the sources of potentially discoverable data within your organization, think about whether and how you should preserve and collect metadata. Below is a list of common metadata fields found in today's most widely used cloud applications:
- Slack application data, including:
  - Messages: Timestamps, sender and receiver IDs, message content, message type (e.g., a reply or file upload), reactions, edits, deletions, thread information, and mentions of other user IDs
  - Files: File name, file format, timestamps of the file’s creation, modification, and access, file versions, and file sharing details
  - Channels and conversations: Channel names, descriptions, and member user IDs
  - User activity logs: Login and logout timestamps, user status updates, and user profile information
  - Integration and app usage: Details about third-party integrations and apps, including installations
  - Workspace: Workspace name, user IDs of workspace members, admin and moderator user IDs, and workspace settings
  - Additional content: Slack canvases, huddles, audio and video clips, GIFs, emojis, and custom emojis
- Google Workspace application data, including:
  - Gmail: In addition to the email fields listed below, check for read/unread status, spam/junk status, emails marked as important or starred, and any labels assigned to the email
  - Google Drive: File names, file types, file timestamps, sharing details, and file versioning data
  - Google Chat and Meet: Message timestamps, sender and receiver IDs, message content, meeting date, time, duration, and participants
  - Google Calendar: Date, time, location, attendees, reminders, and notifications
  - Google Vault: Deleted files from Google Drive, deleted email messages, and discarded drafts from Gmail, as well as email or file origin
- Microsoft Teams application data, including:
  - Chat and channel messages: Message timestamps, sender and receiver IDs, message content, message type, reactions, GIFs, images, edits, deletions, and mentions of other user IDs
  - Meetings: Date, time, duration, participants, chat, recording names and file formats, meeting notes, and shared documents
  - Files and attachments: File names, file types, file timestamps (i.e., creation, modification, and access), file versions, and sharing details (who shared files and with whom)
  - User activity: Logins and logouts, user status, and device information
- Zoom application data, including:
  - Meeting ID, date, time, and duration
  - Participant names and other identifying information
  - Message timestamps, sender and receiver IDs, and content
  - Meeting audio and video recordings, including names and file formats
  - Individual, group, and channel chat data
  - Meeting transcripts
- Confluence application data, including:
  - Entire Confluence sites, specific spaces, and pages
  - HTML content, comments, and attachments on pages
  - Labels for pages and attachments, and their ancestors
  - Author, creation, last update, and previous version details
  - Ancestors of files, labels, space ID, space name, and space type
- Jira application data, including:
  - Issues and their descriptions
  - Issue name, creator, and updater
  - Current status, type, and progress of the issue
  - Priority, resolution, and due dates
  - User comments and summary details
  - Components involved, original estimates, and time spent
  - Votes, issue links, and any attachments associated with the issue
- Zendesk application data, including:
  - Ticket ID, name, and type
  - Organization and ticket status
  - Assignee, requester, and group
  - Ticket priority and list of tags
  - Tickets and ticket attachments
- Social media: User IDs, timestamps for posts and comments, geolocation data, and media information about any images and videos (e.g., resolution and format)
- Cloud storage: File names, file types, upload date, modification date, and access logs
- Databases: Table names, field names, data types, and indexed fields. Consider legacy systems that are archived but no longer used in daily work.
- Emails:
  - Sender and recipient information, including "to," "from," "cc," and "bcc" fields
  - Message identifiers such as subject, date, time, and message ID (the unique identifier for the message)
  - Routing and handling information, including X-Mailer (the email client or software used to send the message), "Received" headers (routing details), read/unread status, SMTP server information, delivery status notifications, return path (for bounce-back messages), and information about forwarded messages and replies
  - Details about any attachments
Note: You might also want to consider whether data migration has occurred at any point. Migrating data from one system to another can lead to different metadata being captured, and it's important to be prepared to explain any discrepancies that may arise.
- Documents (e.g., Word, PDF, spreadsheet, or TXT files):
  - Basic information, such as author and file name
  - Information about the document’s history, including when the file was created, modified, and accessed
  - Document properties, such as word count and time spent editing the file
  - Revision information, including any comments and tracked changes
  - Types of attachments, including links and embedded documents
  - For spreadsheets, identifiers such as file name, format (e.g., XLSX or CSV), path, size, and author, plus the spreadsheet’s history, including when it was created, modified, and accessed
  - For spreadsheets, worksheet-specific information, including the names of each sheet, the number of rows and columns in each, and the status (hidden or visible) of individual sheets
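For email, several of the fields listed above can be read directly from the message headers. This sketch uses Python's standard `email` package (`BytesParser` with `policy.default`) on a made-up sample message; the `email_metadata` function and its output keys are hypothetical. It normalizes the `Date` header to UTC, which helps when comparing timestamps across sources. Keep in mind that a recipient's copy normally omits `Bcc`, and that platform-level metadata such as Gmail labels or read status is generally held by the mail platform rather than in the original message headers, so it must be captured from the platform or its export.

```python
from datetime import timezone
from email import policy
from email.parser import BytesParser

# A made-up sample message for illustration.
RAW = b"""From: Alice <alice@example.com>
To: Bob <bob@example.com>
Cc: Carol <carol@example.com>
Subject: Q3 forecast
Date: Tue, 01 Oct 2024 09:30:00 -0400
Message-ID: <abc123@example.com>
X-Mailer: ExampleMail 1.0
Received: from mx.example.com by mail.example.net; Tue, 01 Oct 2024 13:30:05 +0000
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="XYZ"

--XYZ
Content-Type: text/plain

See attached.
--XYZ
Content-Type: text/csv
Content-Disposition: attachment; filename="forecast.csv"

region,amount
east,100
--XYZ--
"""


def email_metadata(raw: bytes) -> dict:
    """Extract a few common header fields plus attachment names (hypothetical helper)."""
    msg = BytesParser(policy=policy.default).parsebytes(raw)
    date_hdr = msg["Date"]
    # With policy.default, the Date header exposes a timezone-aware .datetime.
    sent_utc = date_hdr.datetime.astimezone(timezone.utc).isoformat() if date_hdr else None
    return {
        "from": str(msg.get("From", "")),
        "to": str(msg.get("To", "")),
        "cc": str(msg.get("Cc", "")),
        "subject": str(msg.get("Subject", "")),
        "sent_utc": sent_utc,
        "message_id": str(msg.get("Message-ID", "")),
        "x_mailer": str(msg.get("X-Mailer", "")),
        # Each relay prepends a Received header, so the first one is the most recent hop.
        "received": [str(h) for h in msg.get_all("Received", [])],
        "attachments": [
            part.get_filename()
            for part in msg.walk()
            if part.get_content_disposition() == "attachment"
        ],
    }


for key, value in email_metadata(RAW).items():
    print(f"{key}: {value}")
```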
Step 5. Exporting ESI
- For each type of ESI, which format should you use for export? Options may include native format, PDF, CSV, RSMF (Relativity Short Message Format), or other file types.
- If you decide to export the data in its native format, what additional steps must be taken to ensure the data is appropriately and effectively exported?
- For each data source, how will you preserve conversational context (e.g., exporting messages as individual chats versus in 24-hour windows)?
- Which metadata will you include in your export for each data source?
- What, if any, production requirements or deadlines has the court or regulatory authority established?
- Will you use a rolling production schedule?
- How will you manage ongoing productions as new data emerges?
- Can you transfer data directly from cloud to cloud, or do you need to allow additional time for downloads and uploads?
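Once you have decided which metadata to include for a data source, a delimited load file is one common way to deliver it alongside the documents. This is a minimal sketch using Python's standard `csv` module; the field list and record values are hypothetical, and the actual fields, delimiter, and encoding should follow whatever you agree on in your ESI protocol. `extrasaction="ignore"` drops fields you have not selected, and `restval=""` leaves missing values blank rather than raising an error.

```python
import csv
import io

# Hypothetical field selection for one data source; use the fields your ESI protocol specifies.
EXPORT_FIELDS = ["control_number", "custodian", "source_app", "sent_utc",
                 "from", "to", "subject", "parent_id"]

records = [
    {"control_number": "DOC000001", "custodian": "alice", "source_app": "Gmail",
     "sent_utc": "2024-10-01T13:30:00+00:00", "from": "alice@example.com",
     "to": "bob@example.com", "subject": "Q3 forecast",
     "body": "See attached."},  # not a selected field, so it is dropped
    {"control_number": "DOC000002", "custodian": "alice", "source_app": "Gmail",
     "parent_id": "DOC000001"},  # attachment row; missing fields are left blank
]


def write_load_file(rows, fields, out) -> None:
    """Write selected metadata fields as CSV; unselected keys are ignored, missing ones blank."""
    writer = csv.DictWriter(out, fieldnames=fields, extrasaction="ignore", restval="")
    writer.writeheader()
    for row in rows:
        writer.writerow(row)


buf = io.StringIO(newline="")
write_load_file(records, EXPORT_FIELDS, buf)
print(buf.getvalue())
```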
The importance of data mapping for successful eDiscovery
And there you have it — a critical checklist for managing cloud data in eDiscovery. We hope this playbook has given you insights into the different types of cloud applications emerging in eDiscovery today and the kinds of questions you should consider as you develop your eDiscovery strategy.
Remember, while an internal ESI playbook can outline your strategy for identifying, preserving, collecting, and exporting data, successful eDiscovery ultimately hinges on knowing what data you have and where that data is stored.
If you don't already have one, consider developing a data map. This map should track how data moves from its source through processes and systems until it reaches its destination (either for archival or deletion). Data mapping will not only help you minimize the collection and retention of unnecessary data but also ensure you are prepared to respond to any requests or challenges that may arise.
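A data map can be modeled as a directed graph of systems, where each edge records where data flows next. The minimal sketch below assumes a hand-built, hypothetical map (the system names and the 30-day deletion step are illustrative only). It traces every downstream system from a source and then lists the endpoints, so you can see whether that data ultimately comes to rest in an archive or is deleted.

```python
from collections import deque

# Hypothetical data map: each system lists where its data flows next.
# System names and the deletion step are illustrative only.
DATA_MAP = {
    "Slack": ["Slack export archive"],
    "Slack export archive": ["Cold storage (archive)"],
    "Zoom cloud recordings": ["Google Drive", "Auto-delete after 30 days"],
    "Google Drive": ["Google Vault (retention rule)"],
}


def trace(source, data_map):
    """Return every system downstream of source, in breadth-first order."""
    seen, queue, order = {source}, deque([source]), []
    while queue:
        node = queue.popleft()
        for nxt in data_map.get(node, []):
            if nxt not in seen:  # guard against cycles in the map
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


def endpoints(source, data_map):
    """Downstream systems with no outgoing flow: where the data is archived or deleted."""
    return [n for n in trace(source, data_map) if not data_map.get(n)]


print(trace("Zoom cloud recordings", DATA_MAP))
print(endpoints("Zoom cloud recordings", DATA_MAP))
```

An endpoint that deletes data is exactly where a legal hold or preservation step may need to intervene before collection.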
We believe that combining a dynamic, regularly updated ESI playbook with a detailed data map is key to navigating the challenges of modern eDiscovery, ultimately leading to more successful outcomes in litigation and investigations.