Cutting review costs: How to overcome collection challenges for Google Workspace data

Madelyn Paule

Senior Growth Marketing Manager

March 22, 2023

In today's fast-paced business environment, companies rely on a wide variety of applications to optimize their daily operations. Google Workspace has gained notable traction, ranking as the third fastest-growing app in 2022 alongside Microsoft 365 and AWS. The solution offers excellent version control, expansive data storage capacity, and seamless real-time collaboration on shared documents.

While Google Workspace provides exceptional collaboration capabilities, it also introduces unique challenges in terms of data collection, preservation, and legal compliance. Given that data generated within Workspace may be subject to discovery, Google developed Google Vault as an auxiliary service. This solution equips organizations with the tools to manage and retain data in Workspace and other Google Cloud products. However, extracting Google Workspace data from Google Vault can present various obstacles.

Let’s take a closer look at six common Google eDiscovery challenges and uncover effective strategies to overcome them.

Challenge #1: Data quantity

As Google Workspace continues to grow in popularity among businesses, an increasing amount of potentially relevant data is being generated and stored within the platform. With over 3 billion users and 8 million paying customers as of 2023, legal and IT teams face a significant challenge in managing these large, multifaceted datasets. While Google Vault provides data retention and archiving capabilities, it does not allow for modifying or deleting data, potentially creating roadblocks for teams that need to redact or remove certain information from the archive.

Solution

Look for advanced data management solutions that integrate seamlessly with Google Workspace and offer granular control over data retention, modification, and deletion while ensuring compliance with regulatory requirements and data protection standards.

By proactively selecting what Google Workspace data to collect and preserve, businesses can avoid costly and reactive collections in the future. This approach not only minimizes processing and review costs but also ensures quick and easy access to data when you need it.

Challenge #2: Early case assessment

To streamline the review process and minimize the volume of irrelevant data that needs to be reviewed, it is crucial to identify and collect potentially relevant data at the outset. However, when using Google Vault for early case assessment (ECA), users are limited to collecting data from only one source at a time. As a result, your team may end up duplicating efforts to collect data from various sources, which can delay the assessment of risks.

Solution

Look for a comprehensive ECA solution that offers advanced features such as tagging, custom fields, and filtering. These features can significantly enhance your team's ability to quickly collect, search, and review data from specific users and files based on date range and type. Consider selecting a tool with robust connectors to key platforms like Google Workspace, Microsoft 365, Slack, Zoom, and Zendesk. This further enables your team to access and review relevant data from multiple sources simultaneously, reducing costs and saving time.

Challenge #3: Preservation

When an organization receives a legal hold, it faces several challenges that can put it at risk of data spoliation and non-compliance. One of the biggest is finding and using different versions of user-generated documents during discovery. Because Google allows users to edit and delete content in real time, preserving data becomes tricky.

Google Drive stores every version of each document, but on export Google Workspace overwrites the document's original creation date with the date of export. Without this metadata, it's nearly impossible to ensure that you have the correct version of a document, edited by the right person on the right date. This makes it crucial to find a way to export information out of Google Workspace into a review-ready format without losing or altering any metadata.

Solution

To solve this challenge, you need a tool with the ability to preserve data for legal holds at the custodian level across all Google services and other applications. This is a must-have feature to look for in a Google eDiscovery solution, as it allows users to easily navigate Google Drive and select only the files and folders relevant to the case to avoid overcollection. By doing so, the amount of data collected for a matter can be dramatically reduced, which not only decreases the cost and burden of eDiscovery but also ensures compliance with legal requirements.

Another useful preservation feature is ID mapping, which enables you to identify the same individual across multiple Google instances or other data sources, even when their name and credentials differ. This helps ensure that you’re collecting all relevant data for that individual while maintaining the integrity of the metadata.
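To make the idea of ID mapping concrete, here is a minimal sketch in Python. It is an illustration only, not how any particular product implements it: the identity map, the account names, and the `group_by_custodian` helper are all hypothetical. It shows one way accounts from different sources could be resolved to a single custodian before collection.

```python
from collections import defaultdict

# Hypothetical identity map (illustrative data only): each
# (source, account) pair points to one canonical custodian.
IDENTITY_MAP = {
    ("google_workspace", "j.doe@example.com"): "jane-doe",
    ("google_workspace", "jane@subsidiary.example"): "jane-doe",
    ("slack", "U024JD"): "jane-doe",
    ("google_workspace", "sam@example.com"): "sam-lee",
}


def group_by_custodian(accounts, identity_map):
    """Group (source, account) pairs by canonical custodian.

    Accounts with no entry in the map are returned separately so
    they can be reviewed by hand instead of being silently dropped.
    """
    grouped = defaultdict(list)
    unmapped = []
    for key in accounts:
        custodian = identity_map.get(key)
        if custodian is None:
            unmapped.append(key)
        else:
            grouped[custodian].append(key)
    return dict(grouped), unmapped


accounts = [
    ("google_workspace", "j.doe@example.com"),
    ("slack", "U024JD"),
    ("google_workspace", "sam@example.com"),
    ("zoom", "unknown-user"),
]
grouped, unmapped = group_by_custodian(accounts, IDENTITY_MAP)
```

In this sketch, both of Jane's accounts end up under one custodian key, and the unrecognized Zoom account is flagged rather than ignored, which is the behavior you would want when scoping a legal hold.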

Challenge #4: Search

Google Vault's basic search interface may not be adequate for complex eDiscovery. Its inability to perform federated searches across all data sets, including those within Google, restricts its utility. Although administrators can narrow search results with specific criteria, Vault lacks the advanced search capabilities commonly found in dedicated eDiscovery tools. Consequently, relying solely on Google Vault can make it difficult to precisely locate pertinent data.

Solution

Choose an eDiscovery solution that provides centralized control of your data sources. The right solution will offer granular search capabilities with additional filtering using a rich set of metadata. This enables you to process and review all your data in one place, with a single workflow, eliminating the need for multiple searches across different platforms.

Challenge #5: Metadata

Google Vault detaches metadata from its underlying files and exports the metadata via XML files. The documents are then labeled with both the file name and the internal Google reference number. However, users must reunite these separate files before a review platform can interpret them, leading to a significant increase in the time and effort needed to prepare data for review.
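To give a sense of what reuniting the files involves, here is a rough Python sketch. The XML layout, the element names, and the `<title>_<id>.<ext>` file-naming pattern are assumptions made up for this illustration; they are not Vault's actual export schema, so treat this as a sketch of the matching step rather than a working Vault parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata layout for illustration only;
# this is NOT Vault's actual export schema.
SAMPLE_XML = """
<Documents>
  <Document id="abc123">
    <Title>Q1 forecast</Title>
    <Author>dana@example.com</Author>
  </Document>
  <Document id="def456">
    <Title>Vendor contract</Title>
    <Author>lee@example.com</Author>
  </Document>
</Documents>
"""

# Exported file names, assumed to end with "_<reference id>"
# just before the extension.
EXPORTED_FILES = [
    "Q1 forecast_abc123.xlsx",
    "Vendor contract_def456.docx",
    "Orphan_zzz999.pdf",
]


def load_metadata(xml_text):
    """Map each document's reference id to a dict of its metadata fields."""
    root = ET.fromstring(xml_text)
    return {
        doc.get("id"): {child.tag: child.text for child in doc}
        for doc in root.iter("Document")
    }


def reference_id(file_name):
    """Extract the reference id from a name shaped like '<title>_<id>.<ext>'."""
    stem = file_name.rsplit(".", 1)[0]
    return stem.rsplit("_", 1)[-1]


def reunite(file_names, metadata):
    """Pair each file with its metadata; collect files that have none."""
    matched, unmatched = {}, []
    for name in file_names:
        fields = metadata.get(reference_id(name))
        if fields is None:
            unmatched.append(name)
        else:
            matched[name] = fields
    return matched, unmatched


matched, unmatched = reunite(EXPORTED_FILES, load_metadata(SAMPLE_XML))
```

Even in this toy version, any file whose reference number has no metadata entry has to be tracked down separately, which hints at how the effort grows across thousands of exported documents.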

Additionally, Vault excludes essential metadata when exporting, including full file path description, file version information, file sharing and access permissions, parent folder details, and document status (whether it has been deleted or moved). Losing this critical information can make the discovery process challenging as metadata is a crucial search element.

Solution

Look for agile platforms that give users the flexibility to collect as much or as little metadata as needed. Such platforms can gather all metadata from the original source, extract all embedded items and .zip files, and preserve files in their native format. Collecting all metadata, such as the file path description, version history, and access permissions, helps users obtain the information they need for efficient and effective discovery, ultimately reducing the time and effort required to prepare data for review.

Challenge #6: Exports

Google Vault limits its data export options to only a few file formats, such as PST, MBOX, and JSON. This can create challenges for organizations that require data exports in different formats or need to preserve associated metadata. It's important to note that exporting data in Vault is not the same as downloading it. Users must navigate to the Exports section within the matter and download the files from there. Afterward, they have to manually re-upload the data to their review platform, which can be time-consuming and cumbersome.

Additionally, Vault's export capabilities are limited to a certain number of accounts and a single Google service at a time, making it challenging to perform large-scale data backups efficiently.

Solution

To bypass the manual export and download process and easily transfer files for review, you can connect Google Vault with a specialized third-party tool. This enables you to view complete data exports in your preferred review platform, without worrying about file format limitations or manual transfer processes.

Overcoming data collection challenges with Onna

While Google Workspace and Google Vault offer powerful tools for collaboration and occasional data extractions, they are not explicitly designed to address specific eDiscovery requirements.

If you encounter, or expect to encounter, any of the challenges mentioned above, our data management platform was built to solve them all. Every solution outlined here is a feature of Onna’s Google eDiscovery solution. By doing all of the above and more, we significantly reduce the time and cost associated with collecting, processing, and reviewing data.

The challenges highlighted in this blog are not exclusive to Google Workspace, and as a cloud-based solution, we’re continuously developing new features to enhance the accuracy, completeness, and efficiency of managing eDiscovery in the cloud.

To learn more about our Google Workspace integration or how we work to centralize your data, feel free to reach out to us here.
