Implementing Robust AI Governance for Data Democratization

The stronger the governance, the more your employees can explore data freely without creating additional risk for the company.

Mar 19th, 2024 10:58am by Christian Kleinerman

Image from Daniel Prudek on Shutterstock.

The rapid emergence of generative AI empowers more people to unlock the power of data for new insights and better decision-making, but granting wider access to data requires a strategy for data governance. Enterprises that can balance these seemingly opposing trends — democratizing data, while maintaining strong governance over that data — will differentiate themselves in the market by unlocking unique data-driven insights.

According to Gartner, by 2026 more than 80% of enterprises will have used generative AI APIs and models, or deployed generative AI-enabled applications in production, up from less than 5% in 2023. Generative AI’s natural language interface allows nontechnical users, from department heads to frontline workers, to access and use data far more easily. This levels the playing field in terms of access to information and skills, and Gartner calls it “one of the most disruptive trends of this decade.”

Democratizing data in this way makes strong governance even more critical if companies are to avoid increasing risks around privacy, security and data quality. That means knowing precisely what data you have, where it resides, who is authorized to access it and how each type of user is permitted to use it. But how does an organization institute comprehensive controls without squelching innovation?

At a high level, the ideal approach is to unify data in a comprehensive repository that multiple teams and work groups can access easily and securely. Unifying data allows an organization to centralize governance and broaden access to that data, while minimizing complexity and optimizing costs.

In practice, this can be challenging because data sovereignty laws may require certain data to be kept in particular countries or regions. In such cases, organizations should strive to eliminate silos as much as possible and apply a consistent governance framework across their data platform.

Beyond this, several specific approaches and technologies help to ensure that organizations can maintain strong governance while still widening access to data via generative AI. Some of these are essential governance practices that apply in any environment, but they become even more important when generative AI further democratizes access to data.

Fine-Grained Controls for Privacy and Compliance

As more employees gain access to more data, the potential risk that personally identifiable information (PII) may be leaked or seen by the wrong users only increases. Fine-grained control policies as well as anonymization and de-identification techniques are critical to ensuring regulatory compliance and preventing data from being accessed by the wrong people.

In our new Data Trends 2024 Report analyzing trends in the Snowflake Data Cloud, we noted a significant increase in the use of governance features that provide granular control over data while also making it appropriately available to more users, for more use cases. For example, the use of applied masking or row-access policies increased 98% for the 12 months leading up to Jan. 31, 2024, compared with the same period a year earlier; meanwhile, the number of columns with an assigned masking policy grew 97%.

Notably, however, the total number of queries run against policy-protected objects was up 142%. That number is significant because it shows that good data governance is not about saying “no” and limiting data usage. Even as governance through tags and masking policies increases, the report notes, the amount of work being done with this data is rising rapidly.
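To make the two policy types mentioned above concrete, here is a minimal Python sketch of a column masking policy and a row-access policy applied at query time. It is not Snowflake's implementation; the roles, rows and function names are all hypothetical and exist only for illustration.

```python
# Hypothetical roles and rows, invented for illustration only.
PII_READERS = {"compliance_officer"}
REGION_BY_ROLE = {"emea_analyst": "EMEA", "amer_analyst": "AMER"}

ROWS = [
    {"customer": "Ada", "email": "ada@example.com", "region": "EMEA", "spend": 120},
    {"customer": "Bo", "email": "bo@example.com", "region": "AMER", "spend": 340},
]


def mask_email(value: str, role: str) -> str:
    """Column masking policy: only PII readers see the real address."""
    if role in PII_READERS:
        return value
    domain = value.split("@", 1)[1]
    return "*****@" + domain


def row_visible(row: dict, role: str) -> bool:
    """Row-access policy: analysts see only rows for their own region."""
    if role in PII_READERS:
        return True
    return REGION_BY_ROLE.get(role) == row["region"]


def query(role: str) -> list[dict]:
    """Apply both policies at read time; the stored rows are never changed."""
    return [
        {**row, "email": mask_email(row["email"], role)}
        for row in ROWS
        if row_visible(row, role)
    ]
```

Because the policies run when data is read, the same table can serve many roles: an EMEA analyst gets one row with a masked email, while a role with no mapping gets nothing.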

In some cases, employees may want to examine a data set they can’t be granted direct access to. Differential privacy is a powerful technology in such a circumstance because it allows users to share and explore data sets by looking at patterns within the data set, but without revealing any individual user’s PII. Going a step further, data clean rooms allow multiple parties to collaborate on data without disclosing the raw data to one another. Data clean rooms are typically used to share data between different organizations, but we’re seeing the technology used internally to meet increasing regulatory and privacy needs, and it can be an effective technology for exploring data that contains PII through a generative AI interface.
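As a rough illustration of the differential privacy idea described above, the following Python sketch answers a counting query with Laplace noise, one common textbook mechanism. It is an assumption-laden toy, not a production library: the salary values, the function names and the choice of epsilon are all hypothetical.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw Laplace(0, scale) noise as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(values, predicate, epsilon: float, rng: random.Random) -> float:
    """Return a noisy count of the records matching `predicate`.

    Adding or removing one person changes a count by at most 1 (sensitivity 1),
    so Laplace noise with scale 1/epsilon gives epsilon-differential privacy.
    """
    true_count = sum(1 for v in values if predicate(v))
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Hypothetical data: the analyst never sees these values directly.
salaries = [52_000, 61_000, 75_000, 48_000, 90_000, 67_000]
rng = random.Random(0)
noisy = private_count(salaries, lambda s: s > 60_000, epsilon=0.5, rng=rng)
```

A smaller epsilon adds more noise and therefore more privacy, at the cost of accuracy. Real deployments also track a privacy budget across repeated queries, which this sketch omits.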

Consistent, Orchestrated Security

Security should be built into the fabric of your data platform rather than bolted on later for individual data sets and users. The technology that powers your conversational interface should not have to duplicate identity and other core permissions on the data, which would lead to a fragile setup. If two or more systems are keeping track of who can access what data, the chances of errors and unauthorized access greatly increase.

Technologies that can play a key role in securing data for generative AI use cases include continuous risk monitoring and protections, role-based access control (RBAC) and granular authorization policies. Role-based tagging and tag-based masking policies allow you to protect data at the column level by assigning a masking policy to a tag and then setting the tag on one or more database objects.
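The tag-based pattern described above (attach a masking policy to a tag, then set the tag on columns) can be sketched in a few lines of Python. This is only a model of the idea, not any vendor's API; the tag name, roles, tables and columns are hypothetical.

```python
# Hypothetical tag names, roles and columns, used only for illustration.

# One masking policy is assigned to the tag, not to individual columns.
MASKING_POLICY_BY_TAG = {
    "pii": lambda value, role: value if role == "privacy_admin" else "***MASKED***",
}

# Tags are set on columns; the policy follows the tag.
COLUMN_TAGS = {
    ("customers", "email"): "pii",
    ("customers", "phone"): "pii",
    ("orders", "total"): None,
}


def read_value(table: str, column: str, value, role: str):
    """Return a column value after applying whatever policy its tag carries."""
    tag = COLUMN_TAGS.get((table, column))
    policy = MASKING_POLICY_BY_TAG.get(tag)
    return policy(value, role) if policy else value
```

The design benefit is that protecting a new column only requires tagging it, for example `COLUMN_TAGS[("customers", "ssn")] = "pii"`, and changing the policy in one place updates every tagged column.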

Data Silos Are the Enemy of Good Governance

Having copies or fragments of data stored in different systems makes it extremely hard to keep track of who can access what information and to keep access and control policies consistent. This is why data silos are the enemy of strong governance.

Data silos also make it hard to ensure that employees are querying the most current and accurate data, potentially leading to costly errors. To grant widespread access to data through generative AI, organizations need a single source of truth, so that all employees are looking at the same information and controls and policies can be applied and updated holistically across all data.

Ensuring Data Quality for Accurate Results

Even if you eliminate silos and have the right permissions in place, that doesn’t guarantee that the information employees are accessing is correct. A data quality framework, based on configurable data quality rules applied to a specific column or a set of columns in a table, can help detect quality problems and ensure accurate information.
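A data quality framework of the kind described above can be approximated as a list of configurable rules, each bound to one or more columns. The Python sketch below is a simplified assumption of how such rules might look; the rule names, columns and thresholds are hypothetical.

```python
from typing import Callable

# Each rule is (name, columns it covers, predicate over one row).
# Rule names, columns and thresholds here are hypothetical.
Rule = tuple[str, tuple[str, ...], Callable[[dict], bool]]

RULES: list[Rule] = [
    ("email_not_null", ("email",), lambda r: r["email"] is not None),
    ("age_in_range", ("age",), lambda r: r["age"] is not None and 0 <= r["age"] <= 120),
    ("ship_after_order", ("order_day", "ship_day"), lambda r: r["ship_day"] >= r["order_day"]),
]


def check(rows: list[dict], rules: list[Rule]) -> dict[str, list[int]]:
    """Return, for each rule, the indexes of the rows that violate it."""
    failures: dict[str, list[int]] = {name: [] for name, _, _ in rules}
    for i, row in enumerate(rows):
        for name, _columns, passes in rules:
            if not passes(row):
                failures[name].append(i)
    return failures


rows = [
    {"email": "a@example.com", "age": 34, "order_day": 1, "ship_day": 3},
    {"email": None, "age": 150, "order_day": 5, "ship_day": 4},
]
report = check(rows, RULES)
```

Here the second row fails all three rules, so `report` maps each rule name to `[1]`. Keeping rules as data rather than hard-coded checks makes them easy to add, remove or tune per table.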

Also, as we all know by now, generative AI can sometimes “hallucinate” and produce answers that aren’t grounded in fact, which is unacceptable for enterprise use. Organizations can address this by combining large language models (LLMs) with data sources they know to be trustworthy, such as an internal customer database or a vetted data set from a trusted third-party provider.

These trusted sources of data can be incorporated using processes that either require LLM customization, such as fine-tuning, or that don’t require LLM customization, such as prompt engineering or retrieval-augmented generation (RAG). In either case, these techniques help to ensure that employees receive accurate, high-quality results while adhering to the governance standards built into the internal cloud environment.
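To show what the RAG option mentioned above looks like without model customization, here is a minimal Python sketch that retrieves the most relevant trusted document and places it in the prompt. The documents and function names are hypothetical, and the word-overlap ranking is a deliberate simplification; real systems typically rank with embedding similarity, and the resulting prompt would then be sent to an LLM of your choice.

```python
import re

# Hypothetical vetted documents standing in for a trusted internal source.
TRUSTED_DOCS = [
    "Refunds are issued within 14 days of a return being received.",
    "Enterprise support contracts renew annually on the signing date.",
    "Customer data in the EU region is stored in Frankfurt.",
]


def tokens(text: str) -> set[str]:
    """Lowercase a text and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, docs: list[str], k: int = 1) -> list[str]:
    """Rank documents by word overlap with the question and keep the top k."""
    q = tokens(question)
    ranked = sorted(docs, key=lambda d: len(q & tokens(d)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, docs: list[str]) -> str:
    """Ground the question in retrieved context before it reaches the model."""
    context = "\n".join(f"- {d}" for d in retrieve(question, docs))
    return (
        "Answer using only the context below. "
        "If the context is insufficient, say so.\n"
        f"Context:\n{context}\n"
        f"Question: {question}"
    )


prompt = build_prompt("Where is EU customer data stored?", TRUSTED_DOCS)
```

Because retrieval runs against the governed data source, the access policies on that source decide what can reach the prompt.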

Data Access and the Power of Universal Search

An important aspect of governance for generative AI is making it easy for employees to find the right data sets and data products to help with their analysis. One reason AI is so powerful is that it allows employees to interact with data without going through a central team, but this requires those employees to know what data is available to them and how to find it.

A search capability provides this functionality, allowing users to find and query data sets and data products. This search functionality can itself be powered by an LLM to make searching for data even more intuitive — something we have been developing at Snowflake as part of our Universal Search.
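A governed search of this kind has to respect access controls as well as match the query. The Python sketch below shows that combination with plain substring matching; it is not how Universal Search works internally, and the catalog entries and roles are hypothetical.

```python
# Hypothetical catalog entries and roles, for illustration only.
CATALOG = [
    {
        "name": "sales_pipeline",
        "description": "Open deals and forecast by region",
        "roles": {"sales", "finance"},
    },
    {
        "name": "payroll",
        "description": "Employee salary and payroll runs",
        "roles": {"hr"},
    },
    {
        "name": "sales_targets",
        "description": "Quarterly sales quota per rep",
        "roles": {"sales"},
    },
]


def search(term: str, role: str) -> list[str]:
    """Return names of data sets that match `term` and that `role` may access."""
    needle = term.lower()
    return [
        entry["name"]
        for entry in CATALOG
        if role in entry["roles"]
        and (needle in entry["name"].lower() or needle in entry["description"].lower())
    ]
```

Filtering by role inside the search itself means a user never learns that a restricted data set exists, which keeps discovery and governance consistent.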

Governance Is the Foundation for Data Democratization

Business users are eager to make wider use of their organization’s data, and generative AI finally makes this possible. Thanks to LLMs and natural language processing, employees in areas like finance, human resources, sales and operations can now formulate questions specific to their roles and receive the answers they need to make smarter decisions.

But to meet an organization’s security and compliance needs, this can only happen in an environment with strong governance in place. The stronger the governance, the more your employees can explore data freely without creating additional risk for the company. Generative AI has opened the door to true data democratization, and good governance is the foundation that makes it possible.

Christian Kleinerman is executive vice president of product at Snowflake with over 20 years of experience working with various database technologies. He has more than 15 years of management and leadership experience. At Microsoft, he served as general manager of...