LLM Integration Pitfalls: Protecting Sensitive Data in the AI Age
AI has become an integral part of every industry and has revolutionized how we interact with technology. Expectedly, communication, productivity, and governance software companies have been at the forefront of integrating large language models, and generative AI is now embedded in almost every tool we use daily.
At this point, we should stop calling it “AI” and recognize it as a new interface layer. But we can’t do that yet because everyone has to ride the hype train, and we get to be in awe of grand product naming achievements such as “Slack AI” and “Notion AI.”
As AI becomes more prevalent, one of its most powerful uses is integration into our daily tools, where LLMs drive new efficiencies. For example, Notion AI is one of the most impactful generative AI deployments. It brought enhanced search, which was never Notion’s strength, and an easy way to query entire workspaces, treating Notion as a readily accessible knowledge base. Additionally, you can now link other tools like Slack and Google Drive and query the knowledge within them from a single search box. This is reminiscent of the Alfred macOS app.
LLMs Are Amazing With Integrated Systems, With a Catch
Integrating LLMs into our everyday tools offers undeniable value and immense potential.
A vast array of tools enables us to create, edit, and store knowledge, and you can interact with a single chat or search window that can summarize, query, and surface the exact information you need in seconds.
You can also automate a wide range of tasks, from support to product decision-making, without even needing a user interface.
The big win here is the instant availability of knowledge and automation of knowledge work. All that happens natively without switching context within a single platform that integrates with other tools and drives the LLM. When that platform also happens to be your daily driver — like Notion and VS Code are for me — you realize how tightly woven AI can be into your daily operations so that it is seamless and feels organic.
However, when it comes to LLMs, things are not that simple. The smooth UX where you click a button and OAuth your way through the setup obfuscates a ton of complexity:
- Data from other tools must be broken into smaller pieces before being sent to the LLM to ensure it doesn’t choke on large documents;
- Retrieval-augmented generation (RAG) is likely used to feed the LLM the relevant information so that it is less prone to hallucinating inaccurate responses;
- Careful orchestration is required; end users have little insight into how it is all set up and managed.
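To make the first step in the list above concrete, here is a minimal sketch of splitting a document into overlapping chunks before anything is sent to an LLM. The function name `chunk_text` and the word-based sizes are assumptions made for illustration; real tools typically count tokens rather than words, and nothing here reflects how any specific vendor does it.

```python
def chunk_text(text: str, max_words: int = 200, overlap: int = 20) -> list[str]:
    """Split text into word-based chunks that overlap slightly.

    The overlap keeps a sentence that straddles a boundary visible in
    both neighboring chunks. Sizes are illustrative, not recommendations.
    """
    if max_words <= 0 or not 0 <= overlap < max_words:
        raise ValueError("need max_words > 0 and 0 <= overlap < max_words")

    words = text.split()
    step = max_words - overlap  # how far the window advances each time
    chunks: list[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last window already reached the end of the text
    return chunks
```

Each chunk would then be embedded and stored for retrieval, which is exactly the point where the permission questions discussed later come into play.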
As with any new technology, using AI has its challenges. Despite the convenience, integrating LLMs with interconnected tooling has significant risks. Once the data starts crossing multiple platforms, the stakes increase and the question becomes: “Can we trust the orchestration tool to manage data responsibly?” This is precisely where things get messy.
How I Was Denied My Notion AI Integration Request
LLM integrations can look and feel seamless, but they often hit security roadblocks, as I recently experienced with Notion AI.
I started my Monday the usual way — reading through many newsletters and notifications that keep me apprised of the day-to-day “wassup.” A message from Notion excited me — they delivered on the promise to integrate Google Drive with Notion AI so that we can have a seamless way to query Notion, Slack, and Google Drive. Amazing!
Nutrient is SOC 2 certified, and we take the security and integrity of our data and our customers’ data very seriously. As a result, I would not just install anything and hook it up to a company-wide tool.
I eagerly asked my operations team for permission to install the Notion AI connector for Google Drive, but the request was denied.
To quote our security officer:
“I’d be worried about having similar issues to the prompt injection vulnerability found with Slack AI. IE, is it possible to use Notion AI to tell you about things in documents you don’t have access to?”
This reminded me of the issues with Copilot for Microsoft 365, in which Microsoft advised enterprises to strengthen data governance. In other words, we must be vigilant and take responsibility for protecting our data.
I won’t get my Notion AI integration with Google Drive yet, at least not until we’ve mitigated essential security risks. This is an example of AI excitement meeting reality.
My experience with Notion and Google Drive underscores the more significant security risks when integrating AI into business systems.
Security Risks in AI-Powered Integrations
While integrating LLMs with interconnected systems opens up a whole new world of possibilities, it also opens up a portal to serious security risks, which can force enterprises into non-compliance with data security and privacy legislation. The more systems we connect, the more complex the data pipelines get, and this is where vulnerabilities can sneak into the tool orchestrating the integration.
Prompt Injection
The first security risk to watch for is prompt injection. This occurs when someone manipulates the input being fed to an LLM to cause it to return information that it shouldn’t.
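A minimal sketch of why this works against integrated tools: retrieved document text is usually pasted into the same prompt as the user’s question, so an instruction hidden inside a document reaches the model alongside everything else. The names `build_prompt` and `flag_suspicious` and the pattern list are hypothetical, and pattern matching like this is only a tripwire for obvious cases, not a dependable defense.

```python
import re

# Illustrative patterns only; real injections are far more varied.
SUSPICIOUS_PATTERNS = [
    re.compile(r"ignore (all |any )?(previous|prior|above) instructions", re.I),
    re.compile(r"disregard (the )?(system prompt|previous instructions)", re.I),
]


def build_prompt(question: str, retrieved_chunks: list[str]) -> str:
    """Assemble a prompt the naive way: untrusted text sits next to the question."""
    context = "\n---\n".join(retrieved_chunks)
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}\n"
    )


def flag_suspicious(chunks: list[str]) -> list[int]:
    """Return the indexes of chunks that contain instruction-like phrases."""
    return [
        i for i, chunk in enumerate(chunks)
        if any(pattern.search(chunk) for pattern in SUSPICIOUS_PATTERNS)
    ]
```

Running `flag_suspicious` over retrieved chunks before calling `build_prompt` would catch only the crudest attempts, which is part of why this risk is so hard to close off.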
Many of us have toyed with the ChatGPT guardrails since the release of GPT-3. Reddit is filled with accounts of people working around the constraints in interesting ways, such as a Reddit poster casting a Priori Incantatem at ChatGPT, getting it to generate a prompt that can circumvent its limitations.
Notion AI is powered by ChatGPT, and no matter how far the LLM and the surrounding apps progress with guardrails, there will likely always be a way around them because, in the case of LLMs, their generative features are also their worst enemy. It’s like Netflix geoblocking content: a never-ending game of whack-a-mole, discovering and blocking the VPNs and proxies that circumvent the rules.
When integrating AI, we have little choice but to trust that the orchestrating tool has fully sandboxed permissions. We have to trust that it won’t spill the beans and expose sensitive data to people or bots that should not have access to it, as Slack AI did.
OAuth Is No Guarantee
Then, there’s the false sense of security associated with OAuth. We all love OAuth because it makes integrations feel seamless and secure. However, OAuth cannot ensure that the data the LLM will access is scoped correctly, at least not on its own.
Yes, we can probably expect that whatever is using the OAuth token to sign requests to Google will be unable to fetch a resource that our user can’t access in Google Drive. But what happens when the LLM starts processing multiple data points across platforms? Does it always know what we, as users of various tools, should and should not be able to access? Can documents we access in Google Drive get chunked into a vector database separated from chunks available to other users? What happens when permissions change?
Most LLMs don’t have these sophisticated guardrails built in. Once the data starts flowing between systems and into the LLM, it’s a Wild West of permissions. It’s like handing a keyring with hundreds of keys to a locksmith and telling them which one of your friends can use it to unlock your house at what time, then changing the [person, key, timeframe] multiple times a day.
Yet again, we must either build trust in the orchestrating tool through extensive testing or forgo the integration.
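One way to picture the questions raised in this section is a retrieval step that checks permissions at query time instead of relying on what was true when a document was indexed. The sketch below is an assumption-laden illustration: `Chunk`, `AccessControlledIndex`, and the `can_read` callback are hypothetical names, and the keyword match merely stands in for a vector similarity search.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Chunk:
    doc_id: str  # the source document this chunk was cut from
    text: str


class AccessControlledIndex:
    """Toy index that re-checks the caller's access on every search."""

    def __init__(self, can_read: Callable[[str, str], bool]):
        # can_read(user, doc_id) is a hypothetical hook that asks the
        # source system for the user's CURRENT permission on a document.
        self._can_read = can_read
        self._chunks: list[Chunk] = []

    def add(self, chunk: Chunk) -> None:
        self._chunks.append(chunk)

    def search(self, user: str, query: str) -> list[Chunk]:
        terms = query.lower().split()
        # Keyword matching stands in for embedding similarity here.
        hits = [c for c in self._chunks
                if any(term in c.text.lower() for term in terms)]
        # Filter after retrieval so revoked access takes effect immediately.
        return [c for c in hits if self._can_read(user, c.doc_id)]
```

Because the check happens on every call, removing a user from a document changes the results of the very next search, which is the behavior a permission change should have.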
Third-Party Integration Risks
Trusting a tool requires trusting its third-party integrations. Each new integration introduces a playground for prompt injection and opens the door for misconfigured or misinterpreted permissions. We might trust Microsoft Copilot or Notion AI, but what about the third-party tools they connect to?
Every time a new service is plugged in, the risk skyrockets. This is reminiscent of GDPR and subprocessors. While businesses must be informed about every new subprocessor that can process their sensitive data, each implements its own security and data protection. The upside is that a data breach with a subprocessor is usually a subset of the data, and the breach can be contained.
Unfortunately, with LLMs, if one part of the system gets breached, everything connected to it could be at risk. This creates a potentially detrimental domino effect, where vulnerabilities in one tool could cascade into others.
Integrating LLMs with interconnected tools, with our favorite tool in the driver’s seat, sounds excellent. However, it comes with significant security risks. The complexity of the network of connected tools, combined with the lack of best practices and mature security protocols for LLMs, means that the risks multiply with every tool we connect.
If we’re going down this road, we must ensure that we have a security strategy, are testing for prompt injection, and are validating that access permissions are being respected. Once data starts flowing through those pipelines, controlling who has access to what becomes far harder.
Given LLM integrations’ significant risks, balancing innovation and security is crucial.
How to Balance AI Convenience and Security
Okay, we all want faster workflows, automated tasks, smarter decision-making, and other benefits of using LLMs, especially the convenience of using a single tool to work with large amounts of data stored across multiple platforms. But we can’t ignore the security risks.
We also don’t want to keep hitting a wall and just gaze at the promise of convenience. We know it will take more effort than hitting the “Allow” button on an OAuth prompt, but we can solve this problem.
Security Needs to Come First, but It Shouldn’t Brickwall Innovation
Let’s start by acknowledging that security concerns should never paralyze innovation. While we can expect that security teams will flag the risks of integrating LLMs with existing platforms, instead of shutting down AI initiatives, our goal should be to collaborate towards building trust in these systems.
Integrating the proper safeguards without killing the potential for knowledge automation and on-demand data retrieval across multiple tools is a balancing act. We should create a secure path for the adoption of LLMs.
AI Capabilities Need To Be Tested
We need to human-test AI tooling! There is an almost infinite number of edge cases, and, as we saw in that Reddit post, LLMs are their own worst enemy when it comes to finding ways around the guardrails.
When we developed the AI Assistant for Avelyn, we first implemented this excellent AI-enhanced redaction that users can initiate by simply asking the Assistant to redact a document by describing what they want to be removed. Properly redacting digital documents means physically destroying information within; it’s not just putting a thick black line over some text or an image. Now, imagine a situation when the user would redact a document, and the AI Assistant didn’t rebuild the document chunks in the vector database or invalidate all previous chat sessions. Imagine asking the Assistant about the content that was destroyed in the document. Getting a response based on cached document chunks or accessing redacted content through an old chat exchange would be a serious security flaw.
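The failure mode described above can be sketched as a small piece of bookkeeping: when a document is redacted, its indexed chunks are rebuilt from the redacted text and every chat session that touched the document is dropped. This is a hypothetical illustration, not the Assistant’s actual implementation; the class `AssistantState`, its methods, and the sentence-based splitting are all assumptions.

```python
class AssistantState:
    """Toy model of what an assistant caches about documents and chats."""

    def __init__(self) -> None:
        self.chunks: dict[str, list[str]] = {}   # doc_id -> indexed chunks
        self.sessions: dict[str, set[str]] = {}  # session_id -> doc_ids used

    def index(self, doc_id: str, text: str) -> None:
        # Naive sentence split stands in for real chunking and embedding.
        self.chunks[doc_id] = [part for part in text.split(". ") if part]

    def record_use(self, session_id: str, doc_id: str) -> None:
        self.sessions.setdefault(session_id, set()).add(doc_id)

    def on_redacted(self, doc_id: str, redacted_text: str) -> list[str]:
        """Rebuild chunks from the redacted text and drop affected sessions."""
        self.index(doc_id, redacted_text)  # replaces the old chunks entirely
        stale = sorted(sid for sid, docs in self.sessions.items()
                       if doc_id in docs)
        for sid in stale:
            del self.sessions[sid]  # old exchanges may quote removed content
        return stale
```

A test for this kind of flow would redact a document, then assert that neither the stored chunks nor any surviving session can still surface the removed text.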
Start With Access Controls
Access control is the foundation of security when integrating LLMs with data that lives within multiple platforms. We have to ensure that LLMs can only interact with the data that the user has explicit permission to access. This is no simple task; things get muddy quickly, especially with more complex integrations.
The best place to start is to ensure that the tool orchestrating LLM utilization clearly outlines how it deals with access to data. I find the Notion AI security practices provide clarity on how access rights are respected within Notion. Their FAQ on AI connectors also makes it clear that users should only be able to access third-party content that they have access to in the third-party service.
Continuously Audit and Monitor
Trust but verify. Confirming that the tool in control of feeding the LLM documents how it handles access is the starting point, and that trust may well be warranted. Auditing and monitoring how it interacts with our data is just as necessary. It’s about making sure that our permissions are being enforced and reacting to anything that looks or feels suspicious.
For example, if I get approval to connect Notion AI with our Google Drive, I intend to remain vigilant and monitor compliance. A straightforward way to do it would be to have a colleague remove my viewing access from a document in Google Drive and then immediately try to ask Notion AI questions only that document can answer. If it provides answers based on the document’s contents, there’s at least an issue with caching access to chunks in the vector database, or it could also be a sign of mismanaging access altogether. It is a simple test that can be repeated periodically with observable results.
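The revoke-then-ask test above can be scripted so it is easy to repeat. The sketch below assumes two hypothetical callables, `ask` (send a question, get the answer text back) and `revoke` (remove the tester’s access to the document); how those would be wired to a real assistant or to Google Drive is deliberately left out, and `FakeAssistant` exists only to show the check running.

```python
from typing import Callable


def revocation_check(ask: Callable[[str], str],
                     revoke: Callable[[], None],
                     question: str,
                     secret_marker: str) -> dict[str, bool]:
    """Ask before and after access is revoked and compare the answers."""
    before = ask(question)
    revoke()
    after = ask(question)
    return {
        "answered_before": secret_marker in before,  # sanity check of the setup
        "leaked_after": secret_marker in after,      # True means a problem
    }


class FakeAssistant:
    """Stand-in for a real assistant, used only to demonstrate the check."""

    def __init__(self, honors_revocation: bool) -> None:
        self.has_access = True
        self.honors_revocation = honors_revocation

    def revoke(self) -> None:
        self.has_access = False

    def ask(self, question: str) -> str:
        if self.has_access or not self.honors_revocation:
            return "The launch codename is BLUE-HERON-42."
        return "I could not find anything about that."
```

A result where `answered_before` is true and `leaked_after` is false is the healthy outcome; `leaked_after` being true points at stale cached chunks or mismanaged access.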
The other approach would be getting Notion AI to tell me things I shouldn’t know and never would have known. It’s simple: I’d ensure there’s a document in Google Drive that I cannot access with my company’s Google account. The idea is that the document was never shared with me in the past and that the owner has already queried Notion AI in a way that its content is used to generate responses. Then I’d summon my prompt hacking powers and try to convince it that I already know this information and can access it, and paste parts of the content as proof. I can’t reveal all my tricks before trying, but I like the challenge and try everything. If I get it to crack, I’d expect Notion to address it by implementing a new guardrail or improving their existing ones.
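One way to make the outcome of this second test unambiguous, offered here as an assumption rather than something described above, is to have the document owner plant a unique canary string in the document beforehand. Any response containing that string proves a leak, however creative the prompt was. The helper names `make_canary` and `find_leaks` are hypothetical.

```python
import secrets


def make_canary(prefix: str = "canary") -> str:
    """Create a unique marker the document owner can paste into the document."""
    return f"{prefix}-{secrets.token_hex(8)}"


def find_leaks(canary: str, responses: dict[str, str]) -> list[str]:
    """Return the prompts whose answers contain the canary (case-insensitive).

    `responses` maps each prompt that was tried to the answer it produced.
    """
    needle = canary.lower()
    return [prompt for prompt, answer in responses.items()
            if needle in answer.lower()]
```

Keeping the prompts and answers in a dictionary also leaves a record that can be handed to the vendor if a leak is found.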
Hold Vendors Accountable
Testing, monitoring, and auditing won’t be helpful if we don’t hold our AI vendors accountable, right? They must provide clear, transparent documentation about how their tools work, especially regarding data security and permission management. We need to push for detailed explanations of how they scope LLMs’ access to external data and how they handle vulnerabilities like prompt injection.
If vendors don’t provide the necessary clarity or refuse to work with us, I think we should not even consider using them. I believe that the best AI vendors in the future will offer a way to monitor how the data flows through their systems on a much deeper level. They will likely need to provide AI services to enterprise companies that demand more control. I expect AI-specific dashboards will allow citizen developers to inspect how the data is used and maybe even look at the underlying prompts.
Limit AI Usage To Vetted Tools
This is kind of a no-brainer. I’d expect that most organizations have already developed internal AI policies that dictate what tools can be used. Your “Master Tools” list should contain AI platforms that have passed the necessary security assessments and offer clarity and transparency about their LLM integration methods.
Build It Yourself
Building in-house AI integrations can significantly increase control and security for organizations with more rigid data protection policies.
I trust a finite number of AI vendors, but trust is given with several grains of salt. When we don’t trust or feel uncomfortable with the available solutions, there’s always the more fun route: build our own thing. Creating in-house AI tools tailored to our specific needs means we have full control over how the system interacts with our platforms, how data is accessed, and how permissions are managed.
Nowadays, if I can build a Chrome Extension — with the help of Claude, of course — I’ll do that ten times over rather than using an existing one, especially if it lacks clarity on how the data is used. I just can’t risk customer, company, and personal data.
Building and maintaining LLM integrations is challenging, so developing your own requires the necessary skill. All those questions we would pester the vendor with are now questions we must ask ourselves. Solving challenges like prompt injection is now our problem, and, especially when it comes to LLMs, it will never go away. While this approach requires a deeper investment, it might be worthwhile, especially for organizations with big engineering teams handling sensitive information.
The true power of building in-house is the ability to test and monitor the LLM integrations at the prompt level, something that is unlikely to be available with any third-party AI tool. Fortunately, new frameworks for working with LLMs like LangSmith and Neon’s AI & Embeddings pop up every week, and they are getting more specialized and amazing.
Conclusion
Smart companies encourage their employees to discover and use as many new AI tools as possible. A new world is being created, and we’re all living it.
Smarter companies are likely more elaborate about it — they facilitate training and form workgroups to board different AI trains. The smartest companies will probably have done all that in 2023, and everybody there is already AI-fluent.
Some organizations haven’t addressed the topic at all, but the real problem is not limited to them. When people just start linking up these tools without oversight, things can go south fast. The promise of ultimate convenience can have serious consequences if the tool orchestrating LLM utilization mismanages access rights or fails to implement the necessary guardrails against attacks like prompt injection.
We don’t want to be stifling innovation here. However, using LLMs with a network of platforms is a risk. At the very least, we have to be careful about the AI tools we choose. Ideally, our choice will be governed or assisted by a well-defined company policy, and we have to hold the AI technology vendor to the highest standards.
But we shouldn’t stop there: we must continuously monitor and audit the AI tool to ensure that the security standards it proclaims are actually respected.
Ideally, though, if we have the resources and expertise, building our own LLM integrations could be the safest option. We retain full control over how the data flows between systems and can ensure that the necessary security measures have been in place since day zero. Yes, it’s a bigger investment, but it’s also likely unavoidable for companies handling highly sensitive data.
Building in-house gives us a much tighter grip on things because we can build and maintain our own guardrails and tests to challenge them at a deep level, which is not possible when we opt for an existing solution.
This article is part of The New Stack’s contributor network.