AI Alignment in Practice: What It Means and How to Get It

Because LLMs are trained on broad and varied data, they are inherently misaligned with any specific scope or set of standards. Here’s what you can do about it.

Jan 28th, 2025 5:02am by Yam Marcovitz and Emily Omier

Featured image by Ayush Kumar for Unsplash+.

When we talk about artificial intelligence, what we’re really talking about is using computers to automate human intelligence. When we talk about AI alignment and misalignment in practice, we’re really talking about whether the AI application we’re working with acts in alignment with our needs and expectations.

Alignment issues happen in purely human interactions as well. For example, a customer service rep who is given outdated or insufficient training material is likely to give customers inaccurate or wrongly extrapolated information, much like an AI agent trained on outdated or incomplete documents. The difference is that most managers have a fairly good idea of how to correct or otherwise deal with their human agents.

But with AI agents, it’s not always so simple. That’s precisely because large language models (LLMs) are not human, nor is the process of training them.

We Can’t Settle for 70% Accuracy

When humans talk to each other, we intuitively convey a huge number of contextual clues. Beyond these implicit exchanges, there is also a large and mostly unappreciated body of context that we share with the people in our everyday circles. We have a sense of how to deal with the people and situations around us. Most importantly, we each do it a little differently because our opinions, needs and circumstances differ.

But LLMs have to be explicitly provided with our context, and told how we personally wish to approach the critical scenarios that arise within it. Otherwise, misalignment is practically guaranteed. A related problem is that they often start to lose focus and behave unpredictably when given more than a few instructions. LLMs also have trouble prioritizing and resolving subtle conflicts between instructions unless they are explicitly trained to do so.

In addition to the challenges of aligning an LLM agent with our own intentions, there is also the problem of getting the agent to align with the customers and their own context. This is where most AI frameworks today fall short, as they’re built around recognizing and reacting to a user’s so-called intent.

This approach has several problems. First, even in real-life interactions, intent can take time and help to clarify. Second, a user could have multiple simultaneous intents that should be tied together in a single response. Third, there are situations where, instead of reacting directly to the user’s perceived intent, you would want the AI application to counter with guiding questions or otherwise redirect the flow of the conversation. An intent-based approach, while simpler to reason about from an engineering standpoint, isn’t well suited to these situations.

Because of these challenges, generative AI applications that get it right even 70% of the time are often portrayed as a success. But, especially in a customer-facing situation, that standard is ridiculously low. It leaves brands open to both reputational and legal risk, especially if they operate in regulated industries.
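To see why a 70% standard is so low, it helps to look at how errors compound across a conversation. The sketch below is an illustration under a loud simplifying assumption: every response is correct independently with the same probability. Real conversations are not that tidy, and the function name is hypothetical.

```python
def flawless_conversation_probability(per_response_accuracy, turns):
    """Chance that every response in a conversation is correct.

    Assumes each response is right independently with the same
    probability, which is a simplification made for illustration.
    """
    return per_response_accuracy ** turns


for turns in (1, 5, 10):
    p = flawless_conversation_probability(0.70, turns)
    print(f"{turns:>2} turns: {p:.1%}")
```

Under that assumption, a five-turn conversation goes by without a single mistake only about 17% of the time, and a 10-turn conversation less than 3% of the time.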

Common AI Misalignments

Before we dive into how to fix alignment issues, let’s first address the types of misalignment in AI applications.

Factual Misalignment

When a generative AI application hallucinates, offering made-up context, information or services, that is a type of factual misalignment. A simple yet common example: a bank’s client asks its AI agent, “What are my limits?” and the agent responds with anything from made-up facts such as, “Your withdrawal limit is $300 per day,” all the way to entirely decontextualized responses such as, “While knowing your limits can be challenging, stretching our limits is an important aspect of living a full and productive life.”

Factual misalignment can also arise from simpler causes, such as when the AI application conveys incorrect information that was directly provided to it. This generally happens because the knowledge base that the AI was trained on, or that is fed to it, is out of date. The good news is that updating a knowledge base is often straightforward; the bad news is that it often takes considerable time and effort, which is why this is such a common reason for factual misalignment in AI applications.

Another kind of factual misalignment is when an AI agent reveals information that it is not supposed to reveal. It might have access to pricing information that isn’t public, for example, and reveal that information if asked in the right way. This pitfall is most commonly found in custom or fine-tuned generative models trained on your private data, which is why extreme care needs to be taken with these approaches.

Behavior Misalignment

Beyond whether or not the AI application is hallucinating, you must consider whether it is behaving in a way that could hurt your reputation, land you in legal hot water, or simply fail to engage your users effectively with the services it offers. It’s possible, for example, for an AI agent to get the outcome you want while exhibiting unacceptable behavior.

Brand alignment and outcome alignment, the other types of alignment commonly discussed in generative AI, both boil down to behavior alignment. Is your AI application behaving in a way that’s in line with your brand? Is it getting the outcomes you want, and is it behaving in a way that you find acceptable to get those outcomes?

For example, your agent could report increased sales — a positive outcome — through over-promising or offering unapproved discounts. Behavioral misalignment can also be more subtle, such as when an agent skips crucial parts of service protocols, like informing customers that calls are recorded or asking them to confirm their identity before proceeding to provide service.

So it’s not enough to simply evaluate whether or not the AI agent is achieving the outcomes you want; you also need to make sure it’s behaving in a way that fits with the image you want for your brand, and isn’t undermining the company’s goals and requirements with its behavior.
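Protocol lapses of the kind described above can be checked mechanically once a conversation is recorded as a sequence of agent actions. The following is a minimal sketch, not a production audit tool. The action labels (`disclose_recording`, `confirm_identity`, `provide_service`) and the function name are hypothetical, and it assumes some upstream step has already labeled each agent turn.

```python
# Steps that must happen before the agent starts providing service.
# These labels are hypothetical and stand in for your own protocol.
REQUIRED_BEFORE_SERVICE = ["disclose_recording", "confirm_identity"]


def missing_protocol_steps(actions):
    """Return required steps the agent skipped before first providing service.

    `actions` is an ordered list of action labels for one conversation.
    If service is never provided, nothing is reported as missing.
    """
    done = set()
    for action in actions:
        if action == "provide_service":
            return [step for step in REQUIRED_BEFORE_SERVICE if step not in done]
        done.add(action)
    return []


conversation = ["disclose_recording", "provide_service", "confirm_identity"]
print(missing_protocol_steps(conversation))
```

Here the agent confirmed the customer’s identity only after it had started serving them, so the check flags that step even though the conversation may have ended with the outcome you wanted.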

How To Get the Best Possible Alignment in Your AI Agents

When you’re aligning an LLM agent for realistic, production-grade use cases, you’ll need to give it at least a few dozen instructions, if not hundreds. The big problem here is that LLMs do not reason logically: if you give an LLM too many instructions at the same time, or instructions that conflict in any way, its ability to follow them diminishes.

So the first step, when working on aligning your LLM’s behavior, is to be able to sort through all of the instructions dynamically and identify which ones are relevant for a particular conversation and which ones are not. If we can eliminate the non-relevant instructions, that will immediately help focus and align the LLM’s behavior. Parlant, the recently launched open source alignment framework, calls this process contextual guideline matching.
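The idea of filtering instructions per conversation can be sketched in a few lines. This is not Parlant’s API or its matching logic. It is a deliberately naive illustration in which relevance is decided by keyword overlap, and the `Guideline` class, its fields and `match_guidelines` are all hypothetical names. A real system would likely use a model or embeddings to judge relevance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Guideline:
    # Hypothetical stand-in for a real relevance check: plain keywords.
    condition_keywords: frozenset
    instruction: str


def match_guidelines(guidelines, message):
    """Return only the guidelines whose keywords appear in the message."""
    words = {w.strip(".,!?").lower() for w in message.split()}
    return [g for g in guidelines if g.condition_keywords & words]


GUIDELINES = [
    Guideline(frozenset({"limit", "limits", "withdrawal"}),
              "Look up the customer's actual limits before answering."),
    Guideline(frozenset({"discount", "price", "pricing"}),
              "Only mention discounts that appear on the approved list."),
    Guideline(frozenset({"cancel", "close"}),
              "Ask for the reason before processing a cancellation."),
]

# Only the matching instructions make it into the prompt for this turn.
relevant = match_guidelines(GUIDELINES, "What are my limits?")
prompt_section = "\n".join(f"- {g.instruction}" for g in relevant)
print(prompt_section)
```

For the bank example from earlier, only the instruction about limits reaches the model, and the discount and cancellation instructions stay out of its way.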

The next step is to put in place self-critique, smart prioritization and conflict-resolution mechanisms. The challenge here is that most LLMs do not do this particularly well out of the box, because they are asked to attend to too many inputs that pull their attention mechanism in different directions, which flattens output probabilities and reduces behavioral consistency. LLMs also tend to give more weight to the parts of the prompt that come at the end. So prompts need to be researched and tested in specific connection with the models they are used on, keeping those models’ properties in mind.
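One simple way to take prioritization and conflict resolution off the model’s plate is to do it in code before the prompt is built. The sketch below rests on two assumptions that are ours, not the article’s: guidelines that share a `topic` are treated as conflicting, and each guideline carries a numeric `priority`. All names are hypothetical. Because of the tendency to favor the end of the prompt noted above, the surviving guidelines are ordered so the most important one comes last.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Guideline:
    topic: str        # guidelines sharing a topic are treated as conflicting
    priority: int     # higher number wins a conflict
    instruction: str


def resolve_and_order(guidelines):
    """Keep one guideline per topic, then sort so the top priority is last."""
    winners = {}
    for g in guidelines:
        current = winners.get(g.topic)
        if current is None or g.priority > current.priority:
            winners[g.topic] = g
    return sorted(winners.values(), key=lambda g: g.priority)


candidates = [
    Guideline("discounts", 1, "Offer a discount to hesitant customers."),
    Guideline("discounts", 5, "Never offer discounts that are not on the approved list."),
    Guideline("identity", 9, "Confirm the customer's identity before discussing the account."),
]

for g in resolve_and_order(candidates):
    print(f"[{g.priority}] {g.instruction}")
```

The lower-priority discount instruction is dropped, and the identity check lands at the end of the list. Whether end-of-prompt placement helps a given model is something to test rather than assume.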

One of the most practical ways to handle this is to implement a supervision element in your prompts. Parlant implements a new technique it has developed for this purpose, called attentive reasoning queries (ARQs), that helps divert the LLM’s attention back to the relevant parts of the prompt at the right times, to ensure that critical information and instructions aren’t being overlooked. This leads the LLM to give appropriate weight to each of the provided guidelines, no matter where in the prompt it appears, or how long or complex the prompt is.
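To give a feel for what a supervision element in a prompt might look like, here is a rough sketch. It is not Parlant’s ARQ implementation, whose details the article does not describe. It only illustrates the general idea of ending a prompt with explicit queries about each guideline and then verifying that the model addressed all of them. The JSON shape, the function names and the canned reply are assumptions made for the example, and no real model is called.

```python
import json


def build_supervised_prompt(guidelines, user_message):
    """Build a prompt that asks the model to account for every guideline."""
    lines = ["Guidelines:"]
    lines += [f"{i}. {text}" for i, text in enumerate(guidelines, start=1)]
    lines += [
        "",
        f"Customer message: {user_message}",
        "",
        "Before replying, answer as JSON in this shape:",
        '{"checks": [{"guideline": <number>, "applies": <true|false>, "why": <string>}],',
        ' "reply": <string>}',
        'Include one entry in "checks" for every guideline above.',
    ]
    return "\n".join(lines)


def unchecked_guidelines(raw_reply, guideline_count):
    """Return the numbers of guidelines the model's JSON reply never addressed."""
    data = json.loads(raw_reply)
    seen = {check["guideline"] for check in data.get("checks", [])}
    return [n for n in range(1, guideline_count + 1) if n not in seen]


guidelines = [
    "Look up the customer's actual limits before answering.",
    "Confirm the customer's identity before discussing the account.",
]
prompt = build_supervised_prompt(guidelines, "What are my limits?")

# Hand-written stand-in for a model reply, used only to exercise the check.
canned_reply = (
    '{"checks": [{"guideline": 1, "applies": true, "why": "asks about limits"}],'
    ' "reply": "Let me look that up."}'
)
print(unchecked_guidelines(canned_reply, len(guidelines)))
```

In this example the canned reply never considers the identity guideline, so the check reports it as overlooked, and the application could retry or escalate instead of sending the reply.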

Maximize Alignment to Reduce Risks

Companies that are serious about using AI agents in a customer-facing capacity should be aware of the risks of different types of misalignment and keep up with the latest techniques and innovations to maximize alignment in their AI agents. Otherwise, at least for some use cases, the risks are simply too high.

A software builder with extensive experience in mission-critical software and system architecture, Yam Marcovitz understands what it takes to create reliable, production-ready software. This background informs his distinctive approach to the development of predictable and aligned AI systems.