Expedia Group Technology - Medium

The Most Expensive Milliseconds Are Unmeasured

Divya Gupta Arora — Wed, 03 Jun 2026 17:11:58 GMT

Expedia Group Technology — Engineering

How a screen-level performance metric reshaped platform decisions, engineering ownership, and release discipline

Photo by Pietro De Grandi on Unsplash

For the last few years, my responsibility has been straightforward to state but hard to execute — owning the traveler login experience across mobile platforms.

Not just whether a feature works, but whether it feels responsive, predictable, and trustworthy in the moments that matter most. During login, those moments are unforgiving: if a login screen hesitates travelers don’t interpret it as ‘a slow render’, they interpret it as risk.

And when the majority of travelers interact through our mobile apps, performance stops being a technical concern and becomes a product promise.

This post is a case study of how, within Expedia Group™’s login domain, we extended Native Time to Interactive (NTTI) across login screens — moving performance from a late-stage check to a first-class signal we can use to validate technology investments, compare platform behavior, and prevent silent regressions as we ship.

The Problem: Mobile reliability rarely fails loudly

Mobile performance rarely breaks with a crash.

It usually degrades quietly,

a button takes a beat longer to respond
a screen looks ready, but taps don’t register
the UI stutters just enough to feel “off”

Those are the expensive milliseconds — because they erode trust without triggering obvious alarms.

In our login flows, we were shipping consistently, evolving a major part of our stack, and supporting increasing product complexity. Yet we didn’t have a consistent way to answer the user’s real question — “When can I actually use this screen?”

Why our existing signals failed

We were not blind. We tracked many useful things,

crashes and ANRs
backend latency and service SLIs
some component-level timing signals
limited Time to Interactive tracking on onboarding and initial login

But we had a gap — we had visibility into system health, not screen readiness.

Once travelers moved beyond the initial login screen into deeper login flows, performance became harder to compare release-to-release. We could detect obvious failures, but we lacked a consistent, screen-level metric that told us,

Did this flow get slower after shipping feature X?
Did the new tech stack improve what travelers feel?
Are iOS and Android behaving similarly, or are we averaging away differences?

And during a major tech stack migration, leadership asked fair questions — Are we actually delivering a better traveler experience ? Or just different architecture?

We needed something more aligned with user-perceived readiness.

What we chose to measure

Expedia didn’t find an industry-standard metric that captured native screen interactivity the way Web Vitals capture LCP and TTI on the web. So we coined a native-inspired equivalent: Native Time to Interactive (NTTI).

Definition: All components that are visible in the initial viewport are loaded and available for interaction.

NTTI measures:
The time from when the screen starts loading to when its key components have loaded.

Key terms

Screen Start: when a new screen starts rendering and the user can’t interact yet (user is effectively blocked)
Key Components: essential UI components that must be ready for meaningful interaction (typically in the initial viewport)
Last Key Component: when the final key component completes render and is ready for interaction

Formula

NTTI = Last Key Component − Screen Start

What is Time to Interactive wrt native devices

How we instrumented NTTI across login screens

Performance instrumentation wasn’t absent before — but it wasn’t consistent across the login journey.

We had:

component-level signals on both platforms
limited screen-level visibility on a few launch flows
gaps across deeper login screens

Extending NTTI across login screens required more than “adding timers.” We had to make deliberate decisions.

Why NTTI worked for us

NTTI anchors to a human outcome:

A screen isn’t “fast” when it renders. It’s fast when it responds.

A screen that draws quickly but doesn’t respond to touch feels broken. A slightly slower screen that responds reliably feels fast.

NTTI helped us measure what travelers actually experience: readiness to interact, not just rendering completion.

The work broke down into four steps

1. Define “interactive” from a traveler perspective

We aligned on what “usable” means for each screen — not in theory, but in practice.
This forced healthy product + engineering conversations: what’s essential vs what’s nice-to-have.

2. Choose key components per screen

We selected a subset of components that represent real readiness.
Not every UI element should gate interactivity; if everything counts, NTTI becomes noisy and less actionable.

3. Standardize measurement across platforms

We ensured the measurement approach was comparable across iOS and Android even though runtime behavior differs.

4. Make the data reliable enough for decisions

A metric is only useful if people trust it. That meant validating instrumentation, watching for gaps, and ensuring consistency across releases.

This wasn’t just observability work. It was building an operational contract: if we ship UI changes, we must be able to measure the impact.

NTTI captures when a screen becomes usable, not just when it finishes rendering.

What we learned: iOS and Android are not symmetric

One of the most important things NTTI made visible is what experienced mobile engineers already know, but organisations often ignore — iOS and Android behave differently even when the UI looks identical.

iOS issues often surfaced as

subtle UI delays
main-thread contention
strict lifecycle + rendering constraints

Android issues often surfaced as

device + OS fragmentation
OEM-specific behavior
hardware-dependent variance

Before NTTI, it was easy to blend results and assume parity. After NTTI, we could see platform deltas clearly and evaluate them fairly.

The same backend and product flow could yield different NTTI uplift across platforms. That wasn’t a product inconsistency — it was platform behaviour and implementation nuance.

Same backend. Same Feature. Different Experience.

NTTI made those differences measurable instead of debatable.

Before vs After performance improvements in Android (46.3% faster)

The unexpected outcome: Better ownership

One of the most valuable outcomes had little to do with the metric itself.

Once NTTI was introduced, teams naturally started to

instrument what they shipped
read dashboards with intent instead of curiosity
connect code changes directly to traveler experience
identify and fix regressions earlier in the cycle

Observability stopped feeling like a downstream activity. It became a core engineering discipline. And because the metric reflected real user experience, performance stopped being “just an engineering concern.” It became a shared responsibility. Product teams aligned on performance as a release guardrail and part of the user experience discussion. Design helped define what “ready” actually meant from an interaction standpoint. Engineering made those expectations measurable and actionable.

Over time, ownership shifted from “Someone will catch this later.” to “If I ship it, I monitor it.” That kind of cultural shift is difficult to enforce top-down.
NTTI helped create it through consistency, visibility, and repeated feedback loops.

How this changes release conversations

Today, NTTI is one of the first signals I look at

before features ramp to wider production
while validating major technical initiatives
during release readiness discussions

We haven’t yet fully adopted NTTI as a formal rollout blocker, but that’s the direction we’re moving toward.

What we intend to do next

Treat NTTI degradation as a “pause and investigate” signal
Require explicit conversations before ramping when performance moves outside expected thresholds
Protect hard-earned performance gains from slowly regressing over time

To be clear, NTTI won’t be the only release signal, but it will become an important guardrail because it forces the right conversations to happen early, while fixes are still practical and inexpensive.

What NTTI tells us — and what it doesn’t

Being explicit about scope has helped us use the metric responsibly.

What we know

NTTI measures screen readiness from an interaction perspective
the same change can yield different outcomes across platforms
it works as an early regression guardrail
it improves release conversations through concrete signals
it increases engineering ownership of performance

What we don’t claim

NTTI doesn’t explain user intent or behavior
it isn’t used in isolation
it doesn’t prove causality beyond what it measures
platform parity is not assumed or enforced

Closing thought

NTTI didn’t just provide another metric, instead it gave us

earlier visibility into traveler friction
a concrete way to validate tech stack investments
a growing culture of engineers owning performance
a shared language for cross-platform performance conversations

During login where trust, speed, and reliability are inseparable this discipline matters. Performance isn’t something we optimize later. With NTTI, it’s something we design for from day one.

Learn about life at Expedia Group

The Most Expensive Milliseconds Are Unmeasured was originally published in Expedia Group Technology on Medium, where people are continuing the conversation by highlighting and responding to this story.

Just Because We Can Build It, Should We?

Rick Fast — Tue, 05 May 2026 11:01:02 GMT

Expedia Group Technology — Platform

How AI changed the build vs. buy equation, and why discipline matters more than ever

Photo by Ali Kazal on Unsplash

Agentic coding tools and AI-native workflows have changed what’s possible for platform engineering teams. I lead Platform Engineering at Expedia Group™, one of the world’s largest travel technology companies. We power brands like Expedia®, Hotels.com®, Vrbo®, Orbitz®, and Travelocity®. My organization builds the technology that all of these brands and their partners run on: APIs, data and AI services, developer tools, CI/CD infrastructure, and the shared capabilities that let thousands of engineers ship reliably at scale.

For many greenfield problems, it now feels like we can build almost anything from scratch with a fraction of the effort it used to take. But that creates a harder question: just because we can build it, should we?

The temptation is real

When you lead a platform organization and your engineers suddenly have access to powerful agentic coding assistants, the possibilities open up fast. It’s tempting to look at SaaS products your company is paying for and think, “We could build that in a weekend.”

And honestly? In many cases, you probably could stand something up quickly. The initial build cost has dropped through the floor.

But building something and owning something are very different decisions. That’s where the new calculus gets interesting.

The hidden cost of owning software

Vendors aren’t just shipping features. They’re maintaining the software, fixing bugs and edge cases you haven’t thought about yet, and improving the UX across thousands of customers with different requirements. Mature platforms have decades of accumulated work behind them: production hardening, compliance, operational tooling, and long-tail requirements you’d only discover slowly through real-world use.

You can’t clone that maturity quickly, even if the initial build feels cheap thanks to AI. The build side of the equation has changed dramatically, but the own-and-operate side has not changed nearly as much.

There are cases where the calculus does shift. Some products are undifferentiated and low value; they don’t justify ongoing spend when simpler alternatives exist. For those, it may make sense to stop paying and either build a lightweight replacement or simplify. But that’s a very different posture than “we can build everything now, so we should.”

Our AI platform strategy: what we actually did

Last year, we made an explicit strategic decision in our AI platform work: stop implementing everything from the ground up internally, and instead lean into composable, external building blocks.

We had previously built bespoke AI platforms in-house. Custom orchestration layers and gateways, proprietary memory systems, internal agent frameworks. They worked, but they came with heavy maintenance costs and, more importantly, they fell behind the pace of the broader AI ecosystem.

So we changed direction:

We adopted an AI platform architecture that uses open standards and vendor ecosystems, such as n8n and LangSmith, rather than proprietary abstractions.
We simplified our LLM proxy layer by swapping in an off-the-shelf routing product (LiteLLM) instead of maintaining a fully custom implementation.
We replaced our in-house agent memory system with a cloud-native composable alternative (such as AWS’s Agent Core memory), treating it as a pluggable component rather than another platform we’d have to run forever.

The principles behind these moves:

Composability matters most. Our ecosystem is complex. We need building blocks that fit together cleanly across dozens of teams and product lines.
Don’t abstract away vendor progress. The AI space moves extremely fast. If you build thick proprietary abstractions on top of vendor frameworks, you’ll always be behind their latest capabilities. You end up maintaining a translation layer instead of shipping value.
Turnkey integrations, thin layers. Make integration easy and batteries-included, but don’t build heavy opinionated layers that freeze you in place.

Since the pivot, teams have picked up new AI capabilities faster, we spend less time rebuilding what others already provide, and we can actually keep up with improvements from frameworks and vendors as they ship.

The broader lesson: what this means for platform teams

This applies well beyond AI platforms. As a platform engineering organization, building platform technology is our core job, and agentic tooling has made that a lot easier. But that doesn’t mean everything we could turn into a platform should become one.

If you over-platformize, you risk creating an internal marketplace of cool tech that isn’t aligned with near-term business goals and doesn’t have clear owners or long-term roadmaps. You anchor to local inventions that the broader industry doesn’t converge on, so you’re maintaining something proprietary while everyone else moves in a different direction. You raise the abstraction bar so high that teams can’t adopt new external capabilities when they emerge. In short: you can end up slowing your company down in the name of moving fast.

How we decide what to build

Here’s the framework we’re applying across Platform Engineering at Expedia Group.

Start from team priorities, not technical possibilities

Before building a new platform capability, ask: What specific team priority does this support? What goals are we trying to achieve in the next 12–18 months? Is this the best use of time for those goals, or just an interesting technical opportunity?

Build for your own domain first

When you build, optimize for your domain problem: your runbooks, your incident patterns, your critical flows. Don’t pre-optimize for everyone else’s domains. Other teams now have access to the same AI building blocks and can solve their own problems. If your solution becomes obviously valuable and reusable, you can look at platformizing it later, deliberately.

A good mental check: if this didn’t exist, would my team still have a hard, recurring problem next quarter? If the answer is “not really,” it’s probably not where you should invest platform cycles.

Favor composable primitives over monolithic platforms

Across our AI and platform work, the primitives that are working best:

Skills: Small, composable chunks of procedural and domain knowledge. Easy to share across repos, tools, and teams without creating hard dependencies.
API servers with standard protocols + CLI’s: Standardized ways to expose domain capabilities into agentic workflows. Model Context Protocol (MCP) is an open standard for connecting AI assistants to external tools and data, and it’s gaining real traction here. It works naturally for both coding and non-coding use cases.
Agents designed for compatibility: Scoped to specific problem domains or workflows (CI/CD, incident response, deployment orchestration) and built to load the right skills and connect to the right services automatically.

These are low-regret bets. They compose well with multiple tools and coding agents, stay close to open standards, and are easier to replace or upgrade than monolithic platforms.

A concrete example: Picture a CI/CD agent that picks up work after a pull request is merged and handles higher-level deployment workflows: interpreting pipeline configurations, managing rollbacks, handling approvals, calling standardized deployment APIs. Back it with composable skills and protocol-compliant service integrations, and you get something that stays close to existing platform primitives, evolves as your deployment stack evolves, and can be adapted by other teams without hard-coding all logic into a single bespoke system.

The buy side: same discipline, different direction

This same calculus applies to vendor and purchasing decisions. In today’s AI market, lots of vendors want you to centralize all your data in their ecosystem and use their proprietary dashboards, agents, and knowledge graphs.

In a space moving this fast, that creates a different kind of lock-in. Not just with a product, but with your data, workflows, and mental model of how you operate. We’re cautious about all-or-nothing stacks that try to own the entire vertical and platforms that don’t let you swap components in and out.

Our preferred vendor pattern: composable offerings where we can plug a component into our stack, swap it out later if needed, and we’re not forced to centralize everything into a single closed ecosystem. We want the flexibility to adapt as the AI space shifts, because it will shift, repeatedly, and probably faster than any of us expect.

The bottom line

The calculus of build vs. buy is changing. AI and agentic tools have altered what’s possible and how quickly platform teams can move. But discipline matters more than ever.

Here’s what we’re telling our teams:

Keep innovating, especially on your own workflows. Anything that improves your team’s productivity or simplifies your day-to-day is worth doing, even if it never becomes a broad platform product.
Be intentional about what you turn into shared platform tech. Ask what domain problem it solves repeatedly, whether it’s best expressed as a composable primitive, and whether it aligns with the goals you’ve already committed to.
Stay close to open standards and vendor primitives. This keeps you compatible with the tools your engineers already gravitate toward and lets you pick up new capabilities as they ship.
Don’t make yourself the bottleneck. Your job as a platform team is to amplify industry progress for your company, not get in the way with heavy bespoke platforms that age badly.

Our competitors have these same capabilities now. The difference won’t be who can build the most. It’ll be who applies that ability most deliberately to the software that actually helps them win.

The author leads Platform Engineering at Expedia Group, where the team builds the technology platform powering travel for millions of people across Expedia, Hotels.com, Vrbo, and other brands worldwide.

Just Because We Can Build It, Should We? was originally published in Expedia Group Technology on Medium, where people are continuing the conversation by highlighting and responding to this story.

Expedia’s Service Telemetry Analyzer

Nikos Katirtzis — Tue, 28 Apr 2026 11:01:01 GMT

Expedia Group Technology — Engineering

A system that facilitates investigation of service degradations and outages using service telemetry data and AI

Photo by Evangelos Mpikakis on Unsplash.

The recent advancements in the artificial intelligence space make us re-evaluate how work is done. From programming, to designing systems, or even operating them in production. While there is considerable focus on automating programming, one area which could undergo transformation is how we monitor and operate our systems and services.

A few of us came together and designed Expedia’s® Service Telemetry Analyzer (STAR), an early iteration of a system that facilitates investigation of service degradations and outages using service telemetry data and AI models and techniques.

Expedia’s Service Telemetry Analyzer (STAR)

The early product offering includes:

Execution of multi-step workflows.
Integration of software and systems engineering knowledge, including application and infrastructure, cloud, containerization, and orchestration patterns, into diagnostic workflows for complex distributed systems.
Application of domain-specific prompt engineering for metric and root cause analysis.
Utilization of advanced off-the-shelf AI models.
Implementation of prompt engineering techniques, including role prompting, prompt chaining, and generated knowledge prompting.

Design

The product offering is a web-based service that provides an application programming interface (API). While AI agents and chatbots are gaining traction, we aimed to start with something a) simple, b) precise (to a certain extent, considering the potential hallucinations of the models), and c) that avoids the additional and currently less understood failure modes of an agent. As this field evolves, we will continue to iterate on the design.

Therefore, there is limited context engineering beyond domain-specific prompts; for instance, there is no support for function calling / tool use, short-term and long-term memory, or retrieval augmented generation (RAG). The system provides vertical domain-specific workflows. It adheres to a predefined multi-step process, with an emphasis on automating scenarios encountered by our engineers and enhancing the system’s precision. If you are interested in tool use with model context protocol (MCP) servers for software development, you can read more in my public blogpost.

Web service

The architecture is relatively straightforward, comprising an API layer and a web server built with FastAPI. This service manages requests to Expedia’s chosen metrics platform (Datadog) and to the internal generative AI proxy, including authentication/authorization.

Web architecture behind STAR

AI models

The service invokes models via Expedia’s generative AI proxy. The proxy offers access to different models which we constantly evaluate for quality of results, cost, and performance implications. We are also exploring using different models for the various tasks in STAR. The use of large language models (LLMs) for any tasks was convenient for the prototype but it would be more effective to use specialised models for the different modalities of telemetry data and slower reasoning models for the final RCA.

Prompt chaining

Part of the implementation involves prompt chaining, which facilitates a programmatic dialogue between the user and the assistant.

Prompt chaining; programmatic dialogue between the user and the assistant

Multi-step workflows

Overall, STAR provides multi-step workflows, which are visualized below. In specific:

It collects telemetry data.
It analyzes these metrics and the associated metadata using AI models and domain-specific prompts and rules.
It aggregates all analyses and conducts a final root cause analysis.
It returns insights and recommendations.

Multi-step Reasoning Process implemented in STAR

Ingested data

The initial focus on observability metrics was on infrastructure components, with a particular emphasis on Kubernetes and JVM for two reasons: our heterogeneous tech stack and the higher degree of standardization at the infrastructure layer.

The default analyzer now ingests metrics including inbound and outbound traffic and errors, latency across various protocols like HTTP, gRPC, and GraphQL, and saturation monitored through container-level CPU and memory usage.

Additionally, the system ingests Kubernetes metrics, such as container restarts and probe failures, as well as JVM metrics for heap usage and garbage collection. This set of signals is tailored to our environment, where most services are backend JVM applications running on a Kubernetes-based compute platform.

Implementation details

While designing this system we faced a set of of interesting problems which may be useful to the reader.

The nuances of token-heavy systems

When we first designed STAR, LLM tooling was limited. Given STAR is a token-heavy system and in order to understand the feasibility and implications of this, we followed a systematic approach for back-of-the envelope estimation, grounded in facts, assumptions, and enforced limits.

We estimated the number of tokens using OpenAI’s GPT-4o tokenizer. For this, we took into account any payload sent as context to the models. This included fixed-length prompts for system prompts and the chain-of-prompts, as well as prompts the length of which depends on previous responses. To control the number of tokens we capped each response to 4k tokens. This number was then used for estimation purposes.

Based on this analysis and the relatively static nature of the system, we concluded that we can accommodate the context window size. Note that this differs between models and has been increasing over time.

Datadog and generative AI proxy limits

Both Datadog and Expedia’s Generative AI proxy have rate limiting in place. Even though the scale is still small and the number of metrics per workflow is fixed, we accommodate these limitations using common resiliency patterns, while also leveraging asynchronous operations and batch processing.

Architectural evolution

This service is mostly I/O bound, but we still have synchronous operations. Each analysis is independent, yet we need to provide a response on the status of the analysis to the user. For this, we initially used certain features from FastAPI such as async/await and background tasks. As part of scaling up, we moved to Celery with Redis acting as the broker and result backend to store the state and results of tasks. This architecture aligns with STAR’s request-response flow, and we don’t need a streaming platform like Kafka, at least for now.

Use cases

Numerous use cases could emerge for such a system. Below is a summary of how we have utilized STAR so far.

Incident investigation

The primary use case and the rationale behind the design of STAR. Our objective with this service was to minimize the time to know (TTK) and time to recover (TTR). By enabling rapid analysis of observability data and evaluation of hypotheses, this service proved to be a valuable time-saving tool. We applied STAR to several services that experienced outages.

Post-incident root cause analysis

Following an incident, teams file a ticket for post-incident review. By running STAR for the affected service(s) and the time-window of the incident, we can provide an initial analysis. This can then be reviewed and supplemented by human expertise.

Troubleshooting

Engineers spend a significant amount of time troubleshooting systems. Over time, Expedia’s reliability engineering group has documented troubleshooting steps in the company’s internal reliability hub. A logical step was to implement guides relying on metric data as workflows in STAR.

Our first addition was the process our engineers normally follow for troubleshooting container restarts in our Kubernetes-based compute platform. An indicative analysis result is available at https://gist.github.com/nikos912000/1e489021b406f682d70c14f3ebbad917.

Performance optimization

This is a recent use case that we are still evaluating. An Expedia service faced an issue where the JVM memory heap usage would suddenly spike. Such occurrences can be problematic; while container restarts can temporarily mitigate them, they expose long-standing issues that may lead to incidents.

Running STAR for this service provided a valuable analysis which was then reviewed and taken forward by the owners of the service.

Failure injection recommendation and analysis

Another idea involves recommending failures to inject and analyzing the impact of injected failures utilizing Expedia’s chaos engineering platform. When we developed this platform, we lacked a mechanism for the automatic evaluation of experimental results. STAR could serve as a complementary tool to this platform.

Evaluation

We are still in the early stages of evaluating the system. Given the complexity of this domain, we mostly rely on qualitative human assessment which includes subject matter experts (SMEs) and users. We also use Langfuse for prompt management, evaluation, and tracing. The results so far have been promising.

Next steps

As we iterate through this early prototype, our emphasis is on identifying high-leverage use cases, improving testing and evaluation, and adapting to this rapidly evolving field. As it was mentioned earlier, this is still a static system rather than a sophisticated multi-agent architecture, lacking core elements of context engineering. It may benefit from tool use through MCP servers and from additional context such as service documentation, metadata, or the dependency graph of the targeted service. In the future we could also expose a conversational interface.

The initial concept and implementation of this project were proposed by Lasantha Kularatne following discussions regarding recent advancements in the field of Artificial Intelligence and their application at the intersection of software and systems engineering. Also thanks to Sundeep Bhatia, Rahul Gupta, Gianpi Colonna, and Kaushik Patel.

Learn about life at Expedia Group™

Expedia’s Service Telemetry Analyzer was originally published in Expedia Group Technology on Medium, where people are continuing the conversation by highlighting and responding to this story.

Reimagining Platform Engineering for an Agentic Future

Rick Fast — Tue, 07 Apr 2026 12:50:31 GMT

Expedia Group Technology — Engineering

When your platform’s next user isn’t human

Photo by Alex Vasey on Unsplash

Earlier this month I hosted a town hall for Expedia Group™ Platform Engineering organization, focused on the rapid progress happening in the agentic coding space, and what it means for us as engineers and as a platform team.

Our teams are responsible for the horizontal foundations that power Expedia Group: AI and analytics, data, user experience platforms, edge and API platforms, cloud and infrastructure, as well as EG’s developer experience.

In other words, we own the “platform of platforms” that thousands of engineers build on every day.

Since late last year, with the arrival of models like Opus and modern deep agent harnesses, we’ve been riding a pretty intense wave of change. Larger context windows and more capable agents have made several things clear. A huge amount of what we call “engineering work” can now be done by agents. The real leverage for humans is shifting toward product thinking, architecture, system design, and context. Our platforms, which were designed for humans, are not yet ready to support agents as a distinct user group.

This post is about what it means to run a large‑scale platform organization in that world, and how we’re retooling our stack, our interfaces, and even our mental models to support both humans and agents at the same time.

The change curve for senior engineers

For many engineers, especially those who’ve been in the industry for decades, this isn’t just a new toolchain; it’s a new inner loop. We’re asking people to delegate more of the “typing” to agents, spend more time on what we’re building and how it fits into the larger system, and learn how to collaborate with agents as teammates, not just as autocomplete.

That would be hard enough in a greenfield startup. In a company that runs a large chunk of the online travel ecosystem, it’s even harder. We still must keep the planes in the air: keep sites up, pipelines flowing, and revenue‑critical systems healthy.

Finding the time and headspace to change how you work while you’re doing all of that is non‑trivial.

One experiment we ran to break through that inertia was something we called Ralphathon, a “no‑coding‑allowed” hackathon designed to encourage people to explore what these tools could do without falling back to old patterns. The goal wasn’t to build production‑ready projects; it was to spark curiosity, lower the barrier to experimentation, and give people a safe space to rethink their workflows.

It worked well enough to confirm a hypothesis: you can’t just tell people to change their inner loop. You have to create the right environment and incentives so they can feel the new way of working.

Double duty: Transforming ourselves and the platform

Platform engineering is on double duty.

On the one hand, we’re going through the same personal and professional shift as every other engineer: How do these tools change our careers? How do we work more efficiently, productively, and creatively? What does “senior engineer” mean in a world where agents can write most of the code?

On the other hand, we’re responsible for a platform that was originally built for humans. Expedia Group has been around for decades. Our architectural decisions were optimized around human constraints.

Take microservices. We didn’t split everything into tiny services because it was faster or cheaper. We did it so:

Teams could deploy independently,
Work could be done autonomously, and
We didn’t have thousands of people all crowding into a single repo and stepping on each other’s toes.

That all made sense in a human‑only world. But in an agentic world, some of those same decisions become friction: CI/CD flows that assume a human is in the loop, SDKs and libraries optimized for human ergonomics rather than agent ergonomics, and UIs that are the only way to perform certain critical operations. Our job now is to fast‑forward the platform beyond “SDKs + UIs + some APIs for humans” into something that works just as well, if not better, for agents.

And we must do that while supporting thousands of engineers who are also trying to navigate the same transition.

How to Expedia, for agents

A concrete example: our design system.

We’ve invested heavily in component libraries across iOS, Android, and web. They allow teams to build reusable modules without worrying about which consumer or B2B brands they’re targeting. The theming, styling, and look‑and‑feel are all handled by the platform. That’s a core capability for a multi‑brand travel platform, and it’s not going away.

But over time, we layered proprietary abstractions on top of standard tech: custom rendering frameworks on top of React, patterns and helpers that are unique to us, and other undifferentiated technology that only exists inside our walls.

Humans can learn those abstractions. Agents, on the other hand, were trained on the open ecosystem. They “just know” things like Vite, Tailwind CSS, and popular off‑the‑shelf UI libraries and patterns. When we ask an agent to work inside our design system, it can do it, but it doesn’t feel natural. The path of least resistance is to reach for the tools it understands best, which might be outside the guardrails we want it to stay within.

That’s the first kind of problem we must solve: not “How do we add more abstractions?” but “How do we remove unnecessary ones so both humans and agents can work with the same primitives more easily?”

The hard part isn’t just writing code

The second category of friction shows up after the code is written.

Even with current tools, we’re seeing clear gains in how much software can be produced in a given amount of time. Agents are extremely good at generating boilerplate and glue code, translating patterns across services and stacks, and filling in implementation details once the architecture is clear.

But there’s a lot that happens between “code compiles” and “value in production”, including deploying services, running CI/CD pipelines, monitoring workloads, and debugging integration issues across a highly distributed ecosystem.

Agents are ready and willing to help here, but we haven’t always given them a clean way in.

AI finds a way (whether we like it or not)

One of the more interesting (and slightly unsettling) things we’ve seen is how creative agents can be when you don’t give them proper interfaces. People have asked agents like Claude or others to “watch my pipeline” even when there is no standardized API or integration to do that. The only official way is through a web UI.

What happens?

The agent figures out how to log in through the browser, inspects cookies and session state, then navigates the UI as a human would, just faster and more persistently.

If there’s any way to accomplish the task, the agent will try to find it. From a platform perspective, that’s a signal: if you don’t give agents ergonomic, well‑defined interfaces, they’ll route around you.

We don’t want to block agents from doing useful work. But we also don’t want our critical operations mediated through brittle browser automation held together by HTML and guesswork.

So we’ve started to optimize for agent guardrails, not agent roadblocks.

Designing for agent ergonomics

This is where agent ergonomics becomes a core design principle. Instead of assuming a human is clicking around in a UI, we’re investing in channels and surfaces that are built for agents from day one. That includes:

CLIs designed for agents, not humans
We’re building command‑line tools whose primary consumers are agents. They are:

Easy to introspect.
Consistent in their inputs and outputs.
Wired into the systems agents need to operate, including CI/CD, repo management, Kubernetes workloads, logs, and more.
If there’s a simple, well‑documented way to perform an action via a CLI, an agent will generally prefer that over trying to reverse‑engineer a browser UI.

One of our first big steps here is a new CLI we call Tarmac. It’s an agent‑centric interface that exposes CI/CD operations, repository management, workload monitoring in our Kubernetes clusters, and log exploration and related workflows. Tarmac is built around a simple premise: if an agent can see a clean, structured way to operate the platform, it won’t need to invent its own.

MCP servers and agent‑native protocols
We’ve been aggressively rolling out Model Context Protocol (MCP) servers and a registry for our core platform capabilities so agents can:

Discover what’s possible.
Call well‑defined operations.
Get structured responses instead of scraping pages.

Reducing agent friction (because it also costs money)

Even when agents can figure things out, there’s still friction: it takes more steps, burns more tokens, increases latency, and adds complexity to prompts and orchestration. Agent friction is just another form of platform friction. We’ve always cared about human ergonomics, i.e. reducing the number of manual steps it takes to ship something. Now we’re doing the same thing for agents.

One of the most effective tools we’ve found here is markdown‑based skills and agent definitions. We encode how to use platform capabilities in a format agents naturally understand by pre‑packaging the “tribal knowledge” that used to live in docs, chats, or individual brains. That lets agents discover and reuse these skills instead of rediscovering how to talk to each service from scratch.

The net result: agents spend less time fumbling around and more time doing useful work. And the cost profile of that work improves.

Internal apps as a safe testbed

Some surfaces are too critical or too complex to hand over to agents today. A full consumer‑grade Expedia or Hotels.com experience involves hard production constraints that go far beyond what a Tailwind‑powered prototype can handle.

But internal apps are a very different story. Today, we have Backstage instances running for certain developer experiences, a variety of bespoke UIs built by different platform teams, and a lot of “pretty good” tools that, collectively, still add up to “too many interfaces.” From a builder’s perspective, this means context‑switching between a lot of different places to get anything done.

This is where agents like Claude shine. They’re very good at stitching together internal APIs, they can generate credible UIs quickly when the risk profile is lower, and ignoring which team owns which service. They only care about the underlying concepts.

We’ve started to lean into that by building a new internal experience we call Koda. Koda lives in a monorepo dedicated to agent‑built internal apps and focuses on context engineering, not on building yet another monolithic platform. It lets agents discover existing APIs, simplify authentication flows on behalf of users, and assemble a developer console organized around what developers care about, rather than around our org chart.

Our role, as humans, shifts from “build the whole thing by hand” to “define the principles and guardrails that agents use to construct and reconstruct these UIs from the ground up.”

Where we’re heading, quickly

Put all of this together and you get a picture of where we’re aiming over the next year.

1. Keep the core capabilities battle‑tested

We’re not trying to rebuild our foundations from scratch. Our core capabilities are differentiators: Deploying and running large‑scale applications, performing analytics over massive datasets, running both streaming and batch data workloads at scale, and powering high‑quality user experiences across hundreds of tenants and brands. Those components need to remain robust, reliable, and boring in the best possible way.

2. Radically change the surface

Where we do want radical change is at the surface area. We want to make APIs and auth consistent across the platform, ensure we have agent‑native channels such as CLIs like Tarmac, MCP servers, and other structured interfaces designed for automation, and package skills and agent markdown in a holistic way so builders, human or agent, can easily discover and use the platform’s capabilities. If you squint, the goal is simple: make it just as natural for an agent to operate our platform as it is for a human today.

3. Embrace new failure modes

As more agents interact with our systems, we’re going to discover new failure modes, edge cases, and new ways systems can be misused or overused. Instead of fearing that, we’re treating it as training data for the platform: When we see agents hacking around UIs, we respond by adding better agent interfaces. When we see complex, brittle flows, we respond by simplifying contracts and tightening guarantees. When we see friction, we respond by improving ergonomics, for both humans and agents. Over time, that feedback loop should give us more consistent interactions and more resilient infrastructure.

Closing thoughts

If you run or work in a platform organization today, you’re probably feeling some version of this same tension: You have a ton of battle‑tested capabilities that already work at scale and a new generation of agents that are eager to help but constrained by surfaces that weren’t built for them. You also have humans who need to reinvent their inner loop while keeping the lights on. The temptation is to treat agents as a thin layer on top of your existing tools, a smarter autocomplete or a chat window bolted onto a legacy UI.

My view is that agents are a new class of user, and we need to treat them that way with dedicated interfaces, real ergonomics, and the same care we would give a human user operating at scale.

At Expedia Group, we’re trying to meet that moment by doing double duty: reshaping how we as engineers work and reshaping the platform that our humans and our agents will share.

We’re early in that journey, but one thing is already clear: if you don’t design for agents explicitly, they’ll find their own way in anyway. Better to invite them in through the front door.

Learn about life at Expedia Group

Reimagining Platform Engineering for an Agentic Future was originally published in Expedia Group Technology on Medium, where people are continuing the conversation by highlighting and responding to this story.

Operating Trino at Scale With Trino Gateway

Prakhar Sapre — Tue, 24 Mar 2026 12:01:00 GMT

Expedia Group Technology — Data

Workload‑aware routing for Trino

Photo by Joseph Barrientos on Unsplash

Trino — a fork of PrestoSQL — is a powerful tool in modern data analytics, enabling organizations to query large datasets quickly and efficiently. As a distributed SQL query engine, Trino provides fast, scalable insights without requiring data relocation. While Trino is robust on its own, its capabilities are further enhanced when paired with a Gateway, which introduces features such as query routing, strong security, and streamlined cluster management.

A brief overview

The Gateway project originated at Lyft as Presto Gateway, serving as a proxy and load balancer for PrestoDB. It was later forked and integrated into the Trino ecosystem, with contributions from various organizations and the open-source community. The Gateway serves as a central point for managing and routing queries, providing a unified interface for users and administrators.

As organizations scale their analytics platforms, they often encounter challenges such as increased query complexity, higher concurrency, and the need for specialized cluster configurations. Directing users to specific cluster endpoints becomes impractical as the user base grows. A Gateway addresses these challenges by routing queries to the most appropriate clusters based on workload, improving efficiency and responsiveness.

The Gateway acts as a vital intermediary between users and the Trino query engine. By abstracting the complexities of distributed query execution, it manages critical functions such as routing, authentication, and load balancing across diverse backend clusters. This ensures that queries are efficiently directed to the optimal processing cluster. With an intuitive user interface, the Gateway transforms what was once a convoluted process into a manageable and transparent experience empowering administrators with real-time insights and precise control over their backend cluster infrastructure. Whether it’s monitoring cluster health, usage statistics, managing query history, viewing and modifying routing rules on the fly, or managing backend clusters, the Gateway is engineered to simplify operations while enhancing system performance.

Advantages of using Trino Gateway:

Use of a single connection URL for client tool users with workload distribution across multiple Trino clusters.
Automatic routing of queries to dedicated Trino clusters for specific workloads or specific queries and data sources.
No-downtime upgrades for Trino clusters behind the Gateway in a blue/green model or canary deployment model.
Transparent change of capacity of Trino clusters without user interruptions.

Example Trino Gateway architecture in Trino ecosystem

Example Cluster types and workload segregation

A common pattern for organizations using Trino at scale is to manage a fleet of clusters across different environments. These clusters can be categorized into shared and team-specific clusters, or by workload type. Typical cluster categories include:

Adhoc Clusters: Often used to handle a variety of query complexities and data volumes with medium concurrency. These clusters are versatile, supporting mixed workloads that range from simple to moderately complex queries. They provide a balanced environment for exploratory data analysis and development tasks.
ETL Clusters: Optimized for high-volume, highly complex queries with low concurrency. These clusters are fine-tuned for heavy data processing tasks such as data integration, transformation, cleansing, and enrichment. The primary goal is to prepare optimized datasets for downstream consumption.
BI Clusters: Tailored for low-complexity queries with high concurrency. These clusters may be configured to refresh pre-aggregated data and support BI tools like Tableau and Looker. High concurrency ensures that multiple users can access dashboards and reports simultaneously without performance issues.

This division ensures that each team or workload can optimize cluster configurations for their specific needs.

Example Use Cases for the Gateway

The following examples illustrate how organizations might use gateway based routing in general analytics environments.

Separating Large Table Queries
A common challenge is that queries against massive tables can cause significant delays for smaller queries on shared clusters. By routing large table queries to specialized clusters, organizations can ensure that smaller queries execute without being queued behind resource-intensive operations. This segregation can significantly improve response times for most users.
Separating Metadata Queries
Metadata queries such as select version() or show catalogs are frequently run by BI tools to check cluster health. Failures in these queries can lead to subsequent extract failures. By implementing routing rules that direct metadata queries to a lightweight, single-node Trino cluster, organizations can reduce extract failure rates and improve dashboard load times. This also allows for more effective application of user-level limits on concurrency and memory, leading to fewer errors and better overall performance.
Routing BI Tool Queries
BI tool queries can sometimes end up on clusters not optimized for BI workloads. By using routing rules to divert these queries to dedicated BI clusters, organizations can optimize performance without requiring users to change their configurations.

Routing Rules

Routing rules are at the heart of the Trino Gateway’s ability to direct queries to the most suitable backend clusters. These rules are written to inspect incoming queries and determine the best cluster for execution, based on factors like the tables being queried, the type of query, or the source application. Below are some practical examples that illustrate how routing rules can be used to optimize performance and reliability.

1. Routing Large Table Queries

When users run queries against very large tables, these operations can consume significant resources and slow down other queries on shared clusters. To prevent this, we can create a routing rule that detects when a query targets specific large tables and redirects it to a cluster optimized for heavy workloads.

name: "large-table-query" 
description: "Route queries for large tables" 
actions:   
- |     
  foreach (table : trinoQueryProperties.getTables())     
  {       
    String tableSuffix = table.getSuffix();       
    if (tableSuffix.contains("table1") || tableSuffix.contains("table2")) 
    {         
      result.put("routingGroup", "large-cluster");         
      return;       
    }      
  } 
condition: "true"

2. Routing Metadata Queries

Metadata queries are lightweight queries that retrieve information about the database itself, such as select version() or show catalogs. These are often run by BI tools to check the health of the cluster before loading dashboards. If these queries fail or are delayed, it can cause subsequent data extracts to fail as well. We can create a routing rule to detect such queries and route them to a metadata cluster.

name: "metadata-queries" 
description: "Routes any select version or select 1 type of queries to the metadata cluster" 
condition: "true" 
actions:   
- |     
  if (trinoQueryProperties.getBody().toLowerCase().contains("select version()") || trinoQueryProperties.getBody().toLowerCase().contains("show catalogs")) 
  {       
    result.put("routingGroup", "metadata-cluster");       
    return;      
  }

3. Routing Queries from BI Tools

Sometimes, users may inadvertently run BI tool queries (from applications like Tableau or Looker) on clusters that are not optimized for BI workloads. To address this, we can create a rule that detects the source of the query and routes it to the appropriate BI cluster.

name: "trino-gateway-bi" 
description: "Route queries coming from BI tools to Gateway BI instance" 
condition: 'request.getHeader("X-Trino-Source") contains "Tableau" || request.getHeader("X-Trino-Source") contains "Looker"' 
actions:   - 'result.put("routingGroup", "gateway-bi")'

Display routing rules and modify them through the UI

The Gateway’s routing rules are critical for directing queries to the appropriate Trino clusters. Previously, managing these rules often involved directly editing configuration files, a process prone to errors and requiring technical expertise. Also, there was no way for someone to see what rules are configured, rule inspection was required reviewing configuration files or examining the gateway environment directly. Both of these ways were not very user friendly. This made it challenging for administrators to quickly adapt routing rules to changing workloads or cluster conditions. The routing rules are the crux of the gateway since it depends on them to route the queries to the right backend cluster and it’s very important for them to be easily visible to the admins.

We have contributed a user-friendly way to display and edit routing rules directly within the Gateway’s UI. Administrators can now easily view existing rules, modify them as needed, and save the changes. These changes are persisted when a shared storage is used, ensuring that the routing rules are always up-to-date. This eliminates the need for manual configuration file editing, reducing the risk of errors and empowering administrators to easily manage routing based on their specific requirements.

Routing rules

Show and modify routing rules from the UI

Add source filter to the history page

Analyzing query history is essential for understanding and tracking usage patterns. However, before our contribution, filtering this history based on the source of the query (e.g., the application that initiated it) was a cumbersome process. This made it difficult to pinpoint queries originating from specific applications or users, hindering debugging and analysis efforts.

To address this, we implemented a source filter directly on the Gateway’s history page. This new filter allows users to easily select and filter queries based on the source client. Now, administrators and developers can quickly isolate queries from specific applications, making it much easier to diagnose issues, understand query patterns, and optimize performance. This simple addition significantly streamlines the process of analyzing query history.

Source filter

Add Source filter to History page

Display health of the cluster

Monitoring the health of the backend Trino clusters is paramount for ensuring optimal performance and stability. Previously, the Gateway only provided a simple active/inactive switch for each cluster, offering limited insight into the actual health status. Previously, this only allowed you to activate or deactivate cluster health checks, but did not indicate whether the cluster was ready to accept queries. This made it difficult to proactively identify potential issues or understand the overall performance of the backend clusters.

To address this, we implemented a more comprehensive cluster health display on the cluster page of the Gateway UI. Administrators can now get a quick view of the health of each backend cluster, enabling them to proactively identify potential problems, optimize resource allocation, and ensure the smooth operation of their Trino environment.

HEALTHY — A Trino cluster shows this state when health-checks report the cluster as healthy and ready. RoutingManager only routes requests to healthy clusters.

UNHEALTHY — A Trino cluster shows this state when health-checks report the cluster as unhealthy. RoutingManager does not route requests to unhealthy clusters.

PENDING — A Trino cluster shows this state when it is still starting up. It is treated as unhealthy by RoutingManager, and therefore requests are not routed to these clusters.

Cluster status

Display cluster health status on cluster page

Show query text in a separate window

Previously, the Gateway truncated query text to 200 characters for storage and display, forcing users to visit the originating cluster UI to view full queries. To make this more user friendly we added a feature to remove the 200 character limit as well as open the entire query text in a separate window. This allows users or admins to check all their queries on the Gateway side instead of going to the cluster UI always. This also improves the readability of the query.

Query text window

Open query text in a separate window

Conclusion

The contributions above represent efforts to make the Gateway more user-friendly and powerful. By simplifying tasks like filtering query history, managing routing rules, and monitoring cluster health, these enhancements empower Trino users to focus on their data analysis tasks rather than wrestling with complex configurations. This work provided valuable learning opportunities while collaborating with the open-source community. We are excited about our contributions and will continue to add value to the project so that everyone can benefit from it. The project is available on https://github.com/trinodb/trino-gateway.

Operating Trino at Scale With Trino Gateway was originally published in Expedia Group Technology on Medium, where people are continuing the conversation by highlighting and responding to this story.

Managing Technical Debt: Building Habits for Long-Term Agility

Rafael Torres — Tue, 03 Mar 2026 12:01:01 GMT

Expedia Group Technology — Engineering

A lightweight framework for balancing speed today with agility tomorrow

Photo by Benjamín Gremler on Unsplash

As engineers, we’re often encouraged to “move fast” — and for good reason. Delivering value quickly is critical. But running fast usually means taking shortcuts: skipping tests, hardcoding a value, bending an abstraction a bit too far. Individually, these tradeoffs feel small. Collectively, over time, they pile up into something heavier: technical debt.

If left unpaid, technical debt will eventually slow us down more than those shortcuts ever sped us up. The question isn’t if we’ll incur debt, but how we’ll manage it.

From “Pain-Driven Development” to today

Years ago, I worked with a team that used the term pain-driven development well before I ever saw it mentioned elsewhere. The idea was simple: anytime you hit a painful development experience — slow builds, brittle tests, awkward APIs, manual deployment steps — write it down. Back then, we’d jot these on a whiteboard. Towards the end of each sprint, after delivering our commitments, we’d look at that board, pick the top pain points, and fix them. This practice not only surfaced technical debt early, before it became costly to address, but also revealed opportunities to improve our tools and processes in ways we might not have noticed otherwise.

This ritual created space for re-engineering and innovation:

Refactoring solutions for greater maintainability and long-term flexibility.
Early automation scripts that grew into full build and deployment pipelines, long before CI/CD became standard practice.
Visualization dashboards that broadcast the status of multiple projects on team screens.

The lesson: pain is a signal. Paying attention to it can turn frustration into progress.

How we manage tech debt today

On our current projects, we’ve carried forward the same spirit — but with more structure:

Treat tech-debt items as legitimate priorities. Using a dedicated backlog we track tech debt items in their own Epic. Whenever a pain point shows up, we log it there instead of ignoring it.
Monthly backlog review. Once a month, the team revisits these items: reprioritizing, refining, and estimating them. This ensures we don’t lose sight of problems we’ve already felt.
Consistent sprint allocation. Every sprint, we reserve 10–15% of capacity for addressing high-priority debt items. Your percentage may vary, but the cadence matters more than the number. Habits come from routines — and addressing tech debt should be a habit.

This practice creates a healthy rhythm: we can still move fast, but we don’t let the pile grow unchecked.

The Tech Debt Management Loop

Allocating sprint capacity

To make this concrete: in sprint planning, we don’t just look at the feature backlog. We also treat our tech-debt backlog as an equally important queue. By consistently pulling the top priority items from both with most capacity on features and a steady slice on debt, we ensure product work moves forward while we steadily reduce engineering friction.

Sprint commitment = Top priorities from main backlog (~90%) + tech-debt backlog (~10)

Whose responsibility is it?

It’s worth calling this out explicitly: managing technical debt is engineering’s responsibility.

Product, or any other stakeholders, won’t (and shouldn’t) tell us to fix flaky tests or remove a dependency we’ve outgrown. If we don’t negotiate the time and discipline to address these items, nobody else will.

Think of it like a financial loan: paying off debt requires regular installments. Skipping them might not hurt immediately, but eventually the interest compounds. Likewise, even small, steady payments on technical debt keep the codebase healthy, adaptable, and fun to work with.

Why this matters

Left alone, technical debt doesn’t just slow us down — it erodes morale. Developers stop enjoying the work. Velocity drops. Quality suffers.

But when a team normalizes debt repayment as part of their routine, two powerful things happen:

The codebase stays flexible. We can respond to change without dread.
The team stays empowered. Engineers know they have agency to improve the environment they work in.

That combination — adaptability and empowerment — is exactly what lets us build software that lasts.

What you and your team can do

If your team doesn’t already have a habit of managing tech debt, consider starting small:

Create a dedicated backlog for pain points.
Review it regularly.
Allocate a slice of each sprint (no matter how small) to addressing them.

The key is consistency. Over time, those small “payments” compound into a stronger, faster, happier engineering culture.

Managing Technical Debt: Building Habits for Long-Term Agility was originally published in Expedia Group Technology on Medium, where people are continuing the conversation by highlighting and responding to this story.

Interleaving for Accelerated Testing

Benjamin Stieger — Tue, 17 Feb 2026 12:01:01 GMT

Expedia Group Technology — Data

Quickly identifying winning ranking models before committing to A/B tests

Authors: Adam Woznica, Benjamin Stieger, and Stefania Ebli

Photo by Il Vagabiondo on Unsplash

Expedia Group™ covers a portfolio of brands such as Expedia.com, Hotels.com, and Vrbo, that power lodging searches for millions of travel shoppers every day. In this competitive market matching users to hotel inventory is crucial, as users tend to quickly switch from website to website. As a result, having the best lodging recommendation and ranking gives the best chance of winning a sale.

The lodging ranking system at Expedia Group consists of three main components:

Selection [1]
Relevance ranking [2]
Re-ranking with business adjustments [3]

These three steps are optimized to maximize conversion rate (CVR) and several business metrics. Typically, all changes in the ranking stack — whether a new ranking algorithm, a new business adjustment, new hyperparameters, or an infrastructure improvement — are A/B tested to assess the impact on the business metrics.

The limitations of classic A/B tests

A key limitation of A/B testing is its requirement for large sample sizes to maintain control over significance levels, test power, and the minimum relative uplift of the business metric. This becomes especially challenging when evaluating conversion-focused metrics, as these tend to be low, reflecting the relatively infrequent nature of lodging transactions.

As an example, Figure 1 visually illustrates the results of a statistical power analysis [4] for a two-sample, two-sided test for proportions. The analysis considers various base conversion rates, minimum CVR uplifts, and typical values for test power and significance levels (80% and 10%, respectively). For instance, with a base CVR of 3% and a relative uplift of 0.3%, a sample size of approximately 92 million users (combined across both variants) is required. Depending on the test scope — such as restricting the sample to users eligible for personalization — this can lead to tests lasting several months. Such extended durations significantly constrain the practicality of A/B testing and limit experimental bandwidth. Moreover, running long A/B tests can introduce several technical challenges, including sample pollution (e.g., due to expired cookies, device switching, or multiple account logins), shifts in user behavior as they adapt to the experiment, and temporal effects driven by external factors such as seasonality, competitor actions, or market trends.

Figure 1. Power analysis results for two-sided, two-sample tests of proportions.

Interleaving: a more sensitive alternative

In recommender systems, a well-established alternative to A/B experiments is interleaving testing [5–7]. This approach involves “blending” recommendations from different ranking variants and presenting a single interleaved result to the user, as illustrated in Figure 2. Because each user is simultaneously exposed to both variants, interleaving tests are considered within-subject experiments.

This design is a key reason why interleaving tests are significantly more sensitive than traditional A/B tests, with reported sensitivity increases ranging from 10x to 100x, depending on the source [5,8].

Figure 2. Schematic view of A/B testing vs Interleaving testing.

Our experimentation funnel

In recent years, interleaving testing has become a standard tool for lodging ranking systems at Expedia Group. The machine learning science team at Expedia Group leverages internal tools designed to:

streamline the process of launching new interleaving tests,
running multiple tests in parallel,
monitoring variant performance, and
generating final readouts.

Similar to other e-commerce companies that use interleaving as an experimentation methodology [7–8], we employ it as a pruning strategy to identify treatments for A/B testing. Among the various potential treatment variants:

We use backtesting to identify those that improve the target metric (typically NDCG [9] or revenue-focused metrics).
These promising ideas are then evaluated through interleaving tests.
The top-performing variants selected for A/B testing.

This process is schematically represented in Figure 3.

Figure 3. Schematical view of the experimentation approach. The size of each box does not reflect the cardinality of treatment variants.

Events, attribution and metrics

Unlike standard search engine use cases, to determine which ranking in the interleaved variants performs “better”, we assign two types of events to the elements in the interleaved lists:

Property detail page views (i.e. clicks)
Booking transactions (bookings)

We decided to track the above two events separately as it improves our understanding of the impact of rankings to both conversion and click-through rates. We attribute clicks and bookings to the source rankings to estimate relative preference across variants.

After attributing events to impression lists, we assign to each search a winning variant, i.e. the ranking which was assigned a higher number of clicks (or bookings); ties are also possible, e.g., when the two variants have the same number of attributed events.

To quantify the preference signal we define a “lift” metric that compares the number of searches where each variant wins, normalized to account for the number of ties. It is worth noting that the results do not strongly depend on the normalization method.

The lift metric equals 0 when A and B win an equal number of times, indicating no user preference between the two rankings. It is positive when more users prefer variant A and negative when more users prefer variant B. Note that this metric captures the direction of user preference in clicks (or bookings) for either ranking A or B, rather than the absolute magnitude of the uplift in conversion rate, as typically measured in an A/B test.

An equivalent lift metric can also be defined aggregated at the user level such that it compares the number of users who preferred one ranking over the alternative and ties account for the number of users without preferences. Reporting results on the user level is the default approach in our interleaving readouts.

Significance testing

A common approach in interleaving to assess whether the selected metric is significantly different from 0, beyond random chance, is to conduct significance testing using bootstrapping. More precisely, we can perform the bootstrapping percentile method to compute confidence intervals for point estimates. While this method works well as it doesn’t make any assumptions on the underlying data, it’s slow in practice even if implemented in a distributed fashion.

A faster alternative to determine confidence intervals for the observed metrics is to perform a t-test for the mean value being different than 0 of a distribution of winning indicators:

1 = win
-1 = loss
0 = tie

This approach yields virtually the same results as the bootstrapping approach but is considerably faster. The process of attributing events, computing the metric, and performing significance testing is graphically depicted in Figure 4.

Figure 4. Schematic view of computing the interleaving metrics.

Sensitivity increase

Similar to other e-commerce platforms, we observed a significant increase in sensitivity when using interleaving testing. To demonstrate this, we compared the confidence intervals obtained from A/B testing and interleaving testing for two representative lodging ranking treatments that are expected to lead to deteriorated rankings:

Pinning a random property to a slot between positions 5 and 10.
Randomly reshuffling a number of top slots.

Intuitively, the impact of the first treatment is minimal, making it challenging to detect in a standard A/B test. In Figure 5, we present the confidence intervals for the lift metric as determined in an interleaving test compared to the conventional CVR uplift as measured in a traditional A/B test, as a function of sample size. We can make two main observations:

Interleaving is significantly more sensitive than A/B testing and correctly detects the negative effects both of random pinning and reshuffling within a few days of data taking. A/B testing fails to detect the negative effect of random pinning even with the full sample size.
For the deteriorating treatments under test, click events are more sensitive than booking events and show a statistically significant negative result already after the first day of data taking.

Figure 5. Comparison of lift confidence intervals for A/B and Interleaving tests from illustrative test data on two representative ranking treatments. The lift metric used for A/B tests is the standard uplift metric on CVR. The figure layout is as follows: The two columns correspond to booking and click events, respectively. The first row shows the effect of randomly pinning a property in a position between 5 and 10; the second row shows the effect of randomly reshuffling the top slots. The x-axis indicates the sample size, while the y-axis represents the uplift metric (with negative values indicating the variant performs worse than the control). Different colors are used to distinguish between testing approaches. Note that the A/B test sample size requires twice as much time to accumulate as the population needs to be split into two buckets.

Challenges and limitations

Interleaving is a powerful tool that allows for quick exploration of new ranking models, finely segmented test readouts, and highly sensitive pre-deployment no-harm testing. However, some challenges arise, particularly related to the proper sizing of interleaving tests, interpretation of test results and assumption on the items’ independence.

Sizing interleaving tests in business terms

Traditional A/B tests provide measurements of the absolute value of relevant business metrics such as CVR or revenue focused metrics and allow sizing the tests in terms of a minimum detectable effect (MDE) on those same metrics. We thus know exactly how much data to collect for a desired effect size, test power, and confidence level. Interleaving, on the other hand, measures the relative user preference between two rankers, in terms of user interactions (clicks, bookings, etc.) with the search results. How those metrics relate to the business metrics is not pre-determined and can depend on the treatment being tested. Sizing interleaving tests in terms of effects on business metrics therefore requires to first establish the relationship between these metrics heuristically by carrying out a series of companion A/B and interleaving tests.

Interpreting interleaving results

For the same reason that sizing interleaving tests is challenging, interpreting the interleaving test results themselves isn’t straightforward. Users may well show a preference by booking items ranked higher by a new model during the interleaving test but not show a significantly different CVR or GP during the subsequent A/B test. Furthermore, interleaving is entirely blind to non-ranking differences between models, such as latency improvements.

Independence assumption

Another limitation of the interleaving methodology is its assumption that ranked items are independent. This becomes problematic for tests where effects occur at the page level, such as balancing impressions across different item categories (e.g., organic conventional items, sponsored listings, and vacation rentals). By breaking this assumption, interleaving can lead to inaccurate conclusions.

Running interleaving in production

Running interleaving tests in a dynamic production environment without interfering with either the test result or the user experience can be challenging. In practice, we either:

Confine tests are confined to an isolated element of the overall ranking stack, or
Implement the interleaving framework carefully so that both variants are fairly represented in the final ranking.

Conclusions

Interleaving is a powerful alternative to A/B testing for evaluating ranking systems and has become an essential part of our ranking experimentation process at Expedia Group. It acts as an efficient method for assessing ranking quality before moving forward with A/B tests. In this blog post, we delved into the details of the attribution logic, metrics, and significance testing involved in interleaving. Additionally, we emphasized the importance of understanding and addressing the limitations of interleaving when applying it in practice.

References

[1] Choosing the Right Candidates for Lodging Ranking, Adam Woznica and Meli Sedghi, Medium blog post

[2] Channel-Smart Property Search: How Expedia Tailors Rankings for You, Anne Morvan, Medium blog post

[3] The Juggler Model: Balancing Expectations in Lodging Rankings, Tiago Cunha, Medium blog post

[4] Power analysis in Statistics with R, blog post

[5] Large-scale validation and analysis of interleaved search evaluation, Olivier Chapelle and Thorsten Joachims and Filip Radlinski and Tisong Yue, ACM Trans. Inf. Syst., 2012

[6] Debiased balanced interleaving at Amazon Search, Nan Bi and Pablo Castells and Daniel Gilbert and Slava Galperin and Patrick Tardif and Sachin Ahuja, 2022

[7] Innovating Faster on Personalization Algorithms at Netflix Using Interleaving, Joshua Parks and Juliette Aurisset and Michael Ramm, Medium blog post

[8] Beyond A/B Test : Speeding up Airbnb Search Ranking Experimentation through Interleaving, Qing Zhang and Michelle Du and Reid Andersen and Liwei He, Medium blog post

[9] https://en.wikipedia.org/wiki/Discounted_cumulative_gain

Interleaving for Accelerated Testing was originally published in Expedia Group Technology on Medium, where people are continuing the conversation by highlighting and responding to this story.

The Quant Crossroads: UX Research or Data Science

Alyssa White, PhD — Tue, 27 Jan 2026 12:03:00 GMT

Expedia Group Technology — Data

Two roles one goal — understanding users better

By Sophie Rabet and Alyssa White

Photo by Samsung Memory US on Unsplash

Quantitative User Experience (UX) Research, as a discipline, is growing rapidly. Quant UX Con 2022, the first ever general industry conference for the discipline, was organized with the expectation of about 200 attendees. After registration exceeded 2000, the organizers were shocked to find simply how many people felt an alignment with what has historically been thought of as a niche discipline. Kerry Rodden, who gave the keynote speech at this conference, started by sharing her original job posting from 2006; the first “Quantitative UX Researcher” ever hired at Google. Fast forward to 2008 and LinkedIn coined the term “data scientist”.

While both job titles remain in their teenage years, the discipline of Data Science and the role of a data scientist has grown tremendously in the last 20 years, with no sign of stopping. In the United States, the Bureau of Labor Statistics projects a 36% employment growth for data scientists from 2023 to 2033, which is roughly 73k jobs.

As both of these disciplines continue to grow, it helps to break down similarities and differences between them to help inform thinking around job growth and desired skill sets.

The Basics

The most underlying and foundational similarity between disciplines is connection to large datasets. Which datasets and the ways of working, however, are where the paths diverge.

QUXRs (Quantitative UX Researchers) have a very long job title for a reason: each word is an important part of the skills required to do the job. They are, first and foremost, researchers, which means they need to have a foundational understanding of research design skills (especially for survey design). The research they conduct is often quantitative in nature, meaning they are working with larger samples to understand differences between groups, treatments, measures, etc. To do this type of analysis, an advanced knowledge of statistics is required, and in order to access large datasets and conduct their statistics, QUXRs must have a foundational knowledge of at least one programming language.

Meanwhile, data scientists in the tech industry use deep business knowledge to apply programming skills and machine learning techniques to analyze user behavior on sites/apps, design and analyze experiments and build datasets to monitor performance.

Within these foundational pillars, there are some shared skills. Statistical knowledge is shared between disciplines, though often with specialization emerging in terms of methodologies and with a wider knowledge of statistics in a QUXR role than in typical Data Science roles. While QUXRs will need to possess basic programming knowledge, Data Scientists will be much more advanced in this regard. Different languages are typically expected with QUXR roles focusing on statistics-oriented languages such as R, whilst Data Scientists focus more on data pulling languages such as SQL and modelling languages like Python.

Of course, within any company and team the exact expectations and roles of a QUXR and data scientist will vary. However, the crux of the difference between the two is that while both disciplines are aiming to gain insights necessary to optimize a product, data science aims to understand and predict what users are doing whilst QUXR aims to understand why they are doing it. Having these disciplines working together can provide product and design teams with a full picture of the customer journey.

A table describing job differences between a Data Scientist and Quantitative UX Researcher

Charting your path

While there are differences between our roles, that’s not to say that researchers can’t make fabulous data scientists, or vice versa. It’s all about identifying the skills that speak most to you and pursuing mastery to help inform the path you take. As stated previously, foundational knowledge of programming and statistics sit at the core of both roles, but assuming you have that squared away, how do you approach specialization?

Ask yourself: do you look at a dataset or piece of analysis and wonder if there’s a level of nuance you are missing? Do you find joy in thinking through website design choices and wondering how that might impact end user behavior and feelings? Do you find attitudinal data fascinating? Do you enjoy a well written survey question? If these questions speak to you, QUXR might be the right pathway.

Now try again: do you enjoy building models and exploring machine learning techniques? Do you enjoy making sense of large sets of data and competing signals? Do you enjoy acting like a detective to extrapolate the meaning behind data? Would you like to have a role that has the potential to jump between marketing, commercial and product? If so, data science could be right for you.

How we work differently

More often than not, QUXRs bridge the gap between data and design. They are asking UX oriented questions, like “how do these new features or new branding impact user behavior”, or “how can we measure the experience our users have with X feature”. In Google’s job description for a QUXR, they specifically call out “you’ll provide a UX perspective on quantitative data to help stakeholders understand their users”. This means that a QUXR needs to know more than just the ins and outs of the data; they need to understand the design choices and human computer-interaction principles that make the data what they are.

QUXRs collaborate more frequently with other research disciplines, like Market Research and of course Qualitative UX Research, in order to inspire or help further define their findings and data. For example, a Qual UXR may conduct a Kano study in order to identify must-have features in a new product space, and Quant UXR might pair that with a Max Diff in order to rank those features. Market Research may pull together a massive segmentation for new markets, and Quant UXR might survey existing users that meet these criteria to understand their pain points and challenges in order to inform future fixes. Researchers are curious by nature and will continue asking one another questions (ideally playing to strengths of other disciplines) until a true answer is found.

Data Scientists typically work hand-in-hand with their business counterparts in marketing and product. Their job often starts with exploring the data to uncover areas of opportunity: Where are customers dropping off? Which features are driving growth? How much value could the business unlock by improving a certain area? From there, they partner with product or marketing teams to shape hypotheses, design experiments, and test potential changes. Once the experiments run, data scientists dig into the results — translating the numbers into clear insights that help the business decide whether to double down, adjust, or pivot entirely.

On top of experimentation, data scientists also apply machine learning in practical ways. That could mean building models to forecast future performance, analyzing massive text datasets to extract meaning, or segmenting users into groups with similar behaviors. These models power things like propensity scoring (predicting how likely different users are to take an action), helping businesses make smarter, more personalized decisions at scale.

Data scientists, like researchers, are naturally inquisitive. They will keep probing the data until patterns emerge and a clear story begins to take shape. Whilst a stereotype of data scientists is often introversion, the best data scientists excel at explaining complex ideas simply to their audiences to ensure actions are taken.

How we work together

One of the most valuable aspects of the collaboration between Quant UXR and Data Science at Expedia is the strong and effective partnerships established between our teams. We’ve presented at conferences together, built tools together, and overall elevated one another’s work by combining forces and solving problems. Let’s break this down into a few scenarios.

Problem: We want to understand at scale how new products are impacting the quality of our experiences, whether or not they’re driving immediate financial impact.

Solution: Quant UXR devises survey based metrics aligned to AB tests. Data Science embeds these into experiments. Both teams work together to ensure best practices are upheld from both perspectives.

Problem: How much should we really invest in long term value of a great experience? Is it worth it?

Solution: Quant UXR administers a measure of experience quality. Data Science devises a causal propensity model to illustrate impact of such metrics over time. Both teams work together to illustrate tradeoffs and optimization opportunities.

In conclusion, we hope this article has been illustrative of our growing fields. We’ve included below some selected books and articles if you’re curious about Quant UXR or Data Science as a potential career path, and in either capacity we encourage you to keep learning and keep advancing your skill sets.

Books:

Quantifying the User Experience by Jeff Sauro and James Lewis

Quantitative User Experience Research by Chris Chapman and Kerry Rodden

Quant UX Blog, Chris Chapman

Figures and Frameworks, Carl Pearson

Counting Stuff, Randy Au (for thoughts of a Quant UXR who has also been a Data Scientist)

The Deep Groove, Sarah Gomillion

Dataclysm: Who We Are, Christian Rudder (if you find this interesting and keep wanting to dig into the data more, Data Science might be right for you!)

Learn about life at Expedia Group

The Quant Crossroads: UX Research or Data Science was originally published in Expedia Group Technology on Medium, where people are continuing the conversation by highlighting and responding to this story.

Powering Vector Embedding Capabilities

Manisha Sudhir — Tue, 06 Jan 2026 12:02:27 GMT

Expedia Group Technology — Data Science

Empowering developers with seamless vector embedding solutions

Photo by Daniela Cuevas on Unsplash

Introduction

Rapid advances in Machine Learning (ML), especially Generative AI, have increased the need for specialized capabilities like vector embedding similarity search. Vector embeddings are the numerical representations created by machine learning models which allow disparate inputs to be compared against each other. A similarity search can be accomplished by querying an indexed collection of vectors for items similar to a given vector. This process involves comparing the distance between the input vector — a single point in a multidimensional space — and each vector in the collection. Techniques such as k-nearest neighbors search (KNN or NNS) and approximate nearest neighbors search (ANN) are often employed to efficiently identify vectors that are most similar to the input vector.

Vector similarity search is gaining increased attention, particularly due to the growth in the use of large language models (LLMs). Many databases are now being created or modified to support vector similarity search capabilities. In this article we discuss how the Machine Learning Platform team at Expedia Group™ aims to reduce the engineering and integration effort required to quickly create and iterate on vector embedding use cases.

Why do we need a centralized vector embedding service?

In the Expedia Group ML Platform team, our mission is to make it easy for our ML community to build, deploy, and iterate on ML-powered services with reliable, standardized, and reusable tools. The platform team shoulders the responsibility of creating and operating the necessary infrastructure so that each ML team can focus on applying their core skills.

With a centralized service for storing and managing embeddings, teams can easily discover, manage, and search vector data across the organization. A centralized service can link the embedding model with the vector data to avoid mixing vectors from different models or querying vectors generated by the wrong model. Additionally, it can enforce schema contracts for each vector collection to ensure compatibility is maintained for consumers when the creator of a collection makes changes.

Having a centralized service also allows us to provide composite operations for common sequences of tasks. In short, it’s all about reducing operational overhead, improving collaboration, and making life easier for everyone building with embeddings.

What is Expedia Group’s Embedding Store Service?

The Embedding Store Service is the comprehensive solution we developed for managing, storing and querying vector embeddings at scale. Along with leveraging Feast, an open-source feature store, it extends the functionality of traditional feature store to provide support for vector embeddings. The service provides a unified platform to manage embedding data while ensuring seamless integration with existing ML workflows. Key features include:

CRUD Operations: Create, read, update, and delete embeddings efficiently.
Similarity Search: Ability to perform vector similarity searches across collections.
Metadata Filtering: Discover existing vector embeddings by filtering based on metadata attributes such as model, version, or associated service.

Figure 1: Embedding Store Service High Level Architecture

Leveraging the Feast feature store for metadata management and discoverability

The Embedding Store Service utilizes Feast to maintain metadata about the collections created in the service. This helps maintain important information about all the collections specific to embeddings and enables discoverability.

The metadata defined can include the associated service (the system or application that generates and/or consumes the embeddings) and the specific model used to produce them. This plays a vital role in organizing and managing collections, providing benefits such as the following:

Data Consistency: The collection definition guarantees that all embeddings in a collection are linked to consistent metadata, such as the model and service they are associated with. This alignment prevents mismatches between embeddings and their intended applications.
Search and Discoverability: Users can easily locate collections based on components of its metadata, such as a specific model or version, to discover existing vector embeddings or multiple versions of embeddings tailored to the same associated service.
Version Management: Multiple versions of the same dataset, tailored to different needs and scenarios, can be created based on various factors such as different embedding models or model versions, modifying the indexing algorithms to suit various use cases or modifying the schema. This flexibility allows users to maintain a clear lineage of their data, experiment with different configurations, and seamlessly adapt to evolving requirements, all while preserving the integrity and usability of their embeddings.

Feast also introduces the concepts of an “online store” and an “offline store”, which together enable efficient management of both current and historical data while supporting different types of workloads. The online store is the vector database for interactive workloads, providing performant similarity searches on the most current and relevant data. This store is optimized for real-time queries, enabling fast and efficient retrieval to support use cases like recommendation systems and semantic search.

The offline store acts as the repository for the historical dataset of a collection. It supports batch workloads such as analytical queries, experimentation, and training of new models. By maintaining a complete historical record of embeddings and their associated metadata, the offline store ensures traceability and acts as a reliable data backup.

The seamless integration between the online and offline stores allows users to restore data from the offline store to the online store whenever needed. This can be done based on various scenarios such as embeddings’ creation dates, specific time ranges, or more complex SQL queries. This flexibility ensures that data remains accessible for both real-time applications and historical analysis, providing a robust foundation for embedding workflows.

Generating and inserting embeddings from features

Once a collection is created, users can begin loading vector embeddings and associated data into it. There are three methods available for loading data, depending on the volume and generation process:

Batch Ingestion: For large volumes of embeddings generated through feature engineering processes, the Embedding Store Service provides a batch ingestion mechanism utilizing Feast materialization. This uses a Spark-based process to efficiently load data from one or more offline sources.
Insert API for Small Batches or Real-Time Data: When working with smaller batches of embeddings or handling real-time embedding generation, users can use the standard Insert API to load data directly into the service.
On-the-Fly Embedding Generation: For scenarios where embedding generation needs to be offloaded, the Embedding Store Service can generate embeddings dynamically by calling specific models to generate embeddings on the fly.

Regardless of the method chosen to load data, the service ensures that all embeddings are stored simultaneously in both the online and offline storage systems, providing robust access for various use cases.

Search capabilities

Once data is stored in a collection, similarity searches can be performed to find embeddings that are most similar to a given query vector. Vector similarity search works by calculating the distance between the query vector and the vectors in the collection, leveraging the index to return the most similar results. Indexes in vector databases are designed to speed up the process by organizing and structuring the data, avoiding the need to compare the query vector against every single vector in the database — a process that would be computationally expensive. The choice of index type depends on factors such as dataset size and the balance between speed and accuracy required.

In addition to similarity search, the Embedding Store Service also supports hybrid search, which combines vector similarity search with filtering based on additional fields in the data. This enables queries that not only find similar vectors but also apply conditions, such as “price < 100” or “category = electronics,” to refine the results. Hybrid search makes the queries smarter and more precise by combining the power of vector searches with traditional filtering.

Summary and moving forward

In this overview, we have shared the goals and capabilities of the ML Platform team’s Embedding Store Service. The advantages of the centralized service includes:

Reduced development time and acceleration of development and iteration of different ML experiences.
Standardized APIs for ease of use and rapid development of ML applications.
Discoverability and management of embeddings through seamless integration with Feast’s feature store, leveraging metadata management and collection versioning for better organization and lineage tracking.
Multiple embedding workflow support, including batch ingestion, real-time insertion, and integrated embedding generation via defined models.
Performant search capabilities, including similarity and hybrid searches.

We aim to continue to integrate new vector database developments and provide powerful capabilities through standardized APIs. In this ever-evolving space, the Embedding Store Service will continue to power vector embedding capabilities throughout Expedia Group.

Credits

Written by Manisha Sudhir & Timon Pike. We thank our peers at Expedia Group™ for feedback, review and support.

Powering Vector Embedding Capabilities was originally published in Expedia Group Technology on Medium, where people are continuing the conversation by highlighting and responding to this story.

A Functional Programming Alternative to the Strategy Pattern

Rafael Torres — Tue, 02 Dec 2025 06:32:20 GMT

Expedia Group Technology — Engineering

Exploring the strategy pattern and functional programming alternatives in Kotlin

Photo by Karsten Winegeart on Unsplash

When designing software around business processes (e.g., orchestration services), one of the key challenges is organizing business logic in a way that is maintainable, scalable, and adaptable to change. In this post, we’ll explore how to address such challenges with the Strategy Pattern (object-oriented, OO), and a Functional Programming (FP) alternative. We’ll also discuss how to handle shared logic between strategies and compare the trade-offs of each approach.

Note: The examples and patterns in this post are for educational use and do not include any proprietary information. We’ll base all the examples assuming the main logic is executed by invoking a “handler”.

The initial example: hardcoded logic in a handler

Let’s start with a basic implementation where a Handler processes messages based on a lob (line of business) parameter. The logic is hardcoded in a when block.

class Handler {
    fun handle(message: Message): Response {
        return when (message.lob) {
            "LOB1" -> Response("Handled by Line of Business 1: ${message.content}")
            "LOB2" -> Response("Handled by Line of Business 2: ${message.content}")
            else -> Response("Unhandled line of business: ${message.lob}")
        }
    }
}

LOB stands for Lines of Business, but it could also represent product types, categories, or any other concept that signifies a distinct “business segment” within the solution domain.

This approach works for simple scenarios, but as the number of lob values grows or the logic for each becomes more complex, the code quickly becomes unwieldy and difficult to maintain.

Refactoring with the strategy pattern

The Strategy Pattern (OO) offers a way to encapsulate each lob's processing logic into separate, reusable classes. The Handler delegates processing to the appropriate strategy based on the lob.

interface MessageHandler {
    fun handle(message: Message): Response
}

class LOB1Handler : MessageHandler {
    override fun handle(message: Message): Response {
        return Response("Handled by Line of Business 1: ${message.content}")
    }
}

class LOB2Handler : MessageHandler {
    override fun handle(message: Message): Response {
        return Response("Handled by Line of Business 2: ${message.content}")
    }
}

The Handler uses a Map to associate lob values with their respective handlers:

class Handler {
    private val handlers = mapOf(
        "LOB1" to LOB1Handler(),
        "LOB2" to LOB2Handler()
    )

    private val defaultHandler = DefaultHandler()

    fun handle(message: Message): Response {
        val handler = handlers[message.lob] ?: defaultHandler
        return handler.handle(message)
    }
}

This approach improves maintainability by decoupling the logic for each lob into separate classes. Adding a new lob simply involves creating a new MessageHandler implementation and registering it in the Handler.

A functional programming alternative

In the Functional Programming (FP) approach, we achieve the same separation of concerns without relying on interfaces or class hierarchies. Each handler is represented as a lambda function, and the Handler uses a Map to look up the appropriate handler.

class Handler(private val handlers: Map Response>) {
    private val defaultHandler: (Message) -> Response = { message ->
        Response("Unhandled line of business: ${message.lob}")
    }

    fun handle(message: Message): Response {
        return handlers[message.lob]?.invoke(message) ?: defaultHandler(message)
    }
}

Each handler is defined as a simple lambda:

val handlers = mapOf(
    "LOB1" to { message: Message -> Response("Handled by Line of Business 1: ${message.content}") },
    "LOB2" to { message: Message -> Response("Handled by Line of Business 2: ${message.content}") }
)

This approach is concise and avoids the need for boilerplate code, such as creating separate classes for each lob.

Handling shared logic between handlers

In both the OO and FP approaches, there may be shared logic that is common to all handlers. Let’s see how this is handled in each paradigm.

OO approach with shared logic

In the OO approach, we can move the shared logic into an abstract base class that all handlers inherit from:

abstract class BaseMessageHandler : MessageHandler {
    protected fun sharedLogic(message: Message): String =
        "Shared Prefix: ${message.content} | "
}

class LOB1Handler : BaseMessageHandler() {
    override fun handle(message: Message): Response =
        Response(sharedLogic(message) + "Handled by Line of Business 1")
}

While this avoids code duplication, it introduces a deeper class hierarchy. If sub-LOBs or more complex variations arise, the hierarchy can become convoluted, making the code harder to navigate and maintain.

FP approach with shared logic

In the FP approach, shared logic can be passed as a parameter to each handler using higher-order functions. Defining a typealias keeps the signatures clean:

typealias HandlerFn = (Message, (Message) -> String) -> Response

class Handler(private val handlers: Map) {
    private val defaultHandler: HandlerFn = { message, sharedLogic ->
        Response(sharedLogic(message) + "Unhandled line of business: ${message.lob}")
    }

    fun handle(message: Message): Response {
        val sharedLogic: (Message) -> String = { msg -> "Shared Prefix: ${msg.content} | " }
        return handlers[message.lob]?.invoke(message, sharedLogic)
            ?: defaultHandler(message, sharedLogic)
    }
}

Each handler explicitly uses the shared logic:

val handlers = mapOf(
    "LOB1" to { message: Message, sharedLogic: (Message) -> String ->
        Response(sharedLogic(message) + "Handled by Line of Business 1")
    }
)

This approach avoids hierarchy complexity and keeps the shared logic close to where it is used. Dependencies remain explicit, and the code is more composable at the cost of slightly more boilerplate in handler declarations.

Pros and cons

Strategy pattern approach

Pros

Encapsulation of behavior in reusable classes.
Clear separation of concerns.

Cons

Requires more boilerplate code (e.g., interfaces, classes).
Can lead to a complex and rigid class hierarchy when handling shared logic or sub-LOBs.

FP approach

Pros

Concise and flexible implementation.
Avoids class hierarchy complexity.
Shared logic can be passed explicitly, making it easy to reuse and test.

Cons

May be less familiar to developers accustomed to OO patterns.
Shared logic must be carefully managed to avoid duplication.

Conclusion

Both the Strategy Pattern and its Functional Programming alternative provide effective ways to structure business logic. The strategy pattern approach fits naturally with established design patterns and may feel familiar to teams with a strong strategy pattern background. The functional programming approach, by contrast, emphasizes composability and keeps shared logic explicit — helpful when you want to avoid deep class hierarchies or keep dependencies transparent.

Ultimately, the right choice depends on your project’s requirements and your team’s strengths. By weighing these trade-offs, you can pick the style that not only solves today’s problem but also scales with your codebase and team culture.

Learn about life at Expedia Group

A Functional Programming Alternative to the Strategy Pattern was originally published in Expedia Group Technology on Medium, where people are continuing the conversation by highlighting and responding to this story.