Netflix Engineers Rethink Mock Testing for GraphQL
NEW YORK — Wouldn’t it be wonderful if developers could issue their pull requests knowing in advance that the new code proposal will work fine in production? Sure, UI testing can be very useful depending on the application, but it doesn’t always reveal how the application might run — or crash — in distributed environments, especially containerized ones.
Ideal testing would involve replicating, or “mocking,” the environment in which the application or update will be deployed. But as Juliana Congote said during her talk, “Fake It ’til You Build It: Navigating GraphQL Mocking Solutions,” at the GraphQL Summit here in New York, creating a working mock for Netflix’s infrastructure and networks is a monumental task, given that the company serves hundreds of millions of subscribers to its media streaming service.
@Netflix’s Juliana Congote at @graphqlsummit in New York during her talk “Fake it ’til You Build it: Navigating @GraphQL Mocking Solutions.” Yes, mocking is very hard: “Let’s face it. There are a lot of different edge cases where you need to be able to cover.” @thenewstack
— BC Gain (@bcamerongain) October 10, 2024
“Building effective mock solutions isn’t just a technical challenge — it’s about understanding your users,” Congote said during her talk. “To build a mock solution that supports multiple workflows, the system needs to be customizable, intuitive and scalable, just like our data models.”
Meanwhile, some say that only a true production release can ultimately determine whether an update or release will function properly. Canary deploys and traffic shaping make it possible to deploy a new version to production for testing, Larry Maccherone, DevSecOps transformation architect at Contrast Security, told me. “This is how load and performance testing is done today with lightweight APM agents,” Maccherone said. “Previously, you had to stand up a mock environment and run fake traffic through it to maybe catch some portion of potential load and performance problems.”
Instead of attempting to create proper mock tests, many DevOps teams rely heavily on canary testing, which often means that the testing and debugging process only really begins once the application is already in production. Canary releases give those teams a fallback: if a release fails, remediation can begin immediately, using prior integration-testing baselines as a reference. The canary release is not foolproof, however. “Only after you fix the stuff found in the canary mode and deploy more widely is the new version more reliable,” Maccherone said.
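To make the canary idea concrete, here is a minimal sketch of percentage-based traffic splitting in plain Python. The function names and the hashing scheme are illustrative assumptions, not something Maccherone or Netflix described. Real deployments usually do this at the load balancer or service mesh layer.

```python
import hashlib


def bucket_for(user_id: str) -> int:
    """Map a user ID to a stable bucket in the range 0-99 (hypothetical scheme)."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def route(user_id: str, canary_percent: int) -> str:
    """Send users whose bucket falls below canary_percent to the canary."""
    return "canary" if bucket_for(user_id) < canary_percent else "stable"


if __name__ == "__main__":
    users = [f"user-{i}" for i in range(1000)]
    share = sum(route(u, 5) == "canary" for u in users) / len(users)
    print(f"canary share: {share:.1%}")
```

Hashing the user ID keeps each user on the same version across requests, so a failure seen in the canary can be tied to a consistent slice of traffic.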
UI testing alone typically fails to provide a complete picture of how an application or microservice will perform once deployed in a Kubernetes environment. Moreover, UI testing often requires manual interaction through screens to verify correctness, which slows production-release schedules. Although automated UI test suites exist, they are frequently hard for QA and DevOps teams working with microservices to maintain and debug. These limits are another reason teams lean on canary testing: as Maccherone described, releases that reach a wide audience only after first being tested with a canary release are more reliable.
An ideal testing solution should model and replay all related inbound and outbound traffic. In essence, traffic would be realistically “mocked” without interrupting the development team’s workflow as code is developed. This approach results in a honeycomb-shaped testing pattern, in which integration tests make up the bulk of the test suite, while automatically identifying dependencies within the Kubernetes production environment throughout the testing process. Testing in this way enables an effective shift-left approach for Kubernetes environments, improving reliability and reducing failure rates once applications are deployed.
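As a rough illustration of modeling and replaying traffic, the sketch below records the request and response pairs that pass through a dependency and later answers the same requests from that recording. All names here (TrafficRecorder, TrafficReplayer, live_catalog) are hypothetical. Production tools capture traffic at the network level rather than by wrapping functions.

```python
import json


class TrafficRecorder:
    """Wraps a dependency call and stores each request/response pair."""

    def __init__(self, call):
        self._call = call
        self.tape = []

    def __call__(self, request: dict) -> dict:
        response = self._call(request)
        self.tape.append({"request": request, "response": response})
        return response


class TrafficReplayer:
    """Answers requests from a recorded tape instead of the live dependency."""

    def __init__(self, tape):
        # Serialize requests with sorted keys so equal requests share one key.
        self._by_request = {
            json.dumps(entry["request"], sort_keys=True): entry["response"]
            for entry in tape
        }

    def __call__(self, request: dict) -> dict:
        key = json.dumps(request, sort_keys=True)
        if key not in self._by_request:
            raise KeyError(f"no recorded response for {request!r}")
        return self._by_request[key]


def live_catalog(request: dict) -> dict:
    # Stand-in for a real downstream service (hypothetical).
    return {"title_id": request["title_id"], "available": True}
```

In a test, code under test would receive a TrafficReplayer built from an earlier recording, so it sees realistic responses without the downstream service running.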
However, the independence often assumed in unit testing is challenged by modern microservices architectures. High coupling (dependencies) between components means that testing units independently is not always feasible; outputs are frequently dependent on other components’ behavior. Consequently, the line between unit tests and integration tests becomes blurred, as Martin Fowler described. Rather than attempting to resolve this ambiguity, many testing experts advocate embracing it as a natural evolution of software architecture, placing a stronger emphasis on comprehensive integration testing or, increasingly, testing in production.
“I only do solitary unit testing nowadays for libraries and UI components,” Maccherone said. “Though keep in mind, Martin Fowler introduced the idea of ‘sociable unit testing’ a while back, but I fail to see the difference between that and integration testing.”
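The distinction Maccherone refers to can be shown in a few lines. In this sketch, which uses made-up classes and numbers, the solitary test replaces the collaborator with a test double from Python's standard unittest.mock module, while the sociable test lets the real collaborator run.

```python
from unittest.mock import Mock


class PriceService:
    """Real collaborator (hypothetical plans and prices)."""

    def price_for(self, plan: str) -> float:
        return {"basic": 10.0, "premium": 20.0}[plan]


class Checkout:
    """Unit under test; depends on a price service."""

    def __init__(self, prices):
        self._prices = prices

    def total(self, plan: str, months: int) -> float:
        return self._prices.price_for(plan) * months


def test_solitary():
    # Solitary: the collaborator is a test double, so only Checkout is exercised.
    prices = Mock()
    prices.price_for.return_value = 10.0
    assert Checkout(prices).total("basic", 3) == 30.0
    prices.price_for.assert_called_once_with("basic")


def test_sociable():
    # Sociable: the real PriceService runs alongside Checkout.
    assert Checkout(PriceService()).total("premium", 2) == 40.0
```

The sociable test already exercises two components together, which is why the boundary with integration testing is hard to draw.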
The Mock Way
Netflix, as mentioned above, has faith in mock testing. During her talk, Congote said one of the primary goals with mock testing is to ensure data “doesn’t return too simply.” “To address this, we can split large mocks into smaller pieces to provide more realistic data. There are multiple ways to achieve high fidelity — some more complex than others — and a company’s needs usually determine how complex their solution should be,” Congote said. “Additionally, it’s important to make these mocks easily customizable so clients can configure them as needed.”
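One way to read Congote's point about splitting large mocks into smaller, customizable pieces is a set of small mock factories that compose into a larger object and accept overrides. The sketch below is an assumption about what that could look like, not Netflix's implementation. The field names and values are invented.

```python
import copy


def deep_merge(base: dict, overrides: dict) -> dict:
    """Return a copy of base with overrides applied recursively."""
    merged = copy.deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            # Non-dict values, including lists, replace the default outright.
            merged[key] = copy.deepcopy(value)
    return merged


def mock_artwork(**overrides) -> dict:
    base = {"url": "https://example.com/art.jpg", "width": 342, "height": 192}
    return deep_merge(base, overrides)


def mock_episode(**overrides) -> dict:
    base = {"number": 1, "title": "Pilot", "runtimeMinutes": 48}
    return deep_merge(base, overrides)


def mock_show(**overrides) -> dict:
    # The large mock is assembled from the smaller pieces above.
    base = {
        "id": "show-1",
        "title": "Example Show",
        "artwork": mock_artwork(),
        "episodes": [mock_episode()],
    }
    return deep_merge(base, overrides)
```

A client that only cares about image width could call mock_show(artwork={"width": 1280}) and still get realistic defaults for every other field.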
A mock testing solution also needs to be user-friendly, given that the tests are shared across teams, Congote said. This means that everyone involved should be able to draft, update and test the stream protocol comfortably within the workflow. Isolated testing environments allow team members to use existing query plans without requiring them to load up an entire system just to test schema objects. As data schemas grow in complexity, the approach also needs to be future-proof so that it remains viable as models become more intricate, Congote said.
Obviously, mock tests must be well-adapted for developers. “I’ve also learned that developers value intuitive workflows. They want tools that integrate naturally into their work streams, without veering too far from established processes,” Congote said. “It’s important that any new tool or solution provides all the necessary data and resolvers in a straightforward way. Developers should be able to set up a project with minimal hassle — if setup is too complex, it likely won’t be widely adopted.”
The creators of the mock tests must absorb much of the complexity on the backend, Congote said. “While elements like schema registries can be inherently complex, taking on that complexity allows developers to have a seamless experience with our tools,” she said.
The solution Congote and her team are crafting operates in a prioritized environment, leveraging Netflix’s Domain Graph Service (DGS) framework, the company’s platform for building Java-based GraphQL APIs. Congote described how the mocks her team is creating pool the data, configurations and resolvers in a central location that is accessible to both clients and servers. This central repository allows any user to add fields or make schema updates without extensive coordination with other teams, she said. The DGS framework makes it easy for engineers to customize mocks in one place, avoiding the need to navigate multiple repositories, Congote said.
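Congote did not show code, so the following is only a sketch of the general idea of a central mock registry, written in plain Python rather than against the DGS API. The MockRegistry class, its methods and the registered fields are all hypothetical.

```python
class MockRegistry:
    """One shared place for mock resolvers, keyed by 'Type.field'."""

    def __init__(self):
        self._resolvers = {}

    def register(self, type_name: str, field_name: str, resolver) -> None:
        self._resolvers[f"{type_name}.{field_name}"] = resolver

    def resolve(self, type_name: str, field_name: str, **args):
        key = f"{type_name}.{field_name}"
        if key not in self._resolvers:
            raise LookupError(f"no mock registered for {key}")
        return self._resolvers[key](**args)

    def with_overrides(self, overrides: dict) -> "MockRegistry":
        """Return a copy with per-test overrides; the shared registry is untouched."""
        child = MockRegistry()
        child._resolvers = {**self._resolvers, **overrides}
        return child


# Shared defaults that any team could extend (invented fields).
registry = MockRegistry()
registry.register("Query", "show", lambda id: {"id": id, "title": "Example Show"})
registry.register("Show", "episodeCount", lambda: 8)
```

A team testing an empty-state screen could call registry.with_overrides({"Show.episodeCount": lambda: 0}) without changing what other teams see.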
“In terms of user adoption, if our editors embrace this system, we envision that it will provide significant long-term benefits. With it, clients could control migrations and test schemas before deploying. Importantly, it does not deviate from our engineers’ existing workflows,” Congote said. “The framework is isolated but federated, meaning it retains consistency with the production schema while remaining flexible for testing purposes. This is powerful because it allows for more nuanced data management without altering our core practices.”
Congote did not offer any more specifics during her talk but indicated that Netflix’s mock testing system is not completely operational yet. Ultimately, developing a comprehensive solution meant meeting with various teams to understand their distinct needs, she said. “Each team had unique ideas and challenges, highlighting that there is no one-size-fits-all solution,” Congote said. “A successful solution requires understanding what each developer or team needs, fostering communication and integrating flexibility.”