Moving to the Cloud Presents New Use Cases for Feature Flags

Feature management tools can help show how a given feature behaves in production and gain insights into how users interact with it.

Dec 2nd, 2021 6:23am by Charles Humble


Featured image via Unsplash.

While feature flags are not a new concept (the oldest blog post I can find about successfully using them is from Flickr and is now more than a decade old), commercial feature management and experimentation products are relatively new when compared to other DevOps technologies such as continuous integration (CI) and continuous delivery (CD), having only come onto the market in the past half-decade or so. Feature flags work by encapsulating code within an if statement and then using a process, such as reading a value from a configuration file, to determine whether the condition is true or false. This example, taken from the LaunchDarkly documentation for Java, shows what this looks like:

LDUser user = new LDUser("user@test.com");
boolean showFeature = client.boolVariation("your.feature.key", user, false);

if (showFeature) {
    // application code to show the feature
} else {
    // the code to run if the feature is off
}

Of course, this is easy enough to do with a home-grown solution, but leading feature management vendors such as LaunchDarkly, AB Tasty and Optimizely add a range of capabilities on top, such as advanced audience targeting, management for progressive releases, security and privacy capabilities, and tooling to manage the lifecycle of feature flags. Combined, these capabilities enable the precisely controlled, gradual rollout of a feature to either a given percentage or segment of users, thereby separating deployment from release. Feature management can thus be used to gain a better understanding of how a given feature behaves in production, become confident with its implementation, and gain insights into how users interact with it in a gradual and controlled manner. During her keynote at the recent Trajectory conference, LaunchDarkly CEO and co-founder Edith Harbaugh gave an example, based on the classic DevOps infinity loop, to illustrate how this works: a feature is given first to internal users, before gradually ramping up a release to more and more external users, thus enabling both testing in production and canary deployment.
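To make the idea of a gradual rollout concrete, here is a minimal home-grown sketch, not how LaunchDarkly or any other vendor implements it. It assumes a hypothetical GradualRollout class that always enables a feature for a set of internal users and then enables it for a fixed percentage of everyone else, using a hash of the flag key and user key so that a given user consistently gets the same answer. The class name, flag key and email addresses are all invented for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Set;
import java.util.zip.CRC32;

/** Hypothetical home-grown flag: internal users first, then a percentage of everyone else. */
class GradualRollout {
    private final String flagKey;
    private final Set<String> internalUsers;
    private final int percentage; // share of external users (0 to 100) who get the feature

    GradualRollout(String flagKey, Set<String> internalUsers, int percentage) {
        if (percentage < 0 || percentage > 100) {
            throw new IllegalArgumentException("percentage must be between 0 and 100");
        }
        this.flagKey = flagKey;
        this.internalUsers = Set.copyOf(internalUsers);
        this.percentage = percentage;
    }

    /** Maps a user to a stable bucket in [0, 100) so repeated checks agree. */
    static int bucket(String flagKey, String userKey) {
        CRC32 crc = new CRC32();
        crc.update((flagKey + ":" + userKey).getBytes(StandardCharsets.UTF_8));
        return (int) (crc.getValue() % 100);
    }

    boolean isEnabled(String userKey) {
        if (internalUsers.contains(userKey)) {
            return true; // internal users always see the feature
        }
        return bucket(flagKey, userKey) < percentage;
    }

    public static void main(String[] args) {
        GradualRollout rollout =
            new GradualRollout("new.checkout", Set.of("qa@example.com"), 10);
        System.out.println(rollout.isEnabled("qa@example.com")); // internal user
        System.out.println(rollout.isEnabled("user@test.com"));  // depends on the user's bucket
    }
}
```

Raising the percentage from 10 to 50 to 100 widens the audience without a redeploy, provided the value is read from configuration rather than hard-coded as it is here.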


How features are released to internal users and then increasingly to external ones, allowing for testing both in production and via canary deployments. (Diagram courtesy of LaunchDarkly)

Testing in Production

Testing in production used to be somewhat controversial, carrying connotations of cowboy programming and sloppy QA. And, as stated by Cindy Sridharan, a distributed systems engineer and O’Reilly author, it does require real care to do it safely:

“[B]eing able to successfully and safely test in production requires a significant amount of automation, a firm understanding of the best practices as well as designing the systems from the ground up to lend themselves well toward this form of testing.”

However, it has significant advantages when done correctly. Sridharan has said:

“I’m more and more convinced that staging environments are like mocks – at best a pale imitation of the genuine article and the worst form of confirmation bias. It’s still better than having nothing – but ‘works in staging’ is only one step better than ‘works on my machine’.”
Cindy Sridharan (@copyconstruct), March 16, 2018

By contrast, testing in production enables your QA team to gain real-world insights about how a given feature actually performs. It can also highlight unexpected problems in other parts of an application that can be hard to stub out, or to replicate elsewhere. The use of tools like feature flags means that if issues are encountered in the process (for example, the new feature causes some performance degradation, or a drop in customer engagement), a feature can be disabled instantly, simply by turning the flag off, without user disruption or the need to re-deploy. This process can be automated using signals from monitoring and observability tooling such as Datadog and Honeycomb, switching off a feature if a particular threshold is reached, without the need for manual intervention.
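The automated switch-off can be sketched roughly as follows. This is an illustration under stated assumptions rather than any vendor's mechanism: the hypothetical AutoKillSwitch class stands in for both the flag and the monitoring signal, and it disables the feature once the observed error rate exceeds a threshold after a minimum number of requests. In a real system the error signal would come from monitoring tooling rather than from the application calling record() itself.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical kill switch: turns a flag off once the error rate crosses a threshold. */
class AutoKillSwitch {
    private final AtomicBoolean enabled = new AtomicBoolean(true);
    private final double maxErrorRate; // for example 0.05 for five percent
    private final int minSamples;      // avoid reacting to the first few requests
    private long requests;
    private long errors;

    AutoKillSwitch(double maxErrorRate, int minSamples) {
        this.maxErrorRate = maxErrorRate;
        this.minSamples = minSamples;
    }

    /** Records one request outcome and disables the feature if the threshold is exceeded. */
    synchronized void record(boolean failed) {
        requests++;
        if (failed) {
            errors++;
        }
        if (requests >= minSamples && (double) errors / requests > maxErrorRate) {
            enabled.set(false); // stays off until someone deliberately re-enables it
        }
    }

    boolean isEnabled() {
        return enabled.get();
    }

    public static void main(String[] args) {
        AutoKillSwitch flag = new AutoKillSwitch(0.5, 4);
        boolean[] outcomes = {false, false, true, true, true};
        for (boolean failed : outcomes) {
            flag.record(failed);
            System.out.println("feature enabled: " + flag.isEnabled());
        }
    }
}
```

Keeping the flag off once it has tripped is a deliberate choice in this sketch: a feature that flaps on and off as the error rate recovers would be harder to reason about than one that waits for a person to turn it back on.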

Feature Flags as Technical Debt

Of course, a less desirable but natural outcome of inserting feature flags into source code is the technical debt that accumulates as a result. As feature flags become more pervasive, touching more elements of the stack, both the risk and the level of technical debt can grow rapidly. In light of this, it is important, when considering a vendor-based solution, to factor in what the product offers to allow better management of flags. A recent Forrester Wave report on feature management and experimentation singled out LaunchDarkly for particular praise in this area. The LaunchDarkly product includes the ability to run analytics that make it easier for development teams to see whether a flag is still in use, a capability it calls code references. “Once you’ve spotted a flag that is no longer needed, you can pinpoint the repository and line of code within a few clicks to remove it from the code base,” Ravi Tharisayi, senior director of product marketing at LaunchDarkly, told The New Stack. This functionality is complemented by some suggestions on how to manage the removal of flags from a work item and process point of view.
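The underlying idea of locating where a flag is referenced can be illustrated with a very small sketch. This is not how LaunchDarkly's code references feature works; it is a hypothetical FlagReferenceScanner that simply looks for the quoted flag key in a list of source lines and reports the matching line numbers, which is roughly what a team without vendor tooling might do with a text search.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that finds lines mentioning a flag key as a string literal. */
class FlagReferenceScanner {

    /** Returns the 1-based line numbers of lines that contain the quoted flag key. */
    static List<Integer> findReferences(List<String> sourceLines, String flagKey) {
        String needle = "\"" + flagKey + "\"";
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < sourceLines.size(); i++) {
            if (sourceLines.get(i).contains(needle)) {
                hits.add(i + 1);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> source = List.of(
            "LDUser user = new LDUser(\"user@test.com\");",
            "boolean showFeature = client.boolVariation(\"your.feature.key\", user, false);",
            "if (showFeature) {");
        System.out.println(findReferences(source, "your.feature.key"));
    }
}
```

A flag whose key no longer appears anywhere in the code base, or appears only in code paths that always evaluate the same way, is a candidate for removal.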

Infrastructure as a Feature

Whilst feature flags are typically used for code-only features and UX experiments, the shift to the cloud and the rise of infrastructure as code have opened up additional use cases. “In a cloud world where the layers of the stack are so intertwined, you can take that concept and really expand the scope of what a feature is, applying those same concepts down to the infrastructure layer,” Tharisayi said. During her talk at Trajectory, Harbaugh described LaunchDarkly’s own database migration, in which the vendor moved from an environment based on Postgres and MongoDB to CockroachDB. By running both the legacy and new databases in parallel, the LaunchDarkly team was able to reduce the risk of errors, and could monitor application performance through each step of the migration. Moreover, customer support agents could see which version of the database a particular customer was on, enabling a more efficient troubleshooting process. As Charlie Custer noted on the CockroachDB blog, this approach offers further advantages, including time for learning:
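A flag-controlled migration of this kind might look something like the sketch below. It is not a description of LaunchDarkly's actual migration code; the KeyValueStore interface, the in-memory stores and the MigratingStore class are hypothetical stand-ins. The sketch assumes every write goes to both databases, while a per-customer flag check (modeled here as a Predicate) decides which database serves reads.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Predicate;

/** Hypothetical storage abstraction shared by the legacy and new databases. */
interface KeyValueStore {
    void put(String key, String value);
    Optional<String> get(String key);
}

/** In-memory stand-in so the sketch runs without a real database. */
class InMemoryStore implements KeyValueStore {
    private final Map<String, String> data = new HashMap<>();

    public void put(String key, String value) { data.put(key, value); }

    public Optional<String> get(String key) { return Optional.ofNullable(data.get(key)); }
}

/** Writes to both stores; a per-customer flag decides which one serves reads. */
class MigratingStore {
    private final KeyValueStore legacy;
    private final KeyValueStore replacement;
    private final Predicate<String> readFromNew; // flag evaluation for a customer id

    MigratingStore(KeyValueStore legacy, KeyValueStore replacement,
                   Predicate<String> readFromNew) {
        this.legacy = legacy;
        this.replacement = replacement;
        this.readFromNew = readFromNew;
    }

    void put(String key, String value) {
        legacy.put(key, value);      // keep the legacy store authoritative for rollback
        replacement.put(key, value); // keep the new store in step
    }

    Optional<String> get(String customerId, String key) {
        return readFromNew.test(customerId) ? replacement.get(key) : legacy.get(key);
    }

    /** Lets support staff see which database a given customer is being served from. */
    String backendFor(String customerId) {
        return readFromNew.test(customerId) ? "new" : "legacy";
    }

    public static void main(String[] args) {
        MigratingStore store = new MigratingStore(
            new InMemoryStore(), new InMemoryStore(), id -> id.equals("customer-a"));
        store.put("plan", "enterprise");
        System.out.println(store.backendFor("customer-a") + ": " + store.get("customer-a", "plan"));
        System.out.println(store.backendFor("customer-b") + ": " + store.get("customer-b", "plan"));
    }
}
```

Because reads are switched per customer, a problem with the new database affects only the customers already moved across, and turning the flag off sends their reads back to the legacy store.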

“[S]hifting to a new database system can require some shifts in thinking, particularly if you’re moving from a legacy, single-node system to a modern, multicloud, multiregion distributed system. It’s essential that you give your team the time and space to experiment and make those shifts — and that, in turn, is only possible if you’re taking a phased migration approach and starting with less critical workloads that allow you a bit of a margin for error.”

This same phased approach, a variation on the strangler pattern, can also be used for migrating from a monolith to a microservices-based architecture, and for other types of re-platforming activity. “We had a customer that moved their lambda processing to AWS’ new Graviton2 processor,” Tharisayi told The New Stack. “Using feature flags, they were able to test out whether that service was able to save them money. This is at a layer that is deeper than what you think about when you typically think about features.” In a sense, this ties back to the infinity loop. The journey to the cloud is never really done: as new services become available with a given cloud provider, the ability to use feature flags as a way of trying those services out in a low-risk manner can help developers continuously optimize their whole application, including its architecture and the underlying infrastructure on which it runs.

Charles Humble is a former software engineer, architect and CTO who has worked as a senior leader and executive of both technology and content groups. He was InfoQ’s editor-in-chief from 2014-2020, and was chief editor for Container Solutions from 2020-2023....