The Language Rewrite Question: When Migration Actually Pays Off — and When It Doesn’t
Discord, Figma, and Dropbox documented their rewrites with real outcomes, and Shopify documented what it gained by not rewriting. There is now enough evidence to build a serious decision framework, and to recognise when a rewrite is an expensive engineering vanity project.
1. The Question That Never Goes Away
Every few years, a team hits a wall with their current stack and someone in the room says the words: “We should just rewrite this in something better.” What follows is usually either one of the best engineering decisions the team ever makes — or a year of missed features and a codebase that somehow ends up with the same problems in a different language.
The frustrating reality is that both outcomes are well-documented and both happen for reasons that were, in retrospect, entirely predictable. Joel Spolsky’s 2000 essay argued that rewriting from scratch is “the single worst strategic mistake that any software company can make,” citing Netscape’s three-year disappearing act while competitors ate their market share. That argument holds real weight and still gets cited in architecture discussions today. But it also predates a generation of carefully documented, scoped, metrics-driven rewrites that succeeded precisely because they did not follow the pattern Spolsky was warning about.
In 2026, there are enough public post-mortems with real numbers to move beyond anecdote. Discord published latency charts with microsecond-level before-and-after comparisons. Figma documented both the wins and the rough edges of their TypeScript-to-Rust migration. Shopify built YJIT, a JIT compiler for Ruby, rather than migrate away from the language, and its Rails monolith went on to handle peaks of 489 million requests per minute on Black Friday 2025. The evidence is there. What has been missing is a clear framework for interpreting it. This article is that framework.
2. Four Case Studies with Real Outcomes
Table 1 — Four Case Studies with Real Outcomes
| Company | Migration | Headline result | Primary motivation | Scope | Outcome |
|---|---|---|---|---|---|
| Discord | Go → Rust · 2019 | 0 ms GC pauses (no collector). Avg latency moved from milliseconds to microseconds. Rust beat Go on every performance metric after profiling. | Latency spikes every 2 min from forced GC runs scanning a large LRU cache; GC tuning could not remove them | Read States service only: one bounded component | Clear ROI |
| Figma | TypeScript → Rust · 2018 | 10× multiplayer server performance. Single-threaded Node.js was blocking the event loop on large documents. | Node.js single-threaded runtime: a structural constraint, not fixable in-language | Multiplayer hot path only; the initial plan to rewrite the whole server was dropped | Clear ROI |
| Shopify | Ruby (stayed) · 2023–25 | 489M requests/min at peak on Black Friday 2025. Invested in YJIT (Rust-backed Ruby JIT) and pod architecture instead of migrating. | Scaling pressure; chose to invest in the current stack rather than migrate away from Rails | No migration: architecture and tooling investment instead | Counter-case |
| Dropbox | Python → Rust · 2019 | “Betting on Rust was one of the best decisions we made.” Correctness and concurrency were the primary wins, not just speed. | Correctness: encoding invariants in the type system that Python’s dynamic types could not enforce | File sync engine, bounded with clear interfaces | Clear ROI |
3. Discord: Go → Rust — The Clean Success
The Discord rewrite is one of the most-cited successful language migrations in recent engineering history, and it deserves that status, primarily because the problem definition was unusually precise before a single line of Rust was written.
Discord’s Read States service, which tracks which messages each user has read across all their servers, was written in Go. As Discord documented in their own blog post, the problem was specific: Go forces a garbage collection at least every two minutes, and each collection had to scan the entire LRU cache to determine which memory could be freed. As the service scaled to millions of concurrent users with larger and larger caches, this produced GC pauses of 10–50 milliseconds every two minutes. Averaged across all requests, the pauses barely registered; for any user whose request landed in one, they were a perceptible and recurring latency spike.
Crucially, Discord’s engineers had already exhausted in-language solutions. They had tuned GC settings extensively and could not go further without changing either the problem or the language. The result of the Rust rewrite was unambiguous: average response time moved from milliseconds to microseconds. GC pauses disappeared entirely, because Rust has no garbage collector: memory is freed as soon as its owner goes out of scope. CPU and memory usage both improved. And the initial Rust port, written with only basic optimisation effort, already outperformed the hyper-tuned Go version. After further profiling, it beat Go on every single performance metric.
This case illustrates three conditions that, when present together, make a rewrite compelling: a quantified, specific bottleneck; evidence that the current language cannot solve it within its own paradigm; and a component small enough to rewrite in months rather than years.
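One way to make “a quantified, specific bottleneck” concrete is to summarise latency by percentile rather than by average, because periodic pauses barely move the mean. The sketch below uses synthetic numbers, not Discord’s data: 1,000 hypothetical requests at 1 ms, five of which land in a 40 ms pause. The helper names are illustrative, and the percentile uses the simple nearest-rank definition.

```python
import math
from fractions import Fraction
from statistics import mean


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    ordered = sorted(samples)
    # Fraction(str(p)) keeps values such as 99.9 exact, so float rounding cannot skew the rank.
    rank = math.ceil(Fraction(str(p)) * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def latency_summary(samples):
    """Mean plus the tail percentiles a one-sentence problem statement should quote."""
    return {
        "mean": mean(samples),
        "p50": percentile(samples, 50),
        "p99": percentile(samples, 99),
        "p99.9": percentile(samples, 99.9),
        "max": max(samples),
    }


# Synthetic, hypothetical sample: 1,000 requests at 1 ms, 5 of them caught in a 40 ms pause.
requests_ms = [1.0] * 995 + [40.0] * 5
print(latency_summary(requests_ms))
# {'mean': 1.195, 'p50': 1.0, 'p99': 1.0, 'p99.9': 40.0, 'max': 40.0}
```

Here only p99.9 and max expose the pause, which is why a problem statement built on the mean alone tends to understate a GC-style bottleneck.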
The Discord Lesson: The initial port from Go to Rust was completed in approximately six months with a small team. The ROI was immediate and measurable on day one of production deployment. Discord’s own summary: “We don’t think you should rewrite everything in Rust just because.” The qualification is as important as the result.
4. Figma: TypeScript → Rust — The Scoped Win
Figma’s multiplayer server was originally written in TypeScript. It was, as they note in their own post-mortem, “surprisingly good” for years. The problem that eventually forced action was structural to the runtime, not the code: TypeScript compiles to JavaScript, and Node.js runs JavaScript on a single-threaded event loop. When a slow operation, such as encoding a very large Figma document, blocked the event loop, every other document on that worker waited. There was no in-language fix: the constraint was the runtime itself.
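Python’s asyncio uses the same single-threaded event-loop model as Node.js, so it can stand in for the constraint Figma hit. This is an analogy, not Figma’s code: `encode_large_document` is a hypothetical CPU-bound step with no `await` inside, and `heartbeat` stands in for the other documents sharing the worker.

```python
import asyncio


async def heartbeat(log, beats):
    """Stands in for other documents on the same worker; yields after every beat."""
    for i in range(beats):
        log.append(f"tick {i}")
        await asyncio.sleep(0)  # give control back to the event loop


async def encode_large_document(log, size):
    """Hypothetical CPU-bound step: no await inside, so it never yields the loop."""
    log.append("encode start")
    checksum = sum(i * i for i in range(size))  # runs to completion on the loop thread
    log.append("encode end")
    return checksum


async def main():
    log = []
    await asyncio.gather(heartbeat(log, 3), encode_large_document(log, 500_000))
    return log


if __name__ == "__main__":
    print(asyncio.run(main()))
    # ['tick 0', 'encode start', 'encode end', 'tick 1', 'tick 2']
```

No other task runs between “encode start” and “encode end”, however long the encode takes. Moving such work into a separate process (or, in Node.js, a worker thread) is the usual in-runtime mitigation; whether that is enough for a given hot path is the in-language question to settle before any rewrite.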
The decision to rewrite in Rust was a deliberate scope limitation. As Evan Wallace, Figma’s co-founder and original author of the multiplayer protocol, wrote directly: “Our multiplayer server is a small amount of performance-critical code with minimal dependencies, so rewriting it in Rust even with the issues that came up was a good tradeoff for us. It enabled us to improve server-side multiplayer editing performance by an order of magnitude.” The key phrase is “small amount of performance-critical code with minimal dependencies.” That sentence describes the rewrite. Not the whole product — one service, one hot path, tight scope.
Figma’s post is also notable for its honesty about what went wrong. Rust’s ecosystem at the time was less mature than it is today. Two compression libraries they tried had correctness bugs that would have caused data loss. Async Rust (futures) had ergonomic problems serious enough that they kept some network handling out of Rust, and for compression they fell back to C via FFI. These are the kinds of friction costs that rarely make it into celebratory blog posts. Figma also explicitly dropped their initial plan to rewrite the whole server in Rust, choosing instead to focus solely on the performance-sensitive part.
5. Shopify: The Rewrite Avoided — The Most Important Counter-Case
The Shopify story is arguably more instructive than the Discord or Figma rewrites, precisely because it is the story of what happens when a team resists the rewrite impulse and invests in the existing stack instead.
Shopify runs Ruby on Rails, and their monolith dates to the mid-2000s. For years, a popular claim has held that Ruby doesn’t scale, and Shopify has been methodically refuting it at increasing scale. Rather than migrate away from Ruby, Shopify made several strategic investments. They built YJIT, a just-in-time compiler for Ruby now written in Rust, which was upstreamed into CRuby and delivered throughput improvements of 15% or more on Rails applications. They adopted Sorbet, Stripe’s static type checker for Ruby, across their monolith and built supporting tooling such as Tapioca. They invested in a pod-based sharded deployment architecture that isolates failure domains. They built Ruby LSP for first-class editor intelligence.
The outcome is documented: on Black Friday 2025, Shopify’s Rails monolith processed $14.6 billion in merchant sales, handling peak loads of 489 million requests per minute on the edge and over 53 million database queries per second. Not despite staying on Ruby. Because of the depth of investment they made in that stack instead of migrating away from it.
The Shopify Principle: Shopify’s engineering leadership made a deliberate decision to treat Ruby and Rails as “100-year tools” and invest accordingly. The cost of that decision was saying no to the migration conversation for a decade. The payoff was one of the most resilient, high-throughput Ruby deployments in existence. The principle generalises: before asking “what language should we migrate to,” ask “what would happen if we invested this same engineering budget into making the current stack as good as it could be?”
6. Dropbox: Python → Rust — The Hybrid Route
Dropbox’s file sync engine rewrite is a less-discussed but important data point because the primary stated motivation was not performance — it was correctness and concurrency safety. The Dropbox engineering team’s own conclusion was that “Rust has been a force multiplier for our team, and betting on Rust was one of the best decisions we made. More than performance, its ergonomics and focus on correctness has helped us tame sync’s complexity. We can encode complex invariants about our system in the type system and have the compiler check them for us.”
This represents a third rewrite motivation pattern distinct from both Discord and Figma. Discord rewrote for GC-latency elimination. Figma rewrote for concurrency headroom in a single-threaded runtime. Dropbox rewrote to encode correctness guarantees that Python’s dynamic type system could not provide, in a system where concurrency bugs only appeared under load in production. The Rust ownership model and type system became their bug-prevention infrastructure — a different class of ROI that does not show up in a latency chart.
7. What the Data Actually Shows
Rewrite outcome patterns across documented cases

Several patterns emerge clearly from the documented cases. First, the most consistent predictor of a successful rewrite is the narrowness of the initial scope. Every successful case cited here (Discord, Figma, Dropbox, npm’s authorization service, Cloudflare’s network-critical services, AWS Firecracker) involved a bounded component with clear before-and-after measurement criteria. The documented failures and regrets share a broader scope: a full product rewrite, a migration without a specific quantified problem, or a team that spent the rewrite timeline losing product velocity to competitors.
Second, the motivation matters as much as the destination language. Rewrites driven by a specific, measurable problem — GC latency, single-threaded bottlenecks, memory safety bugs in security-critical code — have a significantly better track record than rewrites driven by “the codebase is messy” or “the new language has better ergonomics.” The latter reasons are not invalid, but they do not produce the kind of measurable ROI that justifies the migration cost.
8. The Anatomy of a Vanity Rewrite
The term “engineering vanity project” deserves a precise definition rather than being used as a vague insult. A vanity rewrite has some or all of these characteristics, and it is worth naming them plainly.
Table 2 — Signs the ROI is Real vs. Signs it is a Vanity Project
| Signs the ROI is real | Signs it is probably a vanity project |
|---|---|
| You can state the specific bottleneck in one sentence, with a number attached | The primary motivation is “the new language is cleaner / more modern” |
| You have already exhausted in-language optimisations and can prove it | Nobody can define what “done” looks like in measurable terms |
| The scope is a single component or hot path, not the whole system | The scope keeps expanding as work progresses |
| Success criteria are defined before any migration code is written | The team needs to learn the target language during the migration |
| The team migrating is already productive in the target language | There is no plan for running old and new in parallel |
| The current language cannot solve the problem within its own paradigm | The problem could be solved with profiling and targeted refactoring |
| You can run old and new in parallel and do a measured comparison | The timeline is measured in months before any production validation |
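The last row of the table, running old and new in parallel, can be sketched as a shadow comparison: feed identical inputs to both implementations, keep serving the legacy answer, and record every disagreement and the wall time each side took. The function names and the negative-sentinel edge case below are hypothetical; a production version would sample live traffic rather than iterate over a fixed list.

```python
import time


def shadow_compare(legacy, candidate, inputs):
    """Run legacy and candidate on the same inputs; the legacy result is the one served.

    Returns every disagreement plus the total wall time spent in each implementation.
    """
    mismatches = []
    legacy_s = candidate_s = 0.0
    for item in inputs:
        start = time.perf_counter()
        expected = legacy(item)
        legacy_s += time.perf_counter() - start

        start = time.perf_counter()
        try:
            actual = candidate(item)
        except Exception as exc:  # a crashing candidate is recorded as a mismatch, not raised
            actual = exc
        candidate_s += time.perf_counter() - start

        if actual != expected:
            mismatches.append((item, expected, actual))
    return {"mismatches": mismatches, "legacy_s": legacy_s, "candidate_s": candidate_s}


# Hypothetical pair: the legacy code learned to skip -1 sentinel values; the rewrite did not.
def legacy_unread_total(counts):
    return sum(c for c in counts if c >= 0)


def candidate_unread_total(counts):
    return sum(counts)


report = shadow_compare(legacy_unread_total, candidate_unread_total, [[1, 2, 3], [4, -1, 5], []])
print(report["mismatches"])  # [([4, -1, 5], 9, 8)]
```

The mismatch list is the valuable output: each entry is an edge case the old code already handled and the new code has not yet learned.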
Notably, the “messy code” motivation that Joel Spolsky targeted in 2000 remains one of the most common drivers of failed rewrites. As he pointed out, and as many post-mortems since have echoed, working code, however messy, contains years of hard-won knowledge about edge cases, weird inputs, and failure modes that the team has encountered and handled. A rewrite does not preserve that knowledge; it discards it. The new codebase starts with clean architecture and rediscovers every one of those edge cases in production.
The Hidden Cost Nobody Budgets For: Development velocity typically drops 30–50% during the first 3–6 months of Rust adoption for teams coming from garbage-collected languages. Senior engineers who are productive in Go or Python find themselves debugging lifetime annotations instead of shipping features. This is not a reason to avoid Rust when it is the right tool — it is a cost that must be explicitly budgeted and communicated to stakeholders before the migration starts, not discovered mid-project.
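The dip is easier to budget once it is expressed as lost capacity. The sketch below converts the 30–50% drop over 3–6 months into engineer-months for a hypothetical six-person team, under the simplifying assumption that the drop stays flat across the whole ramp-up period (real curves usually taper as the team gains fluency).

```python
def lost_engineer_months(engineers, drop_fraction, months):
    """Capacity lost to a flat productivity dip: people x size of dip x duration."""
    return engineers * drop_fraction * months


def velocity_cost_range(engineers, drop=(0.30, 0.50), ramp_months=(3, 6)):
    """Low and high estimates using a 30-50% dip lasting 3-6 months."""
    low = lost_engineer_months(engineers, drop[0], ramp_months[0])
    high = lost_engineer_months(engineers, drop[1], ramp_months[1])
    return round(low, 2), round(high, 2)


# Hypothetical six-person team moving from Go or Python to Rust.
print(velocity_cost_range(6))  # (5.4, 18.0)
```

Between roughly 5 and 18 engineer-months is the kind of figure to put in front of stakeholders before the first line of migration code is written.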
9. The Decision Framework
The following framework is derived from the patterns across successful and failed rewrites. Work through the questions in order. The earlier you hit a stopping condition, the more clearly the data suggests the rewrite will not pay off.
Table 3 — Decision Framework: Work Through in Order
| # | Question | What to look for | If no / unsure |
|---|---|---|---|
| 01 | Can you state the specific problem in one sentence with a measurable number? | “GC pauses cause 40 ms latency spikes every 2 min in the Read States service”, not “the code is slow.” | Stop |
| 02 | Have you demonstrably exhausted in-language solutions? | Profiling, algorithmic improvements, infrastructure scaling, and GC tuning all tried and failed. | Stop |
| 03 | Is the problem inherent to the language’s model, not just your code? | Go’s GC can be tuned but not designed out of a service that allocates. Node.js is single-threaded. CPython’s GIL limits thread parallelism. These are model constraints; a slow algorithm is not. | Stop |
| 04 | Is the scope a single bounded component? | Discord’s Read States service. Figma’s multiplayer path. Dropbox’s sync engine. Not “the backend” or “the monolith.” | Risky |
| 05 | Can you run old and new in parallel and measure the difference? | A/B deployment is the only way to validate ROI on day one. Without it, you cannot prove the migration paid off. | Risky |
| 06 | Is the team already productive in the target language, or has learning budget been explicitly allocated? | Rust’s 30–50% velocity drop in the first 3–6 months is real and documented. Budget it before you start or it will look like project failure. | Budget first |
| 07 | Is the business stable enough to absorb the feature velocity cost? | Pre-PMF startups should almost never rewrite. The bottleneck you are solving today may be in a component you pivot away from next quarter. | Yes to all → proceed |
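The framework can be sketched as a small checklist function. The field names are illustrative; questions 01–03 end the assessment on the first “no”, while 04–07 accumulate as flags so that a risky scope does not hide an unbudgeted learning curve. The table gives no explicit verdict for a “no” on question 07, so this sketch labels it “Defer”, which is an assumption of the sketch rather than part of the framework.

```python
from dataclasses import dataclass

# (question number, field name, verdict when the answer is "no"), in table order.
CHECKS = [
    (1, "problem_stated_with_number", "Stop"),
    (2, "in_language_options_exhausted", "Stop"),
    (3, "constraint_is_language_model", "Stop"),
    (4, "scope_is_single_component", "Risky"),
    (5, "can_run_old_and_new_in_parallel", "Risky"),
    (6, "team_ready_or_learning_budgeted", "Budget first"),
    (7, "business_can_absorb_velocity_cost", "Defer"),  # assumed verdict, not in the table
]


@dataclass
class RewriteCase:
    """Yes/no answers to the seven framework questions (hypothetical field names)."""
    problem_stated_with_number: bool
    in_language_options_exhausted: bool
    constraint_is_language_model: bool
    scope_is_single_component: bool
    can_run_old_and_new_in_parallel: bool
    team_ready_or_learning_budgeted: bool
    business_can_absorb_velocity_cost: bool


def assess(case):
    """Return the flags raised, in question order; an empty list means yes to all seven."""
    flags = []
    for number, field_name, verdict in CHECKS:
        if getattr(case, field_name):
            continue
        flags.append(f"Q{number:02d} {verdict}")
        if verdict == "Stop":
            break  # a hard stop ends the walk, as in the table
    return flags


# Hypothetical inputs: a scoped, well-measured rewrite vs. a "the code is messy" rewrite.
scoped = RewriteCase(True, True, True, True, True, True, True)
messy = RewriteCase(False, False, False, False, True, False, True)
print(assess(scoped))  # []
print(assess(messy))   # ['Q01 Stop']
```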
10. Language Fit Guide: Rust vs Go vs TypeScript
| Scenario | Best fit | Why | Avoid if |
|---|---|---|---|
| GC-pause latency in hot path | Rust | No GC, deterministic memory release; the Discord pattern | Team has no Rust experience: budget 3–6 months of onboarding cost first |
| Python/Ruby service hitting CPU ceiling | Rust or Go | Either language typically improves CPU-bound throughput substantially; Go has the lower learning curve | The bottleneck is actually I/O or the database: optimise that first |
| Memory safety bugs in security-critical code | Rust | Ownership model eliminates entire classes of CVEs at compile time; Microsoft, AWS pattern | The codebase is not security-critical — the ROI requires a security threat model to justify |
| High-concurrency microservice or CLI tool | Go | Goroutines, lower learning curve than Rust, single binary deployment, strong stdlib | You need guaranteed-zero GC pauses — Go’s GC is good but not eliminable |
| Gradually typing a JavaScript codebase | TypeScript | No language migration — incremental adoption, same runtime, immediate IDE feedback | You want runtime performance gains — TypeScript compiles to JS, no runtime difference |
| Legacy Ruby/Python — scaling pressure but working product | Stay + invest | Shopify pattern: invest in JIT, type checkers, and architecture before migrating | You’ve exhausted this path and still have a specific, measured language-model constraint |
| Proof of concept / startup, pre-PMF | Don’t rewrite | Pivot risk makes migration cost unrecoverable; performance rarely limits growth at this stage | You have a specific security or safety requirement that genuinely requires a different language |
The rewrite velocity curve: scoped component vs full system

11. What We Have Learned
The language rewrite question has a real answer in 2026 — it is just not the same answer for everyone. Here is the distilled version of what the documented evidence shows:
- Scope is everything. Every successful rewrite documented here was a bounded component, not a full system: Discord’s Read States service, Figma’s multiplayer hot path, Dropbox’s sync engine. When the scope expands to “the backend,” the documented outcomes get markedly worse.
- The problem must be measurable and language-model-specific. GC pauses, single-threaded event loops, GIL limitations — these are language-model constraints. Slow algorithms and messy code are not. Only the former justify a migration.
- The counter-case (Shopify) is as instructive as the success cases. Investing in the existing stack — JIT compilers, type checkers, better architecture — often delivers more ROI than a migration, and it preserves years of accumulated production knowledge.
- Rust’s velocity cost is real. A 30–50% productivity drop during the first 3–6 months is documented consistently. It is not a reason to avoid Rust when it is the right tool. It is a cost that must be planned for explicitly.
- Rust is the right answer for: GC-pause elimination, memory safety in security-critical paths, and correctness guarantees in complex concurrent systems. Go is the right answer for: high-concurrency services, developer teams that need faster onboarding, and CLI tools requiring a single static binary.
- TypeScript is almost never a language migration. It is an incremental typing layer on existing JavaScript. Conflating it with a Rust or Go rewrite overstates both its cost and its benefit.
- Joel Spolsky was right about full rewrites, and his warning does not carry over to scoped ones. The distinction the 2020s evidence makes clear is that a rewrite of one bounded component is categorically different from a full-system rewrite. The former can be rational. The latter almost never is.



