Migration
This is the last leaf in the map, and it is the one where everything else gets put at risk on purpose. A search system that works is being moved from one backend to another, for example from a self-hosted lexical engine to a hosted or distributed platform, and the entire difficulty is that the system works: users depend on it, and the move cannot make their experience worse, even briefly. Migration is where the relevance metrics and the atomic handoff stop being good practice and become the only things standing between a careful upgrade and a visible outage. I wanted to understand it as a procedure rather than a vibe, and the procedure turns out to be mostly about never being committed to a state you cannot leave.
Why not just cut over
The naive migration is a cutover: build the new engine, test it, and one night switch all traffic from old to new. It is tempting because it is simple, and dangerous because it is all-or-nothing. The new engine has been tested offline, but offline judgments are incomplete and load behavior cannot be fully simulated, so the first time the new engine sees the full shape and volume of real traffic is the moment it is serving all of it. If something is wrong (a relevance regression on query types the judgments missed, a latency cliff under real concurrency, a downstream interaction nobody modeled), every user hits it at once, and the only recovery is a reverse cutover under pressure. A cutover bets the whole system on the new engine being right on the first try with real traffic, which is exactly the bet you cannot validate until you have already made it.
Ramp, with metrics as the guardrail
The safer procedure is a gradual ramp. Both engines run at once, and you send a small fraction of traffic to the new engine while the old engine serves the rest, then increase the fraction step by step, watching metrics at every step. This is the two-backends-coexisting condition put to work: the overlap is not an awkward transitional state to minimize, it is the mechanism that makes the migration safe, because at every moment most traffic is on the engine you trust and you can stop or reverse the ramp instantly.
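The text does not say how the small fraction of traffic is picked. One common approach, assumed here rather than taken from any particular system, is to hash a stable user identifier into a number in [0, 1) and send the user to the new engine when that number falls below the current ramp fraction. The names `bucket`, `route`, and `SALT` are hypothetical.

```python
import hashlib

SALT = "search-migration-1"  # hypothetical; kept fixed for the whole migration


def bucket(user_id: str, salt: str = SALT) -> float:
    """Map a user id to a stable value in [0, 1)."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).digest()
    # Keep the top 53 bits so the division is exact and the result is always < 1.0.
    top53 = int.from_bytes(digest[:8], "big") >> 11
    return top53 / 2**53


def route(user_id: str, new_fraction: float) -> str:
    """Send the user to the new engine if their bucket falls below the ramp fraction."""
    return "new" if bucket(user_id) < new_fraction else "old"


# Walking the ramp: a user keeps the old engine until the fraction passes their bucket.
for fraction in (0.01, 0.05, 0.25, 1.0):
    print(fraction, route("user-42", fraction))
```

Because each user's bucket is fixed, a user stays on the same engine from request to request, which keeps the online comparison clean, and raising the fraction only ever moves users from old to new, never back. Changing the salt would reshuffle everyone, so it stays constant for one migration.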
The metrics are the guardrail, and the ramp is what makes them actionable. As traffic moves to the new engine you watch two things, because a migration can regress on either. The first is relevance, measured offline against judgments before the ramp and online during it: the ramp is effectively a live A/B test, since some users get the new engine and some the old, and their real behavior can be compared. The second is latency, because the new engine can match the old one’s ranking and still be slower under load, and users feel slowness directly. A win on relevance does not excuse a loss on latency, or the reverse; both have to hold. If either metric goes bad at a given traffic fraction, you stop ramping, and if it is bad enough you roll back, before the next increment exposes more users. The ramp converts a single terrifying yes-or-no decision into a sequence of small reversible ones, each gated on numbers.
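A minimal sketch of the gated ramp, assuming each step reports one relevance number and one tail-latency number per engine. The tolerances, the rule that a regression three times past tolerance triggers a rollback, and the names `StepMetrics`, `gate`, `run_ramp`, and `observe` are hypothetical choices for illustration; the point is only that every step checks both metrics independently before the fraction is allowed to grow.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass
class StepMetrics:
    """What one ramp step reports for each engine (hypothetical shape)."""
    relevance_new: float  # online relevance metric, higher is better
    relevance_old: float
    p99_ms_new: float     # tail latency in milliseconds, lower is better
    p99_ms_old: float


def gate(m: StepMetrics, relevance_tol: float = 0.01, latency_tol_ms: float = 5.0) -> str:
    """Judge one step. Relevance and latency are checked separately; neither offsets the other."""
    relevance_drop = m.relevance_old - m.relevance_new
    latency_rise = m.p99_ms_new - m.p99_ms_old
    # Far past tolerance on either axis: pull traffic back entirely.
    if relevance_drop > 3 * relevance_tol or latency_rise > 3 * latency_tol_ms:
        return "rollback"
    # Past tolerance but not badly: stop growing the fraction.
    if relevance_drop > relevance_tol or latency_rise > latency_tol_ms:
        return "hold"
    return "continue"


def run_ramp(steps: Sequence[float],
             observe: Callable[[float], StepMetrics]) -> Tuple[float, str]:
    """Walk the fractions in order; return the fraction left serving and why the walk ended."""
    last_passed = 0.0
    for fraction in steps:
        decision = gate(observe(fraction))
        if decision == "rollback":
            return 0.0, "rollback"
        if decision == "hold":
            return last_passed, "hold"
        last_passed = fraction
    return last_passed, "complete"
```

Here a hold steps back to the last fraction that passed rather than staying at the one that failed; that is one reasonable reading of "stop ramping," not the only one.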
The killswitch and failover
Underneath the ramp are the safety valves that make “roll back” a real option rather than a hope. A killswitch is a fast, deliberate way to pull the new engine out of the traffic path and send everything back to the old one, in a single action, without a deploy or a rebuild. It exists so that when a metric goes bad, the response is one switch and seconds, not an incident bridge and minutes. Failover is the automatic cousin: if a backend becomes unhealthy, traffic routes to the healthy one without a human in the loop. Together they mean a bad new engine, or a bad index promoted into either engine, degrades to “we are back on the known-good path” rather than “search is down.”
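The killswitch and failover can be sketched as a small router, under the assumption that routing is decided per request in one place the operator controls. `Router`, `pull_killswitch`, `mark_health`, and `choose` are hypothetical names; in practice health would come from probes or error rates, and the flag would live in shared configuration rather than inside one process.

```python
import threading


class Router:
    """Per-request backend choice with a killswitch and health-based failover (hypothetical)."""

    def __init__(self, new_fraction: float = 0.0) -> None:
        self._lock = threading.Lock()
        self.new_fraction = new_fraction
        self.killswitch = False                    # True sends everything to the old engine
        self.healthy = {"old": True, "new": True}  # fed by health checks in practice

    def pull_killswitch(self) -> None:
        """One action, no deploy or rebuild: the next request goes to the old engine."""
        with self._lock:
            self.killswitch = True

    def mark_health(self, backend: str, is_healthy: bool) -> None:
        with self._lock:
            self.healthy[backend] = is_healthy

    def choose(self, user_bucket: float) -> str:
        """user_bucket is the user's stable value in [0, 1)."""
        with self._lock:
            use_new = not self.killswitch and user_bucket < self.new_fraction
            preferred = "new" if use_new else "old"
            fallback = "old" if use_new else "new"
            if self.healthy[preferred]:
                return preferred
            # Automatic failover, except never onto the new engine while it is killed.
            if self.healthy[fallback] and not (fallback == "new" and self.killswitch):
                return fallback
            raise RuntimeError("no healthy backend available")
```

Pulling the killswitch changes one field and takes effect on the next request. Failover falls back to the other engine only when the preferred one is marked unhealthy, and it never sends traffic to the new engine while the killswitch is pulled.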
This is where the migration leans directly on the staging-to-serving handoff. Because builds are immutable and promotion is an atomic pointer flip, the old engine and its last-good index are still intact and serving throughout the ramp, so rolling back is flipping traffic and pointers to a state that never went away, not reconstructing one. The cheapness of the rollback is what makes the aggressive ramp safe: you can push traffic onto the new engine precisely because backing it out costs one action. A migration without a fast, tested rollback is a cutover wearing a ramp’s clothing, because the moment you cannot reverse, you are committed, and committed is the state the whole procedure exists to avoid.
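The immutable-build and pointer-flip idea reduces to a few lines if you assume builds are held in memory and identified by a string. `IndexRegistry`, `register`, `promote`, `rollback`, and `serving` are hypothetical; a real system might use an index alias or a versioned path, but the shape is the same: builds are write-once, promotion moves a pointer, and rollback moves it back.

```python
import threading
from typing import Dict, Iterable, List, Optional, Tuple


class IndexRegistry:
    """Write-once builds plus a serving pointer (hypothetical names throughout)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._builds: Dict[str, Tuple[str, ...]] = {}  # build_id -> frozen contents
        self._serving: Optional[str] = None            # build_id currently served
        self._history: List[str] = []                  # earlier serving builds, newest last

    def register(self, build_id: str, docs: Iterable[str]) -> None:
        # Re-registering an id is an error, never an overwrite.
        with self._lock:
            if build_id in self._builds:
                raise ValueError(f"build {build_id!r} already exists")
            self._builds[build_id] = tuple(docs)

    def promote(self, build_id: str) -> None:
        # Promotion only moves the pointer; the previous build stays intact.
        with self._lock:
            if build_id not in self._builds:
                raise KeyError(build_id)
            if self._serving is not None:
                self._history.append(self._serving)
            self._serving = build_id

    def rollback(self) -> None:
        # Re-point to a build that never went away; nothing is rebuilt.
        with self._lock:
            if not self._history:
                raise RuntimeError("no earlier build to roll back to")
            self._serving = self._history.pop()

    def serving(self) -> Tuple[str, Tuple[str, ...]]:
        with self._lock:
            if self._serving is None:
                raise RuntimeError("nothing promoted yet")
            return self._serving, self._builds[self._serving]
```

Nothing in `rollback` touches index contents, which is what keeps it cheap enough to rehearse before the ramp rather than discover during an incident.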
Defining done before you start
The last discipline, and the one easiest to skip, is deciding what “successfully migrated” means before the ramp begins, in the same concrete terms as “don’t regress.” Concretely: the new engine holds at or above the old engine’s relevance metric within a stated tolerance, and at or below its latency at full traffic, sustained over a stated period, across the query types that matter. Naming that bar in advance is what keeps the migration from drifting into “it seems fine, let’s just finish,” which is how regressions ship. The metric is the gate, the ramp is how you approach the gate without risking everyone at once, and the killswitch is how you retreat from it; all three depend on having decided beforehand what passing the gate actually requires.
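Writing the bar down as a check makes it hard to renegotiate mid-ramp. This sketch assumes per-day summaries at full traffic for each query type, holding a relevance number and a tail-latency number for each engine. `DoneCriteria`, `is_done`, and the per-day data shape are hypothetical; relevance gets the stated tolerance, while latency is held to "at or below" the old engine with no tolerance, following the text.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Per day, per query type: (relevance_new, relevance_old, p99_ms_new, p99_ms_old).
DayMetrics = Dict[str, Tuple[float, float, float, float]]


@dataclass(frozen=True)
class DoneCriteria:
    """The bar, written down before the ramp begins (all values hypothetical)."""
    relevance_tol: float          # allowed drop below the old engine's relevance
    required_days: int            # consecutive days at full traffic that must pass
    query_types: Tuple[str, ...]  # the query types that matter

    def __post_init__(self) -> None:
        if self.required_days < 1:
            raise ValueError("required_days must be at least 1")


def is_done(criteria: DoneCriteria, daily: List[DayMetrics]) -> bool:
    """True only if each of the last `required_days` days passes for every query type."""
    if len(daily) < criteria.required_days:
        return False
    for day in daily[-criteria.required_days:]:
        for qtype in criteria.query_types:
            if qtype not in day:
                return False  # missing data is treated as a failure, not a pass
            rel_new, rel_old, p99_new, p99_old = day[qtype]
            if rel_new < rel_old - criteria.relevance_tol:
                return False
            if p99_new > p99_old:  # latency must be at or below the old engine's
                return False
    return True
```

A query type with no data for a day counts as failing, so a gap in measurement cannot be mistaken for a pass.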
The thing I take from the whole serving territory, and from this leaf as its endpoint, is that operating search well is less about any one clever component than about never losing the ability to undo. Immutable builds you can roll back to, a ramp you can stop, a killswitch you can pull, two engines either of which can carry the load. The retrieval and ranking work fills the system with quality; the serving work makes that quality safe to change. A search system is not finished when it ranks well. It is finished when you can improve it without being afraid to.
Up to serving at scale. The metric that gates it is relevance evaluation; the immutable handoff that makes its rollback fast is staging to serving. Back to the whole map, where this all began with five hundred friars and one book.