From The Indexing Pipeline

Staging to Serving

The batch job produces a fresh index and IDF artifact. The serving layer reads an index from some location, continuously, while users are querying. This leaf is about the join between them: how the new build gets into the location the live system reads without the live system ever reading a half-written file. It sounds like plumbing, and it is, but it is the plumbing where a careless pipeline takes search down for everyone, so it is worth getting exactly right.

The failure to design against

Picture the naive version. The job writes the new index directly into the directory the serving layer reads, file by file. Halfway through, a query arrives. The serving layer reads an index that is part old and part new, or part written and part missing, an inconsistent state that exists only for the seconds the write takes. The result is wrong answers or a crash, on live traffic, intermittently, for the duration of every rebuild. The bug is a race between the writer and the readers, and the fact that it only shows up during the write window makes it exactly the kind of thing that passes every test and fails in production.

The whole discipline of the handoff is to make sure the serving layer only ever sees a complete, consistent index, never a partial one. The way you guarantee that is to never write into the live location incrementally. You build somewhere else, and you make the live location point at the finished build in a single indivisible step.

Build into staging, promote atomically

The pattern is two locations and an atomic switch. The job builds the entire index into a staging location, taking as long as it needs, with no reader looking at staging. When the build is complete and verified, it is promoted into the serving location in one atomic operation: one that either fully happens or does not, with no in-between state a reader could observe. Because each build is immutable once written, the old index stays intact and keeps serving the whole time the new one is being built in staging, and the switch flips readers from the complete old index to the complete new one.

What “atomic” means concretely depends on the storage. On a filesystem it can be building into a temporary directory and then renaming it into place, which is atomic only when source and destination are on the same filesystem and the destination is not a non-empty directory, or it can be swapping a symlink that the serving layer follows. On object storage it can be flipping a pointer or a manifest that names the current build. The mechanism varies; the property is the same: the change from old to new is a single step with no observable middle, so a reader sees either the entire old index or the entire new one and never a mix. The symlink version is the clearest illustration of the shape:

```bash
# WRONG: the serving layer reads /serving/index while it is half-written
build_index --out /serving/index            # readers see a partial index mid-build

# RIGHT: build into staging, then flip a symlink in one atomic step
build_index --out /staging/index-2026-06-07     # slow; no reader looks at /staging
verify /staging/index-2026-06-07                # only promote a complete, checked build
# Create the new symlink under a temporary name first (-n: do not follow an existing link).
ln -sfn /staging/index-2026-06-07 /serving/current.tmp
# GNU mv -T treats the destination as the link itself, so this is a rename over the old link.
mv -T /serving/current.tmp /serving/current     # atomic rename: readers flip in one step
# the previous build is still on disk -> rollback is the same symlink flip, pointed at the old build
```

The serving layer always reads /serving/current; it never sees a partial index, and the prior build staying intact is what makes the rollback in the next section a single cheap operation.
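The pointer-or-manifest variant mentioned above has the same shape. The following is a minimal sketch on a local filesystem, assuming a hypothetical `CURRENT` manifest file and a scratch directory standing in for real storage; whether overwriting a single object is atomic for readers on a given object store depends on that store's guarantees, which this sketch does not model.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical layout for illustration: a scratch root standing in for real storage.
root="$(mktemp -d)"
mkdir -p "$root/builds/index-2026-06-06" "$root/builds/index-2026-06-07"

# publish_manifest BUILD_NAME: point the manifest at a finished build.
publish_manifest() {
  local build="$1" tmp
  # Create the temp file next to the manifest so the rename stays on one filesystem.
  tmp="$(mktemp "$root/CURRENT.XXXXXX")"
  printf '%s\n' "$build" > "$tmp"
  chmod 644 "$tmp"              # mktemp creates the file owner-only; open it up for readers
  # Renaming over an existing regular file is atomic: readers see the old or the new contents.
  mv -f "$tmp" "$root/CURRENT"
}

# resolve_current: what a reader would do each time it opens the index.
resolve_current() {
  printf '%s/builds/%s\n' "$root" "$(cat "$root/CURRENT")"
}

publish_manifest index-2026-06-06
publish_manifest index-2026-06-07
resolve_current
```

The manifest is never edited in place; each publish writes a complete new file and renames it over the old one, so a reader cannot catch it half-written.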

Two orderings, and which is safe

There is a sequencing question inside this that is easy to get subtly wrong: do you build before or after you promote, and what exactly is the promote pointing at? The safe ordering follows from the failure you are designing against. Build to completion in staging first, verify the build is whole, and only then promote, because promoting before the build is complete would point the serving layer at a half-built index, which is the exact race you are trying to avoid. Promotion is the last step, and it is the only step that touches what readers see. Everything expensive and slow happens in staging where no reader is watching; the only thing that happens in the live path is the instantaneous switch.
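One way to make the build, verify, promote ordering hard to violate is to have the promote step itself refuse anything unverified. This is a sketch under assumptions: `build_index` and `verify` here are hypothetical stand-ins for the real steps, the `_COMPLETE` marker and `index.bin` file name are invented for illustration, and `mv -T` assumes GNU coreutils.

```bash
#!/usr/bin/env bash
set -euo pipefail

root="$(mktemp -d)"                 # scratch area standing in for real storage
mkdir -p "$root/staging" "$root/serving"

# Hypothetical stand-ins for the real build and verify steps.
build_index() { mkdir -p "$1" && echo "postings" > "$1/index.bin" && touch "$1/_COMPLETE"; }
verify()      { [ -f "$1/_COMPLETE" ] && [ -s "$1/index.bin" ]; }

promote() {
  local build="$1" live="$root/serving/current"
  # Refuse to promote anything that does not pass verification.
  verify "$build" || { echo "refusing to promote unverified build: $build" >&2; return 1; }
  ln -sfn "$build" "$live.tmp"
  mv -T "$live.tmp" "$live"         # the only step that touches what readers see
}

build="$root/staging/index-2026-06-07"
build_index "$build"                # slow work, entirely in staging
promote "$build"                    # last step, instantaneous

# A build that never completed is rejected and the live pointer is left unchanged.
mkdir -p "$root/staging/index-2026-06-08"
promote "$root/staging/index-2026-06-08" || echo "promotion skipped"
readlink "$root/serving/current"
```

Putting the check inside `promote` rather than relying on callers to run `verify` first means a reordered or partially failed pipeline degrades to "nothing changed" instead of "readers see a partial index".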

This also means promotion should be reversible. If the new build turns out to be bad, having the previous build still intact, because it was immutable and you only flipped a pointer, lets you flip the pointer back and roll the serving layer onto the known-good index without rebuilding anything. That rollback path is the seam where this pipeline concern becomes the operational safety of serving: the atomic, reversible promote is what makes a bad index survivable rather than an outage.
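The rollback described above can be sketched as the same symlink flip in reverse. The `previous` link used to remember the outgoing build is an assumption made for illustration, not something the pipeline is known to have; a real system might record this in a deploy log or manifest instead.

```bash
#!/usr/bin/env bash
set -euo pipefail

root="$(mktemp -d)"                 # scratch area standing in for real storage
mkdir -p "$root/staging/index-2026-06-06" "$root/staging/index-2026-06-07" "$root/serving"

promote() {
  local target="$1" live="$root/serving/current"
  # Remember the outgoing build before flipping (hypothetical "previous" link).
  if [ -L "$live" ]; then
    ln -sfn "$(readlink "$live")" "$root/serving/previous"
  fi
  ln -sfn "$target" "$live.tmp"
  mv -T "$live.tmp" "$live"         # atomic flip; assumes GNU mv
}

rollback() {
  local live="$root/serving/current"
  [ -L "$root/serving/previous" ] || { echo "no previous build recorded" >&2; return 1; }
  # Same flip as promote, pointed back at the known-good build; nothing is rebuilt.
  ln -sfn "$(readlink "$root/serving/previous")" "$live.tmp"
  mv -T "$live.tmp" "$live"
}

promote "$root/staging/index-2026-06-06"    # yesterday's build goes live
promote "$root/staging/index-2026-06-07"    # today's build goes live
rollback                                    # today's build turns out to be bad
readlink "$root/serving/current"
```

This only works because promotion never modified or deleted the older build directory; the rollback target is exactly the bytes that were serving before.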

Discovering the latest build

A detail that matters more than it seems is how the promote step finds the build to promote. The naive version hardcodes a path, and that breaks the moment builds are timestamped or versioned, which they should be, because you want the previous builds to stick around for rollback rather than being overwritten. So the pipeline has to discover the most recent staged build rather than assume its location, typically by listing the staging area and selecting the latest complete build by timestamp or by a version marker. Completeness matters here: staging is also where the in-progress build lives, so discovery should only consider builds that have finished and passed verification, or it can select the very half-written index the handoff exists to hide. Hardcoding the path couples the promote step to one build’s name and quietly fails or promotes the wrong build when the naming changes; discovery by timestamp is robust to the build layout evolving and is what lets old builds accumulate safely as rollback targets.
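Discovery can be sketched as a listing plus a filter. The example below assumes builds are named `index-YYYY-MM-DD` (so that name order is chronological order) and that a finished build carries a hypothetical `_COMPLETE` marker file; both conventions are illustrative assumptions rather than the pipeline's documented layout.

```bash
#!/usr/bin/env bash
set -euo pipefail

root="$(mktemp -d)"                 # scratch area standing in for real storage
staging="$root/staging"
mkdir -p "$staging"

# Three staged builds; the newest is still being written and has no marker yet.
for day in 2026-06-05 2026-06-06 2026-06-07; do
  mkdir -p "$staging/index-$day"
done
touch "$staging/index-2026-06-05/_COMPLETE" "$staging/index-2026-06-06/_COMPLETE"

latest_complete_build() {
  local dir latest=""
  # Glob results are sorted, and ISO dates sort chronologically as strings,
  # so the last directory that has a marker is the newest complete build.
  for dir in "$staging"/index-*/; do
    if [ -f "${dir}_COMPLETE" ]; then
      latest="${dir%/}"
    fi
  done
  if [ -z "$latest" ]; then
    echo "no complete build in $staging" >&2
    return 1
  fi
  printf '%s\n' "$latest"
}

build="$(latest_complete_build)"
echo "would promote: $build"
```

Here the in-progress `index-2026-06-07` is skipped and `index-2026-06-06` is selected, which is the behavior a timestamp-only sort would get wrong.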

Reusing what exists

The last point is less a mechanism than a discipline. Assembling an index from its parts and promoting it from staging to serving are operations a mature search stack usually already has, because every index build needs them. The temptation when adding a new artifact, like an IDF table, is to write fresh assembly and promotion logic for it. The better move is almost always to route the new artifact through the existing assembly and promotion path, so it inherits the atomicity, the discovery, and the rollback that path already got right, rather than reimplementing those guarantees and getting one of them subtly wrong. Whether the existing promotion logic can be reused for a new artifact, and where the new artifact slots into the existing build-and-promote flow, is a real question to answer before building anything, because the existing path has already paid for the correctness you would otherwise have to earn again.
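One shape this reuse can take is placing the new artifact inside the same build directory, so that the existing verify and promote steps cover it without a second promotion path. This is only a sketch: the `verify` and `promote` functions, the `required_files` list, and the file names `index.bin` and `idf.tsv` are all hypothetical, and a real stack's promotion path may be structured differently.

```bash
#!/usr/bin/env bash
set -euo pipefail

root="$(mktemp -d)"                 # scratch area standing in for real storage
mkdir -p "$root/staging" "$root/serving"

# Assumed existing promotion path: verify a build directory, then flip the symlink.
required_files=(index.bin idf.tsv)  # adding the IDF table is one new entry here

verify() {
  local build="$1" f
  for f in "${required_files[@]}"; do
    if [ ! -s "$build/$f" ]; then
      echo "missing or empty: $f" >&2
      return 1
    fi
  done
}

promote() {
  local build="$1" live="$root/serving/current"
  verify "$build" || return 1
  ln -sfn "$build" "$live.tmp"
  mv -T "$live.tmp" "$live"         # one flip publishes the index and the IDF table together
}

build="$root/staging/index-2026-06-07"
mkdir -p "$build"
echo "postings" > "$build/index.bin"
printf 'term\t1.7\n' > "$build/idf.tsv"     # the new artifact rides inside the same build
promote "$build"
ls "$root/serving/current"
```

Because both files sit behind one pointer, readers can never observe a new index paired with an old IDF table, and a rollback restores the matching pair together.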

The thread here is that the handoff is not about moving files; it is about never letting a reader observe an inconsistent state. Every part of the pattern (staging, immutable builds, atomic promote-last, timestamped discovery, reused promotion logic) exists to preserve that one invariant. Get the invariant right and a rebuild is invisible to users; get it wrong and every rebuild is a small outage.

Up to the indexing pipeline. The build it promotes is the immutable artifact from batch versus incremental, whose schema is the IDF artifact. The rollback and failover this enables are part of serving at scale and migration.