Defensive Range Checks in a Browser-Side Compression Decoder

A topical explainer on parameter validation for binary decoders that consume server-pushed data, written at the class-of-issue level. The specific patch this maps to is currently under a Mozilla security embargo, so I’m describing the kind of decoder, the kind of input, and the kind of defensive treatment without naming the file or linking the bug.

A note on what’s missing: This post does not link to a Bugzilla bug or a Phabricator revision, and does not name the specific compression scheme, decoder file, or call-site geometry. The bug is in a restricted security group at Mozilla, which means it is not publicly accessible until Mozilla decides it can be opened. Until then, the technical content here stays at the level of “how this class of bug arises in browser-side decoders generally,” not “here is the input that breaks the specific decoder.” When the bug becomes public, I’ll update this post with the links and the specifics.


The setting

Browsers ship with a handful of small binary decoders for data that arrives over the network from infrastructure services. The bytes don’t come from a web page or from the user. They come from servers the browser is configured to trust, are pulled down on a schedule, and feed into subsystems that make decisions about what to load and what to block.

Compression decoders are a common shape for this kind of code. They’re a natural fit because the payloads are mostly small numbers — deltas, indices, hashes — that compress well with schemes designed for that distribution. They’re also exactly the kind of code where “what if a parameter is weird” is a question that needs an answer in the source, not in the reviewer’s head.
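To make that shape concrete, here is a hypothetical LEB128-style varint reader of the kind such decoders use for small deltas. Everything here — the function name, the 32-bit cap, the format itself — is invented for illustration and is not taken from the embargoed decoder.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>

// Hypothetical sketch: read one varint (7 payload bits per byte, high bit
// marks continuation). Note the decoder already has to answer a "what if the
// input is weird" question: an attacker- or bug-supplied run of continuation
// bytes must be rejected before the shift amount exceeds the value's width.
std::optional<uint32_t> ReadVarint(const uint8_t* buf, size_t len, size_t& pos) {
  uint32_t value = 0;
  unsigned shift = 0;
  while (pos < len) {
    uint8_t byte = buf[pos++];
    if (shift >= 32) {
      return std::nullopt;  // too many continuation bytes for a uint32_t
    }
    value |= uint32_t(byte & 0x7F) << shift;
    shift += 7;
    if ((byte & 0x80) == 0) {
      return value;  // continuation bit clear: value is complete
    }
  }
  return std::nullopt;  // ran out of bytes mid-value: truncated input
}
```

A stricter decoder would also reject a final byte whose payload bits don’t fit in the remaining width; this sketch only shows the early-out pattern.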


Why decoders on a trust boundary need explicit range checks

The instinct, especially with code that’s worked fine for years, is to assume the inputs are well-formed because the producer is well-known. That instinct is a liability. Three reasons:

  1. The producer can be wrong. Update services have rolled out malformed payloads before, in many products, by accident. If the decoder doesn’t validate, an upstream bug becomes a downstream incident.

  2. The trust assumption can shift. A decoder originally written for one specific service may get reused for another later. Range checks at the decoder layer travel with the code; range checks in calling code don’t.

  3. The cost is essentially zero. A handful of if (param > MAX) return error lines. Compared to the cost of not having them when something does go wrong, it’s not even close.

The shape of these patches, in general, is: identify the configuration parameters the decoder reads, identify what range each one is meaningful in, and add explicit early-out checks at the moment the value is parsed. Anything outside the meaningful range is rejected before it can be used to drive subsequent reads, shifts, allocations, or loop counts.
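A minimal sketch of that patch shape, with invented field names and limits — the real decoder, its header layout, and its actual bounds are not shown here:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>

// Hypothetical header for an encoded payload. The pattern is the point:
// parse each configuration parameter, then reject anything outside its
// meaningful range before it can drive later reads, shifts, allocations,
// or loop counts.
struct Header {
  uint32_t bitWidth;    // bits used to encode each subsequent value
  uint32_t entryCount;  // how many values follow the header
};

constexpr uint32_t kMaxBitWidth = 32;          // a wider shift would be UB
constexpr uint32_t kMaxEntryCount = 1u << 20;  // sanity cap on loops/allocations

std::optional<Header> ParseHeader(const uint8_t* buf, size_t len) {
  if (len < 8) return std::nullopt;  // truncated header
  auto readLE32 = [buf](size_t off) {
    return uint32_t(buf[off]) | uint32_t(buf[off + 1]) << 8 |
           uint32_t(buf[off + 2]) << 16 | uint32_t(buf[off + 3]) << 24;
  };
  Header h{readLE32(0), readLE32(4)};
  // Early-out checks at the moment each value is parsed, not at its use sites.
  if (h.bitWidth == 0 || h.bitWidth > kMaxBitWidth) return std::nullopt;
  if (h.entryCount > kMaxEntryCount) return std::nullopt;
  return h;
}
```

Rejecting at parse time means no later code path ever sees an out-of-range parameter, so the downstream loops and shifts don’t each need their own guard.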


The general lesson

When auditing a binary decoder of this shape for the first time, these are the questions to walk through:

  • What does each input parameter mean, and what range is it actually defined over? Just because the field is a uint32_t doesn’t mean every uint32_t value makes sense. Most binary formats have much narrower meaningful ranges than their storage type implies.

  • What happens if the parameter is at the edge? Zero, max, max minus one. Edges are where bugs hide.

  • What does the parameter control downstream? If it controls a loop count, you have one class of risk. If it controls a shift amount, another. If it controls a buffer size, another. If it controls how many bits to read for a subsequent value, another. The class of risk depends on what the parameter is used to drive, not what it is.

  • Is the check at the right layer? If the decoder doesn’t validate, every caller has to. That’s a lot of surface area, and one missing check is enough.
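To illustrate the shift-amount risk class from the list above: in C++, shifting a uint32_t by 32 or more is undefined behavior, so a bit count taken from the wire has to be validated before it reaches a shift. The helper below is invented for illustration, not drawn from any real decoder.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical helper: build a mask of the low k bits, where k came from the
// input. Validating k here, once, keeps every downstream use site honest.
std::optional<uint32_t> LowBitsMask(uint32_t k) {
  if (k > 32) return std::nullopt;  // outside the meaningful range: reject
  if (k == 32) return 0xFFFFFFFFu;  // special-cased to avoid the UB 1u << 32
  return (1u << k) - 1;             // safe: k is now known to be in [0, 31]
}
```

The same parameter feeding a loop count or an allocation size would call for a different cap, which is exactly why the audit asks what each parameter drives.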

None of this is novel. It’s the standard advice for any binary parser sitting on a trust boundary. The patch was small. The reason to write about it isn’t the patch itself but the habit: when you touch a decoder, the first thing you do is enumerate the parameters and decide what range each one belongs to.


What I learned

  1. Defensive checks at the decoder layer are cheap insurance. Even when the producer is “trusted infrastructure,” the trust assumption is a runtime property, not a compile-time one. Bugs upstream are cheaper to absorb at the decoder than to debug after they’ve propagated.

  2. The work of auditing a decoder is mostly enumeration. Writing the actual if statements is trivial. The non-trivial part is figuring out what the meaningful range is for each parameter, which means reading the spec, reading similar implementations, and sometimes deciding what range you want to enforce even if the spec is loose.

  3. Security work isn’t always glamorous. Sometimes it’s a few dozen lines of bounds checks in a decoder almost no one will ever look at again. That’s fine. The whole point is that no one ever looks at it again.

  4. Embargoed security bugs are why this post is missing IDs. Mozilla files certain classes of bug in restricted groups so that fixes can land in release before the bug becomes public. That’s the right policy. It also means the writeup has to live at the class-of-issue level until the embargo lifts. When that happens, I’ll backfill the references and the specifics.