Search and Retrieval: Index

First created May 30, 2026
Last edited May 31, 2026

Working notes building up the mental model for modern search and retrieval systems from zero. One concept per page. Written in the order I had to learn them to make sense of a production hybrid-retrieval stack.

Index

  • What Search Is. Starting from grep and working up to the shape of meaning-based retrieval. The distributional hypothesis, why exact-match breaks, and the basic structure every modern retrieval system inherits.
  • Counting Words Smarter: TF, Length Normalization, and IDF. Building lexical retrieval up from raw word-overlap to the three ideas that make it actually work: term frequency, document-length normalization, and inverse document frequency.
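The second page combines three ideas: term frequency, document-length normalization, and inverse document frequency. Here is a minimal sketch of how they fit together. It makes several assumptions. Tokenization is lowercase whitespace splitting. TF is raw count divided by document length. IDF is the plain `log(N / df)` variant. The function name `tfidf_scores` is hypothetical, and the notes themselves may settle on a different weighting (BM25, for example).

```python
import math
from collections import Counter


def tfidf_scores(query, docs):
    """Score each doc against the query with length-normalized TF times IDF.

    Assumptions (illustrative only): lowercase whitespace tokenization,
    tf = count / doc_length, idf = log(N / df).
    """
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)

    # Document frequency: how many docs contain each term at least once.
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))

    scores = []
    for toks in tokenized:
        counts = Counter(toks)
        length = len(toks) or 1  # avoid dividing by zero on empty docs
        score = 0.0
        for term in query.lower().split():
            if df[term] == 0:
                continue  # term appears nowhere; contributes nothing
            tf = counts[term] / length  # normalize by document length
            idf = math.log(n_docs / df[term])  # rarer terms weigh more
            score += tf * idf
        scores.append(score)
    return scores


docs = ["the cat sat", "the dog sat on the mat", "cats and dogs"]
print(tfidf_scores("the cat", docs))
```

The first document ranks highest because it contains both query terms and is short. The second document only matches the common word "the", which IDF discounts. The third document scores zero even though it is about cats. Exact token matching treats "cats" and "cat" as unrelated, which is one of the limits of exact matching that motivates meaning-based retrieval.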