Applied LLM Engineering: Index
From Search Infrastructure and Software Engineering at Shopify

| Applied LLM Engineering | |
|---|---|
| Page metadata | |
| First created | May 19, 2026 |
| Last edited | Jun 7, 2026 |

This section covers the model side of the team’s work: getting the Sidekick assistant to behave for the help-tooling job through prompting, context design, and evaluation. It collects two patterns I worked through from the team’s public talks, as the counterpart to the search-infrastructure side I spend most of my time on.
Index
- Just-in-Time Context: Moving Tool Instructions Out of the System Prompt. A working note on Death by a Thousand Instructions and the pattern of returning tool guidance inline with tool results.
- Calibrating an LLM Judge: Cohen’s Kappa from 0.02 to 0.61. A working note on how the Sidekick team turned an unreliable LLM-as-judge into a usable training signal, with the statistics that made the calibration legible.