Your training data comes from user clicks on ranked results. What's wrong with training directly on this data, and how do you correct for position bias?
formulate your answer, then —
tldr
Position bias: users click highly ranked items more often regardless of relevance, so raw clicks conflate relevance with position. Training directly on clicks teaches the model to predict where items were shown, not how good they are, and redeploying that model creates a self-reinforcing feedback loop. Corrections: Inverse Propensity Scoring (IPS), which re-weights each click by the inverse of its estimated examination probability at that position; the position-as-feature trick, which includes position as an input during training and fixes it to a constant at serving time; or randomized exploration to collect unbiased data directly. IPS is the principled estimator; position-as-feature is cheap and widely used in practice.
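The IPS correction in the tl;dr can be sketched in a few lines. This is a minimal illustration, not production code: the per-rank propensities below are made-up numbers (in practice you estimate them, e.g. via result randomization), and `ips_weights` is a hypothetical helper name.

```python
import numpy as np

# Hypothetical examination propensities per rank (rank 0 is the top slot).
# In practice these are estimated, e.g. from a randomization experiment.
propensities = np.array([1.0, 0.6, 0.4, 0.25, 0.15])

def ips_weights(positions, clicks, clip=10.0):
    """Inverse-propensity weights for logged impressions.

    A click at rank k is up-weighted by 1/p_k, so clicks earned despite
    low exposure count more. Clipping bounds the variance that large
    1/p_k weights would otherwise introduce.
    """
    w = clicks / propensities[positions]
    return np.minimum(w, clip)

positions = np.array([0, 4, 1])   # ranks at which items were shown
clicks = np.array([1, 1, 0])      # click indicators

print(ips_weights(positions, clicks))  # rank-4 click gets weight 1/0.15 ≈ 6.67
```

These weights are then passed as per-example weights to any standard pointwise or pairwise ranking loss; the clipped estimator trades a little bias for much lower variance.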
follow-up
- What is the position-as-feature trick in detail, and what assumptions does it make about how position affects clicks?
- How would you estimate click propensity without running a full randomization experiment?
- Beyond position, what other types of exposure bias exist in recommendation systems?
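As a starting point for the first follow-up, the position-as-feature trick can be sketched on synthetic data. Everything here is assumed for illustration: clicks are simulated with a 1/(1+rank) examination model, the classifier is a hand-rolled logistic regression, and `score` is a hypothetical serving-time helper that fixes the position feature to a constant.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy logged data: one relevance feature x, the rank each item was shown
# at, and a click label biased toward high positions (examination falls
# off as 1/(1+rank) in this simulation).
n = 5000
x = rng.normal(size=n)
pos = rng.integers(0, 5, size=n)
examined = rng.random(n) < 1.0 / (1.0 + pos)
clicked_if_seen = rng.random(n) < 1.0 / (1.0 + np.exp(-x))
click = (examined & clicked_if_seen).astype(float)

# Train a logistic regression on [x, position, bias] with plain
# gradient descent, letting the position coefficient absorb the bias.
X = np.column_stack([x, pos, np.ones(n)])
w = np.zeros(3)
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-X @ w))
    w -= 0.1 * (X.T @ (p - click)) / n

def score(x_new, serve_pos=0.0):
    """Serving-time score: hold position fixed (here rank 0) so that
    candidates are compared on relevance alone."""
    return x_new * w[0] + serve_pos * w[1] + w[2]
```

The trained `w[1]` comes out negative (later ranks get fewer clicks), and zeroing the position input at serving removes that learned effect from the ranking score. The key assumption this trick makes is additivity: position shifts click log-odds by the same amount for every item, with no position-relevance interaction.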