You want to test whether a new recommendation algorithm increases click-through rate. Walk me through the full A/B testing process — from design through analysis. What decisions do you make upfront, and why?
You said "don't peek at results early." But in practice, experiments cost money and time. What if you want to stop early when results are clearly significant or clearly hopeless? Is there a rigorous way to do that?
tldr
A/B testing = rigorous hypothesis testing applied to product decisions. Decide sample size upfront from the baseline rate, minimum detectable effect, significance level, and desired power. Don't peek at results early unless you're using a sequential testing method (O'Brien-Fleming boundaries, always-valid confidence intervals) that controls the false positive rate under repeated looks. The analysis itself is straightforward; the discipline is in the design decisions made before data collection begins.
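A minimal sketch of the upfront sample-size decision, assuming a hypothetical baseline CTR of 5%, a minimum detectable effect of a 10% relative lift, two-sided alpha = 0.05, and 80% power (all illustrative numbers, not given in the question). It uses statsmodels' power machinery for a two-proportion comparison.

```python
# Sketch: per-arm sample size for a two-proportion (CTR) A/B test.
# All parameter values below are illustrative assumptions.
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

baseline_ctr = 0.05                      # assumed control click-through rate
mde_relative = 0.10                      # smallest relative lift worth detecting
treatment_ctr = baseline_ctr * (1 + mde_relative)

# Cohen's h effect size for the two proportions
effect_size = proportion_effectsize(treatment_ctr, baseline_ctr)

# Solve for users per arm at alpha = 0.05, power = 0.80
n_per_arm = NormalIndPower().solve_power(
    effect_size=effect_size,
    alpha=0.05,
    power=0.80,
    alternative="two-sided",
)
print(f"Users needed per arm: {int(round(n_per_arm)):,}")
```

Shrinking the MDE or raising the power drives the required sample size up quickly, which is why the MDE should be the smallest lift the business would actually act on, not the lift you hope to see.

And a sketch of what "peeking safely" looks like with O'Brien-Fleming group-sequential boundaries, assuming 5 equally spaced looks at overall two-sided alpha = 0.05. The constant 2.04 is the standard tabulated O'Brien-Fleming value for that configuration; a real deployment would compute boundaries with a dedicated group-sequential or alpha-spending package.

```python
# Sketch: O'Brien-Fleming stopping boundaries for 5 equally spaced analyses.
import math

K = 5            # planned analyses (4 interim + 1 final)
C_OBF = 2.04     # tabulated O'Brien-Fleming constant for K=5, two-sided alpha=0.05

for k in range(1, K + 1):
    boundary = C_OBF * math.sqrt(K / k)
    print(f"Look {k}: stop for efficacy if |z| >= {boundary:.2f}")

# Boundaries are very strict early (~4.56 at the first look) and relax toward
# ~2.04 at the final analysis, so early stopping costs little power relative
# to a fixed-sample test at z = 1.96.
```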
follow-up
- Your A/B test shows a statistically significant 2% improvement in CTR, but revenue per user decreased by 1% (not significant). How do you make the launch decision?
- How do you handle network effects in an A/B test — for example, testing a social feature where one user's experience affects another's?
- What is the multiple comparisons problem, and how do you handle testing 5 variants simultaneously?