You have a categorical feature with high cardinality — say, 10,000 unique zip codes. How do you encode it? Walk me through the trade-offs of different approaches.
formulate your own answer first, then read on.
tldr
- Low cardinality → one-hot.
- High cardinality + tree models → out-of-fold target encoding (prevents leakage) with smoothing for rare categories.
- High cardinality + neural nets → learned embeddings (nn.Embedding).
- Label encoding works for trees but never for linear or distance-based models, since it imposes an arbitrary ordering on the categories.
- Hashing trick for streaming or very high cardinality.
- Always handle unseen categories at inference: UNKNOWN bucket or fall back to the global mean.
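A minimal sketch of the out-of-fold target encoding mentioned above, using pandas and scikit-learn's KFold. The function name `oof_target_encode`, the smoothing formula, and the toy zip-code data are illustrative choices, not a fixed API:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import KFold

def oof_target_encode(df, cat_col, target_col, n_splits=5, smoothing=10.0, seed=0):
    """Out-of-fold target encoding with additive smoothing.

    Each row's encoding is computed from the *other* folds only,
    so a row's own target never leaks into its feature.
    `smoothing` shrinks rare categories toward the global mean.
    """
    encoded = np.full(len(df), np.nan)
    global_mean = df[target_col].mean()
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for train_idx, val_idx in kf.split(df):
        train = df.iloc[train_idx]
        stats = train.groupby(cat_col)[target_col].agg(["mean", "count"])
        # smoothed mean: (n * cat_mean + smoothing * global_mean) / (n + smoothing)
        smoothed = (stats["count"] * stats["mean"] + smoothing * global_mean) / (
            stats["count"] + smoothing
        )
        # categories absent from the training folds fall back to the global mean
        encoded[val_idx] = (
            df.iloc[val_idx][cat_col].map(smoothed).fillna(global_mean).to_numpy()
        )
    return encoded

# toy usage: three zip codes, binary target
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "zip": rng.choice(["94103", "10001", "60601"], size=100),
    "y": rng.integers(0, 2, size=100),
})
df["zip_enc"] = oof_target_encode(df, "zip", "y")
```

The `fillna(global_mean)` line doubles as the inference-time policy for unseen categories: anything not in the fitted statistics gets the global mean.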
follow-up
- How would you implement out-of-fold target encoding correctly in a sklearn pipeline without leakage?
- Your model is deployed. A new product category appears in production that wasn't in training. How do you handle it?
- How do entity embeddings (like those used in FastAI's tabular model) compare to standard target encoding on structured data?
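For the streaming / very-high-cardinality case in the tl;dr, the hashing trick needs no fitted vocabulary at all, which is also one answer to the unseen-category follow-up. A stdlib-only sketch (the helper `hashed_bucket` and the bucket count are illustrative):

```python
import hashlib

def hashed_bucket(value: str, n_buckets: int = 1024) -> int:
    """Map a category string to a fixed-size bucket index.

    md5 (rather than Python's built-in hash(), which is salted
    per process) keeps the mapping stable across runs, which
    matters for a model served in production. Hash collisions
    between categories are accepted as noise.
    """
    digest = hashlib.md5(value.encode("utf-8")).hexdigest()
    return int(digest, 16) % n_buckets

# the same zip always lands in the same bucket, including zips never seen in training
assert hashed_bucket("94103") == hashed_bucket("94103")
```

No lookup table is stored, so a brand-new production category simply hashes into some existing bucket instead of crashing the encoder.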