Transformers, normalization techniques, and the architectural decisions behind modern neural networks.