mlprep / ML Breadth · medium · 12 min

Explain how RNNs work and where they still fit in the ML landscape today.

Formulate your answer, then read on.

You mentioned vanishing gradients — how do LSTMs actually solve this, and what's the core mechanism that makes the difference?

Formulate your answer, then read on.

tldr

RNNs model sequences with a recurrent hidden state, but because that state is re-multiplied by the same weight matrix and squashed at every step, gradients shrink (or blow up) geometrically over long sequences. LSTMs fix this with a cell state that updates additively rather than multiplicatively: the forget gate acts as a gradient valve, and when it sits near 1, error signals flow backward across hundreds of steps nearly unchanged.
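
A minimal NumPy sketch of both updates makes the contrast concrete (all names here are illustrative, not taken from any library): the vanilla RNN hidden state is re-multiplied and squashed every step, while the LSTM cell state changes by gated addition.

    import numpy as np

    def rnn_step(x, h_prev, Wx, Wh, b):
        # Vanilla RNN: every step multiplies the state by Wh and squashes
        # it, so backprop picks up a factor of Wh^T * tanh'(.) per step
        # and the gradient shrinks or explodes geometrically.
        return np.tanh(x @ Wx + h_prev @ Wh + b)

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def lstm_step(x, h_prev, c_prev, W, b):
        # One fused weight matrix computes all four gates; split into
        # forget (f), input (i), candidate (g), and output (o).
        z = np.concatenate([x, h_prev]) @ W + b
        f, i, g, o = np.split(z, 4)
        f, i, o = sigmoid(f), sigmoid(i), sigmoid(o)
        g = np.tanh(g)
        # Additive update: c = f * c_prev + i * g, so dc_t/dc_{t-1} = f
        # elementwise. With f near 1 the gradient passes through almost
        # unchanged -- this is the "gradient valve".
        c = f * c_prev + i * g
        h = o * np.tanh(c)
        return h, c

    # usage: run a 100-step sequence through the cell
    rng = np.random.default_rng(0)
    d_x, d_h = 8, 16
    W = rng.normal(0.0, 0.1, (d_x + d_h, 4 * d_h))
    b = np.zeros(4 * d_h)
    h = c = np.zeros(d_h)
    for x in rng.normal(size=(100, d_x)):
        h, c = lstm_step(x, h, c, W, b)

In practice the forget-gate bias is often initialized positive so f starts near 1 and the valve begins open, which is what lets useful signals survive long spans in the first place.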

follow-up

  • How would you decide between an LSTM and a transformer for a new time-series forecasting task?
  • What is gradient clipping and why is it necessary even with LSTMs? (a sketch follows this list)
  • How do modern state space models like Mamba differ from LSTMs, and what problem are they trying to solve?
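
Toward the gradient-clipping follow-up, a minimal sketch of global-norm clipping (the function name and max_norm default are illustrative choices, not a library API): LSTMs mostly address vanishing gradients, but the recurrence can still let gradients explode, so the combined gradient norm is capped before each parameter update.

    import numpy as np

    def clip_by_global_norm(grads, max_norm=1.0):
        # Rescale all gradients together if their combined L2 norm
        # exceeds max_norm; the update direction is preserved, only
        # its magnitude is capped. Even with LSTMs the backward
        # product can spike, so this guards against exploding steps.
        total_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
        scale = min(1.0, max_norm / (total_norm + 1e-6))
        return [g * scale for g in grads]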