Transformers

Attention Is All You Need

The transformer architecture was introduced by Vaswani et al. (2017). It replaced the recurrence of earlier sequence models with a self-attention mechanism, allowing every position in a sequence to be processed in parallel and leading to breakthrough results in NLP and beyond.

Key Concepts

Self-Attention
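
Self-attention relates every position in a sequence to every other position: queries, keys, and values are all projections of the same input, and each output is a weighted sum of the values, with weights given by softmax(QK^T / sqrt(d_k)). The following is a minimal NumPy sketch of this scaled dot-product attention; the function name and toy shapes are illustrative, not from the paper's reference code:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V  (Vaswani et al., 2017).

    Q and K have shape (seq_len, d_k); V has shape (seq_len, d_v).
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                    # (seq_len, seq_len) similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # row-wise softmax
    return weights @ V                                 # weighted sum of values

# Toy self-attention: Q, K, and V all come from the same 4-token input.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
print(scaled_dot_product_attention(x, x, x).shape)     # (4, 8)
```

In the full model this computation runs once per head, on learned linear projections of the input (multi-head attention).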

Positional Encoding
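
Attention by itself is order-invariant, so the transformer adds a positional encoding to the token embeddings. The paper uses fixed sinusoids: PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)). Here is a small NumPy sketch of that formula; the function name and shapes are illustrative:

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    """PE(pos, 2i) = sin(pos / 10000^(2i/d_model)); PE(pos, 2i+1) = cos(same)."""
    assert d_model % 2 == 0, "even d_model keeps the sin/cos pairs aligned"
    positions = np.arange(seq_len)[:, None]            # (seq_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]           # (1, d_model // 2)
    angles = positions / np.power(10000.0, dims / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)                       # even dimensions
    pe[:, 1::2] = np.cos(angles)                       # odd dimensions
    return pe

pe = sinusoidal_positional_encoding(seq_len=50, d_model=512)
print(pe.shape)                                        # (50, 512)
```

The paper's stated motivation for this choice is that for any fixed offset k, PE(pos+k) is a linear function of PE(pos), which should make it easy for the model to attend by relative position.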

Encoder-Decoder Architecture
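
The original transformer is a sequence-to-sequence model: a stack of encoder layers (self-attention plus a feed-forward sublayer) encodes the source, and a stack of decoder layers attends both to earlier target positions and, via cross-attention, to the encoder output. As one concrete sketch, PyTorch's nn.Transformer wires up such a stack with the paper's base hyperparameters; the random tensors below are placeholders for real embedded sequences:

```python
import torch
import torch.nn as nn

# Encoder-decoder stack with the base-model sizes from the paper:
# d_model=512, 8 attention heads, 6 layers on each side.
model = nn.Transformer(d_model=512, nhead=8,
                       num_encoder_layers=6, num_decoder_layers=6)

# Placeholder inputs with shape (sequence length, batch size, d_model).
src = torch.rand(10, 32, 512)   # source sequence, into the encoder
tgt = torch.rand(20, 32, 512)   # target sequence, into the decoder
out = model(src, tgt)
print(out.shape)                # torch.Size([20, 32, 512])
```

In practice the decoder also needs a causal mask (e.g. nn.Transformer.generate_square_subsequent_mask) so that each target position cannot attend to later ones.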

Resources
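
Vaswani et al. (2017), "Attention Is All You Need": https://arxiv.org/abs/1706.03762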