Artificial Intelligence
A paper-centric guide to artificial intelligence — from the first neuron model in 1943 to the large language models of today.
Modern AI was not invented in a single lab or a single year. It grew through decades of papers — each one answering a stubborn question, opening a new door, or slamming one shut. This guide is organized around those papers: the ideas that drove the field forward, the dead ends that redirected it, and the breakthroughs that changed everything.
Each section below covers an era. The landmark papers are highlighted with the question they set out to answer. Click through to dedicated pages for in-depth explanations and code. Use Fundamental Papers for the complete annotated chronological reading list.
Era 1 · Origins of Neural Computation (1943–1969)
Can a machine think? Can it learn?
The story begins with a deceptively simple question: can networks of simple units compute? Two decades of work establish both the promise and the hard limits of artificial neurons.
| Year | Paper | Key contribution |
|---|---|---|
| 1943 | McCulloch & Pitts — A Logical Calculus… | First mathematical model of a neuron |
| 1958 | Rosenblatt — The Perceptron | First learning algorithm for neural networks |
| 1969 | Minsky & Papert — Perceptrons | Proof that single-layer perceptrons cannot learn functions like XOR → first AI Winter |
Deep dives:
- Perceptron (Rosenblatt, 1958) — the algorithm that started it all, with a Python implementation
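To give a taste of the deep dive, here is a compact sketch of Rosenblatt's learning rule in pure Python. The AND-gate training data and the learning rate are illustrative choices, not from the 1958 paper:

```python
# A minimal sketch of the perceptron learning rule (Rosenblatt, 1958),
# trained here on the AND function — an illustrative, linearly separable task.

def predict(w, b, x):
    """Step activation: fire (1) if the weighted sum clears the threshold."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

def train(samples, lr=0.1, epochs=20):
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, target in samples:
            error = target - predict(w, b, x)  # 0 when correct, ±1 when wrong
            # Update rule: nudge weights toward the misclassified input
            w = [wi + lr * error * xi for wi, xi in zip(w, x)]
            b += lr * error
    return w, b

and_gate = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train(and_gate)
```

Because AND is linearly separable, the perceptron convergence theorem guarantees this loop finds a separating line — exactly the guarantee that Minsky & Papert showed breaks down for XOR.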
Era 2 · The Backpropagation Revolution (1986–1991)
How can a deep network learn?
After a decade of winter, a rediscovered algorithm reignites the field. Backpropagation makes it possible to train multi-layer networks — and the ideas from this era still underpin every neural network trained today.
| Year | Paper | Key contribution |
|---|---|---|
| 1986 | Rumelhart, Hinton & Williams | Backpropagation — gradient-based training for deep nets |
| 1989 | LeCun et al. | Convolutional Neural Networks — deep learning applied to vision |
| 1989 | Cybenko | Universal Approximation Theorem — one hidden layer can approximate any continuous function |
| 1991 | Bottou | Stochastic Gradient Descent as the standard optimizer |
Concepts:
- Deep Learning — neurons, layers, backpropagation, architectures
- Optimizers — SGD, Adam, learning rate schedules
- Loss Functions — MAE, MSE, cross-entropy
- Activation Functions — ReLU, sigmoid, softmax
- Tensors — the fundamental data structure
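Several of these concepts fit in one small example: a single sigmoid neuron trained with per-sample (stochastic) gradient descent under cross-entropy loss. This is a sketch, not full backpropagation — a multi-layer network applies the same chain rule layer by layer — and the OR-gate data is an illustrative assumption:

```python
# One sigmoid neuron trained by SGD with cross-entropy loss (pure Python).
# For this loss/activation pairing the chain rule collapses to the
# textbook gradient: dLoss/dw_i = (prediction - target) * x_i.
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(w, b, x):
    """Sigmoid of the weighted sum plus bias."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def train(samples, lr=0.5, epochs=500):
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, target in samples:      # per-sample updates, i.e. SGD
            p = forward(w, b, x)
            grad = p - target          # dLoss/dz for sigmoid + cross-entropy
            w = [wi - lr * grad * xi for wi, xi in zip(w, x)]
            b -= lr * grad
    return w, b

# Toy data (illustrative): the OR function, which is linearly separable.
or_gate = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
w, b = train(or_gate)
```

After training, the neuron's output is below 0.5 only for the (0, 0) input — the same separating-line behavior as the perceptron, but learned by smooth gradient steps rather than discrete corrections.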
Era 3 · Scaling & Representation (1997–2013)
Can deep networks handle language and massive images?
LSTMs tame long sequences. Word embeddings encode meaning as vectors. Then in 2012, AlexNet wins ImageNet by a landslide and the deep learning era truly begins.
| Year | Paper | Key contribution |
|---|---|---|
| 1997 | Hochreiter & Schmidhuber | LSTM — memory cells for long sequences |
| 2003 | Bengio et al. | Word embeddings — distributed representations of language |
| 2012 | Krizhevsky, Sutskever & Hinton | AlexNet — GPU-trained CNNs dominate ImageNet |
| 2013 | Mikolov et al. | Word2Vec — semantic vector arithmetic |
Concepts:
- Machine Learning — supervised vs. unsupervised, bias-variance
- Cross-Validation — robust model evaluation
- Parsnip — unified modeling in R (tidymodels)
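Of these, cross-validation is the easiest to sketch directly. Below is an illustrative pure-Python k-fold loop; the `fit`/`score` callables are hypothetical placeholders for whatever model interface you use:

```python
# Illustrative k-fold cross-validation: partition the data into k folds,
# hold one fold out for evaluation, train on the rest, average the scores.

def k_fold_indices(n, k):
    """Split indices 0..n-1 into k contiguous, near-equal folds."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)  # spread the remainder
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def cross_validate(data, k, fit, score):
    """fit(train) -> model; score(model, test) -> float. Returns the mean score."""
    folds = k_fold_indices(len(data), k)
    scores = []
    for i, test_idx in enumerate(folds):
        train = [data[j] for f in folds[:i] + folds[i + 1:] for j in f]
        test = [data[j] for j in test_idx]
        scores.append(score(fit(train), test))
    return sum(scores) / k
```

In practice you would also shuffle (or stratify) before splitting; contiguous folds are used here only to keep the sketch short.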
Era 4 · Attention & the Transformer (2014–2020)
What if we dropped recurrence entirely?
A single architecture — the Transformer — rewrites NLP, then vision, then science. The attention mechanism becomes the universal building block.
| Year | Paper | Key contribution |
|---|---|---|
| 2014 | Bahdanau et al. | Attention mechanism — alignment between encoder and decoder |
| 2015 | He et al. | ResNet — residual connections enable 100+ layer networks |
| 2017 | Vaswani et al. | Attention Is All You Need — the Transformer architecture |
| 2018 | Devlin et al. | BERT — bidirectional pre-training for language |
| 2020 | Brown et al. | GPT-3 — few-shot learning at scale |
Concepts:
- Transformers — self-attention, positional encoding, encoder-decoder
- Foundation Models — Llama, Gemma, Mistral, DeepSeek
- HuggingFace Transformers — the pipeline() API in practice
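The self-attention at the heart of the Transformer is short enough to write out. Here is a minimal NumPy sketch of scaled dot-product attention, Attention(Q, K, V) = softmax(QKᵀ/√d_k)·V; the shapes are illustrative, and a real implementation adds masking, multiple heads, and learned projections:

```python
# Scaled dot-product attention (Vaswani et al., 2017) in NumPy.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # subtract max for stability
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # (n_q, n_k): query-key similarities
    weights = softmax(scores, axis=-1)   # each row is a distribution over keys
    return weights @ V, weights          # weighted mix of values, plus weights

rng = np.random.default_rng(0)           # random toy inputs (illustrative)
Q = rng.normal(size=(3, 4))              # 3 queries of dimension d_k = 4
K = rng.normal(size=(5, 4))              # 5 keys of the same dimension
V = rng.normal(size=(5, 2))              # 5 values of dimension 2
out, weights = attention(Q, K, V)        # out: (3, 2), weights: (3, 5)
```

Each output row is a convex combination of the value rows — the "alignment" idea from Bahdanau et al., made the entire architecture.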
Era 5 · Foundation Models & Reasoning (2020–present)
Can scale alone produce intelligence?
Scaling laws, instruction tuning, RLHF, and chain-of-thought prompting transform large language models from text predictors into reasoning systems.
| Year | Paper | Key contribution |
|---|---|---|
| 2020 | Kaplan et al. | Scaling Laws — loss as a power law of compute |
| 2022 | Ouyang et al. | InstructGPT / RLHF — aligning LLMs to human intent |
| 2023 | Touvron et al. | Llama — open-weight foundation models |
| 2025 | DeepSeek-AI | DeepSeek-R1 — reasoning via reinforcement learning |
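The scaling-law claim — loss as a power law of compute, L(C) = a·C^(−α) — is linear in log-log space, so the exponent falls out of an ordinary least-squares fit. The points below are synthetic, generated from a known exponent purely to show the mechanics of such a fit:

```python
# Fitting a power law L(C) = a * C^(-alpha) by linear regression in log space.
import math

# Synthetic (compute, loss) pairs generated from L = 10 * C^(-0.05) —
# illustrative data, not measurements from any paper.
points = [(10 ** k, 10.0 * (10 ** k) ** -0.05) for k in range(3, 9)]

# In log space: log L = log a - alpha * log C, a straight line.
xs = [math.log(c) for c, _ in points]
ys = [math.log(loss) for _, loss in points]
n = len(points)
x_mean, y_mean = sum(xs) / n, sum(ys) / n
slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) \
        / sum((x - x_mean) ** 2 for x in xs)
alpha = -slope  # recovers the exponent used to generate the data (0.05)
```

On real training runs the points scatter around the line, but the same log-log regression is how the published exponents are estimated.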
Concepts:
- Foundation Models
- Learning Paradigms — where statistical, ML, and deep learning meet
Resources
- Fundamental Papers — full annotated chronological reading list (1943–2025)
- References — textbooks and links
- Jupyter — notebook tips and workflows
- Kaggle — competitions and datasets