Deep Learning
The deep learning era, roughly 1986 to the present, is defined by the ability to train neural networks with many layers end-to-end using gradient descent. The key papers are covered in Foundational Papers.
Artificial Neurons
Inspired by biological neurons, an artificial neuron performs a simple computation:
\[ y = \sigma(w_1x_1 + w_2x_2 + \dots + w_nx_n + b) \]
where \(x_i\) are the inputs, \(w_i\) the weights, \(b\) the bias, and \(\sigma\) an activation function (e.g., ReLU, sigmoid). A single neuron generalises the Perceptron by allowing non-step activations; stacking neurons into layers gives the multi-layer networks below.
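The formula above can be sketched directly in plain Python. The weights, bias, and inputs here are arbitrary illustrative values, not from any particular model:

```python
import math

def sigmoid(z):
    # Logistic activation: squashes any real number into (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

def neuron(x, w, b, activation=sigmoid):
    # y = sigma(w1*x1 + ... + wn*xn + b)
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return activation(z)

# Two inputs, illustrative weights: z = 0.5*1.0 - 0.25*2.0 + 0.1 = 0.1
y = neuron(x=[1.0, 2.0], w=[0.5, -0.25], b=0.1)
```

Swapping `activation` for a step function recovers the classic Perceptron.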
Layers
- Input Layer — receives raw data.
- Hidden Layers — process data through weighted connections; learning happens here.
- Output Layer — produces the final result (class label, probability, etc.)
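The three layer roles can be seen in a minimal forward pass. All weights and inputs below are arbitrary, chosen only to illustrate the data flow:

```python
import math

def relu(z):
    return max(0.0, z)

def layer(inputs, weights, biases, activation):
    # Each row of `weights` holds the incoming weights of one unit.
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

# Input layer: raw data (two features, illustrative values).
x = [0.5, 1.0]
# Hidden layer: two ReLU units, where the learned transformation lives.
h = layer(x, weights=[[1.0, 0.5], [-0.5, 1.0]], biases=[0.0, 0.1],
          activation=relu)
# Output layer: one sigmoid unit producing a probability-like score.
y = layer(h, weights=[[1.0, -1.0]], biases=[0.0],
          activation=lambda z: 1.0 / (1.0 + math.exp(-z)))
```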
Backpropagation
An algorithm for computing gradients using the chain rule, propagating error backwards through the network. Introduced by Rumelhart, Hinton & Williams (1986), the paper that made training deep networks practical.
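The chain rule can be walked through by hand on a single sigmoid neuron with squared-error loss. This is a hedged sketch (values and learning rate are arbitrary), but each line of the backward pass is one factor of the chain rule:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss_and_grads(x, w, b, target):
    # Forward pass.
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    y = sigmoid(z)
    loss = 0.5 * (y - target) ** 2
    # Backward pass: chain rule, propagating the error back to each parameter.
    dL_dy = y - target                  # dL/dy for squared error
    dy_dz = y * (1.0 - y)               # derivative of the sigmoid
    dL_dz = dL_dy * dy_dz               # dL/dz = dL/dy * dy/dz
    dL_dw = [dL_dz * xi for xi in x]    # dz/dw_i = x_i
    dL_db = dL_dz                       # dz/db = 1
    return loss, dL_dw, dL_db

# One gradient-descent step on a single training example.
x, w, b, target, lr = [1.0, 2.0], [0.5, -0.25], 0.1, 1.0, 0.5
loss, dw, db = loss_and_grads(x, w, b, target)
w = [wi - lr * gi for wi, gi in zip(w, dw)]
b = b - lr * db
```

In a deeper network the same step repeats layer by layer, with each layer's `dL_dz` passed backwards as the error signal for the layer before it.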
Neural Network Architectures
- Feedforward Neural Networks (FNN) — simplest architecture, data flows in one direction
- Convolutional Neural Networks (CNN) — specialized for images and spatial data
- Recurrent Neural Networks (RNN) — designed for sequential data
- Transformers — attention-based models for NLP and beyond
Regularization Techniques
Techniques to prevent overfitting:
- Dropout — randomly zeroing units during training (Srivastava et al., 2014)
- L2 Regularization (Weight Decay) — penalizing large weights
- Early Stopping — halting training when validation loss stops improving
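The first two techniques are small enough to sketch directly. This is an illustrative implementation of inverted dropout and an L2 penalty term, not taken from any particular framework:

```python
import random

def dropout(activations, p=0.5, training=True, rng=random):
    # Inverted dropout: zero each unit with probability p during training,
    # scaling survivors by 1/(1-p) so the expected activation is unchanged.
    # At test time (training=False) the layer is a no-op.
    if not training or p == 0.0:
        return list(activations)
    return [a * ((rng.random() >= p) / (1.0 - p)) for a in activations]

def l2_penalty(weights, lam=1e-4):
    # Weight decay term added to the loss: lam * sum of squared weights,
    # which pushes the optimizer towards smaller weights.
    return lam * sum(w * w for w in weights)
```

Early stopping needs no special code: track the validation loss each epoch and keep the parameters from the epoch where it was lowest.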
Concepts
The following glossary pages cover the building blocks used across all deep learning architectures:
- Activation Functions — step, sigmoid, ReLU, softmax, GELU
- Loss Functions — MAE, MSE, cross-entropy
- Optimizers — SGD, Adam, learning rate schedules
- Tensors — the fundamental data structure
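As a taste of the first two entries, here is a minimal Python sketch of softmax and cross-entropy, the standard pairing for classification outputs (an illustration, not a substitute for the glossary pages):

```python
import math

def softmax(logits):
    # Subtract the max before exponentiating for numerical stability;
    # the result is a probability distribution over the classes.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, true_index):
    # Negative log-probability assigned to the correct class.
    return -math.log(probs[true_index])

probs = softmax([2.0, 1.0, 0.1])      # illustrative logits
loss = cross_entropy(probs, true_index=0)
```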