Deep Learning
The deep learning era, roughly 1986 to the present, is defined by the ability to train neural networks with many layers end-to-end using gradient descent. The key papers are covered in Foundational Papers.
Artificial Neurons
Inspired by biological neurons, an artificial neuron performs a simple computation:
\[ y = \sigma(w_1x_1 + w_2x_2 + \dots + w_nx_n + b) \]
where \(x_i\) are the inputs, \(w_i\) the weights, \(b\) the bias, and \(\sigma\) an activation function (e.g., ReLU, sigmoid). A single neuron generalises the Perceptron by allowing non-step activations; stacking neurons into layers gives the multi-layer networks below.
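The formula above can be sketched directly in plain Python. The weights, bias, and inputs here are arbitrary illustrative values, not from any particular model:

```python
import math

def sigmoid(z):
    # Logistic activation: squashes any real number into (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

def neuron(x, w, b, activation=sigmoid):
    # y = sigma(w1*x1 + ... + wn*xn + b)
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return activation(z)

# Two inputs, illustrative weights: z = 0.5*1.0 - 0.25*2.0 + 0.1 = 0.1
y = neuron(x=[1.0, 2.0], w=[0.5, -0.25], b=0.1)
```

Swapping `activation` for a step function recovers the classic Perceptron.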
Layers
- Input Layer — receives raw data.
- Hidden Layers — process data through weighted connections; learning happens here.
- Output Layer — produces the final result (class label, probability, etc.)
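The three layer roles can be seen in a minimal forward pass. All weights and inputs below are arbitrary, chosen only to illustrate the data flow:

```python
import math

def relu(z):
    return max(0.0, z)

def layer(inputs, weights, biases, activation):
    # Each row of `weights` holds the incoming weights of one unit.
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

# Input layer: raw data (two features, illustrative values).
x = [0.5, 1.0]
# Hidden layer: two ReLU units, where the learned transformation lives.
h = layer(x, weights=[[1.0, 0.5], [-0.5, 1.0]], biases=[0.0, 0.1],
          activation=relu)
# Output layer: one sigmoid unit producing a probability-like score.
y = layer(h, weights=[[1.0, -1.0]], biases=[0.0],
          activation=lambda z: 1.0 / (1.0 + math.exp(-z)))
```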
Backpropagation
An algorithm for computing gradients using the chain rule, propagating error backwards through the network. Introduced by Rumelhart, Hinton & Williams (1986), the paper that made training deep networks practical.
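The chain rule can be walked through by hand on a single sigmoid neuron with squared-error loss. This is a hedged sketch (values and learning rate are arbitrary), but each line of the backward pass is one factor of the chain rule:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss_and_grads(x, w, b, target):
    # Forward pass.
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    y = sigmoid(z)
    loss = 0.5 * (y - target) ** 2
    # Backward pass: chain rule, propagating the error back to each parameter.
    dL_dy = y - target                  # dL/dy for squared error
    dy_dz = y * (1.0 - y)               # derivative of the sigmoid
    dL_dz = dL_dy * dy_dz               # dL/dz = dL/dy * dy/dz
    dL_dw = [dL_dz * xi for xi in x]    # dz/dw_i = x_i
    dL_db = dL_dz                       # dz/db = 1
    return loss, dL_dw, dL_db

# One gradient-descent step on a single training example.
x, w, b, target, lr = [1.0, 2.0], [0.5, -0.25], 0.1, 1.0, 0.5
loss, dw, db = loss_and_grads(x, w, b, target)
w = [wi - lr * gi for wi, gi in zip(w, dw)]
b = b - lr * db
```

In a deeper network the same step repeats layer by layer, with each layer's `dL_dz` passed backwards as the error signal for the layer before it.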
Neural Network Architectures
- Feedforward Neural Networks (FNN) — simplest architecture, data flows in one direction
- Convolutional Neural Networks (CNN) — specialized for images and spatial data
- Recurrent Neural Networks (RNN) — designed for sequential data
- Transformers — attention-based models for NLP and beyond
Regularization Techniques
Techniques to prevent overfitting:
- Dropout — randomly zeroing units during training (Srivastava et al., 2014)
- L2 Regularization (Weight Decay) — penalizing large weights
- Early Stopping — halting training when validation loss stops improving
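The first two techniques are small enough to sketch directly. This is an illustrative implementation of inverted dropout and an L2 penalty term, not taken from any particular framework:

```python
import random

def dropout(activations, p=0.5, training=True, rng=random):
    # Inverted dropout: zero each unit with probability p during training,
    # scaling survivors by 1/(1-p) so the expected activation is unchanged.
    # At test time (training=False) the layer is a no-op.
    if not training or p == 0.0:
        return list(activations)
    return [a * ((rng.random() >= p) / (1.0 - p)) for a in activations]

def l2_penalty(weights, lam=1e-4):
    # Weight decay term added to the loss: lam * sum of squared weights,
    # which pushes the optimizer towards smaller weights.
    return lam * sum(w * w for w in weights)
```

Early stopping needs no special code: track the validation loss each epoch and keep the parameters from the epoch where it was lowest.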
Concepts
The following glossary pages cover the building blocks used across all deep learning architectures:
- Activation Functions — step, sigmoid, ReLU, softmax, GELU
- Loss Functions — MAE, MSE, cross-entropy
- Optimizers — SGD, Adam, learning rate schedules
- Tensors — the fundamental data structure
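As a taste of the first two entries, here is a minimal Python sketch of softmax and cross-entropy, the standard pairing for classification outputs (an illustration, not a substitute for the glossary pages):

```python
import math

def softmax(logits):
    # Subtract the max before exponentiating for numerical stability;
    # the result is a probability distribution over the classes.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, true_index):
    # Negative log-probability assigned to the correct class.
    return -math.log(probs[true_index])

probs = softmax([2.0, 1.0, 0.1])      # illustrative logits
loss = cross_entropy(probs, true_index=0)
```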