Deep Learning

The deep learning era spans roughly 1986–present, defined by the ability to train neural networks with many layers end-to-end using gradient descent. The key papers are covered in Foundational Papers.

Artificial Neurons

Inspired by biological neurons, an artificial neuron performs a simple computation:

\[ y = \sigma(w_1x_1 + w_2x_2 + \dots + w_nx_n + b) \]

where \(x_i\) are the inputs, \(w_i\) the weights, \(b\) the bias, and \(\sigma\) an activation function (e.g., ReLU, sigmoid). With a smooth nonlinearity in place of the Perceptron's hard threshold, such neurons can be stacked into layers, generalizing the Perceptron to multi-layer networks.
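The formula above can be sketched directly in NumPy; the names `neuron`, `relu`, and `sigmoid` are illustrative, not from any particular library:

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def neuron(x, w, b, activation):
    """Artificial neuron: weighted sum of inputs plus bias, then activation."""
    return activation(np.dot(w, x) + b)

x = np.array([1.0, 2.0, 3.0])   # inputs x_i
w = np.array([0.5, -0.2, 0.1])  # weights w_i
b = 0.05                        # bias b

print(neuron(x, w, b, relu))    # 0.45  (weighted sum 0.4 + bias 0.05, ReLU passes it through)
```

Swapping the activation changes only the final squashing step; the weighted sum is identical for every choice of \(\sigma\).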

Layers

  • Input Layer — receives raw data.
  • Hidden Layers — process data through weighted connections; learning happens here.
  • Output Layer — produces the final result (class label, probability, etc.).
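The three layer roles compose into a single forward pass. A minimal sketch, assuming NumPy and arbitrary sizes (4 inputs, 8 hidden units, 3 output classes); the weights here are random, not trained:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(0.0, z)

def softmax(z):
    e = np.exp(z - z.max())  # subtract max for numerical stability
    return e / e.sum()

# Hypothetical layer sizes: 4 input features -> 8 hidden units -> 3 classes.
W1, b1 = rng.normal(size=(8, 4)), np.zeros(8)
W2, b2 = rng.normal(size=(3, 8)), np.zeros(3)

x = rng.normal(size=4)      # input layer: raw data
h = relu(W1 @ x + b1)       # hidden layer: weighted connections + activation
y = softmax(W2 @ h + b2)    # output layer: class probabilities

print(y, y.sum())           # three probabilities summing to 1
```

Learning would adjust `W1, b1, W2, b2`; the forward computation itself stays exactly this shape.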

Backpropagation

An algorithm for computing the gradient of the loss with respect to every weight by applying the chain rule layer by layer, propagating error backwards through the network. Introduced by Rumelhart, Hinton & Williams (1986) — the paper that made training multi-layer networks practical.
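The chain-rule mechanics can be shown on a tiny one-hidden-layer network with hand-derived gradients — an illustrative sketch (toy single data point, made-up sizes), not a production training loop:

```python
import numpy as np

rng = np.random.default_rng(1)
W1, b1 = rng.normal(size=(3, 2)), np.zeros(3)  # 2 inputs -> 3 hidden units
W2, b2 = rng.normal(size=(1, 3)), np.zeros(1)  # 3 hidden units -> 1 output
x, target = np.array([0.5, -1.0]), np.array([0.3])

for step in range(200):
    # Forward pass.
    z1 = W1 @ x + b1
    h = np.maximum(0.0, z1)          # ReLU
    y = W2 @ h + b2
    loss = 0.5 * np.sum((y - target) ** 2)

    # Backward pass: chain rule, output layer first.
    dy = y - target                  # dL/dy
    dW2 = np.outer(dy, h)            # dL/dW2
    db2 = dy
    dh = W2.T @ dy                   # error propagated back to hidden layer
    dz1 = dh * (z1 > 0)              # chain through the ReLU
    dW1 = np.outer(dz1, x)
    db1 = dz1

    # Gradient descent step.
    for p, g in ((W1, dW1), (b1, db1), (W2, dW2), (b2, db2)):
        p -= 0.1 * g

print(loss)  # near zero after training
```

Each `d…` line is one application of the chain rule; the error signal `dy` flows backwards through the same connections the forward pass used.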

Neural Network Architectures

  • Feedforward Neural Networks (FNN) — simplest architecture, data flows in one direction
  • Convolutional Neural Networks (CNN) — specialized for images and spatial data
  • Recurrent Neural Networks (RNN) — designed for sequential data
  • Transformers — attention-based models for NLP and beyond

Regularization Techniques

Techniques to prevent overfitting:

  • Dropout — randomly zeroing units during training (Srivastava et al., 2014)
  • L2 Regularization (Weight Decay) — penalizing large weights
  • Early Stopping — halting training when validation loss stops improving
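All three techniques can be sketched in a few lines. These are illustrative implementations under common conventions (inverted dropout; `patience`-based early stopping on a made-up validation-loss sequence), not any specific library's API:

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(h, p=0.5, training=True):
    """Inverted dropout: zero each unit with probability p during training,
    and rescale the survivors so expected activations match test time."""
    if not training:
        return h
    mask = rng.random(h.shape) >= p
    return h * mask / (1.0 - p)

def l2_penalty(weights, lam=1e-4):
    """Weight-decay term added to the loss: (lam/2) * sum of squared weights."""
    return 0.5 * lam * sum(np.sum(w ** 2) for w in weights)

print(dropout(np.ones(10)))      # roughly half the units zeroed, the rest scaled to 2.0
print(l2_penalty([np.ones((2, 2))], lam=2.0))  # 4.0

# Early stopping sketch: keep the best validation loss seen so far and
# halt after `patience` epochs without improvement (toy loss values).
best, patience, wait = float("inf"), 3, 0
for epoch, val_loss in enumerate([0.9, 0.7, 0.6, 0.61, 0.62, 0.63, 0.64]):
    if val_loss < best:
        best, wait = val_loss, 0
    else:
        wait += 1
        if wait >= patience:
            break
print(best, epoch)               # stops at epoch 5 with best loss 0.6
```

Note dropout is active only during training; at test time the full network is used unchanged, which is why the survivors are rescaled by \(1/(1-p)\).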

Concepts

The following glossary pages cover the building blocks used across all deep learning architectures:

Resources