Part IV: Deep Architectures
Why do different neural network architectures exist? Because structure matters. The right architecture makes certain patterns easy to learn and others difficult. Architecture encodes inductive biases—assumptions about the problem that guide learning.
This part explores how network structure creates capabilities. Convolutional networks exploit spatial structure in images. Recurrent networks maintain state for sequences. Attention mechanisms provide flexible context. Transformers combine these ideas to power modern language models.
Convolutional neural networks revolutionized computer vision by encoding two insights: nearby pixels are related (spatial locality), and patterns appear anywhere in an image (translation invariance). Weight sharing across space makes CNNs efficient and effective for vision tasks.
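To make weight sharing concrete, here is a minimal sketch (not a production implementation) of a single 2D convolution: one 3×3 kernel slid across a grayscale image, with the same weights reused at every position. The `conv2d` helper and the edge-detecting kernel are illustrative choices, not drawn from the text.

```python
import numpy as np

def conv2d(image, kernel):
    """Slide one kernel over the image; identical weights at every location."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # The same kernel weights score every (i, j) patch:
            # this weight sharing is what gives translation invariance.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A hand-made vertical-edge detector (left column minus right column).
edge_kernel = np.array([[1., 0., -1.],
                        [1., 0., -1.],
                        [1., 0., -1.]])

image = np.zeros((5, 5))
image[:, 2:] = 1.0            # dark left half, bright right half
response = conv2d(image, edge_kernel)
```

Because the kernel is reused everywhere, the edge fires wherever it appears in the image; a fully connected layer would need separate weights for each position.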
Recurrent neural networks process sequences by maintaining hidden state—memory that carries information forward through time. While largely superseded by Transformers for language, RNNs remain useful for streaming data and low-resource scenarios.
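The recurrence itself is simple. Below is a sketch of a vanilla RNN cell under assumed shapes (4-d hidden state, 3-d inputs); the names `W_h`, `W_x`, and `rnn_step` are illustrative, and real implementations learn these weights rather than sampling them.

```python
import numpy as np

def rnn_step(h, x, W_h, W_x, b):
    # New state mixes the previous state with the current input,
    # so information can carry forward through time.
    return np.tanh(W_h @ h + W_x @ x + b)

rng = np.random.default_rng(0)
hidden_dim, input_dim = 4, 3
W_h = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
W_x = rng.normal(scale=0.1, size=(hidden_dim, input_dim))
b = np.zeros(hidden_dim)

h = np.zeros(hidden_dim)                    # empty memory at the start
sequence = rng.normal(size=(5, input_dim))  # five time steps
for x in sequence:
    h = rnn_step(h, x, W_h, W_x, b)         # same weights at every step
```

Note the same weight sharing idea as CNNs, but across time rather than space: one set of weights is applied at every step of the sequence.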
Embeddings bridge discrete symbols and continuous neural networks. Words, tokens, and categorical variables must be represented as vectors before neural networks can process them. Learned embeddings capture semantic relationships in vector space.
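Mechanically, an embedding is just a lookup: each symbol indexes a row of a trainable matrix. A minimal sketch, with a toy three-word vocabulary and random (rather than learned) vectors:

```python
import numpy as np

vocab = {"cat": 0, "dog": 1, "car": 2}
rng = np.random.default_rng(0)
E = rng.normal(size=(len(vocab), 8))   # one 8-d vector per token; learned in practice

def embed(word):
    return E[vocab[word]]              # embedding lookup = row selection

def cosine(a, b):
    # Similarity in the embedding space; trained embeddings place
    # semantically related words close together by this measure.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```

After training, `cosine(embed("cat"), embed("dog"))` would typically exceed `cosine(embed("cat"), embed("car"))`; with the random vectors here, no such structure exists yet.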
Attention mechanisms let models focus on relevant parts of input dynamically. Instead of compressing everything into fixed-length vectors, attention selectively weighs different parts of the input. This flexibility solves bottlenecks that plagued earlier architectures.
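The "selective weighing" can be shown in a few lines: score each input position against a query, normalize the scores with a softmax, and mix the values by those weights. This sketch uses deterministic toy keys and values so the behavior is easy to inspect; none of the names come from the text.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())   # subtract max for numerical stability
    return e / e.sum()

def attention(query, keys, values):
    # Score each position's relevance to the query, then take a
    # weighted average of the values -- no fixed-length bottleneck.
    scores = keys @ query / np.sqrt(len(query))
    weights = softmax(scores)             # nonnegative, sums to 1
    return weights @ values, weights

keys = np.eye(4, 8)                       # 4 input positions, 8-d keys
values = np.arange(32, dtype=float).reshape(4, 8)
query = 10.0 * keys[2]                    # query aligned with position 2
context, weights = attention(query, keys, values)
```

Here the query matches position 2, so most of the attention mass lands there and `context` is dominated by that position's value vector. The weights are recomputed for every query, which is what makes the focus dynamic.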
Transformers combine attention with parallel processing. They’re the foundation of modern language models, but also power vision systems and multimodal models. Understanding Transformers is essential for understanding contemporary AI.
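The parallelism is visible in the matrix form of self-attention: every position attends to every other in one batch of matrix multiplies, with no step-by-step recurrence. A sketch under assumed shapes (5 tokens, 8 dimensions); the projection matrices would be learned in a real model.

```python
import numpy as np

def softmax_rows(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    # Project every token into queries, keys, and values at once,
    # then let all positions attend to all others in parallel.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    A = softmax_rows(Q @ K.T / np.sqrt(K.shape[-1]))  # (tokens, tokens)
    return A @ V

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))      # 5 tokens, 8-d each
Wq, Wk, Wv = (rng.normal(scale=0.3, size=(8, 8)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
```

Contrast this with the RNN loop above: the RNN must process step 3 before step 4, while here all five positions are handled by the same matrix products, which is why Transformers train so efficiently on parallel hardware.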
Each architecture choice encodes assumptions about the problem: CNNs assume spatial structure, recurrence assumes sequential order, and attention assumes some inputs matter more than others. Choosing the right architecture means choosing the right inductive bias for your problem.
After this part, you’ll understand why language models use Transformers (Part V) and how architecture choices affect system design (Part VI).