Part I: Foundations

Before we can understand neural networks, language models, or production AI systems, we need to understand learning itself. What does it mean for a machine to learn? How does data shape what’s possible? Why do some models generalize while others fail?

This part builds the foundations used throughout the rest of the book. The concepts here—learning as optimization, data quality, compression, generalization tradeoffs, and feature representation—apply whether you’re training a linear model or a large language model.

We start by framing machine learning as prediction under uncertainty. Models learn by minimizing error over training data, adjusting parameters through optimization. This simple principle underlies everything from decision trees to Transformers.
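To make "learning as minimizing error" concrete, here is a minimal sketch (with made-up data and an assumed learning rate, not an example from the book): fitting a one-parameter line y = w·x by gradient descent on mean squared error.

```python
# Learning as optimization: adjust one parameter w to reduce
# mean squared error over the training data. Data and learning
# rate are illustrative assumptions.

data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]  # roughly y = 2x

w = 0.0    # parameter, initialized arbitrarily
lr = 0.01  # learning rate (assumed)

for step in range(500):
    # Gradient of (1/n) * sum((w*x - y)^2) with respect to w
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    w -= lr * grad  # step downhill on the error surface

print(round(w, 2))  # converges near the best-fit slope, about 2
```

The same loop, scaled up to millions of parameters and computed gradients, is what trains a neural network.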

We then explore the role of data. Data determines what can be learned and what remains out of reach. More data helps, but quality matters more than quantity. Understanding how data shapes models helps you debug failures and set realistic expectations.
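The quality-over-quantity point can be illustrated with a toy comparison (the data and corruption pattern are assumptions for illustration): a small clean dataset recovers the true slope of y = 2x, while a much larger dataset with a third of its labels lost (recorded as 0.0) does not.

```python
# Quality vs. quantity, illustrated with synthetic data:
# least-squares slope estimate for y = 2x.

def slope(pairs):
    """Least-squares slope through the origin."""
    return sum(x * y for x, y in pairs) / sum(x * x for x, y in pairs)

clean = [(x / 10, 2 * x / 10) for x in range(1, 11)]  # 10 clean points
big = [(x / 100, 2 * x / 100) for x in range(1, 101)]  # 100 points...
corrupted = [(x, 0.0) if i % 3 == 0 else (x, y)        # ...every 3rd label lost
             for i, (x, y) in enumerate(big)]

print(round(slope(clean), 2))      # 2.0 -- correct despite few points
print(round(slope(corrupted), 2))  # pulled well below the true slope
```

Ten clean examples beat a hundred systematically corrupted ones, because this kind of corruption biases the estimate rather than averaging out.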

Next, we examine models as compression machines. A model that memorizes training data perfectly has learned nothing useful. Effective models compress patterns from training data into parameters that generalize to new examples. The quality of this compression determines model performance.
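The contrast between memorizing and compressing can be sketched in a few lines (an assumed toy setup, not the book's example): a lookup table reproduces the training data perfectly but says nothing about new inputs, while a single fitted parameter captures the pattern.

```python
# Memorization vs. compression on synthetic data following y = 2x.

train = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

# Memorizer: perfect on training data, useless on anything unseen.
lookup = dict(train)
def memorizer(x):
    return lookup.get(x)  # None for inputs not in the table

# Compressed model: the whole dataset reduced to one parameter,
# the least-squares slope through the origin.
w = sum(x * y for x, y in train) / sum(x * x for x, _ in train)
def compressed(x):
    return w * x

print(memorizer(4.0))   # None -- nothing was learned
print(compressed(4.0))  # 8.0 -- the pattern generalizes
```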

The bias-variance tradeoff governs all of machine learning. Simple models underfit—they’re too rigid to capture patterns. Complex models overfit—they memorize noise instead of signal. Finding the right balance is the core challenge of applied ML.
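A small numerical sketch of this tradeoff, using NumPy and synthetic data (the noise level, sample sizes, and polynomial degrees are illustrative assumptions): the true signal is quadratic, and we fit polynomials that are too rigid (degree 0), well matched (degree 2), and flexible enough to memorize noise (degree 9).

```python
# Bias-variance tradeoff on synthetic data: true signal y = x^2 plus noise.
import numpy as np

rng = np.random.default_rng(0)
x_train = np.linspace(-1, 1, 10)
y_train = x_train**2 + rng.normal(0.0, 0.1, size=x_train.shape)  # noisy samples
x_test = np.linspace(-1, 1, 50)
y_test = x_test**2  # noise-free target, to measure generalization

errors = {}
for degree in (0, 2, 9):
    coeffs = np.polyfit(x_train, y_train, degree)  # least-squares polynomial fit
    errors[degree] = float(np.mean((np.polyval(coeffs, x_test) - y_test) ** 2))
    print(f"degree {degree}: test MSE = {errors[degree]:.4f}")
```

Degree 0 underfits (high bias); degree 9, with ten parameters for ten points, passes through the noise exactly (high variance); degree 2, which matches the true structure, typically achieves the lowest test error.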

Finally, we look at features: how machines see the world. Raw data must be transformed into representations that expose relevant patterns. In classical ML, engineers design features manually. In deep learning, models learn features automatically. Either way, representation determines what’s learnable.
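A classic illustration of why representation matters (the data here is an assumed toy example): points inside a circle are labeled positive. In raw (x, y) coordinates no single linear threshold separates the classes, but the engineered feature r² = x² + y² makes one threshold sufficient.

```python
# Representation determines what's learnable: a circular decision
# boundary becomes a simple threshold in the right feature space.
# Synthetic points: label 1 inside the unit circle, 0 outside.

points = [(0.1, 0.2, 1), (0.5, -0.3, 1), (-0.2, 0.4, 1),
          (1.5, 0.0, 0), (-1.2, 1.1, 0), (0.9, -1.3, 0)]

def predict(x, y):
    r2 = x * x + y * y           # the engineered feature
    return 1 if r2 < 1.0 else 0  # a single threshold in feature space

correct = sum(predict(x, y) == label for x, y, label in points)
print(f"{correct}/{len(points)} correct")  # 6/6
```

Here an engineer chose the feature by hand; a deep network, given enough data, would learn a comparable transformation on its own.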

After this part, you’ll understand the fundamental concepts that make machine learning work. These foundations will help you understand why neural networks learn features (Part III), why different architectures suit different problems (Part IV), and why production systems fail (Part VII).