Part II: Classical Machine Learning

Most production machine learning runs on methods developed decades ago. Linear regression, logistic regression, decision trees, and gradient boosting dominate real-world applications. They work, they’re interpretable, and they remain practical for many problems.

This part covers classical machine learning: the workhorses of industry. These methods optimize loss functions through various strategies, from closed-form solutions to iterative algorithms. Understanding them shows that neural networks aren’t revolutionary—they’re evolutionary extensions of familiar ideas.

We start with linear models, the simplest approach to prediction. Despite their simplicity, linear models power countless applications. They’re fast to train, easy to interpret, and work well when relationships are approximately linear. More importantly, they introduce concepts that carry forward to neural networks: weights, features, loss functions, and optimization.
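These concepts can be made concrete in a few lines. The sketch below fits a linear model to synthetic data using the closed-form least-squares solution; the data, the true slope and intercept, and the noise level are all illustrative assumptions, not from the text.

```python
import numpy as np

# Synthetic data (assumption for illustration): y ~= 2*x + 1 plus noise.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(100, 1))
y = 2.0 * X[:, 0] + 1.0 + rng.normal(0.0, 0.1, size=100)

# Append a bias column so the intercept is just another weight,
# then solve the least-squares problem in closed form.
Xb = np.hstack([X, np.ones((len(X), 1))])
w, *_ = np.linalg.lstsq(Xb, y, rcond=None)

# w[0] estimates the slope, w[1] the intercept.
# The loss being minimized is the mean squared error:
mse = np.mean((Xb @ w - y) ** 2)
```

Every ingredient named above appears here: features (`X`), weights (`w`), a loss function (`mse`), and an optimization strategy (here a closed-form solve rather than an iterative one).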

Logistic regression extends linear models to classification. The core insight: transform linear outputs into probabilities through a nonlinear function, the sigmoid. This pattern, a linear transformation followed by a nonlinearity, appears throughout machine learning.
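The pattern fits in a few lines. The weights and feature values below are arbitrary illustrative numbers; the structure (linear score, then sigmoid) is the point.

```python
import numpy as np

def sigmoid(z):
    # Squash any real-valued score into (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical learned weights, bias, and one feature vector.
w = np.array([1.5, -0.5])
b = 0.2
x = np.array([1.0, 2.0])

score = w @ x + b      # linear transformation, same as linear regression
prob = sigmoid(score)  # nonlinearity turns the score into P(class = 1)
```

A score of zero maps to probability 0.5, the decision boundary; positive scores push the probability above 0.5, negative scores below it.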

Decision trees take a different approach: partition the input space into regions and make a constant prediction within each region. They handle nonlinear relationships naturally and remain interpretable. But single trees are weak predictors and unstable: small changes in the training data can produce very different trees.

Ensembles combine many weak models into strong predictors. Random forests and gradient boosting consistently win on tabular data. They’re the default choice for problems where neural networks aren’t necessary.
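The variance-reduction mechanism behind bagging (and hence random forests) can be seen with a toy sketch, using standard-library Python only. Each "weak model" here is just the mean of a bootstrap resample; the data values are illustrative assumptions.

```python
import random

def bootstrap_mean(ys, rng):
    # One weak model: the mean of a bootstrap sample (sampling with
    # replacement) of the training targets.
    sample = [rng.choice(ys) for _ in ys]
    return sum(sample) / len(sample)

rng = random.Random(0)
ys = [1.0, 2.0, 3.0, 4.0, 5.0]

# The ensemble averages many weak models. Any single bootstrap mean is
# noisy; the average of 200 of them is far more stable.
ensemble = sum(bootstrap_mean(ys, rng) for _ in range(200)) / 200
```

Random forests apply the same averaging to decision trees grown on bootstrap samples; gradient boosting instead builds trees sequentially, each one fitting the errors of the ensemble so far.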

Finally, we formalize loss functions and optimization. Every model learns by minimizing some measure of error. Understanding loss functions and optimization prepares you for neural network training, which uses the same principles at larger scale.
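Minimizing a loss by gradient descent, the iterative strategy that neural networks scale up, can be shown on a one-parameter toy loss. The loss L(w) = (w - 3)^2 is an illustrative choice with minimum at w = 3; its gradient is 2(w - 3).

```python
# Gradient descent on L(w) = (w - 3)**2.
w = 0.0    # initial guess
lr = 0.1   # learning rate (step size)

for _ in range(100):
    grad = 2.0 * (w - 3.0)  # dL/dw
    w -= lr * grad          # step against the gradient

# w has converged close to the minimizer, 3.0.
```

Neural network training follows exactly this loop, with millions of weights and a gradient computed by backpropagation instead of by hand.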

After this part, you’ll see that neural networks (Part III) build on these foundations. They use the same loss functions and optimization strategies, but learn features automatically instead of relying on manual feature engineering.