
Last time, we had a chance to pop open the hood of machine learning and see some of the key components that make ML tick.
Today, we’re going to open the skull and see the brain of machine learning (I mean that in the least morbid way possible).
In other words, we’re going to explore how machine learning models actually learn. What’s really going on inside these models? How do they know that they need to improve?
Let’s run through this epoch.

How Models Actually Learn
I want you to think of a teacher you had who was tough. Like, Agatha Trunchbull from Matilda tough. (Ok, maybe not that bad, but strict enough that you still remember them.)
Now, imagine this teacher gave you an exam that you had no time to prepare for. What’s the best you can do? (besides cheating)
That’s right. You could only take your best guess on what the answer is.
After taking the exam, it’s time to face the music and hear the feedback from your teacher. As expected, your teacher doesn’t sugarcoat it; they tell you straight up how far off your answers were. Ouch.
So, what do we do next? Well, being the smart and persistent individual you are, you could do the following: use the poor exam results as feedback to figure out what to study.
This is pretty much how machine learning models learn!
ML models learn just like a curious and determined student: by making mistakes, checking the answers, and adjusting their understanding.
When the models measure how wrong their answers were, they use the Loss Function. When they make adjustments to improve, they use Gradient Descent. (Yes, that’s where the name of our newsletter comes from!)
Let’s break these down.

The Loss Function
Let’s bring back the Ms. Trunchbull I mentioned above.

Loss Function has never been scarier.
Imagine the loss function as a dramatic, over-the-top instructor, yelling at your machine learning model every time it messes up: “WRONG AGAIN! THAT’S NOT EVEN CLOSE!”
It’s basically like the model’s report card, measuring how bad its predictions are. The bigger the mistake, the more disappointed the instructor gets, so the model keeps tweaking itself to get better.
The Loss Function is a mathematical measure that quantifies how far off a prediction is from the actual answer.
Thus, a lower loss generally indicates that the model is performing better.
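
To make this concrete, here’s a minimal sketch in Python of one common loss function, mean squared error (my pick for illustration; plenty of other loss functions exist):

    def mse_loss(predictions, targets):
        # Average of the squared differences: bigger mistakes get punished extra hard
        return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(predictions)

    # The model guessed [2.5, 0.0, 2.0]; the real answers were [3.0, -0.5, 2.0]
    print(mse_loss([2.5, 0.0, 2.0], [3.0, -0.5, 2.0]))  # ~0.167 (a mildly disappointed instructor)

The squaring is a deliberate design choice: it makes the instructor’s yelling grow much louder for big mistakes than for small ones.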

Gradient Descent
Gradient Descent is much more than just the name of my newsletter. It comes from calculus, born as a method for mathematical optimization.
In other words…
Gradient Descent is what helps our model perform better. One step after another.
In ML, gradient descent is the process of adjusting the model’s parameters to minimize the loss.
You remember those old AM/FM radios? You’d have to turn the knobs experimentally until the static became less audible, eventually disappearing entirely.
Gradient descent does essentially this for models: it nudges the parameters, step by step, toward the minimum point of the loss, which represents the set of parameters where the model performs best on the training data.
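
To make this less abstract, here’s a toy sketch (my own made-up example, not any particular library) of gradient descent tuning a single parameter w to minimize the loss (w - 3)^2, whose gradient is 2 * (w - 3):

    # Toy loss: (w - 3)^2, which is smallest when w = 3
    def gradient(w):
        return 2 * (w - 3)  # derivative of the loss with respect to w

    w = 0.0             # start with a bad guess
    learning_rate = 0.1
    for step in range(50):
        w -= learning_rate * gradient(w)  # step downhill, against the gradient

    print(w)  # ~3.0: the parameter settles at the minimum

Notice the learning_rate knob in there. More on that in a moment.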

Picture credit: Rishi Zirpe

The Learning Rate
There’s one more thing I wanted to mention, called the Learning Rate. The unsung hero of gradient descent!
The Learning Rate controls how quickly or slowly a model adjusts its parameters based on the loss. It’s absolutely key; getting the learning rate wrong can make the difference between a model that learns well and one that never figures it out.
In the gradient descent image above, notice those little purple circles that represent the learning steps? You’ll see that the step size gets smaller and smaller until it reaches the minimum.
That’s the learning rate controlling how much the model is adjusting its parameters.
Think of Gradient Descent as telling the model: “Move this way to reduce errors!”
Then, think of the Learning Rate telling gradient descent: “This is how far you should move!”
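
To see how much this matters, here’s a quick sketch using the same made-up toy loss as before, (w - 3)^2, comparing three learning rates:

    def gradient(w):
        return 2 * (w - 3)  # same toy loss as before: (w - 3)^2

    for lr in (0.01, 0.1, 1.1):  # too timid, about right, too bold
        w = 0.0
        for _ in range(50):
            w -= lr * gradient(w)
        print(f"lr={lr}: w ended up near {w:.2f}")

With lr=0.01 the model is still crawling toward 3 after 50 steps; with lr=0.1 it lands right on the minimum; with lr=1.1 every step overshoots farther than the last, and w blows up entirely.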

Conclusion
Try to think of machine learning models as eager, curious, bright students trying to get an A+ on their test.
Sure, they mess up at times (that’s Ms. Trunchbu—ahem, the Loss Function saying “wrong!”). But then they take the feedback and adjust their answers so they can improve bit by bit (gradient descent).
The learning rate? That’s how bold (or cautious) they are when adjusting their answers. Too timid, and they learn painfully slowly. Too bold, and they might overshoot the right answer.
But get it just right? A beautiful, wholesome sign of real progress.
Feel free to reply to this email directly, and tell me how your “learning rate” is going with these emails!
In the next email, we will discuss some key metrics for measuring how well (or how poorly) our models are doing.
Until then, may your gradients always descend.
Need to review some topics? Check out these posts!
What Is Machine Learning, Actually? - Turning “WTF is ML” Into “Oh, I get it.”
Key ML Concepts You Need To Know - Without knowing these, your ML model might just rage-quit.

