
Last time on Supervised Learning’s Got Talent, we were introduced to the talented decision maker of the show, Logistic Regression.
Today, we’re going to branch out and meet our next Supervised Learning contestant. This contestant, like Logistic Regression, is also skilled at making decisions, but the difference?
They are a lot more detective-like in their process.
Introducing, the Decision Tree!

Decision Tree. They may not produce oxygen, but they sure make some definitive calls!

🌳 Decision Tree - “The Overthinking Oracle”
Remember how Logistic Regression is good at making decisions? Well, Decision Trees do this too (I mean, it’s right in their name, right?), but with a key difference:
Decision Trees ask a lot of questions before getting to a decision.
Imagine that you’re trying to decide whether to go out for ice cream. What are you going to do? You’ll probably ask a series of questions:
“Is the ice cream shop even open?” (Yes/No)
“Do I have money?” (Yes/No)
“Am I willing to deal with the guilt later?” (Yes/No)
Notice how each question is relevant to the decision? That’s exactly how a Decision Tree works!
It’s literally a flowchart of questions that leads to a decision or a prediction.
At their core, Decision Trees just break your data into smaller and smaller chunks by asking yes/no questions at every level—like a nosy neighbor who won't stop until they know everything about your weekend.
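If the flowchart idea feels abstract, here is what those three ice cream questions look like as plain Python if/else checks. This is just a hand-written sketch to build intuition; the function name and inputs are made up for this post, not part of any library.

```python
# A hand-written "decision tree": each if is one yes/no question (a node),
# and each return is a leaf with the final call.
def should_get_ice_cream(shop_is_open: bool, have_money: bool, ok_with_guilt: bool) -> str:
    if not shop_is_open:        # "Is the ice cream shop even open?"
        return "Stay home"
    if not have_money:          # "Do I have money?"
        return "Stay home"
    if not ok_with_guilt:       # "Am I willing to deal with the guilt later?"
        return "Stay home (this time)"
    return "Go get ice cream!"

print(should_get_ice_cream(shop_is_open=True, have_money=True, ok_with_guilt=False))
# -> Stay home (this time)
```

The only real difference with a Decision Tree is that it learns which questions to ask, and in what order, from data instead of having them hard-coded.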

🤖 Components Of A Decision Tree
Root Node: The topmost node representing the most important feature that splits the data first.
Internal Nodes: These are points where the tree makes decisions based on certain features. Each node asks a yes/no question to help further divide the data! Kind of like mini crossroad signs within the tree.
Branches: These are the paths you follow after answering a question at a node. They lead you to the next node/outcome, like following a breadcrumb trail.
Leaf Nodes: These nodes provide the final decision/result. Think of them as the end of the trail where you get your answer or prediction, such as classifying data into categories or predicting a value.

Decision Tree: The ultimate family tree for data nerds.
Let’s say you’re trying to decide what show to watch on Netflix.
It all starts at the Root Node, with a big question like “Am I feeling adventurous?”
Based on your answer (that’s a branch!), you travel down to an Internal Node for another question, say “Is my partner awake?”.
You keep following branches through more internal questions until you land on a Leaf Node - that’s your definitive choice, such as “Rewatch Game Of Thrones!”.
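If you want to see a real (if tiny) tree pick its own questions from data, here is a minimal scikit-learn sketch. The toy ice cream dataset and feature names are made up for illustration, and it assumes scikit-learn is installed.

```python
# Fit a Decision Tree on a made-up ice cream dataset, then print its flowchart
# so you can spot the Root Node, Internal Nodes, and Leaf Nodes.
from sklearn.tree import DecisionTreeClassifier, export_text

# Features: [shop_is_open, have_money, ok_with_guilt] -> 1 = go, 0 = stay home
X = [[1, 1, 1], [1, 1, 0], [1, 0, 1], [0, 1, 1], [0, 0, 0], [1, 0, 0]]
y = [1, 0, 0, 0, 0, 0]

tree = DecisionTreeClassifier(random_state=0)  # picks its questions with the GINI Index by default
tree.fit(X, y)

# The first printed question is the Root Node, indented questions are Internal
# Nodes, and each "class: ..." line is a Leaf Node.
print(export_text(tree, feature_names=["shop_is_open", "have_money", "ok_with_guilt"]))

# Ask the tree about a new day: shop open, money in hand, guilt accepted.
print(tree.predict([[1, 1, 1]]))  # -> [1], i.e. go get ice cream
```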
Here’s yet another example of the Decision Tree in action:

If you’re intuitive, you may be saying to yourself: Okay, yeah, great, Decision Trees ask lots of questions, but how do they know which question is the best one to ask next?
To which I say, I like the way you think. The answer comes down to two things: Information Gain and the GINI Index.

📚 Information Gain
Information Gain is all about finding the question that makes your data less confusing. It’s based on the concept of Entropy: the measurement of how messy your data is. Here’s the formula:

Entropy = −Σ pi × log2(pi)

Entropy: Math’s fancy way of saying “It’s a hot mess in here.”
pi → This is the proportion (percentage) of things in a group. Example: If 7 out of 10 nuts are tasty, pi = 0.7.
log2(pi) → This asks: How surprising is it to find this thing? The lower the probability, the more surprising it is.
Multiply: pi × log2(pi) → This combines how common something is with how surprising it is.
Sum them all up (Σ) → Add this up for all categories (like tasty nuts and yucky nuts).
Put a minus sign in front (−) → This flips the negative result to positive, because entropy should always be positive.
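To make that concrete, here is a small hand-rolled Python sketch of entropy and Information Gain using the nut example from above. The helper name is my own, just to show the arithmetic.

```python
import math

def entropy(labels):
    """Entropy = -Σ pi * log2(pi), summed over each class in `labels`."""
    total = len(labels)
    result = 0.0
    for cls in set(labels):
        p = labels.count(cls) / total  # pi: proportion of this class
        result -= p * math.log2(p)     # pi * log2(pi), with the sign flipped
    return result

nuts = ["tasty"] * 7 + ["yucky"] * 3
print(entropy(nuts))  # ~0.88 -> a fairly messy mix of tasty and yucky

# Information Gain = entropy before a split - weighted entropy after the split.
left, right = ["tasty"] * 7, ["yucky"] * 3  # a perfect split, for illustration
after = (len(left) / len(nuts)) * entropy(left) + (len(right) / len(nuts)) * entropy(right)
print(entropy(nuts) - after)  # ~0.88 -> this split removed all of the mess
```

The tree tries out candidate questions, computes this gain for each one, and asks the question with the highest gain first.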

✍️ GINI Index
One more thing I want to show you is something called the GINI Index. It measures how mixed or impure a dataset is.
While Information Gain asks “How much cleaner did I make this data?”, the GINI Index asks, “How clean is this pile?”
It measures impurity directly, by following this equation:

GINI = 1 − Σ pi²

Believe it or not, this equation tells you how mixed up your life really is.
pi = the probability (or proportion) of class i in the dataset.
pi² = each class’s probability multiplied by itself.
Sum (Σ) = add these squared probabilities up for all classes.
Subtract from 1 = this gives you the impurity.
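Here is the matching sketch for the GINI Index, again with a hand-rolled helper of my own, using the same nut example.

```python
def gini(labels):
    """GINI = 1 - Σ pi^2, summed over each class in `labels`."""
    total = len(labels)
    return 1 - sum((labels.count(cls) / total) ** 2 for cls in set(labels))

nuts = ["tasty"] * 7 + ["yucky"] * 3
print(gini(nuts))            # 1 - (0.7^2 + 0.3^2) = 0.42 -> still mixed
print(gini(["tasty"] * 10))  # 0.0 -> a perfectly pure pile of tasty nuts
```

For two classes, GINI runs from 0 (perfectly pure) up to 0.5 (a maximally messy 50/50 pile), so the tree prefers questions whose answers push each branch toward 0.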

Conclusion
And that’s a wrap for Decision Tree! With its clear, step-by-step, detective-like approach, Decision Tree wowed us by making complex decisions feel like a simple game of yes/no.
Don’t go anywhere yet — up next on the stage is K-Nearest Neighbor, the friendly algorithm who knows the value of your closest neighbors.

