In our last episode, our analytical detective, Decision Tree, took center stage to wow us all.

Today, however, we have an interesting star. This guy is all about awareness.

Ever meet someone in your neighborhood who seems to know what everyone else is up to?

Well, I’m introducing you to the Machine Learning variant of that nosy neighbor:

K-Nearest Neighbor!

K-Nearest Neighbor: The Ultimate Eavesdropper!

🕵️ K-Nearest Neighbor (KNN) - “The Neighborhood Watch”

Forget about complex algorithms for a minute.

Imagine you’re at a party, and you meet someone new. You want to figure out what kind of person they are (e.g., are they a quiet introvert or a loud party animal?).

How would you do it?

You’d look at the people around them! (Or maybe the lack thereof.) If they’re quietly hanging out by themselves in a corner, yeah, probably an introvert.

If they’re loudly and enthusiastically throwing themselves into a game of charades, probably a party animal.

This is how KNN works in machine learning.

KNN is a simple machine learning algorithm that guesses what something is by looking at the “K” most similar things nearby and choosing the majority.

🤖 How KNN Works

KNN is an algorithm used for both classification (predicting a category) and regression (predicting a number).

The core idea revolves around three simple questions:

  1. “Who are your neighbors?”:

    1. When you have a new, unknown piece of data (like our new person at the party), KNN looks at the K data points that are "closest" to it in the training data.

  2. “What are your neighbors like?”:

    1. For Classification: If most of its K closest neighbors belong to a certain category (e.g., "introvert"), then the new data point is assigned that same category. It's a "majority vote."

    2. For Regression: If you're predicting a number, KNN takes the average (or median) of the values of its K closest neighbors.

  3. “How do we measure closest?”

    1. This is usually done using a distance metric. The most common one is Euclidean Distance (think of it as the straight-line distance between two points on a graph). But it could be others, depending on the data.
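To make those three questions concrete, here’s a minimal from-scratch sketch in Python. (The toy “party” data, the function names, and the choice of k are all just made up for illustration; a real library handles this for you.)

```python
from collections import Counter
import math

def euclidean(a, b):
    # Straight-line distance between two points of the same length.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train_points, train_labels, new_point, k=3, task="classification"):
    # 1. "Who are your neighbors?" - measure distance to every training point.
    distances = [(euclidean(p, new_point), label)
                 for p, label in zip(train_points, train_labels)]
    # Keep only the K closest neighbors.
    neighbors = sorted(distances, key=lambda d: d[0])[:k]
    labels = [label for _, label in neighbors]
    # 2. "What are your neighbors like?"
    if task == "classification":
        return Counter(labels).most_common(1)[0][0]  # majority vote
    return sum(labels) / len(labels)                 # regression: average

# Toy party data: [loudness, people_nearby]
people = [[1, 0], [2, 1], [8, 6], [9, 7], [1, 1]]
types  = ["introvert", "introvert", "party animal", "party animal", "introvert"]
print(knn_predict(people, types, [7, 5], k=3))  # -> "party animal"
```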

📚 What Is “K” In KNN?

Simple! In K-Nearest Neighbor, "k" tells the algorithm how many nearby data points to consider for making a prediction.

If k = 5, it looks at the 5 closest points and picks the most common class among them. This helps decide what the new data point should be classified as.

Image Credit: Amit Chauhan
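By the way, if you’d rather not write that yourself, scikit-learn has KNN built in. Here’s a quick sketch (assuming scikit-learn is installed, with another made-up toy dataset) showing exactly where k = 5 plugs in:

```python
from sklearn.neighbors import KNeighborsClassifier

X = [[1, 0], [2, 1], [8, 6], [9, 7], [1, 1], [7, 7]]   # toy features
y = ["introvert", "introvert", "party animal",
     "party animal", "introvert", "party animal"]

model = KNeighborsClassifier(n_neighbors=5)  # n_neighbors is our "k"
model.fit(X, y)
print(model.predict([[6, 5]]))  # majority vote among the 5 closest points
```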

✍️ How Does KNN Find Its “Nearest” Neighbors?

KNN uses distance metrics, which basically measure how far apart data points are. The most common one is Euclidean Distance (say ‘hi’ to those straight lines in the image above).

Euclidean Distance

Recognize this guy? He’s kind of a big deal in math.

Euclid: The guy who made straight lines a big deal for 2000 years.

That’s the Greek mathematician Euclid. He did a whole lot more than just nail the look of a “mage” in an RPG video game.

He built an entire system of geometry based on just a few rules, which has remained a gold standard for over 2000 years.

Euclidean Distance builds on that intellectual groundwork. It’s simply the straight-line distance between two points.

Alright, heads up, here comes the formula for it:

d = √((x2 − x1)² + (y2 − y1)²)

Math’s way of saying: “How far is too far?”

See (x1, y1) and (x2, y2)? Those are literally just the coordinates of two points. You simply follow these 4 steps:

  • Subtract the coordinates:
    Find how far apart the points are horizontally and vertically:
    (x2−x1) and (y2−y1)

  • Square the differences:
    This gets rid of negatives and gives you the squared "legs" of a right triangle.

  • Add the squares:
    Remember the Pythagorean Theorem? You’re applying it here:
    a² + b² = c²

  • Take the square root:
    This gives you the actual straight-line distance (the hypotenuse).

Let’s run through a quick example, yeah?

Let’s say our (x1, y1) and (x2, y2) are (1, 2) and (4, 6).

Following the four steps above, we get (take a moment to try to figure this out on your own, by the way):

(4 − 1)² + (6 − 2)² = 3² + 4² = 9 + 16 = 25, and √25 = 5

Euclidean Distance = 5
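And if you’d like to sanity-check that with code, here’s a tiny Python snippet that walks through the same four steps:

```python
import math

# The two points from the example above.
x1, y1 = 1, 2
x2, y2 = 4, 6

dx = x2 - x1                   # step 1: subtract the coordinates -> 3
dy = y2 - y1                   #                                  -> 4
squares = dx**2 + dy**2        # steps 2 & 3: square and add -> 9 + 16 = 25
distance = math.sqrt(squares)  # step 4: take the square root -> 5.0

print(distance)  # 5.0
```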

Voila! Now, there are other distance metrics used in KNN, such as Manhattan Distance and Minkowski Distance (which I’ll show you in the future), but the most common one is Euclidean Distance.

So get comfortable with it!

Advantages & Disadvantages

KNN is great for working with small to medium-sized datasets, and it’s simple to understand and implement.

Those are some brownie points for sure. But there are some slip-ups as well.

With large datasets, KNN starts to falter: there’s no real “training” step, so every single prediction means measuring the distance to every point in the training data. And when the data has lots of dimensions (features), distances start to look the same for everyone, so the “nearest” neighbors stop meaning much.

⭐ Conclusion

Just think of KNN as that one friend who always asks, “Who’s nearby? What are they doing?” before making a decision — no complicated training, just straight-up copying the crowd.

Now don’t go anywhere, because coming up next is a bit of an anomaly: Naive Bayes, our most naive contestant, yet one who still somehow seems to know it all.

See you on the next exciting episode of Supervised Learning’s Got Talent!

P.S. What did you think of our contestant KNN? Rank below:
