
Ever had a manager who gave you a bunch of tasks, without much guidance or direction? As if they expect you to take the “figure it out on your own” approach?

A boss who clearly prefers Unsupervised Learning to Supervised Learning.
Well, you’re about to meet the machine learning equivalent of that approach.
If you’ve read my free e-book ML Simplified (check my first ever email to you, if you haven’t), or this post, you should have an idea of what Unsupervised Learning is.
Unsupervised Learning is more detective-like compared to Supervised Learning.
Let’s investigate further. 🔍

📗 What Is Unsupervised Learning?
If you went through Supervised Learning, you’ll remember it’s kind of like training a puppy: you show examples, reward good behavior, and correct mistakes.
Unsupervised Learning skips all of that.
No labels. No “this is good” or “this is bad”.
Just a mountain of raw data.

Instead, Unsupervised Learning wanders through it and says, “I think I see some patterns. Let’s organize this mess.”
Unsupervised learning is a framework in machine learning where, in contrast to supervised learning, algorithms learn patterns exclusively from unlabeled data.
The core idea of unsupervised learning is that it discovers patterns, structures, or relationships hiding in a messy pile of information. You’re not telling it the answer…you’re asking it, “What do you see?”
Take this example:

Here, our “messy pile of information” is a jumbled mess of green squares, blue triangles, and red circles.
Unsupervised Learning will look at this and attempt to order it by grouping elements based on what it finds.
For instance, it will see a red circle and think, “Hey, this thing is round and red, just like the other thing here”, and group them together, separating them from the other shapes and colors.

📚 The Core Components
When you strip away the math, the intimidating graphs, and the existential dread of realizing your dataset has 400 columns… unsupervised learning really comes down to a few core components.
Unlabeled Data
Definition: This is data that hasn’t been categorized, tagged, or annotated by humans. Unsupervised Learning only works with unlabeled data.
The algorithm’s job is to look at all this raw data and figure out the structure within it.
Similarity
Definition: This is essentially digital matchmaking. The algorithm’s way of swiping left or right.
Most unsupervised learning algorithms rely on some way of comparing data points, asking questions such as:
How close are these 2 points?
Do they move together?
Do they behave similarly?
Common tools:
Distance metrics (Euclidean, Manhattan, Cosine)
Similarity Score
Correlation
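To make those comparison tools a little less abstract, here is a minimal sketch in plain Python. The function names and the two sample “customers” are made up for illustration, and the sketch assumes non-zero vectors and series that actually vary (otherwise the divisions would fail).

```python
import math

def euclidean(a, b):
    """Straight-line distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """Distance measured along the axes, like walking city blocks."""
    return sum(abs(x - y) for x, y in zip(a, b))

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def pearson_correlation(xs, ys):
    """How strongly two series move together, from -1.0 to 1.0."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical customers described as (visits per month, average spend).
alice = (3.0, 4.0)
bob = (6.0, 8.0)

print(euclidean(alice, bob))          # How close are these 2 points?
print(manhattan(alice, bob))          # Same question, measured along the axes
print(cosine_similarity(alice, bob))  # Do they behave similarly?
print(pearson_correlation([1, 2, 3, 4], [2, 4, 6, 8]))  # Do they move together?
```

Notice that Alice and Bob are far apart by distance but point in exactly the same direction, which is why the choice of metric matters: “similar” depends on what you decide to measure.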
Hidden Structure
Definition: This is the moment when unsupervised learning discovers some structure in the data:
Groups, Patterns, Sub-Patterns, Trends, Outliers
Unsupervised learning algorithms are basically detectives looking for clues.
Clustering or Reduction
Definition: Strategies/algorithms used to turn hidden structures in data into something meaningful. These are specific mechanisms used to organize the data.
Clustering Logic
Algorithms such as K-Means pull points toward centroids like a determined team leader trying to organize group projects.
DBSCAN builds dense clusters and boots out the outliers.
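Here is a rough sketch of the K-Means idea in plain Python, just to show the “pull points toward centroids” loop in action. It is a simplified illustration, not a production implementation: the `kmeans` function and the sample points are hypothetical, and the starting centroids are passed in by hand, whereas real libraries usually pick them randomly or with a smarter seeding strategy.

```python
import math

def kmeans(points, centroids, max_iters=100):
    """Minimal K-Means: assign each point to its nearest centroid, then move
    each centroid to the mean of its points, until nothing changes."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iters):
        # Assignment step: each point joins its closest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: each centroid moves to the average of its members.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centroids.append(old)  # an empty cluster keeps its centroid
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return labels, centroids

# Made-up 2D data: one group near (1, 1.5), another near (8.5, 8.5).
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]

# Assumption: we seed one centroid in each region by hand.
labels, centroids = kmeans(points, centroids=[points[0], points[3]])
print(labels)     # cluster index for each point
print(centroids)  # final "team leaders"
```

No labels were ever provided. The groups fall out purely from which points sit close together.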
Dimensionality Reduction
PCA compresses features into fewer dimensions while preserving the “big picture”.
t-SNE rearranges data into a 2D landscape where neighbors stay neighbors.
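To give a feel for what PCA does, here is a small sketch that finds only the first principal component, the single direction along which the data varies the most. It uses power iteration on the covariance matrix, which is one simple way to do this and not necessarily how a library would. The function name, the starting vector, and the toy data are all assumptions made for illustration, and the sketch assumes the data has some variance to find.

```python
import math

def first_principal_component(rows, iters=200):
    """Return (direction, projections) for the direction of greatest variance."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Covariance matrix of the centered features.
    cov = [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    # Power iteration: repeatedly multiply by cov and renormalize.
    # Assumption: this starting vector is not orthogonal to the answer.
    v = [1.0 / (j + 1) for j in range(d)]
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Each row collapses to one number: its position along that direction.
    projections = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, projections

# Toy data with two features that rise together.
rows = [(2.0, 1.0), (4.0, 2.0), (6.0, 3.0), (8.0, 4.0)]
direction, projections = first_principal_component(rows)
print(direction)    # the "big picture" axis
print(projections)  # each 2-feature row squeezed into a single value
```

Two columns became one, and because these toy features move in lockstep, nothing meaningful was lost. With real data some detail is always dropped, so the trade-off is worth checking.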

🔑 Why Bother With Unsupervised Learning
Labels Are Expensive or Impossible
Good luck labeling 50 million photos by hand. Not even interns want that job.
Explore Before You Commit
Helps you understand the shape of data before jumping into supervised models.
Hidden Patterns Lead to Insights
Maybe your “one type” of customer is actually five distinct subgroups — each needing different marketing.
In short, it helps transform an overwhelming amount of information into meaningful structures — turning confusion into curiosity-driven discovery.

⭐ Summary
There you have it — Unsupervised Learning in a nutshell, the curious detective of ML!
But knowing what Unsupervised Learning is is not the end — it’s the beginning. The real stars of the show are the algorithms that are used.
I’ll introduce you to them in a fun new series just around the corner.
Until then, I’ll catch you at our next data cycle.

