What is unsupervised learning?
Unsupervised learning is a Machine Learning approach that trains Artificial Intelligence (AI) models on unlabelled data. It uncovers hidden patterns, structures or relationships in the data without prior guidance, using methods like clustering, dimensionality reduction and association rule learning.
What are unsupervised learning methods?
Here are some common unsupervised learning methods to interpret data that hasn't been labelled…
Clustering
Clustering is one of the most common unsupervised learning methods. It groups similar data points together based on their features. For example, K-Means Clustering divides the dataset into 'k' distinct clusters by minimising the variation within each cluster.
Hierarchical Clustering, by contrast, builds a hierarchy of clusters, either by repeatedly merging the most similar clusters or by repeatedly splitting larger ones.
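To make the K-Means idea concrete, here is a minimal sketch in plain Python. It assumes 2D points, Euclidean distance and hand-picked starting centroids. The function name `k_means` and the sample data are illustrative only, and a production library would typically choose starting centroids more carefully.

```python
import math

def k_means(points, initial_centroids, max_iterations=100):
    """Alternate between assigning points to centroids and updating centroids."""
    centroids = [tuple(c) for c in initial_centroids]
    labels = []
    for _ in range(max_iterations):
        # Assignment step: label each point with the index of its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of the points assigned to it.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centroids.append(old)  # leave an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # nothing moved, so the assignments are stable
        centroids = new_centroids
    return centroids, labels

# Hypothetical data: two visibly separate groups of points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels = k_means(points, initial_centroids=[points[0], points[3]])
print(labels)
print(centroids)
```

Each pass can only keep or lower the total within-cluster variation, which is why the loop settles once the centroids stop moving.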
Dimensionality reduction techniques
Dimensionality reduction techniques are methods used to simplify data by reducing the number of features or variables while keeping the important information.
For example, a dataset of customer preferences may hold many details per customer, like age, income and purchase history. A dimensionality reduction technique like Principal Component Analysis (PCA) can be helpful here.
PCA combines these details into a smaller number of new features, called principal components, that still capture the essence of customer preferences, making the data easier to understand and work with.
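As a rough illustration of PCA, the sketch below finds only the first principal component, using power iteration on the covariance matrix. The function name `first_principal_component` and the customer rows are hypothetical, and the features are assumed to be on comparable scales already. Real projects would normally rely on a numerical library instead.

```python
import math

def first_principal_component(rows, iterations=200):
    """Return (unit direction, share of total variance, per-row scores)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[k] for r in rows) / n for k in range(d)]
    centred = [[r[k] - means[k] for k in range(d)] for r in rows]
    # Sample covariance matrix of the centred features.
    cov = [
        [sum(r[a] * r[b] for r in centred) / (n - 1) for b in range(d)]
        for a in range(d)
    ]
    # Power iteration: repeatedly multiplying by the covariance matrix pulls
    # the vector towards the direction of greatest variance. The all-ones
    # start is assumed not to be orthogonal to that direction.
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Variance along v, divided by the total variance across all features.
    eigenvalue = sum(v[a] * cov[a][b] * v[b] for a in range(d) for b in range(d))
    explained = eigenvalue / sum(cov[k][k] for k in range(d))
    # Project every centred row onto the component: one number per customer.
    scores = [sum(x * c for x, c in zip(row, v)) for row in centred]
    return v, explained, scores

# Hypothetical, pre-scaled customer features: two move together, one is noise.
customers = [
    (1.0, 1.1, 0.1),
    (2.0, 1.9, -0.1),
    (3.0, 3.2, 0.0),
    (4.0, 3.9, 0.1),
    (5.0, 5.1, -0.1),
]
direction, explained, scores = first_principal_component(customers)
print(direction)
print(explained)
```

Here three columns collapse into a single score per customer, and `explained` reports what share of the original variance that one score retains.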
Association rule learning
This method is used to discover interesting relationships or associations between variables in large databases.
The Apriori algorithm, a type of association rule learning, identifies frequent item sets in the data and generates association rules based on these item sets. For example, it might reveal that customers who buy product A also tend to buy product B.
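The two stages described above, finding frequent item sets and then deriving rules from them, can be sketched as follows. This is a simplified, Apriori-style illustration, not an optimised implementation. The function names, the thresholds and the basket data are assumptions made for the example.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Level-wise search that only extends item sets already known to be frequent."""
    n = len(transactions)
    support = {}
    items = sorted({item for t in transactions for item in t})
    current = [frozenset([item]) for item in items]
    while current:
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        frequent = {c: k / n for c, k in counts.items() if k / n >= min_support}
        support.update(frequent)
        # Join frequent k-item sets into (k+1)-item candidates, keeping only
        # those whose k-item subsets are all frequent (the Apriori property).
        size = len(next(iter(frequent))) + 1 if frequent else 0
        candidates = {a | b for a in frequent for b in frequent if len(a | b) == size}
        current = [
            c for c in candidates
            if all(frozenset(s) in frequent for s in combinations(c, size - 1))
        ]
    return support

def association_rules(support, min_confidence):
    """Turn frequent item sets into (antecedent, consequent, confidence) rules."""
    rules = []
    for itemset, s in support.items():
        for k in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, k)):
                # Confidence: share of baskets with the antecedent that
                # also contain the rest of the item set.
                confidence = s / support[antecedent]
                if confidence >= min_confidence:
                    rules.append((set(antecedent), set(itemset - antecedent), confidence))
    return rules

# Hypothetical shopping baskets.
baskets = [{"A", "B"}, {"A", "B", "C"}, {"A", "B"}, {"C"}, {"B", "C"}]
support = frequent_itemsets(baskets, min_support=0.5)
rules = association_rules(support, min_confidence=0.9)
print(rules)
```

In this toy data every basket containing product A also contains product B, so the rule "A implies B" passes the confidence threshold, while "B implies A" does not.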
Examples of unsupervised learning
Here are examples of unsupervised learning in practice…
Customer segmentation
A marketer with a large dataset of customer purchase history without any predefined customer segments can use clustering algorithms like K-Means to group customers based on their purchasing behaviour.
This might reveal segments like "high spenders", "frequent buyers" or "window shoppers". The marketer can then tailor marketing strategies to each group, improving the effectiveness of campaigns.
Playlist generation in music apps
A music streaming app often suggests playlists or songs based on your listening history. This is an example of unsupervised learning, where the algorithm identifies patterns in your listening behaviour to recommend similar tracks.
Image compression in photography
A smartphone picture is often compressed to save storage space and bandwidth. Unsupervised learning techniques like PCA can be used to reduce the dimensionality of the image data while retaining most of the important information.
Recommendations in online shopping
When you browse products on an ecommerce website, you often see recommendations for other items you might like. These recommendations are generated using unsupervised learning algorithms that analyse your browsing and purchase history to identify patterns and suggest similar products.
How does unsupervised learning work?
Here's a simplified breakdown of how unsupervised learning works…
Data preparation
The first step is to collect and preprocess the data. This might involve cleaning the data, handling missing values, and scaling or normalising features.
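As one possible way to handle missing values and scaling, the sketch below fills gaps with the column mean and then standardises each column to zero mean and unit spread. Mean imputation, z-score scaling, the function name `impute_and_standardise` and the sample rows are all assumptions chosen for illustration. Other strategies may suit a given dataset better.

```python
import statistics

def impute_and_standardise(rows):
    """Fill missing values (None) with the column mean, then z-score each column."""
    columns = []
    for column in zip(*rows):
        observed = [v for v in column if v is not None]
        fill = statistics.fmean(observed)
        filled = [fill if v is None else v for v in column]
        centre = statistics.fmean(filled)
        spread = statistics.pstdev(filled)
        # A constant column has zero spread, so map it to zeros instead of dividing.
        columns.append([(v - centre) / spread if spread else 0.0 for v in filled])
    return [list(row) for row in zip(*columns)]

# Hypothetical customers as (age, income); one income is missing.
raw = [(25, 30000.0), (35, None), (45, 50000.0), (55, 70000.0)]
prepared = impute_and_standardise(raw)
for row in prepared:
    print(row)
```

Scaling matters for distance-based methods such as K-Means: without it, a feature measured in tens of thousands (income) would swamp one measured in tens (age).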
Model selection
Secondly, choose an appropriate unsupervised learning algorithm based on the problem.
Training
Next, the model is trained on the data. During this phase, the algorithm tries to identify patterns, group similar data points together or reduce the dimensionality of the data, depending on the method chosen.
Evaluation
Since there are no labels to compare the model's output to, evaluation can be more subjective. Common methods include visualising the results, for example with a scatter plot for clustering, or checking the explained variance for PCA.
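A simple numeric check can sit alongside a plot. The sketch below measures the within-cluster variation, the quantity K-Means tries to minimise, for two candidate groupings of the same points. The function name `within_cluster_variation` and the data are hypothetical. A lower value only means tighter clusters, not that the grouping is meaningful.

```python
def within_cluster_variation(points, labels):
    """Sum of squared distances from each point to the mean of its cluster."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    total = 0.0
    for members in clusters.values():
        centroid = [sum(axis) / len(members) for axis in zip(*members)]
        total += sum(
            sum((x - c) ** 2 for x, c in zip(point, centroid))
            for point in members
        )
    return total

# Hypothetical points with two obvious groups, scored under two labellings.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]
tight = within_cluster_variation(points, [0, 0, 0, 1, 1, 1])
mixed = within_cluster_variation(points, [0, 1, 0, 1, 0, 1])
print(tight, mixed)
```

The labelling that follows the visible groups scores far lower than the one that mixes them, which matches what a scatter plot would show.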
Interpretation
Finally, the results are interpreted. This might involve assigning meaning to the clusters, understanding the principal components or identifying anomalies in the data.
How is unsupervised learning different from supervised learning?
Unsupervised learning and supervised learning are two fundamental approaches in Machine Learning, each serving distinct purposes and using different kinds of data. The primary difference lies in the presence of labelled data.
Supervised learning relies on labelled datasets, where each training example is paired with a known output label. This allows the model to learn a mapping function from inputs to outputs, much like having a teacher guide the learning process.
In contrast, unsupervised learning operates on unlabelled data, requiring the model to discover hidden patterns, structures or relationships without predefined output labels.
Additionally, evaluation in supervised learning is more straightforward because predictions can be scored against known labels with metrics like accuracy and precision, while unsupervised learning often relies on more subjective methods, such as visualisation and domain expertise.
Each approach has its own set of algorithms tailored to specific use cases, with supervised learning excelling in tasks like classification and regression, and unsupervised learning being essential for exploratory data analysis and pattern discovery.