What is supervised learning?
Supervised learning is a type of machine learning in which an algorithm learns from labelled training data to perform tasks such as classification, prediction or decision-making. The training data consists of input-output pairs, and the goal is to learn a mapping that produces the correct output for each input, including inputs the model has not seen before.
What is an example of supervised learning?
Imagine teaching a computer to recognise handwritten digits. You provide it with many examples of digits (0–9), each labelled with the correct number. The computer learns from these labelled examples to predict the digit in new, unseen handwriting. That's supervised learning in action.
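To make the idea concrete, here is a minimal Python sketch. It assumes tiny hypothetical 3x5 black-and-white bitmaps rather than real scanned handwriting, and it uses a 1-nearest-neighbour rule (predict the label of the most similar labelled example), which is only one simple way to learn from labelled data; real digit recognisers rely on far larger datasets and models.

```python
# Hypothetical 3x5 bitmaps (1 = ink, 0 = blank), flattened row by row,
# each paired with its correct label.
TRAINING_DATA = [
    ([1, 1, 1,
      1, 0, 1,
      1, 0, 1,
      1, 0, 1,
      1, 1, 1], 0),
    ([0, 1, 0,
      1, 1, 0,
      0, 1, 0,
      0, 1, 0,
      1, 1, 1], 1),
    ([1, 1, 1,
      0, 0, 1,
      0, 1, 0,
      0, 1, 0,
      0, 1, 0], 7),
]


def hamming(a, b):
    """Count the pixels on which two bitmaps differ."""
    return sum(x != y for x, y in zip(a, b))


def predict_digit(bitmap):
    """Return the label of the closest labelled example (1-nearest neighbour)."""
    _, label = min(TRAINING_DATA, key=lambda pair: hamming(pair[0], bitmap))
    return label


# A new, slightly different "7" that is not in the training data.
new_sample = [1, 1, 1,
              0, 0, 1,
              0, 0, 1,
              0, 1, 0,
              0, 1, 0]
print(predict_digit(new_sample))  # prints 7
```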
How does supervised learning work?
Here's a simple breakdown of how it works:
Labelled training data
It begins with a dataset that includes both the input data and the corresponding correct outputs (labels). For example, to teach a model to recognise cats, the dataset would include images of cats labelled "cat", along with images of other things labelled "not cat".
Feature extraction
The algorithm identifies relevant features in the input data. For example, in the case of images, these features might include edges, textures and colours.
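A minimal sketch of hand-crafted feature extraction, assuming a grayscale image stored as a list of rows with pixel values from 0 to 255. The two features (mean brightness, and a count of sharp horizontal brightness changes as a crude stand-in for edges) and the threshold of 100 are illustrative assumptions, not a standard recipe; many modern models, such as convolutional neural networks, learn their features from the data instead.

```python
EDGE_THRESHOLD = 100  # hypothetical cut-off for what counts as an "edge"


def extract_features(image):
    """Turn a grayscale image (list of rows, values 0-255) into a feature vector.

    Returns [mean brightness, number of horizontal edges], where an edge is a
    pair of neighbouring pixels whose brightness differs by more than
    EDGE_THRESHOLD.
    """
    pixels = [p for row in image for p in row]
    mean_brightness = sum(pixels) / len(pixels)
    edges = sum(
        1
        for row in image
        for left, right in zip(row, row[1:])
        if abs(left - right) > EDGE_THRESHOLD
    )
    return [mean_brightness, edges]


image = [
    [0, 0, 255, 255],
    [0, 0, 255, 255],
    [0, 0, 0, 0],
]
print(extract_features(image))  # [85.0, 2]
```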
Model training
The algorithm uses the labelled data to learn the relationship between the input features and the output labels. It adjusts its internal parameters to minimise the difference between its predictions and the actual labels.
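As an illustration of adjusting internal parameters to minimise the difference between predictions and labels, the sketch below fits a one-feature linear model y = w * x + b using gradient descent on the mean squared error. The function name `train_linear_model`, the learning rate and the number of epochs are assumptions chosen for this toy data, not tuned values.

```python
def train_linear_model(xs, ys, learning_rate=0.05, epochs=2000):
    """Fit y = w * x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0  # initial guesses for the internal parameters
    n = len(xs)
    for _ in range(epochs):
        # Difference between the current predictions and the true labels.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Gradients of the mean squared error with respect to w and b.
        grad_w = 2 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2 / n * sum(errors)
        # Step both parameters in the direction that reduces the error.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Hypothetical labelled data generated by y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
w, b = train_linear_model(xs, ys)
print(round(w, 2), round(b, 2))  # 2.0 1.0
```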
Prediction
Once trained, the model applies the patterns it learned to predict outputs for new, unseen inputs.
Evaluation
The model's performance is evaluated on a separate test dataset, which it did not see during training, to check how well it generalises to new data.
In short, supervised learning is like teaching a student with a set of examples and their correct answers, so that the student can learn to solve similar problems on their own.
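The evaluation step can be sketched as follows: the model is fitted on a training set only and then scored on a test set it has never seen. The nearest-centroid classifier, the two-feature points and the cat/dog labels are all hypothetical, chosen to keep the example short; the unusual third test example shows how a held-out test set can reveal mistakes that training accuracy hides.

```python
from math import dist  # Euclidean distance, Python 3.8+


def fit_centroids(examples):
    """Train a nearest-centroid classifier: average the feature vectors of each class."""
    sums, counts = {}, {}
    for features, label in examples:
        totals = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            totals[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [t / counts[label] for t in totals] for label, totals in sums.items()}


def predict(centroids, features):
    """Predict the class whose centroid is closest to the input."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


def accuracy(centroids, examples):
    """Fraction of examples whose predicted label matches the true label."""
    correct = sum(predict(centroids, features) == label for features, label in examples)
    return correct / len(examples)


# Hypothetical two-feature measurements; the test set is kept apart from training.
train_set = [
    ((1.0, 1.2), "cat"), ((1.5, 0.8), "cat"), ((0.9, 1.1), "cat"),
    ((4.0, 4.2), "dog"), ((4.5, 3.9), "dog"), ((3.8, 4.1), "dog"),
]
test_set = [
    ((1.2, 1.0), "cat"),
    ((4.2, 4.4), "dog"),
    ((3.0, 3.1), "cat"),  # an unusual cat that sits closer to the dogs
    ((1.1, 1.4), "cat"),
]

centroids = fit_centroids(train_set)
print(accuracy(centroids, train_set))  # 1.0
print(accuracy(centroids, test_set))   # 0.75
```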
What are the types of supervised learning?
Supervised learning can be categorised into several types👇
Binary classification
Here, the output variable has two possible classes, such as yes/no, true/false or spam/not spam. The goal is to predict which class a given input belongs to.
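A minimal binary classification sketch, assuming a single hypothetical feature (the number of suspicious words in an email) and labels 1 = spam, 0 = not spam. It uses logistic regression trained by gradient descent, which is one common choice for binary problems among many; the learning rate, the epoch count and the 0.5 decision threshold are assumptions.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(xs, ys, learning_rate=0.1, epochs=5000):
    """Fit P(spam | x) = sigmoid(w * x + b) by gradient descent on the log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            error = sigmoid(w * x + b) - y  # gradient of the log loss w.r.t. w * x + b
            grad_w += error * x
            grad_b += error
        w -= learning_rate * grad_w / n
        b -= learning_rate * grad_b / n
    return w, b


# Hypothetical data: suspicious-word counts per email; 1 = spam, 0 = not spam.
word_counts = [0, 1, 1, 2, 5, 6, 7, 8]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
w, b = train_logistic(word_counts, labels)


def classify(count, threshold=0.5):
    """Map the predicted probability to one of the two classes."""
    return "spam" if sigmoid(w * count + b) >= threshold else "not spam"


print(classify(0), classify(9))  # not spam spam
```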
Ordinal classification
The output variable has ordered categories, and the task is to predict the correct category while taking the order of the classes into account. For example, a model might predict customer satisfaction on a scale of 1 to 5, where the ratings have a clear order from least satisfied (1) to most satisfied (5).
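One common way to respect the order of the classes is to turn a K-level rating into K-1 binary questions of the form "is the rating greater than k?". The sketch below shows only that encoding and the matching decoding step; the per-threshold classifiers that would produce the probabilities are assumed and not shown, and the probabilities in the example are made up. Decoding by counting answers at or above 0.5 assumes the probabilities decrease as k increases.

```python
def encode_ordinal(rating, num_levels=5):
    """Encode a rating 1..num_levels as num_levels - 1 cumulative binary targets.

    Target k answers "is the rating greater than k?", so the order of the
    levels is built into the targets themselves.
    """
    return [1 if rating > k else 0 for k in range(1, num_levels)]


def decode_ordinal(probabilities, threshold=0.5):
    """Turn per-threshold probabilities P(rating > k) back into a rating."""
    return 1 + sum(p >= threshold for p in probabilities)


print(encode_ordinal(3))                      # [1, 1, 0, 0]
print(decode_ordinal([0.95, 0.8, 0.3, 0.1]))  # 3
```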
Multiclass classification
The output variable has more than two unordered categories. For example, a model might classify fruit as an apple, a banana or an orange. Each fruit type is distinct, and there is no inherent order among them.
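For multiclass problems, a model often produces one score per class, and a softmax turns those scores into probabilities that sum to 1; the predicted class is the one with the highest probability. The class list and raw scores below are hypothetical, not the output of a real trained model.

```python
import math

CLASSES = ["apple", "banana", "orange"]


def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    largest = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_fruit(scores):
    """Return the most probable class and its probability."""
    probs = softmax(scores)
    best = max(range(len(CLASSES)), key=lambda i: probs[i])
    return CLASSES[best], probs[best]


# Hypothetical raw scores a model might output for one image.
label, confidence = predict_fruit([1.0, 3.0, 0.5])
print(label, round(confidence, 2))  # banana 0.82
```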
Regression
The output variable is a continuous value rather than a discrete label, and the goal is to predict a numerical value from the input features. A common example is predicting a property's price from features such as its size and location.
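For a single feature, ordinary least squares has a closed-form solution, sketched below for predicting a price from floor area. The sizes and prices are invented for illustration; real property-price models use many more features and far more data.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: floor area in square metres and sale price in thousands.
sizes = [50, 70, 90, 110]
prices = [155, 195, 255, 295]
slope, intercept = fit_line(sizes, prices)
print(round(slope, 2), round(intercept, 2))  # 2.4 33.0
print(round(slope * 100 + intercept, 1))     # 273.0 (predicted price for 100 m²)
```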
Multi-label classification
Each input can belong to several classes at once; for example, an image can contain both a cat and a dog. The goal is to predict all relevant labels for a given input.
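Unlike the multiclass case, where the classes compete for a single answer, a multi-label model can score each label independently (for example with one sigmoid per label) and keep every label above a threshold. The label names, the scores and the 0.5 threshold below are hypothetical.

```python
import math

LABELS = ["cat", "dog", "car"]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_labels(scores, threshold=0.5):
    """Score each label independently and keep every label at or above the threshold."""
    return [label for label, s in zip(LABELS, scores) if sigmoid(s) >= threshold]


# Hypothetical per-label scores for one image.
print(predict_labels([2.1, 0.7, -3.0]))  # ['cat', 'dog']
```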
What are the advantages of supervised learning?
Supervised learning offers several advantages that make it a popular choice.
Clear objective
Supervised learning has a well-defined objective, which is to learn the mapping from inputs to outputs based on labelled data. This clarity makes it easier to design and evaluate models.
High accuracy
With enough high-quality labelled data, supervised learning algorithms can achieve high accuracy and make precise predictions.
Ease of evaluation
Because the correct outputs are known, the performance of supervised learning models can be measured directly using metrics such as accuracy, precision, recall and F1 score. This makes it straightforward to compare different models and track improvements.
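These metrics can be computed directly from the true and predicted labels, as in the sketch below for a binary task. The function name and the label lists are hypothetical; libraries such as scikit-learn provide tested implementations (for example `sklearn.metrics.precision_score`), which are preferable in practice.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return (accuracy, precision, recall, F1) for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1


# Hypothetical true labels and model predictions (1 = positive class).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
print([round(m, 3) for m in metrics])  # [0.7, 0.6, 0.75, 0.667]
```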
Wide applicability
Supervised learning is applicable to a wide range of problems, including classification (e.g. spam detection and image recognition) and regression (e.g. predicting house prices and stock prices).
Supervised vs. unsupervised vs. semi-supervised learning
Supervised learning differs from unsupervised and semi-supervised learning primarily in the type of data used for training and the learning objective.
In supervised learning, the model is trained on labelled data. Each example includes both the inputs and the correct outputs. This allows the model to learn a direct mapping from inputs to outputs, making it well-suited for tasks like classification and regression.
In contrast, unsupervised learning uses unlabelled data and aims to discover hidden patterns or structures, such as groups of similar data points in clustering tasks. Because it doesn't rely on predefined outputs, it is well suited to exploratory data analysis.
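For contrast with supervised learning, the clustering sketch below receives values without any labels and groups them on its own. It is a plain one-dimensional k-means, assuming two clusters and hand-picked starting centres; the data values are hypothetical.

```python
def kmeans_1d(values, centres, iterations=10):
    """Plain k-means on one-dimensional data; no labels are used anywhere."""
    centres = list(centres)
    for _ in range(iterations):
        # Assign each value to its nearest centre.
        clusters = [[] for _ in centres]
        for v in values:
            nearest = min(range(len(centres)), key=lambda i: abs(v - centres[i]))
            clusters[nearest].append(v)
        # Move each centre to the mean of its cluster (keep it if the cluster is empty).
        centres = [sum(c) / len(c) if c else centres[i] for i, c in enumerate(clusters)]
    return centres, clusters


values = [1.0, 1.2, 0.8, 5.0, 5.3, 4.7]
centres, clusters = kmeans_1d(values, centres=[0.0, 6.0])
print([round(c, 2) for c in centres])  # [1.0, 5.0]
```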
Semi-supervised learning bridges these two approaches by using a small amount of labelled data together with a large amount of unlabelled data. The unlabelled data can help the model learn the overall structure of the inputs, which can improve performance compared with training on the labelled examples alone. This is particularly useful when labelled data is scarce or expensive to obtain.
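One common semi-supervised technique is self-training: fit a model on the few labelled examples, pseudo-label the unlabelled examples it is most confident about, add them to the training set, and repeat. The sketch below applies this idea to a one-dimensional nearest-centroid classifier; the margin-based confidence score, the two-per-round schedule and all data values are assumptions for illustration, and it expects at least two classes among the labelled examples.

```python
def centroids(labelled):
    """Mean feature value per class."""
    groups = {}
    for x, label in labelled:
        groups.setdefault(label, []).append(x)
    return {label: sum(xs) / len(xs) for label, xs in groups.items()}


def predict_with_margin(cents, x):
    """Nearest centroid, plus a confidence margin (gap to the runner-up centroid)."""
    ranked = sorted(cents, key=lambda label: abs(x - cents[label]))
    best, runner_up = ranked[0], ranked[1]
    margin = abs(x - cents[runner_up]) - abs(x - cents[best])
    return best, margin


def self_train(labelled, unlabelled, per_round=2):
    """Repeatedly pseudo-label the most confident unlabelled points and refit."""
    labelled, remaining = list(labelled), list(unlabelled)
    while remaining:
        cents = centroids(labelled)
        scored = [(x, *predict_with_margin(cents, x)) for x in remaining]
        scored.sort(key=lambda item: item[2], reverse=True)  # most confident first
        for x, label, _ in scored[:per_round]:
            labelled.append((x, label))  # add as a pseudo-labelled example
        remaining = [x for x, _, _ in scored[per_round:]]
    return labelled


# Hypothetical data: only two labelled points, six unlabelled ones.
labelled = [(1.0, "small"), (9.0, "large")]
unlabelled = [1.5, 2.5, 3.5, 8.0, 7.0, 6.0]
result = self_train(labelled, unlabelled)
print(sorted(x for x, label in result if label == "small"))  # [1.0, 1.5, 2.5, 3.5]
```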