Supervised learning is one of the main paradigms of machine learning, in which a model is trained on historical data. The learning is "supervised" because the correct outputs are already known, and the algorithm is tuned iteratively to improve its predictions. The choice of algorithm depends on the problem and the type of output, because different algorithms perform well on different datasets. This article gives an overview of Supervised Machine Learning classification and regression algorithms such as KNN Classification, Naive Bayes, Decision Tree, SVM, Logistic Regression, and Linear and Polynomial Regression.

As we already discussed in the Quick Introduction to Machine Learning and Different Types of Machine Learning articles, supervised machine learning needs labeled historical data to train the model, applying a specific algorithm depending on the type of dataset.

Basically, there are two types of problems that Supervised Machine Learning helps to solve: **classification** and **regression**. You choose a specific algorithm depending on the type of problem you're solving.

## Classification Algorithms in Supervised Machine learning

Classification is the categorization of objects by classes based on features (data points). Classification can be divided into two main categories – binary and multi-class classifications. Classes are sometimes called targets, labels, or categories.

**Binary Classification:** When the output category has only two classes, it is called binary classification. For example, spam detection in email service providers can be identified as a binary classification as only two classes are available (spam and not spam). The following dataset is an example of binary classification as there are only two possible outcomes (fail/pass column).

**Multi-class classification:** When the output category has more than two classes, it is called multi-class classification. For example, we can classify a movie as a comedy, romantic, or action using a multi-class classification. The movie genre in this example will have three possible output classes. The following dataset represents an example of multi-class classification with three possible outcomes.

### KNN Classification Algorithm

The KNN (K-nearest neighbors) algorithm assumes that items that are similar to each other are close together. To predict the right class for a test point, KNN computes the distance between the test point and all training points, selects the K training points closest to it, and then assigns the class that occurs most frequently among those K neighbors.

KNN is a **lazy learner** algorithm because it memorizes the training dataset rather than learning a discriminative function; there is essentially no training phase. The prediction/classification step, however, is computationally costly: for every prediction, KNN calculates distances against the entire training set to find the nearest neighbors.

In the KNN algorithm, the nearest neighbors are data points with the shortest distance in feature space from the new data point. The K represents the number of data points we consider in our algorithm implementation. As a result, the KNN algorithm has two key factors: the distance metric and the K value. Euclidean distance is the most popular distance metric used in the KNN algorithm.
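As a minimal sketch, the idea can be shown with scikit-learn's `KNeighborsClassifier` on a toy dataset (the points and labels below are made up for illustration):

```python
from sklearn.neighbors import KNeighborsClassifier

# Toy training data: two small clusters in 2-D feature space (illustrative)
X_train = [[1, 1], [1, 2], [2, 1], [6, 6], [6, 7], [7, 6]]
y_train = [0, 0, 0, 1, 1, 1]

# K = 3 with Euclidean distance: each test point gets the majority class
# among its 3 nearest training points
knn = KNeighborsClassifier(n_neighbors=3, metric="euclidean")
knn.fit(X_train, y_train)

pred = knn.predict([[1.5, 1.5], [6.5, 6.5]])
print(pred)  # [0 1]: each test point falls inside one of the two clusters
```

Because KNN is a lazy learner, `fit` merely stores the data; all the distance computation happens inside `predict`.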

### Naive Bayes Algorithm

Naive Bayes is a classification technique based on Bayes' Theorem and an assumption of independence among predictors. In simple terms, a Naive Bayes classifier assumes that the presence of one feature in a class is unrelated to the presence of any other feature. This approach works best in classification problems where the **features are unrelated or independent**.

For example, if the fruit is red, round, and roughly 3 inches in diameter, it may be considered an apple. Even if these characteristics are reliant on one another or the presence of other characteristics, they all independently contribute to the probability that this fruit is an apple, which is why it is called Naive.

The Naive Bayes model is simple to construct and is especially useful for very large datasets. Despite its simplicity, Naive Bayes is known to perform surprisingly well, sometimes rivaling far more sophisticated classifiers.

The Naive Bayes algorithm predicts the output class using Bayes' theorem:

`P(c|x) = P(x|c) * P(c) / P(x)`

where:

- `P(c|x)` is the posterior probability of class `c` (target) given predictor `x` (attributes)
- `P(c)` is the prior probability of the class
- `P(x|c)` is the likelihood, i.e., the probability of the predictor given the class
- `P(x)` is the prior probability of the predictor
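As an illustration, Bayes' theorem can be applied directly with made-up numbers for a hypothetical spam filter (every probability below is invented for the example):

```python
# Hypothetical spam-filter numbers (invented for illustration):
p_spam = 0.3             # P(c): prior probability that an email is spam
p_word_given_spam = 0.8  # P(x|c): likelihood of seeing a word given spam
p_word = 0.4             # P(x): prior probability of seeing the word at all

# Bayes' theorem: P(c|x) = P(x|c) * P(c) / P(x)
p_spam_given_word = p_word_given_spam * p_spam / p_word
print(round(p_spam_given_word, 2))  # 0.6
```

The "naive" part enters when `x` contains several features: the joint likelihood `P(x|c)` is then approximated as the product of the per-feature likelihoods.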

### Decision Tree Algorithm

One of the most widely used tools for classification and prediction is the decision tree. A decision tree is a flowchart-like tree structure in which each internal node represents an attribute test. Each branch reflects the test's outcome, and each leaf node (terminal node) stores a class label.

The goal of using a Decision Tree algorithm is to create a training model that can predict the class or value of the target variable by learning simple decision rules inferred from the historical data.

A decision tree is built from the following parts:

- **Root Node:** the node at the beginning of a decision tree; from this node, the population starts dividing according to various features.
- **Decision Nodes:** the nodes we get after splitting the root node.
- **Terminal Nodes:** the nodes where further splitting is not possible, also called leaf nodes.
- **Sub-tree:** just as a small portion of a graph is called a sub-graph, a sub-section of a decision tree is called a sub-tree.
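As a quick sketch, scikit-learn's `DecisionTreeClassifier` can be fitted on made-up one-feature data; `export_text` prints the learned rules, which makes the root, decision, and leaf nodes visible:

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Toy data (illustrative): one feature, two well-separated classes
X = [[0], [1], [2], [10], [11], [12]]
y = [0, 0, 0, 1, 1, 1]

tree = DecisionTreeClassifier(random_state=0)
tree.fit(X, y)

# The printed rules mirror the flowchart structure described above
print(export_text(tree, feature_names=["x"]))

pred = tree.predict([[1], [11]])
print(pred)  # [0 1]
```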

### Support Vector Machine (SVM)

The Support Vector Machine (SVM) is a common Supervised Learning technique for solving classification and regression problems. However, it is mostly used in Machine Learning to solve classification problems. The goal of the SVM algorithm is to find the best line or decision boundary for separating n-dimensional space into classes so that the machine can easily place subsequent data points in the right category.

Hyperplanes are decision boundaries that help classify data points. In N-dimensional space (where N is the number of features), this decision boundary is called a **hyperplane**. Data points falling on either side of the hyperplane can be attributed to different classes.

SVM classification is used in face detection: it classifies parts of an image as face and non-face and creates a square boundary around the face. SVMs allow text and hypertext categorization for both inductive and transductive models, using training data to classify documents into different categories. Moreover, SVMs are applied to biological problems such as classifying genes and classifying patients based on their genes.
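A minimal sketch with scikit-learn's `SVC` (toy data invented for illustration): a linear kernel fits a straight-line decision boundary, i.e., a hyperplane in 2-D, between two separable classes.

```python
from sklearn.svm import SVC

# Toy data (illustrative): two linearly separable classes in 2-D
X = [[0, 0], [1, 1], [1, 0], [4, 4], [5, 5], [4, 5]]
y = [0, 0, 0, 1, 1, 1]

# A linear kernel finds the maximum-margin hyperplane between the classes
clf = SVC(kernel="linear")
clf.fit(X, y)

pred = clf.predict([[0.5, 0.5], [4.5, 4.5]])
print(pred)  # [0 1]: one test point on each side of the hyperplane
```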

### Logistic Regression Algorithm

Modeling the probability of a discrete outcome given an input variable is known as logistic regression. The most common logistic regression models have a binary outcome, which might be true or false, yes or no, and so on. For example, we may want to know the probability of a visitor accepting an offer on our website. A logistic regression model can help us estimate that probability based on the visitor's features (age, sex, geolocation, interests, etc.). As a result, we can make better decisions about promoting our offer or about the offer itself.

The logistic regression algorithm uses the sigmoid function, a mathematical function with a characteristic "S"-shaped curve.
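The sigmoid itself is one line of code. Here is a minimal sketch showing how it squashes a linear score into a probability (the coefficients are made up for illustration):

```python
import math

def sigmoid(z):
    """Map any real-valued score into the (0, 1) interval."""
    return 1 / (1 + math.exp(-z))

# Logistic regression passes a linear score w*x + b through the sigmoid.
# Hypothetical coefficients for a one-feature model:
w, b = 2.0, -1.0
x = 0.5
probability = sigmoid(w * x + b)
print(probability)  # 0.5, because w*x + b == 0 at this point
```

Scores far above zero map close to 1, scores far below zero map close to 0, which is exactly the "S" shape of the curve.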

### Neural Network

A Neural Network or Artificial Neural Network is a set of algorithms that attempts to recognize underlying relationships in a data set using a similar method to how the human brain works. Neural networks can adapt to changing input and produce the best possible outcome without redesigning the output criteria.

The structure of a neural network is a little more involved. A neural network is formed from an input layer, one or more hidden layers, and an output layer, each made up of nodes. Each node, or artificial neuron, is connected to the others and has a weight and threshold associated with it. If a node's output exceeds its threshold value, the node is activated and passes data to the next layer of the network; otherwise, no data is sent to the next layer through that node.

No activation function is applied on the input layer, but depending on the type of data and problem, different activation functions are applied on the hidden and output layers. For example, the sigmoid function is applied on the output layer in the case of a binary classification problem, and the softmax function is applied if it is a multi-class classification problem.
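The behavior of a single node can be sketched in a few lines (the weights, bias, inputs, and threshold below are made-up values):

```python
import math

def neuron(inputs, weights, bias):
    """One artificial neuron: weighted sum of inputs plus bias,
    passed through a sigmoid activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 / (1 + math.exp(-z))

# Made-up inputs and weights for illustration
activation = neuron([1.0, 0.5], [0.6, -0.4], bias=0.1)

# The node "fires" (passes data to the next layer) only if its
# activation exceeds the threshold
threshold = 0.5
fires = activation > threshold
print(round(activation, 3), fires)  # 0.622 True
```

A full network stacks many such nodes into layers and learns the weights and biases from the training data.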

## Regression algorithms in Supervised Machine Learning

Regression analysis is a statistical method for modeling the relationship between a dependent (target) variable and one or more independent (predictor) variables. It predicts continuous/real values such as temperature, age, salary, price, etc. The following dataset represents a regression problem if we'd like to predict the height based on the age input.

Notice that the output class contains continuous values, not categorical or discrete.

### Linear Regression Algorithm

Linear regression is used for finding the linear relationship between the target and one or more predictors. We should first evaluate whether or not there is a link between the variables of interest before attempting to fit a linear model to observed data. A scatterplot can be useful when determining the strength of a relationship between two variables. If the postulated explanatory and dependent variables appear to have no relationship (i.e., the scatterplot shows no increasing or decreasing trends), fitting a linear regression model to the data is unlikely to produce a useful model.

Based on the number of input or independent variables, linear regression can be divided into two sub-categories.

- **Simple Linear Regression:** useful for finding the relationship between two continuous variables: a single independent variable and its corresponding dependent variable. For example, the relationship between age and height is a simple linear regression problem.
- **Multiple Linear Regression:** useful for finding the relationship between a dependent variable and more than one independent variable. For example, temperature prediction has multiple input variables such as wind, humidity, cloudiness, etc.
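A minimal sketch of simple linear regression with scikit-learn, using made-up age/height pairs that lie exactly on a line so the fitted slope and intercept are easy to verify:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up data lying exactly on the line: height = 70 + 7.5 * age
ages = np.array([[2], [4], [6], [8]])
heights = np.array([85, 100, 115, 130])

model = LinearRegression()
model.fit(ages, heights)

# The model recovers the slope and intercept of the underlying line
print(model.coef_[0], model.intercept_)  # ~7.5 ~70.0
print(model.predict([[10]]))             # ~[145.]
```

Multiple linear regression uses the same API; the only change is that each row of the feature matrix holds several input variables instead of one.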

### Polynomial Regression Algorithm

Polynomial regression is a regression approach that uses an N-th degree polynomial to represent the relationship between the dependent and independent variables. It can be seen as a special case of linear regression: the data is fitted with a polynomial equation that captures a curvilinear relationship between the variables, while the model remains linear in its coefficients.
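A short sketch of this idea: expanding the input into polynomial features turns the problem back into an ordinary linear regression in the expanded feature space (the data below are made up and exactly quadratic, so the fit recovers them):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Made-up data on the exact curve y = x^2
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([1, 4, 9, 16, 25])

# Expand each x into [1, x, x^2], then fit an ordinary linear model
poly = PolynomialFeatures(degree=2)
X_poly = poly.fit_transform(X)

model = LinearRegression()
model.fit(X_poly, y)

pred = model.predict(poly.transform([[6]]))
print(pred)  # ~[36.]
```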

## Summary

Supervised Learning algorithms form a class of Machine Learning that uses labeled datasets to train the ML model. Based on the type of problem, Supervised learning algorithms are divided into classification and regression algorithms. This article gave an overview of the Supervised Machine Learning classification and regression algorithms such as KNN Classification, Naive Bayes, Decision Tree, SVM, Logistic Regression, Linear, and Polynomial Regressions.