Confusion Matrix and its two types of error (Task 5)

SOURAV
2 min read · Jun 2, 2021

The confusion matrix is a popular question in many data science interviews.

The concept behind the confusion matrix is very simple, but its related terminology can be a little confusing. In this article, I will try to explain the confusion matrix in simpler terms.

What’s happening in our day-to-day modelling?

1) We get a business problem, 2) gather data, 3) clean the data, and 4) build all kinds of outstanding models, right? Then we get the output as probabilities. Wait, wait, wait! How can we say it’s an outstanding model? One way is by measuring the effectiveness of the model: the better the effectiveness, the better the performance of the model. This is where the term confusion matrix comes into the picture.

A confusion matrix is a performance measurement technique for machine learning classification problems. It’s a simple table that helps us understand the performance of a classification model on test data for which the true values are known.
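To make this concrete, here is a minimal sketch of how such a table can be produced with scikit-learn. The y_true and y_pred labels below are made-up illustrations, not part of the article’s example.

# Minimal sketch, assuming scikit-learn is installed.
from sklearn.metrics import confusion_matrix

y_true = ['yes', 'no', 'yes', 'yes', 'no', 'yes', 'no', 'no']  # actual churn labels (illustrative)
y_pred = ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no']  # model predictions (illustrative)

# Rows are actual classes, columns are predicted classes, ordered ['no', 'yes'].
print(confusion_matrix(y_true, y_pred, labels=['no', 'yes']))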

Consider that we are doing telecom churn modelling. Our target variable is churn (a binary classification problem). There are two possible predicted classes: ‘yes’ and ‘no’. ‘Yes’ means churn (leaving the network) and ‘no’ means not churn (not leaving the network). Below is our confusion matrix table.

· The classifier made a total of 200 predictions (200 customers’ records were analyzed). Out of those 200 customers, the classifier predicted ‘yes’ 160 times and ‘no’ 40 times.

· In reality, 155 customers churned and 45 customers did not churn.
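The article only gives these row and column totals, so the four individual cells are not fixed by them. One hypothetical split that is consistent with the stated totals (an assumption for illustration, not data from the article) is:

# Hypothetical cell counts consistent with the stated totals (assumption, not article data).
TP, FN = 150, 5    # actual yes: 150 + 5 = 155
FP, TN = 10, 35    # actual no:  10 + 35 = 45

assert TP + FP == 160              # predicted yes
assert FN + TN == 40               # predicted no
assert TP + TN + FP + FN == 200    # total predictions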

Let’s look at the important terms associated with this confusion matrix, using the above example.

True Positives (TP): We predicted yes (churn), and they are actually leaving the network (churn).

True Negatives (TN): We predicted no, and they are not leaving the network.

False Positives (FP): We predicted yes, but they are actually not leaving the network (not churn). This is also known as a “Type I error”.

False Negatives (FN): We predicted no, but they are actually leaving the network (churn). This is also known as a “Type II error”.

These four terms have been incorporated into our confusion table, with both row and column totals added.
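If the predictions are available in code, the four counts can be unpacked directly from a scikit-learn confusion matrix. A small sketch, reusing the illustrative labels from earlier:

from sklearn.metrics import confusion_matrix

y_true = ['yes', 'no', 'yes', 'yes', 'no', 'yes', 'no', 'no']  # actual (illustrative)
y_pred = ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no']  # predicted (illustrative)

# With labels=['no', 'yes'] the matrix is [[TN, FP], [FN, TP]],
# so ravel() yields the four counts in that order.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=['no', 'yes']).ravel()
print(tp, tn, fp, fn)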

The terms below are computed from the confusion matrix for a binary classification problem.

Accuracy: How often is the classifier correct?

Accuracy = (TP + TN)/total
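With the hypothetical cell counts assumed above (TP = 150, TN = 35, total = 200): Accuracy = (150 + 35)/200 = 0.925.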

Misclassification Rate: Overall, how often is it wrong? It is also called “Error rate”

Misclassification rate = (FP+FN)/total
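With the same hypothetical counts (FP = 10, FN = 5): Misclassification rate = (10 + 5)/200 = 0.075, which is simply 1 − accuracy.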

True Positive Rate (TPR): When it’s actually yes, how often does it predict yes? It is also known as “Sensitivity” or “Recall”.

TPR or Recall = TP/actual yes
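Here “actual yes” means TP + FN. With the hypothetical counts: TPR = 150/155 ≈ 0.968.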

False Positive Rate (FPR): When it’s actually no, how often does it predict yes?

FPR = FP/actual no
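Here “actual no” means FP + TN. With the hypothetical counts: FPR = 10/45 ≈ 0.222.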

True Negative Rate (TNR): When it’s actually no, how often does it predict no? It is also known as “Specificity”.

TNR = TN/actual no
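With the hypothetical counts: TNR = 35/45 ≈ 0.778, which is simply 1 − FPR.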

THAT’S ALL

#worldrecordholder #training #internship #makingindiafutureready #summer #summertraining #python #machinelearning #docker #rightmentor #deepknowledge #linuxworld #vimaldaga #righteducation
