How “Confusion Matrix” and “ROC” in Salesforce Tableau help analyze predictions

Sanjeev Mehta
8 min read · May 12, 2021
Photo by Clay Banks on Unsplash

When we say that a particular test for a certain new disease shows 95% accuracy, what are we saying? That it identifies 95% of infected people as infected? That it identifies 95% of healthy people as healthy? Or both? And how precise is that number? Can we interpret the data ourselves, or is there a tool that can help?

Statistics can be fun and confusing at the same time because it opens many doors of interpretation. It allows us to understand data and extract its meaning. However, it has to be put in the right context to make a meaningful observation; otherwise the same data can mean different things in different scenarios.

Salesforce Tableau is a powerful analytics platform from Salesforce.com which provides data insight and predictions, using machine learning, in an intuitive and easy-to-use interface. It gives people a view into their data and lets them make informed decisions using various point-and-click tools, which would otherwise require highly specialized analytical and technical support.

In this article we will see how to interpret the predictive model that Salesforce Tableau suggests and examine the various components of the “Threshold Evaluation” Model Metrics page. To aid understanding, we will use sample test data for a certain disease and see what we can infer from it.

In a typical process, data is collected and balanced out before it is even ready for analysis. It is very important to have data that covers a wide range of possibilities and is not biased. Once such data is obtained, it can be fed to Salesforce Tableau and processed using a story, after which Salesforce Tableau will try to create Insights and Predictions.

Depending upon the dataset and objective, a classification or regression algorithm is applied; a model is selected, trained, and optimized to generate predictions. Once a model has been trained, it is ready to predict the outcome based on the input attributes that were used to train it.

Finally, to describe how a classification model performs, we will take a look at the confusion matrix, the threshold, and the ROC curve, all available on the Threshold Evaluation page in Salesforce Tableau.

In this example we are trying to classify the outcome as a person being infected or not. A sample set of data was created for this purpose. The following screenshot shows the Threshold Evaluation page generated in Salesforce Tableau from the test data; it is the basis for exploring the various metrics.

Threshold Evaluation screen from Salesforce Tableau

Confusion Matrix

One of the most widely used ways to evaluate the performance of a classifier is the confusion matrix.

A confusion matrix typically presents a table showing how many times something belonging to category A has been predicted as category A or as some other category, and is thus a good way to represent the extent of the confusion. We will see that the confusion matrix opens up a number of additional metrics that need to be examined before making any decision on model performance and the required threshold (cut-off).

A typical confusion matrix looks like the table below, where the top section holds the actual positive and negative cases and the left section holds the predicted positive and negative cases (some prefer a layout where Actual and Predicted are swapped).

Confusion Matrix

In our example, Positive is equivalent to being infected whereas, Negative is equivalent to being NOT infected (or healthy).

Terminologies and Metrics:

TP (True Positive): Correctly predicted that the person is infected. We want to see more such predictions in this scenario (maximize).

TN (True Negative): Correctly predicted that the person is NOT infected (healthy).

FP (False Positive): Wrongly predicted that the person is infected although the person is healthy.

FN (False Negative): Wrongly predicted that the person is NOT infected (healthy) although the person is infected. We don’t want to see any such predictions in this scenario (minimize).
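To see where these four counts come from, here is a minimal Python sketch, assuming a small set of made-up labels (purely illustrative; Salesforce Tableau computes the matrix for you):

```python
# A minimal sketch of how the four cells of a confusion matrix are
# tallied. The labels below are made up, not the article's dataset.
actual    = [1, 1, 0, 1, 0, 0, 1, 0]  # 1 = infected, 0 = healthy
predicted = [1, 0, 0, 1, 1, 0, 1, 0]

tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)

print(f"TP={tp}  FP={fp}")  # row: predicted infected
print(f"FN={fn}  TN={tn}")  # row: predicted healthy
```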

What we need is a balance between these metrics, achieved by setting an appropriate threshold, so that the model performs well and we get a good set of predictions.

Wait! Do we need to be accurate in our predictions, or do we want to be more precise when predicting an infected person? The following metric definitions will help clear the air.

Confusion Matrix and Key Metrics section from Salesforce Tableau

Accuracy: Accuracy measures the fraction of total outcomes that the model was able to predict correctly (positive as well as negative). It tells us how often our prediction is correct.

Formula: (TP + TN) / (TP + FP + FN + TN)

Accuracy = (117 + 132) / (117 + 202 + 132 + 49) = 0.498 (or 49.8%)

True Positive Rate (TPR) OR Recall: True Positive Rate (TPR) measures the fraction of all positives that the model was able to predict correctly as positives. It tells us, out of all infected people, how many the model correctly predicted as infected.

Formula: TP / (TP + FN)

TPR = Recall = 117 / (117 + 49) = 0.705 (or 70.5%)

This metric also indicates how informed the model is about infected people.

True Negative Rate (TNR) OR Specificity: True Negative Rate measures the fraction of all negatives that the model was able to predict correctly as negatives. It tells us, out of all healthy people, how many the model correctly predicted as healthy (not infected).

Formula: TN / (TN + FP)

TNR = 132 / (132 + 202) = 0.3952 (or 39.52%)

This metric also indicates how informed the model is about healthy people.

False Positive Rate (FPR): False Positive Rate measures the fraction of all negatives that the model falsely predicted as positives. It tells us, out of all healthy people, how many the model incorrectly predicted as infected.

Formula: FP / (FP + TN) = 1 - TNR (Specificity)

FPR = 202 / (202 + 132) = 0.6048 (or 60.48%)

This metric also indicates how misinformed the model is about healthy people.

False Negative Rate (FNR): False Negative Rate measures the fraction of all positives that the model falsely predicted as negatives. It tells us, out of all infected people, how many the model incorrectly predicted as healthy (not infected).

Formula: FN / (TP + FN)

FNR = 49 / (117 + 49) = 0.2952 (or 29.52%)

This metric also indicates how misinformed the model is about infected people.

Precision: Precision describes the accuracy of positive predictions and measures the fraction of all positive predictions that were actually positive. It tells us, out of the total predicted to be infected, how many are actually infected.

Formula: TP / (TP + FP)

Precision = 117 / (117 + 202) = 0.367 (or 36.7%)

This metric also indicates how trustworthy the model is in predicting a positive case.

Negative Predictive Value (NPV): Negative Predictive Value describes the fraction of negative predictions that were actually negative. It tells us, out of the total predicted to be healthy, how many are actually healthy.

Formula: TN / (TN + FN)

NPV = 132 / (49 + 132) = 0.729 (or 72.9%)

This metric also indicates how trustworthy the model is in predicting a negative case (a healthy person).
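To make the arithmetic concrete, here is a small Python sketch that recomputes these seven metrics from the four counts in the screenshot (TP = 117, FP = 202, FN = 49, TN = 132). Salesforce Tableau displays these values for you, so this is for verification only:

```python
# Recompute the seven metrics above from the confusion-matrix counts
# shown in the Salesforce Tableau screenshot.
TP, FP, FN, TN = 117, 202, 49, 132

accuracy    = (TP + TN) / (TP + FP + FN + TN)  # 0.498
recall      = TP / (TP + FN)                   # TPR, 0.705
specificity = TN / (TN + FP)                   # TNR, 0.3952
fpr         = FP / (FP + TN)                   # 1 - TNR, 0.6048
fnr         = FN / (TP + FN)                   # 1 - TPR, 0.2952
precision   = TP / (TP + FP)                   # 0.3668
npv         = TN / (TN + FN)                   # 0.7293

for name, value in [("Accuracy", accuracy), ("Recall (TPR)", recall),
                    ("Specificity (TNR)", specificity), ("FPR", fpr),
                    ("FNR", fnr), ("Precision", precision), ("NPV", npv)]:
    print(f"{name}: {value:.4f}")
```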

F1 score: The F1 score depicts the balance between precision and recall; it is the harmonic mean of the two.

Formula: 2 * Precision * Recall / (Precision + Recall)

F1 Score = (2 * 0.367 * 0.705) / (0.367 + 0.705) = 0.482

Informedness: Informedness measures how informed the model is about positive and negative cases. Note that Recall tells us how informed the model is about infected people (positive cases), whereas True Negative Rate (TNR) tells us how informed it is about healthy people (negative cases).

Formula: Recall + TNR - 1

Informedness = 0.705 + 0.3952 - 1 = 0.1

Markedness: Markedness measures how trustworthy the model is about positive and negative cases. Note that Precision tells us how trustworthy it is in predicting a positive case, whereas Negative Predictive Value (NPV) tells us how trustworthy it is in predicting a negative case.

Formula: Precision + NPV - 1

Markedness = 0.367 + 0.729 - 1 = 0.096

Matthews Correlation Coefficient (MCC): MCC measures the overall quality of the model, as it incorporates a proper mix of all four parts of the confusion matrix.

Formula: (TP * TN - FP * FN) / SQRT((TP + FP) * (TP + FN) * (TN + FP) * (TN + FN))

MCC = (117 * 132 - 202 * 49) / SQRT((117 + 202) * (117 + 49) * (132 + 202) * (132 + 49)) = 0.098
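The composite metrics follow from the same four counts. Here is a quick Python sketch verifying the values above:

```python
# Verify F1, Informedness, Markedness, and MCC from the same counts.
from math import sqrt

TP, FP, FN, TN = 117, 202, 49, 132
precision = TP / (TP + FP)
recall    = TP / (TP + FN)
tnr       = TN / (TN + FP)
npv       = TN / (TN + FN)

f1           = 2 * precision * recall / (precision + recall)  # ~0.482
informedness = recall + tnr - 1                                # ~0.100
markedness   = precision + npv - 1                             # ~0.096
mcc          = (TP * TN - FP * FN) / sqrt(
    (TP + FP) * (TP + FN) * (TN + FP) * (TN + FN))             # ~0.098

print(f"F1={f1:.3f} Informedness={informedness:.3f} "
      f"Markedness={markedness:.3f} MCC={mcc:.3f}")
```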

Role of threshold

To understand the threshold and how it affects the confusion matrix, let us assume that the model makes predictions in terms of probability. Given a set of features (attributes with values) in a dataset, the model will try to predict a probability value as the outcome. So instead of saying TRUE (1) or FALSE (0), it will provide a value between 0 and 1. To interpret the probability value as the final predicted value (1 or 0, TRUE or FALSE, Infected or Healthy), we need a threshold parameter. If we say the threshold is 0.5, any value above it will have a predicted output of 1 and the rest will be 0.
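As an illustration, here is how such thresholding might look in Python; the probability scores below are hypothetical, not values from the article's dataset:

```python
# Hypothetical probability scores produced by some classifier.
scores = [0.91, 0.62, 0.48, 0.35, 0.77, 0.12, 0.55, 0.40]

def classify(scores, threshold):
    """Map each probability to 1 (infected) or 0 (healthy)."""
    return [1 if s > threshold else 0 for s in scores]

print(classify(scores, 0.5))  # [1, 1, 0, 0, 1, 0, 1, 0]
print(classify(scores, 0.7))  # raising the cut-off predicts fewer positives
```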

The threshold value thus directly affects the confusion matrix in terms of how many positive cases and how many negative cases are predicted. Manipulating the threshold changes the confusion matrix, which in turn affects accuracy, precision, and the other metrics.

Receiver Operating Characteristic (ROC)

The Receiver Operating Characteristic (ROC) curve plots the TPR against the FPR obtained at various threshold levels. The ROC curve thus summarizes the model's behavior across all thresholds and can be used to compare different classifiers by measuring the area under the curve (AUC).

A perfect classifier will have AUC = 1, whereas an AUC of 0.5 means the model is no better than random guessing. And although an AUC of 1.0 means a perfect classifier, it is ideal enough to warrant a suspicious look at the data and the process.

A trade-off needs to be made between TPR and FPR, since we want to maintain a high TPR while keeping the FPR low. Looking at the ROC curve, we can again determine the threshold where the trade-off is wisely met. Once the threshold is satisfactorily determined, we can have confidence in the model to give us good predictions.
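To illustrate how the curve is traced, here is a small Python sketch, assuming hypothetical scores and labels, that sweeps the threshold, collects (FPR, TPR) points, and approximates the AUC with the trapezoidal rule:

```python
# Hypothetical scores and labels, for illustration only.
scores = [0.91, 0.85, 0.77, 0.62, 0.55, 0.48, 0.40, 0.12]
labels = [1, 1, 0, 1, 0, 1, 0, 0]  # 1 = infected, 0 = healthy

pos = sum(labels)            # total actual positives
neg = len(labels) - pos      # total actual negatives

# Sweep thresholds from high to low, recording one (FPR, TPR) point each.
points = [(0.0, 0.0)]        # the curve starts at the origin
for t in sorted(set(scores), reverse=True):
    preds = [1 if s >= t else 0 for s in scores]
    tp = sum(1 for p, a in zip(preds, labels) if p == 1 and a == 1)
    fp = sum(1 for p, a in zip(preds, labels) if p == 1 and a == 0)
    points.append((fp / neg, tp / pos))

# Trapezoidal approximation of the area under the curve.
auc = sum((x2 - x1) * (y1 + y2) / 2
          for (x1, y1), (x2, y2) in zip(points, points[1:]))
print(f"AUC ~= {auc:.3f}")   # ~0.81 for this toy data
```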

Exciting enough!

Salesforce Tableau provides Threshold Controls that allow us to manipulate the threshold and dynamically update the metrics in real time on screen. It puts enough power in our hands to let us don the analyst hat.

