AI PROJECT-1.8 How to Evaluate the Model?



STEP 1: Evaluate the Model Deployed to SageMaker Hosting Services

  • To evaluate the model before using it in a production environment, we invoke the endpoint with the test dataset and check whether the inferences reach the target accuracy we want to achieve.
  • First, we define a function that splits the large dataset into chunks to avoid exceeding the payload size limit, makes predictions for each chunk, and then concatenates the results.
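import numpy as np

# xgb_predictor is the predictor object for the deployed model endpoint.
def predict(data, rows=1000):
    # Split the data into chunks of at most `rows` rows each.
    split_array = np.array_split(data, int(data.shape[0] / float(rows) + 1))
    predictions = ''
    for array in split_array:
        # Each response is a comma-separated byte string; decode and append it.
        predictions = ','.join([predictions, xgb_predictor.predict(array).decode('utf-8')])
    # Drop the leading comma, then parse the string into a NumPy array.
    return np.fromstring(predictions[1:], sep=',')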


  • rows: The maximum number of rows to send to the endpoint in a single request (1000 by default).
  • split_array = np.array_split(data, int(data.shape[0] / float(rows) + 1)): Splits the input data (data) into int(n / rows + 1) roughly equal chunks, where n is the number of rows in data, so each chunk holds at most rows rows. This keeps each request under the payload size limit when predicting on large datasets.
  • for array in split_array:: Iterates over the chunks of input data.
  • xgb_predictor.predict(array): Calls the predict method on the xgb_predictor object (the deployed model endpoint) to obtain predictions for the current chunk (array).
  • .decode('utf-8'): Decodes the binary response from the prediction into a UTF-8 string.
  • ','.join([predictions, ...]): Concatenates the predictions from each chunk with a comma separator.
  • np.fromstring(predictions[1:], sep=','): Converts the concatenated predictions string (excluding the initial comma) into a NumPy array using a comma as the separator.
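
The chunking logic described above can be tried without a live endpoint. The following is a minimal sketch, assuming a hypothetical FakePredictor class that mimics the endpoint by returning one score per row as comma-separated bytes; it is not the SageMaker predictor, and predict_in_chunks is an illustrative variant of the predict function rather than a replacement for it.

```python
import numpy as np


class FakePredictor:
    """Hypothetical stand-in for xgb_predictor (not part of SageMaker)."""

    def predict(self, array):
        # Use each row's mean as a dummy score and return CSV-encoded bytes,
        # which is the response shape the predict function expects.
        scores = array.mean(axis=1)
        return ','.join(repr(float(s)) for s in scores).encode('utf-8')


def predict_in_chunks(data, predictor, rows=1000):
    # Same chunk count as the tutorial's predict function.
    n_chunks = int(data.shape[0] / float(rows) + 1)
    chunks = np.array_split(data, n_chunks)
    # Decode each chunk's response, then join and parse all scores at once.
    parts = [predictor.predict(chunk).decode('utf-8') for chunk in chunks]
    return np.array(','.join(parts).split(','), dtype=float)


data = np.random.default_rng(0).random((2500, 3))
chunk_sizes = [len(c) for c in np.array_split(data, int(data.shape[0] / 1000.0 + 1))]
result = predict_in_chunks(data, FakePredictor())

print(chunk_sizes)   # [834, 833, 833]
print(result.shape)  # (2500,)
```

With 2500 rows and rows=1000, the data is split into three chunks, each below the 1000-row limit, and the concatenated result has one prediction per input row.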

STEP 2: Create a Histogram of Predictions

  • Next, we will use matplotlib to create and display a histogram of the predictions obtained from the predict function.
import matplotlib.pyplot as plt

predictions = predict(test.to_numpy()[:, 1:])
plt.hist(predictions)
plt.show()

  • import matplotlib.pyplot as plt: Imports the matplotlib.pyplot module and aliases it as plt. This is a common convention to make plotting commands more concise.
  • predictions = predict(test.to_numpy()[:, 1:]): Calls the predict function on the test data, passing every column except the first (which holds the true labels), and obtains an array of predictions.
  • plt.hist(predictions): Creates a histogram of the predictions using the hist function from matplotlib.pyplot. The histogram represents the distribution of predicted values.
  • plt.show(): Displays the generated histogram.
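
To see the numbers behind such a histogram, the same binning can be computed with np.histogram. This is only a sketch: fake_predictions is synthetic data standing in for the endpoint's output, and the bin count of 10 is assumed to match the default that plt.hist uses.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic scores in [0, 1]; a hypothetical stand-in for real predictions.
fake_predictions = rng.beta(0.5, 2.0, size=1000)

# Ten equal-width bins between the smallest and largest value.
counts, bin_edges = np.histogram(fake_predictions, bins=10)

# Each bin is half-open, except the last one, which includes its right edge.
for left, right, count in zip(bin_edges[:-1], bin_edges[1:], counts):
    print(f"{left:.2f} to {right:.2f}: {count}")
```

A distribution with most of its mass near 0 and near 1 suggests the model separates the two classes confidently; a large mass in the middle bins means the chosen cutoff will matter more.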

STEP 3: Get Confusion Matrix & Classification Report

Confusion matrix and classification report are two common tools for evaluating the performance of a classification model.

What is a Confusion Matrix?

Confusion matrix: Provides a tabular summary of the model's performance, showing the counts of true positive, true negative, false positive, and false negative predictions.

                    Predicted Positive    Predicted Negative
Actual Positive     TP                    FN
Actual Negative     FP                    TN

The confusion matrix consists of four counts:

  • True Positive (TP): Instances correctly predicted as positive.
  • True Negative (TN): Instances correctly predicted as negative.
  • False Positive (FP): Instances incorrectly predicted as positive (Type I error).
  • False Negative (FN): Instances incorrectly predicted as negative (Type II error).

From the confusion matrix, various performance metrics can be calculated, including:

  • Accuracy: (TP + TN) / (TP + TN + FP + FN)
  • Precision (Positive Predictive Value): TP / (TP + FP)
  • Recall (Sensitivity, True Positive Rate): TP / (TP + FN)
  • Specificity (True Negative Rate): TN / (TN + FP)
  • F1-Score: 2 * (Precision * Recall) / (Precision + Recall)
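
As a worked example of the formulas above, the sketch below computes each metric from four counts. The counts (40, 45, 5, 10) are made up for illustration and are not results from this project, and metrics_from_counts is a hypothetical helper name.

```python
def metrics_from_counts(tp, tn, fp, fn):
    """Compute the metrics listed above from confusion-matrix counts.

    Assumes every denominator is non-zero; a production version would
    need to handle the zero-division cases explicitly.
    """
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    f1 = 2 * (precision * recall) / (precision + recall)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
    }


# Illustrative counts only.
m = metrics_from_counts(tp=40, tn=45, fp=5, fn=10)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

For these counts, accuracy is 85 / 100 = 0.85, precision is 40 / 45 (about 0.889), recall is 40 / 50 = 0.8, specificity is 45 / 50 = 0.9, and the F1-score is about 0.842.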

What is a Classification Report?

A classification report gives a more detailed view of the model's performance than the confusion matrix alone. It lists precision, recall, F1-score, and support for each class, in both binary and multiclass problems.

Adjusting the cutoff threshold shifts the balance between precision and recall, so the threshold is usually chosen based on the requirements of the application or the desired trade-off between false positives and false negatives.

The key metrics in a classification report are:

  • Precision: The ratio of correctly predicted positive observations to the total predicted positives (TP / (TP + FP)).
  • Recall: The ratio of correctly predicted positive observations to the total actual positives (TP / (TP + FN)).
  • F1-Score: The harmonic mean of precision and recall, balancing both metrics.
  • Support: The number of actual occurrences of the class in the specified dataset.
import sklearn.metrics

cutoff = 0.5
print(sklearn.metrics.confusion_matrix(test.iloc[:, 0], np.where(predictions > cutoff, 1, 0)))
print(sklearn.metrics.classification_report(test.iloc[:, 0], np.where(predictions > cutoff, 1, 0)))


  • cutoff = 0.5: Sets a threshold value for converting probability predictions into binary predictions. Predictions greater than the cutoff are classified as 1, and those less than or equal to the cutoff are classified as 0.
  • np.where(predictions > cutoff, 1, 0): Applies the cutoff threshold to convert probability predictions (predictions) into binary predictions (0 or 1).
  • sklearn.metrics.confusion_matrix(test.iloc[:, 0], ...): Computes the confusion matrix from the true labels (test.iloc[:, 0]) and the binary predictions. With labels 0 and 1, scikit-learn puts actual classes in rows and predicted classes in columns, so the output is laid out as [[TN, FP], [FN, TP]].
  • sklearn.metrics.classification_report(test.iloc[:, 0], ...): Generates a classification report from the same two inputs, including precision, recall, F1-score, and support.
  • print(...): Prints the confusion matrix and the classification report.
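
The effect of the cutoff on precision and recall can be illustrated on a handful of made-up scores. This is a sketch using NumPy only; y_true, scores, and the helper confusion_counts are hypothetical and unrelated to the project's dataset.

```python
import numpy as np


def confusion_counts(y_true, scores, cutoff):
    """Threshold the scores at `cutoff` and count TP, TN, FP, FN."""
    y_pred = np.where(scores > cutoff, 1, 0)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    return tp, tn, fp, fn


# Made-up labels and probability scores, for illustration only.
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1])
scores = np.array([0.1, 0.2, 0.4, 0.6, 0.3, 0.55, 0.7, 0.9])

results = {}
for cutoff in (0.25, 0.5, 0.65):
    tp, tn, fp, fn = confusion_counts(y_true, scores, cutoff)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    results[cutoff] = (precision, recall)
    print(f"cutoff={cutoff}: precision={precision:.3f}, recall={recall:.3f}")
```

On this toy data, raising the cutoff from 0.25 to 0.65 moves precision from about 0.667 to 1.0 while recall falls from 1.0 to 0.5, which is the trade-off the cutoff controls.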

STEP 4: Create a Plot

  • Now, we will create a plot to visualize the relationship between different cutoff values and the log loss metric.
  • Log loss is a common metric for evaluating the performance of probabilistic classifiers.
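import matplotlib.pyplot as plt

cutoffs = np.arange(0.01, 1, 0.01)
log_loss = []
for c in cutoffs:
    # Threshold the predictions at c and record the resulting log loss.
    log_loss.append(
        sklearn.metrics.log_loss(test.iloc[:, 0], np.where(predictions > c, 1, 0))
    )

plt.figure(figsize=(15, 10))
plt.plot(cutoffs, log_loss)
plt.xlabel("Cutoff")
plt.ylabel("Log loss")
plt.show()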


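  • cutoffs = np.arange(0.01, 1, 0.01): Generates an array of cutoff values from 0.01 up to, but not including, 1, with a step size of 0.01 (that is, 0.01, 0.02, ..., 0.99).
  • log_loss = []: Initializes an empty list to store the log loss values.
  • for c in cutoffs: Iterates through each cutoff value.
  • log_loss.append(sklearn.metrics.log_loss(test.iloc[:, 0], np.where(predictions > c, 1, 0))): Computes the log loss for the binary predictions at the current cutoff and appends it to the list. Because the inputs are hard 0/1 labels rather than probabilities, the value mainly reflects the misclassification rate at that cutoff.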
  • plt.figure(figsize=(15, 10)): Sets the figure size for the plot.
  • plt.plot(cutoffs, log_loss): Plots log loss values against the corresponding cutoff values.
  • plt.xlabel("Cutoff"): Adds a label to the x-axis indicating the cutoff values.
  • plt.ylabel("Log loss"): Adds a label to the y-axis indicating the log loss values.
  • plt.show(): Displays the plot.

STEP 5: Find the minimum points of the error curve

print(
    'Log loss is minimized at a cutoff of ', cutoffs[np.argmin(log_loss)],
    ', and the log loss value at the minimum is ', np.min(log_loss)
)
  • This should print output similar to: Log loss is minimized at a cutoff of 0.53, and the log loss value at the minimum is 4.348539186773897.
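
The lookup pattern cutoffs[np.argmin(...)] can be checked on a curve whose minimum is known in advance. The sketch below uses a hypothetical bowl-shaped toy_loss, not the project's log loss values, and also shows how np.argmin behaves when several points tie for the minimum.

```python
import numpy as np

cutoffs = np.arange(0.01, 1, 0.01)
# Hypothetical error curve with its minimum at a cutoff of 0.4.
toy_loss = (cutoffs - 0.4) ** 2

best = int(np.argmin(toy_loss))  # index of the smallest value
best_cutoff = cutoffs[best]      # cutoff at that index
print(best, round(float(best_cutoff), 2))  # 39 0.4

# When several entries tie for the minimum, argmin returns the first one.
tie_index = int(np.argmin(np.array([0.3, 0.1, 0.1])))
print(tie_index)  # 1
```

If the real error curve has a flat region at its minimum, the reported cutoff is therefore the lowest cutoff in that region.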


  • We have now completed our AI project: we loaded the dataset, trained the model, deployed it, and evaluated it.