Hyperparameter Search and Classifier Threshold Selection

Published: April 28, 2024

This notebook demonstrates how to use GridSearchCV to identify the best hyperparameters for a given model and metric, and then covers alternative ways of selecting a classifier threshold in scikit-learn.

First, we load the breast cancer dataset. We skip pre-processing, but split the data into separate training and validation sets:


import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

np.set_printoptions(suppress=True, precision=8, linewidth=1000)
pd.options.mode.chained_assignment = None
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)

data = load_breast_cancer()
X = data["data"]
y = data["target"]


# Create train and validation splits (80% / 20%).
Xtrain, Xvalid, ytrain, yvalid = train_test_split(X, y, test_size=.20, random_state=516)

print(f"Xtrain.shape: {Xtrain.shape}")
print(f"Xvalid.shape: {Xvalid.shape}")

Xtrain.shape: (455, 30)
Xvalid.shape: (114, 30)
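
Because threshold selection is sensitive to the class balance of the data it is evaluated on, it can be worth checking how the labels are distributed across the two splits. The sketch below is an optional check, not part of the original notebook: it repeats the split above and compares it with a stratified variant using the `stratify` argument of `train_test_split`. The helper `label_one_rate` and the `*_s` variable names are assumptions introduced for illustration only.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)

# Same split as in the notebook, repeated so this block runs on its own.
Xtrain, Xvalid, ytrain, yvalid = train_test_split(
    X, y, test_size=.20, random_state=516
)

# Assumption: a stratified variant for comparison. stratify=y asks
# train_test_split to preserve the class proportions in both splits.
Xtrain_s, Xvalid_s, ytrain_s, yvalid_s = train_test_split(
    X, y, test_size=.20, random_state=516, stratify=y
)


def label_one_rate(labels):
    """Hypothetical helper: fraction of samples whose label is 1."""
    return float(np.mean(labels))


print(f"full dataset         : {label_one_rate(y):.3f}")
print(f"train (plain)        : {label_one_rate(ytrain):.3f}")
print(f"valid (plain)        : {label_one_rate(yvalid):.3f}")
print(f"train (stratified)   : {label_one_rate(ytrain_s):.3f}")
print(f"valid (stratified)   : {label_one_rate(yvalid_s):.3f}")
```

If the plain split already tracks the full-dataset rate closely, stratification changes little here; on smaller or more imbalanced datasets the difference tends to matter more.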