Naive Bayes methods are a set of supervised learning algorithms based on applying Bayes' Theorem with the naive assumption of independence between every pair of features [1]. The Gaussian Naive Bayes classifier makes two strong assumptions:

  1. The value of a particular feature is independent of the value of any other feature, given the class variable.

  2. Each continuous feature is assumed to follow a normal (Gaussian) distribution within each class.
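
Taken together, the first assumption lets the class-conditional likelihood factor into a product of per-feature terms, so the posterior for a class \(y\) given features \(x_1, \ldots, x_n\) is proportional to:

$$ P(y|x_1, \ldots, x_n) \propto P(y) \prod_{i=1}^{n} P(x_i|y) $$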


To create a Gaussian Naive Bayes classifier (without scikit-learn):

  1. Ensure all explanatory variables are continuous: If the dataset contains categorical features, look into the Bernoulli or Multinomial form of Naive Bayes.

  2. For each explanatory variable, estimate the mean and variance of the variable within each class.

  3. To classify a new instance, calculate the posterior probability for each class. There will be as many posterior probabilities per unclassified instance as there are distinct classes.

  4. Classify the new instance as the class with the greatest posterior probability (a minimal from-scratch sketch of these steps appears below).
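
To make the recipe concrete, here is a minimal from-scratch sketch of these four steps in Python (an illustration, not a reference implementation). It assumes the features arrive as a NumPy matrix X with one row per instance and the class labels as a vector y; variances use the sample (n-1) estimator to match the worked calculations that follow.

import numpy as np

def fit_gaussian_nb(X, y):
    # Step 2: per-class prior, mean and variance for each feature =>
    params = {}
    for c in np.unique(y):
        Xc = X[y == c]
        params[c] = {'prior': len(Xc) / len(X),
                     'mean':  Xc.mean(axis=0),
                     'var':   Xc.var(axis=0, ddof=1)}
    return params

def predict_gaussian_nb(params, x):
    # Steps 3 & 4: normalized posterior for each class; classify as the argmax =>
    posteriors = {}
    for c, p in params.items():
        density = np.prod(np.exp(-(x - p['mean']) ** 2 / (2 * p['var']))
                          / np.sqrt(2 * np.pi * p['var']))
        posteriors[c] = p['prior'] * density
    total = sum(posteriors.values())
    return {c: v / total for c, v in posteriors.items()}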

Example:

Consider a sample dataset representing business school admissions:

ID          GPA    GMAT   ADMITTED_IND
000000001   3.14   473    1
000000002   3.22   482    1
000000003   2.96   596    1
000000004   3.28   523    1
000000005   2.72   399    0
000000006   2.85   381    0
000000007   2.51   458    0
000000008   2.36   399    0



We have two additional instances that will be used to test the classifier:


ID          GPA    GMAT   ADMITTED_IND (actual)
000000009   2.90   384    0
000000010   3.40   431    1



For each feature, we calculate the mean and variance for admitted and not-admitted:


Feature   Admitted: mean   Admitted: variance   Not admitted: mean   Not admitted: variance
GPA       3.15             0.0193               2.61                 0.0474
GMAT      518.50           3143.00              409.25               1128.25



In the sample dataset, we have equiprobable priors (since \(P(admit) = P(!admit) = .5\)). However, the prior probabilities need not be derived from the dataset of interest. They can be based on external data sources (such as admissions from prior years).
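
As an aside, if the priors do come from an external source, scikit-learn's GaussianNB (used later in this post) accepts them directly through its priors argument; a minimal sketch with the equiprobable priors of this example:

from sklearn.naive_bayes import GaussianNB

# supply class priors explicitly instead of estimating them from the data;
# order follows the sorted class labels (here 0 = not admitted, 1 = admitted) =>
clf = GaussianNB(priors=[0.5, 0.5])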


Recall the general form of Bayes’ Theorem:

$$ P(A|B) = \frac{P(B|A)P(A)}{P(B)} $$



The posterior probability for admitted is given by:

$$ P(admit|data) = \frac {P(admit)P(GPA|admit)P(GMAT|admit)}{P(data)}, $$



and for not-admitted:

$$ P(!admit|data) = \frac {P(!admit)P(GPA|!admit)P(GMAT|!admit)}{P(data)}, $$



where:

  • \(P(admit)\) and \(P(!admit)\) are the prior probabilities, each equal to 0.50 in this example.

  • \(P(GPA|admit)P(GMAT|admit)\) represents the likelihood. Per the first assumption of Naive Bayes, \(GPA\) and \(GMAT\) are treated as conditionally independent given the class, which is why the joint likelihood factors into this product.

  • \(data\) is a stand-in for the \(GMAT\) and \(GPA\) values of a given instance.

The second assumption of Gaussian Naive Bayes is that each explanatory variable follows a normal distribution within each class. Thus, \(P(GMAT|admit)\) and \(P(GPA|admit)\) are calculated by passing the observation's \(GMAT\) score and \(GPA\), respectively, into the corresponding normal density function, parameterized by the class-specific estimates of mean and variance determined above.


For the admitted class:

$$ \begin{align*} P(GMAT|admit) &= \frac {1} {\sqrt{2 \pi \sigma^{2}_{GMAT|admit}}} exp\Big({-\frac {(GMAT - \mu_{GMAT|admit})^{2}}{2\sigma^{2}_{GMAT|admit}}}\Big)\\ \\ P(GPA|admit) &= \frac {1} {\sqrt{2 \pi \sigma^{2}_{GPA|admit}}} exp\Big({-\frac {(GPA - \mu_{GPA|admit})^{2}}{2\sigma^{2}_{GPA|admit}}}\Big) \end{align*} $$


For not-admitted:

$$ \begin{align*} P(GMAT|!admit) &= \frac {1} {\sqrt{2 \pi \sigma^{2}_{GMAT|!admit}}} exp\Big({-\frac {(GMAT - \mu_{GMAT|!admit})^{2}}{2\sigma^{2}_{GMAT|!admit}}}\Big) \\ \\ P(GPA|!admit) &= \frac {1} {\sqrt{2 \pi \sigma^{2}_{GPA|!admit}}} exp\Big({-\frac {(GPA - \mu_{GPA|!admit})^{2}}{2\sigma^{2}_{GPA|!admit}}}\Big) \end{align*} $$




Classifying Instances

Recall our test observations:


ID          GPA    GMAT   ADMITTED_IND (actual)
000000009   2.90   384    0
000000010   3.40   431    1



We calculate the admitted and not-admitted posteriors for each instance; the observation is then classified as admitted (1) or not admitted (0) according to whichever posterior is greater.


For ID=000000009:


GMAT calculation for admitted:

$$ \begin{align*} p(GMAT|admit) &= \frac{1}{\sqrt{2 \pi \sigma^{2}_{GMAT|admit}}} exp\Big({-\frac{(GMAT - \mu_{GMAT|admit})^{2}}{2\sigma^{2}_{GMAT|admit}}}\Big) \\ \\ &= \frac{1}{\sqrt{2 \pi (3143)}} exp\Big({-\frac {(384 - 518.50)^{2}}{2(3143)}}\Big) \\ \\ &=\mathbf{.0004} \end{align*} $$

GPA calculation for admitted:

$$ \begin{align*} p(GPA|admit) &= \frac{1}{\sqrt{2 \pi \sigma^{2}_{GPA|admit}}} exp\Big({-\frac{(GPA - \mu_{GPA|admit})^{2}}{2\sigma^{2}_{GPA|admit}}}\Big) \\ \\ &= \frac{1}{\sqrt{2 \pi (0.0193)}} exp\Big({-\frac {(2.90 - 3.15)^{2}}{2(0.0193)}}\Big) \\ \\ &=\mathbf{0.568767} \end{align*} $$



GMAT calculation for not-admitted:

$$ \begin{align*} p(GMAT|!admit) &= \frac{1}{\sqrt{2 \pi \sigma^{2}_{GMAT|!admit}}} exp\Big({-\frac{(GMAT - \mu_{GMAT|!admit})^{2}}{2\sigma^{2}_{GMAT|!admit}}}\Big) \\ \\ &= \frac{1}{\sqrt{2 \pi (1128.25)}} exp\Big({-\frac {(384 - 409.25)^{2}}{2(1128.25)}}\Big) \\ \\ &=\mathbf{0.00895364} \end{align*} $$

GPA calculation for not-admitted:

$$ \begin{align*} p(GPA|!admit) &= \frac{1}{\sqrt{2 \pi \sigma^{2}_{GPA|!admit}}} exp\Big({-\frac{(GPA - \mu_{GPA|!admit})^{2}}{2\sigma^{2}_{GPA|!admit}}}\Big) \\ \\ &= \frac{1}{\sqrt{2 \pi (0.0474)}} exp\Big({-\frac {(2.90 - 2.610)^{2}}{2(0.0474)}}\Big) \\ \\ &=\mathbf{0.7546488} \end{align*} $$


Then, plugging values into the posterior expression, class probabilities for ID=000000009 are given by:


$$ \begin{align*} P(admit|data) &= \frac{P(admit)P(GPA|admit)P(GMAT|admit)}{P(data)} \\ \\ &= \frac {(.5)*(0.568767)*(.0004)}{(0.568767)*(.0004)*(.5) + (0.7546488)* (0.00895364)*(.5)} \\ \\ &= \mathbf{0.03266} \\ \\ P(!admit|data) &= \frac{P(!admit)P(GPA|!admit)P(GMAT|!admit)}{P(data)} \\ \\ &= \frac {(.5)*(0.7546488)*(0.00895364)}{(0.568767)*(.0004)*(.5) + (0.7546488)* (0.00895364)*(.5)} \\ \\ &= \mathbf{0.967340} \\ \\ \end{align*} $$



Thus, an individual with \(GPA=2.90\) and \(GMAT=384\) would almost certainly not be admitted according to the Gaussian Naive Bayes classifier.
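
As a sanity check, the hand calculation above can be reproduced in a few lines using scipy.stats.norm (a sketch; the means and variances are the sample estimates from the table above, and norm.pdf expects the standard deviation rather than the variance):

from scipy.stats import norm

gpa, gmat = 2.90, 384    # observation ID=000000009
prior = 0.5

# class-conditional likelihoods =>
like_admit  = norm.pdf(gpa, 3.15, 0.0193 ** 0.5) * norm.pdf(gmat, 518.50, 3143 ** 0.5)
like_nadmit = norm.pdf(gpa, 2.61, 0.0474 ** 0.5) * norm.pdf(gmat, 409.25, 1128.25 ** 0.5)

# normalized posteriors =>
evidence = prior * like_admit + prior * like_nadmit
print(prior * like_admit  / evidence)    # ~0.033 (admit), as computed above
print(prior * like_nadmit / evidence)    # ~0.967 (!admit)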


Naive Bayes in scikit-learn

Implementing a Gaussian Naive Bayes classifier in scikit-learn is straightforward: all the details are abstracted away behind a simple, consistent interface that can be used productively without intimate familiarity with the underlying mechanics. Generally, after we’ve decided on a model for a classification task, the next steps are:


  • Pre-process explanatory data (scale, impute and encode)

  • Instantiate model

  • Fit model to training data

  • Predict classes on test/holdout data

Using the sample admissions data above, we demonstrate how to carry out these steps with scikit-learn:

# ===========================================================
# scikit-learn implementation of Naive Bayes classifier
# ===========================================================
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.naive_bayes import GaussianNB


# Read dataset into pandas DataFrame =>
df = pd.DataFrame({
    'ID':          ['000000001','000000002','000000003','000000004',
                    '000000005','000000006','000000007','000000008'],
    'GPA'         :[3.14,3.22,2.96,3.28,2.72,2.85,2.51,2.36],
    'GMAT'        :[473,482,596,523,399,381,458,399],
    'ADMITTED_IND':[1,1,1,1,0,0,0,0]
    })

# Split data into design matrix (X) and response (y) =>
X = df[['GPA', 'GMAT']].values
y = df['ADMITTED_IND'].values


# [1] Pre-process explanatory data 
# we use StandardScaler on explanatory data, which returns the features scaled 
# with 0 mean and unit variance =>
scl = StandardScaler()
X = scl.fit_transform(X)


# [2] Fit model to training data
# Instantiate model and call `fit` =>
clf = GaussianNB()
clf.fit(X, y)

# [3] Predict classes on test/holdout data
# Testing model on holdout observations =>

# scale test data by calling scaler's `transform` method (not fit!) =>
pre_000000009 = scl.transform([[2.90, 384]])
pre_000000010 = scl.transform([[3.40, 431]])

# For 000000009:
obs_000000009 = clf.predict(pre_000000009)

# For 000000010:
obs_000000010 = clf.predict(pre_000000010)


print("000000009 actual admission status: 0; predicited status: {}".format(obs_000000009))
print("000000009 actual admission status: 1; predicited status: {}".format(obs_000000010))

# returns:
#    000000009 actual admission status: 0; predicited status: [0]
#    000000009 actual admission status: 1; predicited status: [1]


We see that the class labels predicted by our model agree with the actual labels in both cases.

From the Gaussian Naive Bayes classifier, we can access both the label predictions and the posterior probabilities for each class. For demonstration purposes, we’ll generate the class predictions and probabilities associated with the training set of eight instances, but in practice you’ll be interested in determining these metrics for the holdout dataset:

# continuing with clf object from above =>

# get model predicted classes =>
y_hat = clf.predict(X)

# printing y_hat yields:
        # array([1, 1, 1, 1, 0, 0, 0, 0], dtype=int64)

# get model predicted probabilities =>
p_hat = clf.predict_proba(X)[:,[1]]

# printing p_hat yields =>

        # array([[ 0.997114],
        #        [ 0.999608],
        #        [ 1.      ],
        #        [ 0.999998],
        #        [ 0.000098],
        #        [ 0.002745],
        #        [ 0.000001],
        #        [ 0.      ]])
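
The same method applies to the two holdout observations scaled earlier. Note that these probabilities will be close to, but not exactly equal to, the hand-calculated posteriors: scikit-learn's GaussianNB estimates variances with the maximum-likelihood (divide-by-n) formula, whereas the worked example used the sample (n-1) variance.

# posterior probabilities for the holdout observations =>
# columns follow clf.classes_, i.e. [P(not admitted), P(admitted)] =>
print(clf.predict_proba(pre_000000009))
print(clf.predict_proba(pre_000000010))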

Despite their naive design and apparently oversimplified assumptions, Naive Bayes classifiers have worked quite well in many complex real-world situations. In 2004, an analysis of the Bayesian classification problem showed that there are sound theoretical reasons for the apparently implausible efficacy of Naive Bayes classifiers [2].

In this post, we covered how to implement a Gaussian Naive Bayes classifier both with and without scikit-learn. Thanks to the API’s remarkable consistency, implementing a Random Forest or Support Vector Machine with scikit-learn is virtually identical to the Naive Bayes implementation above. The library exposes a huge amount of functionality, which can seem overwhelming at first, but the documentation is well organized and full of simple usage examples that encourage the interested reader to jump right in.

Until next time, happy coding!

Footnotes:

  1. scikit-learn documentation: http://scikit-learn.org/stable/modules/naive_bayes.html
  2. https://en.wikipedia.org/wiki/Naive_Bayes_classifier