Naive Bayes methods are a set of supervised learning algorithms based on applying Bayes' Theorem with the "naive" assumption of conditional independence between every pair of features given the class variable [1]. The Gaussian Naive Bayes classifier makes two strong assumptions:

The value of a particular feature is independent of the value of any other feature, given the class variable.

Each feature associated with an unclassified instance is assumed to follow a normal distribution within each class.
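In symbols, the first assumption means the class posterior factors as

\[
P(y \mid x_1, \ldots, x_n) \propto P(y)\prod_{i=1}^{n} P(x_i \mid y)
\]

and in the Gaussian variant each \(P(x_i \mid y)\) is a normal density with a class-specific mean and variance.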
To create a Gaussian Naive Bayes classifier (without scikit-learn), a minimal code sketch of which follows this list:

Ensure all explanatory variables are continuous: if the dataset contains categorical features, look into the Bernoulli or Multinomial forms of Naive Bayes.

For each explanatory variable, calculate the maximum likelihood estimate of the mean and variance for each class.

To classify a new instance, calculate the posterior probability for each class. There will be as many posterior probabilities per unclassified instance as there are distinct classes.

The new instance will be classified based on the class with the greatest posterior probability.
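To make the recipe concrete, here is a minimal from-scratch sketch of these steps. The class `GaussianNaiveBayes` and its methods are our own naming, not part of any library:

# ===========================================================
# from-scratch sketch of a Gaussian Naive Bayes classifier
# ===========================================================
import numpy as np

class GaussianNaiveBayes:
    def fit(self, X, y):
        # per-class MLE estimates of feature means/variances, plus class priors =>
        self.classes_ = np.unique(y)
        self.mu_    = {c: X[y == c].mean(axis=0) for c in self.classes_}
        self.var_   = {c: X[y == c].var(axis=0)  for c in self.classes_}
        self.prior_ = {c: (y == c).mean()        for c in self.classes_}
        return self

    def predict(self, X):
        # unnormalized posterior per class: prior * product of per-feature normal densities =>
        posteriors = []
        for c in self.classes_:
            dens = np.exp(-(X - self.mu_[c]) ** 2 / (2 * self.var_[c])) / np.sqrt(2 * np.pi * self.var_[c])
            posteriors.append(self.prior_[c] * dens.prod(axis=1))
        # classify by the class with the greatest posterior =>
        return self.classes_[np.argmax(np.column_stack(posteriors), axis=1)]

Fit to the admissions data introduced below, this sketch should reproduce the predictions that scikit-learn's GaussianNB produces later in the post.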
Example:
Consider a sample dataset representing business school admissions:

ID           GPA    GMAT   ADMITTED_IND
000000001    3.14   473    1
000000002    3.22   482    1
000000003    2.96   596    1
000000004    3.28   523    1
000000005    2.72   399    0
000000006    2.85   381    0
000000007    2.51   458    0
000000008    2.36   399    0
We have two additional instances that will be used to test the classifier:

ID           GPA    GMAT   ADMITTED_IND
000000009    2.90   384    0
000000010    3.40   431    1
For each feature, we calculate the mean and variance for the admitted and not-admitted classes:
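As a sketch (assuming the maximum-likelihood, divide-by-n, variance, which is also NumPy's default), the estimates can be computed directly from the data:

import numpy as np

# feature values per class, from the table above =>
gpa_admit      = np.array([3.14, 3.22, 2.96, 3.28])
gmat_admit     = np.array([473, 482, 596, 523])
gpa_not_admit  = np.array([2.72, 2.85, 2.51, 2.36])
gmat_not_admit = np.array([399, 381, 458, 399])

for label, arr in [("GPA  | admit ", gpa_admit),     ("GMAT | admit ", gmat_admit),
                   ("GPA  | !admit", gpa_not_admit), ("GMAT | !admit", gmat_not_admit)]:
    print("{}: mean={:.4f}, var={:.4f}".format(label, arr.mean(), arr.var()))

# Approximate results:
#   GPA  | admit : mean 3.15,   var 0.0145
#   GMAT | admit : mean 518.5,  var 2357.25
#   GPA  | !admit: mean 2.61,   var 0.0356
#   GMAT | !admit: mean 409.25, var 846.19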
In the sample dataset, we have equiprobable priors (since \(P(admit) = P(!admit) = 0.5\)). However, the prior probabilities need not be derived from the dataset of interest. They can be based on external data sources (such as admissions from prior years).
Recall the general form of Bayes' Theorem:

\[
P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}
\]

The posterior probability for admitted is given by:

\[
P(admit \mid data) = \frac{P(GPA \mid admit)\,P(GMAT \mid admit)\,P(admit)}{P(data)}
\]

and for not-admitted:

\[
P(!admit \mid data) = \frac{P(GPA \mid !admit)\,P(GMAT \mid !admit)\,P(!admit)}{P(data)}
\]
Where:

\(P(admit)\) and \(P(!admit)\) represent the prior probabilities, each 0.50 in this example.

\(P(GPA \mid admit)\,P(GMAT \mid admit)\) represents the likelihood. We assume \(GPA\) and \(GMAT\) are conditionally independent given the class, per the first assumption of Naive Bayes.

\(data\) is a stand-in for the \(GMAT\) and \(GPA\) values of a given instance. Since \(P(data)\) is the same for both classes, it can be dropped when comparing posteriors.
The second assumption of Naive Bayes (in its Gaussian form) is that each explanatory variable follows a normal distribution within each class. Thus, \(P(GMAT \mid admit)\) is calculated by passing the observation's \(GMAT\) score (and \(P(GPA \mid admit)\) its \(GPA\)) into the corresponding normal density function, parameterized by the class-specific estimates of mean and variance determined above.
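Concretely, for a feature value \(x\) and class \(c\), that density is

\[
P(x \mid c) = \frac{1}{\sqrt{2\pi\hat{\sigma}^2_{c}}}\exp\!\left(-\frac{(x-\hat{\mu}_{c})^2}{2\hat{\sigma}^2_{c}}\right)
\]

where \(\hat{\mu}_{c}\) and \(\hat{\sigma}^2_{c}\) are the estimated mean and variance of that feature within class \(c\).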
For the admitted class, \(P(GPA \mid admit)\) and \(P(GMAT \mid admit)\) use the admitted-class estimates; for not-admitted, \(P(GPA \mid !admit)\) and \(P(GMAT \mid !admit)\) use the not-admitted estimates.
Classifying Instances
Recall our test observations: ID=000000009 (\(GPA=2.90\), \(GMAT=384\)) and ID=000000010 (\(GPA=3.40\), \(GMAT=431\)).
We calculate the admitted and not-admitted posterior for each instance. Each observation is then classified as admitted/1 or not-admitted/0 based on the class with the greater posterior probability.
For ID=000000009, we evaluate the class-conditional densities \(P(GMAT \mid admit)\), \(P(GPA \mid admit)\), \(P(GMAT \mid !admit)\) and \(P(GPA \mid !admit)\) at the observation's values (\(GMAT=384\), \(GPA=2.90\)). Plugging these likelihoods, together with the prior, into the posterior expression gives the class probabilities for ID=000000009; the arithmetic is sketched in code below.
Thus, an individual with \(GPA=2.90\) and \(GMAT=384\) would almost certainly not
be admitted according to the Gaussian Naive Bayes classifier.
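For reference, here is a short sketch that reproduces the calculation numerically, again assuming MLE (divide-by-n) variance estimates; intermediate numbers would differ slightly under the n-1 convention, but the classification is unchanged:

import numpy as np

def normal_pdf(x, mu, var):
    # univariate normal density =>
    return np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

# MLE estimates of mean/variance from the admissions data =>
mu_gpa_adm,  var_gpa_adm  = 3.15,   0.0145
mu_gmat_adm, var_gmat_adm = 518.5,  2357.25
mu_gpa_not,  var_gpa_not  = 2.61,   0.03555
mu_gmat_not, var_gmat_not = 409.25, 846.1875
prior = 0.5

# test instance ID=000000009 =>
gpa, gmat = 2.90, 384

post_adm = prior * normal_pdf(gpa, mu_gpa_adm, var_gpa_adm) * normal_pdf(gmat, mu_gmat_adm, var_gmat_adm)
post_not = prior * normal_pdf(gpa, mu_gpa_not, var_gpa_not) * normal_pdf(gmat, mu_gmat_not, var_gmat_not)

# normalize so the two posteriors sum to 1 =>
total = post_adm + post_not
print("P(admit|data) ~ {:.4f}, P(!admit|data) ~ {:.4f}".format(post_adm / total, post_not / total))
# roughly: P(admit|data) ~ 0.011, P(!admit|data) ~ 0.989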
Naive Bayes in scikit-learn
Implementing a Gaussian Naive Bayes classifier in scikit-learn is straightforward: the details are abstracted away behind a simple, consistent and easy-to-use interface that makes sense without requiring intimate familiarity with the underlying mechanics. Generally, after we’ve decided on a model to use for a classification task, the next steps are:

Preprocess explanatory data (scale, impute and encode)

Instantiate model

Fit model to training data

Predict classes on test/holdout data
Using the sample admissions data above, we demonstrate how to carry out these steps with scikit-learn:
# ===========================================================
# scikit-learn implementation of Gaussian Naive Bayes classifier
# ===========================================================
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.naive_bayes import GaussianNB
# Read dataset into pandas DataFrame =>
df = pd.DataFrame({
'ID': ['000000001','000000002','000000003','000000004',
'000000005','000000006','000000007','000000008'],
'GPA' :[3.14,3.22,2.96,3.28,2.72,2.85,2.51,2.36],
'GMAT' :[473,482,596,523,399,381,458,399],
'ADMITTED_IND':[1,1,1,1,0,0,0,0]
})
# Split data into design matrix (X) and response (y) =>
X = df[['GPA', 'GMAT']].values
y = df['ADMITTED_IND'].values
# [1] Preprocess explanatory data
# we use StandardScaler on explanatory data, which returns the features scaled
# with 0 mean and unit variance =>
scl = StandardScaler()
X = scl.fit_transform(X)
# [2] Fit model to training data
# Instantiate model and call `fit` =>
clf = GaussianNB()
clf.fit(X, y)
# [3] Predict classes on test/holdout data
# Testing model on holdout observations =>
# scale test data by calling scaler's `transform` method (not fit!) =>
pre_000000009 = scl.transform([[2.90, 384]])
pre_000000010 = scl.transform([[3.40, 431]])
# For 000000009:
obs_000000009 = clf.predict(pre_000000009)
# For 000000010:
obs_000000010 = clf.predict(pre_000000010)
print("000000009 actual admission status: 0; predicited status: {}".format(obs_000000009))
print("000000009 actual admission status: 1; predicited status: {}".format(obs_000000010))
# returns:
# 000000009 actual admission status: 0; predicited status: [0]
# 000000009 actual admission status: 1; predicited status: [1]
We see that the class labels predicted by our model agree with the actual
labels in both cases.
From the Gaussian Naive Bayes classifier, we can access both the label
predictions and the posterior probabilities for each class. For demonstration
purposes, we’ll generate the class predictions and probabilities associated
with the training set of eight instances, but in practice you’ll be interested
in determining these metrics for the holdout dataset:
# continuing with clf object from above =>
# get model predicted classes =>
y_hat = clf.predict(X)
# printing y_hat yields:
# array([1, 1, 1, 1, 0, 0, 0, 0], dtype=int64)
# get model predicted probabilities for the admitted class (column 1) =>
p_hat = clf.predict_proba(X)[:,[1]]
# printing p_hat yields =>
# array([[ 0.997114],
# [ 0.999608],
# [ 1. ],
# [ 0.999998],
# [ 0.000098],
# [ 0.002745],
# [ 0.000001],
# [ 0. ]])
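Since this is a two-class problem, the labels returned by `predict` correspond to whichever class has the larger posterior; equivalently, thresholding the admitted-class probability at 0.5 recovers them:

# continuing with clf, X and y_hat from above =>
# labels from `predict` match the class with the larger posterior =>
manual_labels = (clf.predict_proba(X)[:, 1] >= 0.5).astype(int)
print(np.array_equal(manual_labels, y_hat))
# True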
Despite their naive design and apparently oversimplified assumptions, Naive Bayes classifiers have worked quite well in many complex real-world situations. In 2004, an analysis of the Bayesian classification problem showed that there are sound theoretical reasons for the apparently implausible efficacy of naive Bayes classifiers [2].
In this post, we covered how to implement a Gaussian Naive Bayes classifier with and without scikit-learn. Due to the API’s remarkable level of consistency, implementing a Random Forest or Support Vector Machine with scikit-learn would be virtually identical to the Naive Bayes implementation above. The library exposes a huge amount of functionality, which can seem overwhelming at first, but the documentation is so well organized and full of simple usage examples that it encourages the interested reader to jump right in.
Until next time, happy coding!
Footnotes:
[1] scikit-learn documentation: http://scikit-learn.org/stable/modules/naive_bayes.html
[2] https://en.wikipedia.org/wiki/Naive_Bayes_classifier