# Data Mining Classification

Classification tutorial for data mining class

Build intelligent machines that can recognise and detect objects, letters, fingerprints, or faces from video records, documents, photos, sound, and songs.

In this journey, we will explore methods of developing intelligent machines by building automated classifiers: pattern recognition systems.

The problem of building a classifier can be defined as finding an optimal hyperplane (decision boundary) for a set of class labels.

Fig. 1. Hyperplane (decision boundary) for a binary class decision problem.

Fig. 1 above shows a line (decision boundary). Points on the right-hand side of the hyperplane are classified as P (positive) and points on the left are classified as N (negative). Assuming that the squares are actually positive and the circles are actually negative, we can calculate performance metrics of the hyperplane, such as accuracy, precision, recall, and specificity.
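The four metrics named above can all be derived from the counts of true/false positives and negatives. The following is a minimal sketch, not taken from the tutorial itself: the function names `confusion_counts` and `metrics`, the `"P"`/`"N"` label encoding, and the toy labels are assumptions made for illustration.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN for binary labels encoded as 'P' or 'N'."""
    tp = fp = tn = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == "P":
            if actual == "P":
                tp += 1  # predicted positive, actually positive
            else:
                fp += 1  # predicted positive, actually negative
        else:
            if actual == "N":
                tn += 1  # predicted negative, actually negative
            else:
                fn += 1  # predicted negative, actually positive
    return tp, fp, tn, fn


def metrics(y_true, y_pred):
    """Return accuracy, precision, recall, and specificity as a dict."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    total = tp + fp + tn + fn
    return {
        # Fraction of all points labelled correctly.
        "accuracy": (tp + tn) / total if total else 0.0,
        # Of the points predicted P, the fraction that really are P.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Of the actual P points, the fraction that were found.
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        # Of the actual N points, the fraction correctly rejected.
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
    }


# Hypothetical example: 4 actual positives (squares), 6 actual negatives (circles).
y_true = ["P", "P", "P", "P", "N", "N", "N", "N", "N", "N"]
y_pred = ["P", "P", "P", "N", "N", "N", "N", "N", "P", "P"]
result = metrics(y_true, y_pred)
```

Here one positive lands on the wrong side of the boundary and two negatives are mistaken for positives, so the four metrics take different values, which is why a single number such as accuracy rarely tells the whole story.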

The main objective of machine learning approaches (or methods for building classifiers) is to find a hyperplane that maximises a set of performance measures. For a 2D feature space, the hyperplane is a line that can be expressed as y = ax + b, which has two parameters, a and b. We can simply vary a and b until we reach the desired performance metrics. The problem with this approach is that it takes too long. However, an exhaustive search of this kind will, in principle, find the best linear boundary for the chosen measure; any other approach can at best match it in terms of performance. For an N-dimensional feature space, we need to use a vector formula: a point x is classified as positive when w · x + b > 0, where w and x are vectors and b is a scalar.
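The brute-force idea can be sketched as a grid search over a and b. This is an illustrative sketch under stated assumptions, not the tutorial's own code: the names `predict`, `accuracy`, `grid_search`, and `linear_decision` are hypothetical, accuracy is used as the single performance measure, points above the line are taken to be P, and the candidate values are a coarse finite grid, so the result is only the best line on that grid.

```python
def predict(point, a, b):
    """Label a 2D point 'P' if it lies above the line y = a*x + b, else 'N'.

    Which side counts as positive is an assumption of this sketch.
    """
    x, y = point
    return "P" if y > a * x + b else "N"


def accuracy(points, labels, a, b):
    """Fraction of points whose predicted label matches the true label."""
    correct = sum(1 for pt, lab in zip(points, labels) if predict(pt, a, b) == lab)
    return correct / len(points)


def grid_search(points, labels, a_values, b_values):
    """Try every (a, b) pair and keep the first one with the highest accuracy."""
    best_a, best_b, best_acc = None, None, -1.0
    for a in a_values:
        for b in b_values:
            acc = accuracy(points, labels, a, b)
            if acc > best_acc:
                best_a, best_b, best_acc = a, b, acc
    return best_a, best_b, best_acc


def linear_decision(w, x, b):
    """General N-dimensional rule: True (positive) when w . x + b > 0."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b > 0


# Hypothetical toy data: positives sit above the negatives.
points = [(0, 2), (1, 3), (2, 4), (1, 0), (2, 1), (3, 2)]
labels = ["P", "P", "P", "N", "N", "N"]

a_values = [i * 0.5 for i in range(-4, 5)]  # slopes from -2.0 to 2.0
b_values = [i * 0.5 for i in range(-8, 9)]  # intercepts from -4.0 to 4.0

a, b, acc = grid_search(points, labels, a_values, b_values)
```

The cost is one accuracy evaluation per grid cell, so it grows with the number of candidate values for each parameter multiplied together, and it grows much faster as the number of dimensions increases. The 2D line is a special case of the vector rule: y > ax + b is the same test as w · x + b' > 0 with w = (-a, 1) and b' = -b, which is what `linear_decision((-a, 1), point, -b)` evaluates.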

In this journey, we will explore various methods of finding the hyperplane more efficiently.