Web Intelligence
3.3 Classification
Classification is an everyday task. Some examples are:
· Selecting one of several directions (e.g. left, north or up) based on past
experience, perception and the navigational goal.
· Recycling your garbage (what material to put in which bin) based on
garbage characteristics (e.g. paper, plastic or organic).
· Interpreting symbols with several fonts and styles as characters in text
(e.g. bold, italic, Times New Roman or Courier), or in other words, reading.
Search engine submission pages use such tests to distinguish between
(unwanted) link submissions by software agents and (wanted) submissions
by people.
The class or concept is the selected outcome, drawn from the set of possible
outcomes (directions, types of bins, or characters in the alphabet in the
examples above). Selection of the class is based on a priori knowledge acquired
through training or experience.
From a data analysis perspective, classification can be defined as below:
Definition 3.1 (Classification, Han and Kamber [2001]). Classification
is the process of finding a set of models (or functions) that describe and
distinguish data classes or concepts, for the purpose of being able to use the
model to predict the class of objects whose class label is unknown. The derived
model is based on the analysis of a set of training data (i.e., data objects
whose class label is known).
The classification model is usually found by using one or several classification
algorithms (classifiers) that process the training data. The training data are
frequently in the form of feature vectors.
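To make the definition concrete, the following is a minimal sketch of the train-then-predict cycle, not a classifier taken from the text. It uses a nearest-centroid rule chosen only for brevity. The function names `train` and `predict`, the two features and all data values are hypothetical and loosely follow the recycling example above.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, Python 3.8+


def train(training_data):
    """Learn one centroid (mean feature vector) per class label."""
    sums = {}
    counts = defaultdict(int)
    for features, label in training_data:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        sums[label] = [s + f for s, f in zip(sums[label], features)]
        counts[label] += 1
    return {
        label: [s / counts[label] for s in total]
        for label, total in sums.items()
    }


def predict(model, features):
    """Return the class whose centroid is closest to the feature vector."""
    return min(model, key=lambda label: dist(model[label], features))


# Hypothetical training data: (feature vector, known class label).
# The features are invented for illustration: [weight in grams, flexibility 0..1].
training_data = [
    ([5.0, 0.9], "paper"),
    ([7.0, 0.8], "paper"),
    ([30.0, 0.3], "plastic"),
    ([26.0, 0.4], "plastic"),
]

model = train(training_data)

# An object whose class label is unknown.
print(predict(model, [6.0, 0.7]))
```

Here the model is the dictionary of per-class centroids derived from objects with known labels, and prediction assigns an unlabeled feature vector to the nearest centroid.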
3.3.1 Typology
Informative - model the class densities and select the class that most likely
produced the features. Naive Bayes, Hidden Markov Models and Fisher
Discriminant Analysis are examples of informative classifiers.
Discriminative - model the class boundary and membership probability
directly. Logistic Regression, C4.5, Artificial Neural Networks, Support
Vector Machines and Generalized Additive Models are examples of
discriminative classifiers. The classifiers proposed in papers F, G and H are
discriminative.
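The difference between the two types can be sketched on a toy problem with one feature. This is an illustrative assumption, not the method of papers F, G and H. The informative variant fits a Gaussian density per class plus a class prior (the single-feature case of Gaussian Naive Bayes) and picks the class maximizing p(x | c) p(c). The discriminative variant fits logistic regression by plain gradient descent and models P(c = 1 | x) directly. All function names, the data, the learning rate and the epoch count are hypothetical.

```python
import math

# Hypothetical 1-D training data: (feature value, class label 0 or 1).
data = [(1.0, 0), (1.5, 0), (2.0, 0), (4.0, 1), (4.5, 1), (5.0, 1)]


# --- Informative: model p(x | class) and p(class), then compare joints ---
def fit_gaussians(data):
    """Estimate (mean, variance, prior) for each class."""
    model = {}
    for c in {label for _, label in data}:
        xs = [x for x, label in data if label == c]
        mean = sum(xs) / len(xs)
        var = sum((x - mean) ** 2 for x in xs) / len(xs)
        model[c] = (mean, var, len(xs) / len(data))
    return model


def informative_predict(model, x):
    """Pick the class with the largest log p(x | c) + log p(c)."""
    def log_joint(c):
        mean, var, prior = model[c]
        log_density = (-0.5 * math.log(2 * math.pi * var)
                       - (x - mean) ** 2 / (2 * var))
        return log_density + math.log(prior)
    return max(model, key=log_joint)


# --- Discriminative: model P(class = 1 | x) directly ---
def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(data, lr=0.1, epochs=5000):
    """Fit weight w and bias b by batch gradient descent on the log loss."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in data:
            p = sigmoid(w * x + b)
            grad_w += (p - y) * x
            grad_b += p - y
        w -= lr * grad_w / len(data)
        b -= lr * grad_b / len(data)
    return w, b


def discriminative_predict(params, x):
    """Predict class 1 when the modeled P(class = 1 | x) is at least 0.5."""
    w, b = params
    return 1 if sigmoid(w * x + b) >= 0.5 else 0


gaussians = fit_gaussians(data)
logistic = fit_logistic(data)

for x in (1.2, 4.8):
    print(x, informative_predict(gaussians, x), discriminative_predict(logistic, x))
```

On this well-separated toy data both variants assign 1.2 to class 0 and 4.8 to class 1. They differ in what is estimated: the informative model describes how each class generates features, while the discriminative model only learns where the boundary lies.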