In pattern recognition, the idea is to mimic the human brain in classifying objects into classes. Similar to object-oriented programming, an object has methods or attributes, called features, which are common to objects belonging to a certain class. Pattern recognition involves two tasks: (a) finding appropriate features, such as color, area, or perimeter, that separate the classes efficiently, and (b) finding a decision function that assigns objects to classes. We can define the features as quantifiable properties and combine them to form a feature vector; a decision function applied to the feature vector of an object then classifies it.
In this activity we employ minimum distance classification to classify objects. We use five features: area, perimeter, % red, % blue, and % green, to distinguish chronic lymphocytic leukemia (CLL) cells from lymphocytes (T cells or B cells) [1]. In order to classify objects, a priori knowledge of a class and its set of features must first be established; we can think of this as training the classifier for the classification done later on. Below is the image of a collection of CLL cells and lymphocytes which we will use in our classification algorithm.
From the above image, five cropped images of lymphocytes and CLL cells were used as the training set and five more images of both classes were used as the test set. The objects are classified using the minimum distance classifier, whose decision function is given by the equation below:

d_W(x) = x · m_W - (1/2) m_W · m_W
where x is the feature vector of an object (in practice, each row of the feature array is one object and each column is one feature) and m_W is the mean feature vector of class W obtained from the training set. For each class W we compute d_W, and an object is assigned to the class that gives the largest value of d_W.
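As a rough illustration, here is a minimal Scilab sketch of this decision function. The function name mindist and the layout of mu are my own assumptions, not the original code: mu(j,:) is taken to hold the mean feature vector of class j from the training set, and x is the feature (row) vector of one object.

// Minimal sketch of the minimum distance classifier decision function
function d = mindist(x, mu)
    nclass = size(mu, 1);
    d = zeros(1, nclass);
    for j = 1:nclass
        mj = mu(j, :);
        d(j) = x*mj' - 0.5*mj*mj';   // d_j(x) = x.m_j - (1/2) m_j.m_j
    end
endfunction
// Usage: [dmax, j] = max(mindist(x, mu)) assigns x to the class j with the largest d_j.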
Below are the images used as training and test sets for both classes (CLLs and lymphocytes). In order to extract the area and perimeter features, thresholding was used to segment the images (b), and bwlabel was used to clean the images of isolated spots (c) (see Fig. 2). Pixel counting and Scilab's built-in follow command were used to extract the area and the perimeter, respectively.
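For reference, a sketch of how the five features of one cropped cell image might be pulled out is given below. It assumes the SIP toolbox functions imread, bwlabel, and follow mentioned above; the file name, the threshold value, the assumption that the cells are darker than the background, and the normalized-chromaticity reading of the % red/blue/green features are all placeholders to be adjusted per image.

// Sketch of the per-cell feature extraction (placeholder file name and threshold)
im = imread('cell_crop.png');                 // cropped RGB image of one cell
R = double(im(:,:,1)); G = double(im(:,:,2)); B = double(im(:,:,3));
gray = (R + G + B)/3;
thr = 100;                                    // placeholder threshold, tuned per image
bw = gray < thr;                              // (b) thresholding, assuming cells darker than background
[L, n] = bwlabel(bw);                         // (c) label blobs to remove isolated spots
counts = zeros(1, n);
for k = 1:n
    counts(k) = length(find(L == k));
end
[area, big] = max(counts);                    // area = pixel count of the largest blob
blob = (L == big);
[cx, cy] = follow(blob);                      // contour of the cell
perimeter = length(cx);                       // perimeter = number of contour points
// Color features: average normalized chromaticity over the cell pixels
I = R + G + B; I(I == 0) = 1;                 // avoid division by zero
pr = mean(R(blob) ./ I(blob));
pg = mean(G(blob) ./ I(blob));
pb = mean(B(blob) ./ I(blob));
x = [area, perimeter, pr, pb, pg];            // one feature vector (one row of the feature array)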
Figure 2: Training set of images for both CLLs and lymphocytes with the corresponding image processing operations: (a) RGB, (b) thresholded binary, and (c) cleaned with bwlabel.
Figure 3: Test set of images for both CLLs and lymphocytes with the corresponding image processing operations: (a) RGB, (b) thresholded binary, and (c) cleaned with bwlabel.
The table below shows the mean features obtained from the training set. The area and perimeter features already separate the two classes well, as shown in Figure 4.
Table 1: Mean features of CLLs and lymphocytes.
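A short sketch of how the means in Table 1 could be computed is shown below. The names feat_cll and feat_lym are assumed, not from the original code: each is a training feature matrix with one row per training cell and columns ordered as area, perimeter, %R, %B, %G, built with the extraction step above.

// Class mean feature vectors from the training set (assumed variable names)
mu_cll = mean(feat_cll, 'r');    // mean of each column over the CLL training cells
mu_lym = mean(feat_lym, 'r');    // mean of each column over the lymphocyte training cells
mu = [mu_cll; mu_lym];           // row 1 = CLL, row 2 = lymphocyte, as used by mindist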
The table below shows the result of the classification after computing d for both classes (CLL and lymphocytes) on the test sets of Figure 3. The algorithm correctly identifies 10 out of 10 objects, giving 100% correct classification.
Table 2: Computed d of lymphocytes and CLLs.
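For completeness, here is a sketch of how the test objects of Figure 3 could be run through the classifier and tallied. The names feat_test (a 10x5 matrix of test feature vectors) and truth (the known class indices, 1 = CLL, 2 = lymphocyte) are assumptions; mindist and mu come from the sketches above.

// Classify each test object and count correct assignments (assumed variable names)
ntest = size(feat_test, 1);
correct = 0;
for i = 1:ntest
    [dmax, predicted] = max(mindist(feat_test(i, :), mu));
    if predicted == truth(i) then
        correct = correct + 1;
    end
end
mprintf("%d / %d objects correctly classified\n", correct, ntest);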
In this activity, I give myself a grade of 10 for attaining the objectives of the activity and correctly classifying all the objects.
I would like to acknowledge Jay Samuel Combinido for very useful discussions.
References
[1] http://www.vet.uga.edu/vpp/clerk/waikart/index.php
[2] M. Soriano, Applied Physics 186 Activity 14 manual, 2008.