Assignment 1

Pattern Classification
Fredrik Georgsson

The assignment is to be carried out in groups of one or two persons. Every member of a group must actively participate in all parts of the assignment and should be familiar with the entire report. The normal rules for limited co-operation when solving assignments do not apply to this assignment.

The last day to hand in the report for the assignment is November 23, 2005.


The Task

The assignment consists of implementing and testing a number of classifiers in MatLab.

The classifiers are:

The classifiers should handle feature spaces of arbitrary dimensionality and two classes coded as 0 or 1. They should be tested on the data sets listed below.


Data

Data to test and evaluate the classifiers can be found here:

  1. moln
  2. split
  3. tricky

Each data file contains the matrices features (2 x n) and targets (1 x n), where 2 is the dimensionality of the patterns and n is the number of patterns. The first 20% of the data should be used for training and the remaining 80% for testing.
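A minimal sketch of the 20/80 split, written as a hypothetical helper split_data (it is not one of the supplied files). It assumes that features and targets have been loaded as described above; rounding the size of the training set with round is also an assumption, since the assignment does not say how to handle a fractional count.

```matlab
function [ftrain, ctrain, ftest, ctest] = split_data(features, targets)
% SPLIT_DATA  Hypothetical helper: the first 20% of the patterns are used
% for training and the remaining 80% for testing. Patterns are columns.
n = size(features, 2);            % number of patterns
ntrain = round(0.2 * n);          % rounding a fractional count is an assumption

ftrain = features(:, 1:ntrain);       % m x ntrain training patterns
ctrain = targets(1:ntrain);           % 1 x ntrain training classes
ftest  = features(:, ntrain+1:end);   % m x (n - ntrain) test patterns
ctest  = targets(ntrain+1:end);       % 1 x (n - ntrain) test classes
end
```

For example, assuming the data files are .mat files that load can read: load moln, then [ftrain, ctrain, ftest, ctest] = split_data(features, targets).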

In the examples below, the following notation is used for training and test data:
ftrain/ftest are m x n matrices where each column contains an m-dimensional pattern (m = 2 for our data).
ctrain/ctest are 1 x n matrices where each column contains the correct class of the corresponding pattern.


Description

Each classifier should be implemented as two functions, as in the following example (knn):

knn_train(ftrain, ctrain, param)
The function should train the classifier using ftrain and ctrain. param contains the parameters of the classifier. For knn, the training is nothing more than making ftrain and ctrain available to knn_test through global variables, and param contains the parameter k.
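A minimal sketch of knn_train, assuming the global variable names KNN_FTRAIN and KNN_CTRAIN. These names are hypothetical; any names work as long as knn_test declares the same globals.

```matlab
function knn_train(ftrain, ctrain, param)
% KNN_TRAIN  "Training" for k-nearest neighbours: store the training data
% in global variables so that knn_test can reach it.
% The global names KNN_FTRAIN and KNN_CTRAIN are hypothetical.
global KNN_FTRAIN KNN_CTRAIN
KNN_FTRAIN = ftrain;   % m x n training patterns, one per column
KNN_CTRAIN = ctrain;   % 1 x n classes (0 or 1)
% param (the number of neighbours k) is not needed until knn_test
end
```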

cout=knn_test(ftest, param)
The function should apply the trained classifier to the patterns in ftest.
cout is a 1 x n matrix containing the classifications that knn_test produced for the patterns in ftest.
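A sketch of a matching knn_test that classifies all test patterns at once with Euclidean distance and majority vote. It assumes the training data sits in the hypothetical globals KNN_FTRAIN and KNN_CTRAIN; breaking a tie (possible for even k) in favour of class 0 is an arbitrary choice made here, not part of the specification.

```matlab
function cout = knn_test(ftest, param)
% KNN_TEST  Classify each column of ftest by majority vote among its
% k = param nearest training patterns (Euclidean distance), without loops.
% Assumes the training data is stored in the (hypothetical) globals below.
global KNN_FTRAIN KNN_CTRAIN
k = min(param, size(KNN_FTRAIN, 2));   % cannot use more neighbours than training patterns

% Squared distances, test patterns as rows and training patterns as columns:
% |x - y|^2 = |x|^2 + |y|^2 - 2 x'y
D = bsxfun(@plus, sum(ftest.^2, 1)', sum(KNN_FTRAIN.^2, 1)) ...
    - 2 * (ftest' * KNN_FTRAIN);

[~, idx] = sort(D, 2);                 % training indices ordered by distance, per row
nearest = idx(:, 1:k);                 % ntest x k indices of the k nearest neighbours
labels = reshape(KNN_CTRAIN(nearest), size(nearest));   % their classes (0/1)

% Majority vote for classes coded 0/1; a tie gives class 0 (arbitrary choice)
cout = double(mean(labels, 2)' > 0.5); % 1 x ntest
end
```

The reshape guards the case k = 1, where indexing the row vector KNN_CTRAIN with a column of indices would otherwise return a row.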

Note that the functions should accept an entire matrix of patterns and return a vector with the corresponding classifications; see the function dummy_test below. Challenge: by using Matlab's vector operations this can, for instance for the ML classifier, be implemented without any loop over the individual patterns. This is, however, not a requirement of the assignment.
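One way to approach the challenge, assuming the ML classifier models each class with a normal density, is to evaluate the log-density of every pattern with matrix operations. The helper name gauss_logpdf is hypothetical.

```matlab
function g = gauss_logpdf(X, mu, Sigma)
% GAUSS_LOGPDF  Hypothetical helper: log of the normal density N(mu, Sigma)
% evaluated for every column of X at once (no loop over patterns).
% X: m x n patterns, mu: m x 1 mean, Sigma: m x m covariance.
[m, n] = size(X);
Xc = X - repmat(mu, 1, n);            % centre every pattern on the mean
Q = sum(Xc .* (Sigma \ Xc), 1);       % 1 x n values of (x-mu)' inv(Sigma) (x-mu)
g = -0.5 * Q - 0.5 * log(det(Sigma)) - 0.5 * m * log(2*pi);
end
```

With class estimates mu0, S0, mu1, S1 and priors P0, P1 (hypothetical names), a test function could then return cout = double(gauss_logpdf(ftest, mu1, S1) + log(P1) > gauss_logpdf(ftest, mu0, S0) + log(P0)). Using Sigma \ Xc instead of inv(Sigma) avoids forming the explicit inverse.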

To help your work and to improve the documentation of the assignment, you should use the function decisionboundary.m, which calculates and plots a "decision plot". The function is used in the following manner:
decisionboundary(algorithm, features, targets, param)
for example:
decisionboundary('knn_test', ftest, ctest, 3)
Besides the "decision plot", each data point (features, targets) is plotted in the graph.
decisionboundary plots 2-dimensional graphs and shows only the first two dimensions of the patterns in features.
decisionboundary repeatedly calls the classifier (here knn_test) to calculate a matrix of points from which the "decision plot" is generated.
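The idea behind a decision plot can be sketched as follows. This is a hypothetical helper decision_grid, not the supplied decisionboundary.m: it classifies a regular grid covering the range of the first two features with a single feval call, so it assumes a classifier that works on 2-dimensional patterns.

```matlab
function [X1, X2, C] = decision_grid(algorithm, features, param, npoints)
% DECISION_GRID  Hypothetical sketch (not the supplied decisionboundary.m):
% classify every point of an npoints x npoints grid over the first two
% feature dimensions, so the result can be drawn as a decision plot.
x1 = linspace(min(features(1, :)), max(features(1, :)), npoints);
x2 = linspace(min(features(2, :)), max(features(2, :)), npoints);
[X1, X2] = meshgrid(x1, x2);          % grid coordinates, npoints x npoints
gridpts = [X1(:)'; X2(:)'];           % 2 x npoints^2, one grid point per column
c = feval(algorithm, gridpts, param); % 1 x npoints^2 classifications
C = reshape(c, size(X1));             % back to the grid shape
end
```

After training, [X1, X2, C] = decision_grid('knn_test', ftest, 3, 100) followed by contourf(X1, X2, C) would draw the class regions.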

To test the classifiers you shall write a function
perf = classeval(algorithm, ftest, ctest, param)
classeval takes an algorithm (or, more correctly, the name of an algorithm), applies it to all patterns in ftest and calculates a performance measure perf for the classifier. The performance measure "error rate" (see chapter 2.3 in DHS) is chosen for the assignment.
The knn example:
perf = classeval('knn_test', ftest, ctest, k)
classeval calls knn_test with ftest and k as parameters. perf is the error rate (the fraction of misclassified patterns) of knn_test when it is applied to the patterns in ftest. Help on how classeval should be implemented can be found in the source code of the function decisionboundary. Note especially how feval is used to call a function represented by a string.
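A minimal sketch of classeval under the conventions above. feval accepts the name of a function as a string (for example 'knn_test') and also accepts a function handle; the test function is assumed to have already been trained.

```matlab
function perf = classeval(algorithm, ftest, ctest, param)
% CLASSEVAL  Error rate of a trained classifier on a labelled test set.
% algorithm is the name of the test function, e.g. 'knn_test';
% feval calls the function that the string names.
cout = feval(algorithm, ftest, param);   % 1 x n predicted classes
perf = mean(cout(:) ~= ctest(:));        % fraction of misclassified patterns
end
```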


Matlab

To freshen up your MatLab skills or find solutions to specific problems, you can view the examples in the Basic matlab tutorial (Dragoljub Pokrajac-Pokie 2/2001).
Make good use of the debugging facilities in MatLab. Set breakpoints in the functions you are developing and check that the values of the parameters are what you expect. An error in a complex expression can be located by evaluating the expression part by part until the error appears.
When using a new function and being uncertain of the proper syntax, use help to find it, and then test the function outside your own code to understand how it actually works.

Useful files and examples:
The following functions are supplied and could be used:
classtest.m loads test data and then calls a dummy classifier (dummy_train.m, dummy_test.m). The function decisionboundary.m calculates and plots a decision plot.

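Functions for statistics:
Each row in the matrix X should contain one pattern ("sample"). Since ftrain stores one pattern per column, transpose it first (ftrain').
cov(X,1) calculates the covariance matrix of the columns (features) in X, normalized by the number of patterns N (the maximum-likelihood estimate) rather than by N-1.
mean(X) calculates the mean value of each column in X.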
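A sketch of how the maximum-likelihood estimates for one class could be computed with these functions. The helper name class_stats is hypothetical; it transposes the selected columns of ftrain so that each row holds one pattern, which is the layout mean and cov expect.

```matlab
function [mu, Sigma] = class_stats(ftrain, ctrain, c)
% CLASS_STATS  Hypothetical helper: maximum-likelihood estimates of the
% mean and covariance of the patterns of class c. ftrain stores patterns
% as columns, so the selection is transposed to one pattern per row.
X = ftrain(:, ctrain == c)';   % n_c x m, one pattern per row
mu = mean(X, 1)';              % m x 1 mean vector (column means of X)
Sigma = cov(X, 1);             % m x m covariance, normalised by n_c
end
```

Passing the dimension explicitly in mean(X, 1) keeps the result correct even if a class happens to contain a single pattern, where mean(X) would average along the row instead.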

Other
eval(str) evaluates a MatLab expression represented by the string str


Miscellaneous

This assignment, like the course as a whole, requires some knowledge of statistics and linear algebra. Summaries of the important points can be found in the appendices of the course book.


The Report

The report should follow the general recommendations (linked from the course homepage). Take special care to make sure that the following information is included: