k-Nearest neighbour classifier - example implementation help page

The example implementation offers prepared data sets to visualise the behaviour of a kNN predictor. In addition, you can use your own data sets as reference or test data.

Data files must follow these format conventions:

  • One pattern per line.
  • Every pattern consists of a space-separated list of numerical attributes.
  • Reference data sets (i.e. training data sets) must include one column with nominal class information (treated as a string of characters, typically the first or the last column).
  • Validation data sets (i.e. test data sets) need no class information, because classes are assigned by the classifier based on similarity to patterns in the reference data set. If class information is provided for the validation data, the classifier will display whether each prediction is correct.
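
The format rules above can be illustrated with a minimal sketch in JavaScript. It is not the knn-lib API: the function names parsePatterns, euclidean and classify are hypothetical, and the choice of Euclidean distance with a simple majority vote is an assumption made only for illustration. The sketch reads data in the format described above, predicts a class for each validation pattern, and reports whether the prediction matches the class given in the validation data.

```javascript
// Hypothetical helpers for illustration only; not part of knn-lib.

// Parse a data file: one pattern per line, space-separated values.
// classColumn is the index of the class column, or null if the file
// has no class column. Negative values count from the end (-1 = last).
function parsePatterns(text, classColumn) {
  return text
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => {
      const fields = line.split(/\s+/);
      let label = null;
      if (classColumn !== null) {
        const idx = classColumn < 0 ? fields.length + classColumn : classColumn;
        label = fields.splice(idx, 1)[0]; // the class is kept as a string
      }
      return { attributes: fields.map(Number), label };
    });
}

// Euclidean distance between two attribute vectors of equal length
// (an assumed similarity measure).
function euclidean(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    sum += (a[i] - b[i]) ** 2;
  }
  return Math.sqrt(sum);
}

// Majority vote among the k reference patterns nearest to `attributes`.
// On a tie, the class that reached the winning count first
// (scanning from nearest to farthest) is returned.
function classify(reference, attributes, k) {
  const nearest = reference
    .map((p) => ({ label: p.label, dist: euclidean(p.attributes, attributes) }))
    .sort((x, y) => x.dist - y.dist)
    .slice(0, k);
  const votes = new Map();
  let best = null;
  for (const n of nearest) {
    const count = (votes.get(n.label) || 0) + 1;
    votes.set(n.label, count);
    if (best === null || count > votes.get(best)) best = n.label;
  }
  return best;
}

// Made-up toy data with the class in the last column.
const reference = parsePatterns("1.0 1.0 A\n1.2 0.8 A\n5.0 5.0 B\n5.2 4.9 B\n", -1);
const validation = parsePatterns("1.1 0.9 A\n4.8 5.1 B\n", -1);
for (const p of validation) {
  const predicted = classify(reference, p.attributes, 3);
  console.log(predicted, predicted === p.label ? "correct" : "wrong");
}
```

For a validation file without class information, passing null as classColumn leaves every label as null, so only the predicted class can be shown.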

Example data sets

The following example data sets are included in the kNN demo:

  • Pen-Based Recognition of Handwritten Digits data set by E. Alpaydin and F. Alimoglu, Department of Computer Engineering, Bogazici University, Istanbul, Turkey, as available from the UCI Machine Learning Repository.
  • Olive oils data set by Michele Forina, University of Genova. It includes analytical data for 416 olive oil samples from nine different regions of Italy; for each sample, the normalized concentrations of eight fatty acids are given.

References

The knn-lib JavaScript library:

  • Please cite the following reference if you publish results obtained with this software
    (download BibTeX file):

    Holzer, M., Dominik, A., 2012. knn-lib - Java Script library for building k-nearest neighbour classificators. THM - University of Applied Sciences Giessen, Website, [Online]. Available at: http://www.life-science-it.org/pages/research/projectKNN.html [Accessed ].

UCI Machine Learning Repository:

  • Frank, A. & Asuncion, A. (2010). UCI Machine Learning Repository http: University of California, School of Information and Computer Science.

Character recognition data set:

  • F. Alimoglu (1996) Combining Multiple Classifiers for Pen-Based Handwritten Digit Recognition, MSc Thesis, Institute of Graduate Studies in Science and Engineering, Bogazici University.
  • F. Alimoglu, E. Alpaydin, Methods of Combining Multiple Classifiers Based on Different Representations for Pen-based Handwriting Recognition, Proceedings of the Fifth Turkish Artificial Intelligence and Artificial Neural Networks Symposium (TAINN 96), June 1996, Istanbul, Turkey.

Olive oil data set:

  • The data set was provided by Prof. Michele Forina, University of Genova, and published by Zupan and Gasteiger in the book: J. Zupan, J. Gasteiger, Neural Networks in Chemistry and Drug Design: An Introduction, 2nd Edition, Wiley-VCH, Weinheim, 1999.