Classification and clustering of sequences : application to handwriting recognition

by Law Hiu Chung

THESIS 1999

M.Phil. Computer Science

xxii, 243 leaves : ill. (some col.) ; 30 cm

Abstract

This thesis consists of two main parts. In the first part, we study the recognition of isolated handwritten digits using a new feature representation, the pseudo spatio-temporal (PST) representation, which is inspired by ideas from speech recognition. Each handwriting image is converted to a temporal sequence of simple spatial patterns. We apply discrete hidden Markov models (DHMM) to classify the resulting sequences and obtain reasonable results on the NIST SD1 digit database. We then study two ways to reduce the error rates. The first is to combine the classifiers for different PST representations. The second is to use continuous-density HMM (CHMM). We study two variants of tying in CHMM and find that tying between different classes gives the highest accuracy among all single classifiers. Despite the improvement, the error rates are still higher than state-of-the-art results in the literature, and we investigate some possible reasons for this.
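As an illustration of how discrete HMMs can classify a symbol sequence, the sketch below scores a sequence under one HMM per class with the scaled forward algorithm and picks the class with the highest log-likelihood. It is a minimal sketch, not the thesis's implementation: the function names, the two toy classes, and all probabilities are assumptions made up for the example, and the integer symbols merely stand in for quantized PST patterns.

```python
import math

def log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm: log P(obs | model) for a discrete HMM.

    obs   -- non-empty list of symbol indices
    start -- start[i] is the initial probability of state i
    trans -- trans[i][j] is the probability of moving from state i to j
    emit  -- emit[i][k] is the probability that state i emits symbol k
    """
    n_states = len(start)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n_states)]
    log_prob = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate one step, then weight by the emission of obs[t].
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(n_states)) * emit[j][obs[t]]
                for j in range(n_states)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # the model cannot produce this sequence
        log_prob += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    return log_prob

def classify(obs, models):
    """Return the label whose HMM gives obs the highest log-likelihood."""
    return max(models, key=lambda label: log_likelihood(obs, *models[label]))

# Hypothetical two-class setup: each entry is (start, trans, emit).
models = {
    "class_a": ([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]], [[0.9, 0.1], [0.8, 0.2]]),
    "class_b": ([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]], [[0.1, 0.9], [0.2, 0.8]]),
}

print(classify([0, 0, 0, 1, 0], models))
```

In a digit recognizer along these lines there would be one trained model per digit class, with parameters estimated from training sequences rather than written by hand.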

During the above study we realized that unsupervised learning on the PST representation could be helpful. Unfortunately, existing work in the literature does not address this well. Therefore, in the second part of the thesis we study a somewhat independent subject: unsupervised learning on general sequences. We propose a novel combination of Kohonen's self-organizing map (SOM) with HMM, and call the new model "HMM-SOM". The basic idea is to regard each neuron in the SOM as an HMM. The reference vector of a neuron is interpreted as the parameter vector of that HMM. We combine the batch SOM learning algorithm with the EM algorithm of HMM to obtain a learning algorithm for HMM-SOM. We have conducted experiments on this new method with both synthetic sequences and real sequences derived from the PST representations of different images. A variant of the winning rule based on topographic vector quantization (TVQ), as well as a combination of the generalized Lloyd algorithm and HMM, are also studied. Both graphical methods and numerical criteria are used for evaluation. Some interesting results have been obtained for HMM-SOM, although they are not as good as we had expected. Some possible reasons for the limitations are discussed.
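The idea of treating each SOM neuron as an HMM can be sketched as follows. This is only one plausible reading of the assignment step, under stated assumptions: the winner for a sequence is taken to be the neuron whose HMM gives the highest likelihood, and a Gaussian neighbourhood on the map grid decides how strongly each sequence would contribute when a neuron's HMM is re-estimated. The abstract does not give the actual update rules, so the function names, the Gaussian neighbourhood, the 1-D map, and the toy single-state HMMs are all hypothetical, and the EM re-estimation itself is not shown.

```python
import math

def log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm: log P(obs | model) for a discrete HMM."""
    n_states = len(start)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n_states)]
    log_prob = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(n_states)) * emit[j][obs[t]]
                for j in range(n_states)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")
        log_prob += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_prob

def winner(obs, neurons):
    """Index of the neuron whose HMM assigns obs the highest log-likelihood."""
    return max(range(len(neurons)),
               key=lambda k: log_likelihood(obs, *neurons[k]["hmm"]))

def neighbourhood(pos_a, pos_b, sigma):
    """Gaussian neighbourhood value between two grid positions (assumed form)."""
    dist_sq = sum((a - b) ** 2 for a, b in zip(pos_a, pos_b))
    return math.exp(-dist_sq / (2.0 * sigma ** 2))

def batch_weights(sequences, neurons, sigma):
    """weights[k][n]: contribution of sequence n to re-estimating neuron k.

    In a batch scheme these weights would scale each sequence's
    sufficient statistics in the EM step of neuron k (assumption).
    """
    winners = [winner(seq, neurons) for seq in sequences]
    return [
        [neighbourhood(neuron["pos"], neurons[w]["pos"], sigma) for w in winners]
        for neuron in neurons
    ]

# Hypothetical 1-D map of three neurons, each a single-state HMM
# given as (start, trans, emit) over two symbols.
neurons = [
    {"pos": (0,), "hmm": ([1.0], [[1.0]], [[0.9, 0.1]])},
    {"pos": (1,), "hmm": ([1.0], [[1.0]], [[0.5, 0.5]])},
    {"pos": (2,), "hmm": ([1.0], [[1.0]], [[0.1, 0.9]])},
]
sequences = [[0, 0, 0], [1, 1, 1], [0, 1, 0, 1]]

weights = batch_weights(sequences, neurons, sigma=1.0)
print([winner(seq, neurons) for seq in sequences])
```

With a large `sigma` every neuron learns from nearly every sequence; shrinking it over the course of training lets neighbouring neurons specialize while keeping the map topologically ordered, as in the standard batch SOM.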

Copyrighted to the author. Reproduction is prohibited without the author’s prior written consent.

Details

Collection: HKUST Electronic Theses
Degree: M.Phil.
Department: Computer Science
Authors: Law, Hiu Chung
Subjects: Writing; Data processing; Pattern perception; Pattern recognition systems; Mathematical models
Language: English
Call number: Thesis COMP 1999 Law
DOI: 10.14711/thesis-b643190
