Hansheng Lei's Homepage

Research Statement

Current Research

Feature selection, representation and extraction are integral to statistical pattern recognition systems. Usually features are represented as vectors that capture expert knowledge of measurable discriminative properties of the classes to be distinguished. The feature selection process entails manual expert involvement and repeated experiments. Automatic feature selection is necessary when (i) expert knowledge is unavailable, (ii) distinguishing features among classes cannot be quantified, or (iii) a fixed-length feature description cannot faithfully reflect all possible variations of the classes, as in the case of sequential patterns (e.g. time series data). Automatic feature selection and extraction are also useful when developing pattern recognition systems that are scalable across new sets of classes. For example, an OCR system designed with an explicit feature selection process for the alphabet of one language usually does not scale to the alphabet of another language.

One approach to avoiding explicit feature selection is to use a (dis)similarity representation instead of a feature vector representation. The training set is represented by a similarity matrix and new objects are classified based on their similarity with samples in the training set. A suitable similarity measure can also be used to increase the classification efficiency of traditional classifiers such as Support Vector Machines (SVMs).
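As a minimal sketch of the similarity representation described above, the following Python fragment builds a similarity matrix over a toy training set and labels a new object by its most similar training sample. The function names, the toy sequences, and the choice of negative squared Euclidean distance as the similarity are illustrative assumptions, not the measure or classifier used in this research.

```python
from typing import Callable, List, Sequence

Sample = Sequence[float]
Similarity = Callable[[Sample, Sample], float]


def neg_sq_distance(a: Sample, b: Sample) -> float:
    """Stand-in similarity (an assumption): larger means more alike."""
    return -sum((x - y) ** 2 for x, y in zip(a, b))


def similarity_matrix(samples: List[Sample], sim: Similarity) -> List[List[float]]:
    """Represent the training set by its pairwise similarities."""
    return [[sim(a, b) for b in samples] for a in samples]


def classify(query: Sample, train: List[Sample], labels: List[str],
             sim: Similarity) -> str:
    """Label a new object by its most similar training sample."""
    scores = [sim(query, s) for s in train]
    best = max(range(len(scores)), key=scores.__getitem__)
    return labels[best]


# Toy training set: two "rising" and two "falling" sequences.
train = [[0.0, 0.1, 0.2], [0.1, 0.2, 0.3], [1.0, 0.9, 0.8], [0.9, 0.8, 0.7]]
labels = ["rising", "rising", "falling", "falling"]

S = similarity_matrix(train, neg_sq_distance)
pred = classify([0.98, 0.9, 0.8], train, labels, neg_sq_distance)
print(pred)
```

No feature vector is designed by hand here: the only component that encodes domain knowledge is the similarity function, which can be swapped without touching the rest.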

We establish new techniques for sequential pattern recognition without explicit feature extraction for applications where (i) a robust similarity measure exists to distinguish classes and (ii) the classifier (such as an SVM) utilizes a similarity measure for both training and evaluation. We investigate the use of similarity measures for applications such as on-line signature verification and on-line handwriting recognition. A paucity of training samples can render traditional training methods ineffective, as in the case of on-line signatures, where the number of training samples is rarely greater than 10. We present a new regression measure (ER2) that can classify multi-dimensional sequential patterns without the need for training with a large number of prototypes. When sufficient training prototypes are available, we use ER2 as a preprocessing filter to speed up SVM evaluation. We demonstrate the efficacy of a two-stage recognition system by using Principal Component Analysis (PCA) and Recursive Feature Elimination (RFE) in the supervised classification framework of SVM. We present experiments with off-line digit images where the pixels are simply ordered in a predetermined manner to simulate sequential patterns. We describe the Generalized Regression Model (GRM) for the unsupervised classification (clustering) of sequential patterns.

Previous Research

I began by researching on-line signature verification, a typical one-class problem [2] in which the patterns are sequential. Two particular aspects pose challenges in the field of on-line signature verification. On the one hand, intra-personal variation can be large. Some people provide signatures with poor consistency. The speed, pressure and inclination of signatures made by the same person can differ greatly, which makes it quite challenging to extract consistent features. On the other hand, in practice we can expect only a few samples from one person and no forgeries. This makes it very difficult to determine the consistency of extracted features. Due to the limited number of training samples, determining the threshold that decides rejection or acceptance is also an open problem.

Aware of the two challenges above, I have so far made two significant contributions to on-line signature verification. First, I proposed a model to measure the consistency of features extracted from on-line signatures [3]. I found that shape-related features are the most consistent, while dynamic features such as pressure and acceleration have very poor consistency. Second, I proposed ER2, an intuitive similarity measure for signatures [4]. Given two signatures to compare, it is natural to ask, "How similar are they?" or "What are their differences?" It is intuitive to express the similarity as a value between 0% and 100%, and this value should make sense. ER2 is such a measure.
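To illustrate how a regression-based measure can yield a value between 0% and 100%, here is a hedged Python sketch. This statement does not give the ER2 formula, so the code assumes ER2 takes the form of a squared correlation coefficient pooled over all dimensions of two equal-length sequences; see [4] for the actual definition. The function name er2 and the toy point lists are hypothetical, and real signatures of different lengths would first need an alignment step, which is not shown.

```python
from typing import List, Sequence

Point = Sequence[float]


def er2(x: List[Point], y: List[Point]) -> float:
    """Assumed form: squared correlation pooled over all dimensions.

    x and y are equal-length sequences of d-dimensional points.
    The result lies in [0, 1] by the Cauchy-Schwarz inequality.
    """
    if not x or len(x) != len(y):
        raise ValueError("sequences must be non-empty and of equal length")
    n, d = len(x), len(x[0])
    mean_x = [sum(p[j] for p in x) / n for j in range(d)]
    mean_y = [sum(p[j] for p in y) / n for j in range(d)]
    sxy = sxx = syy = 0.0
    for px, py in zip(x, y):
        for j in range(d):
            dx = px[j] - mean_x[j]
            dy = py[j] - mean_y[j]
            sxy += dx * dy
            sxx += dx * dx
            syy += dy * dy
    if sxx == 0.0 or syy == 0.0:
        return 0.0  # a constant sequence carries no shape information
    return (sxy * sxy) / (sxx * syy)


# Toy 2-D "pen trajectories" (hypothetical data).
sig_a = [(0.0, 0.0), (1.0, 2.0), (2.0, 1.0), (3.0, 3.0)]
# Same shape, uniformly scaled by 2 and shifted.
sig_b = [(10.0 + 2.0 * px, 5.0 + 2.0 * py) for px, py in sig_a]
# A differently shaped trajectory.
sig_c = [(0.0, 3.0), (1.0, 0.0), (2.0, 3.0), (3.0, 0.0)]

print(f"a vs b: {100 * er2(sig_a, sig_b):.1f}%")
print(f"a vs c: {100 * er2(sig_a, sig_c):.1f}%")
```

Under this assumed form, a uniformly scaled and shifted copy of a trajectory scores close to 100%, while a differently shaped one scores far lower, which matches the intuitive reading of the percentage.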

Future Research

My research philosophy is to move from application to theory and back again. Application is important for solving short-term problems, while theory is a long-term source of creation. Without theory, application lacks a solid foundation.

My future work will expand into machine learning and data mining using pattern recognition techniques. Image understanding and computer vision are also within my scope. All of these fields have strong underlying connections. I will continue to investigate the feasibility of general pattern classification without explicit feature extraction. I believe the future of pattern recognition will be featureless; that is, the features will be either the raw data itself or learned implicitly through the integrated techniques of machine learning, data mining and pattern recognition.

References

[1] V. Vapnik. Statistical Learning Theory. Wiley, New York, NY, 1998.

[2] D. Tax. One-class classification. Ph.D. thesis, TU Delft, 2001.

[3] H. Lei, V. Govindaraju. A Study on the Consistency of Features for On-line Signature Verification. Joint IAPR International Workshops On Syntactical and Structural Pattern Recognition (SSPR 2004) and Statistical Pattern Recognition (SPR 2004).

[4] H. Lei, S. Palla, V. Govindaraju. ER-squared: an Intuitive Similarity Measure for On-line Signature Verification. Submitted to the 9th International Workshop on Frontiers in Handwriting Recognition, 2004.