Preliminary results yield an accuracy of more than
91% using the entire set of attributes, without
feature selection. We expect that feature selection
will further improve the classification accuracy. We
also reduced the classification time by using a
smaller number of instances per class (14).
The results also show that peak performance is
obtained on a dataset with 14 instances per class
using 8 clusters, and on a dataset with 20 instances
per class using 7 clusters.
Our current work focuses on connecting the two
steps of the training process and on addressing the
classification stage. In addition, to generalize the
scope of the system, several issues need to be
considered during the training process.
The first is that the classes are not split uniformly
into clusters (instances from the same class are
distributed among up to 4 clusters). At present, we
solve this issue by adding all the instances to the
cluster having the maximum number of instances
from that particular class. However, in a global
model, such situations require a dedicated
approach. A possible solution is to distribute all the
instances of a class to every cluster that contains a
number of instances from that class above a
threshold. We need to investigate how this approach
influences the complexity, the performance and the
running time of the induced sub-models, as it may
require an additional clustering step.
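The two assignment strategies discussed above can be sketched as follows. This is a minimal illustration, not the implementation used in the system: the function names, the `min_count` parameter, the tie-breaking rule and the fallback to the majority cluster are all assumptions introduced here, and the labels and cluster ids are toy values.

```python
from collections import Counter, defaultdict


def class_cluster_counts(labels, clusters):
    """Count, for every class, how many of its instances fall in each cluster."""
    counts = defaultdict(Counter)
    for label, cluster in zip(labels, clusters):
        counts[label][cluster] += 1
    return counts


def majority_cluster(counter):
    # Ties are broken by the smallest cluster id (assumption; the text does not say).
    return min(counter, key=lambda k: (-counter[k], k))


def assign_majority(labels, clusters):
    """Current strategy: each class goes to the one cluster holding most of its instances."""
    counts = class_cluster_counts(labels, clusters)
    return {label: {majority_cluster(c)} for label, c in counts.items()}


def assign_threshold(labels, clusters, min_count):
    """Proposed strategy: each class goes to every cluster holding at least
    min_count of its instances (hypothetical reading of "above a threshold")."""
    counts = class_cluster_counts(labels, clusters)
    assignment = {}
    for label, c in counts.items():
        chosen = {k for k, n in c.items() if n >= min_count}
        # Fall back to the majority cluster so that no class is left unassigned (assumption).
        assignment[label] = chosen or {majority_cluster(c)}
    return assignment


# Toy data: class "A" is spread over clusters 0 and 1, class "B" over clusters 2 and 0.
labels = ["A", "A", "A", "A", "A", "B", "B", "B"]
clusters = [0, 0, 0, 1, 1, 2, 2, 0]
majority = assign_majority(labels, clusters)
thresholded = assign_threshold(labels, clusters, min_count=2)
```

With the threshold strategy a class may appear in several sub-models, which is where the extra cost in complexity and time mentioned above would come from.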
A second issue that needs addressing is the time
required by the SimpleKMeans method to split the
dataset into clusters. We observed experimentally
that the clustering time increases with the number
of clusters. While for 2-5 clusters it takes several
minutes to build the clusters, for 8 or 9 clusters
the time required is up to 2-3 days.
Moreover, as the number of classes increases, we
might need to introduce additional clustering steps.
We are currently evaluating a methodology for
automatically establishing the parameters of the
hierarchical structure: the number of clustering
levels, the number of clusters per level, and the
optimal size (in terms of number of classes) of the
training subset submitted to the Naïve Bayes classifiers.
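As an illustration only, one simple baseline for choosing such parameters is an exhaustive search over candidate configurations. The sketch below is not the methodology under evaluation: `search_structure`, its arguments and `toy_score` are hypothetical names, and the scoring function is a placeholder with a made-up shape, not measured results. In practice the score would be something like cross-validated accuracy of the induced hierarchy.

```python
from itertools import product


def search_structure(levels, clusters_per_level, subset_sizes, score):
    """Return the best-scoring (levels, clusters, subset_size) triple and its score."""
    best_config, best_score = None, float("-inf")
    for config in product(levels, clusters_per_level, subset_sizes):
        value = score(*config)
        if value > best_score:
            best_config, best_score = config, value
    return best_config, best_score


def toy_score(n_levels, n_clusters, subset_size):
    # Placeholder objective (assumption): replace with a real evaluation,
    # e.g. cross-validated accuracy of the trained hierarchy.
    return -abs(n_clusters - 4) - abs(subset_size - 10) - n_levels


best, value = search_structure([1, 2], [2, 4, 8], [5, 10, 20], toy_score)
```

Given the clustering times reported above, an exhaustive search is likely to be practical only over a small candidate grid.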
ACKNOWLEDGEMENTS
The research described in this paper was supported
by the IBM Faculty Award received in 2009 by Rodica
Potolea from the Computer Science Department at
the Technical University of Cluj-Napoca, Romania.
A HIERARCHICAL HANDWRITTEN OFFLINE SIGNATURE RECOGNITION SYSTEM