Exploiting Correlation-based Metrics to Assess Encoding Techniques

Giuliano Armano, Emanuele Tamponi

Abstract

The performance of a classification system depends on various aspects, including the encoding technique. Encoding techniques play a primary role in tuning a classifier or predictor, as the choice of encoder may greatly affect its performance. At present, evaluating the impact of an encoding technique on a classification system typically requires training the system and testing it with a performance metric deemed relevant (e.g., precision, recall, or the Matthews correlation coefficient). Assessing a single encoding technique is therefore a time-consuming activity, and it introduces additional degrees of freedom (e.g., the parameters of the training algorithm) that may be unrelated to the encoding technique being assessed. In this paper, we propose a family of methods for measuring the performance of encoding techniques used in classification tasks, based on the correlation between the encoded input data and the corresponding output. The proposed approach provides correlation-based metrics devised with the primary goal of focusing on the encoding technique while leaving other, unrelated aspects aside. Notably, the proposed technique saves a great deal of computational time, as it needs only a tiny fraction of the time required by standard methods.
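As a rough illustration of the general idea, and not of the specific metrics defined in the paper, the sketch below scores an encoding without training any classifier. It assumes the correlation ratio (eta squared) as the measure of association between each encoded feature and the class label, and averages it over features. The function names `correlation_ratio` and `encoding_score`, and the choice of a plain average, are hypothetical choices made only for this example.

```python
from collections import defaultdict


def correlation_ratio(feature, labels):
    """Correlation ratio (eta squared) between one numeric feature and class labels.

    Returns a value in [0, 1]: 0 means the class means coincide,
    1 means the feature is constant within each class.
    """
    n = len(feature)
    grand_mean = sum(feature) / n
    total_ss = sum((x - grand_mean) ** 2 for x in feature)
    if total_ss == 0:
        # A constant feature carries no information about the labels.
        return 0.0

    # Group feature values by class label.
    groups = defaultdict(list)
    for x, y in zip(feature, labels):
        groups[y].append(x)

    # Between-class sum of squares, weighted by class size.
    between_ss = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values()
    )
    return between_ss / total_ss


def encoding_score(encoded, labels):
    """Average correlation ratio over all encoded features (assumed aggregate)."""
    columns = list(zip(*encoded))  # one tuple per encoded feature
    return sum(correlation_ratio(col, labels) for col in columns) / len(columns)


if __name__ == "__main__":
    # Toy data: the first feature separates the classes, the second does not.
    encoded = [[0, 0], [0, 1], [1, 0], [1, 1]]
    labels = ["a", "a", "b", "b"]
    print(encoding_score(encoded, labels))
```

Two candidate encoders could be compared by applying each to the same labelled samples and comparing the resulting scores. A per-feature average ignores interactions between features, so a multivariate statistic may be more appropriate in practice.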



Paper Citation


in Harvard Style

Armano G. and Tamponi E. (2013). Exploiting Correlation-based Metrics to Assess Encoding Techniques. In Proceedings of the 2nd International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM, ISBN 978-989-8565-41-9, pages 308-314. DOI: 10.5220/0004267503080314


in Bibtex Style

@conference{icpram13,
author={Giuliano Armano and Emanuele Tamponi},
title={Exploiting Correlation-based Metrics to Assess Encoding Techniques},
booktitle={Proceedings of the 2nd International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM},
year={2013},
pages={308-314},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004267503080314},
isbn={978-989-8565-41-9},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 2nd International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM
TI - Exploiting Correlation-based Metrics to Assess Encoding Techniques
SN - 978-989-8565-41-9
AU - Armano G.
AU - Tamponi E.
PY - 2013
SP - 308
EP - 314
DO - 10.5220/0004267503080314