A Novel Algorithm for String Matching with Mismatches

Vinod-prasad P.

Abstract

We present an online algorithm for pattern matching in strings. The problem we investigate is commonly known as ‘string matching with mismatches’, in which the objective is to report the number of matching characters when the pattern is aligned with every location in the text. The method we propose is based on the frequencies of individual characters in the pattern and the text. Given a pattern of length M and a text of length N, both defined over an alphabet of size σ, the algorithm uses O(M) space and runs in O(MN/σ) time on average. This average running time simplifies to O(N) for patterns of length M ≤ σ. The algorithm uses only simple arrays, which avoids the overhead of maintaining complex data structures such as suffix trees or automata.
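
The abstract does not spell out the procedure, so the following Python sketch is only one plausible way to count matches per alignment within the stated bounds, not the paper's actual algorithm. It assumes a table of pattern positions per character and a circular array of M counters; the names `match_counts`, `positions` and `counts` are hypothetical. Each text character touches only the pattern positions holding that same character, which is about M/σ positions on average for uniformly random input.

```python
from collections import defaultdict


def match_counts(pattern, text):
    """Yield (alignment, matches) for each full alignment of pattern in text.

    Illustrative sketch only: an occurrence-list counting scheme that fits the
    O(M) space and O(MN/sigma) average time quoted in the abstract.
    """
    m = len(pattern)
    if m == 0:
        return
    # positions[c] lists the indices at which character c occurs in the pattern.
    positions = defaultdict(list)
    for i, c in enumerate(pattern):
        positions[c].append(i)
    # Circular buffer: slot (s % m) holds the running count for alignment s.
    counts = [0] * m
    for j, c in enumerate(text):  # the text is consumed one character at a time
        for i in positions.get(c, ()):
            s = j - i  # alignment at which pattern[i] sits on text[j]
            if s >= 0:
                counts[s % m] += 1
        done = j - m + 1  # alignment whose last character was just read
        if done >= 0:
            yield done, counts[done % m]
            counts[done % m] = 0  # free the slot for alignment done + m
```

For example, `list(match_counts("aba", "ababa"))` gives `[(0, 3), (1, 0), (2, 3)]`. Because only the M counters and the occurrence lists are kept, the text can be any iterable and need not be held in memory.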


Paper Citation


in Harvard Style

Vinod-prasad, P. (2016). A Novel Algorithm for String Matching with Mismatches. In Proceedings of the 5th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM, ISBN 978-989-758-173-1, pages 638-644. DOI: 10.5220/0005752006380644


in Bibtex Style

@conference{icpram16,
author={Vinod-prasad P.},
title={A Novel Algorithm for String Matching with Mismatches},
booktitle={Proceedings of the 5th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM},
year={2016},
pages={638-644},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005752006380644},
isbn={978-989-758-173-1},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 5th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM
TI - A Novel Algorithm for String Matching with Mismatches
SN - 978-989-758-173-1
AU - Vinod-prasad, P.
PY - 2016
SP - 638
EP - 644
DO - 10.5220/0005752006380644