We also found that the precision is high for
Metaphone and NYSIIS. In detail, for the NC Address
dataset of size 800, the cumulative precision of
Metaphone and NYSIIS is nearly 0.06 each,
whereas for Soundex and DMetaphone the value is
nearly 0.02. For the English dictionary dataset,
however, cumulative precision depends on the type of
errors. Even so, Metaphone has the highest precision,
varying from 0.2 to 0.07, while Soundex and
DMetaphone have the lowest, varying from 0.008 to
0.002. This shows that Soundex and DMetaphone
produce the most noise of the five algorithms, which
degrades their performance and at the same time
increases their processing time.
Overall, the experiments indicate that Metaphone is
the better algorithm for English dictionary words,
while NYSIIS is the better algorithm for street names.
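
The precision figures above can be illustrated with a minimal
sketch. It assumes the standard set-based definitions of
precision, recall and F-measure; how the cumulative precision
reported here is aggregated over queries is not restated in this
section, so the sketch covers a single query only. The function
name and the sample data are hypothetical.

```python
from typing import Iterable, Tuple


def precision_recall_f1(retrieved: Iterable[str],
                        relevant: Iterable[str]) -> Tuple[float, float, float]:
    """Standard set-based precision, recall and F-measure for one query."""
    retrieved_set, relevant_set = set(retrieved), set(relevant)
    true_positives = len(retrieved_set & relevant_set)
    # Precision: share of returned suggestions that are correct.
    precision = true_positives / len(retrieved_set) if retrieved_set else 0.0
    # Recall: share of correct words that were returned.
    recall = true_positives / len(relevant_set) if relevant_set else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical query: one correct word hidden among 49 false positives.
retrieved = ["Market"] + [f"noise{i}" for i in range(49)]
relevant = ["Market"]
p, r, f = precision_recall_f1(retrieved, relevant)
print(f"precision={p:.2f} recall={r:.2f} f1={f:.3f}")
```

In this made-up case recall is perfect, yet precision is only
0.02, which is the pattern described for the noisier algorithms:
the correct word is found, but it is buried among false positives.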
4 CONCLUSION AND FUTURE WORK
This paper identifies the most efficient algorithm by
calculating precision and recall on different inputs for
street names from the NC Address dataset and words
from an English dictionary stored in the database.
Although the errors are typographical, the phonetic
matching algorithms still handle them at an acceptable
level.
The algorithms perform well in terms of accuracy,
but their practical usefulness is limited because
precision is very low owing to the large number of false
positives. Metaphone and NYSIIS are the most efficient
of the five algorithms analyzed across inputs with
different types of errors. Caverphone is relatively
more efficient than DMetaphone and Soundex.
According to the observations, Soundex has high
recall compared to the other algorithms, but its low
precision makes it not very efficient. Because of its
high accuracy, the algorithm is still used in various
applications that have a high tolerance for false
positives. The experimental results above make it
evident that no single algorithm is effective for all
types of databases.
Although the experiment returns near-match
suggestions from the five algorithms, it does not
detect all close matches, because a word is retrieved
from the database only when its complete phonetic
code exactly matches the code generated for the input.
A more thorough analysis could identify the most
efficient algorithm by applying a threshold when
retrieving near matches. The threshold can be set by
applying string matching algorithms such as the
Levenshtein Edit Distance (LED) algorithm or the
Boyer-Moore string matching algorithm to the
phonetic codes to improve accuracy and F-measure.
Moreover, efficiency on street names could be
improved if the phonetic structures of other languages
were introduced to the system.
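
The threshold idea proposed above might look roughly like the
following sketch. It is an illustration under stated assumptions,
not the system evaluated in this paper: Soundex stands in for any
of the five encoders, the street names are hypothetical, and the
function names and the threshold of 1 are arbitrary choices.

```python
from typing import Callable, Dict, List

# American Soundex consonant groups; vowels, H, W and Y carry no digit.
_DIGITS: Dict[str, str] = {}
for _letters, _digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
    for _ch in _letters:
        _DIGITS[_ch] = _digit


def soundex(word: str) -> str:
    """Four-character American Soundex code of `word`."""
    letters = [c for c in word.upper() if c.isalpha()]
    if not letters:
        return ""
    code = letters[0]
    prev = _DIGITS.get(letters[0], "")
    for ch in letters[1:]:
        digit = _DIGITS.get(ch, "")
        if digit and digit != prev:
            code += digit
        if ch not in "HW":  # H and W do not separate equal digits
            prev = digit
    return (code + "000")[:4]


def levenshtein(a: str, b: str) -> int:
    """Levenshtein edit distance computed row by row."""
    prev_row = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        row = [i]
        for j, cb in enumerate(b, start=1):
            row.append(min(prev_row[j] + 1,                # deletion
                           row[j - 1] + 1,                 # insertion
                           prev_row[j - 1] + (ca != cb)))  # substitution
        prev_row = row
    return prev_row[-1]


def near_matches(query: str, candidates: List[str],
                 encode: Callable[[str], str] = soundex,
                 threshold: int = 1) -> List[str]:
    """Candidates whose phonetic code is within `threshold` edits of the query's."""
    query_code = encode(query)
    return [w for w in candidates
            if levenshtein(query_code, encode(w)) <= threshold]


# Hypothetical street names; "Markey" is a mistyped "Market".
streets = ["Market", "Martin", "Maple", "Marsh"]
print(near_matches("Markey", streets, threshold=0))  # exact code match only
print(near_matches("Markey", streets, threshold=1))  # codes within one edit
```

With a threshold of 0 the sketch behaves like exact code matching
and returns only "Marsh", which shares the code M620 with the
mistyped query. Raising the threshold to 1 also returns the
intended "Market" (M623). A larger threshold recovers more close
matches but admits more false positives, so its value would need
to be tuned against precision.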