major challenge is that there are no unique identifiers shared between the huge data sources, which have different attributes. By applying graph analysis, the two sources can be joined either by removing records that are obviously not a match or by proposing records that appear together many times, which indicates a potential match.
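Pruning obvious non-matches and proposing records that appear together often can be sketched as a co-occurrence count over shared attribute values. This is a minimal illustration under stated assumptions, not the Clink implementation. Each record is assumed to be reduced to a set of normalized attribute values, and the function name `candidate_pairs` and the `min_shared` threshold are hypothetical.

```python
from collections import Counter, defaultdict


def candidate_pairs(source_a, source_b, min_shared=2):
    """Propose cross-source pairs that share at least `min_shared` attribute values.

    source_a, source_b: dicts mapping record id -> set of normalized attribute values
    (a hypothetical representation). Pairs sharing no value are never generated,
    so obvious non-matches are pruned without being compared.
    """
    # Inverted index over source B: attribute value -> ids of B records holding it.
    index_b = defaultdict(set)
    for rid, values in source_b.items():
        for value in values:
            index_b[value].add(rid)

    # Count how many values each (a, b) pair has in common (co-occurrence frequency).
    shared = Counter()
    for a_id, values in source_a.items():
        for value in values:
            for b_id in index_b.get(value, ()):
                shared[(a_id, b_id)] += 1

    # Keep only pairs that co-occur often enough to deserve a detailed comparison.
    return {pair: n for pair, n in shared.items() if n >= min_shared}


a = {"a1": {"john", "smith", "1980"}, "a2": {"mary", "jones", "1975"}}
b = {"b1": {"j.", "smith", "1980"}, "b2": {"mary", "jones", "london"},
     "b3": {"peter", "brown"}}
print(candidate_pairs(a, b))  # {('a1', 'b1'): 2, ('a2', 'b2'): 2}
```

Of the six possible cross-source pairs, only two are proposed; `b3` shares no value with any record in `a` and is never compared.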
Our proposed solution improves the quality of the results and reduces the total number of required comparisons by using the weights and frequency relations between nodes to decide whether there is a match. The number of comparisons is significantly reduced because records are not compared against the whole cluster. Combining record linkage with graph analysis opens up many opportunities and is a very promising area of study.
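The saving from not comparing against the whole cluster can be illustrated with a small sketch: only pairs whose edge weight (here assumed to be a co-occurrence count) reaches a threshold are passed to a string comparison, and the count is set against an all-pairs scan. The function `link_within_cluster`, the thresholds `min_weight` and `min_sim`, and the use of Python's `difflib.SequenceMatcher` as the string similarity are assumptions for illustration, not the method evaluated in the paper.

```python
from difflib import SequenceMatcher


def link_within_cluster(records, edge_weights, min_weight=3, min_sim=0.8):
    """Compare only strongly related record pairs inside one cluster (hypothetical).

    records: record id -> name string.
    edge_weights: (id_u, id_v) -> co-occurrence count between the two nodes.
    Returns (matches, comparisons made, comparisons an all-pairs scan would need).
    """
    n = len(records)
    exhaustive = n * (n - 1) // 2
    matches, compared = [], 0
    for (u, v), weight in edge_weights.items():
        if weight < min_weight:
            continue  # weak relation: skip the string comparison entirely
        compared += 1
        if SequenceMatcher(None, records[u], records[v]).ratio() >= min_sim:
            matches.append((u, v))
    return matches, compared, exhaustive


records = {"r1": "john smith", "r2": "jon smith", "r3": "john smyth",
           "r4": "mary jones", "r5": "maria jones"}
edges = {("r1", "r2"): 5, ("r1", "r3"): 1, ("r2", "r3"): 2, ("r4", "r5"): 4}
matches, compared, exhaustive = link_within_cluster(records, edges)
print(f"compared {compared} of {exhaustive} possible pairs")  # compared 2 of 10 possible pairs
print(matches)
```

Raising `min_weight` trades recall for fewer comparisons; the right value would depend on how edge weights are distributed in the actual graph.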
6 FUTURE STUDIES
Block rank shows considerable potential and will be explored further. In particular, we aim to improve the overall similarity score by finding the best formula for the weighted average between string similarity and block rank similarity, depending on the availability and rareness of the attributes being matched.
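One possible shape for such a weighted average is sketched below, under loudly stated assumptions: rareness is measured as a normalized inverse frequency of the attribute value, a missing attribute falls back to block rank similarity alone, and the weight on string similarity moves linearly between `min_w` and `max_w`. The function name, parameters, and weighting rule are all hypothetical and are not the formula the paper proposes to find.

```python
import math
from difflib import SequenceMatcher


def combined_similarity(value_a, value_b, block_rank_sim, value_freq, n_records,
                        min_w=0.3, max_w=0.8):
    """Hypothetical weighted average of string similarity and block rank similarity.

    value_freq: number of records (1..n_records) carrying value_a; n_records > 1.
    The string-similarity weight grows from min_w (common value) to max_w (unique value).
    """
    if not value_a or not value_b:
        return block_rank_sim  # attribute unavailable: use graph evidence only
    string_sim = SequenceMatcher(None, value_a, value_b).ratio()
    # Rareness in [0, 1]: 1.0 for a value held by one record, 0.0 for one held by all.
    rareness = math.log(n_records / value_freq) / math.log(n_records)
    w = min_w + (max_w - min_w) * rareness
    return w * string_sim + (1 - w) * block_rank_sim


# An exact match on a rare value scores higher than one on a very common value.
print(combined_similarity("smith", "smith", 0.4, value_freq=2, n_records=1000))
print(combined_similarity("smith", "smith", 0.4, value_freq=900, n_records=1000))
```

The design choice here is that rare values are more discriminative, so agreement on them is trusted more than agreement on values most records share.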
We also plan to improve the block rank ranges, which divide the whole space into several blocks, and to look for a relation between these ranges and the interactions between nodes within the graph.
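Dividing the space into blocks by ranges can be pictured as bucketing each record's block rank score between sorted cut points. The sketch below assumes scores are plain numbers and that the ranges are given as ascending boundaries; `assign_blocks` and the example boundaries are hypothetical, not the ranges used in Clink.

```python
from bisect import bisect_right
from collections import defaultdict


def assign_blocks(block_ranks, boundaries):
    """Group record ids by the block rank range their score falls into.

    boundaries: ascending cut points; k cut points define k + 1 blocks.
    A score equal to a cut point goes into the upper block (bisect_right).
    """
    blocks = defaultdict(list)
    for rid, score in block_ranks.items():
        blocks[bisect_right(boundaries, score)].append(rid)
    return dict(blocks)


ranks = {"r1": 0.10, "r2": 0.30, "r3": 0.35, "r4": 0.90}
print(assign_blocks(ranks, [0.25, 0.50, 0.75]))
# {0: ['r1'], 1: ['r2', 'r3'], 3: ['r4']}
```

Narrower ranges produce smaller blocks and fewer comparisons but risk separating true matches, which is why the ranges would need to be tuned against how nodes interact in the graph.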
Clink - A Novel Record Linkage Methodology based on Graph Interactions