major challenge is that there are no unique identifiers
shared between the large data sources, which have different
attributes. By applying graph analysis, the two sources
can be joined by removing the records that are obviously
not a match or by proposing records that appear together
many times, which indicates potential matches.
Our proposed solution improves the quality of the results
and reduces the total number of required comparisons
by using the weights and frequency relations
between nodes to decide whether there is a match.
The number of comparisons is significantly reduced because
records are not compared against the whole cluster.
Combining record linkage with graph analysis opens up many
opportunities and is a promising area of study.
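As a rough illustration of how frequency relations between nodes can limit the number of comparisons, the sketch below counts how often two records from different sources fall into the same block and keeps only the pairs that co-occur often enough. It is a minimal sketch under stated assumptions, not the implementation evaluated in this paper: the function names, the block layout, and the `min_count` threshold are all hypothetical.

```python
from collections import Counter


def cooccurrence_edges(blocks):
    """Count how often each cross-source record pair shares a block.

    `blocks` maps a blocking key to a tuple (ids_from_source_a, ids_from_source_b).
    This layout is an assumption made for the example.
    """
    edges = Counter()
    for ids_a, ids_b in blocks.values():
        for a in set(ids_a):
            for b in set(ids_b):
                edges[(a, b)] += 1  # edge weight = co-occurrence frequency
    return edges


def candidate_pairs(edges, min_count=2):
    """Keep only pairs that appear together at least `min_count` times."""
    return sorted(pair for pair, count in edges.items() if count >= min_count)


# Hypothetical toy blocks built from three attributes.
blocks = {
    "surname:smith": (["a1", "a2"], ["b1"]),
    "zip:10001": (["a1"], ["b1", "b2"]),
    "year:1980": (["a1", "a3"], ["b1"]),
}
edges = cooccurrence_edges(blocks)
pairs = candidate_pairs(edges, min_count=2)
```

In this toy input only the pair `("a1", "b1")` survives the threshold, so a detailed string comparison would be run on one pair instead of on every pair within each block.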
The block rank also shows considerable potential, and we
will explore it further to improve the overall similarity
score by finding the best formula for the weighted average
of string similarity and block rank similarity, depending
on the availability and rareness of the attributes being
matched. We will also refine the block rank ranges, which
divide the whole space into several blocks, and we will
try to find a relation between the ranges and the node
interactions within the graph.
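The weighted average mentioned above can be sketched as a convex combination of the two similarity values. This is only an illustration: the weight `alpha`, the function names, and the use of Python's `difflib.SequenceMatcher` as the string similarity are assumptions made for the example, and how the weight should depend on attribute availability and rareness is exactly the open question left for future work.

```python
from difflib import SequenceMatcher


def string_similarity(a, b):
    """Case-insensitive string similarity in [0, 1] (stand-in measure)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def combined_score(string_sim, block_rank_sim, alpha):
    """Weighted average of string similarity and block rank similarity.

    `alpha` is a hypothetical weight in [0, 1]; both inputs are assumed
    to be normalized to [0, 1].
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    return alpha * string_sim + (1.0 - alpha) * block_rank_sim


# Example: trust the string comparison more than the block rank.
score = combined_score(string_similarity("Smith", "smith"), 0.4, alpha=0.75)
```

A larger `alpha` favors the string comparison, which may suit attributes that are usually present and distinctive, while a smaller `alpha` leans on the block rank when attribute values are missing or very common.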
Clink - A Novel Record Linkage Methodology based on Graph Interactions