Authors:
Francesco Gargiulo (1); Antonio Picariello (2) and Vincenzo Moscato (2)
Affiliations:
(1) Italian Aerospace Research Centre, via Maiorise, Capua (CE), Italy
(2) Department of Computing, University Federico II of Napoli, Napoli, Italy
Keyword(s):
Distributed Index, Large Databases, Multidimensional Data Index, Decentralized K-Nearest Neighbour Query.
Related Ontology Subjects/Areas/Topics:
Applications; Data Engineering; Data Management and Quality; Data Structures and Data Management Algorithms; Information Retrieval; Ontologies and the Semantic Web; Pattern Recognition; Software Engineering
Abstract:
This work proposes a decentralized data structure for storing a large amount of data, under the assumption that it is not possible or convenient to host all of the data on a single workstation. The index is distributed over a computer network, and the performance of its search, insert and delete operations is close to that of traditional indices hosted on a single workstation. It is based on k-d trees and is distributed across a network of "peers", each of which hosts a part of the tree and communicates with the other peers by message passing. In particular, we propose a novel version of the k-nearest neighbour algorithm that starts the query in a randomly chosen peer and terminates it as soon as possible. Preliminary experiments have shown that in about 65% of cases a query started in a random peer does not involve the peer containing the root of the tree, and that in 98% of cases the query terminates in a peer that does not contain the root of the tree.
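For readers unfamiliar with the underlying structure, the following is a minimal single-machine sketch of a k-d tree with a k-nearest neighbour search. It is not the distributed algorithm proposed in the paper: there are no peers, no message passing and no random starting point, and the names `Node`, `build`, `sq_dist` and `knn` are hypothetical, chosen only for illustration. It shows the pruning rule that a k-nearest neighbour query on a k-d tree typically relies on: the far side of a splitting plane is visited only if it could still contain a closer point.

```python
import heapq
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, ...]


@dataclass
class Node:
    point: Point                  # point stored at this node
    axis: int                     # coordinate used to split at this node
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build(points: List[Point], depth: int = 0) -> Optional[Node]:
    """Build a k-d tree by splitting on the median, cycling through the axes."""
    if not points:
        return None
    axis = depth % len(points[0])
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return Node(ordered[mid], axis,
                build(ordered[:mid], depth + 1),
                build(ordered[mid + 1:], depth + 1))


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn(root: Optional[Node], query: Point, k: int) -> List[Point]:
    """Return the k stored points closest to `query`, nearest first."""
    # Max-heap of the best candidates so far, stored as (-squared distance, point).
    heap: List[Tuple[float, Point]] = []

    def visit(node: Optional[Node]) -> None:
        if node is None:
            return
        d = sq_dist(node.point, query)
        if len(heap) < k:
            heapq.heappush(heap, (-d, node.point))
        elif d < -heap[0][0]:
            heapq.heapreplace(heap, (-d, node.point))
        diff = query[node.axis] - node.point[node.axis]
        near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
        visit(near)
        # Cross the splitting plane only if the far side may hold a closer point.
        if len(heap) < k or diff * diff < -heap[0][0]:
            visit(far)

    visit(root)
    return [p for _, p in sorted(heap, reverse=True)]


if __name__ == "__main__":
    pts = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
    print(knn(build(pts), (9, 2), 2))
```

In a distributed setting such as the one described in the abstract, subtrees would live on different peers and each recursive call across a peer boundary would become a message; how the paper handles that, and how it terminates early, is not specified here.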