major computer science areas (Intelligence, Multime-
dia, Sensors, File System and Library), and present
the future trends and directions in LOD.
The Open Data world is related to the Linked
Data world. In fact, standardized proposals are typ-
ically used to describe published Linked Open Data.
The RDF (Resource Description Framework) (Miller,
1998) is widely used for this purpose. Furthermore,
the standard query language for RDF is SPARQL
(SPARQL Protocol and RDF Query Language) (Clark
et al., 2008). Compared with our proposal, SPARQL is
very general and is devoted to retrieving data sets
with certain features. However, only people highly
skilled in computer science are able to use it. In
contrast, our query technique is easy to use for
non-skilled people and is closer to the concept of
query in information retrieval.
Similar considerations are made in (Auer et al.,
2007), where the authors present DBpedia: the RDF
approach is not suitable for non-expert users, who need
a flexible and simple query language.
In this paper we do not consider Linked Open
Data: we query a corpus of Open Data Sets that are, in
general, not related to each other and thus not linked at all.
The idea of extending our approach to a pool of
federated Open Data Corpora is exciting. A pioneering
work on this topic is (Schwarte et al., 2011), but it
still relies on SPARQL as query language.
Finally, the heterogeneity of Open Data calls for
the capabilities of NoSQL databases. In (Kononenko
et al., 2014), the authors report their experience
with Elasticsearch (a distributed full-text search
engine), highlighting its strengths and weaknesses.
10 CONCLUSION
This paper presents a technique to retrieve items (rows
in CSV files or objects in JSON arrays) contained in
the open data sets published by an open data portal.
Users query the published corpus blindly.
The technique both focuses the search on relevant
terms and expands the search by generating neighbour
queries, by means of a string matching degree.
The experimental evaluation shows that the technique
is promising and effective.
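The idea of expanding a search through neighbour queries can be sketched as follows. This is only a minimal illustration, not the technique evaluated in the paper: the function names, the vocabulary, and the threshold of 0.8 are assumptions, and Python's difflib.SequenceMatcher ratio is used as a stand-in for the string matching degree.

```python
from difflib import SequenceMatcher
from itertools import product


def matching_degree(a: str, b: str) -> float:
    """Similarity in [0, 1]; an assumed stand-in for the string matching degree."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def neighbour_queries(query_terms, vocabulary, threshold=0.8):
    """Yield queries obtained by swapping terms for similar vocabulary terms.

    `vocabulary` is a hypothetical list of terms found in the corpus
    (e.g. field names); `threshold` is an illustrative minimum similarity.
    """
    alternatives = []
    for term in query_terms:
        similar = [v for v in vocabulary if matching_degree(term, v) >= threshold]
        # Keep the original term so that it is always one of the options.
        alternatives.append(sorted(set(similar) | {term}))
    for combo in product(*alternatives):
        # Skip the combination that reproduces the original query.
        if list(combo) != list(query_terms):
            yield list(combo)


# Hypothetical vocabulary, including a plural form and a misspelling.
vocabulary = ["school", "schools", "address", "adress", "city"]
for query in neighbour_queries(["school", "address"], vocabulary):
    print(query)
```

In this sketch each query term is replaced, in all combinations, by vocabulary terms whose similarity reaches the threshold, so near-matches such as plural forms or misspellings are also searched.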
New and more extensive experiments will be performed
in the future, and the technique will be refined and
further improved by adding semantic information to
drive the choice of similar terms. In particular, we are
considering exploiting dictionaries, such as WordNet, that
provide relationships between words. Given a term in
the query, we could discover its synonyms and use them
to rewrite the query (obtaining new neighbour queries).
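A possible shape for this synonym-based rewriting is sketched below. It is a hypothetical illustration of the future work, not an implemented part of the technique: the SYNONYMS dictionary is a hand-made stand-in for a resource such as WordNet, and the function name is an assumption.

```python
from itertools import product

# Hypothetical stand-in for a dictionary such as WordNet: term -> synonyms.
SYNONYMS = {
    "school": ["academy", "college"],
    "street": ["road"],
}


def synonym_queries(query_terms, synonyms=SYNONYMS):
    """Return the queries obtained by replacing terms with their synonyms."""
    # Each term can stay as it is or be replaced by one of its synonyms.
    options = [[term] + synonyms.get(term, []) for term in query_terms]
    return [
        list(combo)
        for combo in product(*options)
        if list(combo) != list(query_terms)  # drop the original query
    ]


for query in synonym_queries(["school", "street"]):
    print(query)
```

Terms without an entry in the dictionary are left unchanged, so the number of rewritten queries grows only with the terms that actually have synonyms.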
REFERENCES
Auer, S., Bizer, C., Kobilarov, G., Lehmann, J., Cyganiak,
R., and Ives, Z. (2007). DBpedia: A nucleus for a web
of open data. In The semantic web, pages 722–735.
Springer.
Clark, K. G., Feigenbaum, L., and Torres, E. (2008). SPARQL
protocol for RDF. World Wide Web Consortium (W3C)
Recommendation, 86.
Höchtl, J. and Lampoltshammer, T. J. (2016). Adequate-
analytics and data enrichment to improve the quality
of open data. In Proceedings of the International Con-
ference for E-Democracy and Open Government Ce-
DEM16, pages 27–32.
Jaro, M. A. (1989). Advances in record-linkage methodology
as applied to matching the 1985 census of Tampa,
Florida. Journal of the American Statistical Association,
84(406):414–420.
Khosro, S. C., Jabeen, F., Mashwani, S., and Alam, I.
(2014). Linked open data: Towards the realization of
semantic web - a review. Indian Journal of Science
and Technology, 7(6):745–764.
Kononenko, O., Baysal, O., Holmes, R., and Godfrey,
M. (2014). Mining modern repositories with Elasticsearch.
In MSR. June 29-30 2014, Hyderabad, India.
Liu, J., Dong, X., and Halevy, A. Y. (2006). Answering
structured queries on unstructured data. In WebDB.
2006, Chicago, Illinois, USA, volume 6, pages 25–30.
Citeseer.
Manning, C. D., Raghavan, P., Schütze, H., et al. (2008).
Introduction to information retrieval, volume 1. Cambridge
University Press, Cambridge.
Miller, E. (1998). An introduction to the resource descrip-
tion framework. Bulletin of the American Society for
Information Science and Technology, 25(1):15–19.
Schwarte, A., Haase, P., Hose, K., Schenkel, R., and
Schmidt, M. (2011). FedX: a federation layer for
distributed query processing on linked open data. In
Extended Semantic Web Conference, pages 481–486.
Springer.
Shahi, D. (2015). Apache Solr: An introduction. In Apache
Solr, pages 1–9. Springer.
Winkler, W. E. (1999). The state of record linkage and cur-
rent research problems. In Statistical Research Divi-
sion, US Census Bureau. Citeseer.
WEBIST 2017 - 13th International Conference on Web Information Systems and Technologies