Authors: Mauro Pelucchi; Giuseppe Psaila and Maurizio Toccu

Affiliation: University of Bergamo, Italy

Keyword(s): Retrieval of Open Data, Blind Querying, Single Item Extraction.

Related Ontology Subjects/Areas/Topics: Searching and Browsing; Web Information Systems and Technologies; Web Interfaces and Applications

Abstract:
Public Administrations openly publish many data sets concerning citizens and territories in order to increase the amount of information made available to people, firms and public administrators. As a result, Open Data corpora have become so large that it is impossible to deal with them by hand; tools that include innovative techniques able to query them are therefore necessary.
In this paper, we present a technique to select, within a corpus published by an Open Data portal, the data sets that contain specific pieces of information, and to retrieve them. In particular, users can formulate structured queries that are submitted blindly to our search engine prototype (i.e., without being aware of the actual structure of the data sets). Our approach reinterprets and mixes several known information retrieval approaches, while at the same time giving a database view of the problem. We implemented this technique within a prototype, which we tested on a corpus containing more than 2000 data sets. We observed that our technique provides more focused results than the baseline experiments performed with Apache Solr.
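The abstract does not spell out the algorithm, so the following is only a minimal illustration of what "blind querying" means in general, not the authors' method: the user states a topic and some property/value conditions without knowing any data set's schema, data sets are ranked by simple term overlap between the query and their metadata (title and column names), and matching rows are then extracted. Everything here (`DataSet`, `terms`, `score`, `blind_query`, the `threshold` parameter and the term-overlap scoring) is a hypothetical assumption chosen for brevity; it does not reproduce the mix of information retrieval techniques described in the paper.

```python
from dataclasses import dataclass, field


@dataclass
class DataSet:
    """Hypothetical open-data set: a title, column names and rows."""
    title: str
    columns: list
    rows: list = field(default_factory=list)


def terms(text: str) -> set:
    """Lower-case word tokens; underscores and hyphens also split words."""
    return set(text.lower().replace("_", " ").replace("-", " ").split())


def score(ds: DataSet, query_terms: set) -> float:
    """Fraction of query terms found in the data set's title or column names."""
    meta = terms(ds.title)
    for col in ds.columns:
        meta |= terms(col)
    return len(query_terms & meta) / len(query_terms) if query_terms else 0.0


def blind_query(corpus, topic, conditions, threshold=0.5):
    """Select data sets for a query written without knowledge of their schemas,
    then return (title, row) pairs whose values satisfy every condition."""
    query_terms = terms(topic)
    for prop in conditions:
        query_terms |= terms(prop)
    # Rank data sets by term overlap (assumed scoring, not the paper's).
    ranked = sorted(((score(d, query_terms), d) for d in corpus),
                    key=lambda pair: pair[0], reverse=True)
    results = []
    for s, ds in ranked:
        if s < threshold:
            continue
        # Map each query property to the first column sharing a term with it.
        mapping = {}
        for prop in conditions:
            for col in ds.columns:
                if terms(prop) & terms(col):
                    mapping[prop] = col
                    break
        if len(mapping) < len(conditions):
            continue  # some property has no counterpart in this schema
        for row in ds.rows:
            if all(row.get(mapping[p], "").lower() == v.lower()
                   for p, v in conditions.items()):
                results.append((ds.title, row))
    return results


if __name__ == "__main__":
    corpus = [
        DataSet("Hotels in Lombardy", ["hotel_name", "city", "stars"],
                [{"hotel_name": "A", "city": "Bergamo", "stars": "3"},
                 {"hotel_name": "B", "city": "Milano", "stars": "4"}]),
        DataSet("Air quality measurements", ["station", "pm10"],
                [{"station": "Bergamo", "pm10": "40"}]),
    ]
    print(blind_query(corpus, "hotels", {"city": "Bergamo"}))
```

With this toy corpus, the query "hotels where city = Bergamo" selects only the hotel data set (its title and `city` column cover both query terms) and extracts the single Bergamo row, while the air-quality data set is discarded because none of the query terms occur in its metadata.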