Integration with the EEG/ERP Portal - The EEG/ERP Portal is based on Java technologies, so search engines that provide a Java API are easier to integrate into the existing infrastructure and are therefore preferred.
Other Features and Extensions - To make the EEG/ERP Portal convenient for its users, the search engine must provide a set of built-in features such as result highlighting, faceted search and synonym search.
Independence of Data Sources - The chosen search engine must be able to accept data from various sources rather than being limited to one specific data source, such as a relational database. This is required because LinkedIn articles must be indexed, as mentioned above, and because further indexing scenarios, such as indexing PDF or XML files, may arise in the future.
Independence of Other Technologies - The search engine should not rely on any specific accompanying technology. The dependence of Hibernate Search on Hibernate and the heavy orientation of Sphinx towards MySQL are examples of search engines that perform well when certain conditions are met, but that cannot run, or do not perform well, otherwise.
Community - A large and active developer community also plays a major role in the final choice. The bigger the community around a search engine, the higher the chance that its development will not stop early, that new features will be introduced, and that reported bugs will be fixed quickly.
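The data-source independence criterion can be illustrated with a small architectural sketch. It assumes a hypothetical `DocumentSource` interface and two hypothetical stand-in sources (`DatabaseSource`, `LinkedInSource`); none of these names come from the portal. The point is only that the indexing side depends on an abstraction, so a later PDF or XML reader could be plugged in without changing it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class SourceIndependenceSketch {

    // Hypothetical abstraction: any component that can supply documents for indexing.
    interface DocumentSource {
        List<Map<String, String>> fetchDocuments();
    }

    // Hypothetical stand-in for rows loaded from the relational database.
    static class DatabaseSource implements DocumentSource {
        @Override
        public List<Map<String, String>> fetchDocuments() {
            return List.of(Map.of("id", "db-1", "source", "database", "title", "EEG experiment"));
        }
    }

    // Hypothetical stand-in for articles downloaded from LinkedIn.
    static class LinkedInSource implements DocumentSource {
        @Override
        public List<Map<String, String>> fetchDocuments() {
            return List.of(Map.of("id", "li-1", "source", "linkedin", "title", "ERP workshop"));
        }
    }

    // The indexing side sees only the interface, so a PDF or XML reader
    // could be added later without changing this method.
    static List<Map<String, String>> collect(List<DocumentSource> sources) {
        List<Map<String, String>> all = new ArrayList<>();
        for (DocumentSource source : sources) {
            all.addAll(source.fetchDocuments());
        }
        return all;
    }

    public static void main(String[] args) {
        List<Map<String, String>> docs =
                collect(List.of(new DatabaseSource(), new LinkedInSource()));
        for (Map<String, String> doc : docs) {
            System.out.println(doc.get("source") + ": " + doc.get("title"));
        }
    }
}
```

Running `main` prints `database: EEG experiment` followed by `linkedin: ERP workshop`. An engine that can only read from one database would force this interface to collapse into a single concrete source, which is what the criterion rules out.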
The search engines were evaluated against these criteria. As stated for the speed criterion, the differences in speed among the discussed search engines were not significant, so speed is not included in the final evaluation.
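Excluding a criterion from a comparison can change its outcome, which the following sketch makes concrete. It assumes a simple additive scoring scheme with hypothetical criterion keys (`javaApi`, `features`, `speed`) and placeholder scores for two hypothetical engines; neither the scheme nor the numbers are taken from the evaluation described here.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class CriteriaEvaluationSketch {

    // Criteria left out of the final comparison because the differences were not significant.
    static final Set<String> EXCLUDED = Set.of("speed");

    // Sums the per-criterion scores of one engine, ignoring excluded criteria.
    static int totalScore(Map<String, Integer> scores) {
        int total = 0;
        for (Map.Entry<String, Integer> entry : scores.entrySet()) {
            if (!EXCLUDED.contains(entry.getKey())) {
                total += entry.getValue();
            }
        }
        return total;
    }

    // Returns the engine with the highest total (the first in iteration order wins a tie).
    static String best(Map<String, Map<String, Integer>> engines) {
        String bestName = null;
        int bestScore = Integer.MIN_VALUE;
        for (Map.Entry<String, Map<String, Integer>> entry : engines.entrySet()) {
            int score = totalScore(entry.getValue());
            if (score > bestScore) {
                bestScore = score;
                bestName = entry.getKey();
            }
        }
        return bestName;
    }

    public static void main(String[] args) {
        // Placeholder scores for two hypothetical engines; not the paper's data.
        Map<String, Map<String, Integer>> engines = new LinkedHashMap<>();
        engines.put("EngineA", Map.of("javaApi", 2, "features", 2, "speed", 0));
        engines.put("EngineB", Map.of("javaApi", 0, "features", 1, "speed", 5));
        System.out.println(best(engines)); // EngineA
    }
}
```

With these placeholder values EngineA totals 4 and EngineB totals 1; had speed been counted, EngineB would have won with 6. Dropping a criterion whose differences are not significant keeps such noise from deciding the result.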
Note that the last criterion, community, cannot be evaluated exactly. It is therefore assessed using four indicators: the size of the mailing lists, the number of Google search results about each search engine, the number of related blog posts, and the number of posts on specialized websites.
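Because these four indicators are measured in very different units, one plausible way to combine them is to rank the engines on each indicator and sum the ranks. The sketch below assumes that rank-sum approach, which is not stated in the text, and uses placeholder numbers for two hypothetical engines. Each engine's array is assumed to hold the indicators in the order listed above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CommunityRankSketch {

    // Assumed array order per engine: mailing-list size, Google results,
    // blog posts, posts on specialized websites.
    static Map<String, Integer> rankSums(Map<String, double[]> indicators) {
        Map<String, Integer> sums = new HashMap<>();
        if (indicators.isEmpty()) {
            return sums;
        }
        int count = indicators.values().iterator().next().length;
        List<String> engines = new ArrayList<>(indicators.keySet());
        for (int i = 0; i < count; i++) {
            final int idx = i;
            // Rank 1 goes to the engine with the largest value for this indicator;
            // ties receive consecutive ranks in arbitrary order.
            engines.sort((a, b) -> Double.compare(indicators.get(b)[idx], indicators.get(a)[idx]));
            for (int rank = 1; rank <= engines.size(); rank++) {
                sums.merge(engines.get(rank - 1), rank, Integer::sum);
            }
        }
        return sums;
    }

    public static void main(String[] args) {
        // Placeholder numbers for two hypothetical engines; not measured values.
        Map<String, double[]> indicators = new HashMap<>();
        indicators.put("EngineA", new double[] {100, 1000, 10, 5});
        indicators.put("EngineB", new double[] {50, 2000, 5, 1});
        // A lower rank sum suggests a larger, more active community.
        System.out.println(rankSums(indicators));
    }
}
```

Here EngineA leads on three of the four indicators and ends with a rank sum of 5 against EngineB's 7. Ranking sidesteps the need to put mailing-list sizes and search-result counts on a common scale, at the cost of ignoring how large each gap is.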
Based on this evaluation, Solr, the Lucene-based full-text search server, was chosen.
4 INDEX DESIGN
Full-text search engines build an index of documents that enables faster search through source texts or databases. This section focuses on identifying domain entities and designing a common index structure for the electrophysiological database used by the EEG/ERP Portal and for the LinkedIn social network.
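The speed-up an index provides comes from its structure: an inverted index maps each term to the documents containing it, so a query consults the term lists instead of scanning every text. The sketch below is a greatly simplified, in-memory illustration of that general idea, which Lucene (and therefore Solr) builds on. The tokenizer and the AND semantics of `search` are simplifying assumptions, not Lucene's actual analysis or scoring.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class InvertedIndexSketch {

    // Maps each term to the sorted set of document ids that contain it.
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Naive analyzer: lowercase and split on anything that is not a letter or digit.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // Indexing: record the document id under every term of its text.
    void add(int docId, String text) {
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
        }
    }

    // Searching: intersect the posting sets of all query terms (AND semantics),
    // so no document text has to be scanned at query time.
    Set<Integer> search(String query) {
        Set<Integer> result = null;
        for (String term : tokenize(query)) {
            Set<Integer> docs = postings.getOrDefault(term, Collections.emptySet());
            if (result == null) {
                result = new TreeSet<>(docs);
            } else {
                result.retainAll(docs);
            }
        }
        return result == null ? Collections.emptySet() : result;
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.add(1, "EEG experiment with visual stimuli");
        index.add(2, "ERP experiment");
        index.add(3, "Visual EEG data");
        System.out.println(index.search("experiment")); // [1, 2]
        System.out.println(index.search("eeg visual")); // [1, 3]
    }
}
```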
4.1 Identification of Domain Entities
Data and metadata are logically stored in tables. Because our database contains 71 tables, we do not include the data model as a figure; instead, we describe the core data that we store. Some tables contain core data and represent domain entities, while other tables extend the core data. In terms of POJO classes, the domain entities are represented by parent classes in the full-text search context. One-to-one or one-to-many associations exist between the parent POJO classes and their child classes.
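One common way to make a parent entity searchable through its children is to flatten both into a single index document, storing the child values in a multi-valued field. The sketch below assumes that approach using simplified versions of the Article and ArticleComment classes described in the following list; the comment field name `text` and the index field name `comment` are hypothetical, and this is not presented as the design adopted in the paper.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ParentChildFlatteningSketch {

    // Simplified child POJO; the field name "text" is an assumption.
    static class ArticleComment {
        final String text;

        ArticleComment(String text) {
            this.text = text;
        }
    }

    // Simplified parent POJO with the title and text fields described for Article,
    // plus its one-to-many association to comments.
    static class Article {
        final String title;
        final String text;
        final List<ArticleComment> comments;

        Article(String title, String text, List<ArticleComment> comments) {
            this.title = title;
            this.text = text;
            this.comments = comments;
        }
    }

    // Flattens the parent and its children into one index document:
    // single-valued fields for the parent, a multi-valued field for the children.
    static Map<String, List<String>> toIndexDocument(Article article) {
        Map<String, List<String>> doc = new LinkedHashMap<>();
        doc.put("title", List.of(article.title));
        doc.put("text", List.of(article.text));
        List<String> commentTexts = new ArrayList<>();
        for (ArticleComment comment : article.comments) {
            commentTexts.add(comment.text);
        }
        doc.put("comment", commentTexts);
        return doc;
    }

    public static void main(String[] args) {
        Article article = new Article("EEG/ERP Portal news", "New experiments were uploaded.",
                List.of(new ArticleComment("Great!"), new ArticleComment("Which hardware was used?")));
        System.out.println(toIndexDocument(article));
    }
}
```

With this layout, a query matching only a comment still returns the parent article as the hit, which fits the idea that parent classes represent the domain entities in the search context.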
The following list describes the domain entities and captures the relations between parent and child classes (Koren, 2013):
Article - Articles are represented by the Article class. It contains the string fields title and text, which hold the article title and its text, respectively. Articles and news are stored both in the database and in the LinkedIn social network. Articles can be commented on, so each article may have one or more associated article comments. The text of the comments is contained in ArticleComment instances.
Experiment - This class has the field environmentNote, which describes the experiment environment. Apart from this field, it also refers to related Weather, Disease, DataFile, Hardware and Software objects, which hold information about the weather, the diseases of the tested subject, the related data files, and the hardware and software used during the experiment, respectively.
Person - This class contains a person’s name, surname and a note about the person. Although the class is also associated with other classes, these associations are not needed for indexing and searching.
Research group - This class holds the name, title and description of a research group and, as in the case of the Person class, its structure
HEALTHINF 2014 - International Conference on Health Informatics