engines usually include intelligent algorithms to
avoid storing multiple copies of the same documents
in the index. Relevance ranking of documents is
also an issue in federated search. Federated search
engines cannot rank documents effectively; they can
only estimate which document is a better match by
examining the titles, snippets, and URIs in the
results. In contrast, integrated search engines can
rank documents based on the contents of the
documents, the number of links to the documents,
and other factors.
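As a minimal sketch of this limitation, the Python
fragment below merges result lists from several
sources using only the title, snippet, and URI of
each hit. The Hit class, the term weights, and the
URI normalization are illustrative assumptions, not
part of the system described in this paper.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass(frozen=True)
class Hit:
    """One result as returned by a source: metadata only, no document body."""
    title: str
    snippet: str
    uri: str


def normalize_uri(uri: str) -> str:
    """Lower-case scheme and host and drop a trailing slash in the path."""
    parts = urlsplit(uri)
    path = parts.path.rstrip("/")
    query = f"?{parts.query}" if parts.query else ""
    return f"{parts.scheme.lower()}://{parts.netloc.lower()}{path}{query}"


def score(hit: Hit, query: str) -> float:
    """Assumed weighting: a title match counts more than a snippet or URI match."""
    title, snippet, uri = hit.title.lower(), hit.snippet.lower(), hit.uri.lower()
    total = 0.0
    for term in query.lower().split():
        total += 3.0 * (term in title)
        total += 1.0 * (term in snippet)
        total += 0.5 * (term in uri)
    return total


def merge(result_lists, query):
    """Merge per-source lists, skip hits whose normalized URI was seen, sort by score."""
    seen = set()
    merged = []
    for hits in result_lists:
        for hit in hits:
            key = normalize_uri(hit.uri)
            if key not in seen:
                seen.add(key)
                merged.append(hit)
    return sorted(merged, key=lambda h: score(h, query), reverse=True)
```

Because only metadata is compared, the same
document stored under different URIs in two
repositories would not be recognized as a
duplicate here, which is consistent with the
weakness noted above.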
Our federated search system suffers from the same
limitations. In particular, the NotesSQL driver
used to connect to the Notes database cannot
discover document attachments in Notes forms, and
thus cannot provide search results for attached
documents.
Although a federated search system yields
lower-quality results than an integrated search
system, it has other advantages. It is highly
scalable because it links directly to the data
sources and composes the search results in real
time. Thus, the information obtained is more up to
date than that from an integrated search system. In
contrast, an integrated search engine has to build
a local index containing all documents that it
finds, and may eventually run out of capacity as
the number of documents to index grows rapidly.
Moreover, there is always a time lag between when a
document is changed and when the integrated
(single-index) search engine updates its index.
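The real-time behavior of the federated approach
can be sketched as follows. This is a simplified
illustration under stated assumptions: the
federated_search function and the adapter signature
(a callable that takes a query string and returns a
list of dictionaries) are hypothetical and do not
reproduce the interfaces of our implementation.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# Assumed adapter shape: takes a query string, returns a list of result dicts.
SourceAdapter = Callable[[str], List[dict]]


def federated_search(query: str, sources: Dict[str, SourceAdapter]) -> List[dict]:
    """Send the query to every source at request time and concatenate the answers."""
    results: List[dict] = []
    with ThreadPoolExecutor(max_workers=max(1, len(sources))) as pool:
        futures = {name: pool.submit(adapter, query) for name, adapter in sources.items()}
        for name, future in futures.items():
            try:
                hits = future.result()
            except Exception:
                # A failing source is skipped; the other sources still answer.
                continue
            # Tag each hit with the name of the source it came from.
            results.extend({**hit, "source": name} for hit in hits)
    return results
```

No local index is kept in this sketch, so a
document added to a source is visible to the next
query without any refresh step. An integrated
engine would instead answer from its stored index
and would see the new document only after the next
crawl.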
Resource requirements may be another advantage
of the federated search system. A federated search
system leverages distributed engines, and thus may
not require many resources at runtime. In contrast,
an integrated search system needs to frequently
build and store a local index, and thus may require
substantial resources to achieve acceptable
performance.
5 CONCLUDING REMARKS
In this paper, we present a general architecture and
compare two implementations of a service-based
information search system that searches across
multiple content repositories. To illustrate the
heterogeneity of the environment, we select as data
sources a combination of a Notes database, an IBM
DB2 Content Manager server that uses Web services
for its search interfaces, and a Windows file system.
The first implementation is a federated search
system that integrates and maps the various schemas
of the sources into a common interface. This system
leverages the search capabilities of the native
sources, and thus offers the advantages of
scalability and access to real-time information. It
is simple to implement and does not require many
resources at runtime. However, it cannot rank
search results relative to one another, is
ineffective at eliminating duplicates, and cannot
find document attachments in Notes forms from Notes
databases.
The second implementation is an integrated search
system. This system uses crawlers and indexers to
collect and analyze information from the different
sources into a single index. It can rank documents
relative to one another and eliminate duplicates in
search results. However, its scalability is
limited, and it may require substantial resources
to achieve acceptable performance while frequently
refreshing an ever-growing index.
ICEIS 2007 - International Conference on Enterprise Information Systems