A SEMANTIC SEARCH ENGINE FOR A BUSINESS NETWORK
A Personalized Vision of the Web applied to a Business Network
Angioni Manuela, Emanuela De Vita, Lai Cristian, Marcialis Ivan, Paddeu Gavino
and Tuveri Franco
CRS4, Center of Advanced Studies, Research and Develepment in Sardinia, Parco Scientifico e Tecnologico
Ed.109010 Pula (CA), Italy
Keywords: NLP, Text Categorization, User Profiling, Semantic Search Engine.
Abstract: The Web’s evolution during the last few years shows that the advantages from the users’ point of view are
not so macroscopic. Despite information is still the primal element, is ever more evident the need to redefine
the information paradigm so that the net and the information can become really user-centric by an inverse
process that brings the information to the user and not more the user to information. Define new tools is
needed to create a privileged window of observation on information and knowledge: each user with his
specific interest. Not more a single available space of information but shared data for everyone. What each
user needs is a specific private space of information according to his point of view, his way to classify and
manage the information, related to his network of contacts in the way each person choose to live the Web,
the net and the knowledge. In this paper we illustrate a part of a project named A Semantic Search Engine
for a Business Network where the introduction of Natural Languages, user profiling, automatic information
classification according to users’ personal schemas will contribute to redefine the vision of information and
delineate processes of Human-Machine Interaction.
1 INTRODUCTION
The way of interacting and the modality to access
the information is continuously changing. It is going
more and more toward tools able to follow and assist
the user in its networking activities through the use
of technologies related to natural languages, the
classification of the information and the user profile
(Marcialis and De Vita, 2008). In this scenario the
changes carried out by the great innovators in the
field of information processing are emerging.
Google is still the frontier of search engines, but
several efforts have been completed in order to
exceed it, such as Bing, who has obtained good
results regarding search suggestions and research
results with natural language.
Several attempts to reduce the time consuming of
online searches have been proposed and tested
through meta search engines that simultaneously
search on more search engines or with new features
specialized in searching on social network (Mislove,
2006).
The introduction of the query in natural language
is a common element that is already prefiguring the
advent of the Web 3.0 with tools such as the
computational knowledge engine Wolfram Alpha,
able to answer queries by means of a vast repository
of data organized with the help of sophisticated
Natural Language Processing algorithms or
Aardvark (Horowits and Kamvar, 2010) that allows
users, experts on certain topics, to answer to queries
made by other users in a more efficient mechanism
for online search. Another example is Twine
(Wissner and Spivack, 2009), able to improve the
relevance of results by means of filters that try to
reduce the noise due to less relevant answers.
The passage from the unstructured to the
structured information through the use of ontologies
has not produced the expected innovation in search
engines due to the lack of tagged resources.
New tools able to reduce or even to eliminate the
search phase performed by the user are needed, but
certainly commercial search engines, that make
profit by the number of access to their pages, are not
interested in produce them.
The rethinking of search engines involves the
emerging of some questions about the method of
search through repeated queries and their successive
475
Manuela A., De Vita E., Cristian L., Ivan M., Gavino P. and Franco T. (2010).
A SEMANTIC SEARCH ENGINE FOR A BUSINESS NETWORK - A Personalized Vision of the Web applied to a Business Network.
In Proceedings of the 5th International Conference on Software and Data Technologies, pages 475-480
DOI: 10.5220/0003040304750480
Copyright
c
SciTePress
refinement. Someone thinks that search engine
should be considered “only a primitive form of
decision support” (Spivack, 2010). So, the vision of
a Web where search engines are able to provide
results without direct questions from users,
anticipating their needs, could be now plausible. A
Web in service of the user, automatically informed
by the system with suggested resources related with
his life style and his common behaviour without the
need to ask for them.
In this paper we illustrate a section of a project
named A Semantic Search Engine for a Business
Network where some of the ideas previously
described are applied. It involves the development of
a business network able to create a point of contact
between the academic and research world in general
and the productive one, with the aim of encourage
the cooperation and the sharing of ideas, of different
point of views, information material or needs, and in
order to support the productive world and decision-
making connected with it.
The infrastructure will be designed as a
distributed architecture, with regard to information
and existing content and by the development of tools
thought in order to put users into the center of
information, giving them a privileged window of
observation on information and knowledge applied
to a specific application field.
The remainder of the paper is organized as
follows: Section 2 describes the project in a general
way, while Section 3 discusses in a detailed way the
above questions. Finally, Section 4 draws
conclusions.
2 GOALS OF THE PROJECT
In order to focus better on purposes and objectives
of the project, some considerations are required.
Web is changing. The way to access the
information is not the same of some years ago.
Social networks, blogs, RSS and new features in
search engines are all news in the ICT context if
compared with some years ago. The trend,
hopefully, is the definition of new tools developed in
order to follow the user in his activity and support
him with the automatic generation and delivery of
contents without his explicit request and according
to his interest. The Web depending on user needs
and interests. Not more a single available and shared
space of information for everyone, but a specific
private space of information available according to
the user point of view, his way to organize, classify
and manage information, related to his network of
contacts in the way each person choose to live the
Web, Internet and the knowledge. Currently, the
management of information is a key question in the
Web.
The automatic categorization of information
through a predefined taxonomy, organized in a
hierarchical category system, is often a restrictive
and forced path. The same resource could be
classified in different way from different people and
the same user could place the same page under
different categories according to the reading context
or to the content he is interested in. The
classification of a document is, as well, depending
by the personal culture, experience and context of
life. Moreover, documents are often realized using
heterogeneous contents, talk about several topics and
are obviously related to several categories.
Otherwise, with the Web 2.0, folksonomy, social
tagging and social bookmarking place the user as
start point in a categorization work where each user
labels resource. This step moves from a hierarchical
logic to a more simpler way where all tags are at the
same level.
Passing from the user management of
information to an automatic one, a classification
system should be able to categorize information
according to user preferences and to relate his
classification to a common set of categories based
on a predefined taxonomy.
By means of a such categorization tool, each user
manages in a personal way his bookmarks, accedes
to a quantity of Web sites, about scientific, news,
entertainment or other topics, selecting, choosing
and categorizing through the system. The system is
able to manage a flow of information coming from a
big set of predefined channels and updatable
depending on the user preferences. Channels should
be social networks, blogs, RSS services, news
services, Web sites and search engines too, selected
by the user. The system categorizes information
from these channels delivering contents that meet
user preferences by means of a match algorithm
based on user profile and document classification.
The user can see categories associated to each
resource labelled and ordered according his schema.
The vision of the Web and of search engines, as
described below, is applied to a project in starting
phase and will converge in a system able to support
and follow users in their activities. In particular the
idea behind the project is the realization of a
business network able to guarantee the match and
the cooperation of academic and research world
with the productive one in order to sustain related
production and decisional processes.
ICSOFT 2010 - 5th International Conference on Software and Data Technologies
476
3 THE CLIENT APPLICATION
The semantic search engine is designed as a support
tool for the user, an active assistant able to give in
"real time" references for the use of the information,
reporting as more interesting the information that
might match with the personal interest specified in
the user profile.
In the project have been identified two types of
main users: the company and the generic user,
including employees, researchers, professionals, and
in general people having specific interests and skills,
according to the resources associated with them and
emerging by their daily activities, that the system is
able to track.
The business network is a point of contact
between the academic and research world in general
and the productive one. The aim is to encourage the
cooperation and the sharing of ideas, of different
point of views, information material or needs, and to
support the productive world and decision-making
connected with it.
The system manages the user profile in order to
control how the user preferences evolve during
sessions of work. Information is monitored at time
interval and new sessions can modify user
preferences. The system starts with a predefined user
profile and evolves subsequently, using text
categorization tools in order to categorize resources
that are actually read, saved, commented. Only in
these cases the system will modify the user criterion
of classification for subsequently analysis.
The system follows step by step the evolution of
user interests and suggests him, through the analysis
of his profile, topics of interest, documents, contacts,
etc, according to his interests. Moreover, the system
is able to associate user profiles to companies or
project profiles, automatically generating in real-
time networks of expertise based on several
configurable parameters and requirements.
Figure 1 shows a general description of the client
application and the flow of the data coming from
several distributed sources, such as social networks,
blogs, RSS pages, visited Web pages, etc. The client
side of the system is composed by four modules: the
User Profiling Module, the Collaborative Filtering
and Recommendation System Module, the
Classifier and the Matching Module, each
responsible of the functionalities described below.
The level of communication between the modules
and the distributed information is regulated by a
layer that receives the data coming from the sources,
and after an analysis and an opportune elaboration,
is able to deliver to each module the portion of
Figure 1: The client application.
information that they are able to manage. Each
module performs his activity, sometimes
collaborating with other modules, and the result of
the process is saved on a database.
The interface allows queries in natural language
and presents results according to the user profile and
preferences.
More details of the modules involved in the
system are described below.
3.1 The Business Network
The business network defines a communication level
between users belonging to a community. The
business network facilitates the sharing of
knowledge, ability, expertise, skills, interests and
resources between users belonging to the community
that need or are interested in specific topics. In fact,
it is not always easy to rise these feature, especially
the immaterial expertise. But even publications or
ongoing or past projects in which someone is
involved, are often dispersed between public
databases, or can be found only in the intranet of
each company, or sometimes exists only in the head
of someone, and it is not easy to explicit them. All
the members of the community are linked together
by the net of their skills: they are both depository of
expertise in the service of users who need it: on the
other hand they can need skills (papers, suggestions,
projects, contacts) that other members can make
available. This can be achieved with the
development of an application, running on the
computer of the user, that filters his activities and
modifies his status, walls and links of the social
network that the user subscribed, according to his
permissions. Simultaneously records the activities
A SEMANTIC SEARCH ENGINE FOR A BUSINESS NETWORK - A Personalized Vision of the Web applied to a
Business Network
477
on the user database.
The application shares this data with the other
users that subscribed the community so that each
user, according to the settings and the permissions,
should know which resources have been visited,
from whom and when. The application
communicates these information to a plug-in
installed on the user's browser that alerts the user
and updates the visualization of information
according to his preferences.
3.2 Management of the Social Network
As said before, users involved in the project are
organized as a community, configured according to
their activities, through the management and by
reporting organized content, information
dynamically updated and personalized according to
the specific user profile.
The system will provide access to sources of
shared documentation, to monitoring data, to support
tools for sharing information between users, to
networks of contacts explicitly specified in the
community.
The architecture of the general platform is still
under discussion.
3.3 Search Engine Module
The search engine module is contained in the Data
Management Module, still under definition. The
search engine indexes information coming from data
sources and manages information related to the
users, communities, companies, events, etc.
3.4 User Profiling
User profiling is a crucial process of the system
because it has to define the user's interest, allowing
the collaborative filtering and the recommendation
tools to select and send information useful for the
user itself. The module is able to classify and
manage user information through the analysis of the
resources he visited: the registration to rss resources,
blogs, to social networks and the associated map of
contacts, the collection of feedback, etc.
A profile for both users, companies and
researchers, is defined creating in such a way a
history depending on their activities and behaviour.
So, the system will be able to identify user
requirements and to predict its future behaviour and
interests, in order to automatically propose resources
useful to its activities without the need to search for
them.
Data collected in this way are used by the system to
find similarities, complementarities and links
between companies and researchers, thus facilitating
the match between supply and demand, particularly
for intangibles such as interest, expertise, know-
how.
The user should be able to access to its profile in
order to check the reliability of the image that the
system is bringing out, providing a positive or
negative feedback to the matching proposed by the
system.
3.5 Collaborative Filtering
and Recommendation System
During his activities, the user is supported by a
module that helps him through two very important
features: a collaborative filtering (De Vita et al.,
2008) and a recommendation system. This module
filters information by means of parameters based on
the user preferences and his profile and gives advice
to the user for news regarding communities and
network activities that should be of interest. Advices
are about:
New activities
Users having similar interests
Companies having similar profile
Researchers having similar profile (based on
their curriculum vitae)
Events of the network: workshop, conferences
Documents, papers, notes, projects, reviews
classified that match users interests.
Announcements of competition, calls, etc
By means of the indications given by the user to the
system it is possible to refine the profile.
3.6 Data Categorization
The system, with the user profile module, compares
user profiles to company profiles through data
categorization. It matches similar profiles, compares
curricula of the user with request coming from
companies, filters news and contents coming from
the search engine working on the semantic of texts.
The classifier is based on a hierarchy of
categories proposed by WordNet Domains (Magnini
et al., 2002). These categories are the set of starting
used by the system for the text categorization of
resources. The user has the possibility to confirm the
categorization proposed or to redefine it with labels
not presents in the original taxonomy or to move the
resources to better defined values of the involved
categories. The user can also rename categories. The
ICSOFT 2010 - 5th International Conference on Software and Data Technologies
478
system needs to keep references with new names
and different values given by the user to resources.
In order to modify this kind of information a
feedback from the user is necessary. The user
instructs the system until the reach of a number of
documents big enough to be representative for each
category. This allows eventually to pass from a
semantic classifier to a faster statistical one.
The classifier performs a semantic
disambiguation through the identification of relation
between terms in order to identify composed terms,
word sense disambiguation, name entities,
geographic location.
The main phases are:
Parsing of the text of resources (Web pages,
documents, notes, etc)
Analysis and syntactic disambiguation (Sleator
and Temperley, 1993) (Liu, 2004)
Semantic disambiguation and identification of
real senses of words in sentences by means of a
density function (Addis et al., 2009)
Identification of name entities, geographic
locations
Classification of the textual resource by
categories and values (Angioni et al., 2008a)
Identification of semantic relations between
concepts (Angioni et al., 2008b)
3.7 Matching Module
The module is responsible to perform the matching
between the information coming from the several
data sources and by the users’ profile, identifying
those of real interest for each user.
It is able to organize data coming from users and
companies profile, managing the textual resources,
such as notes, papers, comments, profile data,
previously analyzed by the classifier and aggregate
the information.
Finally it send notifications to users and the
information as elaborated by the specific algorithm
of matching.
3.8 Data Sources
As we said, the system will be able to retrieve
information from several textual and multimedia
sources, and from Web services, even if
conditionally.
Figure 2 shows in a summary way the data
sources and a module named Information Wrapper
that uniforms data coming from data banks (DBLP,
ACM DL sites or institutional databases) and, under
particular conditions, from social networks.
Some sources such as news services, social
networks, blogs, RSS feeds, will be selected by the
user or they will be automatically proposed by the
system, by means of the preferences expressed by
default or defined by the user profile and by the
interests identified by the viewed pages.
Figure 2: The information wrapper.
Other content will consist of personal and corporate
profiles extracted from the HTML home pages,
abstracts of scientific publications, bibliographies
extracts from data banks.
Initially the contents are classified according to a
predefined taxonomy. Then the user requests,
aggregates and organizes the information coming
from the news services according to its interests.
The system therefore has to be able to manage
only the resources having a significant content for
the user, eliminating in such a way the redundancy
of the received information, the repetitions and the
duplications, and avoiding waste of time to read
unnecessary and irrelevant contents or to search
information among all the resources available on the
Web.
Finally, the system has to be able to add
"meaning" to the actions performed by the user,
creating an area of personalized and organized
information, a powerful guide able to predict its
tastes and needs.
4 CONCLUSIONS
The introduction of Natural Language Processing in
search engines, the user profiling, the automatic
classification of information according to the
personal schemas of the users are redefining the
vision of information on the Web and are delineating
new processes of Human-Machine Interaction.
Moreover the deployment of new services and
tools of the Web as Social Networks, RSS feed and
of new users’ supports based on NLP are defining
A SEMANTIC SEARCH ENGINE FOR A BUSINESS NETWORK - A Personalized Vision of the Web applied to a
Business Network
479
new evolutionary scenarios and creating new
expectations for the Web.
In this paper we illustrated a starting project
named A Semantic Search Engine for a Business
Network that defines a scenario where all above
tools converge in a system that, in our intention, will
put the user into the center of information giving
him a privileged window of observation on
information and knowledge.
The proposed approach aims at the development
of a business network able to create a bridge
between the academic and the research world in
general and the productive one, allowing a point of
contact between users’ needs on one hand and
available skills, expertise and ability on the other.
The project aims both at implement the features
described and at define and implement the described
scenario. A validation to support the value of the
expressed ideas will be one of the goal of the above
mentioned project, where experimental results will
be product.
REFERENCES
Addis, A., Angioni, M., Armano, G., Demontis, R.,
Tuveri, F., Vargiu, E., 2008. A Novel Semantic
Approach to Create Document Collections. In Antonio
Palma dos Reis, editor, Proceedings Of Intelligent
Systems And Agents Pages 53-60, 2008. IADIS Press.
Selected for the best paper award.
Angioni, M., Demontis, R., Tuveri, F., 2008a. A Semantic
Approach for Resource Cataloguing and Query
Resolution. Communications of SIWN. Special Issue
on Distributed Agent-based Retrieval Tools, 5: 62-66.
Angioni, M., Demontis R., Deriu, M., Tuveri, F., 2008b.
SemanticNet: a WordNet-based Tool for the
Navigation of Semantic Information. In A. Tanacs,
D.Csendes, V. Vincze, C. Fellbaum, and P. Vossen,
editors, Proceedings Of GWC. University of Szeged.
De Vita, E., Deriu, M., Marcialis, I., Paddeu, G., 2008.
Personalization and Collaborative Filtering for
Information Retrieval on the Web. Communications of
SIWN. Special Issue on Distributed Agent-based
Retrieval Tools, 5(-): 51-56.
Marcialis, I., De Vita, E., 2008. SEARCHY: An Agent to
Personalize Search Results. A. Mellouk, editor, Third
International Conference On Internet And Web
Applications And Services. Volume -. Pages 512-517.
IARIA. Institute of Electrical and Electronics
Engineers (IEEE). Authorized distributor of all IEEE
proceedings.
Horowits, D., Kamvar, S., 2010. The Anatomy of a Large-
Scale Social Search Engine. Submitted to WWW2010,
Raleigh, NC, USA.
Liu, H., 2004. MontyLingua: An end-to-end natural
language processor with common sense, viewed 30
March 2010,
<http://web.media.mit.edu/~hugo/montylingua>.
Magnini, B., Strapparava, C., Pezzulo, G., Gliozzo, A.,
2002. The Role of Domain Information in Word Sense
Disambiguation. Natural Language Engineering,
special issue on Word Sense Disambiguation, 8(4), pp.
359-373, Cambridge University Press.
Mislove, A., Gummadi, K., Druschel, P., 2006. Exploiting
Social network for Internet Search. In Proceedings of
the 5th Workshop on Hot Topics in Networks, Irvine,
CA.
Sleator, D. D., Temperley, D., 1993. Parsing English with
a Link Grammar. in Third International Workshop on
Parsing Technologies.
Spivack, N., 2010. Eliminating the Need for Search-Help
Engines, viewed 30 March 2010,
<http://www.novaspivack.com/uncategorized/eliminat
ing-the-need-to-search>
Wissner, J., Spivack, N., 2009. Case Study: Twine. In
W3C, Semantic Web Use Cases and Case Studies,
viewed 30 March 2010,
<http://www.w3.org/2001/sw/sweo/public/UseCases/T
wine>
ICSOFT 2010 - 5th International Conference on Software and Data Technologies
480