General search engines, such as Google and Yahoo,
provide full-text search of resources available on the
Internet. The problem with using general search
engines is that their efficiency is unpredictable.
Often, it takes much more time than expected to find
anything useful, and sufficient knowledge and
experience are required to come up with the right
keywords that will produce the expected search
results. Moreover, although there is a great amount
of resources to search, far fewer are of sufficient
quality. There are search engines
devoted to specific fields of interest. For example,
DBCLS is a search engine specialized in life science
(Codase, 2010). There are also web sites dedicated to
code sharing, called code search engine (CSE) sites,
such as Google Code Search (Google, 2010), Koders,
and Codase (Koders, 2010). CSE sites are customized
for sharing programs: users post programs along with
a tag that represents the programming language,
additional user-specified tags (keywords), and text.
However, apart from the language tag, these sites
basically perform a text search on the tags, text,
and program, and so suffer from the same efficiency
and quality problems as general search engines. One
type of CSE collects programming knowledge in the
form of code snippets (DZone, 2010; Snipplr, 2010).
These sites allow users to share small parts of
programs and to retrieve them as templates for
coding. However, even these sites do not provide
any additional means for supporting a more efficient
search.
We propose a system in which knowledge related
to program development can be shared efficiently
among a group of developers. The aim of the system
is to increase the productivity of program
development by providing access to a shared
repository of knowledge useful for developers. This
repository stores knowledge in units of logical
fragments. A logical fragment can represent varying
sizes of knowledge, from a few lines of code to a
few files of programs. The knowledge is not limited
to program code and can represent configuration
files, documents, etc. A logical fragment is annotated
by a set of tags that describe its function, called
function tags, and by a tag that describes its
environmental attributes, called the project tag.
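The annotation scheme described above can be illustrated with a minimal sketch. It is not the implementation of the proposed system: the names LogicalFragment and FragmentRepository, the field names, and the sample tags are hypothetical, and the matching rule (all requested function tags must be present, and the project tag must match exactly) is an assumption made only for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Set


@dataclass(frozen=True)
class LogicalFragment:
    """One unit of shared knowledge (hypothetical layout)."""
    name: str
    content: str                    # code, configuration, or document text
    function_tags: FrozenSet[str]   # tags describing what the fragment does
    project_tag: str                # tag describing its environmental attributes


class FragmentRepository:
    """In-memory stand-in for the shared repository (assumption)."""

    def __init__(self) -> None:
        self._fragments: List[LogicalFragment] = []

    def register(self, fragment: LogicalFragment) -> None:
        self._fragments.append(fragment)

    def find(self, function_tags: Set[str],
             project_tag: Optional[str] = None) -> List[LogicalFragment]:
        # Assumed rule: a fragment matches when it carries every requested
        # function tag and, if a project tag is given, the same project tag.
        return [
            f for f in self._fragments
            if function_tags <= f.function_tags
            and (project_tag is None or f.project_tag == project_tag)
        ]


repo = FragmentRepository()
repo.register(LogicalFragment(
    name="open_db_connection",
    content="conn = connect(host, user, password)",
    function_tags=frozenset({"database", "connect"}),
    project_tag="web-shop",
))
repo.register(LogicalFragment(
    name="logging_config",
    content="[logger]\nlevel = INFO",
    function_tags=frozenset({"logging", "configuration"}),
    project_tag="web-shop",
))
repo.register(LogicalFragment(
    name="batch_db_export",
    content="export_table(conn, 'orders')",
    function_tags=frozenset({"database", "export"}),
    project_tag="reporting",
))

# Retrieve database-related fragments registered for one project.
for fragment in repo.find({"database"}, project_tag="web-shop"):
    print(fragment.name)  # prints: open_db_connection
```

In this sketch the tags, rather than the fragment text, drive retrieval, which is the contrast with the plain text search performed by existing CSE sites.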
The paper is organized as follows: the next
section presents the background and direction of our
approach. The following section describes the
specific method used to register and retrieve
knowledge in our system. The final section provides
a summary and presents future work.
2 BACKGROUND
This research is concerned with knowledge
management within an organization to support
software engineering. We are especially interested in
increasing the productivity of programmers while
increasing the quality of the code that they produce.
Knowledge management is a means of solving
business challenges, and many methods have been
practiced (Alavi and Leidner, 2001). First-generation
systems focused on the documents created by users.
Second-generation systems focused on the people who
possess knowledge.
These systems were realized as web sites that
provide a public place where questions on particular
topics could be posted and people with the
knowledge could answer. A particular group of web
sites that support interaction and exchange of
knowledge between people sharing common
interests (social networking sites) was also
introduced. The first-generation systems suffered
from the high cost of accumulating information, and
the second-generation systems were difficult to
evaluate because their effects were hard to
visualize. More recently, methods have emerged that
try to evaluate the content each user possesses
and to connect them.
Existing CSEs can be classified into two types.
In the first type, the system parses files that were
registered together as one set belonging to a
project and searches over them. In the second type,
lines of code are registered individually as snippets.
In both types, in addition to the code, additional
information provided by users and the system is
subject to search. For example, types of
programming languages or licenses are used by
some systems. Based on the finding that many
programmers perform searches related to APIs
(application programming interfaces),
troubleshooting, implementation, development tools,
and language syntax and semantics, a system has been
proposed that assists problem solving by
automatically collecting and extracting significant
information from web pages with sample code, Java
archive (JAR) files, and Java documentation (JavaDoc)
pages (Thummalapenta and Xie, 2007). There are also
many approaches related to component-based
software engineering that aim to improve
programming efficiency and quality by reusing
software as components (Heineman and Councill,
2001).
Compared to current CSEs, the system we
propose introduces greater structure to the format
in which programs and related information are
ICSOFT 2010 - 5th International Conference on Software and Data Technologies