via OAI-PMH to harvest the available metadata.
In this scenario, the fact that an institution's
scientific articles end up scattered across scientific
journals constitutes a serious problem. As a result,
access to and identification of this scientific
knowledge by the community (and even by the institution
that produced it) are often hampered. Likewise,
institutions lack information about how much of the
teaching staff is aware that their scientific production
is available free of all access restrictions. A further
difficulty is identifying open access articles when
this information is not found in the metadata.
In the field of scientific publication, open access
refers to publications on the Internet that allow reading,
copying, distribution or re-use for lawful purposes
without technical, financial or legal barriers, while
also guaranteeing the author's moral and patrimonial
rights (Open Society Institute, 2002). The philosophy
behind open access reflects a trend, observed in recent
years, towards the use of tools, strategies and
methodologies to communicate new scientific research.
In this context, this article proposes a methodology
for loading into digital repositories the open access
articles identified through information extraction from
the curricula of an institution's researchers.
Brazilian researchers have their scientific production
registered in a national academic database, the
Lattes Platform (available at: http://lattes.cnpq.br/).
For implementation purposes, a study was carried out
in four parts: (1) gathering and processing metadata
from a database of researchers' curricula; (2)
developing a script to collect open access scientific
articles; (3) selecting software for loading and
converting metadata to the Dublin Core format;
and (4) populating a digital repository by importing
the acquired data. For the purposes of this study, the
IDR of a Brazilian institution was used.
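As an illustration of step (3), the following is a minimal
sketch of converting one article entry to a flat Dublin Core
record. It is not the software selected in this study: the
input field names, the mapping FIELD_TO_DC, the function
to_dublin_core and the example entry are all assumptions made
for the example; only the Dublin Core namespace URI is standard.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical mapping from curriculum field names to Dublin Core elements.
FIELD_TO_DC = {
    "title": "title",
    "authors": "creator",
    "year": "date",
    "journal": "source",
    "doi": "identifier",
}


def to_dublin_core(article: dict) -> ET.Element:
    """Build a flat Dublin Core record from one article entry."""
    ET.register_namespace("dc", DC_NS)
    record = ET.Element("record")
    for field, dc_name in FIELD_TO_DC.items():
        value = article.get(field)
        if value is None:
            continue  # skip fields missing from the curriculum entry
        # Repeatable values (e.g. several authors) become repeated elements.
        values = value if isinstance(value, list) else [value]
        for item in values:
            ET.SubElement(record, f"{{{DC_NS}}}{dc_name}").text = str(item)
    return record


# Invented entry, for illustration only.
article = {
    "title": "An example article",
    "authors": ["Silva, A.", "Souza, B."],
    "year": 2012,
    "doi": "10.1234/example",
}
record = to_dublin_core(article)
xml_text = ET.tostring(record, encoding="unicode")
```

Each author becomes its own dc:creator element, and fields absent
from the entry (here, the journal) are simply omitted from the
record.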
This work is organized as follows: the second section
discusses metadata, their characteristics and their
importance in indexing digital objects in digital
environments; the third section contextualizes persistent
identifiers such as DOIs and handles, their uses and
their purposes for the long-term preservation of digital
objects; the fourth section presents concepts related to
Digital Libraries, such as their history, importance
and characteristics; the fifth section describes the
proposed method and analyzes its application; and,
finally, the conclusion summarizes the main points of
this paper and presents suggestions for future work.
2 METADATA
Metadata are information related to a stored resource,
either physical or not, that not only identify and de-
scribe it, but also document its behavior, function and
use, as well as its relationship to other digital objects
and how it should be managed. “Metadata are struc-
tured in the form of text and keywords and gener-
ally contain direct information, such as author name,
creation date, subject, but can also be complex and
harder to define, as the opinion consensus of various
people on the same book” (Langiano, 2005). Thus,
metadata prove to be essential to facilitate discovery
of relevant content in digital libraries.
Furthermore, an item or object available in digital
media should survive successive generations of
hardware and software. Given this complexity and the
importance of designing metadata for digital objects,
a study proposed categorizing metadata into
five types (Baca, 1998):
• “Administrative: used in the management and ad-
ministration of information resources, such as ver-
sion control and copyright information;
• Technical: related to the operation or behavior
of system metadata, for example, scanning pro-
cesses;
• Descriptive: used to describe and identify re-
source information, for example, specialized in-
dexes and search aids;
• Preservation: related to the preservation of infor-
mation resources, for example, policies relating to
the backup of digital objects;
• Use: related to the level and type of use of infor-
mation resources.”
In this article, descriptive metadata are used for
identification of bibliographic content of scientific
works.
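The five categories above can be sketched as a small data
structure. This is only an illustration: the field names in
FIELD_TYPES, the helper select and the sample record are
hypothetical and are not taken from Baca (1998) or from the
repository used in this study.

```python
from enum import Enum


class MetadataType(Enum):
    """The five metadata categories listed above."""
    ADMINISTRATIVE = "administrative"
    TECHNICAL = "technical"
    DESCRIPTIVE = "descriptive"
    PRESERVATION = "preservation"
    USE = "use"


# Hypothetical field names, each tagged with one category.
FIELD_TYPES = {
    "title": MetadataType.DESCRIPTIVE,
    "author": MetadataType.DESCRIPTIVE,
    "subject": MetadataType.DESCRIPTIVE,
    "rights": MetadataType.ADMINISTRATIVE,
    "scan_resolution": MetadataType.TECHNICAL,
    "backup_policy": MetadataType.PRESERVATION,
    "download_count": MetadataType.USE,
}


def select(record: dict, wanted: MetadataType) -> dict:
    """Keep only the fields of a record that belong to one category."""
    return {k: v for k, v in record.items() if FIELD_TYPES.get(k) is wanted}


# Invented record, for illustration only.
record = {
    "title": "An example article",
    "author": "Silva, A.",
    "rights": "open access",
    "download_count": 17,
}
descriptive = select(record, MetadataType.DESCRIPTIVE)
```

Here select keeps only the title and author fields, which
mirrors the restriction to descriptive metadata adopted in this
article.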
2.1 Metadata Schema
Metadata schemas are sets of elements designed for
a specific purpose and used to describe an information
resource. The definitions or meanings of the elements
are known as the schema's semantics, and the
values of a given element are its contents. Metadata
schemas generally specify the names of elements and
the corresponding semantics (Sayão, 2007b). Metadata
should be carefully planned and support interoperability
with other digital libraries, hence facilitating
the location and use of digital objects. Metadata
schemas and metadata standards exist to enable the
effective sharing of resources between institutions and
users.
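The distinction between element names, semantics and contents
can be made concrete with a minimal sketch. The class
MetadataSchema, its method unknown_elements and the three-element
example schema are assumptions introduced for illustration; the
element meanings are informal paraphrases, not normative
definitions from any standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetadataSchema:
    """A schema: element names paired with their semantics."""
    name: str
    semantics: dict  # element name -> meaning of the element

    def unknown_elements(self, record: dict) -> list:
        """Return record keys that the schema does not define."""
        return sorted(k for k in record if k not in self.semantics)


# Hypothetical schema with informally worded semantics.
EXAMPLE_SCHEMA = MetadataSchema(
    name="example",
    semantics={
        "title": "name given to the resource",
        "creator": "entity responsible for making the resource",
        "date": "date associated with the resource",
    },
)

# The values in a record are the contents of its elements.
record = {
    "title": "An example article",
    "creator": "Silva, A.",
    "pages": "10-20",
}
unknown = EXAMPLE_SCHEMA.unknown_elements(record)
```

A record that uses an element outside the schema (here, pages)
is flagged, which is the kind of check that helps keep records
interoperable between repositories sharing the same schema.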