(ZBW) is considering the opportunity of extending
its repository by including scientific blog posts from
the domain of economics and offering them to its users
alongside the standard research publications. Science
bloggers seek to make the most of DLs’ dissemination
channels and reach a wider audience and greater
visibility, whereas DLs seek to complement their
collections and increase the value they offer to their
users. Given the growing number of scientific blog
contributions and their adoption in the scientific
workflow, this is the single most important motivation
for this study.
2.2 Use Cases
Following are the main use cases that motivated our
research:
(i) Heterogeneous data integration: Blog collections
do not adhere to a standardized metadata structure
and often rely on vocabularies different from the
ones adopted by DLs. In such a situation, a user
interested in both DL and scientific blog resources
would have to query these collections separately,
using different vocabulary terms for each. Thus,
there is an opportunity to alleviate this situation and
combine these collections in a uniform “query space”.
The metadata of EconStor, our DL of choice, are already
structured and represented in RDF. As a framework,
RDF is well equipped to handle the combination
of heterogeneous resources.
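As a minimal sketch of the uniform “query space” idea, the following Python snippet harmonizes two small collections expressed as (subject, predicate, object) triples and queries them together. The sample records, the `tag` predicate, the mapping table, and the term normalization are illustrative assumptions, not the actual EconStor data or the mapping used in this work.

```python
# Hypothetical sample data; identifiers and terms are illustrative only.
DL_TRIPLES = [
    ("dl:paper1", "dc:title", "Monetary policy after the crisis"),
    ("dl:paper1", "dc:subject", "monetary policy"),
]
BLOG_TRIPLES = [
    ("blog:post7", "dc:title", "Why central banks matter"),
    ("blog:post7", "tag", "central-banking"),
    ("blog:post7", "tag", "Monetary Policy"),
]

# Assumed mapping from blog-specific predicates to the shared vocabulary.
PREDICATE_MAP = {"tag": "dc:subject"}


def normalize_term(term):
    """Lower-case a term and replace hyphens with spaces."""
    return term.replace("-", " ").strip().lower()


def harmonize(triples):
    """Rewrite predicates to the shared vocabulary and normalize subject terms."""
    result = []
    for s, p, o in triples:
        p = PREDICATE_MAP.get(p, p)
        if p == "dc:subject":
            o = normalize_term(o)
        result.append((s, p, o))
    return result


def query_by_subject(graph, term):
    """Return the sorted ids of all resources annotated with the subject term."""
    wanted = normalize_term(term)
    return sorted({s for s, p, o in graph if p == "dc:subject" and o == wanted})


# One combined graph serves as the single query space for both collections.
GRAPH = harmonize(DL_TRIPLES) + harmonize(BLOG_TRIPLES)
print(query_by_subject(GRAPH, "monetary policy"))
```

A single call now returns matching resources from both collections, so the user no longer needs to know which vocabulary each source uses. In practice an RDF store and SPARQL would replace the plain lists used here.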
(ii) Semantic annotation of blog posts: More and
more resources are being published as LOD, benefiting
both publishers and consumers of those resources.
Making their resources machine-understandable enables
publishers to increase their audience reach, including
additional (re)use by software agents. On the
other hand, LOD publishing enables consumers to
(re)use these resources in new scenarios not originally
covered by publishers. A final and important note
at this point: if scientific blog post publishers
are interested in making their content available in this
way, they should not face or be concerned with any
technological barriers in the process.
(iii) Dataset profiling: Linking scientific blog
resources with relevant entries in external collections
and knowledge bases (KB) indexed using different
controlled vocabularies (CV) – thesauri and classification
schemes, in our case – is another value-adding
step for the end user. In this way, depending on the
external resource(s) it links to, we provide different
“profiles” for every blog post in our collection, enabling
a more elaborate and richer (search) experience
for the user. The user can potentially benefit from
related resources coming from different disciplines
(economics, social sciences, or agriculture, in our case),
retrieve additional information (and context) from a
KB such as DBpedia, include relevant resources from
a German-specific or an international, multilingual
collection, etc.
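The notion of per-vocabulary “profiles” can be illustrated with a short Python sketch. It assumes that each controlled vocabulary is available as a table of concept identifiers and labels and that linking is done by exact, case-insensitive label matching; the vocabulary names, concept identifiers, and labels are hypothetical, and the matching strategy is a simplification rather than the method used in this work.

```python
# Hypothetical controlled vocabularies: concept id -> set of (lower-case) labels.
VOCABULARIES = {
    "economics_thesaurus": {
        "econ:101": {"inflation", "price increase"},
        "econ:202": {"labour market", "labor market"},
    },
    "agriculture_thesaurus": {
        "agri:9": {"food prices"},
        "agri:12": {"inflation"},
    },
}


def build_profiles(keywords, vocabularies):
    """Return {vocabulary name: sorted concept ids} for the matched keywords."""
    normalized = {k.strip().lower() for k in keywords}
    profiles = {}
    for name, concepts in vocabularies.items():
        # A concept matches when it shares at least one label with the post.
        matches = sorted(
            cid for cid, labels in concepts.items() if labels & normalized
        )
        if matches:
            profiles[name] = matches
    return profiles


post_keywords = ["Inflation", "Labor market"]
print(build_profiles(post_keywords, VOCABULARIES))
```

Each key of the returned dictionary corresponds to one profile of the blog post, that is, its links into one external vocabulary.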
(iv) Dataset analysis: Summarizing datasets by
offering useful statistics as exploration tips for users
is quite important. This holds especially for large
datasets that could prove challenging for users to
comprehend (e.g., when identifying resources with
certain features, closer to their area of interest). Highly
commented/discussed blog posts (expressed via user
comments, shares, etc.), the most featured/covered
subjects in the collection, “trending” subjects for a given
period of time (based on the number of blog posts
for a given subject), top contributing authors (or
“expert groups”) per subject/topic (based on the number
of blog posts that an author has for a given subject),
or, in the context of aligned CVs of different (linked)
datasets, relevant publications by authors in external
KBs, are just some of the available analysis options.
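Two of the statistics listed above, “trending” subjects for a period and top contributing authors per subject, can be sketched with simple counting. The record layout, field names, and sample posts below are assumptions made for illustration only.

```python
from collections import Counter, defaultdict
from datetime import date

# Hypothetical blog post records.
POSTS = [
    {"author": "A. Smith", "subjects": ["inflation"],
     "published": date(2014, 1, 5)},
    {"author": "A. Smith", "subjects": ["inflation", "trade"],
     "published": date(2014, 1, 20)},
    {"author": "B. Jones", "subjects": ["trade"],
     "published": date(2014, 2, 3)},
    {"author": "B. Jones", "subjects": ["inflation", "trade"],
     "published": date(2014, 3, 11)},
]


def trending_subjects(posts, start, end, top_n=3):
    """Rank subjects by the number of posts published within [start, end]."""
    counts = Counter(
        subject
        for post in posts
        if start <= post["published"] <= end
        for subject in post["subjects"]
    )
    return counts.most_common(top_n)


def top_authors_per_subject(posts, top_n=1):
    """For each subject, rank authors by their number of posts on it."""
    per_subject = defaultdict(Counter)
    for post in posts:
        for subject in post["subjects"]:
            per_subject[subject][post["author"]] += 1
    return {s: c.most_common(top_n) for s, c in per_subject.items()}


print(trending_subjects(POSTS, date(2014, 1, 1), date(2014, 1, 31)))
print(top_authors_per_subject(POSTS))
```

Counting posts is only the simplest possible signal; comment or share counts could be added to the same records to rank highly discussed posts in the same way.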
3 RELATED WORK
A considerable amount of diverse research has
been done on mapping relational data (RDB) to a graph
representation, as well as on publishing and integrating
heterogeneous collections, including social web data.
(Auer et al., 2010) present common motivations for
representing RDB in the Resource Description Framework
(RDF) data model; the use cases for integrating
RDB with structured sources or existing RDF on the
web (Linked Data) correspond to a great extent to
our motivation for this work. Motivated by semantic
annotation of dynamic web pages and mass generation
of LOD, (Spanos and Mitrou, 2012) survey
the proposed approaches for mapping and integrating
RDB content to/with that published as LOD, whereas
(Powell et al., 2010) demonstrate fusing library and
non-library data from disparate resources, relying on
RDF as a common data model and using graph-based
analysis and visualization to generate useful information
on the resulting data.
Some contributions focus on reusing and enriching
social (user-generated) content from external
resources, such as the LOD Cloud, for example:
(Holgersen et al., 2012) in their research consider
bibliographic-related data (in the form of comments
and ratings for books); (Hu et al., 2013) focus on
social content from publication submission and re-
view process of a journal, in the form of review-
ers’ comments, editors’ decisions, author replies, etc.;
whereas (Passant et al., 2010) reuse collaboratively-
built knowledge in the enterprise, contained in differ-