animals, architecture, art, asia, australia, autumn, baby, band (C),
barcelona, beach, berlin, bike (C), bird, birds, birthday, black,
blackandwhite (D), blue, bw (C), california, canada, canon (B), car,
cat, chicago, china, christmas, church (B), city, clouds, color,
concert, dance, day, de (C), dog, england, europe, fall, family,
fashion, festival, film, florida, flower, flowers, food, football,
france, friends (B), fun, garden, geotagged, germany, girl, graffiti,
green, halloween, hawaii, holiday, house, india, instagramapp (D),
iphone, iphoneography (D), island, italia, italy, japan, kids (C),
la (C), lake, landscape, light, live (B), london, love, macro (B),
me (D), mexico, model (B), museum, music, nature, new (C), newyork,
newyorkcity (D), night, nikon, nyc, ocean, old (C), paris, park, party,
people, photo, photography, photos, portrait, raw (B), red, river,
rock (B), san (C), sanfrancisco (D), scotland, sea, seattle, show (C),
sky, snow, spain, spring (C), square (B), squareformat (D), street,
summer, sun, sunset, taiwan, texas, thailand, tokyo, travel, tree,
trees, trip (C), uk, unitedstates (D), urban (B), usa, vacation,
vintage (C), washington (C), water, wedding, white, winter, woman,
yellow, zoo
Figure 6: By default, 110 out of 142 popular Flickr tags (77.5%) are
mapped correctly to a valid DBpedia resource through TagNet (A score).
Tags that need additional attention to resolve ambiguity are labeled
with a B, C or D score (see table 1).
and marked on figure 6. It shows that 110 tags received an A score,
meaning that they were correctly mapped to a corresponding DBpedia
resource in the first hit. For example, the tag fall is resolved into
http://dbpedia.org/resource/Autumn and nyc maps to
http://dbpedia.org/resource/New_York_City. To denote the connection
with DBpedia, we added the db prefix to tags with an A score so that
TagNet knows which repository to use to map the tag to a URI.
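The A-score mapping can be illustrated with a minimal sketch. The helper names (`dbpedia_uri`, `resolve`) and the lookup table are assumptions made for illustration and are not part of TagNet; only the two tag-to-resource pairs come from the text. The sketch relies on the DBpedia convention that spaces in a resource label become underscores in its URI.

```python
from typing import Optional

DBPEDIA_BASE = "http://dbpedia.org/resource/"

# Hypothetical first-hit table: only these two pairs are taken from the text.
FIRST_HIT = {"fall": "Autumn", "nyc": "New York City"}


def dbpedia_uri(label: str) -> str:
    """Build a DBpedia resource URI from a label (spaces become underscores)."""
    return DBPEDIA_BASE + label.replace(" ", "_")


def resolve(tag: str) -> Optional[str]:
    """Return the URI of the first-hit resource for a tag, or None if unknown."""
    label = FIRST_HIT.get(tag)
    return dbpedia_uri(label) if label is not None else None


print(resolve("fall"))
print(resolve("nyc"))
```

In a real deployment the table would be replaced by a query against the repository selected by the tag's prefix.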
For tags with a B score, additional detail should be added to overcome
ambiguity. For instance, canon is primarily known by DBpedia as a city
in Georgia, a priest, a list of topics related to Dutch history, etc.,
while in the context of photography it refers to a company specialized
in the manufacturing of imaging and optical products. From the related
DBpedia resource (http://dbpedia.org/resource/Canon_(company)), two
isas can be extracted (company and organisation), which give rise to a
sematag db:canon||company,organisation (encoded in a Flickr-compatible
format as discussed in section 5.1) that uniquely identifies the
resource in TagNet. Similarly, friends is recognized by default as a
sitcom, while the resource http://dbpedia.org/resource/Friendship is
actually the best match for this tag’s meaning. The Friendship resource
has no specific isas, but by including its name as an alias to a friend
tag (i.e. friend|db:friendship), TagNet can distinguish between the
different senses. As such, each DBpedia resource can be described
unambiguously by a human-understandable sematag that can be
dereferenced to a URI via TagNet and vice versa.
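The two sematags above suggest a pipe-separated layout of the form name|aliases|isas, with commas inside each field. The sketch below parses and re-serializes that layout. This field order is an assumption inferred from the two examples only; the authoritative format is the one defined in section 5.1, and the names `Sematag`, `parse_sematag` and `format_sematag` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Sematag:
    name: str
    aliases: List[str] = field(default_factory=list)
    isas: List[str] = field(default_factory=list)


def _split_list(part: str) -> List[str]:
    """Split a comma-separated field, dropping empty items."""
    return [item for item in part.split(",") if item]


def parse_sematag(text: str) -> Sematag:
    """Parse 'name|aliases|isas' (assumed layout); missing fields are empty."""
    parts = text.split("|")
    parts += [""] * (3 - len(parts))  # pad so that three fields always exist
    name, aliases, isas = parts[:3]
    return Sematag(name, _split_list(aliases), _split_list(isas))


def format_sematag(tag: Sematag) -> str:
    """Serialize back to the assumed layout, omitting trailing empty fields."""
    fields = [tag.name, ",".join(tag.aliases), ",".join(tag.isas)]
    while len(fields) > 1 and fields[-1] == "":
        fields.pop()
    return "|".join(fields)


canon = parse_sematag("db:canon||company,organisation")
friend = parse_sematag("friend|db:friendship")
print(canon)
print(friend)
```

Under this reading, the empty middle field in db:canon||company,organisation means "no aliases", which is why the two pipes appear back to back.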
Note that we prefer to augment a tag with isas (if
available) over aliases derived from a resource’s label
since these specific aliases often tend to be spelling
variants of the tag name or informally refer to its isas.
For instance, an alias of canon in the sense of the
Japanese multinational would be canon (company).
Tags with a C or D score need extra attention: either no resources
with a matching name exist in DBpedia, or the matching resources are
not in line with the meaning of the tag. This leaves us with two
options: i) look up the tag in a secondary repository or ii) replace
the tag by a similar tag or add A-rated aliases or isas to the tag. By
relying on WordNet as secondary repository, seven more tags (band,
bike, kids, new, old, show, washington) were attributed an A or B score
and thus could be upgraded to sematags with a wn: prefix (e.g. wn:bike).
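Option i) amounts to trying the repositories in a fixed order and prefixing the tag with the first one that knows it. The sketch below uses toy sets as stand-ins for DBpedia and WordNet; their contents, and the name `to_prefixed_tag`, are illustrative assumptions rather than TagNet's actual lookup.

```python
from typing import Optional

# Toy stand-ins for the two repositories; the contents are illustrative only.
DBPEDIA = {"fall", "nyc", "canon"}
WORDNET = {"band", "bike", "kids", "new", "old", "show", "washington"}

# Primary repository first, secondary repository as fallback.
REPOSITORIES = [("db", DBPEDIA), ("wn", WORDNET)]


def to_prefixed_tag(tag: str) -> Optional[str]:
    """Return 'prefix:tag' for the first repository that knows the tag."""
    for prefix, known_tags in REPOSITORIES:
        if tag in known_tags:
            return f"{prefix}:{tag}"
    return None  # unresolved: needs a manual alias or isa (option ii)


print(to_prefixed_tag("fall"))
print(to_prefixed_tag("bike"))
print(to_prefixed_tag("blackandwhite"))
```

Tags for which the function returns None correspond to the cases that require option ii).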
To clarify the semantics of the remaining 15 tags, we have to find at
least one meaningful isa or alias for each tag. For instance, the tags
me and iphoneography can be annotated with db:person and db:blog isas
respectively. Tags like newyorkcity and sanfrancisco need an alias that
is spelled differently (e.g. db:nyc, db:san_francisco) or must be
replaced by this alias, while the tag instagramapp can be understood
using an instagram alias.
Tags like blackandwhite (and by extension also typical Twitter
hashtags such as #savetheplanet) are more difficult to map to linked
data as they denote a very specific property, a state of mind or an
expression which is hard to describe using formal semantics. In
summary, we showed that 127 out of 142 random tags (89.4%) could be
mapped with minimal effort to known concepts using DBpedia as primary
and WordNet as secondary vocabulary. By systematically enriching a tag
with additional tags, a (sema)tag becomes an alternate notation for a
URI that scales better to tag-based systems like Flickr, as it is human
readable and supports free text queries (including synonym and
hypernym matching).
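The reported percentages can be checked with a few lines of arithmetic. The decomposition of the 127 mapped tags into A-scored DBpedia tags, B-scored DBpedia tags and WordNet tags is one reading of the counts given in the text and the labels in figure 6 (ten tags carry a B label); the source does not spell this sum out, so treat it as an assumption.

```python
total_tags = 142
a_score_dbpedia = 110  # mapped correctly in the first hit (from the text)
b_score_dbpedia = 10   # tags labeled B in figure 6 (counted, an assumption)
wordnet_resolved = 7   # band, bike, kids, new, old, show, washington

mapped = a_score_dbpedia + b_score_dbpedia + wordnet_resolved
remaining = total_tags - mapped

a_share = f"{a_score_dbpedia / total_tags:.1%}"
mapped_share = f"{mapped / total_tags:.1%}"

print(mapped, remaining)  # 127 15
print(a_share)            # 77.5%
print(mapped_share)       # 89.4%
```

The 15 remaining tags match the number of tags said to need a manually chosen isa or alias.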
6 ARCHITECTURE AND IMPLEMENTATION
TagNet is developed as a Java Web application, using Servlet
technology in the back-end and AJAX technology in the front-end.
It offers a REST API and
TagNet: Using Soft Semantics to Search the Web