feature extraction and the annotation algorithm used
for label assignment. Section 4 reports the results
of evaluation on the Li photography, IAPR-TC12 and
ESP datasets. Section 5 concludes the paper and gives
some possible directions for further research.
2 RELATED WORK
AIA approaches can be divided into three groups:
generative, discriminative and nearest-neighbour
models (Murthy et al., 2015). We concentrate
mainly on the third group because it is the most
relevant to our work. We also mention some
methods utilized in CBIR that are pertinent to our
paper.
One of the earliest approaches based on texture
features was proposed in (Manjunath and Ma, 1996).
The authors propose using Gabor wavelets to
construct features for texture representation. A
database of 116 texture classes is used for
evaluation, and it is shown that Gabor wavelets
outperform previously reported results achieved
with other types of wavelets.
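Manjunath and Ma characterize texture by the statistics of Gabor filter responses. The sketch below is a minimal NumPy illustration of that idea; the kernel size, wavelengths and orientation count are illustrative choices, not the settings from the cited paper:

```python
import numpy as np

def gabor_kernel(size, theta, wavelength, sigma):
    # Real part of a Gabor filter: a Gaussian envelope modulated by a cosine.
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x ** 2 + y ** 2) / (2.0 * sigma ** 2))
    return envelope * np.cos(2.0 * np.pi * xr / wavelength)

def filter_valid(image, kernel):
    # 'Valid' 2-D filtering via sliding windows (fine for small images).
    windows = np.lib.stride_tricks.sliding_window_view(image, kernel.shape)
    return np.einsum('ijkl,kl->ij', windows, kernel)

def gabor_features(image, wavelengths=(4.0, 8.0), n_orientations=4):
    # Mean and standard deviation of each filter response form the
    # feature vector, in the spirit of Manjunath and Ma's descriptor.
    feats = []
    for lam in wavelengths:
        for k in range(n_orientations):
            kern = gabor_kernel(15, k * np.pi / n_orientations, lam, sigma=lam / 2.0)
            resp = filter_valid(image, kern)
            feats.extend([resp.mean(), resp.std()])
    return np.array(feats)
```

With two wavelengths and four orientations this yields a 16-dimensional feature vector per image.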
An interesting system called “SIMPLIcity” was
presented in (Wang et al., 2001). The image features
are extracted using wavelet-based methods, and the
system classifies images into semantic categories
such as “textured” and “photograph”. This
classification is intended to enhance image retrieval
performance.
Blei and Jordan (Blei and Jordan, 2003) proposed
the “Corr-LDA” method that is based on latent Dirich-
let allocation (LDA) (Blei et al., 2003). It is a gen-
eral method that can handle various types of annotated
data. It is built upon a probabilistic model representing
the correspondence between data and associated labels.
An approach based on LBP features was proposed
in (Tian et al., 2008). Histograms of LBP values are
created in this method, and support vector machines
(SVMs) are used as the classifier. The method is
applied to the categorization and annotation of
medical images.
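The basic LBP operator, also used later in Section 3.1.1, thresholds the eight neighbours of a pixel against its centre value and reads the result as one byte. A minimal sketch (the clockwise neighbour ordering here is one common convention, not necessarily the one used in the cited works):

```python
import numpy as np

def lbp_value(patch):
    # Basic LBP code of the centre pixel of a 3x3 patch: each neighbour
    # greater than or equal to the centre contributes one bit.
    centre = patch[1, 1]
    # Clockwise order starting at the top-left corner (an illustrative choice).
    neighbours = [patch[0, 0], patch[0, 1], patch[0, 2], patch[1, 2],
                  patch[2, 2], patch[2, 1], patch[2, 0], patch[1, 0]]
    code = 0
    for bit, n in enumerate(neighbours):
        if n >= centre:
            code |= 1 << bit
    return code
```

Histograms of these 0–255 codes over image regions then serve as the feature vectors fed to the classifier.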
A family of baseline methods based on a KNN
model was proposed in (Makadia et al., 2008) and
(Makadia et al., 2010). Simple features such as colour
histograms in different colour spaces and Gabor and
Haar wavelets were used for image representation.
These features are combined using two schemes to
obtain a final measure of similarity between images.
The label assignment is based on a novel label
transfer method. The authors showed that such a
simple approach can achieve very good results and
even outperforms some more sophisticated methods.
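As a rough illustration of the baseline idea, one of the combination schemes averages per-feature distances after scaling, and labels are then transferred from the nearest neighbours. The sketch below is a simplified reading, not Makadia et al.'s exact procedure:

```python
import numpy as np

def combine_distances(per_feature_dists):
    # Scale each feature's distances to [0, 1] and average them
    # (a simplified take on equal-contribution combination).
    combined = np.zeros_like(per_feature_dists[0], dtype=float)
    for d in per_feature_dists:
        span = d.max() - d.min()
        if span > 0:
            combined += (d - d.min()) / span
    return combined / len(per_feature_dists)

def transfer_labels(dists_to_train, train_labels, k=3, n_labels=2):
    # Simplified label transfer: vote over labels of the k nearest
    # training images and keep the most frequent ones.
    neighbours = np.argsort(dists_to_train)[:k]
    votes = {}
    for i in neighbours:
        for label in train_labels[i]:
            votes[label] = votes.get(label, 0) + 1
    return sorted(votes, key=lambda l: (-votes[l], l))[:n_labels]
```

The actual label transfer in the cited papers is more refined (it treats the single nearest neighbour specially), but the core of the approach is this simple neighbour-based voting.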
Another example of a method using a KNN
classifier is presented in (Guillaumin et al., 2009).
The method, called “TagProp”, is a discriminatively
trained nearest-neighbour model. It combines several
similarity metrics, and using a metric learning
approach it can efficiently choose metrics that model
different aspects of the images. The method brought
a significant improvement over previous
state-of-the-art results.
A recent study by Murthy et al. (Murthy et al.,
2015) employs convolutional neural networks to
create image features, while word embeddings are
used to represent the associated tags.
State-of-the-art performance is reported on three
standard datasets.
Another approach was proposed in (Giordano
et al., 2015). This work concentrates on creating
large annotated corpora by propagating labels from
smaller annotated datasets, inspired by data-driven
methods that rely on large amounts of annotated
data. It addresses the problem that annotating new
data is a labour-intensive task which cannot feasibly
be done entirely by hand. The approach is essentially
a two-step KNN model, with features extracted using
histograms of oriented gradients (HoG) (Dalal and
Triggs, 2005).
For a more comprehensive survey of AIA
techniques, please refer to (Zhang et al., 2012).
3 IMAGE ANNOTATION
METHOD
This section describes the image annotation method,
which can be divided into three steps.
3.1 Feature Extraction
Given an image, we first have to perform the
parametrization. The usual scheme in
texture-descriptor-based approaches is to divide the
image into equally sized rectangular regions and
then construct a histogram of descriptor values for
each region. In our work, we use a regular grid that
divides the image into cells × cells regions; in the
rest of this work, the parameter cells specifies this
division of the image. The set of resulting
histograms represents the image: the histograms can
either be concatenated into one long vector or
treated as independent. The descriptors used for this
task are described in Sections 3.1.1, 3.1.2
and 3.1.3.
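The grid scheme can be sketched as follows. For illustration the "descriptor" below is the raw pixel intensity with an arbitrary bin count; in the actual method the histograms would be built over the descriptor values of Sections 3.1.1–3.1.3:

```python
import numpy as np

def grid_histograms(image, cells=4, n_bins=16, concatenate=True):
    # Divide a grayscale image into cells x cells rectangular regions and
    # build one histogram per region. The raw intensity stands in for the
    # texture descriptor here; it is an illustrative placeholder.
    h, w = image.shape
    hists = []
    for i in range(cells):
        for j in range(cells):
            region = image[i * h // cells:(i + 1) * h // cells,
                           j * w // cells:(j + 1) * w // cells]
            hist, _ = np.histogram(region, bins=n_bins, range=(0, 256))
            hists.append(hist)
    # Either one long concatenated vector, or the histograms kept separate.
    return np.concatenate(hists) if concatenate else hists
```

With cells = 4 and 16 bins, the concatenated representation is a single 256-dimensional vector per image.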
3.1.1 Local Binary Patterns
This method was proposed in (Ojala et al., 1996). It
computes its value from the 3×3 neighbourhood of a