Table 1: Bias of results to the existence/absence of a model (columns: "With a model", "Without a model"; rows: "Query Image", "Top Results"; table images not recoverable in text).
automatically remove those before the visual
descriptors are extracted.
In Section 2, we review related work in the fields of image retrieval, image indexing, fashion retrieval, and skin detection. In Section 3, we describe the system into which we integrate skin removal, and in Section 4, we describe our skin removal technique. In Section 5, we present our judging methodology and results. Finally, in Section 6, we conclude the paper and discuss future work.
2 RELATED WORK
Although there is a large body of work on image retrieval in general, little of it addresses the specific domain of clothing retrieval. Recently, Grana et al. (2012) presented work on fashion retrieval based solely on color, using a color bag-of-words signature. They describe each garment by a single dominant color and therefore focus only on images with a unique color classification. Arguing that uniform color space division and color space clustering do not reflect fashion color jargon, they use the color classes that label garments in their training set to split the color space in a way that minimizes error between these classes. They apply automatic pre-processing to remove skin and mannequin parts, and then use GrabCut (Rother et al., 2004) to remove clothing items that are not the main garment depicted in an image. However, they do not describe the skin removal approach used in this pre-processing step, nor its impact on retrieval.
Skin detection has been approached with different methodologies, including explicit color space thresholding and histogram models with naïve Bayes classifiers, which we discuss later (Kakumanu et al., 2007). However, we observed that the precision of most of the proposed techniques is not high. That is mainly because those techniques analyse images in the visible color spectrum without any attention to context. This is suboptimal because many factors (such as illumination, camera characteristics, shadows, and makeup) significantly affect apparent skin color. A workaround is to move the problem to the non-visible spectrum (the infrared range), in which skin color appears more consistent across different conditions. However, the required equipment is more expensive and usually not available in consumer devices.
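As an illustration of the explicit-thresholding family of methods mentioned above, the following sketch applies one commonly cited per-pixel RGB rule. The specific thresholds vary across the literature and are illustrative here, not those of any particular system discussed in this paper.

```python
import numpy as np

def skin_mask_rgb(image):
    """Classify each pixel as skin/non-skin with an explicit RGB rule.

    `image` is an (H, W, 3) uint8 array in RGB order. The thresholds
    below are one commonly cited rule for daylight illumination;
    practical systems tune them per dataset.
    """
    r = image[..., 0].astype(np.int16)
    g = image[..., 1].astype(np.int16)
    b = image[..., 2].astype(np.int16)
    spread = image.max(axis=-1).astype(np.int16) - image.min(axis=-1)
    return (
        (r > 95) & (g > 40) & (b > 20)   # each channel bright enough
        & (spread > 15)                  # not a gray/achromatic pixel
        & (np.abs(r - g) > 15)           # red clearly separated from green
        & (r > g) & (r > b)              # red is the dominant channel
    )
```

Note that such a rule is exactly the kind of context-free, visible-spectrum analysis whose limitations are discussed above: a change in illumination shifts all three channels and can break every threshold at once.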
3 EXISTING SYSTEM
We integrate our skin removal component into an existing clothing retrieval system (running on a commercial search engine), which we briefly describe in this section. Figure 3 shows a high-level overview of the system. In the coming subsections, we briefly describe the features extracted. In the following section, we describe our skin removal component and how it fits into this system.
The features generated for each image are contours, which capture shape, and a single RGB value, which captures the most dominant color. The image indexing and retrieval system is based upon the Edgel index (Cao et al., 2011).
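The text does not specify how the single dominant RGB value is computed; as a hypothetical sketch, one minimal approach is coarse histogram binning, returning the mean color of the most populated bin:

```python
import numpy as np

def dominant_rgb(image, bins=8):
    """Return one RGB value for the most populated coarse color bin.

    A hypothetical sketch of a dominant-color feature; the actual
    system's method is not specified in the text. `image` is an
    (H, W, 3) uint8 array.
    """
    pixels = image.reshape(-1, 3)
    # Quantize each channel into `bins` levels and find the busiest cell.
    q = pixels // (256 // bins)
    cells, counts = np.unique(q, axis=0, return_counts=True)
    top = cells[counts.argmax()]
    # Report the mean color of the pixels that fall in that cell.
    mask = (q == top).all(axis=1)
    return pixels[mask].mean(axis=0).astype(np.uint8)
```

Under such a scheme, skin pixels left in the image can easily capture the busiest bin, which is one way the dominant-color feature can be biased as motivated in Figure 2.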
3.1 Visual Representation
When a query image is submitted, a list of candidate similar edges is retrieved from an inverted index (Sivic and Zisserman, 2003). This list is ranked by a composite score of edge similarity, salient color similarity, and textual description similarity. Our interest is in improving the edge similarity score by removing unwanted edges, thereby improving this metric's semantic quality. By removing such edges, we also potentially improve the salient color extracted, as motivated in Figure 2.
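The composite ranking described above can be sketched as a weighted sum of the three similarity signals; the weights and the helper names below are illustrative placeholders, not the actual system's values.

```python
def composite_score(edge_sim, color_sim, text_sim,
                    w_edge=0.5, w_color=0.3, w_text=0.2):
    """Weighted combination of the three similarity signals.

    The weights are illustrative placeholders; the system's actual
    combination is not specified in the text. All similarities are
    assumed to be normalized to [0, 1].
    """
    return w_edge * edge_sim + w_color * color_sim + w_text * text_sim

def rank(candidates):
    """Rank inverted-index candidates by the combined score.

    `candidates` is a list of (id, edge_sim, color_sim, text_sim)
    tuples, highest combined score first.
    """
    return sorted(candidates,
                  key=lambda c: composite_score(*c[1:]),
                  reverse=True)
```

This decomposition makes the motivation concrete: removing skin edges raises `edge_sim` only for semantically matching garments, so the improvement propagates directly into the final ranking.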
3.1.1 Image Pre-processing
To reduce computation and storage costs while preserving information, the image is first downsized so that its maximum dimension is 200 pixels (Cao et al., 2011). The downsized image is then segmented using GraphCut (Felzenszwalb and Huttenlocher, 2004). The output is a segmented image where each
VISAPP 2013 - International Conference on Computer Vision Theory and Applications