suitable for estimating the popularity of a draft design
for fashion designers, without the need to evaluate
the responses from social media platforms (which can
lead to design copies) or comparing with a large
quantity of other items (which is computationally
inefficient and raises fairness issues).
The rest of this paper is arranged as follows.
Section 2 summarizes the related works and
highlights their current limitations. Section 3
elaborates on how the method evaluates the image
popularity of an outfit, and Section 4 explains our
model architecture. Section 5 evaluates the model
using datasets and discusses the effects of several of
our design choices. The study is finally summarized
in Section 6.
2 RELATED WORKS
In this section, we discuss two areas of literature that
are related to the current study, namely, fashion
popularity prediction and fashion recommendation.
2.1 Fashion Popularity Prediction
To predict whether an outfit will become trending or
popular, the most commonly used indicator is the
number of likes received on social networks. On social
media, users who find a post interesting can leave
“likes”. This measure often exhibits a long-tailed
distribution, and thus the common practice is to
perform a logarithmic transform before further
processing, as in Simo-Serra et al. (2015) and Lo et
al. (2019).
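As a minimal sketch of the logarithmic transform mentioned above, the snippet below compresses a long-tailed list of like counts. The use of log(1 + x) (so that posts with zero likes remain defined), the natural base, and the sample counts are all assumptions made for illustration; the cited works may use a different variant.

```python
import math


def log_transform_likes(likes):
    """Compress long-tailed like counts with log(1 + x).

    Assumption: log1p is used so that zero-like posts stay defined;
    the cited works do not necessarily use this exact form.
    """
    return [math.log1p(n) for n in likes]


# Hypothetical like counts: most posts get few likes, a handful get many.
likes = [0, 3, 12, 150, 20000]
transformed = log_transform_likes(likes)

for raw, value in zip(likes, transformed):
    print(f"{raw:>6d} likes -> {value:.3f}")
```

After the transform, the gap between the most and least popular posts shrinks from several orders of magnitude to roughly one, which makes the target easier to regress on.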
Simo-Serra et al. (2015) investigated the
relationship between “fashionability”, defined based
on the number of likes received by a post on a
fashion-dedicated social media network named
Chictopia, and the information from the post. In their
work, they created a Conditional Random Field
model that predicts “fashionability” by using a score
from 0 to 10 on factors ranging from the attributes of
the clothes (e.g., color, garment) to contextual
information (e.g., the follower count and location of
the poster). Although this study laid the
foundation for many fashion popularity prediction
models, the measure relies heavily on the tags
provided by users and neglects the images themselves.
As such, more intricate visual patterns on the
clothes can be missed in the prediction. Given a pair
of garment images, the method of Wang et al. (2015)
predicts which one is expected to receive more likes on
social media platforms. The method considers the
appearance and visual attributes of the outfit and
predicts which image can receive more likes by using
classification and feature extraction. Based on the
classification labels and deep features of the image,
the method deduces which one is more “attractive”
using a Sum-Product Network. While this
work provides a means to compare fashion images,
the method becomes inefficient as the number of
images increases because every pair of images must be
compared. Lo et al. (2019) propose a model that, in
addition to the deep image feature and garment type,
considers the chronological order of social media
posts. Thus, instead of a single image and its
metadata, this sequential model accepts a series of
images and their garment types, ordered by time and
with the number of likes known for all images except
the last, which the model aims to predict. However,
all of the above-mentioned works ignore the sales records
that reflect the attractiveness of products to the
market.
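To illustrate the scaling concern raised above for pairwise comparison, the following sketch counts how many comparisons are needed if every pair of images is compared, against the number of evaluations needed by a model that scores each image independently. The assumption of exhaustive all-pairs comparison is ours; the function names are hypothetical.

```python
import math


def pairwise_comparisons(n_images):
    """Number of unordered image pairs: n * (n - 1) / 2."""
    return math.comb(n_images, 2)


def independent_scorings(n_images):
    """A model that scores each image on its own needs one pass per image."""
    return n_images


for n in (10, 1_000, 100_000):
    print(n, pairwise_comparisons(n), independent_scorings(n))
```

The pairwise count grows quadratically with the number of images, whereas independent scoring grows linearly.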
2.2 Fashion Recommendation
Another area of work, albeit more distantly related, is
fashion recommendation. The goal of this type of
system is to recommend an outfit that is in line with
trends, or in which users may be interested. Simo-
Serra et al. (2015) suggest the types and colors of
clothing and accessories that the poster may have
worn by formulating the recommendation as the
maximization of the “fashionability” score. As
their model predicts scores using the clothing
attribute labels, the system tests each garment-related
attribute and finds those with the best scores.
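The idea of recommending by maximizing a predicted score over attribute choices can be sketched as an exhaustive search. This is only an illustration: the attribute options, the weights, and `toy_score` are hypothetical placeholders and do not reproduce the Conditional Random Field model of Simo-Serra et al. (2015).

```python
from itertools import product

# Hypothetical attribute options, for illustration only.
ATTRIBUTE_OPTIONS = {
    "garment": ["dress", "jeans", "coat"],
    "color": ["black", "red", "white"],
}

# Hypothetical per-attribute weights standing in for a learned model.
TOY_WEIGHTS = {
    ("garment", "dress"): 2.0,
    ("garment", "jeans"): 1.0,
    ("garment", "coat"): 1.5,
    ("color", "black"): 0.5,
    ("color", "red"): 1.0,
    ("color", "white"): 0.2,
}


def toy_score(outfit):
    """Placeholder score: sum of the weights of the chosen attributes."""
    return sum(TOY_WEIGHTS.get(item, 0.0) for item in outfit.items())


def best_outfit(score_fn, options):
    """Try every attribute combination and return the highest-scoring one."""
    keys = list(options)
    candidates = (
        dict(zip(keys, combo))
        for combo in product(*(options[k] for k in keys))
    )
    return max(candidates, key=score_fn)


recommendation = best_outfit(toy_score, ATTRIBUTE_OPTIONS)
print(recommendation)
```

Exhaustive search is feasible here only because the toy attribute space is tiny; a real system would need a more efficient inference procedure.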
To enable personalized recommendations, such
systems consider user preferences in the form of
ratings of other outfits or purchase history, in addition
to image features and/or their descriptions. The
simplest systems can be designed using collaborative
filtering techniques, such as Singular Value
Decomposition. A more sophisticated model is that of
Kang et al. (2017), who extract image features
using a Siamese Convolutional Neural Network and
recommend items using a Bayesian Personalized
Ranking model trained on the review histories and
interaction logs from e-commerce platforms along
with the item images. Zhang and Caverlee (2019)
propose a time-aware model based on a Recurrent
Recommendation Network and consider the users’
review history on Amazon together with pictures of
fashion influencers on Instagram. However, these works
recommend items to individuals based only on personal
preferences and do not predict popularity.
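For readers unfamiliar with Bayesian Personalized Ranking, the pairwise objective it optimizes can be sketched as follows. The dot-product scoring and the small hand-written vectors are assumptions for illustration; in Kang et al. (2017) the item representation comes from a Siamese Convolutional Neural Network, which is not reproduced here.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def bpr_loss(user_vec, pos_item_vec, neg_item_vec):
    """BPR loss for one (user, preferred item, other item) triple.

    Computes -log(sigmoid(score_pos - score_neg)) in a numerically
    stable way. Scores are plain dot products (an assumption).
    """
    diff = dot(user_vec, pos_item_vec) - dot(user_vec, neg_item_vec)
    if diff >= 0:
        return math.log1p(math.exp(-diff))
    return -diff + math.log1p(math.exp(diff))


# Hypothetical latent vectors.
user = [1.0, 0.0]
liked_item = [2.0, 0.0]
other_item = [0.0, 1.0]

print(bpr_loss(user, liked_item, other_item))  # small: ranking is correct
print(bpr_loss(user, other_item, liked_item))  # large: ranking is reversed
```

The loss is small when the item the user interacted with is scored above the other item, so minimizing it learns a per-user ranking rather than a global popularity score, which is the distinction drawn in the paragraph above.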