Authors:
Viral Parekh and Karimulla Shaik
Affiliation:
Flipkart Internet Private Limited, India
Keyword(s):
Multi-Scale Features, Feature Pyramid Network, Multi-Task Learning, Visual Attribute Extraction.
Abstract:
Extracting visual attributes of products from their images is an essential component of e-commerce applications such as easy cataloging, catalog enrichment, and visual search. Product attributes are generally a mixture of coarse-grained and fine-grained classes, and they cover regions of the product that may be small (for example, neck type or sleeve length of top-wear) or large (for example, the print pattern on apparel), which makes attribute extraction even more challenging. Despite these challenges, it is important to extract attributes with high accuracy and low latency. We therefore model attribute extraction as a classification problem with multi-task learning, where each attribute is a task. This paper proposes solutions to the above challenges through multi-scale feature extraction using a Feature Pyramid Network (FPN), along with attention and feature fusion, in a multi-task setup. We experimented incrementally with various ways of extracting multi-scale features. We use our in-house fashion category dataset and iMaterialist 2021 for visual attribute extraction to show the efficacy of our approaches. On average, we observed an improvement of approximately 4% in the F1 scores of different product attributes on both datasets compared to the baseline.
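The multi-task formulation described in the abstract, where each attribute is a separate classification task over shared image features, can be illustrated with a minimal sketch. This is not the authors' implementation: the attribute names, the linear per-attribute heads, the toy feature vector, and the equal weighting of the per-task losses are all assumptions made for illustration, and the FPN, attention, and feature-fusion components are omitted entirely.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of class scores."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, bias, features):
    """Linear classification head: one weight row and one bias per class."""
    return [
        sum(w * x for w, x in zip(row, features)) + b
        for row, b in zip(weights, bias)
    ]


def multi_task_loss(features, heads, labels):
    """Sum of per-attribute cross-entropy losses over one shared feature vector.

    Equal task weighting is an assumption; the abstract does not state
    how the per-attribute losses are combined.
    """
    total = 0.0
    for attribute, (weights, bias) in heads.items():
        probs = softmax(linear(weights, bias, features))
        total += -math.log(probs[labels[attribute]])
    return total


# Hypothetical shared feature vector (stand-in for backbone output).
features = [0.5, -1.0, 2.0]

# Hypothetical attribute heads: (weights, bias) per attribute.
heads = {
    "neck_type": ([[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]], [0.0, 0.0]),
    "sleeve_length": (
        [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],
        [0.0, 0.0, 0.0],
    ),
}

# Ground-truth class index for each attribute.
labels = {"neck_type": 1, "sleeve_length": 2}

loss = multi_task_loss(features, heads, labels)
```

In a real model the shared feature vector would come from the image backbone and the heads would be trained jointly, so that gradients from every attribute update the shared layers.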