Authors: Salma Ksibi¹; Mahmoud Mejdoub² and Chokri Ben Amar¹
Affiliations: ¹ University of Sfax, Tunisia; ² University of Sfax, College of AlGhat and Majmaah University, Tunisia
Keyword(s): Person Re-identification, Fisher Vector, Gaussian Weight, Deep Hand-crafted Feature, Deep CNN, XQDA.
Related Ontology Subjects/Areas/Topics: Applications and Services; Color and Texture Analyses; Computer Vision, Visualization and Computer Graphics; Enterprise Information Systems; Entertainment Imaging Applications; Features Extraction; Human and Computer Interaction; Human-Computer Interaction; Image and Video Analysis; Image Formation and Preprocessing; Image Generation Pipeline: Algorithms and Techniques; Motion, Tracking and Stereo Vision; Segmentation and Grouping; Tracking and Visual Navigation; Video Surveillance and Event Detection; Visual Attention and Image Saliency
Abstract:
Gaussian Fisher Vector (GFV) encoding is an extension of the conventional Fisher Vector (FV) that effectively discards noisy background information by localizing the pedestrian position in the image. Nevertheless, GFV can only provide a shallow description of the pedestrian features. In order to capture more complex structural information, we propose in this paper a layered extension of GFV that we call LGFV. The representation is based on two nested layers that hierarchically refine the FV encoding from one layer to the next by integrating more spatial neighborhood information. In addition, we present a new rich multi-level semantic pedestrian representation built simultaneously upon complementary deep hand-crafted and deep Convolutional Neural Network (CNN) features. The deep hand-crafted feature is obtained by combining mid-level GFV features with high-level LGFV ones, while the deep CNN feature is obtained by learning, in a classification mode, an effective embedding of the raw pedestrian pixels. The proposed deep hand-crafted features achieve accuracy competitive with the deep CNN ones without requiring pre-training or data augmentation, and the proposed multi-level representation further boosts the re-ID performance.
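
The paper itself does not include code; the following is a minimal Python sketch, under stated assumptions, of the two ideas named in the abstract: a Gaussian spatial weighting that down-weights background patches before Fisher Vector encoding, and the concatenation of the hand-crafted (GFV/LGFV) and CNN levels into one multi-level descriptor. The GMM setup, the toy descriptors, and the `lgfv` and `cnn_embed` placeholders are illustrative assumptions, not the authors' implementation.

# Minimal sketch (not the authors' code) of Gaussian-weighted FV encoding
# and multi-level feature fusion, as described in the abstract.
import numpy as np
from sklearn.mixture import GaussianMixture


def gaussian_spatial_weights(coords, center, sigma):
    """Weight each local descriptor by a 2-D Gaussian centred on the
    (assumed) pedestrian position, so background patches contribute less."""
    d2 = np.sum((coords - center) ** 2, axis=1)
    w = np.exp(-d2 / (2.0 * sigma ** 2))
    return w / (w.sum() + 1e-12)


def weighted_fisher_vector(X, weights, gmm):
    """Fisher Vector of descriptors X (N x D) under a diagonal-covariance GMM,
    with per-descriptor weights (e.g. the Gaussian spatial weights above)."""
    post = gmm.predict_proba(X) * weights[:, None]            # N x K soft assignments
    mu, var, pi = gmm.means_, gmm.covariances_, gmm.weights_
    parts = []
    for k in range(gmm.n_components):
        diff = (X - mu[k]) / np.sqrt(var[k])                  # standardised residuals
        g_mu = (post[:, k:k + 1] * diff).sum(0) / np.sqrt(pi[k])
        g_var = (post[:, k:k + 1] * (diff ** 2 - 1)).sum(0) / np.sqrt(2 * pi[k])
        parts.extend([g_mu, g_var])
    fv = np.concatenate(parts)
    fv = np.sign(fv) * np.sqrt(np.abs(fv))                    # power normalisation
    return fv / (np.linalg.norm(fv) + 1e-12)                  # L2 normalisation


# Toy data standing in for dense local descriptors and their image coordinates.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 16))                                # local descriptors
coords = rng.uniform(0, 1, size=(500, 2))                     # (x, y) patch positions
gmm = GaussianMixture(n_components=8, covariance_type="diag", random_state=0).fit(X)

w = gaussian_spatial_weights(coords, center=np.array([0.5, 0.5]), sigma=0.25)
gfv = weighted_fisher_vector(X, w, gmm)                       # mid-level GFV

# Hypothetical placeholders for the second-layer encoding and the CNN embedding.
lgfv = weighted_fisher_vector(X, w, gmm)                      # stands in for high-level LGFV
cnn_embed = rng.normal(size=256)                              # stands in for the learned CNN feature

# Multi-level pedestrian representation: concatenation of the three levels,
# which would then be fed to a metric-learning step such as XQDA.
descriptor = np.concatenate([gfv, lgfv, cnn_embed / np.linalg.norm(cnn_embed)])
print(descriptor.shape)

In this sketch the second-layer LGFV is only a placeholder; in the paper it is built by re-encoding first-layer FV outputs over spatial neighborhoods, which is not reproduced here.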