
Paper

Interactive Video Saliency Prediction: The Stacked-convLSTM Approach

Authors: N. Wondimu 1,2; U. Visser 3 and C. Buche 1,4

Affiliations: 1 Lab-STICC, Brest National School of Engineering, 29280, Plouzané, France; 2 School of Information Technology and Engineering, Addis Ababa University, Addis Ababa, Ethiopia; 3 University of Miami, Florida, U.S.A.; 4 IRL CROSSING, CNRS, Adelaide, Australia

Keyword(s): Saliency Prediction, Video Saliency, Human Attention, Gaze Prediction, ConvLSTM, Video Saliency Dataset.

Abstract: Research in the cognitive science and neuroscience of attention suggests the use of spatio-temporal features for efficient video saliency prediction. This is due to the representative nature of spatio-temporal features for data collected across space and time, such as videos. Video saliency prediction aims to find visually salient regions in a stream of images. Many video saliency prediction models have been proposed over the past few years. Because videos differ fundamentally from static images, the earliest efforts to apply static image saliency prediction models to the video saliency prediction task yielded reduced performance. Consequently, dynamic video saliency prediction models that use spatio-temporal features were introduced. These models, especially deep learning based ones, raised the state of the art of video saliency prediction to a new level. However, video saliency prediction remains a considerable challenge, mainly due to the complex nature of the task and the scarcity of representative saliency benchmarks. Given the importance of saliency identification for various computer vision tasks, revising and enhancing the performance of video saliency prediction models is crucial. To this end, we propose a novel interactive video saliency prediction model that employs a stacked-ConvLSTM based architecture along with a novel XY-shift frame differencing custom layer. Specifically, we introduce an encoder-decoder based architecture with a prior layer performing XY-shift frame differencing, a residual layer fusing spatially processed (VGG-16 based) features with the XY-shift frame-differenced frames, and a stacked-ConvLSTM component. Extensive experimental results on the largest video saliency dataset, DHF1K, show the competitive performance of our model against state-of-the-art models.
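The abstract describes the architecture only at a high level. The sketch below is a rough Keras reconstruction of how such a pipeline could be wired together; the one-pixel shift, the stride used to downsample the differenced frames, the number of ConvLSTM layers and filters, and the decoder depth are all assumptions made for illustration, not the authors' implementation.

# Illustrative sketch only; hyperparameters and layer choices are assumptions,
# not taken from the paper.
import tensorflow as tf
from tensorflow.keras import layers, models

class XYShiftFrameDifference(layers.Layer):
    """Subtract each frame from a copy of itself shifted by one pixel
    along both spatial axes (assumed shift amount)."""
    def call(self, frames):
        # frames: (batch, time, height, width, channels)
        shifted = tf.roll(frames, shift=[1, 1], axis=[2, 3])
        return frames - shifted

def build_saliency_model(t=8, h=224, w=224):
    inp = layers.Input(shape=(t, h, w, 3))

    # Spatial stream: VGG-16 features extracted per frame (assumed backbone setup).
    vgg = tf.keras.applications.VGG16(include_top=False, weights=None,
                                      input_shape=(h, w, 3))
    spatial = layers.TimeDistributed(vgg)(inp)            # (b, t, h/32, w/32, 512)

    # Temporal cue: XY-shift frame differencing, downsampled to match the VGG grid.
    diff = XYShiftFrameDifference()(inp)
    diff = layers.TimeDistributed(
        layers.Conv2D(512, 3, strides=32, padding="same", activation="relu"))(diff)

    # Residual-style fusion of spatial features with the differenced frames.
    fused = layers.Add()([spatial, diff])

    # Stacked ConvLSTM component.
    x = layers.ConvLSTM2D(256, 3, padding="same", return_sequences=True)(fused)
    x = layers.ConvLSTM2D(128, 3, padding="same", return_sequences=False)(x)

    # Simple decoder that upsamples back to input resolution as a saliency map.
    for filters in (64, 32, 16, 8, 4):
        x = layers.UpSampling2D()(x)
        x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    out = layers.Conv2D(1, 1, activation="sigmoid")(x)
    return models.Model(inp, out)

model = build_saliency_model()
model.compile(optimizer="adam", loss="binary_crossentropy")

As written, this sketch predicts a single saliency map for the final frame of each clip; whether the published model emits per-frame maps for the whole sequence is not stated in the abstract.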

CC BY-NC-ND 4.0


Paper citation in several formats:
Wondimu, N.; Visser, U. and Buche, C. (2023). Interactive Video Saliency Prediction: The Stacked-convLSTM Approach. In Proceedings of the 15th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART; ISBN 978-989-758-623-1; ISSN 2184-433X, SciTePress, pages 157-168. DOI: 10.5220/0011664600003393

@conference{icaart23,
author={N. Wondimu and U. Visser and C. Buche},
title={Interactive Video Saliency Prediction: The Stacked-convLSTM Approach},
booktitle={Proceedings of the 15th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART},
year={2023},
pages={157-168},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0011664600003393},
isbn={978-989-758-623-1},
issn={2184-433X},
}

TY - CONF

JO - Proceedings of the 15th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART
TI - Interactive Video Saliency Prediction: The Stacked-convLSTM Approach
SN - 978-989-758-623-1
IS - 2184-433X
AU - Wondimu, N.
AU - Visser, U.
AU - Buche, C.
PY - 2023
SP - 157
EP - 168
DO - 10.5220/0011664600003393
PB - SciTePress