A Review of Spatio-Temporal Action Detection Models in Computer Vision Based on Deep Learning
Abstract
Spatio-Temporal Action Detection (STAD) is an active research task in computer vision that aims to recognize the category of each action in a video and to localize its temporal interval and spatial position. With the development of deep learning, methods based on 3D CNNs, two-stream networks, and Transformers have emerged in succession and substantially advanced the field. This paper systematically reviews the mainstream spatio-temporal action detection models of recent years, analyzes their key modules, strengths, weaknesses, and development trends, and summarizes commonly used datasets and evaluation metrics.
DOI: https://doi.org/10.22158/assc.v7n5p140
Copyright © SCHOLINK INC. ISSN 2640-9682 (Print) ISSN 2640-9674 (Online)