A Review of Spatio-Temporal Action Detection Models in Computer Vision Based on Deep Learning

Mingzhu Zhang

Abstract


Spatio-Temporal Action Detection (STAD) is an active research task in computer vision that aims to identify the category of each action in a video and to localize it both in time (its temporal interval) and in space (its position within each frame). With the development of deep learning, methods based on 3D CNNs, two-stream networks, and Transformers have emerged in succession and substantially advanced the field. This paper systematically reviews the mainstream spatio-temporal action detection models of recent years, analyzes their key modules, strengths, weaknesses, and development trends, and summarizes the commonly used datasets and evaluation metrics.



DOI: https://doi.org/10.22158/assc.v7n5p140


Copyright © SCHOLINK INC.  ISSN 2640-9682 (Print)  ISSN 2640-9674 (Online)