A Review of Spatio-Temporal Action Detection Models in Computer Vision Based on Deep Learning

Mingzhu Zhang

doi:10.22158/assc.v7n5p140

Abstract

Spatio-Temporal Action Detection (STAD) is an active research task in computer vision that aims to recognize the category of each action in a video and to localize it in both time (its temporal interval) and space (its position within the frames). With the development of deep learning, methods based on 3D CNNs, two-stream networks, and Transformers have steadily emerged and substantially advanced the field. This paper systematically reviews the mainstream spatio-temporal action detection models of recent years, analyzes their key modules, strengths, weaknesses, and development trends, and summarizes the commonly used datasets and evaluation metrics.

DOI: https://doi.org/10.22158/assc.v7n5p140

Advances in Social Science and Culture
