Activity Detection
63 papers with code • 1 benchmark • 12 datasets
Detecting activities in extended videos.
Libraries
Use these libraries to find Activity Detection models and implementations.
Most implemented papers
A Convolutional Neural Network Smartphone App for Real-Time Voice Activity Detection
This paper presents a smartphone app that performs real-time voice activity detection based on a convolutional neural network.
Temporal Gaussian Mixture Layer for Videos
We introduce a new convolutional layer named the Temporal Gaussian Mixture (TGM) layer and present how it can be used to efficiently capture longer-term temporal information in continuous activity videos.
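The core idea of a Temporal Gaussian Mixture layer is a 1-D temporal kernel whose weights are not learned freely but generated from a small set of Gaussians mixed per output channel, so a few parameters can cover a long temporal extent. A minimal NumPy sketch of that construction (parameter names are illustrative, not the paper's API):

```python
import numpy as np

def tgm_kernel(length, centers, widths, mix_weights):
    """Build a temporal kernel as a mixture of Gaussians over time steps.

    centers, widths: per-Gaussian parameters (learned in the paper);
    mix_weights: per-output-kernel mixing weights over the Gaussians.
    """
    t = np.arange(length, dtype=float)  # discrete time axis
    # One Gaussian per row: shape (num_gaussians, length).
    gauss = np.exp(-0.5 * ((t[None, :] - centers[:, None]) / widths[:, None]) ** 2)
    gauss /= gauss.sum(axis=1, keepdims=True)     # normalize each Gaussian
    kernel = mix_weights @ gauss                  # mix into output kernels
    return kernel / kernel.sum(axis=1, keepdims=True)  # rows sum to 1

# A length-9 kernel mixing two Gaussians centered at t=2 and t=6.
k = tgm_kernel(
    length=9,
    centers=np.array([2.0, 6.0]),
    widths=np.array([1.0, 1.5]),
    mix_weights=np.array([[0.7, 0.3]]),
)
```

The resulting kernel is convolved with per-frame features along the time axis; because only centers, widths, and mixing weights are learned, the kernel can span hundreds of frames without a matching parameter count.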
S3D: Single Shot multi-Span Detector via Fully 3D Convolutional Networks
In this paper, we present a novel Single Shot multi-Span Detector for temporal activity detection in long, untrimmed videos using a simple end-to-end fully three-dimensional convolutional (Conv3D) network.
Structure-Aware Convolutional Neural Networks
Convolutional neural networks (CNNs) are inherently subject to invariable filters that can only aggregate local inputs with the same topological structures.
The Second DIHARD Diarization Challenge: Dataset, task, and baselines
This paper introduces the second DIHARD challenge, part of a series of speaker diarization challenges intended to improve the robustness of diarization systems to variation in recording equipment, noise conditions, and conversational domain.
Personalized Activity Recognition with Deep Triplet Embeddings
The novel subject triplet loss provides the best performance overall, and all personalized deep embeddings outperform our baseline personalized engineered feature embedding and an impersonal fully convolutional neural network classifier.
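Triplet losses of this kind train an embedding so that samples from the same subject sit closer together than samples from different subjects by at least a margin. A minimal sketch of the standard formulation (the paper's subject-specific variant builds on this; names and the margin value here are illustrative):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Standard triplet loss: pull the anchor toward the positive,
    push it away from the negative by at least `margin`."""
    d_pos = np.linalg.norm(anchor - positive)  # same-subject distance
    d_neg = np.linalg.norm(anchor - negative)  # different-subject distance
    return max(d_pos - d_neg + margin, 0.0)

a = np.array([0.0, 0.0])
p = np.array([0.1, 0.0])  # same subject: close in embedding space
n = np.array([1.0, 0.0])  # different subject: far away

loss = triplet_loss(a, p, n)  # margin satisfied -> 0.0
```

When the margin is already satisfied the loss is zero and the triplet contributes no gradient; only violating triplets drive the embedding updates.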
Argus: Efficient Activity Detection System for Extended Video Analysis
We propose Argus, an efficient activity detection system for extended video analysis in surveillance scenarios.
Dual Attention in Time and Frequency Domain for Voice Activity Detection
The results show that the focal loss can improve the performance in various imbalance situations compared to the cross entropy loss, a commonly used loss function in VAD.
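Focal loss addresses class imbalance by scaling the cross-entropy term with a factor that shrinks as the predicted probability of the true class grows, so abundant easy examples (e.g. long non-speech stretches in VAD) contribute little to the gradient. A minimal sketch of the two losses for a single prediction (gamma value is the commonly used default, not necessarily the paper's setting):

```python
import math

def cross_entropy(p_true):
    """Cross entropy for one example: -log of the true-class probability."""
    return -math.log(p_true)

def focal_loss(p_true, gamma=2.0):
    """Focal loss: cross entropy down-weighted by (1 - p_true)^gamma,
    so confident (easy) examples contribute almost nothing."""
    return -((1.0 - p_true) ** gamma) * math.log(p_true)

easy = 0.9  # model already confident on the true class
hard = 0.1  # model badly wrong on the true class

# On the easy example the focal factor (1 - 0.9)^2 = 0.01 suppresses the
# loss by 100x; on the hard example the factor 0.81 leaves it near CE.
```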
audino: A Modern Annotation Tool for Audio and Speech
The tool allows audio data and their corresponding annotations to be uploaded and assigned to a user through a key-based API.
RespVAD: Voice Activity Detection via Video-Extracted Respiration Patterns
The respiration pattern is first extracted from the video using an optical-flow-based method focused on the abdominal-thoracic region of the speaker.