Search Results for author: Zhixi Cai

Found 7 papers, 4 papers with code

JRDB-Social: A Multifaceted Robotic Dataset for Understanding of Context and Dynamics of Human Interactions Within Social Groups

no code implementations • 6 Apr 2024 • Simindokht Jahangard, Zhixi Cai, Shiki Wen, Hamid Rezatofighi

Understanding human social behaviour is crucial in computer vision and robotics.

HYDRA: A Hyper Agent for Dynamic Compositional Visual Reasoning

no code implementations • 19 Mar 2024 • Fucai Ke, Zhixi Cai, Simindokht Jahangard, Weiqing Wang, Pari Delir Haghighi, Hamid Rezatofighi

Recent advances in visual reasoning (VR), particularly with the aid of Large Vision-Language Models (VLMs), show promise but require access to large-scale datasets and face challenges such as high computational costs and limited generalization capabilities.

Reinforcement Learning (RL) Visual Reasoning

AV-Deepfake1M: A Large-Scale LLM-Driven Audio-Visual Deepfake Dataset

1 code implementation • 26 Nov 2023 • Zhixi Cai, Shreya Ghosh, Aman Pankaj Adatia, Munawar Hayat, Abhinav Dhall, Kalin Stefanov

The comprehensive benchmark of the proposed dataset utilizing state-of-the-art deepfake detection and localization methods indicates a significant drop in performance compared to previous datasets.

2k DeepFake Detection +2

Pavlok-Nudge: A Feedback Mechanism for Atomic Behaviour Modification with Snoring Usecase

no code implementations • 10 May 2023 • Shreya Ghosh, Rakibul Hasan, Pradyumna Agrawal, Zhixi Cai, Susannah Soon, Abhinav Dhall, Tom Gedeon

To this end, we design a user interface to generate an automatic feedback mechanism that integrates Pavlok and a deep learning based model to detect certain behaviours via an integrated user interface, i.e., a mobile or desktop application.

Glitch in the Matrix: A Large Scale Benchmark for Content Driven Audio-Visual Forgery Detection and Localization

1 code implementation • 3 May 2023 • Zhixi Cai, Shreya Ghosh, Abhinav Dhall, Tom Gedeon, Kalin Stefanov, Munawar Hayat

The proposed baseline method, Boundary Aware Temporal Forgery Detection (BA-TFD), is a 3D Convolutional Neural Network-based architecture which effectively captures multimodal manipulations.

Ranked #1 on Temporal Forgery Localization on ForgeryNet

Binary Classification DeepFake Detection +2

MARLIN: Masked Autoencoder for facial video Representation LearnINg

1 code implementation • CVPR 2023 • Zhixi Cai, Shreya Ghosh, Kalin Stefanov, Abhinav Dhall, Jianfei Cai, Hamid Rezatofighi, Reza Haffari, Munawar Hayat

This paper proposes a self-supervised approach to learn universal facial representations from videos that can transfer across a variety of facial analysis tasks such as Facial Attribute Recognition (FAR), Facial Expression Recognition (FER), DeepFake Detection (DFD), and Lip Synchronization (LS).

Ranked #1 on Emotion Classification on CMU-MOSEI

Action Classification Attribute +9

193

Do You Really Mean That? Content Driven Audio-Visual Deepfake Dataset and Multimodal Method for Temporal Forgery Localization

1 code implementation • 13 Apr 2022 • Zhixi Cai, Kalin Stefanov, Abhinav Dhall, Munawar Hayat

Our baseline method for benchmarking the proposed dataset is a 3DCNN model, termed Boundary Aware Temporal Forgery Detection (BA-TFD), which is guided via contrastive, boundary matching, and frame classification loss functions.

Ranked #1 on DeepFake Detection on LAV-DF

Benchmarking DeepFake Detection +1
