Search Results for author: Michael Wray

Found 22 papers, 13 papers with code

HOI-Ref: Hand-Object Interaction Referral in Egocentric Vision

no code implementations • 15 Apr 2024 • Siddhant Bansal, Michael Wray, Dima Damen

Our results demonstrate that VLMs trained for referral on third person images fail to recognise and refer hands and objects in egocentric images.

Object Question Answering +1

Paper
Add Code

Are you Struggling? Dataset and Baselines for Struggle Determination in Assembly Videos

no code implementations • 16 Feb 2024 • Shijia Feng, Michael Wray, Brian Sullivan, Youngkyoon Jang, Casimir Ludwig, Iain Gilchrist, Walterio Mayol-Cuevas

Determining when people are struggling from video enables a finer-grained understanding of actions and opens opportunities for building intelligent support visual interfaces.

Decision Making Video Understanding

Paper
Add Code

Video Editing for Video Retrieval

no code implementations • 4 Feb 2024 • Bin Zhu, Kevin Flanagan, Adriano Fragomeni, Michael Wray, Dima Damen

The teacher model is employed to edit the clips in the training set whereas the student model trains on the edited clips.

Retrieval Text Retrieval +2

Paper
Add Code

GenHowTo: Learning to Generate Actions and State Transformations from Instructional Videos

1 code implementation • 12 Dec 2023 • Tomáš Souček, Dima Damen, Michael Wray, Ivan Laptev, Josef Sivic

We address the task of generating temporally consistent and physically plausible images of actions and object state transformations.

Object

Paper
Code

Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives

1 code implementation • 30 Nov 2023 • Kristen Grauman, Andrew Westbury, Lorenzo Torresani, Kris Kitani, Jitendra Malik, Triantafyllos Afouras, Kumar Ashutosh, Vijay Baiyya, Siddhant Bansal, Bikram Boote, Eugene Byrne, Zach Chavis, Joya Chen, Feng Cheng, Fu-Jen Chu, Sean Crane, Avijit Dasgupta, Jing Dong, Maria Escobar, Cristhian Forigua, Abrham Gebreselasie, Sanjay Haresh, Jing Huang, Md Mohaiminul Islam, Suyog Jain, Rawal Khirodkar, Devansh Kukreja, Kevin J Liang, Jia-Wei Liu, Sagnik Majumder, Yongsen Mao, Miguel Martin, Effrosyni Mavroudi, Tushar Nagarajan, Francesco Ragusa, Santhosh Kumar Ramakrishnan, Luigi Seminara, Arjun Somayazulu, Yale Song, Shan Su, Zihui Xue, Edward Zhang, Jinxu Zhang, Angela Castillo, Changan Chen, Xinzhu Fu, Ryosuke Furuta, Cristina Gonzalez, Prince Gupta, Jiabo Hu, Yifei HUANG, Yiming Huang, Weslie Khoo, Anush Kumar, Robert Kuo, Sach Lakhavani, Miao Liu, Mi Luo, Zhengyi Luo, Brighid Meredith, Austin Miller, Oluwatumininu Oguntola, Xiaqing Pan, Penny Peng, Shraman Pramanick, Merey Ramazanova, Fiona Ryan, Wei Shan, Kiran Somasundaram, Chenan Song, Audrey Southerland, Masatoshi Tateno, Huiyu Wang, Yuchen Wang, Takuma Yagi, Mingfei Yan, Xitong Yang, Zecheng Yu, Shengxin Cindy Zha, Chen Zhao, Ziwei Zhao, Zhifan Zhu, Jeff Zhuo, Pablo Arbelaez, Gedas Bertasius, David Crandall, Dima Damen, Jakob Engel, Giovanni Maria Farinella, Antonino Furnari, Bernard Ghanem, Judy Hoffman, C. V. Jawahar, Richard Newcombe, Hyun Soo Park, James M. Rehg, Yoichi Sato, Manolis Savva, Jianbo Shi, Mike Zheng Shou, Michael Wray

We present Ego-Exo4D, a diverse, large-scale multimodal multiview video dataset and benchmark challenge.

Video Understanding

290

Paper
Code

Learning Temporal Sentence Grounding From Narrated EgoVideos

1 code implementation • 26 Oct 2023 • Kevin Flanagan, Dima Damen, Michael Wray

Compared to traditional benchmarks on which this task is evaluated, these datasets offer finer-grained sentences to ground in notably longer videos.

Sentence Temporal Sentence Grounding

Paper
Code

ConTra: (Con)text (Tra)nsformer for Cross-Modal Video Retrieval

1 code implementation • 9 Oct 2022 • Adriano Fragomeni, Michael Wray, Dima Damen

When the clip is short or visually ambiguous, knowledge of its local temporal context (i. e. surrounding video segments) can be used to improve the retrieval performance.

Retrieval Sentence +2

Paper
Code

Egocentric Video-Language Pretraining @ Ego4D Challenge 2022

1 code implementation • 4 Jul 2022 • Kevin Qinghong Lin, Alex Jinpeng Wang, Mattia Soldan, Michael Wray, Rui Yan, Eric Zhongcong Xu, Difei Gao, RongCheng Tu, Wenzhe Zhao, Weijie Kong, Chengfei Cai, Hongfa Wang, Dima Damen, Bernard Ghanem, Wei Liu, Mike Zheng Shou

In this report, we propose a video-language pretraining (VLP) based solution \cite{kevin2022egovlp} for four Ego4D challenge tasks, including Natural Language Query (NLQ), Moment Query (MQ), Object State Change Classification (OSCC), and PNR Localization (PNR).

Language Modelling Object State Change Classification

205

Paper
Code

Egocentric Video-Language Pretraining

2 code implementations • 3 Jun 2022 • Kevin Qinghong Lin, Alex Jinpeng Wang, Mattia Soldan, Michael Wray, Rui Yan, Eric Zhongcong Xu, Difei Gao, RongCheng Tu, Wenzhe Zhao, Weijie Kong, Chengfei Cai, Hongfa Wang, Dima Damen, Bernard Ghanem, Wei Liu, Mike Zheng Shou

Video-Language Pretraining (VLP), which aims to learn transferable representation to advance a wide range of video-text downstream tasks, has recently received increasing attention.

Ranked #2 on Video Summarization on Query-Focused Video Summarization Dataset

Action Recognition Contrastive Learning +11

205

Paper
Code

Domain Adaptation in Multi-View Embedding for Cross-Modal Video Retrieval

no code implementations • 25 Oct 2021 • Jonathan Munro, Michael Wray, Diane Larlus, Gabriela Csurka, Dima Damen

Given a gallery of uncaptioned video sequences, this paper considers the task of retrieving videos based on their relevance to an unseen text query.

Retrieval Unsupervised Domain Adaptation +1

Paper
Add Code

Ego4D: Around the World in 3,000 Hours of Egocentric Video

7 code implementations • CVPR 2022 • Kristen Grauman, Andrew Westbury, Eugene Byrne, Zachary Chavis, Antonino Furnari, Rohit Girdhar, Jackson Hamburger, Hao Jiang, Miao Liu, Xingyu Liu, Miguel Martin, Tushar Nagarajan, Ilija Radosavovic, Santhosh Kumar Ramakrishnan, Fiona Ryan, Jayant Sharma, Michael Wray, Mengmeng Xu, Eric Zhongcong Xu, Chen Zhao, Siddhant Bansal, Dhruv Batra, Vincent Cartillier, Sean Crane, Tien Do, Morrie Doulaty, Akshay Erapalli, Christoph Feichtenhofer, Adriano Fragomeni, Qichen Fu, Abrham Gebreselasie, Cristina Gonzalez, James Hillis, Xuhua Huang, Yifei HUANG, Wenqi Jia, Weslie Khoo, Jachym Kolar, Satwik Kottur, Anurag Kumar, Federico Landini, Chao Li, Yanghao Li, Zhenqiang Li, Karttikeya Mangalam, Raghava Modhugu, Jonathan Munro, Tullie Murrell, Takumi Nishiyasu, Will Price, Paola Ruiz Puentes, Merey Ramazanova, Leda Sari, Kiran Somasundaram, Audrey Southerland, Yusuke Sugano, Ruijie Tao, Minh Vo, Yuchen Wang, Xindi Wu, Takuma Yagi, Ziwei Zhao, Yunyi Zhu, Pablo Arbelaez, David Crandall, Dima Damen, Giovanni Maria Farinella, Christian Fuegen, Bernard Ghanem, Vamsi Krishna Ithapu, C. V. Jawahar, Hanbyul Joo, Kris Kitani, Haizhou Li, Richard Newcombe, Aude Oliva, Hyun Soo Park, James M. Rehg, Yoichi Sato, Jianbo Shi, Mike Zheng Shou, Antonio Torralba, Lorenzo Torresani, Mingfei Yan, Jitendra Malik

We introduce Ego4D, a massive-scale egocentric video dataset and benchmark suite.

De-identification Ethics

5,123

Paper
Code

On Semantic Similarity in Video Retrieval

3 code implementations • CVPR 2021 • Michael Wray, Hazel Doughty, Dima Damen

Current video retrieval efforts all found their evaluation on an instance-based assumption, that only a single caption is relevant to a query video and vice versa.

Retrieval Semantic Similarity +2

Paper
Code

Supervision Levels Scale (SLS)

1 code implementation • 22 Aug 2020 • Dima Damen, Michael Wray

We propose a three-dimensional discrete and incremental scale to encode a method's level of supervision - i. e. the data and labels used when training a model to achieve a given performance.

Paper
Code

Rescaling Egocentric Vision

7 code implementations • 23 Jun 2020 • Dima Damen, Hazel Doughty, Giovanni Maria Farinella, Antonino Furnari, Evangelos Kazakos, Jian Ma, Davide Moltisanti, Jonathan Munro, Toby Perrett, Will Price, Michael Wray

This paper introduces the pipeline to extend the largest dataset in egocentric vision, EPIC-KITCHENS.

Ranked #6 on Action Anticipation on EPIC-KITCHENS-100

Action Anticipation Action Detection +4

117

Paper
Code

The EPIC-KITCHENS Dataset: Collection, Challenges and Baselines

2 code implementations • 29 Apr 2020 • Dima Damen, Hazel Doughty, Giovanni Maria Farinella, Sanja Fidler, Antonino Furnari, Evangelos Kazakos, Davide Moltisanti, Jonathan Munro, Toby Perrett, Will Price, Michael Wray

Our dataset features 55 hours of video consisting of 11. 5M frames, which we densely labelled for a total of 39. 6K action segments and 454. 2K object bounding boxes.

Object

Paper
Code

Fine-Grained Action Retrieval Through Multiple Parts-of-Speech Embeddings

no code implementations • ICCV 2019 • Michael Wray, Diane Larlus, Gabriela Csurka, Dima Damen

We report the first retrieval results on fine-grained actions for the large-scale EPIC dataset, in a generalised zero-shot setting.

Cross-Modal Retrieval POS +3

Paper
Add Code

Learning Visual Actions Using Multiple Verb-Only Labels

1 code implementation • 25 Jul 2019 • Michael Wray, Dima Damen

We collect multi-verb annotations for three action video datasets and evaluate the verb-only labelling representations for action recognition and cross-modal retrieval (video-to-text and text-to-video).

Action Recognition Cross-Modal Retrieval +1

Paper
Code

Towards an Unequivocal Representation of Actions

no code implementations • 10 May 2018 • Michael Wray, Davide Moltisanti, Dima Damen

This work introduces verb-only representations for actions and interactions; the problem of describing similar motions (e. g. 'open door', 'open cupboard'), and distinguish differing ones (e. g. 'open door' vs 'open bottle') using verb-only labels.

Action Recognition Retrieval +1

Paper
Add Code

Scaling Egocentric Vision: The EPIC-KITCHENS Dataset

2 code implementations • ECCV 2018 • Dima Damen, Hazel Doughty, Giovanni Maria Farinella, Sanja Fidler, Antonino Furnari, Evangelos Kazakos, Davide Moltisanti, Jonathan Munro, Toby Perrett, Will Price, Michael Wray

First-person vision is gaining interest as it offers a unique viewpoint on people's interaction with objects, their attention, and even intention.

Ranked #6 on Action Anticipation on EPIC-KITCHENS-55 (Unseen test set (S2)

Action Anticipation

147

Paper
Code

Trespassing the Boundaries: Labeling Temporal Bounds for Object Interactions in Egocentric Video

no code implementations • ICCV 2017 • Davide Moltisanti, Michael Wray, Walterio Mayol-Cuevas, Dima Damen

Manual annotations of temporal bounds for object interactions (i. e. start and end times) are typical training input to recognition, localization and detection algorithms.

Object

Paper
Add Code

Improving Classification by Improving Labelling: Introducing Probabilistic Multi-Label Object Interaction Recognition

no code implementations • 24 Mar 2017 • Michael Wray, Davide Moltisanti, Walterio Mayol-Cuevas, Dima Damen

This work deviates from easy-to-define class boundaries for object interactions.

General Classification Object

Paper
Add Code

SEMBED: Semantic Embedding of Egocentric Action Videos

no code implementations • 28 Jul 2016 • Michael Wray, Davide Moltisanti, Walterio Mayol-Cuevas, Dima Damen

We present SEMBED, an approach for embedding an egocentric object interaction video in a semantic-visual graph to estimate the probability distribution over its potential semantic labels.

General Classification Object

Paper
Add Code

Cannot find the paper you are looking for? You can Submit a new open access paper.