Search Results for author: Alexander Hauptmann

Found 41 papers, 16 papers with code

SimAug: Learning Robust Representations from Simulation for Trajectory Prediction

1 code implementation • ECCV 2020 • Junwei Liang, Lu Jiang, Alexander Hauptmann

We approach this problem through the real-data-free setting in which the model is trained only on 3D simulation data and applied out-of-the-box to a wide variety of real cameras.

Ranked #1 on Trajectory Forecasting on ActEV

Adversarial Attack Adversarial Defense +2

Direct Preference Optimization of Video Large Multimodal Models from Language Model Reward

1 code implementation • 1 Apr 2024 • Ruohong Zhang, Liangke Gui, Zhiqing Sun, Yihao Feng, Keyang Xu, Yuanhan Zhang, Di Fu, Chunyuan Li, Alexander Hauptmann, Yonatan Bisk, Yiming Yang

Preference modeling techniques, such as direct preference optimization (DPO), have been shown to be effective in enhancing the generalization abilities of large language models (LLMs).

Instruction Following Language Modelling +3

Towards Calibrated Robust Fine-Tuning of Vision-Language Models

no code implementations • 3 Nov 2023 • Changdae Oh, Hyesu Lim, Mijoo Kim, Jaegul Choo, Alexander Hauptmann, Zhi-Qi Cheng, Kyungwoo Song

Robust fine-tuning aims to ensure performance on out-of-distribution (OOD) samples, which is sometimes compromised by pursuing adaptation on in-distribution (ID) samples.

Autonomous Driving Medical Diagnosis

Hyperbolic vs Euclidean Embeddings in Few-Shot Learning: Two Sides of the Same Coin

no code implementations • 18 Sep 2023 • Gabriel Moreira, Manuel Marques, João Paulo Costeira, Alexander Hauptmann

Recent research in representation learning has shown that hierarchical data lends itself to low-dimensional and highly informative representations in hyperbolic space.

Few-Shot Learning Representation Learning

STMT: A Spatial-Temporal Mesh Transformer for MoCap-Based Action Recognition

1 code implementation • CVPR 2023 • Xiaoyu Zhu, Po-Yao Huang, Junwei Liang, Celso M. de Melo, Alexander Hauptmann

The model uses a hierarchical transformer with intra-frame off-set attention and inter-frame self-attention.

Action Recognition Temporal Action Localization

Multilingual Multimodal Pre-training for Zero-Shot Cross-Lingual Transfer of Vision-Language Models

1 code implementation • NAACL 2021 • Po-Yao Huang, Mandela Patrick, Junjie Hu, Graham Neubig, Florian Metze, Alexander Hauptmann

Specifically, we focus on multilingual text-to-video search and propose a Transformer-based model that learns contextualized multilingual multimodal embeddings.

Image Retrieval Text-to-video search +1

Learning To Hallucinate Examples From Extrinsic and Intrinsic Supervision

no code implementations • ICCV 2021 • Liangke Gui, Adrien Bardes, Ruslan Salakhutdinov, Alexander Hauptmann, Martial Hebert, Yu-Xiong Wang

Learning to hallucinate additional examples has recently been shown as a promising direction to address few-shot learning tasks.

Contrastive Learning Few-Shot Learning +1

Spatial-Temporal Alignment Network for Action Recognition and Detection

no code implementations • 4 Dec 2020 • Junwei Liang, Liangliang Cao, Xuehan Xiong, Ting Yu, Alexander Hauptmann

The experimental results show that the STAN model can consistently improve on the state of the art in both action detection and action recognition tasks.

Action Detection Action Recognition

Event-Related Bias Removal for Real-time Disaster Events

no code implementations • Findings of the Association for Computational Linguistics 2020 • Evangelia Spiliopoulou, Salvador Medina Maza, Eduard Hovy, Alexander Hauptmann

Furthermore, the classification of information in real-time systems requires training on out-of-domain data, as we do not have any data from a new emerging crisis.

General Classification

Support-set bottlenecks for video-text representation learning

no code implementations • ICLR 2021 • Mandela Patrick, Po-Yao Huang, Yuki Asano, Florian Metze, Alexander Hauptmann, João Henriques, Andrea Vedaldi

The dominant paradigm for learning video-text representations -- noise contrastive learning -- increases the similarity of the representations of pairs of samples that are known to be related, such as text and video from the same sample, and pushes away the representations of all other pairs.

Contrastive Learning Representation Learning +3

Robust Long-Term Object Tracking via Improved Discriminative Model Prediction

1 code implementation • 11 Aug 2020 • Seokeon Choi, Junhyun Lee, Yunsung Lee, Alexander Hauptmann

We propose an improved discriminative model prediction method for robust long-term tracking based on a pre-trained short-term tracker.

Object Tracking

From A Glance to "Gotcha": Interactive Facial Image Retrieval with Progressive Relevance Feedback

no code implementations • 30 Jul 2020 • Xinru Yang, Haozhi Qi, Mingyang Li, Alexander Hauptmann

Facial image retrieval plays a significant role in forensic investigations where an untrained witness tries to identify a suspect from a massive pool of images.

Face Image Retrieval Retrieval

MSNet: A Multilevel Instance Segmentation Network for Natural Disaster Damage Assessment in Aerial Videos

1 code implementation • 30 Jun 2020 • Xiaoyu Zhu, Junwei Liang, Alexander Hauptmann

This provides the first benchmark for quantitative evaluation of models to assess building damage using aerial videos.

Instance Segmentation Region Proposal +1

Unsupervised Multimodal Neural Machine Translation with Pseudo Visual Pivoting

no code implementations • ACL 2020 • Po-Yao Huang, Junjie Hu, Xiaojun Chang, Alexander Hauptmann

In this paper, we investigate how to utilize visual content for disambiguation and promoting latent space alignment in unsupervised MMT.

Translation Unsupervised Machine Translation

SimAug: Learning Robust Representations from 3D Simulation for Pedestrian Trajectory Prediction in Unseen Cameras

1 code implementation • 4 Apr 2020 • Junwei Liang, Lu Jiang, Alexander Hauptmann

We refer to our method as SimAug.

Ranked #2 on Trajectory Prediction on ActEV

Adversarial Attack Adversarial Defense +2

ZSTAD: Zero-Shot Temporal Activity Detection

no code implementations • CVPR 2020 • Lingling Zhang, Xiaojun Chang, Jun Liu, Minnan Luo, Sen Wang, ZongYuan Ge, Alexander Hauptmann

An integral part of video analysis and surveillance is temporal activity detection, which means simultaneously recognizing and localizing activities in long untrimmed videos.

Action Detection Activity Detection

Gun Source and Muzzle Head Detection

no code implementations • 29 Jan 2020 • Zhong Zhou, Isak Czeresnia Etinger, Florian Metze, Alexander Hauptmann, Alexander Waibel

We have interesting results both in bounding the shooter as well as detecting the gun smoke.

Head Detection object-detection +1

The Garden of Forking Paths: Towards Multi-Future Trajectory Prediction

1 code implementation • CVPR 2020 • Junwei Liang, Lu Jiang, Kevin Murphy, Ting Yu, Alexander Hauptmann

The first contribution is a new dataset, created in a realistic 3D simulator, which is based on real world trajectory data, and then extrapolated by human annotators to achieve different latent goals.

Ranked #1 on Multi-future Trajectory Prediction on ForkingPaths

Autonomous Driving Human motion prediction +5

Multi-Head Attention with Diversity for Learning Grounded Multilingual Multimodal Representations

no code implementations • IJCNLP 2019 • Po-Yao Huang, Xiaojun Chang, Alexander Hauptmann

With the aim of promoting and understanding the multilingual version of image search, we leverage visual object detection and propose a model with diverse multi-head attention to learn grounded multilingual multimodal representations.

Image Retrieval object-detection +2

Improving the Learning of Multi-column Convolutional Neural Network for Crowd Counting

no code implementations • 17 Sep 2019 • Zhi-Qi Cheng, Jun-Xiu Li, Qi Dai, Xiao Wu, Jun-Yan He, Alexander Hauptmann

By minimizing the mutual information, each column is guided to learn features with different image scales.

Crowd Counting

Learning Spatial Awareness to Improve Crowd Counting

no code implementations • ICCV 2019 • Zhi-Qi Cheng, Jun-Xiu Li, Qi Dai, Xiao Wu, Alexander Hauptmann

Although the Maximum Excess over SubArrays (MESA) loss has been previously proposed to address the above issues by finding the rectangular subregion whose predicted density map has the maximum difference from the ground truth, it cannot be solved by gradient descent, thus can hardly be integrated into the deep learning framework.

Ranked #5 on Crowd Counting on WorldExpo’10

Crowd Counting Weakly-supervised Learning

Activitynet 2019 Task 3: Exploring Contexts for Dense Captioning Events in Videos

no code implementations • 11 Jul 2019 • Shizhe Chen, Yuqing Song, Yida Zhao, Qin Jin, Zhaoyang Zeng, Bei Liu, Jianlong Fu, Alexander Hauptmann

The overall system achieves the state-of-the-art performance on the dense-captioning events in video task with 9.91 METEOR score on the challenge testing set.

Dense Captioning Dense Video Captioning

Unsupervised Bilingual Lexicon Induction from Mono-lingual Multimodal Data

no code implementations • 2 Jun 2019 • Shizhe Chen, Qin Jin, Alexander Hauptmann

The linguistic feature is learned from the sentence contexts with visual semantic constraints, which helps in learning translations for words that are less visually relevant.

Bilingual Lexicon Induction Sentence +2

Technical Report of the Video Event Reconstruction and Analysis (VERA) System -- Shooter Localization, Models, Interface, and Beyond

2 code implementations • 26 May 2019 • Junwei Liang, Jay D. Aronson, Alexander Hauptmann

Among other uses, VERA enables the localization of a shooter from just a few videos that include the sound of gunshots.

Gunshot Detection Shooter Localization +2

ExCL: Extractive Clip Localization Using Natural Language Descriptions

1 code implementation • NAACL 2019 • Soham Ghosh, Anuva Agarwal, Zarana Parekh, Alexander Hauptmann

The task of retrieving clips within videos based on a given natural language query requires cross-modal reasoning over multiple frames.

Peeking into the Future: Predicting Future Person Activities and Locations in Videos

2 code implementations • CVPR 2019 • Junwei Liang, Lu Jiang, Juan Carlos Niebles, Alexander Hauptmann, Li Fei-Fei

To facilitate the training, the network is learned with an auxiliary task of predicting future location in which the activity will happen.

Ranked #1 on Activity Prediction on ActEV

Future prediction Human motion prediction +4

Traffic Danger Recognition With Surveillance Cameras Without Training Data

no code implementations • 29 Nov 2018 • Lijun Yu, Dawei Zhang, Xiangqun Chen, Alexander Hauptmann

Therefore, we developed a model to predict and identify car crashes from surveillance cameras based on a 3D reconstruction of the road plane and prediction of trajectories.

3D Reconstruction Position

Perceiving Physical Equation by Observing Visual Scenarios

no code implementations • 29 Nov 2018 • Siyu Huang, Zhi-Qi Cheng, Xi Li, Xiao Wu, Zhongfei Zhang, Alexander Hauptmann

To tackle this challenge, we present a novel pipeline comprised of an Observer Engine and a Physicist Engine by respectively imitating the actions of an observer and a physicist in the real world.

CADP: A Novel Dataset for CCTV Traffic Camera based Accident Analysis

1 code implementation • 16 Sep 2018 • Ankit Shah, Jean Baptiste Lamare, Tuan Nguyen Anh, Alexander Hauptmann

Our experiments indicate a considerable improvement in object detection accuracy: +8.51% for CM and +6.20% for ACM.

Object object-detection +2

Activity Recognition on a Large Scale in Short Videos - Moments in Time Dataset

no code implementations • 1 Sep 2018 • Ankit Shah, Harini Kesavamoorthy, Poorva Rane, Pramati Kalwad, Alexander Hauptmann, Florian Metze

Moments capture a huge part of our lives.

Action Recognition Temporal Action Localization +1

Stacked Pooling: Improving Crowd Counting by Boosting Scale Invariance

1 code implementation • 22 Aug 2018 • Siyu Huang, Xi Li, Zhi-Qi Cheng, Zhongfei Zhang, Alexander Hauptmann

In this work, we explore the cross-scale similarity in crowd counting scenario, in which the regions of different scales often exhibit high visual similarity.

Crowd Counting Density Estimation

RUC+CMU: System Report for Dense Captioning Events in Videos

no code implementations • 22 Jun 2018 • Shizhe Chen, Yuqing Song, Yida Zhao, Jiarong Qiu, Qin Jin, Alexander Hauptmann

This notebook paper presents our system in the ActivityNet Dense Captioning in Video task (task 3).

Caption Generation Dense Captioning +1

Focal Visual-Text Attention for Visual Question Answering

2 code implementations • CVPR 2018 • Junwei Liang, Lu Jiang, Liangliang Cao, Li-Jia Li, Alexander Hauptmann

Recent insights on language and vision with neural networks have been successfully applied to simple single-image visual question answering.

Ranked #1 on Memex Question Answering on MemexQA

Memex Question Answering Question Answering +1

GNAS: A Greedy Neural Architecture Search Method for Multi-Attribute Learning

no code implementations • 19 Apr 2018 • Siyu Huang, Xi Li, Zhi-Qi Cheng, Zhongfei Zhang, Alexander Hauptmann

A key problem in deep multi-attribute learning is to effectively discover the inter-attribute correlation structures.

Attribute Neural Architecture Search

Video Captioning with Guidance of Multimodal Latent Topics

no code implementations • 31 Aug 2017 • Shizhe Chen, Jia Chen, Qin Jin, Alexander Hauptmann

For the topic prediction task, we use the mined topics as the teacher to train a student topic prediction model, which learns to predict the latent topics from multimodal contents of videos.

Caption Generation Multi-Task Learning +1

MemexQA: Visual Memex Question Answering

1 code implementation • 4 Aug 2017 • Lu Jiang, Junwei Liang, Liangliang Cao, Yannis Kalantidis, Sachin Farfade, Alexander Hauptmann

This paper proposes a new task, MemexQA: given a collection of photos or videos from a user, the goal is to automatically answer questions that help users recover their memory about events captured in the collection.

Memex Question Answering Question Answering +1

Exploiting Multi-modal Curriculum in Noisy Web Data for Large-scale Concept Learning

1 code implementation • 16 Jul 2016 • Junwei Liang, Lu Jiang, Deyu Meng, Alexander Hauptmann

Learning video concept detectors automatically from the big but noisy web data with no additional manual annotations is a novel but challenging area in the multimedia and the machine learning community.

BIG-bench Machine Learning

The Solution Path Algorithm for Identity-Aware Multi-Object Tracking

no code implementations • CVPR 2016 • Shoou-I Yu, Deyu Meng, WangMeng Zuo, Alexander Hauptmann

The tracker is formulated as a quadratic optimization problem with L0 norm constraints, which we propose to solve with the solution path algorithm.

Active Learning Decision Making +2

The Best of Both Worlds: Combining Data-independent and Data-driven Approaches for Action Recognition

no code implementations • 17 May 2015 • Zhenzhong Lan, Dezhong Yao, Ming Lin, Shoou-I Yu, Alexander Hauptmann

First, we propose a two-stream Stacked Convolutional Independent Subspace Analysis (ConvISA) architecture to show that unsupervised learning methods can significantly boost the performance of traditional local features extracted from data-independent models.

Action Recognition Multi-class Classification +3

Self-Paced Learning with Diversity

no code implementations • NeurIPS 2014 • Lu Jiang, Deyu Meng, Shoou-I Yu, Zhenzhong Lan, Shiguang Shan, Alexander Hauptmann

Self-paced learning (SPL) is a recently proposed learning regime inspired by the learning process of humans and animals that gradually incorporates easy to more complex samples into training.

Harry Potter's Marauder's Map: Localizing and Tracking Multiple Persons-of-Interest by Nonnegative Discretization

no code implementations • CVPR 2013 • Shoou-I Yu, Yi Yang, Alexander Hauptmann

A device just like Harry Potter's Marauder's Map, which pinpoints the location of each person-of-interest at all times, provides invaluable information for analysis of surveillance videos.

Face Recognition Human Detection
