1 code implementation • 1 Apr 2024 • Xingyi Zhou, Anurag Arnab, Shyamal Buch, Shen Yan, Austin Myers, Xuehan Xiong, Arsha Nagrani, Cordelia Schmid
An ideal model for dense video captioning -- predicting captions localized temporally in a video -- should be able to handle long input videos, predict rich, detailed textual descriptions, and be able to produce outputs before processing the entire video.
no code implementations • 20 Feb 2024 • Long Zhao, Nitesh B. Gundavarapu, Liangzhe Yuan, Hao Zhou, Shen Yan, Jennifer J. Sun, Luke Friedman, Rui Qian, Tobias Weyand, Yue Zhao, Rachel Hornung, Florian Schroff, Ming-Hsuan Yang, David A. Ross, Huisheng Wang, Hartwig Adam, Mikhail Sirotenko, Ting Liu, Boqing Gong
We introduce VideoPrism, a general-purpose video encoder that tackles diverse video understanding tasks with a single frozen model.
no code implementations • 16 Feb 2024 • Junfei Xiao, Zheng Xu, Alan Yuille, Shen Yan, Boyu Wang
Our research undertakes a thorough exploration of the state-of-the-art perceiver resampler architecture and builds a strong baseline.
no code implementations • 11 Jan 2024 • Rouwan Wu, Xiaoya Cheng, Juelin Zhu, Xuxiang Liu, Maojun Zhang, Shen Yan
Despite significant progress in global localization of Unmanned Aerial Vehicles (UAVs) in GPS-denied environments, existing methods remain constrained by the availability of datasets.
no code implementations • 14 Dec 2023 • Jiarui Xu, Xingyi Zhou, Shen Yan, Xiuye Gu, Anurag Arnab, Chen Sun, Xiaolong Wang, Cordelia Schmid
When taking locations as inputs, the model performs location-conditioned captioning, which generates captions for the indicated object or region.
3 code implementations • 6 Dec 2023 • Zhongwei Wan, Xin Wang, Che Liu, Samiul Alam, Yu Zheng, Jiachen Liu, Zhongnan Qu, Shen Yan, Yi Zhu, Quanlu Zhang, Mosharaf Chowdhury, Mi Zhang
Large Language Models (LLMs) have demonstrated remarkable capabilities in important tasks such as natural language understanding, language generation, and complex reasoning and have the potential to make a substantial impact on our society.
1 code implementation • ICCV 2023 • Shen Yan, Xuehan Xiong, Arsha Nagrani, Anurag Arnab, Zhonghao Wang, Weina Ge, David Ross, Cordelia Schmid
While large-scale image-text pretrained models such as CLIP have been used for multiple video-level tasks on trimmed videos, their use for temporal localization in untrimmed videos is still a relatively unexplored task.
Ranked #1 on Action Segmentation on COIN
no code implementations • 18 Apr 2023 • Yang Liu, Shen Yan, Yuge Zhang, Kan Ren, Quanlu Zhang, Zebin Ren, Deng Cai, Mi Zhang
Vision Transformers have shown great performance in single tasks such as classification and segmentation.
no code implementations • CVPR 2023 • Shen Yan, Yu Liu, Long Wang, Zehong Shen, Zhen Peng, Haomin Liu, Maojun Zhang, Guofeng Zhang, Xiaowei Zhou
Despite the remarkable advances in image matching and pose estimation, image-based localization of a camera in a temporally-varying outdoor environment is still a challenging problem due to huge appearance disparity between query and reference images caused by illumination, seasonal and structural changes.
1 code implementation • ICCV 2023 • Qingyan Meng, Mingqing Xiao, Shen Yan, Yisen Wang, Zhouchen Lin, Zhi-Quan Luo
In particular, our method achieves state-of-the-art accuracy on ImageNet, while the memory cost and training time are reduced by more than 70% and 50%, respectively, compared with BPTT.
no code implementations • 13 Feb 2023 • Shen Yan, Xiaoya Cheng, Yuxiang Liu, Juelin Zhu, Rouwan Wu, Yu Liu, Maojun Zhang
Despite the significant progress in 6-DoF visual localization, researchers are mostly driven by ground-level benchmarks.
no code implementations • ICCV 2023 • Long Wang, Shen Yan, Jianan Zhen, Yu Liu, Maojun Zhang, Guofeng Zhang, Xiaowei Zhou
Specifically, given an initial pose, we project the object model to the image plane to obtain the initial contour and use a lightweight network to predict how the contour should move to match the true object boundary, which provides the gradients to optimize the object pose.
no code implementations • 9 Dec 2022 • Shen Yan, Tao Zhu, ZiRui Wang, Yuan Cao, Mi Zhang, Soham Ghosh, Yonghui Wu, Jiahui Yu
We explore an efficient approach to establish a foundational video-text model.
Ranked #1 on Video Captioning on ActivityNet Captions (using extra training data)
1 code implementation • CVPR 2023 • Yang Liu, Shen Yan, Laura Leal-Taixé, James Hays, Deva Ramanan
We draw inspiration from human visual classification studies and propose generalizing augmentation with invariant transforms to soft augmentation where the learning target softens non-linearly as a function of the degree of the transform applied to the sample: e. g., more aggressive image crop augmentations produce less confident learning targets.
1 code implementation • CVPR 2022 • Qingyan Meng, Mingqing Xiao, Shen Yan, Yisen Wang, Zhouchen Lin, Zhi-Quan Luo
In this paper, we propose the Differentiation on Spike Representation (DSR) method, which could achieve high performance that is competitive to ANNs yet with low latency.
1 code implementation • 11 Mar 2022 • Yu Zheng, Zhi Zhang, Shen Yan, Mi Zhang
In this work, instead of fixing a set of hand-picked default augmentations alongside the searched data augmentations, we propose a fully automated approach for data augmentation search named Deep AutoAugment (DeepAA).
Ranked #2 on Data Augmentation on ImageNet
1 code implementation • CVPR 2022 • Shen Yan, Xuehan Xiong, Anurag Arnab, Zhichao Lu, Mi Zhang, Chen Sun, Cordelia Schmid
Video understanding requires reasoning at multiple spatiotemporal resolutions -- from short fine-grained motions to events taking place over longer durations.
Ranked #5 on Action Recognition on EPIC-KITCHENS-100 (using extra training data)
1 code implementation • NeurIPS 2021 • Shen Yan, Colin White, Yash Savani, Frank Hutter
While early research in neural architecture search (NAS) required extreme computational resources, the recent releases of tabular and surrogate benchmarks have greatly increased the speed and reproducibility of NAS research.
no code implementations • 2 Oct 2021 • Yahya H. Ezzeldin, Shen Yan, Chaoyang He, Emilio Ferrara, Salman Avestimehr
Training ML models which are fair across different demographic groups is of critical importance due to the increased integration of ML in crucial decision-making scenarios such as healthcare and recruitment.
1 code implementation • 14 Feb 2021 • Shen Yan, Kaiqiang Song, Fei Liu, Mi Zhang
Our experiments show that CATE is beneficial to the downstream search, especially in the large search space.
Ranked #15 on Neural Architecture Search on CIFAR-10
no code implementations • 8 Feb 2021 • Amin Reihani, Shen Yan, Yuxuan Luan, Rohith Mittapally, Edgar Meyhofer, Pramod Reddy
Quantifying the temperature of microdevices is critical for probing nanoscale energy transport. Such quantification is often accomplished by integrating resistance thermometers into microdevices.
Mesoscale and Nanoscale Physics
no code implementations • 8 Nov 2020 • Taha Ameen ur Rahman, Alton S. Barbehenn, Xinan Chen, Hassan Dbouk, James A. Douglas, Yuncong Geng, Ian George, John B. Harvill, Sung Woo Jeon, Kartik K. Kansal, Kiwook Lee, Kelly A. Levick, Bochao Li, Ziyue Li, Yashaswini Murthy, Adarsh Muthuveeru-Subramaniam, S. Yagiz Olmez, Matthew J. Tomei, Tanya Veeravalli, Xuechao Wang, Eric A. Wayman, Fan Wu, Peng Xu, Shen Yan, Heling Zhang, Yibo Zhang, Yifan Zhang, Yibo Zhao, Sourya Basu, Lav R. Varshney
Many information sources are not just sequences of distinguishable symbols but rather have invariances governed by alternative counting paradigms such as permutations, combinations, and partitions.
Information Theory Information Theory
no code implementations • 17 Oct 2020 • Mi Zhang, Faen Zhang, Nicholas D. Lane, Yuanchao Shu, Xiao Zeng, Biyi Fang, Shen Yan, Hui Xu
The era of edge computing has arrived.
no code implementations • 17 Sep 2020 • Shen Yan, Yang Pen, Shiming Lai, Yu Liu, Maojun Zhang
Conventional image retrieval techniques for Structure-from-Motion (SfM) suffer from the limit of effectively recognizing repetitive patterns and cannot guarantee to create just enough match pairs with high precision and high recall.
1 code implementation • NeurIPS 2020 • Shen Yan, Yu Zheng, Wei Ao, Xiao Zeng, Mi Zhang
Existing Neural Architecture Search (NAS) methods either encode neural architectures using discrete encodings that do not scale well, or adopt supervised learning-based methods to jointly learn architecture representations and optimize architecture search on such representations which incurs search bias.
1 code implementation • 3 Jan 2020 • Shen Yan, Huan Song, Nanxiang Li, Lincan Zou, Liu Ren
Unsupervised domain adaptation studies the problem of utilizing a relevant source domain with abundant labels to build predictive modeling for an unannotated target domain.
Ranked #49 on Domain Generalization on PACS
no code implementations • 17 Nov 2019 • Zhibo Wang, Shen Yan, XiaoYu Zhang, Niels Lobo
(Very early draft)Traditional supervised learning keeps pushing convolution neural network(CNN) achieving state-of-art performance.
2 code implementations • ECCV 2020 • Taojiannan Yang, Sijie Zhu, Chen Chen, Shen Yan, Mi Zhang, Andrew Willis
We propose the width-resolution mutual learning method (MutualNet) to train a network that is executable at dynamic resource constraints to achieve adaptive accuracy-efficiency trade-offs at runtime.
no code implementations • 31 Aug 2019 • Shen Yan, Biyi Fang, Faen Zhang, Yu Zheng, Xiao Zeng, Hui Xu, Mi Zhang
Without the constraint imposed by the hand-designed heuristics, our searched networks contain more flexible and meaningful architectures that existing weight sharing based NAS approaches are not able to discover.
no code implementations • IWSLT (EMNLP) 2018 • Shen Yan, Leonard Dahlmann, Pavel Petrushkov, Sanjika Hewavitharana, Shahram Khadivi
Pre-training a model with word weights improves fine-tuning up to 1. 24% BLEU absolute and 1. 64% TER, respectively.
no code implementations • 13 Dec 2017 • Shen Yan
In this work, we mainly study the influence of the 2D warping module for one-shot face recognition.