Search Results for author: Zun Wang

Found 14 papers, 9 papers with code

InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding

2 code implementations · 22 Mar 2024 · Yi Wang, Kunchang Li, Xinhao Li, Jiashuo Yu, Yinan He, Guo Chen, Baoqi Pei, Rongkun Zheng, Jilan Xu, Zun Wang, Yansong Shi, Tianxiang Jiang, Songze Li, Hongjie Zhang, Yifei Huang, Yu Qiao, Yali Wang, Limin Wang

We introduce InternVideo2, a new video foundation model (ViFM) that achieves state-of-the-art performance in action recognition, video-text tasks, and video-centric dialogue.

Ranked #1 on Audio Classification on ESC-50 (using extra training data)

Action Classification · Action Recognition · +12

Self-Consistency Training for Hamiltonian Prediction

no code implementations · 14 Mar 2024 · He Zhang, Chang Liu, Zun Wang, Xinran Wei, Siyuan Liu, Nanning Zheng, Bin Shao, Tie-Yan Liu

This addresses the data-scarcity difficulty and distinguishes the task from other property-prediction formulations, with unique benefits: (1) self-consistency training lets the model be trained on a large amount of unlabeled data, hence substantially enhancing generalization; (2) self-consistency training is more efficient than labeling data with DFT for supervised training, since it amortizes the DFT calculation over a set of molecular structures.
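For intuition, here is a minimal sketch of such a self-consistency loss, assuming an orthonormal basis and a placeholder `fock_from_density` standing in for the DFT map from electron density back to a Hamiltonian (in practice supplied by a quantum chemistry code; nothing here is the paper's implementation):

```python
import numpy as np

def fock_from_density(density: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for the DFT map H[rho]; a real one comes
    from a quantum chemistry package."""
    return 0.5 * (density + density.T)  # any symmetric map, for illustration only

def self_consistency_loss(h_pred: np.ndarray, n_occ: int) -> float:
    # Diagonalize the predicted Hamiltonian (orthonormal basis assumed).
    _, orbitals = np.linalg.eigh(h_pred)
    occ = orbitals[:, :n_occ]               # occupied orbitals
    density = occ @ occ.T                   # density matrix from those orbitals
    h_rebuilt = fock_from_density(density)  # rebuild the Hamiltonian from rho
    # The mismatch needs no DFT labels, so unlabeled structures become training data.
    return float(np.mean((h_pred - h_rebuilt) ** 2))

h_pred = np.random.randn(8, 8)
h_pred = 0.5 * (h_pred + h_pred.T)  # Hamiltonians are symmetric
print(self_consistency_loss(h_pred, n_occ=3))
```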

Property Prediction

Leveraging Biomolecule and Natural Language through Multi-Modal Learning: A Survey

2 code implementations · 3 Mar 2024 · Qizhi Pei, Lijun Wu, Kaiyuan Gao, Jinhua Zhu, Yue Wang, Zun Wang, Tao Qin, Rui Yan

The integration of biomolecular modeling with natural language (BL) has emerged as a promising interdisciplinary area at the intersection of artificial intelligence, chemistry and biology.

Property Prediction

MVBench: A Comprehensive Multi-modal Video Understanding Benchmark

1 code implementation · 28 Nov 2023 · Kunchang Li, Yali Wang, Yinan He, Yizhuo Li, Yi Wang, Yi Liu, Zun Wang, Jilan Xu, Guo Chen, Ping Luo, Limin Wang, Yu Qiao

With the rapid development of Multi-modal Large Language Models (MLLMs), a number of diagnostic benchmarks have recently emerged to evaluate the comprehension capabilities of these models.

Fairness · Multiple-choice · +8

Does AI for science need another ImageNet or totally different benchmarks? A case study of machine learning force fields

no code implementations · 11 Aug 2023 · Yatao Li, Wanling Gao, Lei Wang, Lixin Sun, Zun Wang, Jianfeng Zhan

This suite of metrics has demonstrated a better ability to assess a model's performance in real-world scientific applications, in contrast to traditional AI benchmarking methodologies.

Benchmarking

Scaling Data Generation in Vision-and-Language Navigation

1 code implementation · ICCV 2023 · Zun Wang, Jialu Li, Yicong Hong, Yi Wang, Qi Wu, Mohit Bansal, Stephen Gould, Hao Tan, Yu Qiao

Recent research in language-guided visual navigation has demonstrated a significant demand for the diversity of traversable environments and the quantity of supervision for training generalizable agents.

Imitation Learning · Vision and Language Navigation · +1

ETPNav: Evolving Topological Planning for Vision-Language Navigation in Continuous Environments

1 code implementation · 6 Apr 2023 · Dong An, Hanqing Wang, Wenguan Wang, Zun Wang, Yan Huang, Keji He, Liang Wang

To develop a robust VLN-CE agent, we propose a new navigation framework, ETPNav, which focuses on two critical skills: 1) the capability to abstract environments and generate long-range navigation plans, and 2) the ability to perform obstacle-avoiding control in continuous environments.
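As a toy illustration of skill 1), the environment can be abstracted into a topological graph of waypoints over which long-range plans are computed; the graph, goal scores, and subgoal-selection rule below are invented for the example and are not ETPNav's actual data structures:

```python
from collections import deque

# Adjacency of an (invented) topological map; nodes are visited/observed waypoints.
graph = {
    "start": ["hall"],
    "hall": ["start", "kitchen", "stairs"],
    "kitchen": ["hall"],
    "stairs": ["hall", "bedroom"],
    "bedroom": ["stairs"],
}
goal_scores = {"kitchen": 0.2, "bedroom": 0.9}  # e.g. from instruction-waypoint matching

def plan(start, goal):
    """Breadth-first shortest path over the topological map."""
    queue, parents = deque([start]), {start: None}
    while queue:
        node = queue.popleft()
        if node == goal:  # walk back through parents to recover the path
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for nxt in graph[node]:
            if nxt not in parents:
                parents[nxt] = node
                queue.append(nxt)
    return []

subgoal = max(goal_scores, key=goal_scores.get)  # most promising long-range subgoal
print(plan("start", subgoal))  # a local controller would then traverse each edge
```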

Autonomous Navigation · Navigate · +1

InternVideo: General Video Foundation Models via Generative and Discriminative Learning

1 code implementation · 6 Dec 2022 · Yi Wang, Kunchang Li, Yizhuo Li, Yinan He, Bingkun Huang, Zhiyu Zhao, Hongjie Zhang, Jilan Xu, Yi Liu, Zun Wang, Sen Xing, Guo Chen, Junting Pan, Jiashuo Yu, Yali Wang, Limin Wang, Yu Qiao

Specifically, InternVideo efficiently explores masked video modeling and video-language contrastive learning as the pretraining objectives, and selectively coordinates video representations of these two complementary frameworks in a learnable manner to boost various video applications.
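One minimal way to coordinate two branch representations in a learnable manner is a gated convex combination, sketched below; the shared feature dimension and sigmoid gate are assumptions for illustration, not InternVideo's actual fusion design:

```python
import torch
import torch.nn as nn

class LearnableCoordination(nn.Module):
    def __init__(self):
        super().__init__()
        self.alpha = nn.Parameter(torch.zeros(1))  # learned mixing logit

    def forward(self, feat_mvm: torch.Tensor, feat_clip: torch.Tensor) -> torch.Tensor:
        w = torch.sigmoid(self.alpha)                # in (0, 1), trained end to end
        return w * feat_mvm + (1.0 - w) * feat_clip  # convex combination of branches

fuse = LearnableCoordination()
feat_mvm = torch.randn(2, 768)   # masked-video-modeling features (fake data)
feat_clip = torch.randn(2, 768)  # video-language contrastive features (fake data)
print(fuse(feat_mvm, feat_clip).shape)  # torch.Size([2, 768])
```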

Ranked #1 on Action Recognition on Something-Something V1 (using extra training data)

Action Classification · Contrastive Learning · +8

1st Place Solutions for RxR-Habitat Vision-and-Language Navigation Competition (CVPR 2022)

1 code implementation · 23 Jun 2022 · Dong An, Zun Wang, Yangguang Li, Yi Wang, Yicong Hong, Yan Huang, Liang Wang, Jing Shao

Our model consists of three modules: the candidate waypoints predictor (CWP), the history-enhanced planner, and the tryout controller.
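The wiring of such a three-stage pipeline might look like the sketch below; every function, signature, and heuristic is a hypothetical stand-in, not the authors' interface:

```python
import random

def predict_waypoints(observation):
    """CWP stand-in: propose candidate (heading_deg, distance_m) waypoints."""
    return [(random.uniform(-90, 90), random.uniform(0.5, 2.0)) for _ in range(4)]

def plan_with_history(waypoints, history):
    """History-enhanced-planner stand-in: score waypoints, mildly penalizing
    long histories to encourage progress (an invented heuristic)."""
    return max(waypoints, key=lambda w: w[1] - 0.1 * len(history))

def tryout(target, blocked):
    """Tryout-controller stand-in: try alternative headings when the chosen
    one is blocked, instead of getting stuck against an obstacle."""
    base = target[0]
    for delta in (0.0, 15.0, -15.0, 30.0, -30.0):
        if round(base + delta) not in blocked:
            return base + delta
    return base + 180.0  # everything blocked: turn around

history, blocked = [], {round(-30.0)}
target = plan_with_history(predict_waypoints(None), history)
print("executing heading:", tryout(target, blocked))
```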

Data Augmentation · Vision and Language Navigation

Bridging the Gap Between Learning in Discrete and Continuous Environments for Vision-and-Language Navigation

1 code implementation · CVPR 2022 · Yicong Hong, Zun Wang, Qi Wu, Stephen Gould

To bridge the discrete-to-continuous gap, we propose a predictor to generate a set of candidate waypoints during navigation, so that agents designed with high-level actions can be transferred to and trained in continuous environments.
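A minimal sketch of this discrete-to-continuous bridge, assuming a fixed-distance waypoint predictor and hypothetical turn/forward step sizes:

```python
import math

def candidate_waypoints(num=8, dist=1.0):
    """Hypothetical predictor output: evenly spaced headings at a fixed distance."""
    return [(i * 360.0 / num, dist) for i in range(num)]

def to_low_level(heading_deg, dist, turn_step=15.0, fwd_step=0.25):
    """Decompose a chosen waypoint into discrete TURN/FORWARD commands."""
    turns = round(heading_deg / turn_step)  # number of fixed-angle turns
    moves = math.ceil(dist / fwd_step)      # number of fixed-length steps
    return ["TURN"] * abs(turns) + ["FORWARD"] * moves

chosen = candidate_waypoints()[2]  # the discrete high-level agent picks a waypoint
print(to_low_level(*chosen))       # e.g. 6 turns of 15 degrees, then 4 forward steps
```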

Imitation Learning · Vision and Language Navigation

Heterogeneous relational message passing networks for molecular dynamics simulations

no code implementations · 2 Sep 2021 · Zun Wang, Chong Wang, Sibo Zhao, Yong Xu, Shaogang Hao, Chang-Yu Hsieh, Bing-Lin Gu, Wenhui Duan

With many frameworks based on message-passing neural networks proposed to predict molecular and bulk properties, machine learning methods have profoundly reshaped the paradigms of the computational sciences underpinning physics, materials science, chemistry, and biology.

BIG-bench Machine Learning
