no code implementations • 22 Feb 2024 • Bin Zhu, Peng Jin, Munan Ning, Bin Lin, Jinfa Huang, Qi Song, Jiaxi Cui, Junwu Zhang, Zhenyu Tang, Mingjun Pan, Xing Zhou, Li Yuan
While recent progress in multimodal large language models tackles various modality tasks, these models possess limited integration capabilities for complex multi-modality tasks, which constrains the development of the field.
1 code implementation • 8 Feb 2024 • Peng Gao, Renrui Zhang, Chris Liu, Longtian Qiu, Siyuan Huang, Weifeng Lin, Shitian Zhao, Shijie Geng, Ziyi Lin, Peng Jin, Kaipeng Zhang, Wenqi Shao, Chao Xu, Conghui He, Junjun He, Hao Shao, Pan Lu, Hongsheng Li, Yu Qiao
We propose SPHINX-X, an extensive Multimodality Large Language Model (MLLM) series developed upon SPHINX.
Ranked #16 on Visual Question Answering on MM-Vet
2 code implementations • 29 Jan 2024 • Bin Lin, Zhenyu Tang, Yang Ye, Jiaxi Cui, Bin Zhu, Peng Jin, Jinfa Huang, Junwu Zhang, Munan Ning, Li Yuan
In this work, we propose a simple yet effective training strategy MoE-Tuning for LVLMs.
Ranked #46 on Visual Question Answering on MM-Vet
1 code implementation • 18 Jan 2024 • Zesen Cheng, Kehan Li, Hao Li, Peng Jin, Chang Liu, Xiawu Zheng, Rongrong Ji, Jie Chen
To mold instance queries to follow a Brownian bridge and accomplish alignment with class texts, we design Bridge-Text Alignment (BTA) to learn discriminative bridge-level representations of instances via contrastive objectives.
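The Brownian bridge referenced above is a stochastic process pinned at both endpoints. A minimal sketch of sampling one (purely illustrative; BTA's actual construction operates on instance query features, not scalar paths):

```python
import numpy as np

def brownian_bridge(start, end, n_steps, sigma=1.0, rng=None):
    """Sample a Brownian bridge from `start` to `end` over n_steps points.

    Built by conditioning a Brownian motion W on its endpoint:
    B_t = start + W_t - t * (W_1 - (end - start)), t in [0, 1].
    """
    rng = np.random.default_rng(rng)
    t = np.linspace(0.0, 1.0, n_steps)
    # Standard Brownian motion via cumulative Gaussian increments.
    dW = rng.normal(0.0, sigma * np.sqrt(1.0 / (n_steps - 1)), n_steps - 1)
    W = np.concatenate([[0.0], np.cumsum(dW)])
    # Pin both endpoints: subtract the linear drift of the endpoint error.
    return start + W - t * (W[-1] - (end - start))

path = brownian_bridge(0.0, 1.0, 100, rng=0)
```

By construction, `path[0]` equals the start value and `path[-1]` equals the end value exactly, which is the property that lets a bridge anchor a sequence of representations between two fixed states.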
1 code implementation • 20 Dec 2023 • Junwu Zhang, Zhenyu Tang, Yatian Pang, Xinhua Cheng, Peng Jin, Yida Wei, Munan Ning, Li Yuan
The core idea is to combine the powerful image generation capability of the 2D diffusion model and the texture alignment ability of the repainting strategy for generating high-quality multi-view images with consistency.
1 code implementation • 5 Dec 2023 • Hao Li, Curise Jia, Peng Jin, Zesen Cheng, Kehan Li, Jialu Sui, Chang Liu, Li Yuan
In this paper, we propose the Style-Diversified Query-Based Image Retrieval task, which enables retrieval based on various query styles.
4 code implementations • 16 Nov 2023 • Bin Lin, Yang Ye, Bin Zhu, Jiaxi Cui, Munan Ning, Peng Jin, Li Yuan
In this work, we unify visual representation into the language feature space to advance the foundational LLM towards a unified LVLM.
Ranked #2 on Zero-Shot Video Question Answer on TGIF-QA
2 code implementations • 14 Nov 2023 • Peng Jin, Ryuichi Takanobu, Wancai Zhang, Xiaochun Cao, Li Yuan
Large language models have demonstrated impressive universal capabilities across a wide range of open-ended tasks and have extended their utility to encompass multimodal conversations.
Tasks: Image-based Generative Performance Benchmarking, Language Modelling, +9 more
1 code implementation • 8 Aug 2023 • Jiarong Ye, Haomiao Ni, Peng Jin, Sharon X. Huang, Yuan Xue
To further reduce the dependency on annotated data, we propose a synthetic augmentation method called HistoDiffusion, which can be pre-trained on large-scale unlabeled datasets and later applied to a small-scale labeled dataset for augmented training.
no code implementations • 28 Jul 2023 • Peng Jin, Yinan Feng, Shihang Feng, Hanchen Wang, Yinpeng Chen, Benjamin Consolvo, Zicheng Liu, Youzuo Lin
This paper investigates the impact of big data on deep learning models for full waveform inversion (FWI).
no code implementations • 21 Jun 2023 • Shihang Feng, Hanchen Wang, Chengyuan Deng, Yinan Feng, Yanhua Liu, Min Zhu, Peng Jin, Yinpeng Chen, Youzuo Lin
We conduct comprehensive numerical experiments to explore the relationship between P-wave and S-wave velocities in seismic data.
no code implementations • 19 Jun 2023 • Zesen Cheng, Peng Jin, Hao Li, Kehan Li, Siheng Li, Xiangyang Ji, Chang Liu, Jie Chen
Bottom-up methods are mainly perturbed by Inferior Positive (IP) errors due to the lack of prior object information.
4 code implementations • 20 May 2023 • Peng Jin, Hao Li, Zesen Cheng, Jinfa Huang, Zhennan Wang, Li Yuan, Chang Liu, Jie Chen
In this paper, we propose the Disentangled Conceptualization and Set-to-set Alignment (DiCoSA) to simulate the conceptualizing and reasoning process of human beings.
no code implementations • 17 May 2023 • Hao Li, Peng Jin, Zesen Cheng, Songyang Zhang, Kai Chen, Zhennan Wang, Chang Liu, Jie Chen
Video question answering aims to answer a question about video content by reasoning about the alignment semantics between the question and the video.
no code implementations • 27 Apr 2023 • Yinan Feng, Yinpeng Chen, Peng Jin, Shihang Feng, Zicheng Liu, Youzuo Lin
Geophysics has witnessed success in applying deep learning to one of its core problems: full waveform inversion (FWI) to predict subsurface velocity maps from seismic data.
4 code implementations • CVPR 2023 • Peng Jin, Jinfa Huang, Pengfei Xiong, Shangxuan Tian, Chang Liu, Xiangyang Ji, Li Yuan, Jie Chen
Contrastive learning-based video-language representation learning approaches, e.g., CLIP, have achieved outstanding performance by pursuing semantic interaction over pre-defined video-text pairs.
Ranked #6 on Video Question Answering on MSRVTT-QA
no code implementations • ICCV 2023 • Kehan Li, Yian Zhao, Zhennan Wang, Zesen Cheng, Peng Jin, Xiangyang Ji, Li Yuan, Chang Liu, Jie Chen
Interactive segmentation enables users to segment as needed by providing cues of objects, which introduces human-computer interaction for many fields, such as image editing and medical image analysis.
4 code implementations • ICCV 2023 • Peng Jin, Hao Li, Zesen Cheng, Kehan Li, Xiangyang Ji, Chang Liu, Li Yuan, Jie Chen
Existing text-video retrieval solutions are, in essence, discriminant models focused on maximizing the conditional likelihood, i.e., p(candidates|query).
Ranked #13 on Video Retrieval on ActivityNet
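The conditional likelihood p(candidates|query) that such discriminant retrieval models maximize is typically realized as a softmax over query-candidate similarities. A hypothetical sketch (the cosine-similarity scoring and temperature value are illustrative assumptions, not the paper's model):

```python
import numpy as np

def retrieval_likelihood(query, candidates, temperature=0.07):
    """Discriminant retrieval: p(candidates | query) as a softmax over
    cosine similarities between a query embedding and candidate embeddings.
    """
    q = query / np.linalg.norm(query)
    C = candidates / np.linalg.norm(candidates, axis=1, keepdims=True)
    logits = C @ q / temperature
    logits -= logits.max()          # subtract max for numerical stability
    p = np.exp(logits)
    return p / p.sum()              # normalized distribution over candidates

rng = np.random.default_rng(0)
p = retrieval_likelihood(rng.normal(size=8), rng.normal(size=(5, 8)))
```

Ranking candidates by this distribution is equivalent to ranking by raw similarity; the generative alternative the paper contrasts with would instead model how likely the query is given each candidate.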
no code implementations • 13 Mar 2023 • Zesen Cheng, Kehan Li, Peng Jin, Xiangyang Ji, Li Yuan, Chang Liu, Jie Chen
An intuitive materialization of our paradigm is Parallel Vertex Diffusion (PVD), which directly sets vertex coordinates as the generation target and uses a diffusion model for training and inference.
4 code implementations • 21 Nov 2022 • Peng Jin, Jinfa Huang, Fenglin Liu, Xian Wu, Shen Ge, Guoli Song, David A. Clifton, Jie Chen
Most video-and-language representation learning approaches employ contrastive learning, e.g., CLIP, to project the video and text features into a common latent space according to the semantic similarities of text-video pairs.
Ranked #2 on Video Retrieval on LSMDC (text-to-video Mean Rank metric)
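The CLIP-style contrastive objective mentioned above is commonly implemented as a symmetric InfoNCE loss over matched video-text pairs. An illustrative NumPy version (a sketch of the general objective, not the paper's training code):

```python
import numpy as np

def info_nce(video_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss: the i-th video and i-th text are positives,
    all other pairings in the batch are negatives."""
    v = video_emb / np.linalg.norm(video_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = v @ t.T / temperature          # pairwise similarity matrix
    labels = np.arange(len(v))              # matched pairs on the diagonal

    def ce(l):
        # Cross-entropy of each row's softmax against the diagonal target.
        l = l - l.max(axis=1, keepdims=True)
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()

    # Average the video-to-text and text-to-video directions.
    return 0.5 * (ce(logits) + ce(logits.T))
```

Minimizing this loss pulls matched video-text embeddings together and pushes mismatched ones apart, which is what projects both modalities into the shared latent space.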
no code implementations • 21 Sep 2022 • Hao Li, Jinfa Huang, Peng Jin, Guoli Song, Qi Wu, Jie Chen
Under this setting, these 2D spatial reasoning approaches cannot distinguish the fine-grained spatial relations between visual objects and scene texts on the same image plane, thereby impairing the interpretability and performance of TextVQA models.
no code implementations • 28 Apr 2022 • Yinan Feng, Yinpeng Chen, Shihang Feng, Peng Jin, Zicheng Liu, Youzuo Lin
In particular, when dealing with the inversion from seismic data to subsurface velocity governed by a wave equation, the integrals of velocity with Gaussian kernels are linearly correlated with the integrals of seismic data with sine kernels.
no code implementations • 3 Feb 2022 • Shihang Feng, Peng Jin, Xitong Zhang, Yinpeng Chen, David Alumbaugh, Michael Commer, Youzuo Lin
We explore a multi-physics inversion problem from two distinct measurements (seismic and EM data) to three geophysical properties (velocity, conductivity, and CO₂ saturation).
2 code implementations • 4 Nov 2021 • Chengyuan Deng, Shihang Feng, Hanchen Wang, Xitong Zhang, Peng Jin, Yinan Feng, Qili Zeng, Yinpeng Chen, Youzuo Lin
The recent success of data-driven FWI methods results in a rapidly increasing demand for open datasets to serve the geophysics community.
no code implementations • ICLR 2022 • Peng Jin, Xitong Zhang, Yinpeng Chen, Sharon Xiaolei Huang, Zicheng Liu, Youzuo Lin
In particular, we use finite difference to approximate the forward modeling of PDE as a differentiable operator (from velocity map to seismic data) and model its inversion by CNN (from seismic data to velocity map).
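The finite-difference forward operator described above (velocity map → seismic data) can be illustrated with a toy 1-D acoustic simulator. Everything here (grid size, Gaussian wavelet, collocated source/receiver) is a simplifying assumption for illustration, not the paper's setup:

```python
import numpy as np

def fd_wave_forward(velocity, source, dt=0.001, dx=10.0, n_steps=200):
    """Toy 1-D acoustic forward model: second-order finite differences in
    space and time, mapping a velocity profile to a trace recorded at the
    source location."""
    n = len(velocity)
    u_prev = np.zeros(n)
    u = np.zeros(n)
    trace = np.zeros(n_steps)
    c2 = (velocity * dt / dx) ** 2              # squared Courant number
    for k in range(n_steps):
        lap = np.zeros(n)
        lap[1:-1] = u[2:] - 2 * u[1:-1] + u[:-2]   # spatial Laplacian
        u_next = 2 * u - u_prev + c2 * lap         # leapfrog time step
        u_next[n // 2] += source[k] if k < len(source) else 0.0
        u_prev, u = u, u_next
        trace[k] = u[n // 2]                       # record at mid-point
    return trace

vel = np.full(101, 1500.0)                   # homogeneous 1.5 km/s medium
src = np.exp(-np.linspace(-3, 3, 30) ** 2)   # Gaussian source wavelet
trace = fd_wave_forward(vel, src)
```

Because every step is composed of differentiable array operations, gradients of the trace with respect to the velocity can flow backwards, which is the property that lets such an operator be paired with a CNN inversion network.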
no code implementations • 13 Jun 2021 • Peng Jin, Min Zhang, Jianwen Li, Li Han, Xuejun Wen
Formally verifying Deep Reinforcement Learning (DRL) systems is a challenging task due to the dynamic continuity of system behaviors and the black-box feature of embedded neural networks.
no code implementations • 26 Feb 2021 • Liang Chen, Peng Jin, Jing Yang, Yang Li, Yi Song
To obtain accurate transient states of large-scale natural gas pipeline networks under bad-data and non-zero-mean-noise conditions, this paper proposes a robust Kalman-filter-based dynamic state estimation method using the linearized gas pipeline transient flow equations.
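For reference, one predict/update cycle of a standard linear Kalman filter looks as follows. This is the generic textbook form; the robust variant described above additionally handles bad data and non-zero-mean noise, which this sketch omits:

```python
import numpy as np

def kalman_step(x, P, z, F, H, Q, R):
    """One predict/update cycle of a linear Kalman filter.

    x, P : state estimate and its covariance
    z    : new measurement
    F, H : state-transition and measurement matrices
    Q, R : process and measurement noise covariances
    """
    # Predict the state forward one step.
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    # Update with the measurement.
    y = z - H @ x_pred                       # innovation (residual)
    S = H @ P_pred @ H.T + R                 # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)      # Kalman gain
    x_new = x_pred + K @ y
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new
```

Fed with the linearized transient flow equations as F and H, repeated cycles of this form converge toward the true state while shrinking the estimate covariance.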
no code implementations • SEMEVAL 2020 • Yice Zhang, Jiaxuan Lin, Yang Fan, Peng Jin, Yuanchao Liu, Bingquan Liu
For this task, external knowledge, such as knowledge graphs, can clearly help the model understand commonsense in natural language statements.
no code implementations • 2 Oct 2020 • Vishal Mandal, Abdul Rashid Mussah, Peng Jin, Yaw Adu-Gyamfi
Real-time object detection algorithms coupled with different tracking systems are deployed to automatically detect stranded vehicles as well as perform vehicular counts.
1 code implementation • 4 May 2020 • Ping Cai, Xingyuan Chen, Peng Jin, Hongjun Wang, Tianrui Li
The purpose of unconditional text generation is to train a model with real sentences, then generate novel sentences of the same quality and diversity as the training data.
1 code implementation • 5 Apr 2020 • Xingyuan Chen, Ping Cai, Peng Jin, Hongjun Wang, Xin-yu Dai, Jia-Jun Chen
To alleviate exposure bias, generative adversarial networks (GANs) use the discriminator to update the generator's parameters directly, but they fail to evaluate the generated text precisely.
no code implementations • 20 Oct 2019 • Hamed Majidifard, Peng Jin, Yaw Adu-Gyamfi, William G. Buttlar
Automated pavement distresses detection using road images remains a challenging topic in the computer vision research community.
no code implementations • 28 Sep 2019 • Xingyuan Chen, Ping Cai, Peng Jin, Haokun Du, Hongjun Wang, Xingyu Dai, Jia-Jun Chen
In this paper, we theoretically propose two metric functions to measure the distributional difference between real text and generated text.
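As a simple point of reference for such metrics, one way to quantify the distributional difference between real and generated text is the Jensen-Shannon divergence of their unigram word distributions. This is an illustrative baseline, not one of the paper's two proposed metric functions:

```python
import numpy as np
from collections import Counter

def js_divergence(real_texts, gen_texts):
    """Jensen-Shannon divergence between the unigram word distributions
    of two text corpora (lists of whitespace-tokenized strings)."""
    vocab = sorted(set(w for s in real_texts + gen_texts for w in s.split()))

    def dist(texts):
        counts = Counter(w for s in texts for w in s.split())
        # Small epsilon keeps log() finite for words absent from one side.
        p = np.array([counts[w] for w in vocab], dtype=float) + 1e-12
        return p / p.sum()

    p, q = dist(real_texts), dist(gen_texts)
    m = 0.5 * (p + q)
    kl = lambda a, b: np.sum(a * np.log(a / b))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```

The divergence is zero when the two corpora share the same word distribution and grows toward ln 2 as they become disjoint, so a lower value indicates generated text that is distributionally closer to the real text.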
1 code implementation • 30 May 2019 • Xingyuan Chen, Yanzhe Li, Peng Jin, Jiuhua Zhang, Xin-yu Dai, Jia-Jun Chen, Gang Song
Existing GAN-based models can easily be improved with this mechanism.
no code implementations • 13 Jan 2017 • Jielei Chu, Hongjun Wang, Hua Meng, Peng Jin, Tianrui Li
To enhance the expression ability of traditional RBMs, in this paper we propose the pairwise-constraints restricted Boltzmann machine with Gaussian visible units (pcGRBM), in which the learning procedure is guided by pairwise constraints and the encoding process is conducted under this guidance.
no code implementations • LREC 2012 • Yunqing Xia, Guoyu Tang, Peng Jin, Xia Yang
A preliminary evaluation with the CLTC corpus indicates that the corpus is effective for evaluating cross-lingual topic detection methods.