Search Results for author: Minjie Zhu

Found 6 papers, 2 papers with code

Mipha: A Comprehensive Overhaul of Multimodal Assistant with Small Language Models

1 code implementation • 10 Mar 2024 • Minjie Zhu, Yichen Zhu, Xin Liu, Ning Liu, Zhiyuan Xu, Chaomin Shen, Yaxin Peng, Zhicai Ou, Feifei Feng, Jian Tang

Multimodal Large Language Models (MLLMs) have showcased impressive skills in tasks related to visual understanding and reasoning.

Ranked #68 on Visual Question Answering on MM-Vet

Visual Question Answering

291

Paper
Code

Language-Conditioned Robotic Manipulation with Fast and Slow Thinking

no code implementations • 8 Jan 2024 • Minjie Zhu, Yichen Zhu, Jinming Li, Junjie Wen, Zhiyuan Xu, Zhengping Che, Chaomin Shen, Yaxin Peng, Dong Liu, Feifei Feng, Jian Tang

The language-conditioned robotic manipulation aims to transfer natural language instructions into executable actions, from simple pick-and-place to tasks requiring intent recognition and visual reasoning.

Decision Making Intent Recognition +2

Paper
Add Code

Object-Centric Instruction Augmentation for Robotic Manipulation

no code implementations • 5 Jan 2024 • Junjie Wen, Yichen Zhu, Minjie Zhu, Jinming Li, Zhiyuan Xu, Zhengping Che, Chaomin Shen, Yaxin Peng, Dong Liu, Feifei Feng, Jian Tang

Humans interpret scenes by recognizing both the identities and positions of objects in their observations.

Language Modelling Large Language Model +1

Paper
Add Code

LLaVA-Phi: Efficient Multi-Modal Assistant with Small Language Model

1 code implementation • 4 Jan 2024 • Yichen Zhu, Minjie Zhu, Ning Liu, Zhicai Ou, Xiaofeng Mou, Jian Tang

In this paper, we introduce LLaVA-$\phi$ (LLaVA-Phi), an efficient multi-modal assistant that harnesses the power of the recently advanced small language model, Phi-2, to facilitate multi-modal dialogues.

Ranked #81 on Visual Question Answering on MM-Vet

Language Modelling Visual Question Answering

291

Paper
Code

SpeechAct: Towards Generating Whole-body Motion from Speech

no code implementations • 29 Nov 2023 • Jinsong Zhang, Minjie Zhu, Yuxiang Zhang, Yebin Liu, Kun Li

Then, we regress the motion representation from the audio signal by a translation model employing our contrastive motion learning method.

Paper
Add Code

Enhancing Asynchronous Time Series Forecasting with Contrastive Relational Inference

no code implementations • 6 Sep 2023 • Yan Wang, Zhixuan Chu, Tao Zhou, Caigao Jiang, Hongyan Hao, Minjie Zhu, Xindong Cai, Qing Cui, Longfei Li, james Y zhang, Siqiao Xue, Jun Zhou

Asynchronous time series, also known as temporal event sequences, are the basis of many applications throughout different industries.

Point Processes Time Series +2

Paper
Add Code

Cannot find the paper you are looking for? You can Submit a new open access paper.