Search Results for author: Zhihong Chen

Found 33 papers, 24 papers with code

Large Multimodal Agents: A Survey

no code implementations 23 Feb 2024 Junlin Xie, Zhihong Chen, Ruifei Zhang, Xiang Wan, Guanbin Li

In this paper, we conduct a systematic review of LLM-driven multimodal agents, which we refer to as large multimodal agents (LMAs for short).

Decision Making

ALLaVA: Harnessing GPT4V-synthesized Data for A Lite Vision-Language Model

1 code implementation 18 Feb 2024 Guiming Hardy Chen, Shunian Chen, Ruifei Zhang, Junying Chen, Xiangbo Wu, Zhiyi Zhang, Zhihong Chen, Jianquan Li, Xiang Wan, Benyou Wang

Recent advancements in Large Vision-Language Models (LVLMs) have enabled processing of multimodal inputs in language models but require significant computational resources for deployment, especially in edge devices.

Language Modelling Visual Question Answering

CheXagent: Towards a Foundation Model for Chest X-Ray Interpretation

no code implementations 22 Jan 2024 Zhihong Chen, Maya Varma, Jean-Benoit Delbrouck, Magdalini Paschali, Louis Blankemeier, Dave Van Veen, Jeya Maria Jose Valanarasu, Alaa Youssef, Joseph Paul Cohen, Eduardo Pontes Reis, Emily B. Tsai, Andrew Johnston, Cameron Olsen, Tanishq Mathew Abraham, Sergios Gatidis, Akshay S. Chaudhari, Curtis Langlotz

However, developing FMs that can accurately interpret CXRs is challenging due to the (1) limited availability of large-scale vision-language datasets in the medical image domain, (2) lack of vision and language encoders that can capture the complexities of medical data, and (3) absence of evaluation frameworks for benchmarking the abilities of FMs on CXR interpretation.

Benchmarking Fairness +2

MLLM-Bench, Evaluating Multi-modal LLMs using GPT-4V

1 code implementation 23 Nov 2023 Wentao Ge, Shunian Chen, Guiming Chen, Junying Chen, Zhihong Chen, Shuo Yan, Chenghao Zhu, Ziyue Lin, Wenya Xie, Xidong Wang, Anningzhe Gao, Zhiyi Zhang, Jianquan Li, Xiang Wan, Benyou Wang

In the pursuit of Artificial General Intelligence (AGI), the integration of vision in language models has marked a significant milestone.

Exploiting Low-confidence Pseudo-labels for Source-free Object Detection

no code implementations 19 Oct 2023 Zhihong Chen, Zilei Wang, Yixin Zhang

The LPU module consists of Proposal Soft Training (PST) and Local Spatial Contrastive Learning (LSCL).

Contrastive Learning object-detection +2

AceGPT, Localizing Large Language Models in Arabic

1 code implementation 21 Sep 2023 Huang Huang, Fei Yu, Jianqing Zhu, Xuening Sun, Hao Cheng, Dingjie Song, Zhihong Chen, Abdulmohsen Alharthi, Bang An, Juncai He, Ziche Liu, Zhiyi Zhang, Junying Chen, Jianquan Li, Benyou Wang, Lian Zhang, Ruoyu Sun, Xiang Wan, Haizhou Li, Jinchao Xu

This paper is devoted to the development of a localized Large Language Model (LLM) specifically for Arabic, a language imbued with unique cultural characteristics inadequately addressed by current mainstream models.

Instruction Following Language Modelling +2

CMB: A Comprehensive Medical Benchmark in Chinese

1 code implementation 17 Aug 2023 Xidong Wang, Guiming Hardy Chen, Dingjie Song, Zhiyi Zhang, Zhihong Chen, Qingying Xiao, Feng Jiang, Jianquan Li, Xiang Wan, Benyou Wang, Haizhou Li

We hope this benchmark provides first-hand experience with existing LLMs for medicine and also facilitates the widespread adoption and enhancement of medical LLMs within China.

Advancing Visual Grounding with Scene Knowledge: Benchmark and Method

1 code implementation CVPR 2023 Zhihong Chen, Ruifei Zhang, Yibing Song, Xiang Wan, Guanbin Li

Therefore, in this paper, we propose a novel benchmark of Scene Knowledge-guided Visual Grounding (SK-VG), where the image content and referring expressions are not sufficient to ground the target objects, forcing the models to have a reasoning ability on the long-form scene knowledge.

Image-text matching Text Matching +1

Bridging Vision and Language Encoders: Parameter-Efficient Tuning for Referring Image Segmentation

1 code implementation ICCV 2023 Zunnan Xu, Zhihong Chen, Yong Zhang, Yibing Song, Xiang Wan, Guanbin Li

Parameter Efficient Tuning (PET) has gained attention for reducing the number of parameters while maintaining performance and providing better hardware resource savings, but few studies investigate dense prediction tasks and interaction between modalities.

Image Segmentation Referring Expression Segmentation +2

On the Difference of BERT-style and CLIP-style Text Encoders

1 code implementation 6 Jun 2023 Zhihong Chen, Guiming Hardy Chen, Shizhe Diao, Xiang Wan, Benyou Wang

Masked language modeling (MLM) has been one of the most popular pretraining recipes in natural language processing, e.g., BERT, one of the representative models.

Language Modelling Masked Language Modeling +1

HuatuoGPT, towards Taming Language Model to Be a Doctor

1 code implementation 24 May 2023 Hongbo Zhang, Junying Chen, Feng Jiang, Fei Yu, Zhihong Chen, Jianquan Li, Guiming Chen, Xiangbo Wu, Zhiyi Zhang, Qingying Xiao, Xiang Wan, Benyou Wang, Haizhou Li

Experimental results demonstrate that HuatuoGPT achieves state-of-the-art results in performing medical consultation among open-source LLMs in GPT-4 evaluation, human evaluation, and medical benchmark datasets.

Language Modelling Large Language Model

Towards Unifying Medical Vision-and-Language Pre-training via Soft Prompts

1 code implementation ICCV 2023 Zhihong Chen, Shizhe Diao, Benyou Wang, Guanbin Li, Xiang Wan

Medical vision-and-language pre-training (Med-VLP) has shown promising improvements on many downstream medical tasks owing to its applicability to extracting generic representations from medical images and texts.

Image Retrieval Image-text Classification +7

GIPA: A General Information Propagation Algorithm for Graph Learning

1 code implementation 19 Jan 2023 Houyi Li, Zhihong Chen, Zhao Li, Qinkai Zheng, Peng Zhang, Shuigeng Zhou

Specifically, the bit-wise correlation calculates the element-wise attention weight through a multi-layer perceptron (MLP) based on the dense representations of two nodes and their edge; the feature-wise correlation is based on the one-hot representations of node attribute features for feature selection.

Attribute feature selection +3

Generalizing Multimodal Variational Methods to Sets

no code implementations 19 Dec 2022 Jinzhao Zhou, Yiqun Duan, Zhihong Chen, Yu-Cheng Chang, Chin-Teng Lin

Making sense of multiple modalities can yield a more comprehensive description of real-world phenomena.

Toward expanding the scope of radiology report summarization to multiple anatomies and modalities

1 code implementation 15 Nov 2022 Zhihong Chen, Maya Varma, Xiang Wan, Curtis Langlotz, Jean-Benoit Delbrouck

We then conduct extensive experiments to evaluate the performance of models both within and across modality-anatomy pairs in MIMIC-RRS.

Anatomy

Improving Radiology Summarization with Radiograph and Anatomy Prompts

no code implementations 15 Oct 2022 Jinpeng Hu, Zhihong Chen, Yang Liu, Xiang Wan, Tsung-Hui Chang

The impression is crucial for the referring physicians to grasp key information since it is concluded from the findings and reasoning of radiologists.

Anatomy Contrastive Learning +1

Multi-Modal Masked Autoencoders for Medical Vision-and-Language Pre-Training

1 code implementation 15 Sep 2022 Zhihong Chen, Yuhao Du, Jinpeng Hu, Yang Liu, Guanbin Li, Xiang Wan, Tsung-Hui Chang

Besides, we conduct further analysis to better verify the effectiveness of different components of our approach and various settings of pre-training.

Self-Supervised Learning

Align, Reason and Learn: Enhancing Medical Vision-and-Language Pre-training with Knowledge

1 code implementation 15 Sep 2022 Zhihong Chen, Guanbin Li, Xiang Wan

Most existing methods mainly contain three elements: uni-modal encoders (i.e., a vision encoder and a language encoder), a multi-modal fusion module, and pretext tasks, with few studies considering the importance of medical domain expert knowledge and explicitly exploiting such knowledge to facilitate Med-VLP.

Cross-modal Memory Networks for Radiology Report Generation

1 code implementation ACL 2021 Zhihong Chen, Yaling Shen, Yan Song, Xiang Wan

Medical imaging plays a significant role in clinical practice of medical diagnosis, where the text reports of the images are essential in understanding them and facilitating later treatments.

Medical Diagnosis Text Generation

Graph Enhanced Contrastive Learning for Radiology Findings Summarization

1 code implementation ACL 2022 Jinpeng Hu, Zhuo Li, Zhihong Chen, Zhen Li, Xiang Wan, Tsung-Hui Chang

To address the limitation, we propose a unified framework for exploiting both extra knowledge and the original findings in an integrated way so that the critical information (i.e., key words and their relations) can be extracted in an appropriate way to facilitate impression generation.

Contrastive Learning

Word Graph Guided Summarization for Radiology Findings

1 code implementation Findings (ACL) 2021 Jinpeng Hu, Jianling Li, Zhihong Chen, Yaling Shen, Yan Song, Xiang Wan, Tsung-Hui Chang

In this paper, we propose a novel method for automatic impression generation, where a word graph is constructed from the findings to record the critical words and their relations, then a Word Graph guided Summarization model (WGSum) is designed to generate impressions with the help of the word graph.

Text Summarization

Pre-trained Language Models in Biomedical Domain: A Systematic Survey

1 code implementation 11 Oct 2021 Benyou Wang, Qianqian Xie, Jiahuan Pei, Zhihong Chen, Prayag Tiwari, Zhao Li, Jie Fu

In this paper, we summarize the recent progress of pre-trained language models in the biomedical domain and their applications in biomedical downstream tasks.

Path-based Deep Network for Candidate Item Matching in Recommenders

no code implementations 18 May 2021 Houyi Li, Zhihong Chen, Chenliang Li, Rong Xiao, Hongbo Deng, Peng Zhang, Yongchao Liu, Haihong Tang

PDN utilizes Trigger Net to capture the user's interest in each of his/her interacted items, and Similarity Net to evaluate the similarity between each interacted item and the target item based on these items' profile and CF information.

Recommendation Systems Retrieval

Generalizable Representation Learning for Mixture Domain Face Anti-Spoofing

no code implementations 6 May 2021 Zhihong Chen, Taiping Yao, Kekai Sheng, Shouhong Ding, Ying Tai, Jilin Li, Feiyue Huang, Xinyu Jin

The face anti-spoofing approach based on domain generalization (DG) has drawn growing attention due to its robustness for unseen scenarios.

Domain Generalization Face Anti-Spoofing +2

Generating Radiology Reports via Memory-driven Transformer

2 code implementations EMNLP 2020 Zhihong Chen, Yan Song, Tsung-Hui Chang, Xiang Wan

Particularly, this is the first work reporting the generation results on MIMIC-CXR to the best of our knowledge.

Text Generation

Attention-Guided Discriminative Region Localization and Label Distribution Learning for Bone Age Assessment

1 code implementation 30 May 2020 Chao Chen, Zhihong Chen, Xinyu Jin, Lanjuan Li, William Speier, Corey W. Arnold

However, training with the global image underutilizes discriminative local information, while providing extra annotations is expensive and subjective.

Age Estimation regression

ESAM: Discriminative Domain Adaptation with Non-Displayed Items to Improve Long-Tail Performance

1 code implementation 21 May 2020 Zhihong Chen, Rong Xiao, Chenliang Li, Gangfeng Ye, Haochuan Sun, Hongbo Deng

Most ranking models are trained only with displayed items (most are hot items), but they are utilized to retrieve items in the entire space, which consists of both displayed and non-displayed items (most are long-tail items).

Attribute Clustering +2

HoMM: Higher-order Moment Matching for Unsupervised Domain Adaptation

1 code implementation 27 Dec 2019 Chao Chen, Zhihang Fu, Zhihong Chen, Sheng Jin, Zhaowei Cheng, Xinyu Jin, Xian-Sheng Hua

In particular, our proposed HoMM can perform arbitrary-order moment tensor matching; we show that the first-order HoMM is equivalent to Maximum Mean Discrepancy (MMD) and the second-order HoMM is equivalent to Correlation Alignment (CORAL).

Unsupervised Domain Adaptation

Towards Self-similarity Consistency and Feature Discrimination for Unsupervised Domain Adaptation

no code implementations 13 Apr 2019 Chao Chen, Zhihang Fu, Zhihong Chen, Zhaowei Cheng, Xinyu Jin, Xian-Sheng Hua

Recent advances in unsupervised domain adaptation mainly focus on learning shared representations by global distribution alignment without considering class information across domains.

Unsupervised Domain Adaptation

Joint Domain Alignment and Discriminative Feature Learning for Unsupervised Deep Domain Adaptation

1 code implementation 28 Aug 2018 Chao Chen, Zhihong Chen, Boyuan Jiang, Xinyu Jin

Recently, considerable effort has been devoted to deep domain adaptation in computer vision and machine learning communities.

Domain Adaptation

Ro-SOS: Metric Expression Network (MEnet) for Robust Salient Object Segmentation

1 code implementation 15 May 2018 Delu Zeng, Yixuan He, Li Liu, Zhihong Chen, Jiabin Huang, Jie Chen, John Paisley

In this paper, we propose an end-to-end generic salient object segmentation model called Metric Expression Network (MEnet) to deal with saliency detection with the tolerance of distortion.

Saliency Detection Semantic Segmentation
