Search Results for author: Jihao Wu

Found 6 papers, 2 papers with code

TextHawk: Exploring Efficient Fine-Grained Perception of Multimodal Large Language Models

1 code implementation • 14 Apr 2024 • Ya-Qi Yu, Minghui Liao, Jihao Wu, Yongxin Liao, Xiaoyu Zheng, Wei Zeng

We conduct extensive experiments on both general and document-oriented MLLM benchmarks, and show that TextHawk outperforms the state-of-the-art methods, demonstrating its effectiveness and superiority in fine-grained document perception and general abilities.

Android in the Zoo: Chain-of-Action-Thought for GUI Agents

1 code implementation • 5 Mar 2024 • Jiwen Zhang, Jihao Wu, Yihua Teng, Minghui Liao, Nuo Xu, Xiao Xiao, Zhongyu Wei, Duyu Tang

To address this, this work presents Chain-of-Action-Thought (dubbed CoAT), which takes into account the description of the previous actions, the current screen, and, more importantly, the action thinking about which actions should be performed and the outcomes led to by the chosen action.

Language Modelling, Large Language Model

Temporal-Spatial Entropy Balancing for Causal Continuous Treatment-Effect Estimation

no code implementations • 14 Dec 2023 • Tao Hu, Honglong Zhang, Fan Zeng, Min Du, XiangKun Du, Yue Zheng, Quanqi Li, Mengran Zhang, Dan Yang, Jihao Wu

However, temporal and spatial dimensions are extremely critical in the logistics field, and this limitation may directly affect the precision of subsidy and pricing strategies.

DocStormer: Revitalizing Multi-Degraded Colored Document Images to Pristine PDF

no code implementations • 27 Oct 2023 • Chaowei Liu, Jichun Li, Yihua Teng, Chaoqun Wang, Nuo Xu, Jihao Wu, Dandan Tu

Thus, we propose DocStormer, a novel algorithm designed to restore multi-degraded colored documents to their potential pristine PDF.

Binarization

Efficient Image Captioning for Edge Devices

no code implementations • 18 Dec 2022 • Ning Wang, Jiangrong Xie, Hang Luo, Qinglin Cheng, Jihao Wu, Mingbo Jia, Linlin Li

On the other hand, we transfer the image-text retrieval design of CLIP to image captioning scenarios by devising a novel visual concept extractor and a cross-modal modulator.

Image Captioning, Retrieval, +1

Controllable Image Captioning via Prompting

no code implementations • 4 Dec 2022 • Ning Wang, Jiahao Xie, Jihao Wu, Mingbo Jia, Linlin Li

Despite the remarkable progress of image captioning, existing captioners typically lack the controllable capability to generate desired image captions, e.g., describing the image in a rough or detailed manner, in a factual or emotional view, etc.

Controllable Image Captioning, Prompt Engineering
