Search Results for author: Jihao Wu

Found 6 papers, 2 papers with code

TextHawk: Exploring Efficient Fine-Grained Perception of Multimodal Large Language Models

1 code implementation • 14 Apr 2024 • Ya-Qi Yu, Minghui Liao, Jihao Wu, Yongxin Liao, Xiaoyu Zheng, Wei Zeng

We conduct extensive experiments on both general and document-oriented MLLM benchmarks, and show that TextHawk outperforms the state-of-the-art methods, demonstrating its effectiveness and superiority in fine-grained document perception and general abilities.

Android in the Zoo: Chain-of-Action-Thought for GUI Agents

1 code implementation • 5 Mar 2024 • Jiwen Zhang, Jihao Wu, Yihua Teng, Minghui Liao, Nuo Xu, Xiao Xiao, Zhongyu Wei, Duyu Tang

To address this, this work presents Chain-of-Action-Thought (dubbed CoAT), which takes the description of the previous actions, the current screen, and more importantly the action thinking of what actions should be performed and the outcomes led by the chosen action.

Language Modelling, Large Language Model

Temporal-Spatial Entropy Balancing for Causal Continuous Treatment-Effect Estimation

no code implementations • 14 Dec 2023 • Tao Hu, Honglong Zhang, Fan Zeng, Min Du, XiangKun Du, Yue Zheng, Quanqi Li, Mengran Zhang, Dan Yang, Jihao Wu

However, temporal and spatial dimensions are extremely critical in the logistics field, and this limitation may directly affect the precision of subsidy and pricing strategies.

DocStormer: Revitalizing Multi-Degraded Colored Document Images to Pristine PDF

no code implementations • 27 Oct 2023 • Chaowei Liu, Jichun Li, Yihua Teng, Chaoqun Wang, Nuo Xu, Jihao Wu, Dandan Tu

Thus, we propose DocStormer, a novel algorithm designed to restore multi-degraded colored documents to their potential pristine PDF.

Binarization

Efficient Image Captioning for Edge Devices

no code implementations • 18 Dec 2022 • Ning Wang, Jiangrong Xie, Hang Luo, Qinglin Cheng, Jihao Wu, Mingbo Jia, Linlin Li

On the other hand, we transfer the image-text retrieval design of CLIP to image captioning scenarios by devising a novel visual concept extractor and a cross-modal modulator.

Image Captioning, Retrieval

Controllable Image Captioning via Prompting

no code implementations • 4 Dec 2022 • Ning Wang, Jiahao Xie, Jihao Wu, Mingbo Jia, Linlin Li

Despite the remarkable progress of image captioning, existing captioners typically lack the controllable capability to generate desired image captions, e.g., describing the image in a rough or detailed manner, in a factual or emotional view, etc.

Controllable Image Captioning, Prompt Engineering
