Search Results for author: Hao Feng

Found 40 papers, 11 papers with code

TextSquare: Scaling up Text-Centric Visual Instruction Tuning

no code implementations • 19 Apr 2024 • Jingqun Tang, Chunhui Lin, Zhen Zhao, Shu Wei, Binghong Wu, Qi Liu, Hao Feng, Yang Li, Siqi Wang, Lei Liao, Wei Shi, Yuliang Liu, Hao liu, Yuan Xie, Xiang Bai, Can Huang

Text-centric visual question answering (VQA) has made great strides with the development of Multimodal Large Language Models (MLLMs), yet open-source models still fall short of leading models like GPT4V and Gemini, partly due to a lack of extensive, high-quality instruction tuning data.

Hallucination Hallucination Evaluation +2

Progressive Multi-modal Conditional Prompt Tuning

no code implementations • 18 Apr 2024 • Xiaoyu Qiu, Hao Feng, Yuechen Wang, Wengang Zhou, Houqiang Li

Initialization is responsible for encoding image and text using a VLM, followed by a feature filter that selects text features similar to image.

TextCoT: Zoom In for Enhanced Multimodal Text-Rich Image Understanding

1 code implementation • 15 Apr 2024 • Bozhi Luan, Hao Feng, Hong Chen, Yonghui Wang, Wengang Zhou, Houqiang Li

The image overview stage provides a comprehensive understanding of the global scene information, and the coarse localization stage approximates the image area containing the answer based on the question asked.

Question Answering Visual Question Answering (VQA)

DeepEraser: Deep Iterative Context Mining for Generic Text Eraser

1 code implementation • 29 Feb 2024 • Hao Feng, Wendi Wang, Shaokai Liu, Jiajun Deng, Wengang Zhou, Houqiang Li

In this work, we present DeepEraser, an effective deep network for generic text removal.

UCE-FID: Using Large Unlabeled, Medium Crowdsourced-Labeled, and Small Expert-Labeled Tweets for Foodborne Illness Detection

no code implementations • 2 Dec 2023 • Ruofan Hu, Dongyu Zhang, Dandan Tao, Huayi Zhang, Hao Feng, Elke Rundensteiner

To overcome these challenges, we propose EGAL, a deep learning framework for foodborne illness detection that uses small expert-labeled tweets augmented by crowdsourced-labeled and massive unlabeled data.

Scalable AI Generative Content for Vehicular Network Semantic Communication

no code implementations • 23 Nov 2023 • Hao Feng, Yi Yang, Zhu Han

Experimental results suggest that the proposed method surpasses the baseline in perceiving vehicles in blind spots and effectively compresses communication data.

Towards Improving Document Understanding: An Exploration on Text-Grounding via MLLMs

1 code implementation • 22 Nov 2023 • Yonghui Wang, Wengang Zhou, Hao Feng, Keyi Zhou, Houqiang Li

Moreover, we curate a collection of text-rich images and prompt the text-only GPT-4 to generate 12K high-quality conversations, featuring textual locations within text-rich scenarios.

document understanding Instruction Following +3

DocPedia: Unleashing the Power of Large Multimodal Model in the Frequency Domain for Versatile Document Understanding

no code implementations • 20 Nov 2023 • Hao Feng, Qi Liu, Hao liu, Wengang Zhou, Houqiang Li, Can Huang

This work presents DocPedia, a novel large multimodal model (LMM) for versatile OCR-free document understanding, capable of parsing images up to 2,560$\times$2,560 resolution.

document understanding Language Modelling +2

Progressive Recurrent Network for Shadow Removal

no code implementations • 1 Nov 2023 • Yonghui Wang, Wengang Zhou, Hao Feng, Li Li, Houqiang Li

To handle this issue, we consider removing the shadow in a coarse-to-fine fashion and propose a simple but effective Progressive Recurrent Network (PRNet).

Image Shadow Removal Shadow Removal

UniDoc: A Universal Large Multimodal Model for Simultaneous Text Detection, Recognition, Spotting and Understanding

no code implementations • 19 Aug 2023 • Hao Feng, Zijian Wang, Jingqun Tang, Jinghui Lu, Wengang Zhou, Houqiang Li, Can Huang

However, existing advanced algorithms are limited in effectively utilizing the immense representation capabilities and rich world knowledge inherent to these large pre-trained models, and the beneficial connections among tasks within the context of text-rich scenarios have not been sufficiently explored.

Instruction Following Text Detection +1

SimFIR: A Simple Framework for Fisheye Image Rectification with Self-supervised Representation Learning

no code implementations • ICCV 2023 • Hao Feng, Wendi Wang, Jiajun Deng, Wengang Zhou, Li Li, Houqiang Li

To make the best of such rectification cues, we introduce SimFIR, a simple framework for fisheye image rectification based on self-supervised representation learning.

Representation Learning

Mobile Supply: The Last Piece of Jigsaw of Recommender System

no code implementations • 7 Aug 2023 • Zhenhao Jiang, Biao Zeng, Hao Feng, Jin Liu, Jie Zhang, Jia Jia, Ning Hu

To address the problem of the pagination trigger mechanism, we propose a completely new module in the recommender system pipeline, named Mobile Supply.

Recommendation Systems Re-Ranking

ESMC: Entire Space Multi-Task Model for Post-Click Conversion Rate via Parameter Constraint

no code implementations • 18 Jul 2023 • Zhenhao Jiang, Biao Zeng, Hao Feng, Jin Liu, Jicong Fan, Jie Zhang, Jia Jia, Ning Hu, Xingyu Chen, Xuguang Lan

We propose a novel Entire Space Multi-Task Model for Post-Click Conversion Rate via Parameter Constraint (ESMC) and two alternatives: Entire Space Multi-Task Model with Siamese Network (ESMS) and Entire Space Multi-Task Model in Global Domain (ESMG) to address the PSC issue.

Decision Making Recommendation Systems +1

Multi-Task Cross-Modality Attention-Fusion for 2D Object Detection

no code implementations • 17 Jul 2023 • Huawei Sun, Hao Feng, Georg Stettinger, Lorenzo Servadei, Robert Wille

In addition, we introduce a Multi-Task Cross-Modality Attention-Fusion Network (MCAF-Net) for object detection, which includes two new fusion blocks.

Autonomous Driving Object +2

Parameter-efficient is not sufficient: Exploring Parameter, Memory, and Time Efficient Adapter Tuning for Dense Predictions

no code implementations • 16 Jun 2023 • Dongshuo Yin, Xueting Han, Bin Li, Hao Feng, Jing Bai

We provide a gradient backpropagation highway for low-rank adapters which eliminates the need for expensive backpropagation through the frozen pre-trained model, resulting in substantial savings of training memory and training time.

Transfer Learning

Active RIS-Assisted mmWave Indoor Signal Enhancement Based on Transparent RIS

no code implementations • 16 May 2023 • Hao Feng, Yuping Zhao

In this paper, a novel RIS-assisted mmWave indoor enhancement scheme is proposed, in which a transparent RIS is deployed on the glass to enhance mmWave indoor signals, and three assisted transmission scenarios, namely passive RIS (PRIS), active RIS (ARIS), and a novel hybrid RIS (HRIS) are proposed.

Model-Based Monitoring and State Estimation for Digital Twins: The Kalman Filter

no code implementations • 29 Apr 2023 • Hao Feng, Cláudio Gomes, Peter Gorm Larsen

A digital twin (DT) monitors states of the physical twin (PT) counterpart and provides a number of benefits such as advanced visualizations, fault detection capabilities, and reduced maintenance cost.

Anomaly Detection Fault Detection

mmWave RIS Phase Shift Feedback Based on Knowledge Base Autoencoder Framework

no code implementations • 27 Apr 2023 • Hao Feng, Yuting Xu, Yuping Zhao

Then the knowledge base vectors index is obtained by calculating the similarity between feature vectors and knowledge base vectors and transmitted to the RIS.

Diffusion Probabilistic Model Based Accurate and High-Degree-of-Freedom Metasurface Inverse Design

no code implementations • 25 Apr 2023 • Zezhou Zhang, Chuanchuan Yang, Yifeng Qin, Hao Feng, Jiqiang Feng, Hongbin Li

Inverse design methods based on optimization algorithms, such as evolutionary algorithms, and topological optimizations, have been introduced to design metamaterials.

Evolutionary Algorithms

DocMAE: Document Image Rectification via Self-supervised Representation Learning

1 code implementation • 20 Apr 2023 • Shaokai Liu, Hao Feng, Wengang Zhou, Houqiang Li, Cong Liu, Feng Wu

Tremendous efforts have been made on document image rectification, but how to learn effective representation of such distorted images is still under-explored.

Representation Learning Self-Supervised Learning

Deep Unrestricted Document Image Rectification

1 code implementation • 18 Apr 2023 • Hao Feng, Shaokai Liu, Jiajun Deng, Wengang Zhou, Houqiang Li

To our best knowledge, this is the first learning-based method for the rectification of unrestricted document images.

Local Distortion

PriSTI: A Conditional Diffusion Framework for Spatiotemporal Imputation

1 code implementation • 20 Feb 2023 • Mingzhe Liu, Han Huang, Hao Feng, Leilei Sun, Bowen Du, Yanjie Fu

Our proposed framework provides a conditional feature extraction module first to extract the coarse yet effective spatiotemporal dependencies from conditional information as the global context prior.

Imputation Noise Estimation

Geometric Representation Learning for Document Image Rectification

2 code implementations • 15 Oct 2022 • Hao Feng, Wengang Zhou, Jiajun Deng, Yuechen Wang, Houqiang Li

In document image rectification, there exist rich geometric constraints between the distorted image and the ground truth one.

Representation Learning

Utilizing Explainable AI for improving the Performance of Neural Networks

no code implementations • 7 Oct 2022 • Huawei Sun, Lorenzo Servadei, Hao Feng, Michael Stephan, Robert Wille, Avik Santra

To address this, Explainable Artificial Intelligence (XAI) has been developing as a field that aims to improve the transparency of the model and increase their trustworthiness.

Explainable artificial intelligence Explainable Artificial Intelligence (XAI)

Towards Efficient Modularity in Industrial Drying: A Combinatorial Optimization Viewpoint

no code implementations • 5 Oct 2022 • Alisina Bayati, Amber Srivastava, Amir Malvandi, Hao Feng, Srinivasa Salapaka

The industrial drying process consumes approximately 12% of the total energy used in manufacturing, with the potential for a 40% reduction in energy usage through improved process controls and the development of new drying technologies.

Combinatorial Optimization Total Energy

Knowledge Distillation based Contextual Relevance Matching for E-commerce Product Search

no code implementations • 4 Oct 2022 • Ziyang Liu, Chaokun Wang, Hao Feng, Lingfei Wu, Liqun Yang

In this paper, we design an efficient knowledge distillation framework for e-commerce relevance matching to integrate the respective advantages of Transformer-style models and classical relevance matching models.

Knowledge Distillation

TWEET-FID: An Annotated Dataset for Multiple Foodborne Illness Detection Tasks

no code implementations • LREC 2022 • Ruofan Hu, Dongyu Zhang, Dandan Tao, Thomas Hartvigsen, Hao Feng, Elke Rundensteiner

To accelerate the development of machine learning-based models for foodborne outbreak detection, we thus present TWEET-FID (TWEET-Foodborne Illness Detection), the first publicly available annotated dataset for multiple foodborne illness incident detection tasks.

slot-filling Slot Filling

DocScanner: Robust Document Image Rectification with Progressive Learning

3 code implementations • 28 Oct 2021 • Hao Feng, Wengang Zhou, Jiajun Deng, Qi Tian, Houqiang Li

The iterative refinements make DocScanner converge to a robust and superior rectification performance, while the lightweight recurrent architecture ensures the running efficiency.

Optical Character Recognition (OCR)

DocTr: Document Image Transformer for Geometric Unwarping and Illumination Correction

2 code implementations • 25 Oct 2021 • Hao Feng, Yuechen Wang, Wengang Zhou, Jiajun Deng, Houqiang Li

Specifically, DocTr consists of a geometric unwarping transformer and an illumination correction transformer.

Optical Character Recognition (OCR)

Rethinking Temperature in Graph Contrastive Learning

no code implementations • 29 Sep 2021 • Ziyang Liu, Hao Feng, Chaokun Wang

In this paper, we investigate and discuss what a good representation should be for a general loss (InfoNCE) in graph contrastive learning.

Contrastive Learning Self-Supervised Learning

ES-Net: Erasing Salient Parts to Learn More in Re-Identification

no code implementations • 10 Mar 2021 • Dong Shen, Shuai Zhao, Jinming Hu, Hao Feng, Deng Cai, Xiaofei He

In this paper, we propose a novel network, Erasing-Salient Net (ES-Net), to learn comprehensive features by erasing the salient areas in an image.

Complementary Pseudo Labels For Unsupervised Domain Adaptation On Person Re-identification

no code implementations • 29 Jan 2021 • Hao Feng, Minghao Chen, Jinming Hu, Dong Shen, Haifeng Liu, Deng Cai

In this paper, to complement these low recall neighbor pseudo labels, we propose a joint learning framework to learn better feature embeddings via high precision neighbor pseudo labels and high recall group pseudo labels.

Person Re-Identification Unsupervised Domain Adaptation

High-Performance Discriminative Tracking With Transformers

no code implementations • ICCV 2021 • Bin Yu, Ming Tang, Linyu Zheng, Guibo Zhu, Jinqiao Wang, Hao Feng, Xuetao Feng, Hanqing Lu

End-to-end discriminative trackers improve the state of the art significantly, yet the improvement in robustness and efficiency is restricted by the conventional discriminative model, i.e., least-squares-based regression.

Object Visual Tracking +1

STEAM: Self-Supervised Taxonomy Expansion with Mini-Paths

1 code implementation • 18 Jun 2020 • Yue Yu, Yinghao Li, Jiaming Shen, Hao Feng, Jimeng Sun, Chao Zhang

We propose a self-supervised taxonomy expansion model named STEAM, which leverages natural supervision in the existing taxonomy for expansion.

Taxonomy Expansion

Discovering Protagonist of Sentiment with Aspect Reconstructed Capsule Network

no code implementations • 23 Dec 2019 • Chi Xu, Hao Feng, Guoxin Yu, Min Yang, Xiting Wang, Xiang Ao

In this paper, we aim to improve ATSA by discovering the potential aspect terms of the predicted sentiment polarity when the aspect terms of a test sentence are unknown.

Sentence Sentiment Analysis
