Search Results for author: Hao Feng

Found 40 papers, 11 papers with code

TextSquare: Scaling up Text-Centric Visual Instruction Tuning

no code implementations • 19 Apr 2024 • Jingqun Tang, Chunhui Lin, Zhen Zhao, Shu Wei, Binghong Wu, Qi Liu, Hao Feng, Yang Li, Siqi Wang, Lei Liao, Wei Shi, Yuliang Liu, Hao Liu, Yuan Xie, Xiang Bai, Can Huang

Text-centric visual question answering (VQA) has made great strides with the development of Multimodal Large Language Models (MLLMs), yet open-source models still fall short of leading models like GPT-4V and Gemini, partly due to a lack of extensive, high-quality instruction tuning data.

Hallucination, Hallucination Evaluation (+2 more)

Progressive Multi-modal Conditional Prompt Tuning

no code implementations • 18 Apr 2024 • Xiaoyu Qiu, Hao Feng, Yuechen Wang, Wengang Zhou, Houqiang Li

In the initialization stage, image and text are encoded with a VLM, and a feature filter then selects the text features that are similar to the image.

Integration of Self-Supervised BYOL in Semi-Supervised Medical Image Recognition

no code implementations • 16 Apr 2024 • Hao Feng, Yuanzhe Jia, Ruijia Xu, Mukesh Prasad, Ali Anaissi, Ali Braytee

Image recognition techniques heavily rely on abundant labeled data, particularly in medical contexts.

Self-Supervised Learning

TextCoT: Zoom In for Enhanced Multimodal Text-Rich Image Understanding

1 code implementation • 15 Apr 2024 • Bozhi Luan, Hao Feng, Hong Chen, Yonghui Wang, Wengang Zhou, Houqiang Li

The image overview stage provides a comprehensive understanding of the global scene information, and the coarse localization stage approximates the image area containing the answer based on the question asked.

Question Answering, Visual Question Answering (VQA)

Enhanced Radar Perception via Multi-Task Learning: Towards Refined Data for Sensor Fusion Applications

no code implementations • 9 Apr 2024 • Huawei Sun, Hao Feng, Gianfranco Mauro, Julius Ott, Georg Stettinger, Lorenzo Servadei, Robert Wille

Radar and camera fusion yields robustness in perception tasks by leveraging the strengths of both sensors.

Depth Estimation, Multi-Task Learning (+3 more)

DeepEraser: Deep Iterative Context Mining for Generic Text Eraser

1 code implementation • 29 Feb 2024 • Hao Feng, Wendi Wang, Shaokai Liu, Jiajun Deng, Wengang Zhou, Houqiang Li

In this work, we present DeepEraser, an effective deep network for generic text removal.

UCE-FID: Using Large Unlabeled, Medium Crowdsourced-Labeled, and Small Expert-Labeled Tweets for Foodborne Illness Detection

no code implementations • 2 Dec 2023 • Ruofan Hu, Dongyu Zhang, Dandan Tao, Huayi Zhang, Hao Feng, Elke Rundensteiner

To overcome these challenges, we propose EGAL, a deep learning framework for foodborne illness detection that uses a small set of expert-labeled tweets augmented with crowdsourced-labeled and massive unlabeled data.

Scalable AI Generative Content for Vehicular Network Semantic Communication

no code implementations • 23 Nov 2023 • Hao Feng, Yi Yang, Zhu Han

Experimental results suggest that the proposed method surpasses the baseline in perceiving vehicles in blind spots and effectively compresses communication data.

Towards Improving Document Understanding: An Exploration on Text-Grounding via MLLMs

1 code implementation • 22 Nov 2023 • Yonghui Wang, Wengang Zhou, Hao Feng, Keyi Zhou, Houqiang Li

Moreover, we curate a collection of text-rich images and prompt the text-only GPT-4 to generate 12K high-quality conversations, featuring textual locations within text-rich scenarios.

Document Understanding, Instruction Following (+3 more)

DocPedia: Unleashing the Power of Large Multimodal Model in the Frequency Domain for Versatile Document Understanding

no code implementations • 20 Nov 2023 • Hao Feng, Qi Liu, Hao Liu, Wengang Zhou, Houqiang Li, Can Huang

This work presents DocPedia, a novel large multimodal model (LMM) for versatile OCR-free document understanding, capable of parsing images up to 2,560 × 2,560 resolution.

Document Understanding, Language Modelling (+2 more)

Progressive Recurrent Network for Shadow Removal

no code implementations • 1 Nov 2023 • Yonghui Wang, Wengang Zhou, Hao Feng, Li Li, Houqiang Li

To handle this issue, we consider removing the shadow in a coarse-to-fine fashion and propose a simple but effective Progressive Recurrent Network (PRNet).

Image Shadow Removal, Shadow Removal

Sign Language Translation with Iterative Prototype

no code implementations • ICCV 2023 • Huijie Yao, Wengang Zhou, Hao Feng, Hezhen Hu, Hao Zhou, Houqiang Li

Technically, IP-SLT consists of feature extraction, prototype initialization, and iterative prototype refinement.

Ranked #5 on Sign Language Translation on CSL-Daily

Sentence, Sign Language Translation (+1 more)

UniDoc: A Universal Large Multimodal Model for Simultaneous Text Detection, Recognition, Spotting and Understanding

no code implementations • 19 Aug 2023 • Hao Feng, Zijian Wang, Jingqun Tang, Jinghui Lu, Wengang Zhou, Houqiang Li, Can Huang

However, existing advanced algorithms are limited in their ability to effectively utilize the immense representation capabilities and rich world knowledge inherent in these large pre-trained models, and the beneficial connections among tasks in text-rich scenarios have not been sufficiently explored.

Instruction Following, Text Detection (+1 more)

SimFIR: A Simple Framework for Fisheye Image Rectification with Self-supervised Representation Learning

no code implementations • ICCV 2023 • Hao Feng, Wendi Wang, Jiajun Deng, Wengang Zhou, Li Li, Houqiang Li

To make the best of such rectification cues, we introduce SimFIR, a simple framework for fisheye image rectification based on self-supervised representation learning.

Representation Learning

Mobile Supply: The Last Piece of Jigsaw of Recommender System

no code implementations • 7 Aug 2023 • Zhenhao Jiang, Biao Zeng, Hao Feng, Jin Liu, Jie Zhang, Jia Jia, Ning Hu

To address the problem of the pagination trigger mechanism, we propose a new module in the recommender system pipeline, named Mobile Supply.

Recommendation Systems, Re-Ranking

ESMC: Entire Space Multi-Task Model for Post-Click Conversion Rate via Parameter Constraint

no code implementations • 18 Jul 2023 • Zhenhao Jiang, Biao Zeng, Hao Feng, Jin Liu, Jicong Fan, Jie Zhang, Jia Jia, Ning Hu, Xingyu Chen, Xuguang Lan

We propose a novel Entire Space Multi-Task Model for Post-Click Conversion Rate via Parameter Constraint (ESMC) and two alternatives: Entire Space Multi-Task Model with Siamese Network (ESMS) and Entire Space Multi-Task Model in Global Domain (ESMG) to address the PSC issue.

Decision Making, Recommendation Systems (+1 more)

Multi-Task Cross-Modality Attention-Fusion for 2D Object Detection

no code implementations • 17 Jul 2023 • Huawei Sun, Hao Feng, Georg Stettinger, Lorenzo Servadei, Robert Wille

In addition, we introduce a Multi-Task Cross-Modality Attention-Fusion Network (MCAF-Net) for object detection, which includes two new fusion blocks.

Autonomous Driving, Object (+2 more)

Parameter-efficient is not sufficient: Exploring Parameter, Memory, and Time Efficient Adapter Tuning for Dense Predictions

no code implementations • 16 Jun 2023 • Dongshuo Yin, Xueting Han, Bin Li, Hao Feng, Jing Bai

We provide a gradient backpropagation highway for low-rank adapters which eliminates the need for expensive backpropagation through the frozen pre-trained model, resulting in substantial savings of training memory and training time.

Transfer Learning

Active RIS-Assisted mmWave Indoor Signal Enhancement Based on Transparent RIS

no code implementations • 16 May 2023 • Hao Feng, Yuping Zhao

In this paper, a novel RIS-assisted mmWave indoor enhancement scheme is proposed, in which a transparent RIS is deployed on glass to enhance mmWave indoor signals. Three assisted transmission scenarios are proposed: passive RIS (PRIS), active RIS (ARIS), and a novel hybrid RIS (HRIS).

Model-Based Monitoring and State Estimation for Digital Twins: The Kalman Filter

no code implementations • 29 Apr 2023 • Hao Feng, Cláudio Gomes, Peter Gorm Larsen

A digital twin (DT) monitors the states of its physical twin (PT) counterpart and provides a number of benefits, such as advanced visualizations, fault detection capabilities, and reduced maintenance costs.

Anomaly Detection, Fault Detection
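
The title names the Kalman filter as the estimation technique. As a minimal illustration, here is a textbook scalar Kalman filter in Python, assuming a random-walk state model (x_k = x_{k-1} + w_k, measured as z_k = x_k + v_k) and hypothetical noise variances `q` and `r`. It is a sketch of the general filter, not the digital-twin model used in the paper.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class ScalarKalmanFilter:
    """Textbook 1-D Kalman filter with an assumed random-walk state model."""
    x: float  # current state estimate
    p: float  # variance of the current estimate
    q: float  # process-noise variance (hypothetical value)
    r: float  # measurement-noise variance (hypothetical value)

    def step(self, z: float) -> float:
        # Predict: under x_k = x_{k-1} + w_k the mean is unchanged
        # and the uncertainty grows by the process-noise variance.
        p_pred = self.p + self.q
        # Update: the Kalman gain weighs the prediction against measurement z.
        k = p_pred / (p_pred + self.r)
        self.x = self.x + k * (z - self.x)
        self.p = (1.0 - k) * p_pred
        return self.x


def filter_measurements(kf: ScalarKalmanFilter, zs: Iterable[float]) -> List[float]:
    """Run the filter over a measurement stream and return every estimate."""
    return [kf.step(z) for z in zs]


# Hypothetical sensor readings of a constant physical quantity (5.0).
kf = ScalarKalmanFilter(x=0.0, p=1.0, q=1e-4, r=0.1)
estimates = filter_measurements(kf, [5.0] * 200)
print(round(estimates[-1], 3))
```

In a digital-twin setting, `z` would play the role of a sensor reading from the physical twin and `x` the twin's running estimate of the state; a real model would replace the random walk with the system's actual dynamics.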

mmWave RIS Phase Shift Feedback Based on Knowledge Base Autoencoder Framework

no code implementations • 27 Apr 2023 • Hao Feng, Yuting Xu, Yuping Zhao

The knowledge base vector index is then obtained by computing the similarity between the feature vectors and the knowledge base vectors, and is transmitted to the RIS.
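
The index lookup described in this snippet can be sketched in a few lines of Python. The sketch assumes cosine similarity as the similarity measure and a small hypothetical knowledge base; the paper's actual metric, vector dimensions, and autoencoder are not given here. Each feature vector is reduced to the integer index of its best-matching knowledge-base vector, and that index is what would be fed back instead of the full vector.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_kb_index(feature: Sequence[float], knowledge_base: List[Sequence[float]]) -> int:
    """Index of the knowledge-base vector most similar to `feature`."""
    scores = [cosine_similarity(feature, kb) for kb in knowledge_base]
    return max(range(len(scores)), key=lambda i: scores[i])


def encode_feedback(features: List[Sequence[float]],
                    knowledge_base: List[Sequence[float]]) -> List[int]:
    """Map each feature vector to a knowledge-base index (the feedback payload)."""
    return [nearest_kb_index(f, knowledge_base) for f in features]


# Hypothetical 2-D knowledge base with three entries.
kb = [[1.0, 0.0], [0.0, 1.0], [0.7071, 0.7071]]
print(encode_feedback([[0.9, 0.1], [0.5, 0.5], [-0.1, 2.0]], kb))
```

With N knowledge-base entries, each transmitted index fits in ⌈log2 N⌉ bits.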

Diffusion Probabilistic Model Based Accurate and High-Degree-of-Freedom Metasurface Inverse Design

no code implementations • 25 Apr 2023 • Zezhou Zhang, Chuanchuan Yang, Yifeng Qin, Hao Feng, Jiqiang Feng, Hongbin Li

Inverse design methods based on optimization algorithms, such as evolutionary algorithms and topology optimization, have been introduced to design metamaterials.

Evolutionary Algorithms

DocMAE: Document Image Rectification via Self-supervised Representation Learning

1 code implementation • 20 Apr 2023 • Shaokai Liu, Hao Feng, Wengang Zhou, Houqiang Li, Cong Liu, Feng Wu

Tremendous efforts have been made on document image rectification, but how to learn an effective representation of such distorted images remains under-explored.

Representation Learning, Self-Supervised Learning

Deep Unrestricted Document Image Rectification

1 code implementation • 18 Apr 2023 • Hao Feng, Shaokai Liu, Jiajun Deng, Wengang Zhou, Houqiang Li

To the best of our knowledge, this is the first learning-based method for the rectification of unrestricted document images.

Ranked #1 on Local Distortion on DocUNet

Local Distortion

PriSTI: A Conditional Diffusion Framework for Spatiotemporal Imputation

1 code implementation • 20 Feb 2023 • Mingzhe Liu, Han Huang, Hao Feng, Leilei Sun, Bowen Du, Yanjie Fu

Our framework first uses a conditional feature extraction module to extract coarse yet effective spatiotemporal dependencies from conditional information as the global context prior.

Imputation, Noise Estimation

Recurrent Generic Contour-based Instance Segmentation with Progressive Learning

1 code implementation • 21 Jan 2023 • Hao Feng, Keyi Zhou, Wengang Zhou, Yufei Yin, Jiajun Deng, Qi Sun, Houqiang Li

It maintains a single estimate of the contour that is progressively deformed toward the object boundary.

Ranked #1 on Semantic Contour Prediction on Sbd val

Instance Segmentation, Lane Detection (+6 more)

Geometric Representation Learning for Document Image Rectification

2 code implementations • 15 Oct 2022 • Hao Feng, Wengang Zhou, Jiajun Deng, Yuechen Wang, Houqiang Li

In document image rectification, there exist rich geometric constraints between the distorted image and the ground truth one.

Representation Learning

Utilizing Explainable AI for improving the Performance of Neural Networks

no code implementations • 7 Oct 2022 • Huawei Sun, Lorenzo Servadei, Hao Feng, Michael Stephan, Robert Wille, Avik Santra

To address this, Explainable Artificial Intelligence (XAI) has been developing as a field that aims to improve the transparency of models and increase their trustworthiness.

Explainable Artificial Intelligence (XAI)

Towards Efficient Modularity in Industrial Drying: A Combinatorial Optimization Viewpoint

no code implementations • 5 Oct 2022 • Alisina Bayati, Amber Srivastava, Amir Malvandi, Hao Feng, Srinivasa Salapaka

The industrial drying process consumes approximately 12% of the total energy used in manufacturing, with the potential for a 40% reduction in energy usage through improved process controls and the development of new drying technologies.

Combinatorial Optimization, Total Energy

Knowledge Distillation based Contextual Relevance Matching for E-commerce Product Search

no code implementations • 4 Oct 2022 • Ziyang Liu, Chaokun Wang, Hao Feng, Lingfei Wu, Liqun Yang

In this paper, we design an efficient knowledge distillation framework for e-commerce relevance matching to integrate the respective advantages of Transformer-style models and classical relevance matching models.

Knowledge Distillation
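
The snippet does not say which loss the distillation framework uses. As a generic illustration of knowledge distillation (the standard soft-target formulation, not necessarily this paper's), the Python sketch below computes the temperature-scaled KL divergence between a teacher's and a student's softened predictions. Treating the Transformer-style model as the teacher and the classical relevance matching model as the student is an assumption, as are the three relevance grades and the logits.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-softened softmax, shifted by the max logit for numerical stability."""
    scaled = [v / temperature for v in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits: Sequence[float],
                      teacher_logits: Sequence[float],
                      temperature: float = 2.0) -> float:
    """T^2 * KL(teacher || student) on temperature-softened distributions."""
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = sum(pt * (math.log(pt) - math.log(ps))
             for pt, ps in zip(p_t, p_s) if pt > 0.0)
    # Scaling by T^2 keeps gradient magnitudes comparable across temperatures.
    return temperature ** 2 * kl


# Hypothetical relevance logits over three grades (irrelevant, partial, exact).
teacher = [0.2, 1.5, 3.0]  # assumed: Transformer-style relevance model
student = [0.5, 1.0, 2.0]  # assumed: lightweight relevance matching model
print(distillation_loss(student, teacher))
```

In practice this term is usually combined with an ordinary supervised loss on the labeled relevance data.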

TWEET-FID: An Annotated Dataset for Multiple Foodborne Illness Detection Tasks

no code implementations • LREC 2022 • Ruofan Hu, Dongyu Zhang, Dandan Tao, Thomas Hartvigsen, Hao Feng, Elke Rundensteiner

To accelerate the development of machine learning-based models for foodborne outbreak detection, we thus present TWEET-FID (TWEET-Foodborne Illness Detection), the first publicly available annotated dataset for multiple foodborne illness incident detection tasks.

Slot Filling

Cross-modal Learning of Graph Representations using Radar Point Cloud for Long-Range Gesture Recognition

no code implementations • 31 Mar 2022 • Souvik Hazra, Hao Feng, Gamze Naz Kiprit, Michael Stephan, Lorenzo Servadei, Robert Wille, Robert Weigel, Avik Santra

Gesture recognition is one of the most intuitive modes of interaction and has attracted particular attention in human-computer interaction.

Gesture Recognition

DocScanner: Robust Document Image Rectification with Progressive Learning

3 code implementations • 28 Oct 2021 • Hao Feng, Wengang Zhou, Jiajun Deng, Qi Tian, Houqiang Li

The iterative refinements make DocScanner converge to a robust and superior rectification performance, while the lightweight recurrent architecture ensures the running efficiency.

Optical Character Recognition (OCR)

DocTr: Document Image Transformer for Geometric Unwarping and Illumination Correction

2 code implementations • 25 Oct 2021 • Hao Feng, Yuechen Wang, Wengang Zhou, Jiajun Deng, Houqiang Li

Specifically, DocTr consists of a geometric unwarping transformer and an illumination correction transformer.

Optical Character Recognition (OCR)

Rethinking Temperature in Graph Contrastive Learning

no code implementations • 29 Sep 2021 • Ziyang Liu, Hao Feng, Chaokun Wang

In this paper, we investigate and discuss what a good representation should be for a general loss (InfoNCE) in graph contrastive learning.

Contrastive Learning, Self-Supervised Learning
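
The temperature τ studied here enters InfoNCE by dividing every similarity score before the softmax: L = -log( exp(s(a, p)/τ) / Σ_j exp(s(a, k_j)/τ) ), where the sum runs over the positive p and all negatives. The Python sketch below computes this loss for a single anchor, assuming cosine similarity and hypothetical 2-D embeddings; it illustrates the standard loss only, not the paper's analysis of what a good representation should be.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two non-zero, equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def info_nce(anchor: Sequence[float],
             positive: Sequence[float],
             negatives: Sequence[Sequence[float]],
             temperature: float = 0.5) -> float:
    """InfoNCE for one anchor: -log softmax probability of the positive among all candidates."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    # Numerically stable log-sum-exp over the positive and all negatives.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_denominator - logits[0]


# Hypothetical 2-D node embeddings.
anchor, positive = [1.0, 0.0], [0.9, 0.1]
negatives = [[0.0, 1.0], [-1.0, 0.2]]
for tau in (1.0, 0.5, 0.1):
    print(tau, round(info_nce(anchor, positive, negatives, tau), 4))
```

A smaller τ sharpens the softmax, so the loss is dominated by the negatives most similar to the anchor; with no negatives at all, the loss is exactly zero.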

ES-Net: Erasing Salient Parts to Learn More in Re-Identification

no code implementations • 10 Mar 2021 • Dong Shen, Shuai Zhao, Jinming Hu, Hao Feng, Deng Cai, Xiaofei He

In this paper, we propose a novel network, Erasing-Salient Net (ES-Net), to learn comprehensive features by erasing the salient areas in an image.

Complementary Pseudo Labels For Unsupervised Domain Adaptation On Person Re-identification

no code implementations • 29 Jan 2021 • Hao Feng, Minghao Chen, Jinming Hu, Dong Shen, Haifeng Liu, Deng Cai

In this paper, to complement these low recall neighbor pseudo labels, we propose a joint learning framework to learn better feature embeddings via high precision neighbor pseudo labels and high recall group pseudo labels.

Person Re-Identification, Unsupervised Domain Adaptation

High-Performance Discriminative Tracking With Transformers

no code implementations • ICCV 2021 • Bin Yu, Ming Tang, Linyu Zheng, Guibo Zhu, Jinqiao Wang, Hao Feng, Xuetao Feng, Hanqing Lu

End-to-end discriminative trackers improve the state of the art significantly, yet the improvement in robustness and efficiency is restricted by the conventional discriminative model, i.e., least-squares-based regression.

Object, Visual Tracking (+1 more)

STEAM: Self-Supervised Taxonomy Expansion with Mini-Paths

1 code implementation • 18 Jun 2020 • Yue Yu, Yinghao Li, Jiaming Shen, Hao Feng, Jimeng Sun, Chao Zhang

We propose a self-supervised taxonomy expansion model named STEAM, which leverages natural supervision in the existing taxonomy for expansion.

Taxonomy Expansion

Discovering Protagonist of Sentiment with Aspect Reconstructed Capsule Network

no code implementations • 23 Dec 2019 • Chi Xu, Hao Feng, Guoxin Yu, Min Yang, Xiting Wang, Xiang Ao

In this paper, we aim to improve ATSA by discovering the potential aspect terms of the predicted sentiment polarity when the aspect terms of a test sentence are unknown.

Sentence, Sentiment Analysis
