Search Results for author: Houdong Hu

Found 13 papers, 6 papers with code

Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks

no code implementations10 Nov 2023 Bin Xiao, Haiping Wu, Weijian Xu, Xiyang Dai, Houdong Hu, Yumao Lu, Michael Zeng, Ce Liu, Lu Yuan

We introduce Florence-2, a novel vision foundation model with a unified, prompt-based representation for a variety of computer vision and vision-language tasks.

Multi-Task Learning object-detection +1

Florence: A New Foundation Model for Computer Vision

1 code implementation22 Nov 2021 Lu Yuan, Dongdong Chen, Yi-Ling Chen, Noel Codella, Xiyang Dai, Jianfeng Gao, Houdong Hu, Xuedong Huang, Boxin Li, Chunyuan Li, Ce Liu, Mengchen Liu, Zicheng Liu, Yumao Lu, Yu Shi, Lijuan Wang, JianFeng Wang, Bin Xiao, Zhen Xiao, Jianwei Yang, Michael Zeng, Luowei Zhou, Pengchuan Zhang

Computer vision foundation models, which are trained on diverse, large-scale dataset and can be adapted to a wide range of downstream tasks, are critical for this mission to solve real-world computer vision applications.

Action Classification Action Recognition In Videos +12

Image Scene Graph Generation (SGG) Benchmark

1 code implementation27 Jul 2021 Xiaotian Han, Jianwei Yang, Houdong Hu, Lei Zhang, Jianfeng Gao, Pengchuan Zhang

There is a surge of interest in image scene graph generation (object, attribute and relationship detection) due to the need of building fine-grained image understanding models that go beyond object detection.

Attribute Graph Generation +6

Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks

4 code implementations ECCV 2020 Xiujun Li, Xi Yin, Chunyuan Li, Pengchuan Zhang, Xiao-Wei Hu, Lei Zhang, Lijuan Wang, Houdong Hu, Li Dong, Furu Wei, Yejin Choi, Jianfeng Gao

Large-scale pre-training methods of learning cross-modal representations on image-text pairs are becoming popular for vision-language tasks.

 Ranked #1 on Image Retrieval on MS COCO (Recall@10 metric)

Image Captioning Image Retrieval +3

Applications of Generative Adversarial Models in Visual Search Reformulation

no code implementations28 Oct 2019 Kyle Xiao, Houdong Hu, Yan Wang

Query reformulation is the process by which a input search query is refined by the user to match documents outside the original top-n results.

Unified Vision-Language Pre-Training for Image Captioning and VQA

3 code implementations24 Sep 2019 Luowei Zhou, Hamid Palangi, Lei Zhang, Houdong Hu, Jason J. Corso, Jianfeng Gao

The model is unified in that (1) it can be fine-tuned for either vision-language generation (e. g., image captioning) or understanding (e. g., visual question answering) tasks, and (2) it uses a shared multi-layer transformer network for both encoding and decoding, which differs from many existing methods where the encoder and decoder are implemented using separate models.

Image Captioning Question Answering +2

Learning Visual Relation Priors for Image-Text Matching and Image Captioning with Neural Scene Graph Generators

no code implementations22 Sep 2019 Kuang-Huei Lee, Hamid Palangi, Xi Chen, Houdong Hu, Jianfeng Gao

In this work, we tackle two fundamental language-and-vision tasks: image-text matching and image captioning, and demonstrate that neural scene graph generators can learn effective visual relation features to facilitate grounding language to visual relations and subsequently improve the two end applications.

Image Captioning Image-text matching +2

An Universal Image Attractiveness Ranking Framework

no code implementations12 Apr 2018 Ning Ma, Alexey Volkov, Aleksandr Livshits, Pawel Pietrusinski, Houdong Hu, Mark Bolin

We propose a new framework to rank image attractiveness using a novel pairwise deep network trained with a large set of side-by-side multi-labeled image pairs from a web image index.

Attribute

Stacked Cross Attention for Image-Text Matching

6 code implementations ECCV 2018 Kuang-Huei Lee, Xi Chen, Gang Hua, Houdong Hu, Xiaodong He

Prior work either simply aggregates the similarity of all possible pairs of regions and words without attending differentially to more and less important words or regions, or uses a multi-step attentional process to capture limited number of semantic alignments which is less interpretable.

Image Retrieval Image-text matching +5

Web-Scale Responsive Visual Search at Bing

no code implementations14 Feb 2018 Houdong Hu, Yan Wang, Linjun Yang, Pavel Komlev, Li Huang, Xi Chen, Jiapei Huang, Ye Wu, Meenaz Merchant, Arun Sacheti

In this paper, we introduce a web-scale general visual search system deployed in Microsoft Bing.

Learning-To-Rank

Cannot find the paper you are looking for? You can Submit a new open access paper.