Search Results for author: Xiangyu Wu

Found 14 papers, 3 papers with code

The Solution for the CVPR2024 NICE Image Captioning Challenge

no code implementations • 19 Apr 2024 • Longfei Huang, Shupeng Zhong, Xiangyu Wu, Ruoxuan Li, QingGuo Chen, Yang Yang

Subsequently, we propose a caption-level strategy for the high-quality caption data generated by the image captioning models, and integrate these captions, together with a retrieval augmentation strategy, into the template to compel the model to generate higher-quality, better-matching, and semantically richer captions based on the retrieval-augmented prompts.
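
The abstract does not give the template itself. The following is only a minimal sketch of retrieval-augmented prompting under stated assumptions: captions attached to the stored image embeddings nearest the query (by cosine similarity) are pasted into a text template before generation. The embeddings, the `memory` list, the helper names, and the template wording are all hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve_captions(query_emb, memory, k=2):
    """Return the k captions whose stored embeddings are most similar to the query.

    `memory` is a hypothetical list of (embedding, caption) pairs.
    """
    ranked = sorted(memory, key=lambda item: cosine(query_emb, item[0]), reverse=True)
    return [caption for _, caption in ranked[:k]]

# Hypothetical template; the paper's actual prompt wording is not given in the abstract.
TEMPLATE = "Similar images are described as: {examples}. Describe this image:"

def build_prompt(query_emb, memory, k=2):
    """Fill the template with the top-k retrieved captions."""
    examples = "; ".join(retrieve_captions(query_emb, memory, k))
    return TEMPLATE.format(examples=examples)
```

Take a memory of `([1.0, 0.0], "a dog on grass")`, `([0.0, 1.0], "a red car")` and `([0.9, 0.1], "a puppy running")`. For the query `[1.0, 0.05]`, the sketch retrieves the dog and puppy captions and leaves out the car.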

Image Captioning Retrieval

The Solution for the ICCV 2023 1st Scientific Figure Captioning Challenge

no code implementations • 26 Mar 2024 • Dian Chao, Xin Song, Shupeng Zhong, Boyuan Wang, Xiangyu Wu, Chen Zhu, Yang Yang

In this paper, we propose a solution for improving the quality of captions generated for figures in papers.

Caption Generation Image Captioning +2

Solution for SMART-101 Challenge of ICCV Multi-modal Algorithmic Reasoning Task 2023

no code implementations • 10 Oct 2023 • Xiangyu Wu, Yang Yang, Shengdong Xu, Yifeng Wu, QingGuo Chen, Jianfeng Lu

At the data level, inspired by the challenge paper, we categorized all questions into eight types and used the llama-2-chat model to generate the type of each question directly in a zero-shot manner.

object-detection Object Detection +3

The Solution for the CVPR2023 NICE Image Captioning Challenge

no code implementations • 10 Oct 2023 • Xiangyu Wu, Yi Gao, Hailiang Zhang, Yang Yang, Weili Guo, Jianfeng Lu

In this paper, we present our solution to the New frontiers for Zero-shot Image Captioning Challenge.

Contrastive Learning Image Captioning +1

NICE: CVPR 2023 Challenge on Zero-shot Image Captioning

no code implementations • 5 Sep 2023 • TaeHoon Kim, Pyunghwan Ahn, Sangyun Kim, Sihaeng Lee, Mark Marsden, Alessandra Sala, Seung Hwan Kim, Bohyung Han, Kyoung Mu Lee, Honglak Lee, Kyounghoon Bae, Xiangyu Wu, Yi Gao, Hailiang Zhang, Yang Yang, Weili Guo, Jianfeng Lu, Youngtaek Oh, Jae Won Cho, Dong-Jin Kim, In So Kweon, Junmo Kim, Wooyoung Kang, Won Young Jhoo, Byungseok Roh, Jonghwan Mun, Solgil Oh, Kenan Emir Ak, Gwang-Gook Lee, Yan Xu, Mingwei Shen, Kyomin Hwang, Wonsik Shin, Kamin Lee, Wonhark Park, Dongkwan Lee, Nojun Kwak, Yujin Wang, Yimu Wang, Tiancheng Gu, Xingchang Lv, Mingmao Sun

In this report, we introduce the NICE (New frontiers for zero-shot Image Captioning Evaluation) project and share the results and outcomes of the 2023 challenge.

Fairness Image Captioning

ContentCTR: Frame-level Live Streaming Click-Through Rate Prediction with Multimodal Transformer

no code implementations • 26 Jun 2023 • Jiaxin Deng, Dong Shen, Shiyao Wang, Xiangyu Wu, Fan Yang, Guorui Zhou, Gaofeng Meng

However, most previous works treat a live stream as a whole item and explore Click-Through Rate (CTR) prediction frameworks at the item level, neglecting the dynamic changes that occur even within the same live room.

Click-Through Rate Prediction Dynamic Time Warping +1

CoVLR: Coordinating Cross-Modal Consistency and Intra-Modal Structure for Vision-Language Retrieval

no code implementations • 15 Apr 2023 • Yang Yang, Zhongtian Fu, Xiangyu Wu, Wenjie Li

To address this challenge, in this paper we experimentally observe that the vision-language divergence may give rise to strong and weak modalities. Hard cross-modal consistency cannot prevent the weak modality from affecting the relationships among strong-modality instances, so those relationships are perturbed despite the learned consistent representations. To this end, we propose a novel Coordinated Vision-Language Retrieval method (dubbed CoVLR), which aims to study and alleviate the desynchrony between the cross-modal alignment and single-modal cluster-preserving tasks.

Cross-Modal Retrieval Instance Search +1

Generation-Guided Multi-Level Unified Network for Video Grounding

no code implementations • 14 Mar 2023 • Xing Cheng, Xiangyu Wu, Dong Shen, Hezheng Lin, Fan Yang

Video grounding aims to locate the timestamps best matching the query description within an untrimmed video.
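
The snippet does not say how a predicted span is judged as matching. A common convention in video grounding benchmarks, assumed here rather than taken from this paper, is temporal IoU between predicted and ground-truth segments, reported as recall at an IoU threshold. A minimal sketch:

```python
def temporal_iou(pred, gt):
    """Intersection-over-union of two (start, end) intervals, e.g. in seconds."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0

def recall_at_iou(preds, gts, threshold=0.5):
    """Fraction of queries whose top-1 predicted segment reaches the IoU threshold."""
    hits = sum(1 for p, g in zip(preds, gts) if temporal_iou(p, g) >= threshold)
    return hits / len(gts)
```

For example, a prediction of (2.0, 6.0) against a ground truth of (4.0, 8.0) overlaps for 2 s out of a 6 s union, giving an IoU of 1/3. Under the 0.5 threshold, that query counts as a miss.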

Video Grounding

A Unified Model for Video Understanding and Knowledge Embedding with Heterogeneous Knowledge Graph Dataset

no code implementations • 19 Nov 2022 • Jiaxin Deng, Dong Shen, Haojie Pan, Xiangyu Wu, Ximan Liu, Gaofeng Meng, Fan Yang, Size Li, Ruiji Fu, Zhongyuan Wang

Furthermore, based on this dataset, we propose an end-to-end model that jointly optimizes the video understanding objective with knowledge graph embedding, which not only better injects factual knowledge into video understanding but also generates effective multi-modal entity embeddings for the KG.

Common Sense Reasoning Knowledge Graph Embedding +4

Learning a Single Near-hover Position Controller for Vastly Different Quadcopters

no code implementations • 19 Sep 2022 • Dingqi Zhang, Antonio Loquercio, Xiangyu Wu, Ashish Kumar, Jitendra Malik, Mark W. Mueller

This paper proposes an adaptive near-hover position controller for quadcopters, which can be deployed to quadcopters of very different mass, size, and motor constants, and which also adapts rapidly to unknown disturbances at runtime.

Drone Controller Position

Improving Video-Text Retrieval by Multi-Stream Corpus Alignment and Dual Softmax Loss

2 code implementations • 9 Sep 2021 • Xing Cheng, Hezheng Lin, Xiangyu Wu, Fan Yang, Dong Shen

In this paper, we propose a multi-stream Corpus Alignment network with single-gate Mixture-of-Experts (CAMoE) and a novel Dual Softmax Loss (DSL) to address these two heterogeneities.
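
The snippet does not spell out the loss. The following is a minimal pure-Python sketch of the dual-softmax idea as it is commonly described: each text-video similarity is reweighted by a prior obtained from a softmax in the other direction (here, over all texts for a given video), and the usual row-wise contrastive cross-entropy is then applied. The temperature of 100, the function names, and the restriction to the text-to-video direction are assumptions for illustration, not the authors' implementation.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dual_softmax_rescore(sim, temperature=100.0):
    """Reweight sim[i][j] (text i, video j) by a prior from a softmax over texts.

    For each video j the prior is softmax_i(temperature * sim[i][j]), so a text
    that is not the strongest match for video j in that column is suppressed.
    """
    n_text, n_video = len(sim), len(sim[0])
    priors = [softmax([temperature * sim[i][j] for i in range(n_text)])
              for j in range(n_video)]
    return [[sim[i][j] * priors[j][i] for j in range(n_video)]
            for i in range(n_text)]

def dual_softmax_loss(sim, temperature=100.0):
    """Row-wise cross-entropy on the rescored matrix; matched pairs lie on the diagonal."""
    rescored = dual_softmax_rescore(sim, temperature)
    loss = 0.0
    for i, row in enumerate(rescored):
        probs = softmax([temperature * s for s in row])
        loss -= math.log(probs[i])
    return loss / len(rescored)
```

Consider `sim = [[0.8, 0.1], [0.7, 0.6]]`. Text 1 initially scores video 0 higher (0.7 vs 0.6). However, video 0 is an even better match for text 0, so the column prior shrinks that entry, and after rescoring text 1's best match becomes video 1.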

Ranked #9 on Video Retrieval on MSVD (using extra training data)

Retrieval Text Retrieval +1

Real-time Geo-localization Using Satellite Imagery and Topography for Unmanned Aerial Vehicles

no code implementations • 7 Aug 2021 • Shuxiao Chen, Xiangyu Wu, Mark W. Mueller, Koushil Sreenath

The capabilities of autonomous flight with unmanned aerial vehicles (UAVs) have significantly increased in recent times.

Image-Based Localization

MlTr: Multi-label Classification with Transformer

1 code implementation • 11 Jun 2021 • Xing Cheng, Hezheng Lin, Xiangyu Wu, Fan Yang, Dong Shen, Zhongyuan Wang, Nian Shi, Honglin Liu

The task of multi-label image classification is to recognize all the object labels present in an image.
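
The definition above differs from single-label classification in that each label is decided independently. A common baseline formulation, assumed here and not MlTr's transformer architecture, applies a sigmoid per label with a threshold (0.5 below) and trains with binary cross-entropy. The label names and logits are hypothetical.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_labels(logits, label_names, threshold=0.5):
    """Return every label whose independent sigmoid probability reaches the threshold."""
    return [name for name, z in zip(label_names, logits) if sigmoid(z) >= threshold]

def multilabel_bce(logits, targets):
    """Mean binary cross-entropy over labels, computed stably from raw logits.

    Uses max(z, 0) - z * y + log(1 + exp(-|z|)), which equals
    -[y * log(sigmoid(z)) + (1 - y) * log(1 - sigmoid(z))] without overflow.
    """
    total = 0.0
    for z, y in zip(logits, targets):
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(logits)
```

With logits `[2.0, -1.0, 0.3]` for `["person", "dog", "car"]`, the sketch predicts `["person", "car"]`. More than one label can be returned because no softmax forces the labels to compete.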

Ranked #12 on Multi-Label Classification on MS-COCO

Classification Multi-Label Classification +1

CAT: Cross Attention in Vision Transformer

1 code implementation • 10 Jun 2021 • Hezheng Lin, Xing Cheng, Xiangyu Wu, Fan Yang, Dong Shen, Zhongyuan Wang, Qing Song, Wei Yuan

In this paper, we propose a new attention mechanism in Transformer, termed Cross Attention. It alternates between two kinds of attention: attention within each image patch (instead of over the whole image), which captures local information, and attention between image patches divided from single-channel feature maps, which captures global information.
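
To make the two partitions concrete, the sketch below only counts query-key score pairs; it does not implement the attention itself. It assumes that inner-patch attention runs within each non-overlapping n x n window. It also assumes that cross-patch attention treats each n x n patch of a single channel as one token and attends across patches separately per channel. The function name and example sizes are illustrative assumptions, not taken from the CAT code.

```python
def attention_score_counts(h, w, c, n):
    """Count query-key pairs for global, inner-patch, and cross-patch attention.

    Illustrative assumptions:
    - global: each of the h*w spatial tokens attends to every token;
    - inner-patch: tokens attend only within their own n x n patch;
    - cross-patch: each of the c channels is split into n x n patches, each patch
      is one token, and patches attend to each other within that channel.
    """
    assert h % n == 0 and w % n == 0, "feature map must tile evenly into patches"
    tokens = h * w
    patches = (h // n) * (w // n)
    global_pairs = tokens ** 2
    inner_pairs = patches * (n * n) ** 2
    cross_pairs = c * patches ** 2
    return global_pairs, inner_pairs, cross_pairs
```

For a 56 x 56 map with 96 channels and 7 x 7 patches, this gives 9,834,496 pairs for global attention. Inner-patch attention needs 153,664 pairs and cross-patch attention 393,216. This illustrates why restricting attention in this way is cheaper than attending over the whole image.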
