Search Results for author: Siliang Tang

Found 85 papers, 28 papers with code

NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models

no code implementations 5 Mar 2024 Zeqian Ju, Yuancheng Wang, Kai Shen, Xu Tan, Detai Xin, Dongchao Yang, Yanqing Liu, Yichong Leng, Kaitao Song, Siliang Tang, Zhizheng Wu, Tao Qin, Xiang-Yang Li, Wei Ye, Shikun Zhang, Jiang Bian, Lei He, Jinyu Li, Sheng Zhao

Specifically, 1) we design a neural codec with factorized vector quantization (FVQ) to disentangle speech waveform into subspaces of content, prosody, timbre, and acoustic details; 2) we propose a factorized diffusion model to generate attributes in each subspace following its corresponding prompt.

Quantization Speech Synthesis

Efficient Tuning and Inference for Large Language Models on Textual Graphs

no code implementations 28 Jan 2024 Yun Zhu, Yaoke Wang, Haizhou Shi, Siliang Tang

In this paper, we propose ENGINE, a parameter- and memory-efficient fine-tuning method for textual graphs with an LLM encoder.

HalluciDoctor: Mitigating Hallucinatory Toxicity in Visual Instruction Data

1 code implementation 22 Nov 2023 Qifan Yu, Juncheng Li, Longhui Wei, Liang Pang, Wentao Ye, Bosheng Qin, Siliang Tang, Qi Tian, Yueting Zhuang

Multi-modal Large Language Models (MLLMs) tuned on machine-generated instruction-following data have demonstrated remarkable performance in various multi-modal understanding and generation tasks.

Attribute counterfactual +3

Revisiting the Domain Shift and Sample Uncertainty in Multi-source Active Domain Transfer

no code implementations 21 Nov 2023 Wenqiao Zhang, Zheqi Lv, Hao Zhou, Jia-Wei Liu, Juncheng Li, Mengze Li, Siliang Tang, Yueting Zhuang

Active Domain Adaptation (ADA) aims to maximally boost model adaptation in a new target domain by actively selecting a limited number of target data to annotate. This setting neglects the more practical scenario where training data are collected from multiple sources.

Domain Adaptation Transfer Learning

De-fine: Decomposing and Refining Visual Programs with Auto-Feedback

no code implementations 21 Nov 2023 Minghe Gao, Juncheng Li, Hao Fei, Liang Pang, Wei Ji, Guoming Wang, Wenqiao Zhang, Siliang Tang, Yueting Zhuang

Visual programming, a modular and generalizable paradigm, integrates different modules and Python operators to solve various vision-language tasks.

Logical Reasoning

Negative Sampling with Adaptive Denoising Mixup for Knowledge Graph Embedding

1 code implementation 15 Oct 2023 Xiangnan Chen, Wen Zhang, Zhen Yao, Mingyang Chen, Siliang Tang

Most existing negative sampling methods assume that non-existent triples with high scores are high-quality negative triples.

Denoising Knowledge Graph Completion +2

GraphControl: Adding Conditional Control to Universal Graph Pre-trained Models for Graph Domain Transfer Learning

no code implementations 11 Oct 2023 Yun Zhu, Yaoke Wang, Haizhou Shi, Zhenshuo Zhang, Dian Jiao, Siliang Tang

These pre-trained models can be applied to various downstream Web applications, saving training time and improving downstream (target) performance.

Attribute Specificity +1

Improving Vision Anomaly Detection with the Guidance of Language Modality

1 code implementation 4 Oct 2023 Dong Chen, Kaihang Pan, Guoming Wang, Yueting Zhuang, Siliang Tang

To learn a more compact latent space for the vision anomaly detector, CMLE learns a correlation structure matrix from the language modality, and then the latent space of vision modality will be learned with the guidance of the matrix.

Anomaly Detection Defect Detection +1

ControlRetriever: Harnessing the Power of Instructions for Controllable Retrieval

no code implementations 19 Aug 2023 Kaihang Pan, Juncheng Li, Hongye Song, Hao Fei, Wei Ji, Shuo Zhang, Jun Lin, Xiaozhong Liu, Siliang Tang

Recent studies have shown that dense retrieval models, lacking dedicated training data, struggle to perform well across diverse retrieval tasks, as different retrieval tasks often entail distinct search intents.

Retrieval Text-to-Image Generation

Dancing Avatar: Pose and Text-Guided Human Motion Videos Synthesis with Image Diffusion Model

no code implementations 15 Aug 2023 Bosheng Qin, Wentao Ye, Qifan Yu, Siliang Tang, Yueting Zhuang

Our approach employs a pretrained T2I diffusion model to generate each video frame in an autoregressive fashion.

Image Inpainting

Fine-tuning Multimodal LLMs to Follow Zero-shot Demonstrative Instructions

1 code implementation 8 Aug 2023 Juncheng Li, Kaihang Pan, Zhiqi Ge, Minghe Gao, Hanwang Zhang, Wei Ji, Wenqiao Zhang, Tat-Seng Chua, Siliang Tang, Yueting Zhuang

This shortcoming results in MLLMs' underperformance in comprehending demonstrative instructions consisting of multiple, interleaved, and multimodal instructions that demonstrate the required context to complete a task.

Image Captioning Instruction Following

Degeneration-Tuning: Using Scrambled Grid shield Unwanted Concepts from Stable Diffusion

no code implementations 2 Aug 2023 Zixuan Ni, Longhui Wei, Jiacheng Li, Siliang Tang, Yueting Zhuang, Qi Tian

In this work, we propose a novel strategy named Degeneration-Tuning (DT) to shield contents of unwanted concepts from SD weights.

MARIO: Model Agnostic Recipe for Improving OOD Generalization of Graph Contrastive Learning

1 code implementation 24 Jul 2023 Yun Zhu, Haizhou Shi, Zhenshuo Zhang, Siliang Tang

In this work, we investigate the problem of out-of-distribution (OOD) generalization for unsupervised learning methods on graph data.

Contrastive Learning Data Augmentation

Global Structure Knowledge-Guided Relation Extraction Method for Visually-Rich Document

1 code implementation 23 May 2023 Xiangnan Chen, Qian Xiao, Juncheng Li, Duo Dong, Jun Lin, Xiaozhong Liu, Siliang Tang

GOSE begins by generating preliminary relation predictions on entity pairs extracted from a scanned image of the document.

Relation Relation Extraction

Interactive Data Synthesis for Systematic Vision Adaptation via LLMs-AIGCs Collaboration

1 code implementation 22 May 2023 Qifan Yu, Juncheng Li, Wentao Ye, Siliang Tang, Yueting Zhuang

Recent text-to-image generation models have shown promising results in generating high-fidelity photo-realistic images.

Data Augmentation Prompt Engineering +1

InstructVid2Vid: Controllable Video Editing with Natural Language Instructions

no code implementations 21 May 2023 Bosheng Qin, Juncheng Li, Siliang Tang, Tat-Seng Chua, Yueting Zhuang

To improve the consistency between adjacent frames of generated videos, we propose the Frame Difference Loss, which is incorporated during the training process.

Attribute Image Generation +2

Continual Vision-Language Representation Learning with Off-Diagonal Information

no code implementations 11 May 2023 Zixuan Ni, Longhui Wei, Siliang Tang, Yueting Zhuang, Qi Tian

Moreover, we empirically and theoretically demonstrate how SD leads to a performance decline for CLIP on cross-modal retrieval tasks.

Continual Learning Contrastive Learning +4

SkillQG: Learning to Generate Question for Reading Comprehension Assessment

no code implementations 8 May 2023 Xiaoqiang Wang, Bang Liu, Siliang Tang, Lingfei Wu

We present SkillQG: a question generation framework with controllable comprehension types for assessing and improving machine reading comprehension models.

Machine Reading Comprehension Question Answering +2

Visually-Prompted Language Model for Fine-Grained Scene Graph Generation in an Open World

1 code implementation ICCV 2023 Qifan Yu, Juncheng Li, Yu Wu, Siliang Tang, Wei Ji, Yueting Zhuang

Based on that, we further introduce a novel Entangled cross-modal prompt approach for open-world predicate scene graph generation (Epic), where models can generalize to unseen predicates in a zero-shot manner.

Graph Generation Language Modelling +1

SmartBERT: A Promotion of Dynamic Early Exiting Mechanism for Accelerating BERT Inference

no code implementations 16 Mar 2023 Boren Hu, Yun Zhu, Jiacheng Li, Siliang Tang

In this paper, we propose a novel dynamic early exiting combined with layer skipping for BERT inference named SmartBERT, which adds a skipping gate and an exiting operator into each layer of BERT.

Contrastive Learning Language Modelling +2

Gradient-Regulated Meta-Prompt Learning for Generalizable Vision-Language Models

no code implementations ICCV 2023 Juncheng Li, Minghe Gao, Longhui Wei, Siliang Tang, Wenqiao Zhang, Mengze Li, Wei Ji, Qi Tian, Tat-Seng Chua, Yueting Zhuang

Prompt tuning, a recently emerging paradigm, enables the powerful vision-language pre-training models to adapt to downstream tasks in a parameter- and data-efficient way, by learning the "soft prompts" to condition frozen pre-training models.

Domain Generalization Few-Shot Learning +1

Structure-Aware Group Discrimination with Adaptive-View Graph Encoder: A Fast Graph Contrastive Learning Framework

no code implementations 9 Mar 2023 Zhenshuo Zhang, Yun Zhu, Haizhou Shi, Siliang Tang

Despite significant recent progress, large-scale graph representation learning remains expensive to train and deploy for two main reasons: (i) the repetitive computation of multi-hop message passing and non-linearity in graph neural networks (GNNs); (ii) the computational cost of complex pairwise contrastive learning loss.

Contrastive Learning Graph Representation Learning

Lformer: Text-to-Image Generation with L-shape Block Parallel Decoding

no code implementations 7 Mar 2023 Jiacheng Li, Longhui Wei, Zongyuan Zhan, Xin He, Siliang Tang, Qi Tian, Yueting Zhuang

To better accelerate the generative transformers while keeping good generation quality, we propose Lformer, a semi-autoregressive text-to-image generation model.

Text-to-Image Generation

SGL-PT: A Strong Graph Learner with Graph Prompt Tuning

no code implementations 24 Feb 2023 Yun Zhu, Jianhao Guo, Siliang Tang

For the graph classification task, we unify pre-training and fine-tuning by designing a novel verbalizer-free prompting function, which reformulates the downstream task in a format similar to the pretext task.

Graph Classification Graph Learning

A Study on ReLU and Softmax in Transformer

no code implementations 13 Feb 2023 Kai Shen, Junliang Guo, Xu Tan, Siliang Tang, Rui Wang, Jiang Bian

This paper sheds light on the following points: 1) Softmax and ReLU use different normalization methods over elements which lead to different variances of results, and ReLU is good at dealing with a large number of key-value slots; 2) FFN and key-value memory are equivalent, and thus the Transformer can be viewed as a memory network where FFNs and self-attention networks are both key-value memories.

Document Translation

Variational Cross-Graph Reasoning and Adaptive Structured Semantics Learning for Compositional Temporal Grounding

no code implementations 22 Jan 2023 Juncheng Li, Siliang Tang, Linchao Zhu, Wenqiao Zhang, Yi Yang, Tat-Seng Chua, Fei Wu, Yueting Zhuang

To systematically benchmark the compositional generalizability of temporal grounding models, we introduce a new Compositional Temporal Grounding task and construct two new dataset splits, i.e., Charades-CG and ActivityNet-CG.

Semantic correspondence Sentence

DBA: Efficient Transformer with Dynamic Bilinear Low-Rank Attention

no code implementations 24 Nov 2022 Bosheng Qin, Juncheng Li, Siliang Tang, Yueting Zhuang

Furthermore, we show that the hidden state dimension can be approximated by extending the Johnson-Lindenstrauss lemma, optimizing the attention in bilinear form.

LEMMA

Mask the Correct Tokens: An Embarrassingly Simple Approach for Error Correction

1 code implementation 23 Nov 2022 Kai Shen, Yichong Leng, Xu Tan, Siliang Tang, Yuan Zhang, Wenjie Liu, Edward Lin

Since the error rate of the incorrect sentence is usually low (e.g., 10%), the correction model can only learn to correct on limited error tokens but trivially copy on most tokens (correct tokens), which harms the effective training of error correction.

Sentence speech-recognition +1

Distilling Task-specific Logical Rules from Large Pre-trained Models

no code implementations 6 Oct 2022 Tao Chen, Luxin Liu, Xuepeng Jia, Baoliang Cui, Haihong Tang, Siliang Tang

Specifically, we borrow recent prompt-based language models as the knowledge expert to yield initial seed rules, and based on the formed high-quality instance pool that acts as an intermediary role, we keep teaching the expert to fit our task and learning task-specific logical rules.

Citation Trajectory Prediction via Publication Influence Representation Using Temporal Knowledge Graph

no code implementations 2 Oct 2022 Chang Zong, Yueting Zhuang, Weiming Lu, Jian Shao, Siliang Tang

In this paper, we propose CTPIR, a new citation trajectory prediction framework that is able to represent the influence (the momentum of citation) of either new or existing publications using the history information of all their attributes.

Attribute Graph Embedding +1

Dilated Context Integrated Network with Cross-Modal Consensus for Temporal Emotion Localization in Videos

1 code implementation 3 Aug 2022 Juncheng Li, Junlin Xie, Linchao Zhu, Long Qian, Siliang Tang, Wenqiao Zhang, Haochen Shi, Shengyu Zhang, Longhui Wei, Qi Tian, Yueting Zhuang

In this paper, we introduce a new task, named Temporal Emotion Localization in videos (TEL), which aims to detect human emotions and localize their corresponding temporal boundaries in untrimmed videos with aligned subtitles.

Emotion Classification Temporal Action Localization +1

BOSS: Bottom-up Cross-modal Semantic Composition with Hybrid Counterfactual Training for Robust Content-based Image Retrieval

no code implementations 9 Jul 2022 Wenqiao Zhang, Jiannan Guo, Mengze Li, Haochen Shi, Shengyu Zhang, Juncheng Li, Siliang Tang, Yueting Zhuang

In this scenario, the input image serves as an intuitive context and background for the search, while the corresponding language expressly requests new traits on how specific characteristics of the query image should be modified in order to get the intended target image.

Content-Based Image Retrieval counterfactual +2

Collaborative Intelligence Orchestration: Inconsistency-Based Fusion of Semi-Supervised Learning and Active Learning

no code implementations 7 Jun 2022 Jiannan Guo, Yangyang Kang, Yu Duan, Xiaozhong Liu, Siliang Tang, Wenqiao Zhang, Kun Kuang, Changlong Sun, Fei Wu

Motivated by the industry practice of labeling data, we propose an innovative Inconsistency-based virtual aDvErsarial Active Learning (IDEAL) algorithm to further investigate SSL-AL's potential superiority and achieve mutual enhancement of AL and SSL, i.e., SSL propagates label information to unlabeled samples and provides smoothed embeddings for AL, while AL excludes samples with inconsistent predictions and considerable uncertainty for SSL.

Active Learning

Robust Meta-learning with Sampling Noise and Label Noise via Eigen-Reptile

1 code implementation 4 Jun 2022 Dong Chen, Lingfei Wu, Siliang Tang, Xiao Yun, Bo Long, Yueting Zhuang

Moreover, when handling the data with noisy labels, the meta-learner could be extremely sensitive to label noise on a corrupted dataset.

Few-Shot Learning

QRelScore: Better Evaluating Generated Questions with Deeper Understanding of Context-aware Relevance

no code implementations 29 Apr 2022 Xiaoqiang Wang, Bang Liu, Siliang Tang, Lingfei Wu

Existing metrics for assessing question generation not only require costly human references but also fail to take into account the input context of generation, resulting in a lack of deep understanding of the relevance between the generated questions and input contexts.

Question Generation Question-Generation +1

RoSA: A Robust Self-Aligned Framework for Node-Node Graph Contrastive Learning

1 code implementation 29 Apr 2022 Yun Zhu, Jianhao Guo, Fei Wu, Siliang Tang

To the best of our knowledge, RoSA is the first work that focuses on the non-aligned node-node graph contrastive learning problem.

Contrastive Learning Node Classification +1

Compositional Temporal Grounding with Structured Variational Cross-Graph Correspondence Learning

1 code implementation CVPR 2022 Juncheng Li, Junlin Xie, Long Qian, Linchao Zhu, Siliang Tang, Fei Wu, Yi Yang, Yueting Zhuang, Xin Eric Wang

To systematically measure the compositional generalizability of temporal grounding models, we introduce a new Compositional Temporal Grounding task and construct two new dataset splits, i.e., Charades-CG and ActivityNet-CG.

Semantic correspondence Sentence

Feeding What You Need by Understanding What You Learned

no code implementations ACL 2022 Xiaoqiang Wang, Bang Liu, Fangli Xu, Bo Long, Siliang Tang, Lingfei Wu

In this paper, we argue that a deep understanding of model capabilities and data properties can help us feed a model with appropriate training data based on its learning status.

Machine Reading Comprehension

Learning To Learn by Jointly Optimizing Neural Architecture and Weights

no code implementations CVPR 2022 Yadong Ding, Yu Wu, Chengyue Huang, Siliang Tang, Yi Yang, Longhui Wei, Yueting Zhuang, Qi Tian

Existing NAS-based meta-learning methods apply a two-stage strategy, i.e., first searching architectures and then re-training meta-weights on the searched architecture.

Meta-Learning

Boosting RGB-D Saliency Detection by Leveraging Unlabeled RGB Images

1 code implementation 1 Jan 2022 Xiaoqiang Wang, Lei Zhu, Siliang Tang, Huazhu Fu, Ping Li, Fei Wu, Yi Yang, Yueting Zhuang

The depth estimation branch is trained with RGB-D images and then used to estimate the pseudo depth maps for all unlabeled RGB images to form the paired data.

Depth Estimation object-detection +3

Relational Graph Learning for Grounded Video Description Generation

no code implementations 2 Dec 2021 Wenqiao Zhang, Xin Eric Wang, Siliang Tang, Haizhou Shi, Haocheng Shi, Jun Xiao, Yueting Zhuang, William Yang Wang

Such a setting can help explain the decisions of captioning models and prevents the model from hallucinating object words in its description.

Graph Learning Hallucination +2

Consensus Graph Representation Learning for Better Grounded Image Captioning

no code implementations 2 Dec 2021 Wenqiao Zhang, Haochen Shi, Siliang Tang, Jun Xiao, Qiang Yu, Yueting Zhuang

Contemporary visual captioning models frequently hallucinate objects that are not actually in a scene, due to visual misclassification or over-reliance on priors, resulting in semantic inconsistency between the visual information and the target lexical words.

Graph Representation Learning Hallucination +1

Learning to Generate Visual Questions with Noisy Supervision

1 code implementation NeurIPS 2021 Shen Kai, Lingfei Wu, Siliang Tang, Yueting Zhuang, Zhen He, Zhuoye Ding, Yun Xiao, Bo Long

The task of visual question generation (VQG) aims to generate human-like neural questions from an image and potentially other side information (e.g., answer type or the answer itself).

Question Generation Question-Generation +1

Self-Supervised Class Incremental Learning

no code implementations 18 Nov 2021 Zixuan Ni, Siliang Tang, Yueting Zhuang

Existing Class Incremental Learning (CIL) methods are based on a supervised classification framework sensitive to data labels.

Class Incremental Learning Data Augmentation +2

Towards Communication-Efficient and Privacy-Preserving Federated Representation Learning

no code implementations 29 Sep 2021 Haizhou Shi, Youcai Zhang, Zijin Shen, Siliang Tang, Yaqian Li, Yandong Guo, Yueting Zhuang

This paper investigates the feasibility of federated representation learning under the constraints of communication cost and privacy protection.

Contrastive Learning Federated Learning +2

Revisiting Catastrophic Forgetting in Class Incremental Learning

no code implementations 26 Jul 2021 Zixuan Ni, Haizhou Shi, Siliang Tang, Longhui Wei, Qi Tian, Yueting Zhuang

After investigating existing strategies, we observe that there is a lack of study on how to prevent the inter-phase confusion.

Class Incremental Learning Contrastive Learning +2

Adaptive Hierarchical Graph Reasoning with Semantic Coherence for Video-and-Language Inference

no code implementations ICCV 2021 Juncheng Li, Siliang Tang, Linchao Zhu, Haochen Shi, Xuanwen Huang, Fei Wu, Yi Yang, Yueting Zhuang

Secondly, we introduce semantic coherence learning to explicitly encourage the semantic coherence of the adaptive hierarchical graph network from three hierarchies.

Empower Distantly Supervised Relation Extraction with Collaborative Adversarial Training

1 code implementation 21 Jun 2021 Tao Chen, Haochen Shi, Liyuan Liu, Siliang Tang, Jian Shao, Zhigang Chen, Yueting Zhuang

In this paper, we propose collaborative adversarial training to improve the data utilization, which coordinates virtual adversarial training (VAT) and adversarial training (AT) at different levels.

Relation Relation Extraction

CIL: Contrastive Instance Learning Framework for Distantly Supervised Relation Extraction

1 code implementation ACL 2021 Tao Chen, Haizhou Shi, Siliang Tang, Zhigang Chen, Fei Wu, Yueting Zhuang

The effort to reduce noise in training data generated by distant supervision (DS) has been under way since DS was first introduced into the relation extraction (RE) task.

Relation Relation Extraction +1

Improving Weakly-supervised Object Localization via Causal Intervention

1 code implementation 21 Apr 2021 Feifei Shao, Yawei Luo, Li Zhang, Lu Ye, Siliang Tang, Yi Yang, Jun Xiao

The recently emerged weakly supervised object localization (WSOL) methods can learn to localize an object in the image using only image-level labels.

Object Weakly-Supervised Object Localization

Differentiable Graph Optimization for Neural Architecture Search

no code implementations 1 Jan 2021 Chengyue Huang, Lingfei Wu, Yadong Ding, Siliang Tang, Fangli Xu, Chang Zong, Chilie Tan, Yueting Zhuang

To this end, we learn a differentiable graph neural network as a surrogate model to rank candidate architectures, which enables us to obtain gradients w.r.t. the input architectures.

Bayesian Optimization Neural Architecture Search

Connection-Adaptive Meta-Learning

no code implementations 1 Jan 2021 Yadong Ding, Yu Wu, Chengyue Huang, Siliang Tang, Yi Yang, Yueting Zhuang

In this paper, we aim to obtain better meta-learners by co-optimizing the architecture and meta-weights simultaneously.

Meta-Learning

Ask Question with Double Hints: Visual Question Generation with Answer-awareness and Region-reference

no code implementations 1 Jan 2021 Shen Kai, Lingfei Wu, Siliang Tang, Fangli Xu, Zhu Zhang, Yu Qiang, Yueting Zhuang

The task of visual question generation (VQG) aims to generate human-like questions from an image and potentially other side information (e.g., answer type or the answer itself).

Graph-to-Sequence Question Generation +1

Semi-Supervised Active Learning for Semi-Supervised Models: Exploit Adversarial Examples With Graph-Based Virtual Labels

no code implementations ICCV 2021 Jiannan Guo, Haochen Shi, Yangyang Kang, Kun Kuang, Siliang Tang, Zhuoren Jiang, Changlong Sun, Fei Wu, Yueting Zhuang

Although current mainstream methods begin to combine SSL and AL (SSL-AL) to excavate the diverse expressions of unlabeled samples, these methods' fully supervised task models are still trained only with labeled data.

Active Learning

Run Away From your Teacher: a New Self-Supervised Approach Solving the Puzzle of BYOL

no code implementations 1 Jan 2021 Haizhou Shi, Dongliang Luo, Siliang Tang, Jian Wang, Yueting Zhuang

Recently, a newly proposed self-supervised framework Bootstrap Your Own Latent (BYOL) seriously challenges the necessity of negative samples in contrastive-based learning frameworks.

Self-Supervised Learning

Robust Meta-learning with Noise via Eigen-Reptile

no code implementations 1 Jan 2021 Dong Chen, Lingfei Wu, Siliang Tang, Fangli Xu, Juncheng Li, Chang Zong, Chilie Tan, Yueting Zhuang

In particular, we first cast the meta-overfitting problem (overfitting on sampling and label noise) as a gradient noise problem since few available samples cause meta-learner to overfit on existing examples (clean or corrupted) of an individual task at every gradient step.

Few-Shot Learning

Run Away From your Teacher: Understanding BYOL by a Novel Self-Supervised Approach

no code implementations 22 Nov 2020 Haizhou Shi, Dongliang Luo, Siliang Tang, Jian Wang, Yueting Zhuang

Recently, a newly proposed self-supervised framework Bootstrap Your Own Latent (BYOL) seriously challenges the necessity of negative samples in contrastive learning frameworks.

Contrastive Learning Self-Supervised Learning

MGD-GAN: Text-to-Pedestrian generation through Multi-Grained Discrimination

no code implementations 2 Oct 2020 Shengyu Zhang, Donghui Wang, Zhou Zhao, Siliang Tang, Di Xie, Fei Wu

In this paper, we investigate the problem of text-to-pedestrian synthesis, which has many potential applications in art, design, and video surveillance.

Generative Adversarial Network Image Generation

Two Step Joint Model for Drug Drug Interaction Extraction

no code implementations 28 Aug 2020 Siliang Tang, Qi Zhang, Tianpeng Zheng, Mengdi Zhou, Zhan Chen, Lixing Shen, Xiang Ren, Yueting Zhuang, ShiLiang Pu, Fei Wu

When patients need to take medicine, particularly more than one kind of drug simultaneously, they should be alert to possible drug-drug interactions.

Drug–drug Interaction Extraction named-entity-recognition +4

Topic Adaptation and Prototype Encoding for Few-Shot Visual Storytelling

no code implementations 11 Aug 2020 Jiacheng Li, Siliang Tang, Juncheng Li, Jun Xiao, Fei Wu, ShiLiang Pu, Yueting Zhuang

In this paper, we focus on enhancing the generalization ability of the VIST model by considering the few-shot setting.

Meta-Learning Visual Storytelling

Deep Sequential Feature Learning in Clinical Image Classification of Infectious Keratitis

no code implementations 4 Jun 2020 Yesheng Xu, Ming Kong, Wenjia Xie, Runping Duan, Zhengqing Fang, Yuxiao Lin, Qiang Zhu, Siliang Tang, Fei Wu, Yu-Feng Yao

Infectious keratitis is the most common entity among corneal diseases, in which a pathogen grows in the cornea, leading to inflammation and destruction of the corneal tissues.

General Classification Image Classification

Quda: Natural Language Queries for Visual Data Analytics

no code implementations 7 May 2020 Siwei Fu, Kai Xiong, Xiaodong Ge, Siliang Tang, Wei Chen, Yingcai Wu

To address this challenge, we present a new dataset, called Quda, that aims to help V-NLIs recognize analytic tasks from free-form natural language by training and evaluating cutting-edge multi-label classification models.

Multi-Label Classification Natural Language Queries +1

Generating Natural Language Adversarial Examples on a Large Scale with Generative Models

no code implementations 10 Mar 2020 Yankun Ren, Jianbin Lin, Siliang Tang, Jun Zhou, Shuang Yang, Yuan Qi, Xiang Ren

It can attack text classification models with a higher success rate than existing methods, and provide acceptable quality for humans in the meantime.

Adversarial Text General Classification +4

Grounded and Controllable Image Completion by Incorporating Lexical Semantics

no code implementations 29 Feb 2020 Shengyu Zhang, Tan Jiang, Qinghao Huang, Ziqi Tan, Zhou Zhao, Siliang Tang, Jin Yu, Hongxia Yang, Yi Yang, Fei Wu

Existing image completion procedures are highly subjective because they consider only visual context, which may produce unpredictable results that are plausible but not faithful to grounded knowledge.

Deep Neural Network for Fast and Accurate Single Image Super-Resolution via Channel-Attention-based Fusion of Orientation-aware Features

no code implementations 9 Dec 2019 Du Chen, Zewei He, Yanpeng Cao, Jiangxin Yang, Yanlong Cao, Michael Ying Yang, Siliang Tang, Yueting Zhuang

Firstly, we proposed a novel Orientation-Aware feature extraction and fusion Module (OAM), which contains a mixture of 1D and 2D convolutional kernels (i.e., 5 x 1, 1 x 5, and 3 x 3) for extracting orientation-aware features.

Computational Efficiency Image Super-Resolution

Unsupervised Reinforcement Learning of Transferable Meta-Skills for Embodied Navigation

no code implementations CVPR 2020 Juncheng Li, Xin Wang, Siliang Tang, Haizhou Shi, Fei Wu, Yueting Zhuang, William Yang Wang

Visual navigation is the task of training an embodied agent to intelligently navigate to a target object (e.g., television) using only visual observations.

Object reinforcement-learning +3

Learning Dynamic Context Augmentation for Global Entity Linking

2 code implementations IJCNLP 2019 Xiyuan Yang, Xiaotao Gu, Sheng Lin, Siliang Tang, Yueting Zhuang, Fei Wu, Zhigang Chen, Guoping Hu, Xiang Ren

Despite the recent success of collective entity linking (EL) methods, these "global" inference methods may yield sub-optimal results when the "all-mention coherence" assumption breaks, and often suffer from high computational cost at the inference stage, due to the complex search space.

Entity Disambiguation Entity Linking +1

Walking with MIND: Mental Imagery eNhanceD Embodied QA

no code implementations 5 Aug 2019 Juncheng Li, Siliang Tang, Fei Wu, Yueting Zhuang

The experimental results and further analysis prove that the agent with the MIND module is superior to its counterparts not only in EQA performance but in many other aspects such as route planning, behavioral interpretation, and the ability to generalize from a few examples.

Informative Visual Storytelling with Cross-modal Rules

1 code implementation 7 Jul 2019 Jiacheng Li, Haizhou Shi, Siliang Tang, Fei Wu, Yueting Zhuang

To solve this problem, we propose a method to mine the cross-modal rules to help the model infer these informative concepts given certain visual input.

Visual Storytelling

Cross-relation Cross-bag Attention for Distantly-supervised Relation Extraction

1 code implementation 27 Dec 2018 Yujin Yuan, Liyuan Liu, Siliang Tang, Zhongfei Zhang, Yueting Zhuang, ShiLiang Pu, Fei Wu, Xiang Ren

Distant supervision leverages knowledge bases to automatically label instances, thus allowing us to train relation extractors without human annotations.

Relation Relation Extraction +1
