Search Results for author: Trung Bui

Found 76 papers, 35 papers with code

Multimodal Intent Discovery from Livestream Videos

no code implementations Findings (NAACL) 2022 Adyasha Maharana, Quan Tran, Franck Dernoncourt, Seunghyun Yoon, Trung Bui, Walter Chang, Mohit Bansal

We construct and present a new multimodal dataset consisting of software instructional livestreams and containing manual annotations for both detailed and abstract procedural intent that enable training and evaluation of joint video and text understanding models.

Intent Discovery Video Summarization +1

Virtual Knowledge Graph Construction for Zero-Shot Domain-Specific Document Retrieval

1 code implementation COLING 2022 Yeon Seonwoo, Seunghyun Yoon, Franck Dernoncourt, Trung Bui, Alice Oh

We conduct three experiments: 1) domain-specific document retrieval, 2) a comparison of our virtual knowledge graph construction method with previous approaches, and 3) an ablation study on each component of our virtual knowledge graph.

Domain Adaptation graph construction +2

Offensive Content Detection via Synthetic Code-Switched Text

no code implementations COLING 2022 Cesa Salaam, Franck Dernoncourt, Trung Bui, Danda Rawat, Seunghyun Yoon

The prevalent use of offensive content in social media has become an important reason for concern for online platforms (customer service chat-boxes, social media platforms, etc.).

Scaling Up Video Summarization Pretraining with Large Language Models

no code implementations 4 Apr 2024 Dawit Mureja Argaw, Seunghyun Yoon, Fabian Caba Heilbron, Hanieh Deilamsalehy, Trung Bui, Zhaowen Wang, Franck Dernoncourt, Joon Son Chung

Long-form video content constitutes a significant portion of internet traffic, making automated video summarization an essential research problem.

Video Alignment Video Summarization

PEEB: Part-based Image Classifiers with an Explainable and Editable Language Bottleneck

1 code implementation 8 Mar 2024 Thang M. Pham, Peijie Chen, Tin Nguyen, Seunghyun Yoon, Trung Bui, Anh Totti Nguyen

CLIP-based classifiers rely on the prompt containing a {class name} that is known to the text encoder.

Fine-tuning CLIP Text Encoders with Two-step Paraphrasing

no code implementations 23 Feb 2024 Hyunjae Kim, Seunghyun Yoon, Trung Bui, Handong Zhao, Quan Tran, Franck Dernoncourt, Jaewoo Kang

Contrastive language-image pre-training (CLIP) models have demonstrated considerable success across various vision-language tasks, such as text-to-image retrieval, where the model is required to effectively process natural language input to produce an accurate visual output.

Image Captioning Image Retrieval +3

Multi-Modal Video Topic Segmentation with Dual-Contrastive Domain Adaptation

no code implementations 30 Nov 2023 Linzi Xing, Quan Tran, Fabian Caba, Franck Dernoncourt, Seunghyun Yoon, Zhaowen Wang, Trung Bui, Giuseppe Carenini

Video topic segmentation unveils the coarse-grained semantic structure underlying videos and is essential for other video understanding tasks.

Contrastive Learning Segmentation +2

LRM: Large Reconstruction Model for Single Image to 3D

1 code implementation 8 Nov 2023 Yicong Hong, Kai Zhang, Jiuxiang Gu, Sai Bi, Yang Zhou, Difan Liu, Feng Liu, Kalyan Sunkavalli, Trung Bui, Hao Tan

We propose the first Large Reconstruction Model (LRM) that predicts the 3D model of an object from a single input image within just 5 seconds.

Image to 3D

Multilingual Sentence-Level Semantic Search using Meta-Distillation Learning

no code implementations 15 Sep 2023 Meryem M'hamdi, Jonathan May, Franck Dernoncourt, Trung Bui, Seunghyun Yoon

Our approach leverages meta-distillation learning based on MAML, an optimization-based Model-Agnostic Meta-Learner.

Sentence

Boosting Punctuation Restoration with Data Generation and Reinforcement Learning

no code implementations 24 Jul 2023 Viet Dac Lai, Abel Salinas, Hao Tan, Trung Bui, Quan Tran, Seunghyun Yoon, Hanieh Deilamsalehy, Franck Dernoncourt, Thien Huu Nguyen

Punctuation restoration is an important task in automatic speech recognition (ASR) that aims to restore the syntactic structure of generated ASR texts to improve readability.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

ChatGPT Beyond English: Towards a Comprehensive Evaluation of Large Language Models in Multilingual Learning

no code implementations 12 Apr 2023 Viet Dac Lai, Nghia Trung Ngo, Amir Pouran Ben Veyseh, Hieu Man, Franck Dernoncourt, Trung Bui, Thien Huu Nguyen

The answer to this question requires a thorough evaluation of ChatGPT over multiple tasks with diverse languages and large datasets (i.e., beyond reported anecdotes), which is still missing or limited in current research.

Multilingual NLP Text Generation +1

Harnessing the Spatial-Temporal Attention of Diffusion Models for High-Fidelity Text-to-Image Synthesis

1 code implementation ICCV 2023 Qiucheng Wu, Yujian Liu, Handong Zhao, Trung Bui, Zhe Lin, Yang Zhang, Shiyu Chang

We then impose spatial attention control by combining the attention over the entire text description and that over the local description of the particular object in the corresponding pixel region of that object.

Denoising Image Generation

PR-MCS: Perturbation Robust Metric for MultiLingual Image Captioning

no code implementations 15 Mar 2023 Yongil Kim, Yerin Hwang, Hyeongu Yun, Seunghyun Yoon, Trung Bui, Kyomin Jung

Vulnerability to lexical perturbation is a critical weakness of automatic evaluation metrics for image captioning.

Image Captioning

Moment Detection in Long Tutorial Videos

1 code implementation ICCV 2023 Ioana Croitoru, Simion-Vlad Bogolin, Samuel Albanie, Yang Liu, Zhaowen Wang, Seunghyun Yoon, Franck Dernoncourt, Hailin Jin, Trung Bui

To study this problem, we propose the first dataset of untrimmed, long-form tutorial videos for the task of Moment Detection called the Behance Moment Detection (BMD) dataset.

Uncovering the Disentanglement Capability in Text-to-Image Diffusion Models

1 code implementation CVPR 2023 Qiucheng Wu, Yujian Liu, Handong Zhao, Ajinkya Kale, Trung Bui, Tong Yu, Zhe Lin, Yang Zhang, Shiyu Chang

Based on this finding, we further propose a simple, light-weight image editing algorithm where the mixing weights of the two text embeddings are optimized for style matching and content preservation.

Denoising Disentanglement

LiveSeg: Unsupervised Multimodal Temporal Segmentation of Long Livestream Videos

no code implementations 12 Oct 2022 JieLin Qiu, Franck Dernoncourt, Trung Bui, Zhaowen Wang, Ding Zhao, Hailin Jin

Livestream videos have become a significant part of online learning, where design, digital marketing, creative painting, and other skills are taught by experienced experts in the sessions, making them valuable materials.

Marketing Segmentation

Semantics-Consistent Cross-domain Summarization via Optimal Transport Alignment

no code implementations 10 Oct 2022 JieLin Qiu, Jiacheng Zhu, Mengdi Xu, Franck Dernoncourt, Trung Bui, Zhaowen Wang, Bo Li, Ding Zhao, Hailin Jin

Multimedia summarization with multimodal output (MSMO) is a recently explored application in language grounding.

Medical Question Understanding and Answering with Knowledge Grounding and Semantic Self-Supervision

1 code implementation COLING 2022 Khalil Mrini, Harpreet Singh, Franck Dernoncourt, Seunghyun Yoon, Trung Bui, Walter Chang, Emilia Farcas, Ndapa Nakashole

The system first matches the summarized user question with an FAQ from a trusted medical knowledge base, and then retrieves a fixed number of relevant sentences from the corresponding answer document.

Question Answering Retrieval

PiC: A Phrase-in-Context Dataset for Phrase Understanding and Semantic Search

1 code implementation 19 Jul 2022 Thang M. Pham, Seunghyun Yoon, Trung Bui, Anh Nguyen

While contextualized word embeddings have been a de facto standard, learning contextualized phrase embeddings is less explored and is hindered by the lack of a human-annotated benchmark that tests machine understanding of phrase semantics given a context sentence or paragraph (instead of phrases alone).

Information Retrieval Natural Language Understanding +5

Fine-grained Image Captioning with CLIP Reward

1 code implementation Findings (NAACL) 2022 Jaemin Cho, Seunghyun Yoon, Ajinkya Kale, Franck Dernoncourt, Trung Bui, Mohit Bansal

Toward more descriptive and distinctive caption generation, we propose using CLIP, a multimodal encoder trained on a huge number of image-text pairs from the web, to calculate multimodal similarity and use it as a reward function.

Caption Generation Descriptive +5

MHMS: Multimodal Hierarchical Multimedia Summarization

no code implementations 7 Apr 2022 JieLin Qiu, Jiacheng Zhu, Mengdi Xu, Franck Dernoncourt, Trung Bui, Zhaowen Wang, Bo Li, Ding Zhao, Hailin Jin

Multimedia summarization with multimodal output can play an essential role in real-world applications, i.e., automatically generating cover images and titles for news articles or providing introductions to online videos.

CAISE: Conversational Agent for Image Search and Editing

1 code implementation 24 Feb 2022 Hyounghun Kim, Doo Soon Kim, Seunghyun Yoon, Franck Dernoncourt, Trung Bui, Mohit Bansal

To our knowledge, this is the first dataset that provides conversational image search and editing annotations, where the agent holds a grounded conversation with users and helps them to search and edit images according to their requests.

Image Retrieval

Double Trouble: How to not explain a text classifier's decisions using counterfactuals synthesized by masked language models?

1 code implementation 22 Oct 2021 Thang M. Pham, Trung Bui, Long Mai, Anh Nguyen

We find two reasons why IM is not better than LOO: (1) deleting a single word from the input only marginally reduces a classifier's accuracy; and (2) a highly predictable word is always given near-zero attribution, regardless of its true importance to the classifier.

Causal Inference

StreamHover: Livestream Transcript Summarization and Annotation

1 code implementation EMNLP 2021 Sangwoo Cho, Franck Dernoncourt, Tim Ganter, Trung Bui, Nedim Lipka, Walter Chang, Hailin Jin, Jonathan Brandt, Hassan Foroosh, Fei Liu

With the explosive growth of livestream broadcasting, there is an urgent need for new summarization technology that enables us to create a preview of streamed content and tap into this wealth of knowledge.

Extractive Summarization

End-to-end Neural Coreference Resolution Revisited: A Simple yet Effective Baseline

no code implementations 4 Jul 2021 Tuan Manh Lai, Trung Bui, Doo Soon Kim

Since the first end-to-end neural coreference resolution model was introduced, many extensions to the model have been proposed, ranging from using higher-order inference to directly optimizing evaluation metrics using reinforcement learning.

coreference-resolution

UMIC: An Unreferenced Metric for Image Captioning via Contrastive Learning

1 code implementation ACL 2021 Hwanhee Lee, Seunghyun Yoon, Franck Dernoncourt, Trung Bui, Kyomin Jung

Also, we observe critical problems with the previous benchmark dataset (i.e., human annotations) for image captioning metrics, and introduce a new collection of human annotations on the generated captions.

Contrastive Learning Image Captioning +1

Learning by Planning: Language-Guided Global Image Editing

1 code implementation CVPR 2021 Jing Shi, Ning Xu, Yihang Xu, Trung Bui, Franck Dernoncourt, Chenliang Xu

Recently, language-guided global image editing has drawn increasing attention, with growing application potential.

A Benchmark and Baseline for Language-Driven Image Editing

no code implementations 5 Oct 2020 Jing Shi, Ning Xu, Trung Bui, Franck Dernoncourt, Zheng Wen, Chenliang Xu

To solve this new task, we first present a new language-driven image editing dataset that supports both local and global editing with editing operation and mask annotations.

PhraseCut: Language-based Image Segmentation in the Wild

1 code implementation CVPR 2020 Chenyun Wu, Zhe Lin, Scott Cohen, Trung Bui, Subhransu Maji

We consider the problem of segmenting image regions given a natural language phrase, and study it on a novel dataset of 77,262 images and 345,486 phrase-region pairs.

Attribute Image Segmentation +2

ISA: An Intelligent Shopping Assistant

no code implementations Asian Chapter of the Association for Computational Linguistics 2020 Tuan Manh Lai, Trung Bui, Nedim Lipka

Despite the growth of e-commerce, brick-and-mortar stores are still the preferred destinations for many people.

Open-Domain Question Answering with Pre-Constructed Question Spaces

no code implementations NAACL 2021 Jinfeng Xiao, Lidan Wang, Franck Dernoncourt, Trung Bui, Tong Sun, Jiawei Han

Our reader-retriever first uses an offline reader to read the corpus and generate collections of all answerable questions associated with their answers, and then uses an online retriever to respond to user queries by searching the pre-constructed question spaces for answers that are most likely to be asked in the given way.

Information Retrieval Knowledge Graphs +2

History for Visual Dialog: Do we really need it?

2 code implementations ACL 2020 Shubham Agarwal, Trung Bui, Joon-Young Lee, Ioannis Konstas, Verena Rieser

Visual Dialog involves "understanding" the dialog history (what has been discussed previously) and the current question (what is asked), in addition to grounding information in the image, to generate the correct response.

Visual Dialog

DSTC8-AVSD: Multimodal Semantic Transformer Network with Retrieval Style Word Generator

no code implementations 1 Apr 2020 Hwanhee Lee, Seunghyun Yoon, Franck Dernoncourt, Doo Soon Kim, Trung Bui, Kyomin Jung

Audio Visual Scene-aware Dialog (AVSD) is the task of generating a response for a question with a given scene, video, audio, and the history of previous turns in the dialog.

Retrieval Word Embeddings

A Multimodal Dialogue System for Conversational Image Editing

no code implementations 16 Feb 2020 Tzu-Hsiang Lin, Trung Bui, Doo Soon Kim, Jean Oh

In this paper, we present a multimodal dialogue system for Conversational Image Editing.

Variational Hierarchical Dialog Autoencoder for Dialog State Tracking Data Augmentation

1 code implementation EMNLP 2020 Kang Min Yoo, Hanbit Lee, Franck Dernoncourt, Trung Bui, Walter Chang, Sang-goo Lee

Recent works have shown that generative data augmentation, where synthetic samples generated from deep generative models complement the training dataset, benefits NLP tasks.

Data Augmentation dialog state tracking +4

Propagate-Selector: Detecting Supporting Sentences for Question Answering via Graph Neural Networks

1 code implementation LREC 2020 Seunghyun Yoon, Franck Dernoncourt, Doo Soon Kim, Trung Bui, Kyomin Jung

In this study, we propose a novel graph neural network called propagate-selector (PS), which propagates information over sentences to understand information that cannot be inferred when considering sentences in isolation.

Answer Selection Sentence

Expressing Visual Relationships via Language

1 code implementation ACL 2019 Hao Tan, Franck Dernoncourt, Zhe Lin, Trung Bui, Mohit Bansal

To push forward the research in this direction, we first introduce a new language-guided image editing dataset that contains a large number of real image pairs with corresponding editing instructions.

Image Captioning Retrieval

A Compare-Aggregate Model with Latent Clustering for Answer Selection

no code implementations 30 May 2019 Seunghyun Yoon, Franck Dernoncourt, Doo Soon Kim, Trung Bui, Kyomin Jung

In this paper, we propose a novel method for a sentence-level answer-selection task that is a fundamental problem in natural language processing.

Answer Selection Clustering +3

Dance Dance Generation: Motion Transfer for Internet Videos

no code implementations 30 Mar 2019 Yipin Zhou, Zhaowen Wang, Chen Fang, Trung Bui, Tamara L. Berg

This work presents computational methods for transferring body movements from one person to another with videos collected in the wild.

Supervised Transfer Learning for Product Information Question Answering

no code implementations 8 Jan 2019 Tuan Manh Lai, Trung Bui, Nedim Lipka, Sheng Li

Popular e-commerce websites such as Amazon offer community question answering systems where users can pose product-related questions and experienced customers may voluntarily provide answers.

Community Question Answering Transfer Learning

A System for Automated Image Editing from Natural Language Commands

no code implementations 3 Dec 2018 Jacqueline Brixey, Ramesh Manuvinakurike, Nham Le, Tuan Lai, Walter Chang, Trung Bui

This work presents the task of modifying images in an image editing program using natural language written commands.

A Review on Deep Learning Techniques Applied to Answer Selection

no code implementations COLING 2018 Tuan Manh Lai, Trung Bui, Sheng Li

Given a question and a set of candidate answers, answer selection is the task of identifying which of the candidates answers the question correctly.

Answer Selection Community Question Answering +3

Conversational Image Editing: Incremental Intent Identification in a New Dialogue Task

no code implementations WS 2018 Ramesh Manuvinakurike, Trung Bui, Walter Chang, Kallirroi Georgila

We present "conversational image editing", a novel real-world application domain combining dialogue, visual information, and the use of computer vision.

General Classification

Visual to Sound: Generating Natural Sound for Videos in the Wild

3 code implementations CVPR 2018 Yipin Zhou, Zhaowen Wang, Chen Fang, Trung Bui, Tamara L. Berg

As two of the five traditional human senses (sight, hearing, taste, smell, and touch), vision and sound are basic sources through which humans understand the world.

AMC: Attention guided Multi-modal Correlation Learning for Image Search

2 code implementations CVPR 2017 Kan Chen, Trung Bui, Fang Chen, Zhaowen Wang, Ram Nevatia

According to the intent of the query, an attention mechanism can be introduced to adaptively balance the importance of different modalities.

Image Retrieval

Proposing Plausible Answers for Open-ended Visual Question Answering

no code implementations 20 Oct 2016 Omid Bakhshandeh, Trung Bui, Zhe Lin, Walter Chang

One of the most interesting recent open-ended question answering challenges is Visual Question Answering (VQA) which attempts to evaluate a system's visual understanding through its answers to natural language questions about images.

Graph Matching Open-Ended Question Answering +1
