Search Results for author: Jialu Li

Found 21 papers, 11 papers with code

NDH-Full: Learning and Evaluating Navigational Agents on Full-Length Dialogue

1 code implementation • EMNLP 2021 • Hyounghun Kim, Jialu Li, Mohit Bansal

In this paper, we explore the Navigation from Dialogue History (NDH) task, which is based on the Cooperative Vision-and-Dialogue Navigation (CVDN) dataset, and present a state-of-the-art model built upon Vision-Language transformers.

Data Augmentation, Dynamic Time Warping

SELMA: Learning and Merging Skill-Specific Text-to-Image Experts with Auto-Generated Data

no code implementations • 11 Mar 2024 • Jialu Li, Jaemin Cho, Yi-Lin Sung, Jaehong Yoon, Mohit Bansal

In this paper, we introduce SELMA: Skill-Specific Expert Learning and Merging with Auto-Generated Data, a novel paradigm to improve the faithfulness of T2I models by fine-tuning models on automatically generated, multi-skill image-text datasets, with skill-specific expert learning and merging.

In-Context Learning

Analysis of Self-Supervised Speech Models on Children's Speech and Infant Vocalizations

no code implementations • 10 Feb 2024 • Jialu Li, Mark Hasegawa-Johnson, Nancy L. McElwain

To understand why self-supervised learning (SSL) models have empirically achieved strong performance on several downstream speech-processing tasks, numerous studies have analyzed the information encoded in SSL layer representations of adult speech.

Self-Supervised Learning

VLN-Video: Utilizing Driving Videos for Outdoor Vision-and-Language Navigation

no code implementations • 5 Feb 2024 • Jialu Li, Aishwarya Padmakumar, Gaurav Sukhatme, Mohit Bansal

Outdoor Vision-and-Language Navigation (VLN) requires an agent to navigate through realistic 3D outdoor environments based on natural language instructions.

Language Modelling, Masked Language Modeling

Every Node is Different: Dynamically Fusing Self-Supervised Tasks for Attributed Graph Clustering

1 code implementation • 12 Jan 2024 • Pengfei Zhu, Qian Wang, Yu Wang, Jialu Li, Qinghua Hu

In this paper, we propose to dynamically learn the weights of SSL tasks for different nodes and fuse the embeddings learned from different SSL tasks to boost performance.

Clustering, Graph Clustering

DPATD: Dual-Phase Audio Transformer for Denoising

no code implementations • 30 Oct 2023 • Junhui Li, Pu Wang, Jialu Li, Xinzhe Wang, Youshan Zhang

Recent high-performance transformer-based speech enhancement models demonstrate that time-domain methods can achieve performance similar to that of time-frequency domain methods.

Denoising, Speech Enhancement

Complex Image Generation SwinTransformer Network for Audio Denoising

1 code implementation • 24 Oct 2023 • Youshan Zhang, Jialu Li

Achieving high-performance audio denoising is still a challenging task in real-world applications.

Audio Denoising, Denoising

Multimodal Large Language Model for Visual Navigation

no code implementations • 12 Oct 2023 • Yao-Hung Hubert Tsai, Vansh Dhar, Jialu Li, Bowen Zhang, Jian Zhang

Recent efforts to enable visual navigation using large language models have mainly focused on developing complex prompt systems.

Language Modelling, Large Language Model

Enhancing Child Vocalization Classification in Multi-Channel Child-Adult Conversations Through Wav2vec2 Children ASR Features

no code implementations • 13 Sep 2023 • Jialu Li, Mark Hasegawa-Johnson, Karrie Karahalios

In this study, we leverage the self-supervised learning model Wav2Vec 2.0 (W2V2), pretrained on 4,300 hours of home recordings of children under 5 years old, to build a unified system that performs both speaker diarization (SD) and vocalization classification (VC) tasks.

Automatic Speech Recognition (ASR)

Scaling Data Generation in Vision-and-Language Navigation

1 code implementation • ICCV 2023 • Zun Wang, Jialu Li, Yicong Hong, Yi Wang, Qi Wu, Mohit Bansal, Stephen Gould, Hao Tan, Yu Qiao

Recent research in language-guided visual navigation has demonstrated a significant demand for the diversity of traversable environments and the quantity of supervision for training generalizable agents.

Imitation Learning, Vision and Language Navigation

Towards Robust Family-Infant Audio Analysis Based on Unsupervised Pretraining of Wav2vec 2.0 on Large-Scale Unlabeled Family Audio

no code implementations • 21 May 2023 • Jialu Li, Mark Hasegawa-Johnson, Nancy L. McElwain

To perform automatic family audio analysis, past studies have collected recordings using phone, video, or audio-only recording devices like LENA, investigated supervised learning methods, and used or fine-tuned general-purpose embeddings learned from large pretrained models.

Speaker Diarization

Improving Vision-and-Language Navigation by Generating Future-View Image Semantics

no code implementations • CVPR 2023 • Jialu Li, Mohit Bansal

We then fine-tune the agent on the VLN task with an auxiliary loss that minimizes the difference between the view semantics generated by the agent and the ground truth view semantics of the next step.

Image Generation, Navigate

BirdSoundsDenoising: Deep Visual Audio Denoising for Bird Sounds

1 code implementation • 18 Oct 2022 • Youshan Zhang, Jialu Li

Audio denoising has been explored for decades using both traditional and deep learning-based methods.

Audio Denoising, Denoising

CLEAR: Improving Vision-Language Navigation with Cross-Lingual, Environment-Agnostic Representations

1 code implementation • Findings (NAACL) 2022 • Jialu Li, Hao Tan, Mohit Bansal

Empirically, on the Room-Across-Room dataset, we show that our multilingual agent gets large improvements in all metrics over the strong baseline model when generalizing to unseen environments with the cross-lingual language representation and the environment-agnostic visual representation.

Navigate, Representation Learning

EnvEdit: Environment Editing for Vision-and-Language Navigation

1 code implementation • CVPR 2022 • Jialu Li, Hao Tan, Mohit Bansal

Training on these edit-augmented environments prevents the agent from overfitting to existing environments and helps generalize better to new, unseen environments.

Ranked #2 on Vision and Language Navigation on RxR (using extra training data)

Data Augmentation, Navigate

Visualizations of Complex Sequences of Family-Infant Vocalizations Using Bag-of-Audio-Words Approach Based on Wav2vec 2.0 Features

1 code implementation • 29 Mar 2022 • Jialu Li, Mark Hasegawa-Johnson, Nancy L. McElwain

We demonstrate that our high-quality visualizations capture major types of family vocalization interactions, in categories indicative of mental, behavioral, and developmental health, for both labeled and unlabeled LB audio.

Speaker Diarization

Accent-Robust Automatic Speech Recognition Using Supervised and Unsupervised Wav2vec Embeddings

no code implementations • 7 Oct 2021 • Jialu Li, Vimal Manohar, Pooja Chitkara, Andros Tjandra, Michael Picheny, Frank Zhang, Xiaohui Zhang, Yatharth Saraf

Domain-adversarial training (DAT) and multi-task learning (MTL) are two common approaches for building accent-robust ASR models.

Automatic Speech Recognition (ASR)

Human Attention during Goal-directed Reading Comprehension Relies on Task Optimization

1 code implementation • 13 Jul 2021 • Jiajie Zou, Yuran Zhang, Jialu Li, Xing Tian, Nai Ding

Furthermore, when readers scan a passage without a question in mind, their reading time is predicted by DNNs optimized for a word prediction task.

Question Answering, Reading Comprehension

Improving Cross-Modal Alignment in Vision Language Navigation via Syntactic Information

1 code implementation • NAACL 2021 • Jialu Li, Hao Tan, Mohit Bansal

One key challenge in this task is to ground instructions with the current visual information that the agent perceives.

Navigate, Sentence

Exploring the Role of Argument Structure in Online Debate Persuasion

no code implementations • EMNLP 2020 • Jialu Li, Esin Durmus, Claire Cardie

Online debate forums provide users a platform to express their opinions on controversial topics while being exposed to opinions from a diverse set of viewpoints.

Persuasiveness

Bayesian estimation of a semiparametric recurrent event model with applications to the penetrance estimation of multiple primary cancers in Li-Fraumeni Syndrome

1 code implementation • 18 Apr 2018 • Jialu Li, Seung Jun Shin, Jing Ning, Jasmina Bojadzieva, Louise C. Strong, Wenyi Wang

We employed a family-wise likelihood that facilitates using genetic information inherited through the family pedigree and, through an inverse probability weighting scheme, properly adjusts for the ascertainment bias that is inevitable in studies of rare diseases.

Applications
