1 code implementation • EMNLP 2021 • Hyounghun Kim, Jialu Li, Mohit Bansal
In this paper, we explore the Navigation from Dialogue History (NDH) task, which is based on the Cooperative Vision-and-Dialogue Navigation (CVDN) dataset, and present a state-of-the-art model built upon Vision-Language transformers.
no code implementations • 11 Mar 2024 • Jialu Li, Jaemin Cho, Yi-Lin Sung, Jaehong Yoon, Mohit Bansal
In this paper, we introduce SELMA: Skill-Specific Expert Learning and Merging with Auto-Generated Data, a novel paradigm to improve the faithfulness of T2I models by fine-tuning models on automatically generated, multi-skill image-text datasets, with skill-specific expert learning and merging.
no code implementations • 10 Feb 2024 • Jialu Li, Mark Hasegawa-Johnson, Nancy L. McElwain
To understand why self-supervised learning (SSL) models have empirically achieved strong performance on several speech-processing downstream tasks, numerous studies have analyzed the information encoded in SSL layer representations of adult speech.
no code implementations • 5 Feb 2024 • Jialu Li, Aishwarya Padmakumar, Gaurav Sukhatme, Mohit Bansal
Outdoor Vision-and-Language Navigation (VLN) requires an agent to navigate through realistic 3D outdoor environments based on natural language instructions.
1 code implementation • 12 Jan 2024 • Pengfei Zhu, Qian Wang, Yu Wang, Jialu Li, QinGhua Hu
In this paper, we propose to dynamically learn the weights of SSL tasks for different nodes and fuse the embeddings learned from different SSL tasks to boost performance.
no code implementations • 30 Oct 2023 • Junhui Li, Pu Wang, Jialu Li, Xinzhe Wang, Youshan Zhang
Recent high-performance transformer-based speech enhancement models demonstrate that time-domain methods can achieve performance similar to that of time-frequency-domain methods.
1 code implementation • 24 Oct 2023 • Youshan Zhang, Jialu Li
Achieving high-performance audio denoising is still a challenging task in real-world applications.
no code implementations • 12 Oct 2023 • Yao-Hung Hubert Tsai, Vansh Dhar, Jialu Li, BoWen Zhang, Jian Zhang
Recent efforts to enable visual navigation using large language models have mainly focused on developing complex prompt systems.
no code implementations • 13 Sep 2023 • Jialu Li, Mark Hasegawa-Johnson, Karrie Karahalios
In this study, we leverage the self-supervised learning model Wav2Vec 2.0 (W2V2), pretrained on 4,300 hours of home recordings of children under 5 years old, to build a unified system that performs both speaker diarization (SD) and vocalization classification (VC).
1 code implementation • ICCV 2023 • Zun Wang, Jialu Li, Yicong Hong, Yi Wang, Qi Wu, Mohit Bansal, Stephen Gould, Hao Tan, Yu Qiao
Recent research in language-guided visual navigation has demonstrated a significant demand for the diversity of traversable environments and the quantity of supervision for training generalizable agents.
no code implementations • 21 May 2023 • Jialu Li, Mark Hasegawa-Johnson, Nancy L. McElwain
To perform automatic family audio analysis, past studies have collected recordings using phone, video, or audio-only recording devices like LENA, investigated supervised learning methods, and used or fine-tuned general-purpose embeddings learned from large pretrained models.
no code implementations • CVPR 2023 • Jialu Li, Mohit Bansal
We then fine-tune the agent on the VLN task with an auxiliary loss that minimizes the difference between the view semantics generated by the agent and the ground truth view semantics of the next step.
1 code implementation • 18 Oct 2022 • Youshan Zhang, Jialu Li
Audio denoising has been explored for decades using both traditional and deep learning-based methods.
1 code implementation • Findings (NAACL) 2022 • Jialu Li, Hao Tan, Mohit Bansal
Empirically, on the Room-Across-Room dataset, we show that our multilingual agent gets large improvements in all metrics over the strong baseline model when generalizing to unseen environments with the cross-lingual language representation and the environment-agnostic visual representation.
1 code implementation • CVPR 2022 • Jialu Li, Hao Tan, Mohit Bansal
Training on these edit-augmented environments prevents the agent from overfitting to existing environments and helps generalize better to new, unseen environments.
Ranked #2 on Vision and Language Navigation on RxR (using extra training data)
1 code implementation • 29 Mar 2022 • Jialu Li, Mark Hasegawa-Johnson, Nancy L. McElwain
We demonstrate that our high-quality visualizations capture major types of family vocalization interactions, in categories indicative of mental, behavioral, and developmental health, for both labeled and unlabeled LB audio.
no code implementations • 7 Oct 2021 • Jialu Li, Vimal Manohar, Pooja Chitkara, Andros Tjandra, Michael Picheny, Frank Zhang, Xiaohui Zhang, Yatharth Saraf
Domain-adversarial training (DAT) and multi-task learning (MTL) are two common approaches for building accent-robust ASR models.
1 code implementation • 13 Jul 2021 • Jiajie Zou, Yuran Zhang, Jialu Li, Xing Tian, Nai Ding
Furthermore, when readers scan a passage without a question in mind, their reading time is predicted by DNNs optimized for a word prediction task.
1 code implementation • NAACL 2021 • Jialu Li, Hao Tan, Mohit Bansal
One key challenge in this task is to ground instructions with the current visual information that the agent perceives.
no code implementations • EMNLP 2020 • Jialu Li, Esin Durmus, Claire Cardie
Online debate forums provide users a platform to express their opinions on controversial topics while being exposed to opinions from a diverse set of viewpoints.
1 code implementation • 18 Apr 2018 • Jialu Li, Seung Jun Shin, Jing Ning, Jasmina Bojadzieva, Louise C. Strong, Wenyi Wang
We employed a family-wise likelihood that makes use of genetic information inherited through the family pedigree and properly adjusts, via an inverse probability weighting scheme, for the ascertainment bias that is inevitable in studies of rare diseases.