no code implementations • 10 Apr 2024 • Kevin Zhang, Luka Chkhetiani, Francis McCann Ramirez, Yash Khare, Andrea Vanzo, Michael Liang, Sergio Ramirez Martin, Gabriel Oexle, Ruben Bousbib, Taufiquzzaman Peyash, Michael Nguyen, Dillon Pulliam, Domenic Donato
This paper presents Conformer-1, an end-to-end Automatic Speech Recognition (ASR) model trained on an extensive dataset of 570k hours of speech audio data, 91% of which was acquired from publicly available sources.
no code implementations • 6 Apr 2024 • Ziyuan Qu, Omkar Vengurlekar, Mohamad Qadri, Kevin Zhang, Michael Kaess, Christopher Metzler, Suren Jayasuriya, Adithya Pediredla
In this manuscript, we demonstrate that using transient data (from sonars) allows us to address the missing cone problem by sampling high-frequency data along the depth axis.
no code implementations • 7 Feb 2024 • Marek Klemes, Lan Hu, Greg Bowles, Mohammad Akbari, Soulideth Thirakoune, Michael Schwartzman, Kevin Zhang, Tan Huy Ho, David Wessel, Wen Tong
The OAM beams have a helical phase and polarization structure and a conical amplitude shape in the far field.
no code implementations • 5 Feb 2024 • Mohamad Qadri, Kevin Zhang, Akshay Hinduja, Michael Kaess, Adithya Pediredla, Christopher A. Metzler
Underwater perception and 3D surface reconstruction are challenging problems with broad applications in construction, security, marine archaeology, and environmental monitoring.
no code implementations • 26 Dec 2023 • Guanqun Wang, Jiaming Liu, Chenxuan Li, Junpeng Ma, Yuan Zhang, Xinyu Wei, Kevin Zhang, Maurice Chong, Ray Zhang, Yijiang Liu, Shanghang Zhang
However, the deployment of these large-scale MLLMs on client devices is hindered by their extensive model parameters, leading to a notable decline in generalization capabilities when these models are compressed for device deployment.
no code implementations • 7 Dec 2023 • Haoming Cai, Jingxi Chen, Brandon Y. Feng, Weiyun Jiang, Mingyang Xie, Kevin Zhang, Ashok Veeraraghavan, Christopher Metzler
Atmospheric turbulence presents a significant challenge in long-range imaging.
no code implementations • 30 Oct 2023 • Kevin Zhang, Sakshum Kulshrestha, Christopher Metzler
Our work improves upon a recently proposed universal denoiser training strategy by extending these results to higher dimensions and by incorporating a polynomial approximation of the true specification-loss landscape.
no code implementations • 15 Jun 2023 • Hadi AlZayer, Kevin Zhang, Brandon Feng, Christopher Metzler, Jia-Bin Huang
The reflective nature of the human eye is an underappreciated source of information about what the world around us looks like.
no code implementations • 5 Jun 2023 • Kevin Zhang, Shi Feng, Yuri D. Lensky, Nandini Trivedi, Eun-Ah Kim
With rapid progress in simulation of strongly interacting quantum Hamiltonians, the challenge in characterizing unknown phases becomes a bottleneck for scientific progress.
no code implementations • 4 May 2023 • Kevin Zhang, Vipul Mann, Venkat Venkatasubramanian
Additional analyses of G-MATT attention maps demonstrate the ability to retain chemistry knowledge without relying on excessively complex model architectures.
1 code implementation • CVPR 2023 • Shuhong Chen, Kevin Zhang, Yichun Shi, Heng Wang, Yiheng Zhu, Guoxian Song, Sizhe An, Janus Kristjansson, Xiao Yang, Matthias Zwicker
We propose PAniC-3D, a system to reconstruct stylized 3D character heads directly from illustrated (p)ortraits of (ani)me (c)haracters.
1 code implementation • CVPR 2023 • Anthony Chen, Kevin Zhang, Renrui Zhang, Zihan Wang, Yuheng Lu, Yandong Guo, Shanghang Zhang
Masked Autoencoders learn strong visual representations and achieve state-of-the-art results in several independent modalities, yet very few works have addressed their capabilities in multi-modality settings.
1 code implementation • CVPR 2023 • Hao Huang, Ziyan Chen, Huanran Chen, Yongtao Wang, Kevin Zhang
Then, we analogize patch optimization with regular model optimization, proposing a series of self-ensemble approaches on the input data, the attacked model, and the adversarial patch to efficiently make use of the limited information and prevent the patch from overfitting.
2 code implementations • 20 Oct 2022 • Kevin Zhang, Zhiqiang Shen
(2) Can we enhance the representations in the latent feature space by controlling the degree of semantics during sampling on Masked Autoencoders?
no code implementations • 18 Sep 2022 • Kevin Zhang, Mingyang Xie, Maharshi Gor, Yi-Ting Chen, Yvonne Zhou, Christopher A. Metzler
Deep image prior (DIP) is a recently proposed technique for solving imaging inverse problems by fitting the reconstructed images to the output of an untrained convolutional neural network.
1 code implementation • 28 Jul 2022 • Kevin Zhang, Neha Patki, Kalyan Veeramachaneni
After building the Sequential SDV, we used it to generate synthetic data and compared its quality against an existing, non-sequential generative adversarial network based model called CTGAN.
no code implementations • 6 Mar 2021 • Ke Wang, Michael Kellman, Christopher M. Sandino, Kevin Zhang, Shreyas S. Vasanawala, Jonathan I. Tamir, Stella X. Yu, Michael Lustig
Deep learning (DL) based unrolled reconstructions have shown state-of-the-art performance for under-sampled magnetic resonance imaging (MRI).
no code implementations • 23 May 2020 • Nicolas Castaño, Seth Cordts, Myra Kurosu Jalil, Kevin Zhang, Saisneha Koppaka, Alison Bick, Rajorshi Paul, Sindy KY Tang
Contaminated objects or surfaces, referred to as fomites, play a critical role in the spread of viruses, including SARS-CoV-2, the virus responsible for the COVID-19 pandemic.
no code implementations • NeurIPS Workshop Deep_Invers 2019 • Michael Kellman, Kevin Zhang, Jon Tamir, Emrah Bostan, Michael Lustig, Laura Waller
Critical aspects of computational imaging systems, such as experimental design and image priors, can be optimized through deep networks formed by the unrolled iterations of classical model-based reconstructions (termed physics-based networks).
no code implementations • 27 Sep 2019 • Kevin Zhang, Mohit Sharma, Manuela Veloso, Oliver Kroemer
In this paper, we propose using vibrations and force-torque feedback from the interactions to adapt the slicing motions and monitor for contact events.
no code implementations • 22 Sep 2019 • Kevin Zhang, Feng Xiong, Peize Sun, Li Hu, Boxun Li, Gang Yu
Double Anchor RPN is developed to capture body and head parts in pairs.