Search Results for author: Kevin Zhang

Found 21 papers, 5 papers with code

Conformer-1: Robust ASR via Large-Scale Semisupervised Bootstrapping

no code implementations • 10 Apr 2024 • Kevin Zhang, Luka Chkhetiani, Francis McCann Ramirez, Yash Khare, Andrea Vanzo, Michael Liang, Sergio Ramirez Martin, Gabriel Oexle, Ruben Bousbib, Taufiquzzaman Peyash, Michael Nguyen, Dillon Pulliam, Domenic Donato

This paper presents Conformer-1, an end-to-end Automatic Speech Recognition (ASR) model trained on an extensive dataset of 570k hours of speech audio data, 91% of which was acquired from publicly available sources.

Automatic Speech Recognition (ASR) +1

Z-Splat: Z-Axis Gaussian Splatting for Camera-Sonar Fusion

no code implementations • 6 Apr 2024 • Ziyuan Qu, Omkar Vengurlekar, Mohamad Qadri, Kevin Zhang, Michael Kaess, Christopher Metzler, Suren Jayasuriya, Adithya Pediredla

In this manuscript, we demonstrate that using transient data (from sonars) allows us to address the missing cone problem by sampling high-frequency data along the depth axis.

Autonomous Navigation Novel View Synthesis

Self-Healing Effects in OAM Beams Observed on a 28 GHz Experimental Link

no code implementations • 7 Feb 2024 • Marek Klemes, Lan Hu, Greg Bowles, Mohammad Akbari, Soulideth Thirakoune, Michael Schwartzman, Kevin Zhang, Tan Huy Ho, David Wessel, Wen Tong

The OAM beams have a helical phase and polarization structure and have conical amplitude shape in the far field.

AONeuS: A Neural Rendering Framework for Acoustic-Optical Sensor Fusion

no code implementations • 5 Feb 2024 • Mohamad Qadri, Kevin Zhang, Akshay Hinduja, Michael Kaess, Adithya Pediredla, Christopher A. Metzler

Underwater perception and 3D surface reconstruction are challenging problems with broad applications in construction, security, marine archaeology, and environmental monitoring.

3D Scene Reconstruction Neural Rendering +2

Cloud-Device Collaborative Learning for Multimodal Large Language Models

no code implementations • 26 Dec 2023 • Guanqun Wang, Jiaming Liu, Chenxuan Li, Junpeng Ma, Yuan Zhang, Xinyu Wei, Kevin Zhang, Maurice Chong, Ray Zhang, Yijiang Liu, Shanghang Zhang

However, the deployment of these large-scale MLLMs on client devices is hindered by their extensive model parameters, leading to a notable decline in generalization capabilities when these models are compressed for device deployment.

Device-Cloud Collaboration Knowledge Distillation +1

ConVRT: Consistent Video Restoration Through Turbulence with Test-time Optimization of Neural Video Representations

no code implementations • 7 Dec 2023 • Haoming Cai, Jingxi Chen, Brandon Y. Feng, Weiyun Jiang, Mingyang Xie, Kevin Zhang, Ashok Veeraraghavan, Christopher Metzler

Atmospheric turbulence presents a significant challenge in long-range imaging.

Language Modelling Video Restoration

A Scalable Training Strategy for Blind Multi-Distribution Noise Removal

no code implementations • 30 Oct 2023 • Kevin Zhang, Sakshum Kulshrestha, Christopher Metzler

Our work improves upon a recently proposed universal denoiser training strategy by extending these results to higher dimensions and by incorporating a polynomial approximation of the true specification-loss landscape.

Active Learning Denoising

Seeing the World through Your Eyes

no code implementations • 15 Jun 2023 • Hadi AlZayer, Kevin Zhang, Brandon Feng, Christopher Metzler, Jia-Bin Huang

The reflective nature of the human eye is an underappreciated source of information about what the world around us looks like.

Machine learning reveals features of spinon Fermi surface

no code implementations • 5 Jun 2023 • Kevin Zhang, Shi Feng, Yuri D. Lensky, Nandini Trivedi, Eun-Ah Kim

With rapid progress in simulation of strongly interacting quantum Hamiltonians, the challenge in characterizing unknown phases becomes a bottleneck for scientific progress.

G-MATT: Single-step Retrosynthesis Prediction using Molecular Grammar Tree Transformer

no code implementations • 4 May 2023 • Kevin Zhang, Vipul Mann, Venkat Venkatasubramanian

Additional analyses of G-MATT attention maps demonstrate the ability to retain chemistry knowledge without relying on excessively complex model architectures.

Retrosynthesis Single-step retrosynthesis

PAniC-3D: Stylized Single-view 3D Reconstruction from Portraits of Anime Characters

1 code implementation • CVPR 2023 • Shuhong Chen, Kevin Zhang, Yichun Shi, Heng Wang, Yiheng Zhu, Guoxian Song, Sizhe An, Janus Kristjansson, Xiao Yang, Matthias Zwicker

We propose PAniC-3D, a system to reconstruct stylized 3D character heads directly from illustrated (p)ortraits of (ani)me (c)haracters.

3D Architecture 3D Reconstruction +1

695

PiMAE: Point Cloud and Image Interactive Masked Autoencoders for 3D Object Detection

1 code implementation • CVPR 2023 • Anthony Chen, Kevin Zhang, Renrui Zhang, Zihan Wang, Yuheng Lu, Yandong Guo, Shanghang Zhang

Masked Autoencoders learn strong visual representations and achieve state-of-the-art results in several independent modalities, yet very few works have addressed their capabilities in multi-modality settings.

3D Object Detection object-detection +2

104

T-SEA: Transfer-based Self-Ensemble Attack on Object Detection

1 code implementation • CVPR 2023 • Hao Huang, Ziyan Chen, Huanran Chen, Yongtao Wang, Kevin Zhang

Then, we analogize patch optimization with regular model optimization, proposing a series of self-ensemble approaches on the input data, the attacked model, and the adversarial patch to efficiently make use of the limited information and prevent the patch from overfitting.

Adversarial Attack Model Optimization +2

i-MAE: Are Latent Representations in Masked Autoencoders Linearly Separable?

2 code implementations • 20 Oct 2022 • Kevin Zhang, Zhiqiang Shen

(2) Whether we can enhance the representations in the latent feature space by controlling the degree of semantics during sampling on Masked Autoencoders?

Image Reconstruction

MetaDIP: Accelerating Deep Image Prior with Meta Learning

no code implementations • 18 Sep 2022 • Kevin Zhang, Mingyang Xie, Maharshi Gor, Yi-Ting Chen, Yvonne Zhou, Christopher A. Metzler

Deep image prior (DIP) is a recently proposed technique for solving imaging inverse problems by fitting the reconstructed images to the output of an untrained convolutional neural network.

Denoising Meta-Learning +1

Sequential Models in the Synthetic Data Vault

1 code implementation • 28 Jul 2022 • Kevin Zhang, Neha Patki, Kalyan Veeramachaneni

After building the Sequential SDV, we used it to generate synthetic data and compared its quality against an existing, non-sequential generative adversarial network based model called CTGAN.

Generative Adversarial Network

2,141

Memory-efficient Learning for High-Dimensional MRI Reconstruction

no code implementations • 6 Mar 2021 • Ke Wang, Michael Kellman, Christopher M. Sandino, Kevin Zhang, Shreyas S. Vasanawala, Jonathan I. Tamir, Stella X. Yu, Michael Lustig

Deep learning (DL) based unrolled reconstructions have shown state-of-the-art performance for under-sampled magnetic resonance imaging (MRI).

MRI Reconstruction Vocal Bursts Intensity Prediction

Fomite transmission and disinfection strategies for SARS-CoV-2 and related viruses

no code implementations • 23 May 2020 • Nicolas Castaño, Seth Cordts, Myra Kurosu Jalil, Kevin Zhang, Saisneha Koppaka, Alison Bick, Rajorshi Paul, Sindy KY Tang

Contaminated objects or surfaces, referred to as fomites, play a critical role in the spread of viruses, including SARS-CoV-2, the virus responsible for the COVID-19 pandemic.

Memory-efficient Learning for Large-scale Computational Imaging

no code implementations • NeurIPS Workshop Deep_Invers 2019 • Michael Kellman, Kevin Zhang, Jon Tamir, Emrah Bostan, Michael Lustig, Laura Waller

Critical aspects of computational imaging systems, such as experimental design and image priors, can be optimized through deep networks formed by the unrolled iterations of classical model-based reconstructions (termed physics-based networks).

Experimental Design Super-Resolution

Leveraging Multimodal Haptic Sensory Data for Robust Cutting

no code implementations • 27 Sep 2019 • Kevin Zhang, Mohit Sharma, Manuela Veloso, Oliver Kroemer

In this paper, we propose using vibrations and force-torque feedback from the interactions to adapt the slicing motions and monitor for contact events.

Double Anchor R-CNN for Human Detection in a Crowd

no code implementations • 22 Sep 2019 • Kevin Zhang, Feng Xiong, Peize Sun, Li Hu, Boxun Li, Gang Yu

Double Anchor RPN is developed to capture body and head parts in pairs.

Human Detection
