Search Results for author: Oncel Tuzel

Found 54 papers, 19 papers with code

Weight subcloning: direct initialization of transformers using larger pretrained ones

no code implementations • 14 Dec 2023 • Mohammad Samragh, Mehrdad Farajtabar, Sachin Mehta, Raviteja Vemulapalli, Fartash Faghri, Devang Naik, Oncel Tuzel, Mohammad Rastegari

The usual practice of transfer learning overcomes this challenge by initializing the model with weights of a pretrained model of the same size and specification to increase the convergence and training speed.

Image Classification Transfer Learning

Paper
Add Code

Knowledge Transfer from Vision Foundation Models for Efficient Training of Small Task-specific Models

no code implementations • 30 Nov 2023 • Raviteja Vemulapalli, Hadi Pouransari, Fartash Faghri, Sachin Mehta, Mehrdad Farajtabar, Mohammad Rastegari, Oncel Tuzel

Motivated by this, we ask the following important question, "How can we leverage the knowledge from a large VFM to train a small task-specific model for a new target task with limited labeled training data?"

Image Retrieval Retrieval +1

Paper
Add Code

Probabilistic Speech-Driven 3D Facial Motion Synthesis: New Benchmarks, Methods, and Applications

no code implementations • 30 Nov 2023 • Karren D. Yang, Anurag Ranjan, Jen-Hao Rick Chang, Raviteja Vemulapalli, Oncel Tuzel

While these models can achieve high-quality lip articulation for speakers in the training set, they are unable to capture the full and diverse distribution of 3D facial motions that accompany speech in the real world.

Motion Synthesis

Paper
Add Code

HUGS: Human Gaussian Splats

no code implementations • 29 Nov 2023 • Muhammed Kocabas, Jen-Hao Rick Chang, James Gabriel, Oncel Tuzel, Anurag Ranjan

We achieve state-of-the-art rendering quality with a rendering speed of 60 FPS while being ~100x faster to train over previous work.

Neural Rendering Novel View Synthesis

Paper
Add Code

MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training

1 code implementation • 28 Nov 2023 • Pavan Kumar Anasosalu Vasu, Hadi Pouransari, Fartash Faghri, Raviteja Vemulapalli, Oncel Tuzel

We further demonstrate the effectiveness of our multi-modal reinforced training by training a CLIP model based on ViT-B/16 image backbone and achieving +2.9% average performance improvement on 38 evaluation benchmarks compared to the previous best.

Image Captioning Transfer Learning +1

355

Paper
Code

TiC-CLIP: Continual Training of CLIP Models

1 code implementation • 24 Oct 2023 • Saurabh Garg, Mehrdad Farajtabar, Hadi Pouransari, Raviteja Vemulapalli, Sachin Mehta, Oncel Tuzel, Vaishaal Shankar, Fartash Faghri

We introduce the first set of web-scale Time-Continual (TiC) benchmarks for training vision-language models: TiC-DataComp, TiC-YFCC, and TiC-Redcaps.

Continual Learning Retrieval

Paper
Code

Novel-View Acoustic Synthesis from 3D Reconstructed Rooms

1 code implementation • 23 Oct 2023 • Byeongjoo Ahn, Karren Yang, Brian Hamilton, Jonathan Sheaffer, Anurag Ranjan, Miguel Sarabia, Oncel Tuzel, Jen-Hao Rick Chang

Given audio recordings from 2-4 microphones and the 3D geometry and material of a scene containing multiple unknown sound sources, we estimate the sound anywhere in the scene.

Paper
Code

SAM-CLIP: Merging Vision Foundation Models towards Semantic and Spatial Understanding

no code implementations • 23 Oct 2023 • Haoxiang Wang, Pavan Kumar Anasosalu Vasu, Fartash Faghri, Raviteja Vemulapalli, Mehrdad Farajtabar, Sachin Mehta, Mohammad Rastegari, Oncel Tuzel, Hadi Pouransari

By applying our method to SAM and CLIP, we obtain SAM-CLIP: a unified model that combines the capabilities of SAM and CLIP into a single vision transformer.

Continual Learning Multi-Task Learning +2

Paper
Add Code

CLIP meets Model Zoo Experts: Pseudo-Supervision for Visual Enhancement

no code implementations • 21 Oct 2023 • Mohammadreza Salehi, Mehrdad Farajtabar, Maxwell Horton, Fartash Faghri, Hadi Pouransari, Raviteja Vemulapalli, Oncel Tuzel, Ali Farhadi, Mohammad Rastegari, Sachin Mehta

While CLIP is scalable, promptable, and robust to distribution shifts on image classification tasks, it lacks object localization capabilities.

Depth Estimation Image Classification +3

Paper
Add Code

ReLU Strikes Back: Exploiting Activation Sparsity in Large Language Models

1 code implementation • 6 Oct 2023 • Iman Mirzadeh, Keivan Alizadeh, Sachin Mehta, Carlo C Del Mundo, Oncel Tuzel, Golnoosh Samei, Mohammad Rastegari, Mehrdad Farajtabar

Large Language Models (LLMs) with billions of parameters have drastically transformed AI applications.

6,947

Paper
Code

Corpus Synthesis for Zero-shot ASR domain Adaptation using Large Language Models

no code implementations • 18 Sep 2023 • Hsuan Su, Ting-yao Hu, Hema Swetha Koppula, Raviteja Vemulapalli, Jen-Hao Rick Chang, Karren Yang, Gautam Varma Mantena, Oncel Tuzel

In this paper, we propose a new strategy for adapting ASR models to new target domains without any text or speech from those domains.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +5

Paper
Add Code

VISION Datasets: A Benchmark for Vision-based InduStrial InspectiON

no code implementations • 13 Jun 2023 • Haoping Bai, Shancong Mou, Tatiana Likhomanenko, Ramazan Gokberk Cinbis, Oncel Tuzel, Ping Huang, Jiulong Shan, Jianjun Shi, Meng Cao

We introduce the VISION Datasets, a diverse collection of 14 industrial inspection datasets, uniquely poised to meet these challenges.

Defect Detection Instance Segmentation +1

Paper
Add Code

Pointersect: Neural Rendering with Cloud-Ray Intersection

no code implementations • CVPR 2023 • Jen-Hao Rick Chang, Wei-Yu Chen, Anurag Ranjan, Kwang Moo Yi, Oncel Tuzel

Specifically, we train a set transformer that, given a small number of local neighbor points along a light ray, provides the intersection point, the surface normal, and the material blending weights, which are used to render the outcome of this light ray.

Inverse Rendering Neural Rendering +2

Paper
Add Code

Text is All You Need: Personalizing ASR Models using Controllable Speech Synthesis

no code implementations • 27 Mar 2023 • Karren Yang, Ting-yao Hu, Jen-Hao Rick Chang, Hema Swetha Koppula, Oncel Tuzel

Here, we ask two fundamental questions about this strategy: when is synthetic data effective for personalization, and why is it effective in those cases?

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Paper
Add Code

FaceLit: Neural 3D Relightable Faces

no code implementations • CVPR 2023 • Anurag Ranjan, Kwang Moo Yi, Jen-Hao Rick Chang, Oncel Tuzel

We propose a generative framework, FaceLit, capable of generating a 3D face that can be rendered at various user-defined lighting conditions and views, learned purely from 2D images in-the-wild without any manual annotation.

Paper
Add Code

FastViT: A Fast Hybrid Vision Transformer using Structural Reparameterization

4 code implementations • ICCV 2023 • Pavan Kumar Anasosalu Vasu, James Gabriel, Jeff Zhu, Oncel Tuzel, Anurag Ranjan

To this end, we introduce a novel token mixing operator, RepMixer, a building block of FastViT, that uses structural reparameterization to lower the memory access cost by removing skip-connections in the network.

Image Classification

29,735

Paper
Code

Reinforce Data, Multiply Impact: Improved Model Accuracy and Robustness with Dataset Reinforcement

1 code implementation • ICCV 2023 • Fartash Faghri, Hadi Pouransari, Sachin Mehta, Mehrdad Farajtabar, Ali Farhadi, Mohammad Rastegari, Oncel Tuzel

Models pretrained on ImageNet+ and fine-tuned on CIFAR-100+, Flowers-102+, and Food-101+, reach up to 3.4% improved accuracy.

Data Augmentation Knowledge Distillation +2

Paper
Code

FastFill: Efficient Compatible Model Update

1 code implementation • 8 Mar 2023 • Florian Jaeckle, Fartash Faghri, Ali Farhadi, Oncel Tuzel, Hadi Pouransari

The task of retrieving the most similar data from a gallery set to a given query data is performed through a similarity comparison on features.

Representation Learning Retrieval

Paper
Code

RangeAugment: Efficient Online Augmentation with Range Learning

1 code implementation • 20 Dec 2022 • Sachin Mehta, Saeid Naderiparizi, Fartash Faghri, Maxwell Horton, Lailin Chen, Ali Farhadi, Oncel Tuzel, Mohammad Rastegari

To answer the open question on the importance of magnitude ranges for each augmentation operation, we introduce RangeAugment that allows us to efficiently learn the range of magnitudes for individual as well as composite augmentation operations.

Knowledge Distillation object-detection +3

1,674

Paper
Code

I see what you hear: a vision-inspired method to localize words

no code implementations • 24 Oct 2022 • Mohammad Samragh, Arnav Kundu, Ting-yao Hu, Minsik Cho, Aman Chadha, Ashish Shrivastava, Oncel Tuzel, Devang Naik

This paper explores the possibility of using visual object detection techniques for word localization in speech data.

Object object-detection +2

Paper
Add Code

APE: Aligning Pretrained Encoders to Quickly Learn Aligned Multimodal Representations

no code implementations • 8 Oct 2022 • Elan Rosenfeld, Preetum Nakkiran, Hadi Pouransari, Oncel Tuzel, Fartash Faghri

Recent advances in learning aligned multimodal representations have been primarily driven by training large neural networks on massive, noisy paired-modality datasets.

Zero-Shot Learning

Paper
Add Code

MobileOne: An Improved One millisecond Mobile Backbone

7 code implementations • CVPR 2023 • Pavan Kumar Anasosalu Vasu, James Gabriel, Jeff Zhu, Oncel Tuzel, Anurag Ranjan

Furthermore, we show that our model generalizes to multiple tasks - image classification, object detection, and semantic segmentation with significant improvements in latency and accuracy as compared to existing efficient architectures when deployed on a mobile device.

Ranked #586 on Image Classification on ImageNet

Efficient Neural Network Image Classification +2

29,735

Paper
Code

NeuMan: Neural Human Radiance Field from a Single Video

1 code implementation • 23 Mar 2022 • Wei Jiang, Kwang Moo Yi, Golnoosh Samei, Oncel Tuzel, Anurag Ranjan

Photorealistic rendering and reposing of humans is important for enabling augmented reality experiences.

1,239

Paper
Code

Forward Compatible Training for Large-Scale Embedding Retrieval Systems

1 code implementation • CVPR 2022 • Vivek Ramanujan, Pavan Kumar Anasosalu Vasu, Ali Farhadi, Oncel Tuzel, Hadi Pouransari

To avoid the cost of backfilling, BCT modifies training of the new model to make its representations compatible with those of the old model.

Representation Learning Retrieval

Paper
Code

Synt++: Utilizing Imperfect Synthetic Data to Improve Speech Recognition

no code implementations • 21 Oct 2021 • Ting-yao Hu, Mohammadreza Armandpour, Ashish Shrivastava, Jen-Hao Rick Chang, Hema Koppula, Oncel Tuzel

With recent advances in speech synthesis, synthetic data is becoming a viable alternative to real data for training speech recognition models.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Paper
Add Code

Data Incubation -- Synthesizing Missing Data for Handwriting Recognition

no code implementations • 13 Oct 2021 • Jen-Hao Rick Chang, Martin Bresler, Youssouf Chherawala, Adrien Delaye, Thomas Deselaers, Ryan Dixon, Oncel Tuzel

We use the framework to optimize data synthesis and demonstrate significant improvement on handwriting recognition over a model trained on real data only.

Handwriting Recognition

Paper
Add Code

Token Pooling in Vision Transformers

no code implementations • 8 Oct 2021 • Dmitrii Marin, Jen-Hao Rick Chang, Anurag Ranjan, Anish Prabhu, Mohammad Rastegari, Oncel Tuzel

Token Pooling is a simple and effective operator that can benefit many architectures.

Paper
Add Code

Style Equalization: Unsupervised Learning of Controllable Generative Sequence Models

no code implementations • 6 Oct 2021 • Jen-Hao Rick Chang, Ashish Shrivastava, Hema Swetha Koppula, Xiaoshuai Zhang, Oncel Tuzel

However, under an unsupervised-style setting, typical training algorithms for controllable sequence generative models suffer from the training-inference mismatch, where the same sample is used as content and style input during training but unpaired samples are given during inference.

Paper
Add Code

Instance-Level Task Parameters: A Robust Multi-task Weighting Framework

no code implementations • 11 Jun 2021 • Pavan Kumar Anasosalu Vasu, Shreyas Saxena, Oncel Tuzel

When applied to datasets where one or more tasks can have noisy annotations, the proposed method learns to prioritize learning from clean labels for a given task, e.g., reducing surface estimation errors by up to 60%.

Depth Estimation Multi-Task Learning +2

Paper
Add Code

Optimize what matters: Training DNN-HMM Keyword Spotting Model Using End Metric

no code implementations • 2 Nov 2020 • Ashish Shrivastava, Arnav Kundu, Chandra Dhir, Devang Naik, Oncel Tuzel

The DNN, in prior methods, is trained independent of the HMM parameters to minimize the cross-entropy loss between the predicted and the ground-truth state probabilities.

Ranked #2 on Keyword Spotting on hey Siri

Keyword Spotting

Paper
Add Code

SapAugment: Learning A Sample Adaptive Policy for Data Augmentation

no code implementations • 2 Nov 2020 • Ting-yao Hu, Ashish Shrivastava, Jen-Hao Rick Chang, Hema Koppula, Stefan Braun, Kyuyeon Hwang, Ozlem Kalinli, Oncel Tuzel

Our policy adapts the augmentation parameters based on the training loss of the data samples.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Paper
Add Code

Subject-Aware Contrastive Learning for Biosignals

1 code implementation • 30 Jun 2020 • Joseph Y. Cheng, Hanlin Goh, Kaan Dogrusoz, Oncel Tuzel, Erdrin Azemi

Datasets for biosignals, such as electroencephalogram (EEG) and electrocardiogram (ECG), often have noisy labels and have limited number of subjects (<100).

Ranked #1 on Person Identification on EEG Motor Movement/Imagery Dataset

Anomaly Detection Contrastive Learning +9

Paper
Code

Extracurricular Learning: Knowledge Transfer Beyond Empirical Distribution

no code implementations • 30 Jun 2020 • Hadi Pouransari, Mojan Javaheripi, Vinay Sharma, Oncel Tuzel

We propose extracurricular learning, a novel knowledge distillation method, that bridges this gap by (1) modeling student and teacher output distributions; (2) sampling examples from an approximation to the underlying data distribution; and (3) matching student and teacher output distributions over this extended set including uncertain samples.

Image Classification Knowledge Distillation +2

Paper
Add Code

Unsupervised Style and Content Separation by Minimizing Mutual Information for Speech Synthesis

no code implementations • 9 Mar 2020 • Ting-yao Hu, Ashish Shrivastava, Oncel Tuzel, Chandra Dhir

We present a method to generate speech from input text and a style vector that is extracted from a reference speech signal in an unsupervised manner, i.e., no style annotation, such as speaker information, is required.

Speech Synthesis

Paper
Add Code

Least squares binary quantization of neural networks

1 code implementation • 9 Jan 2020 • Hadi Pouransari, Zhucheng Tu, Oncel Tuzel

We conduct experiments on the ImageNet dataset and show a reduced accuracy gap when using the proposed least squares quantization algorithms.

Quantization

Paper
Code

Data Parameters: A New Family of Parameters for Learning a Differentiable Curriculum

1 code implementation • NeurIPS 2019 • Shreyas Saxena, Oncel Tuzel, Dennis Decoste

To the best of our knowledge, our work is the first curriculum learning method to show gains on large scale image classification and detection tasks.

General Classification Image Classification +2

Paper
Code

Optimal Binary Quantization for Deep Neural Networks

no code implementations • 25 Sep 2019 • Hadi Pouransari, Oncel Tuzel

We conduct experiments on the ImageNet dataset and show a reduced accuracy gap when using the proposed optimal quantization algorithms.

Quantization

Paper
Add Code

MVX-Net: Multimodal VoxelNet for 3D Object Detection

1 code implementation • 2 Apr 2019 • Vishwanath A. Sindagi, Yin Zhou, Oncel Tuzel

Many recent works on 3D object detection have focused on designing neural network architectures that can consume point cloud data.

Ranked #7 on 3D Object Detection on DAIR-V2X-I

3D Object Detection Object +1

4,799

Paper
Code

Nonlinear Conjugate Gradients For Scaling Synchronous Distributed DNN Training

1 code implementation • 7 Dec 2018 • Saurabh Adya, Vinay Palakkode, Oncel Tuzel

In this work, we propose and evaluate the stochastic preconditioned nonlinear conjugate gradient algorithm for large scale DNN training tasks.

16k General Classification

Paper
Code

Divide, Denoise, and Defend against Adversarial Attacks

no code implementations • 19 Feb 2018 • Seyed-Mohsen Moosavi-Dezfooli, Ashish Shrivastava, Oncel Tuzel

Improving the robustness of neural networks against these attacks is important, especially for security-critical applications.

Denoising

Paper
Add Code

VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection

44 code implementations • CVPR 2018 • Yin Zhou, Oncel Tuzel

Accurate detection of objects in 3D point clouds is a central problem in many applications, such as autonomous navigation, housekeeping robots, and augmented/virtual reality.

Ranked #1 on Birds Eye View Object Detection on KITTI Cyclist Easy val

3D Object Detection Birds Eye View Object Detection +4

639

Paper
Code

Attentional Network for Visual Object Detection

no code implementations • 6 Feb 2017 • Kota Hara, Ming-Yu Liu, Oncel Tuzel, Amir-Massoud Farahmand

We propose augmenting deep neural networks with an attention mechanism for the visual object detection task.

Object object-detection +1

Paper
Add Code

Learning from Simulated and Unsupervised Images through Adversarial Training

9 code implementations • CVPR 2017 • Ashish Shrivastava, Tomas Pfister, Oncel Tuzel, Josh Susskind, Wenda Wang, Russ Webb

With recent progress in graphics, it has become more tractable to train models on synthetic images, potentially avoiding the need for expensive annotations.

Ranked #3 on Image-to-Image Translation on Cityscapes Labels-to-Photo (Per-class Accuracy metric)

Domain Adaptation Gaze Estimation +2

575

Paper
Code

Coupled Generative Adversarial Networks

4 code implementations • NeurIPS 2016 • Ming-Yu Liu, Oncel Tuzel

We propose coupled generative adversarial network (CoGAN) for learning a joint distribution of multi-domain images.

Ranked #3 on Image-to-Image Translation on Cityscapes Photo-to-Labels

Domain Adaptation Generative Adversarial Network +1

15,701

Paper
Code

Gaussian Conditional Random Field Network for Semantic Segmentation

no code implementations • CVPR 2016 • Raviteja Vemulapalli, Oncel Tuzel, Ming-Yu Liu, Rama Chellappa

In contrast to the existing approaches that use discrete Conditional Random Field (CRF) models, we propose to use a Gaussian CRF model for the task of semantic segmentation.

Segmentation Semantic Segmentation

Paper
Add Code

A Multi-Stream Bi-Directional Recurrent Neural Network for Fine-Grained Action Detection

no code implementations • CVPR 2016 • Bharat Singh, Tim K. Marks, Michael Jones, Oncel Tuzel, Ming Shao

We present a multi-stream bi-directional recurrent neural network for fine-grained action detection.

Action Recognition In Videos Fine-Grained Action Detection +2

Paper
Add Code

Global-Local Face Upsampling Network

no code implementations • 23 Mar 2016 • Oncel Tuzel, Yuichi Taguchi, John R. Hershey

In our deep network architecture the global and local constraints that define a face can be efficiently modeled and learned end-to-end using training data.

Face Hallucination Face Reconstruction +2

Paper
Add Code

Robust Face Alignment Using a Mixture of Invariant Experts

no code implementations • 13 Nov 2015 • Oncel Tuzel, Tim K. Marks, Salil Tambe

Face alignment is particularly challenging when there are large variations in pose (in-plane and out-of-plane rotations) and facial expression.

Face Alignment regression +1

Paper
Add Code

Deep Gaussian Conditional Random Field Network: A Model-based Deep Network for Discriminative Denoising

no code implementations • CVPR 2016 • Raviteja Vemulapalli, Oncel Tuzel, Ming-Yu Liu

We propose a novel deep network architecture for image denoising based on a Gaussian Conditional Random Field (GCRF) model.

Image Denoising

Paper
Add Code

Layered Interpretation of Street View Images

no code implementations • 15 Jun 2015 • Ming-Yu Liu, Shuoxin Lin, Srikumar Ramalingam, Oncel Tuzel

We propose a layered street view model to encode both depth and semantic information on street view images for autonomous driving.

Autonomous Driving Scene Labeling +1

Paper
Add Code

Deep Hierarchical Parsing for Semantic Segmentation

no code implementations • CVPR 2015 • Abhishek Sharma, Oncel Tuzel, David W. Jacobs

We propose to tackle this problem by including the classification loss of the internal nodes of the random parse trees in the original RCPN loss function.

General Classification Scene Parsing +2

Paper
Add Code

Efficient Upsampling of Natural Images

no code implementations • 28 Feb 2015 • Chinmay Hegde, Oncel Tuzel, Fatih Porikli

1) For the edge layer, we use a nonparametric approach by constructing a dictionary of patches from a given image, and synthesize edge regions in a higher-resolution version of the image.

Paper
Add Code

Recursive Context Propagation Network for Semantic Scene Labeling

no code implementations • NeurIPS 2014 • Abhishek Sharma, Oncel Tuzel, Ming-Yu Liu

Then a top-down propagation of the aggregated information takes place that enhances the contextual information of each local feature.

Scene Labeling

Paper
Add Code

Joint Geodesic Upsampling of Depth Images

no code implementations • CVPR 2013 • Ming-Yu Liu, Oncel Tuzel, Yuichi Taguchi

We propose an algorithm utilizing geodesic distances to upsample a low resolution depth image using a registered high resolution color image.

Sensor Fusion

Paper
Add Code
