Search Results for author: Oncel Tuzel

Found 54 papers, 19 papers with code

Weight subcloning: direct initialization of transformers using larger pretrained ones

no code implementations 14 Dec 2023 Mohammad Samragh, Mehrdad Farajtabar, Sachin Mehta, Raviteja Vemulapalli, Fartash Faghri, Devang Naik, Oncel Tuzel, Mohammad Rastegari

The usual practice of transfer learning overcomes this challenge by initializing the model with weights of a pretrained model of the same size and specification to increase the convergence and training speed.

Image Classification Transfer Learning

Knowledge Transfer from Vision Foundation Models for Efficient Training of Small Task-specific Models

no code implementations 30 Nov 2023 Raviteja Vemulapalli, Hadi Pouransari, Fartash Faghri, Sachin Mehta, Mehrdad Farajtabar, Mohammad Rastegari, Oncel Tuzel

Motivated by this, we ask the following important question, "How can we leverage the knowledge from a large VFM to train a small task-specific model for a new target task with limited labeled training data?"

Image Retrieval Retrieval +1

Probabilistic Speech-Driven 3D Facial Motion Synthesis: New Benchmarks, Methods, and Applications

no code implementations 30 Nov 2023 Karren D. Yang, Anurag Ranjan, Jen-Hao Rick Chang, Raviteja Vemulapalli, Oncel Tuzel

While these models can achieve high-quality lip articulation for speakers in the training set, they are unable to capture the full and diverse distribution of 3D facial motions that accompany speech in the real world.

Motion Synthesis

HUGS: Human Gaussian Splats

no code implementations 29 Nov 2023 Muhammed Kocabas, Jen-Hao Rick Chang, James Gabriel, Oncel Tuzel, Anurag Ranjan

We achieve state-of-the-art rendering quality with a rendering speed of 60 FPS while being ~100x faster to train over previous work.

Neural Rendering Novel View Synthesis

MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training

1 code implementation 28 Nov 2023 Pavan Kumar Anasosalu Vasu, Hadi Pouransari, Fartash Faghri, Raviteja Vemulapalli, Oncel Tuzel

We further demonstrate the effectiveness of our multi-modal reinforced training by training a CLIP model based on ViT-B/16 image backbone and achieving +2.9% average performance improvement on 38 evaluation benchmarks compared to the previous best.

Image Captioning Transfer Learning +1

TiC-CLIP: Continual Training of CLIP Models

1 code implementation 24 Oct 2023 Saurabh Garg, Mehrdad Farajtabar, Hadi Pouransari, Raviteja Vemulapalli, Sachin Mehta, Oncel Tuzel, Vaishaal Shankar, Fartash Faghri

We introduce the first set of web-scale Time-Continual (TiC) benchmarks for training vision-language models: TiC-DataComp, TiC-YFCC, and TiC-Redcaps.

Continual Learning Retrieval

Novel-View Acoustic Synthesis from 3D Reconstructed Rooms

1 code implementation 23 Oct 2023 Byeongjoo Ahn, Karren Yang, Brian Hamilton, Jonathan Sheaffer, Anurag Ranjan, Miguel Sarabia, Oncel Tuzel, Jen-Hao Rick Chang

Given audio recordings from 2-4 microphones and the 3D geometry and material of a scene containing multiple unknown sound sources, we estimate the sound anywhere in the scene.

Pointersect: Neural Rendering with Cloud-Ray Intersection

no code implementations CVPR 2023 Jen-Hao Rick Chang, Wei-Yu Chen, Anurag Ranjan, Kwang Moo Yi, Oncel Tuzel

Specifically, we train a set transformer that, given a small number of local neighbor points along a light ray, provides the intersection point, the surface normal, and the material blending weights, which are used to render the outcome of this light ray.

Inverse Rendering Neural Rendering +2

Text is All You Need: Personalizing ASR Models using Controllable Speech Synthesis

no code implementations 27 Mar 2023 Karren Yang, Ting-yao Hu, Jen-Hao Rick Chang, Hema Swetha Koppula, Oncel Tuzel

Here, we ask two fundamental questions about this strategy: when is synthetic data effective for personalization, and why is it effective in those cases?

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

FaceLit: Neural 3D Relightable Faces

no code implementations CVPR 2023 Anurag Ranjan, Kwang Moo Yi, Jen-Hao Rick Chang, Oncel Tuzel

We propose a generative framework, FaceLit, capable of generating a 3D face that can be rendered at various user-defined lighting conditions and views, learned purely from 2D images in-the-wild without any manual annotation.

FastViT: A Fast Hybrid Vision Transformer using Structural Reparameterization

4 code implementations ICCV 2023 Pavan Kumar Anasosalu Vasu, James Gabriel, Jeff Zhu, Oncel Tuzel, Anurag Ranjan

To this end, we introduce a novel token mixing operator, RepMixer, a building block of FastViT, that uses structural reparameterization to lower the memory access cost by removing skip-connections in the network.

Image Classification

FastFill: Efficient Compatible Model Update

1 code implementation 8 Mar 2023 Florian Jaeckle, Fartash Faghri, Ali Farhadi, Oncel Tuzel, Hadi Pouransari

The task of retrieving the most similar data from a gallery set to a given query data is performed through a similarity comparison on features.

Representation Learning Retrieval

RangeAugment: Efficient Online Augmentation with Range Learning

1 code implementation 20 Dec 2022 Sachin Mehta, Saeid Naderiparizi, Fartash Faghri, Maxwell Horton, Lailin Chen, Ali Farhadi, Oncel Tuzel, Mohammad Rastegari

To answer the open question on the importance of magnitude ranges for each augmentation operation, we introduce RangeAugment that allows us to efficiently learn the range of magnitudes for individual as well as composite augmentation operations.

Knowledge Distillation object-detection +3

APE: Aligning Pretrained Encoders to Quickly Learn Aligned Multimodal Representations

no code implementations 8 Oct 2022 Elan Rosenfeld, Preetum Nakkiran, Hadi Pouransari, Oncel Tuzel, Fartash Faghri

Recent advances in learning aligned multimodal representations have been primarily driven by training large neural networks on massive, noisy paired-modality datasets.

Zero-Shot Learning

MobileOne: An Improved One millisecond Mobile Backbone

7 code implementations CVPR 2023 Pavan Kumar Anasosalu Vasu, James Gabriel, Jeff Zhu, Oncel Tuzel, Anurag Ranjan

Furthermore, we show that our model generalizes to multiple tasks (image classification, object detection, and semantic segmentation) with significant improvements in latency and accuracy as compared to existing efficient architectures when deployed on a mobile device.

Efficient Neural Network Image Classification +2

NeuMan: Neural Human Radiance Field from a Single Video

1 code implementation 23 Mar 2022 Wei Jiang, Kwang Moo Yi, Golnoosh Samei, Oncel Tuzel, Anurag Ranjan

Photorealistic rendering and reposing of humans is important for enabling augmented reality experiences.

Forward Compatible Training for Large-Scale Embedding Retrieval Systems

1 code implementation CVPR 2022 Vivek Ramanujan, Pavan Kumar Anasosalu Vasu, Ali Farhadi, Oncel Tuzel, Hadi Pouransari

To avoid the cost of backfilling, BCT modifies training of the new model to make its representations compatible with those of the old model.

Representation Learning Retrieval

Data Incubation -- Synthesizing Missing Data for Handwriting Recognition

no code implementations 13 Oct 2021 Jen-Hao Rick Chang, Martin Bresler, Youssouf Chherawala, Adrien Delaye, Thomas Deselaers, Ryan Dixon, Oncel Tuzel

We use the framework to optimize data synthesis and demonstrate significant improvement on handwriting recognition over a model trained on real data only.

Handwriting Recognition

Token Pooling in Vision Transformers

no code implementations 8 Oct 2021 Dmitrii Marin, Jen-Hao Rick Chang, Anurag Ranjan, Anish Prabhu, Mohammad Rastegari, Oncel Tuzel

Token Pooling is a simple and effective operator that can benefit many architectures.

Style Equalization: Unsupervised Learning of Controllable Generative Sequence Models

no code implementations 6 Oct 2021 Jen-Hao Rick Chang, Ashish Shrivastava, Hema Swetha Koppula, Xiaoshuai Zhang, Oncel Tuzel

However, under an unsupervised-style setting, typical training algorithms for controllable sequence generative models suffer from the training-inference mismatch, where the same sample is used as content and style input during training but unpaired samples are given during inference.

Instance-Level Task Parameters: A Robust Multi-task Weighting Framework

no code implementations 11 Jun 2021 Pavan Kumar Anasosalu Vasu, Shreyas Saxena, Oncel Tuzel

When applied to datasets where one or more tasks can have noisy annotations, the proposed method learns to prioritize learning from clean labels for a given task, e.g., reducing surface estimation errors by up to 60%.

Depth Estimation Multi-Task Learning +2

Optimize what matters: Training DNN-HMM Keyword Spotting Model Using End Metric

no code implementations 2 Nov 2020 Ashish Shrivastava, Arnav Kundu, Chandra Dhir, Devang Naik, Oncel Tuzel

The DNN, in prior methods, is trained independent of the HMM parameters to minimize the cross-entropy loss between the predicted and the ground-truth state probabilities.

Keyword Spotting

Subject-Aware Contrastive Learning for Biosignals

1 code implementation 30 Jun 2020 Joseph Y. Cheng, Hanlin Goh, Kaan Dogrusoz, Oncel Tuzel, Erdrin Azemi

Datasets for biosignals, such as electroencephalogram (EEG) and electrocardiogram (ECG), often have noisy labels and a limited number of subjects (<100).

Anomaly Detection Contrastive Learning +9

Extracurricular Learning: Knowledge Transfer Beyond Empirical Distribution

no code implementations 30 Jun 2020 Hadi Pouransari, Mojan Javaheripi, Vinay Sharma, Oncel Tuzel

We propose extracurricular learning, a novel knowledge distillation method, that bridges this gap by (1) modeling student and teacher output distributions; (2) sampling examples from an approximation to the underlying data distribution; and (3) matching student and teacher output distributions over this extended set including uncertain samples.

Image Classification Knowledge Distillation +2

Unsupervised Style and Content Separation by Minimizing Mutual Information for Speech Synthesis

no code implementations 9 Mar 2020 Ting-yao Hu, Ashish Shrivastava, Oncel Tuzel, Chandra Dhir

We present a method to generate speech from input text and a style vector that is extracted from a reference speech signal in an unsupervised manner, i.e., no style annotation, such as speaker information, is required.

Speech Synthesis

Least squares binary quantization of neural networks

1 code implementation 9 Jan 2020 Hadi Pouransari, Zhucheng Tu, Oncel Tuzel

We conduct experiments on the ImageNet dataset and show a reduced accuracy gap when using the proposed least squares quantization algorithms.

Quantization

Data Parameters: A New Family of Parameters for Learning a Differentiable Curriculum

1 code implementation NeurIPS 2019 Shreyas Saxena, Oncel Tuzel, Dennis Decoste

To the best of our knowledge, our work is the first curriculum learning method to show gains on large scale image classification and detection tasks.

General Classification Image Classification +2

Optimal Binary Quantization for Deep Neural Networks

no code implementations 25 Sep 2019 Hadi Pouransari, Oncel Tuzel

We conduct experiments on the ImageNet dataset and show a reduced accuracy gap when using the proposed optimal quantization algorithms.

Quantization

MVX-Net: Multimodal VoxelNet for 3D Object Detection

1 code implementation 2 Apr 2019 Vishwanath A. Sindagi, Yin Zhou, Oncel Tuzel

Many recent works on 3D object detection have focused on designing neural network architectures that can consume point cloud data.

3D Object Detection Object +1

Nonlinear Conjugate Gradients For Scaling Synchronous Distributed DNN Training

1 code implementation 7 Dec 2018 Saurabh Adya, Vinay Palakkode, Oncel Tuzel

In this work, we propose and evaluate the stochastic preconditioned nonlinear conjugate gradient algorithm for large scale DNN training tasks.

16k General Classification

Divide, Denoise, and Defend against Adversarial Attacks

no code implementations 19 Feb 2018 Seyed-Mohsen Moosavi-Dezfooli, Ashish Shrivastava, Oncel Tuzel

Improving the robustness of neural networks against these attacks is important, especially for security-critical applications.

Denoising

VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection

44 code implementations CVPR 2018 Yin Zhou, Oncel Tuzel

Accurate detection of objects in 3D point clouds is a central problem in many applications, such as autonomous navigation, housekeeping robots, and augmented/virtual reality.

3D Object Detection Birds Eye View Object Detection +4

Attentional Network for Visual Object Detection

no code implementations 6 Feb 2017 Kota Hara, Ming-Yu Liu, Oncel Tuzel, Amir-Massoud Farahmand

We propose augmenting deep neural networks with an attention mechanism for the visual object detection task.

Object object-detection +1

Learning from Simulated and Unsupervised Images through Adversarial Training

9 code implementations CVPR 2017 Ashish Shrivastava, Tomas Pfister, Oncel Tuzel, Josh Susskind, Wenda Wang, Russ Webb

With recent progress in graphics, it has become more tractable to train models on synthetic images, potentially avoiding the need for expensive annotations.

Ranked #3 on Image-to-Image Translation on Cityscapes Labels-to-Photo (Per-class Accuracy metric)

Domain Adaptation Gaze Estimation +2

Gaussian Conditional Random Field Network for Semantic Segmentation

no code implementations CVPR 2016 Raviteja Vemulapalli, Oncel Tuzel, Ming-Yu Liu, Rama Chellappa

In contrast to the existing approaches that use discrete Conditional Random Field (CRF) models, we propose to use a Gaussian CRF model for the task of semantic segmentation.

Segmentation Semantic Segmentation

Global-Local Face Upsampling Network

no code implementations 23 Mar 2016 Oncel Tuzel, Yuichi Taguchi, John R. Hershey

In our deep network architecture the global and local constraints that define a face can be efficiently modeled and learned end-to-end using training data.

Face Hallucination Face Reconstruction +2

Robust Face Alignment Using a Mixture of Invariant Experts

no code implementations 13 Nov 2015 Oncel Tuzel, Tim K. Marks, Salil Tambe

Face alignment is particularly challenging when there are large variations in pose (in-plane and out-of-plane rotations) and facial expression.

Face Alignment regression +1

Layered Interpretation of Street View Images

no code implementations 15 Jun 2015 Ming-Yu Liu, Shuoxin Lin, Srikumar Ramalingam, Oncel Tuzel

We propose a layered street view model to encode both depth and semantic information on street view images for autonomous driving.

Autonomous Driving Scene Labeling +1

Deep Hierarchical Parsing for Semantic Segmentation

no code implementations CVPR 2015 Abhishek Sharma, Oncel Tuzel, David W. Jacobs

We propose to tackle this problem by including the classification loss of the internal nodes of the random parse trees in the original RCPN loss function.

General Classification Scene Parsing +2

Efficient Upsampling of Natural Images

no code implementations 28 Feb 2015 Chinmay Hegde, Oncel Tuzel, Fatih Porikli

1) For the edge layer, we use a nonparametric approach by constructing a dictionary of patches from a given image, and synthesize edge regions in a higher-resolution version of the image.

Recursive Context Propagation Network for Semantic Scene Labeling

no code implementations NeurIPS 2014 Abhishek Sharma, Oncel Tuzel, Ming-Yu Liu

Then a top-down propagation of the aggregated information takes place that enhances the contextual information of each local feature.

Scene Labeling

Joint Geodesic Upsampling of Depth Images

no code implementations CVPR 2013 Ming-Yu Liu, Oncel Tuzel, Yuichi Taguchi

We propose an algorithm utilizing geodesic distances to upsample a low resolution depth image using a registered high resolution color image.

Sensor Fusion
