no code implementations • 26 Mar 2024 • Mehdy Dousty, David J. Fleet, José Zariffa
A comprehensive evaluation of function in home and community settings requires a hand grasp taxonomy for individuals with impaired hand function.
no code implementations • 20 Dec 2023 • Saurabh Saxena, Junhwa Hur, Charles Herrmann, Deqing Sun, David J. Fleet
In contrast, we advocate a generic, task-agnostic diffusion model, with several advancements such as log-scale depth parameterization to enable joint modeling of indoor and outdoor scenes, conditioning on the field-of-view (FOV) to handle scale ambiguity and synthetically augmenting FOV during training to generalize beyond the limited camera intrinsics in training datasets.
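As an illustrative sketch (not the paper's implementation), log-scale depth parameterization can be read as mapping metric depth to a bounded range in log space, so near (indoor) and far (outdoor) depths share representational capacity; the depth bounds below are hypothetical:

```python
import numpy as np

def log_depth_normalize(depth, d_min=0.5, d_max=80.0):
    """Map metric depth (meters) to [-1, 1] in log space.

    Log-scale parameterization allocates more range to near depths,
    which helps one model cover both indoor and outdoor scenes.
    d_min/d_max are illustrative bounds, not values from the paper.
    """
    depth = np.clip(depth, d_min, d_max)
    t = (np.log(depth) - np.log(d_min)) / (np.log(d_max) - np.log(d_min))
    return 2.0 * t - 1.0

def log_depth_denormalize(x, d_min=0.5, d_max=80.0):
    # Inverse map: [-1, 1] back to metric depth in meters.
    t = (x + 1.0) / 2.0
    return np.exp(t * (np.log(d_max) - np.log(d_min)) + np.log(d_min))
```

A 0.5 m indoor depth and an 80 m outdoor depth then land at the two ends of the same normalized interval.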
no code implementations • NeurIPS 2023 • Saurabh Saxena, Charles Herrmann, Junhwa Hur, Abhishek Kar, Mohammad Norouzi, Deqing Sun, David J. Fleet
Denoising diffusion probabilistic models have transformed image generation with their impressive fidelity and diversity.
no code implementations • 17 Apr 2023 • Shekoofeh Azizi, Simon Kornblith, Chitwan Saharia, Mohammad Norouzi, David J. Fleet
Deep generative models are becoming increasingly powerful, now generating diverse high fidelity photo-realistic samples given text prompts.
no code implementations • 28 Feb 2023 • Saurabh Saxena, Abhishek Kar, Mohammad Norouzi, David J. Fleet
To cope with the limited availability of data for supervised training, we leverage pre-training on self-supervised image-to-image translation tasks.
Ranked #22 on Monocular Depth Estimation on NYU-Depth V2 (using extra training data)
1 code implementation • CVPR 2023 • Sara Sabour, Suhani Vora, Daniel Duckworth, Ivan Krasin, David J. Fleet, Andrea Tagliasacchi
To cope with distractors, we advocate a form of robust estimation for NeRF training, modeling distractors in training data as outliers of an optimization problem.
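The idea of modeling distractors as outliers of an optimization problem can be sketched on a toy scalar problem: repeatedly refit while discarding the highest-residual observations. This is an illustrative analogue only, not the paper's NeRF training procedure:

```python
import numpy as np

def trimmed_mean_fit(y, keep_frac=0.8, iters=10):
    """Robustly estimate a scalar location parameter by iteratively
    discarding the highest-residual fraction of observations,
    treating them as outliers (distractors)."""
    est = np.mean(y)
    for _ in range(iters):
        resid = np.abs(y - est)               # residual per observation
        k = max(1, int(keep_frac * len(y)))   # number of inliers to keep
        inliers = y[np.argsort(resid)[:k]]    # lowest-residual subset
        est = np.mean(inliers)                # refit on inliers only
    return est
```

With 20% of the data replaced by gross outliers, the trimmed estimate recovers the inlier value while the plain mean does not.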
no code implementations • CVPR 2023 • Su Wang, Chitwan Saharia, Ceslee Montgomery, Jordi Pont-Tuset, Shai Noy, Stefano Pellegrini, Yasumasa Onoe, Sarah Laszlo, David J. Fleet, Radu Soricut, Jason Baldridge, Mohammad Norouzi, Peter Anderson, William Chan
Through extensive human evaluation on EditBench, we find that object-masking during training leads to across-the-board improvements in text-image alignment -- such that Imagen Editor is preferred over DALL-E 2 and Stable Diffusion -- and, as a cohort, these models are better at object-rendering than text-rendering, and handle material/color/size attributes better than count/shape attributes.
1 code implementation • 19 Oct 2022 • Renjie Liao, Simon Kornblith, Mengye Ren, David J. Fleet, Geoffrey Hinton
We revisit the challenging problem of training Gaussian-Bernoulli restricted Boltzmann machines (GRBMs), introducing two innovations.
1 code implementation • ICCV 2023 • Ting Chen, Lala Li, Saurabh Saxena, Geoffrey Hinton, David J. Fleet
Panoptic segmentation assigns semantic and instance ID labels to every pixel of an image.
no code implementations • 5 Oct 2022 • Jonathan Ho, William Chan, Chitwan Saharia, Jay Whang, Ruiqi Gao, Alexey Gritsenko, Diederik P. Kingma, Ben Poole, Mohammad Norouzi, David J. Fleet, Tim Salimans
We present Imagen Video, a text-conditional video generation system based on a cascade of video diffusion models.
Ranked #1 on Video Generation on LAION-400M
1 code implementation • 15 Jun 2022 • Ting Chen, Saurabh Saxena, Lala Li, Tsung-Yi Lin, David J. Fleet, Geoffrey Hinton
Despite the diversity of these tasks, by formulating the output of each task as a sequence of discrete tokens with a unified interface, we show that one can train a neural network with a single model architecture and loss function on all these tasks, with no task-specific customization.
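A minimal sketch of the "outputs as discrete tokens" idea, using bounding boxes as an example: quantize each continuous coordinate into one of a fixed number of bins, so it becomes a token a sequence model can emit. The bin count and image size below are hypothetical, not the paper's settings:

```python
def box_to_tokens(box, num_bins=1000, image_size=640):
    """Quantize a box (xmin, ymin, xmax, ymax) in pixels into
    discrete coordinate tokens with a shared vocabulary."""
    return [min(int(c / image_size * num_bins), num_bins - 1) for c in box]

def tokens_to_box(tokens, num_bins=1000, image_size=640):
    # Dequantize: map each token back to its bin center in pixels.
    return [(t + 0.5) / num_bins * image_size for t in tokens]
```

Round-tripping a box through tokens loses at most half a bin width per coordinate, which is the price of the unified discrete interface.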
1 code implementation • 1 Jun 2022 • Shayan Shekarforoush, David B. Lindell, David J. Fleet, Marcus A. Brubaker
Coordinate networks like Multiplicative Filter Networks (MFNs) and BACON offer some control over the frequency spectrum used to represent continuous signals such as images or 3D volumes.
3 code implementations • 7 Apr 2022 • Jonathan Ho, Tim Salimans, Alexey Gritsenko, William Chan, Mohammad Norouzi, David J. Fleet
Generating temporally coherent high fidelity video is an important milestone in generative modeling research.
1 code implementation • CVPR 2022 • Klaus Greff, Francois Belletti, Lucas Beyer, Carl Doersch, Yilun Du, Daniel Duckworth, David J. Fleet, Dan Gnanapragasam, Florian Golemo, Charles Herrmann, Thomas Kipf, Abhijit Kundu, Dmitry Lagun, Issam Laradji, Hsueh-Ti Liu, Henning Meyer, Yishu Miao, Derek Nowrouzezahrai, Cengiz Oztireli, Etienne Pot, Noha Radwan, Daniel Rebain, Sara Sabour, Mehdi S. M. Sajjadi, Matan Sela, Vincent Sitzmann, Austin Stone, Deqing Sun, Suhani Vora, Ziyu Wang, Tianhao Wu, Kwang Moo Yi, Fangcheng Zhong, Andrea Tagliasacchi
Data is the driving force of machine learning, with the amount and quality of training data often being more important for the performance of a system than architecture and training details.
5 code implementations • 10 Nov 2021 • Chitwan Saharia, William Chan, Huiwen Chang, Chris A. Lee, Jonathan Ho, Tim Salimans, David J. Fleet, Mohammad Norouzi
We expect this standardized evaluation protocol to play a role in advancing image-to-image translation research.
Ranked #1 on Colorization on ImageNet ctest10k
6 code implementations • ICLR 2022 • Ting Chen, Saurabh Saxena, Lala Li, David J. Fleet, Geoffrey Hinton
We present Pix2Seq, a simple and generic framework for object detection.
Ranked #77 on Object Detection on COCO minival (using extra training data)
no code implementations • 30 May 2021 • Jonathan Ho, Chitwan Saharia, William Chan, David J. Fleet, Mohammad Norouzi, Tim Salimans
We show that cascaded diffusion models are capable of generating high fidelity images on the class-conditional ImageNet generation benchmark, without any assistance from auxiliary image classifiers to boost sample quality.
Ranked #2 on Image Generation on ImageNet 64x64
4 code implementations • 15 Apr 2021 • Chitwan Saharia, Jonathan Ho, William Chan, Tim Salimans, David J. Fleet, Mohammad Norouzi
We present SR3, an approach to image Super-Resolution via Repeated Refinement.
1 code implementation • 17 Feb 2021 • Fartash Faghri, Sven Gowal, Cristina Vasconcelos, David J. Fleet, Fabian Pedregosa, Nicolas Le Roux
We demonstrate that the choice of optimizer, neural network architecture, and regularizer significantly affect the adversarial robustness of linear neural networks, providing guarantees without the need for adversarial training.
no code implementations • 27 Nov 2020 • Sara Sabour, Andrea Tagliasacchi, Soroosh Yazdani, Geoffrey E. Hinton, David J. Fleet
Capsule networks aim to parse images into a hierarchy of objects, parts and relations.
1 code implementation • 9 Jul 2020 • Fartash Faghri, David Duvenaud, David J. Fleet, Jimmy Ba
We introduce a method, Gradient Clustering, to minimize the variance of the average mini-batch gradient with stratified sampling.
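The variance-reduction mechanism behind stratified sampling can be shown on a toy mean-estimation problem (an illustrative sketch, not the paper's gradient-clustering algorithm): sampling a fixed number of points from each cluster, instead of uniformly from the pooled data, removes the between-cluster sampling noise.

```python
import random
import statistics

def mean_estimate_srs(data, n, rng):
    # Simple random sampling: estimate the mean from n uniform draws.
    return statistics.mean(rng.sample(data, n))

def mean_estimate_stratified(strata, per_stratum, rng):
    # Stratified sampling: draw per_stratum points from each stratum
    # and combine stratum means weighted by stratum size.
    total = sum(len(s) for s in strata)
    est = 0.0
    for s in strata:
        est += len(s) / total * statistics.mean(rng.sample(s, per_stratum))
    return est
```

When the strata (here, clusters of similar values, standing in for clusters of similar gradients) are well separated, the stratified estimator has far lower variance for the same sample budget.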
1 code implementation • NeurIPS 2020 • Sajad Norouzi, David J. Fleet, Mohammad Norouzi
We introduce Exemplar VAEs, a family of generative models that bridge the gap between parametric and non-parametric, exemplar based generative models.
1 code implementation • 18 Feb 2020 • Micha Livne, Kevin Swersky, David J. Fleet
MIM learning encourages high mutual information between observations and latent variables, and is robust against posterior collapse.
Ranked #1 on Question Answering on YahooCQA (using extra training data)
1 code implementation • 8 Oct 2019 • Micha Livne, Kevin Swersky, David J. Fleet
Experiments show that MIM learns representations with high mutual information, consistent encoding and decoding distributions, effective latent clustering, and data log likelihood comparable to VAE, while avoiding posterior collapse.
no code implementations • 4 Oct 2019 • Micha Livne, Kevin Swersky, David J. Fleet
We introduce the Mutual Information Machine (MIM), a novel formulation of representation learning, using a joint distribution over the observations and latent state in an encoder/decoder framework.
no code implementations • 4 Dec 2018 • Micha Livne, Leonid Sigal, Marcus A. Brubaker, David J. Fleet
To our knowledge, this is the first approach to take physics into account without explicit a priori knowledge of the environment or body dimensions.
no code implementations • 5 Nov 2018 • Micha Livne, David J. Fleet
Unlike autoencoders, the bottleneck does not limit model expressiveness, similar to flow-based maximum likelihood models.
10 code implementations • 18 Jul 2017 • Fartash Faghri, David J. Fleet, Jamie Ryan Kiros, Sanja Fidler
We present a new technique for learning visual-semantic embeddings for cross-modal retrieval.
Ranked #23 on Cross-Modal Retrieval on Flickr30k
no code implementations • 24 Nov 2015 • Yanshuai Cao, David J. Fleet
We introduce a framework for analyzing transductive combination of Gaussian process (GP) experts, where independently trained GP experts are combined in a way that depends on test point location, in order to scale GPs to big data.
2 code implementations • 16 Nov 2015 • Sara Sabour, Yanshuai Cao, Fartash Faghri, David J. Fleet
We show that the representation of an image in a deep neural network (DNN) can be manipulated to mimic those of other natural images, with only minor, imperceptible perturbations to the original image.
no code implementations • NeurIPS 2015 • Mohammad Norouzi, Maxwell D. Collins, Matthew Johnson, David J. Fleet, Pushmeet Kohli
In this paper, we present an algorithm for optimizing the split functions at all levels of the tree jointly with the leaf parameters, based on a global objective.
no code implementations • 19 Jun 2015 • Mohammad Norouzi, Maxwell D. Collins, David J. Fleet, Pushmeet Kohli
We develop a convex-concave upper bound on the classification loss for a one-level decision tree, and optimize the bound by stochastic gradient descent at each internal node of the tree.
no code implementations • CVPR 2015 • Marcus A. Brubaker, Ali Punjani, David J. Fleet
A new framework for estimation is introduced which relies on modern stochastic optimization techniques to scale to large datasets.
no code implementations • 28 Oct 2014 • Yanshuai Cao, David J. Fleet
In this work, we propose a generalized product of experts (gPoE) framework for combining the predictions of multiple probabilistic models.
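A product of Gaussian experts has a simple closed form: precisions add, and means combine precision-weighted. The sketch below uses a plain per-expert weight as a stand-in for the paper's weighting scheme (which is chosen per test point), so treat the weights as hypothetical:

```python
def gpoe_combine(mus, vars_, weights):
    """Combine Gaussian expert predictions (means, variances) in a
    generalized product-of-experts. Each expert's precision 1/var is
    scaled by its weight; weights here are a simplified stand-in for
    the paper's test-point-dependent weighting."""
    precs = [w / v for w, v in zip(weights, vars_)]
    prec = sum(precs)                                   # combined precision
    mu = sum(p * m for p, m in zip(precs, mus)) / prec  # weighted mean
    return mu, 1.0 / prec
```

Two equally weighted, equally confident experts at 0 and 2 combine to a prediction at 1 with half the variance of either expert.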
no code implementations • CVPR 2014 • Gerard Pons-Moll, David J. Fleet, Bodo Rosenhahn
We advocate the inference of qualitative information about 3D human pose, called posebits, from images.
no code implementations • NeurIPS 2013 • Yanshuai Cao, Marcus A. Brubaker, David J. Fleet, Aaron Hertzmann
We propose an efficient optimization algorithm for selecting a subset of training data to induce sparsity for Gaussian process regression.
2 code implementations • 11 Jul 2013 • Mohammad Norouzi, Ali Punjani, David J. Fleet
There is growing interest in representing image data and feature descriptors using compact binary codes for fast near neighbor search.
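What makes compact binary codes fast in practice is that comparing two codes reduces to an XOR and a popcount. A minimal brute-force Hamming-space search (illustrative; the paper is about more sophisticated multi-index search over such codes):

```python
def hamming(a, b):
    # Hamming distance between two binary codes stored as Python ints:
    # XOR the codes, then count the set bits.
    return bin(a ^ b).count("1")

def nearest(query, codes):
    # Exhaustive exact nearest neighbor in Hamming space;
    # returns the index of the closest code.
    return min(range(len(codes)), key=lambda i: hamming(query, codes[i]))
```

Even this naive scan is cheap per comparison, which is why binary codes are attractive for large-scale near neighbor search.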
1 code implementation • CVPR 2013 • Mohammad Norouzi, David J. Fleet
A fundamental limitation of quantization techniques like the k-means clustering algorithm is the storage and runtime cost associated with the large numbers of clusters required to keep quantization errors small and model fidelity high.
no code implementations • NeurIPS 2012 • Mohammad Norouzi, David J. Fleet, Ruslan R. Salakhutdinov
Motivated by large-scale multimedia applications we propose to learn mappings from high-dimensional data to binary codes that preserve semantic similarity.