no code implementations • 1 Apr 2024 • Deqing Fu, Ghazal Khalighinejad, Ollie Liu, Bhuwan Dhingra, Dani Yogatama, Robin Jia, Willie Neiswanger
Current foundation models exhibit impressive capabilities when prompted either with text only or with both image and text inputs.
no code implementations • 11 Mar 2024 • Bhavya Vasudeva, Deqing Fu, Tianyi Zhou, Elliott Kau, Youqi Huang, Vatsal Sharan
Transformers achieve state-of-the-art accuracy and robustness across many tasks, but an understanding of their inductive biases, and of how those biases differ from those of other neural network architectures, remains elusive.
no code implementations • 4 Feb 2024 • Ollie Liu, Deqing Fu, Dani Yogatama, Willie Neiswanger
Large language models (LLMs) are increasingly used across society, including in domains like business, engineering, and medicine.
no code implementations • 29 Nov 2023 • Jiao Sun, Deqing Fu, Yushi Hu, Su Wang, Royi Rassin, Da-Cheng Juan, Dana Alon, Charles Herrmann, Sjoerd van Steenkiste, Ranjay Krishna, Cyrus Rashtchian
Then, it uses two VLMs to select the best generation: a Visual Question Answering model that measures the alignment of generated images to the text, and another that measures the generation's aesthetic quality.
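The two-scorer selection step can be sketched generically; the scorer callables below are hypothetical stand-ins for the VQA alignment model and the aesthetic model, not the paper's actual implementation:

```python
def select_best_image(images, prompt, alignment_score, aesthetic_score):
    """Rank candidate generations by combining a text-image alignment
    score (e.g., from a VQA model queried about the prompt) with an
    aesthetic-quality score, and return the top-scoring candidate.

    `alignment_score(img, prompt)` and `aesthetic_score(img)` are
    assumed to return higher-is-better floats.
    """
    def total(img):
        return alignment_score(img, prompt) + aesthetic_score(img)

    return max(images, key=total)
```

An equal-weight sum is the simplest combination rule; a weighted sum or a two-stage filter (alignment first, aesthetics as tie-breaker) would slot into the same interface.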
no code implementations • 26 Oct 2023 • Deqing Fu, Tian-Qi Chen, Robin Jia, Vatsal Sharan
In this paper, we instead demonstrate that Transformers learn to implement higher-order optimization methods to perform ICL.
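As a point of reference for what a higher-order method looks like on a simple in-context task, here is a minimal sketch of iterative Newton's method for least-squares linear regression (the Newton–Schulz iteration for a matrix inverse); the function and its setup are illustrative, not the paper's probing methodology:

```python
import numpy as np

def iterative_newton_least_squares(X, y, steps=30):
    """Solve min_w ||Xw - y||^2 with the Newton-Schulz iteration
    M_{k+1} = M_k (2I - A M_k), which converges to A^{-1} for
    A = X^T X given a suitable initialization, so that
    w = A^{-1} X^T y."""
    A = X.T @ X
    # Standard initialization guaranteeing ||I - A M_0|| < 1:
    M = A.T / (np.linalg.norm(A, 1) * np.linalg.norm(A, np.inf))
    I = np.eye(A.shape[0])
    for _ in range(steps):
        M = M @ (2 * I - A @ M)
    return M @ (X.T @ y)
```

Each iteration roughly squares the approximation error once it falls below one, which is what distinguishes such higher-order methods from the linear convergence of first-order gradient descent.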
1 code implementation • 13 May 2023 • Deqing Fu, Ameya Godbole, Robin Jia
In this work, we propose Self-labeled Counterfactuals for Extrapolating to Negative Examples (SCENE), an automatic method for synthesizing training data that greatly improves models' ability to detect challenging negative examples.
no code implementations • 22 Nov 2021 • Deqing Fu, Bradley J. Nelson
Dense prediction tasks such as depth perception and semantic segmentation are important computer vision applications with a concrete topological description: partitioning an image into connected components, or estimating a function with a small number of local extrema corresponding to objects in the image.
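The partition-into-components view can be made concrete with a toy example: counting the 4-connected components of a binary segmentation mask via breadth-first flood fill (a minimal illustration of the topological description, not the paper's method):

```python
from collections import deque

def count_components(mask):
    """Count 4-connected components of truthy pixels in a 2D binary
    mask -- the discrete analogue of partitioning an image into
    connected components."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    n = 0
    for i in range(h):
        for j in range(w):
            if mask[i][j] and not seen[i][j]:
                n += 1  # found a new component; flood-fill it
                seen[i][j] = True
                queue = deque([(i, j)])
                while queue:
                    x, y = queue.popleft()
                    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        nx, ny = x + dx, y + dy
                        if (0 <= nx < h and 0 <= ny < w
                                and mask[nx][ny] and not seen[nx][ny]):
                            seen[nx][ny] = True
                            queue.append((nx, ny))
    return n
```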
no code implementations • ICCV 2021 • Cooper Nederhood, Nicholas Kolkin, Deqing Fu, Jason Salavon
Multi-modal domain translation typically refers to synthesizing a novel image that inherits certain localized attributes from a 'content' image (e.g. layout, semantics, or geometry), and inherits everything else (e.g. texture, lighting, sometimes even semantics) from a 'style' image.