Search Results for author: Shimon Ullman

Found 33 papers, 8 papers with code

Towards Multimodal In-Context Learning for Vision & Language Models

no code implementations 19 Mar 2024 Sivan Doveh, Shaked Perek, M. Jehanzeb Mirza, Amit Alfassy, Assaf Arbelle, Shimon Ullman, Leonid Karlinsky

Inspired by the emergence of Large Language Models (LLMs) that can truly understand human language, significant progress has been made in aligning other, non-language modalities to be 'understandable' by an LLM. This is done primarily by converting their samples into sequences of embedded language-like tokens that are fed directly into the LLM (decoder) input stream.

In-Context Learning
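
The alignment recipe sketched in the abstract, projecting non-language samples into language-like token embeddings for the decoder, can be illustrated with a minimal PyTorch sketch. The module name and dimensions below are illustrative assumptions, not this paper's architecture:

```python
import torch
import torch.nn as nn

class VisionToLLMProjector(nn.Module):
    """Map vision-encoder patch features to the LLM's embedding width so
    image content can be fed to the decoder as 'language-like' tokens."""
    def __init__(self, vision_dim: int = 1024, llm_dim: int = 4096):
        super().__init__()
        self.proj = nn.Linear(vision_dim, llm_dim)

    def forward(self, patch_feats: torch.Tensor) -> torch.Tensor:
        # patch_feats: (batch, n_patches, vision_dim)
        return self.proj(patch_feats)  # (batch, n_patches, llm_dim)

# Illustrative usage: prepend projected image tokens to text embeddings
# before the LLM decoder (both tensors are random stand-ins here).
projector = VisionToLLMProjector()
image_tokens = projector(torch.randn(1, 256, 1024))  # pseudo vision features
text_embeds = torch.randn(1, 32, 4096)               # pseudo text embeddings
decoder_input = torch.cat([image_tokens, text_embeds], dim=1)
print(decoder_input.shape)  # torch.Size([1, 288, 4096])
```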

Efficient Rehearsal Free Zero Forgetting Continual Learning using Adaptive Weight Modulation

no code implementations 26 Nov 2023 Yonatan Sverdlov, Shimon Ullman

This challenge arises due to the tendency of previously learned weights to be adjusted to suit the objectives of new tasks, resulting in a phenomenon called catastrophic forgetting.

Continual Learning
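
The catastrophic-forgetting phenomenon the abstract describes can be reproduced with a toy example: a single weight fit to one task drifts away from it once trained on a second. This only demonstrates the problem; it is not the paper's Adaptive Weight Modulation method:

```python
import numpy as np

# Toy demonstration of catastrophic forgetting: one scalar weight trained
# by SGD, first to fit task A (target 1.0), then task B (target -1.0).
# Task A's error grows as task B is learned.
def sgd(w, target, steps=100, lr=0.1):
    for _ in range(steps):
        w -= lr * 2 * (w - target)  # gradient of (w - target)^2
    return w

w = 0.0
w = sgd(w, target=1.0)
print(f"after task A: w={w:.3f}, task-A error={(w - 1.0)**2:.4f}")
w = sgd(w, target=-1.0)
print(f"after task B: w={w:.3f}, task-A error={(w - 1.0)**2:.4f}")  # forgotten
```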

Top-Down Network Combines Back-Propagation with Attention

1 code implementation 4 Jun 2023 Roy Abel, Shimon Ullman

For example, during multi-task learning, the same top-down network is used both for learning, by propagating feedback signals, and for top-down attention, by guiding the bottom-up network to perform a selected task.

Multi-Task Learning
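
A minimal sketch of the dual role described above: a top-down pathway, conditioned on a task, gates the bottom-up features (attention) while its weights are trained like any other layer (learning). The layer sizes and sigmoid-gating form are assumptions, not the paper's exact design:

```python
import torch
import torch.nn as nn

class GatedBottomUpTopDown(nn.Module):
    """Schematic of a top-down pathway that gates a bottom-up network per
    task (attention) while also carrying learnable feedback weights."""
    def __init__(self, in_dim=64, hid=128, n_tasks=3, n_classes=10):
        super().__init__()
        self.bu1 = nn.Linear(in_dim, hid)    # bottom-up layers
        self.bu2 = nn.Linear(hid, n_classes)
        self.task_embed = nn.Embedding(n_tasks, hid)
        self.td = nn.Linear(hid, hid)        # top-down layer producing gates

    def forward(self, x, task_id):
        gate = torch.sigmoid(self.td(self.task_embed(task_id)))  # task-driven gate
        h = torch.relu(self.bu1(x)) * gate   # top-down attention on features
        return self.bu2(h)

model = GatedBottomUpTopDown()
logits = model(torch.randn(4, 64), torch.tensor([1, 1, 0, 2]))
print(logits.shape)  # torch.Size([4, 10])
```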

A model for full local image interpretation

no code implementations 17 Oct 2021 Guy Ben-Yosef, Liav Assif, Daniel Harari, Shimon Ullman

We describe a computational model of humans' ability to provide a detailed interpretation of components in a scene.

Image interpretation by iterative bottom-up top-down processing

1 code implementation 12 May 2021 Shimon Ullman, Liav Assif, Alona Strugatski, Ben-Zion Vatashsky, Hila Levy, Aviv Netanyahu, Adam Yaari

Scene understanding requires the extraction and representation of scene components together with their properties and inter-relations.

Scene Understanding
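
One way to picture the representation the abstract calls for is a small scene-graph structure holding components, their properties, and their inter-relations. The field names and example entries below are illustrative, not taken from the paper:

```python
from dataclasses import dataclass, field

# Scene components with properties, plus inter-component relations.
@dataclass
class SceneComponent:
    name: str
    properties: dict = field(default_factory=dict)

@dataclass
class SceneGraph:
    components: list = field(default_factory=list)
    relations: list = field(default_factory=list)  # (subject, predicate, object)

scene = SceneGraph(
    components=[SceneComponent("person", {"pose": "sitting"}),
                SceneComponent("chair", {"color": "red"})],
    relations=[("person", "sitting_on", "chair")],
)
print(scene.relations[0])  # ('person', 'sitting_on', 'chair')
```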

What can human minimal videos tell us about dynamic recognition models?

1 code implementation 19 Apr 2021 Guy Ben-Yosef, Gabriel Kreiman, Shimon Ullman

In human vision, objects and their parts can be visually recognized from purely spatial or purely temporal information, but the mechanisms integrating space and time are poorly understood.

Multi-Task Learning by a Top-Down Control Network

no code implementations 9 Feb 2020 Hila Levi, Shimon Ullman

As the range of tasks performed by a general vision system expands, executing multiple tasks accurately and efficiently in a single network has become an important and still open problem.

Multi-Task Learning

Task-Based Top-Down Modulation Network for Multi-Task-Learning Applications

no code implementations 25 Sep 2019 Hila Levi, Shimon Ullman

Recent approaches address this problem by channel-wise modulation of the feature maps along the shared backbone, using task-specific vectors that are manually or dynamically tuned.

Multi-Task Learning
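
Channel-wise modulation with task-specific vectors, as described above, can be sketched in a few lines of PyTorch; the exact placement and tuning scheme in the paper may differ:

```python
import torch
import torch.nn as nn

class ChannelModulation(nn.Module):
    """Channel-wise modulation of a shared backbone's feature maps by a
    task-specific vector (the general technique, exact form assumed)."""
    def __init__(self, n_tasks: int, n_channels: int):
        super().__init__()
        # one learned modulation vector per task, initialized to identity (1.0)
        self.task_vectors = nn.Embedding(n_tasks, n_channels)
        nn.init.ones_(self.task_vectors.weight)

    def forward(self, feats: torch.Tensor, task_id: torch.Tensor) -> torch.Tensor:
        # feats: (batch, channels, H, W); scale each channel per task
        scale = self.task_vectors(task_id).unsqueeze(-1).unsqueeze(-1)
        return feats * scale

mod = ChannelModulation(n_tasks=4, n_channels=256)
out = mod(torch.randn(2, 256, 32, 32), torch.tensor([0, 3]))
print(out.shape)  # torch.Size([2, 256, 32, 32])
```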

The Cakewalk Method

no code implementations ICLR 2019 Uri Patish, Shimon Ullman

Notably, we show in this benchmark that fixing the distribution of the surrogate is key to consistently recovering locally optimal solutions, and that our surrogate objective leads to an algorithm that outperforms the other methods we tested on a number of measures.

Combinatorial Optimization

Efficient Coarse-to-Fine Non-Local Module for the Detection of Small Objects

no code implementations 29 Nov 2018 Hila Levi, Shimon Ullman

An image is not just a collection of objects, but rather a graph where each object is related to other objects through spatial and semantic relations.

Object Detection +1
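
The abstract's view of an image as a graph of inter-related objects underlies non-local modules. Below is the generic non-local (self-attention) block relating every spatial position to every other, for reference; it is not the paper's coarse-to-fine variant:

```python
import torch
import torch.nn as nn

class NonLocalBlock(nn.Module):
    """Minimal non-local block: pairwise attention over all positions."""
    def __init__(self, channels: int):
        super().__init__()
        self.theta = nn.Conv2d(channels, channels // 2, 1)
        self.phi = nn.Conv2d(channels, channels // 2, 1)
        self.g = nn.Conv2d(channels, channels // 2, 1)
        self.out = nn.Conv2d(channels // 2, channels, 1)

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.theta(x).flatten(2).transpose(1, 2)  # (b, hw, c/2)
        k = self.phi(x).flatten(2)                    # (b, c/2, hw)
        v = self.g(x).flatten(2).transpose(1, 2)      # (b, hw, c/2)
        attn = torch.softmax(q @ k / (c // 2) ** 0.5, dim=-1)  # pairwise relations
        y = (attn @ v).transpose(1, 2).reshape(b, c // 2, h, w)
        return x + self.out(y)  # residual connection

block = NonLocalBlock(64)
print(block(torch.randn(1, 64, 16, 16)).shape)  # torch.Size([1, 64, 16, 16])
```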

VQA with no questions-answers training

1 code implementation CVPR 2020 Ben-Zion Vatashsky, Shimon Ullman

Methods for teaching machines to answer visual questions have made significant progress in recent years, but current methods still lack important human capabilities, including integrating new visual classes and concepts in a modular manner, providing explanations for the answers, and handling new domains without explicit examples.

Visual Question Answering (VQA)

Understand, Compose and Respond - Answering Visual Questions by a Composition of Abstract Procedures

no code implementations 25 Oct 2018 Ben-Zion Vatashsky, Shimon Ullman

An image-related question defines a specific visual task that is required in order to produce an appropriate answer.

Discovery and usage of joint attention in images

no code implementations 10 Apr 2018 Daniel Harari, Joshua B. Tenenbaum, Shimon Ullman

Second, we use a human study to demonstrate human sensitivity to joint attention, suggesting that detecting such a configuration in an image can be useful for understanding the image, including the goals of the agents and their joint activity, and can therefore contribute to image captioning and related tasks.

Image Captioning

Large Field and High Resolution: Detecting Needle in Haystack

no code implementations 10 Apr 2018 Hadar Gorodissky, Daniel Harari, Shimon Ullman

The growing use of convolutional neural networks (CNNs) for a broad range of visual tasks, including tasks involving fine details, raises the problem of applying such networks to a large field of view, since the amount of computation increases significantly with the number of pixels.

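The scaling issue the abstract raises is easy to quantify: the cost of a convolutional layer grows linearly with pixel count, hence quadratically with the side of the field of view. A back-of-the-envelope calculation (the layer sizes are arbitrary):

```python
# Cost of a 3x3 conv layer as the field of view grows: FLOPs scale
# linearly with pixel count, i.e. quadratically with image side.
def conv_flops(side, c_in=64, c_out=64, k=3):
    return side * side * c_in * c_out * k * k * 2  # multiply-adds

for side in (224, 448, 896):
    print(f"{side}x{side}: {conv_flops(side) / 1e9:.1f} GFLOPs")
# 224x224: 3.7 GFLOPs; doubling the side quadruples the cost
```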

Cakewalk Sampling

no code implementations 25 Feb 2018 Uri Patish, Shimon Ullman

We study the task of finding good local optima in combinatorial optimization problems.

Clustering, Combinatorial Optimization +1

A model for interpreting social interactions in local image regions

no code implementations 26 Dec 2017 Guy Ben-Yosef, Alon Yachin, Shimon Ullman

Understanding social interactions (such as 'hug' or 'fight') is a basic and important capacity of the human visual system, but a challenging and still open problem for modeling.

Structured learning and detailed interpretation of minimal object images

no code implementations 29 Nov 2017 Guy Ben-Yosef, Liav Assif, Shimon Ullman

We model the process of human full interpretation of object images, namely the ability to identify and localize all semantic features and parts that are recognized by human observers.

Measuring and modeling the perception of natural and unconstrained gaze in humans and machines

no code implementations 29 Nov 2016 Daniel Harari, Tao Gao, Nancy Kanwisher, Joshua Tenenbaum, Shimon Ullman

How accurate are humans in determining the gaze direction of others in lifelike scenes, when they can move their heads and eyes freely, and what are the sources of information for the underlying perceptual processes?

Discovering containment: from infants to machines

no code implementations 30 Oct 2016 Shimon Ullman, Nimrod Dorfman, Daniel Harari

Current artificial learning systems can recognize thousands of visual categories, or play Go at a champion's level, but cannot explain infants' learning, in particular the ability to learn complex concepts without guidance, in a specific order.

Human Pose Estimation using Deep Consensus Voting

no code implementations 27 Mar 2016 Ita Lifshitz, Ethan Fetaya, Shimon Ullman

In this paper we consider the problem of human pose estimation from a single still image.

Pose Estimation, Position

Do You See What I Mean? Visual Resolution of Linguistic Ambiguities

no code implementations EMNLP 2015 Yevgeni Berzak, Andrei Barbu, Daniel Harari, Boris Katz, Shimon Ullman

Understanding language goes hand in hand with the ability to integrate complex contextual information obtained via perception.

Sentence

Visual Concept Recognition and Localization via Iterative Introspection

no code implementations 14 Mar 2016 Amir Rosenfeld, Shimon Ullman

Convolutional neural networks have been shown to develop internal representations that correspond closely to semantically meaningful objects and parts, even though they are trained solely on class labels.

General Classification

Face-space Action Recognition by Face-Object Interactions

no code implementations 17 Jan 2016 Amir Rosenfeld, Shimon Ullman

Action recognition in still images has seen major improvement in recent years due to advances in human pose estimation, object recognition and stronger feature representations.

Action Recognition In Still Images, Object +2

Hand-Object Interaction and Precise Localization in Transitive Action Recognition

no code implementations 12 Nov 2015 Amir Rosenfeld, Shimon Ullman

In this paper we demonstrate how recognition is improved by obtaining precise localization of the action-object and consequently extracting details of the object shape together with the actor-object interaction.

Action Recognition In Still Images, Object +3

Learning Local Invariant Mahalanobis Distances

no code implementations 4 Feb 2015 Ethan Fetaya, Shimon Ullman

For many tasks and data types, there are natural transformations to which the data should be invariant or insensitive.

BIG-bench Machine Learning, Translation
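
For reference, a Mahalanobis distance is d_M(x, y) = sqrt((x - y)^T M (x - y)) for a positive semi-definite matrix M, presumably learned in the paper so that the distance is insensitive to the chosen transformations. A minimal numeric check with a fixed, illustrative M:

```python
import numpy as np

# Mahalanobis distance with a fixed illustrative PSD matrix M
# (a learned M would replace it in the paper's setting).
def mahalanobis(x, y, M):
    d = x - y
    return float(np.sqrt(d @ M @ d))

M = np.array([[2.0, 0.5],
              [0.5, 1.0]])   # any PSD matrix defines a valid metric
x, y = np.array([1.0, 0.0]), np.array([0.0, 1.0])
print(mahalanobis(x, y, M))  # ~1.414
```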

When Computer Vision Gazes at Cognition

1 code implementation 8 Dec 2014 Tao Gao, Daniel Harari, Joshua Tenenbaum, Shimon Ullman

(1) Human accuracy in discriminating targets 8°-10° of visual angle apart is around 40% in a free-looking gaze task; (2) the ability to interpret the gaze of different lookers varies dramatically; (3) this variance can be captured by the computational model; (4) humans outperform the current model significantly.

Task 2

Graph Approximation and Clustering on a Budget

no code implementations 10 Jun 2014 Ethan Fetaya, Ohad Shamir, Shimon Ullman

We consider the problem of learning from a similarity matrix (such as spectral clustering and low-dimensional embedding), when computing pairwise similarities is costly, and only a limited number of entries can be observed.

Clustering
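
A bare-bones stand-in for the budgeted setting described above: observe only a random subset of pairwise similarities, then run a spectral step on the partially observed matrix. This illustrates the setting, not the paper's algorithms:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two noisy clusters; true similarity is 1.0 within a cluster, 0.1 across.
labels_true = np.array([0] * 10 + [1] * 10)
S_full = np.where(labels_true[:, None] == labels_true[None, :], 1.0, 0.1)

# Budget: observe only a random subset of pairwise similarities.
n = len(labels_true)
mask = rng.random((n, n)) < 0.3         # ~30% of entries observed
S = np.where(mask | mask.T, S_full, 0)  # unobserved entries treated as 0

# Spectral step: the sign of the second Laplacian eigenvector (Fiedler
# vector) recovers the two clusters from the partial matrix.
Lap = np.diag(S.sum(axis=1)) - S        # unnormalized graph Laplacian
_, eigvecs = np.linalg.eigh(Lap)
pred = (eigvecs[:, 1] > 0).astype(int)
agreement = max((pred == labels_true).mean(), (pred != labels_true).mean())
print(f"cluster agreement: {agreement:.2f}")
```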

Using body-anchored priors for identifying actions in single images

no code implementations NeurIPS 2010 Leonid Karlinsky, Michael Dinerstein, Shimon Ullman

The task is easy for humans but difficult for current approaches to object recognition, because action instances may be similar in terms of body pose, and often require detailed examination of relations between participating objects and body parts in order to be recognized.

Object Recognition
