1 code implementation • 31 May 2023 • Young-Jin Park, Hao Wang, Shervin Ardeshir, Navid Azizan
Quantifying the reliability of these representations is crucial, as many downstream models rely on them as input for their own tasks.
no code implementations • 4 May 2023 • Shervin Ardeshir
We then extract a semantically meaningful representation for each training data point (such as CLIP embeddings from its visual encoder) and train a lightweight diagnosis model that maps this representation to the data point's task loss.
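A minimal sketch of this idea, not the paper's implementation: the embeddings and losses below are synthetic stand-ins for CLIP features and per-sample task losses, and the ridge regressor is an illustrative choice of "lightweight diagnosis model".

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

# Hypothetical stand-ins: 512-d "CLIP-like" embeddings and per-sample task losses.
emb = rng.normal(size=(200, 512))
w_true = rng.normal(size=512)
loss = emb @ w_true * 0.01 + rng.normal(scale=0.05, size=200)  # synthetic losses

# Lightweight diagnosis model: predict a data point's task loss from its embedding.
diagnoser = Ridge(alpha=1.0).fit(emb[:150], loss[:150])
pred = diagnoser.predict(emb[150:])

# Data points with high predicted loss are flagged as likely failure cases
# for the main model, without running the main model itself.
```

In this toy setup the diagnosis model only needs the embedding, so it can score unlabeled or held-out data cheaply.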
no code implementations • 7 Apr 2023 • Qi Qi, Shervin Ardeshir
When it comes to models directly trained on human faces, a sensitive confounder is that of human identities.
no code implementations • 12 Oct 2022 • Qi Qi, Shervin Ardeshir, Yi Xu, Tianbao Yang
Improving fairness between privileged and less-privileged sensitive attribute groups (e.g., {race, gender}) has attracted considerable attention.
no code implementations • 19 Jul 2022 • Shervin Ardeshir, Navid Azizan
In this work, we study whether the uncertainty of such a representation can be quantified for a single datapoint in a meaningful way.
no code implementations • 29 Apr 2022 • Mahdi M. Kalayeh, Shervin Ardeshir, Lingyi Liu, Nagendra Kamath, Ashok Chandrashekar
The abundance and ease of utilizing sound, along with the fact that auditory clues reveal a plethora of information about what happens in a scene, make the audio-visual space an intuitive choice for representation learning.
no code implementations • CVPR 2022 • Shervin Ardeshir, Cristina Segalin, Nathan Kallus
Performance of the model for each group is calculated by comparing $\hat{y}$ and $y$ for the datapoints within that group; as a result, the disparity in performance across the different groups can be calculated.
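The per-group comparison described above can be sketched as follows. This is an illustrative computation, not the paper's code; accuracy is used as the performance metric, and the group labels and predictions are made-up examples.

```python
import numpy as np

def group_disparity(y_true, y_pred, groups):
    """Per-group accuracy (comparing y_hat to y within each group)
    and the disparity: the max gap in accuracy across groups."""
    perf = {}
    for g in np.unique(groups):
        mask = groups == g
        perf[g] = float(np.mean(y_true[mask] == y_pred[mask]))
    return perf, max(perf.values()) - min(perf.values())

# Toy example: two groups "a" and "b" with binary labels.
y     = np.array([1, 0, 1, 1, 0, 1, 0, 0])
y_hat = np.array([1, 0, 1, 1, 0, 1, 1, 0])
grp   = np.array(["a", "a", "a", "a", "b", "b", "b", "b"])

perf, gap = group_disparity(y, y_hat, grp)
# perf -> {"a": 1.0, "b": 0.75}; gap -> 0.25
```

Any scalar metric (AUC, error rate) can replace accuracy inside the loop; the disparity is then the spread of that metric across groups.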
no code implementations • 13 Apr 2022 • Shervin Ardeshir, Nagendra Kamath, Hossein Taghavi
Prominence and interactions: Character(s) in the thumbnail should be important character(s) in the video, to prevent the algorithm from suggesting non-representative frames as candidates.
no code implementations • 14 Dec 2018 • Naji Khosravan, Shervin Ardeshir, Rohit Puri
To judge whether audio and video signals of a multimedia presentation are synchronized, we as humans often pay close attention to discriminative spatio-temporal blocks of the video (e.g., synchronizing the lip movement with the utterance of words, or the sound of a bouncing ball at the moment it hits the ground).
1 code implementation • 1 Dec 2018 • Mohamed Elfeki, Krishna Regmi, Shervin Ardeshir, Ali Borji
In this work, we introduce two datasets (synthetic and natural/real) containing simultaneously recorded egocentric and exocentric videos.
no code implementations • ECCV 2018 • Shervin Ardeshir, Ali Borji
Videos recorded from first person (egocentric) perspective have little visual appearance in common with those from third person perspective, especially with videos captured by top-view surveillance cameras.
no code implementations • 24 Dec 2016 • Shervin Ardeshir, Sandesh Sharma, Ali Borji
Human identification remains one of the most challenging tasks in the computer vision community due to drastic changes in visual features across different viewpoints, lighting conditions, occlusion, etc.
no code implementations • 17 Dec 2016 • Shervin Ardeshir, Krishna Regmi, Ali Borji
On one hand, the abundance of egocentric cameras in the past few years has offered the opportunity to study many vision problems from the first-person perspective.
no code implementations • 30 Aug 2016 • Shervin Ardeshir, Ali Borji
First, having a set of egocentric videos and a top-view video, can we verify if the top-view video contains all, or some of the egocentric viewers present in the egocentric set?
no code implementations • 24 Jul 2016 • Shervin Ardeshir, Ali Borji
At the same time, surveillance cameras and drones offer an abundance of visual information, often captured from top-view.
1 code implementation • CVPR 2015 • Shervin Ardeshir, Kofi Malcolm Collins-Sibley, Mubarak Shah
In this paper, we propose a method that leverages information acquired from GIS databases to perform semantic segmentation of the image while geo-referencing each semantic segment with its address and geo-location.
no code implementations • CVPR 2014 • Amir Roshan Zamir, Shervin Ardeshir, Mubarak Shah
We develop a robust method for identification and refinement of this subset using the rest of the images in the dataset.