Search Results for author: C. V. Jawahar

Found 106 papers, 40 papers with code

Recurrent Image Annotation With Explicit Inter-Label Dependencies

1 code implementation ECCV 2020 Ayushi Dutta, Yashaswi Verma, C. V. Jawahar

Additionally, it provides a new perspective: an unordered set of labels is treated as equivalent to a collection of different permutations (sequences) of those labels, which aligns naturally with the image annotation task.

Image Captioning
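The set-as-permutations view in the snippet above can be shown with a small Python sketch. This is a generic illustration, not the paper's model. The helper names `label_sequences` and `same_set` are hypothetical. The code lists every ordering of a label set. A sequence-based (e.g. recurrent) annotator could treat each of these orderings as an equally valid target.

```python
from itertools import permutations


def label_sequences(labels):
    """Return every ordering of an unordered label set.

    Each ordering describes the same set, so each is an equally
    valid target sequence for a sequence-based annotator.
    """
    ordered = sorted(set(labels))  # canonical order keeps output deterministic
    return [list(p) for p in permutations(ordered)]


def same_set(seq_a, seq_b):
    """Two label sequences are equivalent if they contain the same labels."""
    return set(seq_a) == set(seq_b)


seqs = label_sequences({"sky", "tree", "person"})
print(len(seqs))  # 3! = 6 orderings
print(seqs[0])    # ['person', 'sky', 'tree']
```

The number of orderings grows factorially with the set size. A practical system would therefore sample or constrain orderings instead of listing all of them.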

IDD-X: A Multi-View Dataset for Ego-relative Important Object Localization and Explanation in Dense and Unstructured Traffic

no code implementations 12 Apr 2024 Chirag Parikh, Rohit Saluja, C. V. Jawahar, Ravi Kiran Sarvadevabhatla

Intelligent vehicle systems require a deep understanding of the interplay between road conditions, surrounding entities, and the ego vehicle's driving behavior for safe and efficient navigation.

Object Object Localization

Towards Accurate Lip-to-Speech Synthesis in-the-Wild

no code implementations 2 Mar 2024 Sindhu Hegde, Rudrabha Mukhopadhyay, C. V. Jawahar, Vinay Namboodiri

In this paper, we introduce a novel approach to address the task of synthesizing speech from silent videos of any in-the-wild speaker solely based on lip movements.

Language Modelling Lip to Speech Synthesis +1

Multiple Instance Learning for Glioma Diagnosis using Hematoxylin and Eosin Whole Slide Images: An Indian Cohort Study

no code implementations 24 Feb 2024 Ekansh Chauhan, Amit Sharma, Megha S Uppin, C. V. Jawahar, P. K. Vinod

It establishes new performance benchmarks in glioma subtype classification across multiple datasets, including a novel dataset focused on the Indian demographic (IPD-Brain), providing a valuable resource for existing research.

Decision Making Management +2

Overcoming Label Noise for Source-free Unsupervised Video Domain Adaptation

no code implementations 30 Nov 2023 Avijit Dasgupta, C. V. Jawahar, Karteek Alahari

We use the source pre-trained model to generate pseudo-labels for the target domain samples, which are inevitably noisy.

Domain Adaptation

Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives

no code implementations 30 Nov 2023 Kristen Grauman, Andrew Westbury, Lorenzo Torresani, Kris Kitani, Jitendra Malik, Triantafyllos Afouras, Kumar Ashutosh, Vijay Baiyya, Siddhant Bansal, Bikram Boote, Eugene Byrne, Zach Chavis, Joya Chen, Feng Cheng, Fu-Jen Chu, Sean Crane, Avijit Dasgupta, Jing Dong, Maria Escobar, Cristhian Forigua, Abrham Gebreselasie, Sanjay Haresh, Jing Huang, Md Mohaiminul Islam, Suyog Jain, Rawal Khirodkar, Devansh Kukreja, Kevin J Liang, Jia-Wei Liu, Sagnik Majumder, Yongsen Mao, Miguel Martin, Effrosyni Mavroudi, Tushar Nagarajan, Francesco Ragusa, Santhosh Kumar Ramakrishnan, Luigi Seminara, Arjun Somayazulu, Yale Song, Shan Su, Zihui Xue, Edward Zhang, Jinxu Zhang, Angela Castillo, Changan Chen, Xinzhu Fu, Ryosuke Furuta, Cristina Gonzalez, Prince Gupta, Jiabo Hu, Yifei HUANG, Yiming Huang, Weslie Khoo, Anush Kumar, Robert Kuo, Sach Lakhavani, Miao Liu, Mi Luo, Zhengyi Luo, Brighid Meredith, Austin Miller, Oluwatumininu Oguntola, Xiaqing Pan, Penny Peng, Shraman Pramanick, Merey Ramazanova, Fiona Ryan, Wei Shan, Kiran Somasundaram, Chenan Song, Audrey Southerland, Masatoshi Tateno, Huiyu Wang, Yuchen Wang, Takuma Yagi, Mingfei Yan, Xitong Yang, Zecheng Yu, Shengxin Cindy Zha, Chen Zhao, Ziwei Zhao, Zhifan Zhu, Jeff Zhuo, Pablo Arbelaez, Gedas Bertasius, David Crandall, Dima Damen, Jakob Engel, Giovanni Maria Farinella, Antonino Furnari, Bernard Ghanem, Judy Hoffman, C. V. Jawahar, Richard Newcombe, Hyun Soo Park, James M. Rehg, Yoichi Sato, Manolis Savva, Jianbo Shi, Mike Zheng Shou, Michael Wray

We present Ego-Exo4D, a diverse, large-scale multimodal multiview video dataset and benchmark challenge.

Video Understanding

United We Stand, Divided We Fall: UnityGraph for Unsupervised Procedure Learning from Videos

no code implementations 6 Nov 2023 Siddhant Bansal, Chetan Arora, C. V. Jawahar

Given multiple videos of the same task, procedure learning addresses identifying the key-steps and determining their order to perform the task.

Procedure Learning

Explaining Deep Face Algorithms through Visualization: A Survey

no code implementations 26 Sep 2023 Thrupthi Ann John, Vineeth N Balasubramanian, C. V. Jawahar

Although current deep models for face tasks surpass human performance on some benchmarks, we do not understand how they work.

Understanding Video Scenes through Text: Insights from Text-based Video Question Answering

no code implementations 4 Sep 2023 Soumya Jahagirdar, Minesh Mathew, Dimosthenis Karatzas, C. V. Jawahar

Researchers have extensively studied the field of vision and language, discovering that both visual and textual content are crucial for understanding scenes effectively.

Domain Adaptation Question Answering +1

Towards Real-Time Analysis of Broadcast Badminton Videos

1 code implementation 23 Aug 2023 Nitin Nilesh, Tushar Sharma, Anurag Ghosh, C. V. Jawahar

In this work, we propose an end-to-end framework for player movement analysis for badminton matches on live broadcast match videos.

Reading Between the Lanes: Text VideoQA on the Road

no code implementations 8 Jul 2023 George Tom, Minesh Mathew, Sergi Garcia, Dimosthenis Karatzas, C. V. Jawahar

Text and signs around roads provide crucial information for drivers, vital for safe navigation and situational awareness.

Question Answering Scene Text Recognition +1

CueCAn: Cue Driven Contextual Attention For Identifying Missing Traffic Signs on Unconstrained Roads

no code implementations 5 Mar 2023 Varun Gupta, Anbumani Subramanian, C. V. Jawahar, Rohit Saluja

MTSVD is challenging compared to previous works in two respects: (i) traffic signs are generally not present in the vicinity of their cues, and (ii) traffic sign cues are diverse and unique.

Object Detection

A Fine-Grained Vehicle Detection (FGVD) Dataset for Unconstrained Roads

1 code implementation 30 Dec 2022 Prafful Kumar Khoba, Chirag Parikh, Rohit Saluja, Ravi Kiran Sarvadevabhatla, C. V. Jawahar

Along with providing baseline results for existing object detectors on FGVD Dataset, we also present the results of a combination of an existing detector and the recent Hierarchical Residual Network (HRN) classifier for the FGVD task.

Towards Robust Handwritten Text Recognition with On-the-fly User Participation

no code implementations 17 Dec 2022 Ajoy Mondal, Rohit Saluja, C. V. Jawahar

The service providers encourage the users who provide data where the OCR model fails by rewarding them based on data complexity, readability, and available budget.

Handwritten Text Recognition Optical Character Recognition (OCR)

Enhancing Indic Handwritten Text Recognition Using Global Semantic Information

1 code implementation 15 Dec 2022 Ajoy Mondal, C. V. Jawahar

We use a semantic module in an encoder-decoder framework for extracting global semantic information to recognize the Indic handwritten texts.

Handwritten Text Recognition HTR +1

Information Retrieval from the Digitized Books

no code implementations 2 Dec 2022 Riya Gupta, C. V. Jawahar

Extracting the relevant information out of a large number of documents is a challenging and tedious task.

Image Retrieval Information Retrieval +3

Watching the News: Towards VideoQA Models that can Read

no code implementations 10 Nov 2022 Soumya Jahagirdar, Minesh Mathew, Dimosthenis Karatzas, C. V. Jawahar

We demonstrate the limitations of current Scene Text VQA and VideoQA methods and propose ways to incorporate scene text information into VideoQA methods.

Question Answering Video Question Answering +1

Unsupervised Audio-Visual Lecture Segmentation

no code implementations 29 Oct 2022 Darshan Singh S, Anchit Gupta, C. V. Jawahar, Makarand Tapaswi

We formulate lecture segmentation as an unsupervised task that leverages visual, textual, and OCR cues from the lecture, while clip representations are fine-tuned on a pretext self-supervised task of matching the narration with the temporally aligned visual content.

Navigate Optical Character Recognition (OCR) +1

INR-V: A Continuous Representation Space for Video-based Generative Tasks

1 code implementation 29 Oct 2022 Bipasha Sen, Aditya Agarwal, Vinay P Namboodiri, C. V. Jawahar

In this work, we evaluate the space learned by INR-V on diverse generative tasks such as video interpolation, novel video generation, video inversion, and video inpainting against the existing baselines.

Video Generation Video Inpainting

IDD-3D: Indian Driving Dataset for 3D Unstructured Road Scenes

no code implementations 23 Oct 2022 Shubham Dokania, A. H. Abdul Hafez, Anbumani Subramanian, Manmohan Chandraker, C. V. Jawahar

Autonomous driving and assistance systems rely on annotated data from traffic and road scenarios to model and learn the various object relations in complex real-world scenarios.

3D Object Detection Autonomous Driving +2

Grounded Video Situation Recognition

no code implementations 19 Oct 2022 Zeeshan Khan, C. V. Jawahar, Makarand Tapaswi

Recently, Video Situation Recognition (VidSitu) was framed as a task for structured prediction of multiple events, their relationships, actions, and various verb-role pairs attached to descriptive entities.

Descriptive Structured Prediction +1

Lip-to-Speech Synthesis for Arbitrary Speakers in the Wild

no code implementations 1 Sep 2022 Sindhu B Hegde, K R Prajwal, Rudrabha Mukhopadhyay, Vinay P Namboodiri, C. V. Jawahar

With the help of multiple powerful discriminators that guide the training process, our generator learns to synthesize speech sequences in any voice for the lip movements of any person.

Lip to Speech Synthesis Speech Synthesis

FaceOff: A Video-to-Video Face Swapping System

1 code implementation 21 Aug 2022 Aditya Agarwal, Bipasha Sen, Rudrabha Mukhopadhyay, Vinay Namboodiri, C. V. Jawahar

To tackle this challenge, we introduce video-to-video (V2V) face-swapping, a novel task of face-swapping that can preserve (1) the identity and expressions of the source (actor) face video and (2) the background and pose of the target (double) video.

Face Swapping

Extreme-scale Talking-Face Video Upsampling with Audio-Visual Priors

1 code implementation 17 Aug 2022 Sindhu B Hegde, Rudrabha Mukhopadhyay, Vinay P Namboodiri, C. V. Jawahar

We show that when we process this $8\times8$ video with the right set of audio and image priors, we can obtain a full-length, $256\times256$ video.

Super-Resolution Video Compression

TRoVE: Transforming Road Scene Datasets into Photorealistic Virtual Environments

1 code implementation 16 Aug 2022 Shubham Dokania, Anbumani Subramanian, Manmohan Chandraker, C. V. Jawahar

We show that using annotations and visual cues from existing datasets, we can facilitate automated multi-modal data generation, mimicking real scene properties with high fidelity, along with mechanisms to diversify samples in a physically meaningful way.

Semantic Segmentation Synthetic Data Generation

My View is the Best View: Procedure Learning from Egocentric Videos

1 code implementation 22 Jul 2022 Siddhant Bansal, Chetan Arora, C. V. Jawahar

Instead, we propose to use the signal provided by the temporal correspondences between key-steps across videos.

Procedure Learning

Detecting, Tracking and Counting Motorcycle Rider Traffic Violations on Unconstrained Roads

1 code implementation 18 Apr 2022 Aman Goyal, Dev Agarwal, Anbumani Subramanian, C. V. Jawahar, Ravi Kiran Sarvadevabhatla, Rohit Saluja

In many Asian countries with unconstrained road traffic conditions, driving violations such as not wearing helmets and triple-riding are a significant source of fatalities involving motorcycles.

Classroom Slide Narration System

no code implementations 21 Jan 2022 Jobin K. V., Ajoy Mondal, C. V. Jawahar

With this information, we build a Classroom Slide Narration System (CSNS) to help VI students understand the slide content.

Image Segmentation Optical Character Recognition +4

Automatic Quantification and Visualization of Street Trees

1 code implementation 17 Jan 2022 Arpit Bahety, Rohit Saluja, Ravi Kiran Sarvadevabhatla, Anbumani Subramanian, C. V. Jawahar

We obtain TCDCA of 96.77% on the test videos, with a remarkable improvement of 22.58% over baseline, and demonstrate that our counting module's performance is close to human level.

Transfer Learning for Scene Text Recognition in Indian Languages

no code implementations 10 Jan 2022 Sanjana Gunna, Rohit Saluja, C. V. Jawahar

WRRs improve over the baselines by 8%, 4%, 5%, and 3% on the MLT-19 Hindi and Bangla datasets and the Gujarati and Tamil datasets.

Scene Text Recognition Transfer Learning

Towards Boosting the Accuracy of Non-Latin Scene Text Recognition

1 code implementation 10 Jan 2022 Sanjana Gunna, Rohit Saluja, C. V. Jawahar

Several controlled experiments are performed on English, by varying the number of (i) fonts to create the synthetic data and (ii) created word images.

Scene Text Recognition

Multi-Domain Incremental Learning for Semantic Segmentation

1 code implementation 23 Oct 2021 Prachi Garg, Rohit Saluja, Vineeth N Balasubramanian, Chetan Arora, Anbumani Subramanian, C. V. Jawahar

Recent efforts in multi-domain learning for semantic segmentation attempt to learn multiple geographical datasets in a universal, joint model.

Incremental Learning Scene Segmentation +1

Ego4D: Around the World in 3,000 Hours of Egocentric Video

5 code implementations CVPR 2022 Kristen Grauman, Andrew Westbury, Eugene Byrne, Zachary Chavis, Antonino Furnari, Rohit Girdhar, Jackson Hamburger, Hao Jiang, Miao Liu, Xingyu Liu, Miguel Martin, Tushar Nagarajan, Ilija Radosavovic, Santhosh Kumar Ramakrishnan, Fiona Ryan, Jayant Sharma, Michael Wray, Mengmeng Xu, Eric Zhongcong Xu, Chen Zhao, Siddhant Bansal, Dhruv Batra, Vincent Cartillier, Sean Crane, Tien Do, Morrie Doulaty, Akshay Erapalli, Christoph Feichtenhofer, Adriano Fragomeni, Qichen Fu, Abrham Gebreselasie, Cristina Gonzalez, James Hillis, Xuhua Huang, Yifei HUANG, Wenqi Jia, Weslie Khoo, Jachym Kolar, Satwik Kottur, Anurag Kumar, Federico Landini, Chao Li, Yanghao Li, Zhenqiang Li, Karttikeya Mangalam, Raghava Modhugu, Jonathan Munro, Tullie Murrell, Takumi Nishiyasu, Will Price, Paola Ruiz Puentes, Merey Ramazanova, Leda Sari, Kiran Somasundaram, Audrey Southerland, Yusuke Sugano, Ruijie Tao, Minh Vo, Yuchen Wang, Xindi Wu, Takuma Yagi, Ziwei Zhao, Yunyi Zhu, Pablo Arbelaez, David Crandall, Dima Damen, Giovanni Maria Farinella, Christian Fuegen, Bernard Ghanem, Vamsi Krishna Ithapu, C. V. Jawahar, Hanbyul Joo, Kris Kitani, Haizhou Li, Richard Newcombe, Aude Oliva, Hyun Soo Park, James M. Rehg, Yoichi Sato, Jianbo Shi, Mike Zheng Shou, Antonio Torralba, Lorenzo Torresani, Mingfei Yan, Jitendra Malik

We introduce Ego4D, a massive-scale egocentric video dataset and benchmark suite.

De-identification Ethics

Evaluating Computer Vision Techniques for Urban Mobility on Large-Scale, Unconstrained Roads

no code implementations 11 Sep 2021 Harish Rithish, Raghava Modhugu, Ranjith Reddy, Rohit Saluja, C. V. Jawahar

Conventional approaches for addressing road safety rely on manual interventions or immobile CCTV infrastructure.

DeepPocket: Ligand Binding Site Detection and Segmentation using 3D Convolutional Neural Networks

2 code implementations Journal of Chemical Information and Modeling 2021 Rishal Aggarwal, Akash Gupta, Vineeth Chelur, C. V. Jawahar, and U. Deva Priyakumar

A structure-based drug design pipeline involves the development of potential drug molecules or ligands that form stable complexes with a given receptor at its binding site.

ICDAR2019 Competition on Scanned Receipt OCR and Information Extraction

1 code implementation 18 Mar 2021 Zheng Huang, Kai Chen, Jianhua He, Xiang Bai, Dimosthenis Karatzas, Shjian Lu, C. V. Jawahar

In this competition, we set up three tasks, namely, Scanned Receipt Text Localisation (Task 1), Scanned Receipt OCR (Task 2) and Key Information Extraction from Scanned Receipts (Task 3).

Key Information Extraction Optical Character Recognition (OCR) +1

Few Shot Learning With No Labels

no code implementations 26 Dec 2020 Aditya Bharti, N. B. Vineeth, C. V. Jawahar

Few-shot learners aim to recognize new categories given only a small number of training samples.

Few-Shot Learning

Improving Word Recognition using Multiple Hypotheses and Deep Embeddings

1 code implementation 27 Oct 2020 Siddhant Bansal, Praveen Krishnan, C. V. Jawahar

We propose a novel scheme for improving the word recognition accuracy using word image embeddings.

Table Structure Recognition using Top-Down and Bottom-Up Cues

1 code implementation ECCV 2020 Sachin Raja, Ajoy Mondal, C. V. Jawahar

We present an approach for table structure recognition that combines cell detection and interaction modules to localize the cells and predict their row and column associations with other detected cells.

Cell Detection Optical Character Recognition +2

Graphical Object Detection in Document Images

1 code implementation 25 Aug 2020 Ranajit Saha, Ajoy Mondal, C. V. Jawahar

Graphical elements, particularly tables and figures, contain a visual summary of the most valuable information in a document.

Domain Adaptation Object +3

A Lip Sync Expert Is All You Need for Speech to Lip Generation In The Wild

4 code implementations 23 Aug 2020 K R Prajwal, Rudrabha Mukhopadhyay, Vinay Namboodiri, C. V. Jawahar

However, they fail to accurately morph the lip movements of arbitrary identities in dynamic, unconstrained talking face videos, resulting in significant parts of the video being out-of-sync with the new audio.

Ranked #1 on Unconstrained Lip-synchronization on LRS3 (using extra training data)

MORPH Unconstrained Lip-synchronization

Document Visual Question Answering Challenge 2020

no code implementations 20 Aug 2020 Minesh Mathew, Ruben Tito, Dimosthenis Karatzas, R. Manmatha, C. V. Jawahar

For Task 1, a new dataset is introduced comprising 50,000 question-answer pairs defined over 12,767 document images.

Question Answering Retrieval +2

Revisiting Low Resource Status of Indian Languages in Machine Translation

no code implementations 11 Aug 2020 Jerin Philip, Shashank Siripragada, Vinay P. Namboodiri, C. V. Jawahar

Through this paper, we provide and analyse an automated framework to obtain such a corpus for Indian language neural machine translation (NMT) systems.

Machine Translation NMT +3

Textual Description for Mathematical Equations

1 code implementation 7 Aug 2020 Ajoy Mondal, C. V. Jawahar

Reading mathematical expressions or equations in document images is very challenging due to the large variability of mathematical symbols and expressions.

Image Captioning

IIIT-AR-13K: A New Dataset for Graphical Object Detection in Documents

no code implementations 6 Aug 2020 Ajoy Mondal, Peter Lipps, C. V. Jawahar

This dataset, IIIT-AR-13k, is created by manually annotating the bounding boxes of graphical or page objects in publicly available annual reports.

Object object-detection +2

Weakly Supervised Instance Segmentation by Learning Annotation Consistent Instances

no code implementations ECCV 2020 Aditya Arun, C. V. Jawahar, M. Pawan Kumar

Recent approaches for weakly supervised instance segmentation depend on two components: (i) a pseudo label generation model that provides instances which are consistent with a given annotation; and (ii) an instance segmentation model, which is trained in a supervised manner using the pseudo labels as ground-truth.

Image-level Supervised Instance Segmentation Pseudo Label +3

A Multilingual Parallel Corpora Collection Effort for Indian Languages

2 code implementations LREC 2020 Shashank Siripragada, Jerin Philip, Vinay P. Namboodiri, C. V. Jawahar

We present sentence-aligned parallel corpora across 10 Indian languages (Hindi, Telugu, Tamil, Malayalam, Gujarati, Urdu, Bengali, Oriya, Marathi, and Punjabi) and English, many of which are categorized as low resource.

Machine Translation Retrieval +2

Fused Text Recogniser and Deep Embeddings Improve Word Recognition and Retrieval

1 code implementation 1 Jul 2020 Siddhant Bansal, Praveen Krishnan, C. V. Jawahar

Recognition and retrieval of textual content from large document collections have been a powerful use case for the document image analysis community.

Optical Character Recognition (OCR) Retrieval

RoadText-1K: Text Detection & Recognition Dataset for Driving Videos

no code implementations 19 May 2020 Sangeeth Reddy, Minesh Mathew, Lluis Gomez, Marcal Rusinol, Dimosthenis Karatzas, C. V. Jawahar

State-of-the-art methods for text detection, recognition, and tracking are evaluated on the new dataset, and the results highlight the challenges of unconstrained driving videos compared to existing datasets.

Text Detection

Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis

1 code implementation CVPR 2020 K R Prajwal, Rudrabha Mukhopadhyay, Vinay Namboodiri, C. V. Jawahar

In this work, we explore the task of lip to speech synthesis, i.e., learning to generate natural speech given only the lip movements of a speaker.

Ranked #1 on Lip Reading on LRW

Lip Reading Speaker-Specific Lip to Speech Synthesis +1

IndicSpeech: Text-to-Speech Corpus for Indian Languages

no code implementations LREC 2020 Nimisha Srivastava, Rudrabha Mukhopadhyay, Prajwal K R, C. V. Jawahar

We believe that one of the major reasons for this is the lack of large, publicly available text-to-speech corpora in these languages that are suitable for training neural text-to-speech systems.

Towards Automatic Face-to-Face Translation

1 code implementation ACM Multimedia 2019 Prajwal K R, Rudrabha Mukhopadhyay, Jerin Philip, Abhishek Jha, Vinay Namboodiri, C. V. Jawahar

As today's digital communication becomes increasingly visual, we argue that there is a need for systems that can automatically translate a video of a person speaking in language A into a target language B with realistic lip synchronization.

Ranked #1 on Talking Face Generation on LRW (using extra training data)

Face to Face Translation Machine Translation +3

A Deep Learning Approach for Robust Corridor Following

no code implementations 18 Nov 2019 Vishnu Sashank Dorbala, A. H. Abdul Hafez, C. V. Jawahar

For an autonomous corridor following task where the environment is continuously changing, several forms of environmental noise prevent an automated feature extraction procedure from performing reliably.

CVIT's submissions to WAT-2019

no code implementations WS 2019 Jerin Philip, Shashank Siripragada, Upendra Kumar, Vinay Namboodiri, C. V. Jawahar

This paper describes the Neural Machine Translation systems used by IIIT Hyderabad (CVIT-MT) for the translation tasks that were part of WAT-2019.

Machine Translation Translation

ICDAR 2019 Competition on Scene Text Visual Question Answering

no code implementations 30 Jun 2019 Ali Furkan Biten, Rubèn Tito, Andres Mafla, Lluis Gomez, Marçal Rusiñol, Minesh Mathew, C. V. Jawahar, Ernest Valveny, Dimosthenis Karatzas

ST-VQA introduces an important aspect that is not addressed by any Visual Question Answering system to date, namely the incorporation of scene text to answer questions asked about an image.

Question Answering Visual Question Answering

CVIT-MT Systems for WAT-2018

no code implementations PACLIC 2018 Jerin Philip, Vinay P. Namboodiri, C. V. Jawahar

This document describes the machine translation system used in the submissions of IIIT-Hyderabad CVIT-MT for the WAT-2018 English-Hindi translation task.

Machine Translation Translation

Self-Supervised Visual Representations for Cross-Modal Retrieval

no code implementations 31 Jan 2019 Yash Patel, Lluis Gomez, Marçal Rusiñol, Dimosthenis Karatzas, C. V. Jawahar

Cross-modal retrieval methods have improved significantly in recent years with the use of deep neural networks and large-scale annotated datasets such as ImageNet and Places.

Cross-Modal Retrieval Image Classification +3

Universal Semi-Supervised Semantic Segmentation

1 code implementation ICCV 2019 Tarun Kalluri, Girish Varma, Manmohan Chandraker, C. V. Jawahar

In recent years, the need for semantic segmentation has arisen across several different applications and environments.

Ranked #27 on Semantic Segmentation on DensePASS (using extra training data)

Segmentation Semi-Supervised Semantic Segmentation +1

IDD: A Dataset for Exploring Problems of Autonomous Navigation in Unconstrained Environments

2 code implementations 26 Nov 2018 Girish Varma, Anbumani Subramanian, Anoop Namboodiri, Manmohan Chandraker, C. V. Jawahar

It also reflects label distributions of road scenes significantly different from existing datasets, with most classes displaying greater within-class diversity.

Autonomous Navigation Domain Adaptation +3

Dissimilarity Coefficient based Weakly Supervised Object Detection

no code implementations CVPR 2019 Aditya Arun, C. V. Jawahar, M. Pawan Kumar

This allows us to use a state of the art discrete generative model that can provide annotation consistent samples from the conditional distribution.

Object object-detection +2

Improved Visual Relocalization by Discovering Anchor Points

no code implementations 11 Nov 2018 Soham Saha, Girish Varma, C. V. Jawahar

Our method improves the median error in indoor as well as outdoor localization datasets compared to the previous best deep learning model known as PoseNet (with geometric re-projection loss) using the same feature extractor.

Outdoor Localization

Class2Str: End to End Latent Hierarchy Learning

1 code implementation 20 Aug 2018 Soham Saha, Girish Varma, C. V. Jawahar

We propose an alternate architecture to the classifier network called the Latent Hierarchy (LH) Classifier and an end to end learned Class2Str mapping which discovers a latent hierarchy of the classes.

General Classification Image Classification

Connecting Visual Experiences using Max-flow Network with Application to Visual Localization

no code implementations 1 Aug 2018 A. H. Abdul Hafez, Nakul Agarwal, C. V. Jawahar

This problem is solved by finding the maximum flow in a directed graph flow-network, whose vertices represent the matches between frames in the test and reference sequences.

Autonomous Navigation Visual Localization
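The max-flow step named in the snippet above can be sketched in Python with a textbook Edmonds-Karp solver, which augments repeatedly along shortest (BFS) paths. The graph here is a hypothetical toy example. Node names such as `m11` (test frame 1 matched to reference frame 1) and all capacities are made up. The paper's actual graph construction and edge weights are not reproduced here.

```python
from collections import deque


def max_flow(capacity, source, sink):
    """Edmonds-Karp maximum flow on a dict-of-dicts capacity graph."""
    # Residual graph: copy forward capacities, add zero-capacity reverse edges.
    residual = {u: dict(edges) for u, edges in capacity.items()}
    for u, edges in capacity.items():
        for v in edges:
            residual.setdefault(v, {}).setdefault(u, 0)

    flow = 0
    while True:
        # BFS for the shortest augmenting path in the residual graph.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow  # no augmenting path left

        # Bottleneck capacity along the path found.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]

        # Push flow: decrease forward residuals, increase reverse residuals.
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck


# Hypothetical toy graph: vertices stand for frame matches, edges link
# matches of consecutive test frames.
toy_graph = {
    "source": {"m11": 3, "m12": 1},
    "m11": {"m22": 2, "m23": 1},
    "m12": {"m23": 1},
    "m22": {"sink": 2},
    "m23": {"sink": 2},
}
print(max_flow(toy_graph, "source", "sink"))  # 4
```

BFS-based augmentation bounds the number of iterations independently of the capacity values. That is the usual reason to prefer Edmonds-Karp over plain Ford-Fulkerson.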

Learning Human Poses from Actions

no code implementations 24 Jul 2018 Aditya Arun, C. V. Jawahar, M. Pawan Kumar

In order to avoid the high cost of full supervision, we propose to use a diverse data set, which consists of two types of annotations: (i) a small number of images are labeled using the expensive ground-truth pose; and (ii) other images are labeled using the inexpensive action label.

TextTopicNet - Self-Supervised Learning of Visual Features Through Embedding Images on Semantic Text Spaces

1 code implementation 4 Jul 2018 Yash Patel, Lluis Gomez, Raul Gomez, Marçal Rusiñol, Dimosthenis Karatzas, C. V. Jawahar

We show that adequate visual features can be learned efficiently by training a CNN to predict the semantic textual context in which a particular image is more probable to appear as an illustration.

Image Classification object-detection +3

Efficient Semantic Segmentation using Gradual Grouping

no code implementations 22 Jun 2018 Nikitha Vallurupalli, Sriharsha Annamaneni, Girish Varma, C. V. Jawahar, Manu Mathew, Soyeb Nagori

We study the effectiveness of these techniques on a real-time semantic segmentation architecture such as ERFNet, improving run time by over 5X.

Real-Time Semantic Segmentation Segmentation

Unsupervised Learning of Face Representations

1 code implementation 3 Mar 2018 Samyak Datta, Gaurav Sharma, C. V. Jawahar

Although faces extracted from videos have a lower spatial resolution than those which are available as part of standard supervised face datasets such as LFW and CASIA-WebFace, the former represent a much more realistic setting, e.g., in surveillance scenarios where most of the faces detected are very small.

HWNet v2: An Efficient Word Image Representation for Handwritten Documents

no code implementations 17 Feb 2018 Praveen Krishnan, C. V. Jawahar

We present a framework for learning an efficient holistic representation for handwritten word images.

Transfer Learning

SmartTennisTV: Automatic indexing of tennis videos

no code implementations 4 Jan 2018 Anurag Ghosh, C. V. Jawahar

In this paper, we demonstrate a score-based indexing approach for tennis videos.

Towards Structured Analysis of Broadcast Badminton Videos

no code implementations 23 Dec 2017 Anurag Ghosh, Suriya Singh, C. V. Jawahar

Sports video data is recorded for nearly every major tournament but remains archived and inaccessible to large scale data mining and analytics.

An EEG-based Image Annotation System

no code implementations 7 Nov 2017 Viral Parekh, Ramanathan Subramanian, Dipanjan Roy, C. V. Jawahar

The success of deep learning in computer vision has greatly increased the need for annotated image datasets.

EEG ERP

Unconstrained Scene Text and Video Text Recognition for Arabic Script

no code implementations 7 Nov 2017 Mohit Jain, Minesh Mathew, C. V. Jawahar

For scripts like Arabic, a major challenge in developing robust recognizers is the lack of a large quantity of annotated data.

Scene Text Recognition

Pose-Aware Person Recognition

no code implementations CVPR 2017 Vijay Kumar, Anoop Namboodiri, Manohar Paluri, C. V. Jawahar

Person recognition methods that use multiple body regions have shown significant improvements over traditional face-based recognition.

Person Recognition

Self-supervised learning of visual features through embedding images into text topic spaces

no code implementations CVPR 2017 Lluis Gomez, Yash Patel, Marçal Rusiñol, Dimosthenis Karatzas, C. V. Jawahar

End-to-end training from scratch of current deep architectures for new computer vision problems would require Imagenet-scale datasets, and this is not always possible.

Image Classification object-detection +3

Automated Top View Registration of Broadcast Football Videos

no code implementations 4 Mar 2017 Rahul Anand Sharma, Bharath Bhat, Vineet Gandhi, C. V. Jawahar

The proposed method is fully automatic in contrast to the current state of the art which requires manual initialization of point correspondences between the image and the static model.

Bird View Synthesis Homography Estimation

Align Me: A framework to generate Parallel Corpus Using OCRs and Bilingual Dictionaries

no code implementations WS 2016 Priyam Bakliwal, Devadath V V, C. V. Jawahar

Multilingual language processing tasks such as statistical machine translation and cross-language information retrieval rely mainly on the availability of accurate parallel corpora.

Active Learning Information Retrieval +4

Generating Synthetic Data for Text Recognition

1 code implementation 15 Aug 2016 Praveen Krishnan, C. V. Jawahar

Generating synthetic images is an art that emulates the natural process of image generation as closely as possible.

Data Augmentation Image Generation

Matching Handwritten Document Images

no code implementations 19 May 2016 Praveen Krishnan, C. V. Jawahar

We address the problem of predicting similarity between a pair of handwritten document images written by different individuals.

Efficient Optimization for Rank-based Loss Functions

no code implementations CVPR 2018 Pritish Mohapatra, Michal Rolinek, C. V. Jawahar, Vladimir Kolmogorov, M. Pawan Kumar

We provide a complete characterization of the loss functions that are amenable to our algorithm, and show that it includes both AP and NDCG based loss functions.

Information Retrieval Retrieval

Trajectory Aligned Features For First Person Action Recognition

no code implementations 7 Apr 2016 Suriya Singh, Chetan Arora, C. V. Jawahar

Objects present in the scene and hand gestures of the wearer are the most important cues for first person action recognition but are difficult to segment and recognize in an egocentric video.

Action Recognition Point Tracking +1

Enhancing Energy Minimization Framework for Scene Text Recognition with Top-Down Cues

no code implementations 13 Jan 2016 Anand Mishra, Karteek Alahari, C. V. Jawahar

We build a conditional random field model on these detections to jointly model the strength of the detections and the interactions between them.

Scene Text Recognition

Visual Phrases for Exemplar Face Detection

no code implementations ICCV 2015 Vijay Kumar, Anoop Namboodiri, C. V. Jawahar

Contrary to traditional approaches that model face variations from a large and diverse set of training examples, exemplar-based approaches use a collection of discriminatively trained exemplars for detection.

Face Detection Retrieval

Multi-Label Cross-Modal Retrieval

1 code implementation ICCV 2015 Viresh Ranjan, Nikhil Rasiwasia, C. V. Jawahar

In this work, we address the problem of cross-modal retrieval in the presence of multi-label annotations.

Cross-Modal Retrieval Retrieval

TennisVid2Text: Fine-grained Descriptions for Domain Specific Videos

no code implementations 26 Nov 2015 Mohak Sukhwani, C. V. Jawahar

In this work, we attempt to describe videos from a specific domain - broadcast videos of lawn tennis matches.

Optimizing Average Precision using Weakly Supervised Data

no code implementations CVPR 2014 Aseem Behl, C. V. Jawahar, M. Pawan Kumar

The performance of binary classification tasks, such as action classification and object detection, is often measured in terms of the average precision (AP).

Action Classification Binary Classification +5
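Average precision (AP), the metric named in the snippet above, can be computed for one ranked list. Take the precision@k at each rank k where a positive appears, then average those values. The Python sketch below shows only this standard, non-interpolated definition. It does not reproduce the paper's method for optimizing AP from weakly supervised data, and the function name `average_precision` is hypothetical.

```python
def average_precision(scores, labels):
    """Non-interpolated AP: mean of precision@k over the ranks k of positives.

    scores: classifier scores (higher means more likely positive)
    labels: 1 for positive samples, 0 for negative samples
    """
    # Rank samples by descending score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits, precisions = 0, []
    for k, (_, is_positive) in enumerate(ranked, start=1):
        if is_positive:
            hits += 1
            precisions.append(hits / k)  # precision at this positive's rank
    return sum(precisions) / len(precisions) if precisions else 0.0


print(average_precision([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0]))
```

Take scores `[0.9, 0.8, 0.7, 0.6]` with labels `[1, 0, 1, 0]`. The positives sit at ranks 1 and 3, so AP = (1/1 + 2/3) / 2 = 5/6. AP depends on the whole ranking, not on individual per-sample decisions. This is why it is harder to optimize directly than a per-sample loss.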

Relative Parts: Distinctive Parts for Learning Relative Attributes

no code implementations CVPR 2014 Ramachandruni N. Sandeep, Yashaswi Verma, C. V. Jawahar

The notion of relative attributes as introduced by Parikh and Grauman (ICCV, 2011) provides an appealing way of comparing two images based on their visual properties (or attributes) such as "smiling" for face images, "naturalness" for outdoor images, etc.

Attribute Image Retrieval

Blocks That Shout: Distinctive Parts for Scene Classification

no code implementations CVPR 2013 Mayank Juneja, Andrea Vedaldi, C. V. Jawahar, Andrew Zisserman

The automatic discovery of distinctive parts for an object or scene class is challenging since it requires simultaneously learning the part appearance and identifying the part occurrences in images.

Classification General Classification +1
