Search Results for author: Ankit Ramchandani

Found 5 papers, 1 papers with code

Lumos : Empowering Multimodal LLMs with Scene Text Recognition

no code implementations • 12 Feb 2024 • Ashish Shenoy, Yichao Lu, Srihari Jayakumar, Debojeet Chatterjee, Mohsen Moslehpour, Pierce Chuang, Abhay Harpale, Vikas Bhardwaj, Di Xu, Shicong Zhao, Longfang Zhao, Ankit Ramchandani, Xin Luna Dong, Anuj Kumar

We introduce Lumos, the first end-to-end multimodal question-answering system with text understanding capabilities.

Language Modelling Large Language Model +2

Paper
Add Code

Animated Stickers: Bringing Stickers to Life with Video Diffusion

no code implementations • 8 Feb 2024 • David Yan, Winnie Zhang, Luxin Zhang, Anmol Kalia, Dingkang Wang, Ankit Ramchandani, Miao Liu, Albert Pumarola, Edgar Schoenfeld, Elliot Blanchard, Krishna Narni, Yaqiao Luo, Lawrence Chen, Guan Pang, Ali Thabet, Peter Vajda, Amy Bearman, Licheng Yu

Our model is built on top of the state-of-the-art Emu text-to-image model, with the addition of temporal layers to model motion.

Paper
Add Code

Text-to-Sticker: Style Tailoring Latent Diffusion Models for Human Expression

no code implementations • 17 Nov 2023 • Animesh Sinha, Bo Sun, Anmol Kalia, Arantxa Casanova, Elliot Blanchard, David Yan, Winnie Zhang, Tony Nelli, Jiahui Chen, Hardik Shah, Licheng Yu, Mitesh Kumar Singh, Ankit Ramchandani, Maziar Sanjabi, Sonal Gupta, Amy Bearman, Dhruv Mahajan

Evaluation results show our method improves visual quality by 14%, prompt alignment by 16. 2% and scene diversity by 15. 3%, compared to prompt engineering the base Emu model for stickers generation.

Image Generation Prompt Engineering

Paper
Add Code

DISGO: Automatic End-to-End Evaluation for Scene Text OCR

no code implementations • 25 Aug 2023 • Mei-Yuh Hwang, Yangyang Shi, Ankit Ramchandani, Guan Pang, Praveen Krishnan, Lucas Kabela, Frank Seide, Samyak Datta, Jun Liu

This paper discusses the challenges of optical character recognition (OCR) on natural scenes, which is harder than OCR on documents due to the wild content and various image backgrounds.

Machine Translation Optical Character Recognition +2

Paper
Add Code

DeepCOVIDNet: An Interpretable Deep Learning Model for Predictive Surveillance of COVID-19 Using Heterogeneous Features and their Interactions

2 code implementations • 31 Jul 2020 • Ankit Ramchandani, Chao Fan, Ali Mostafavi

In this paper, we propose a deep learning model to forecast the range of increase in COVID-19 infected cases in future days and we present a novel method to compute equidimensional representations of multivariate time series and multivariate spatial time series data.

Time Series Time Series Analysis

Paper
Code

Cannot find the paper you are looking for? You can Submit a new open access paper.