no code implementations • 18 Feb 2024 • Tanzila Rahman, Shweta Mahajan, Hsin-Ying Lee, Jian Ren, Sergey Tulyakov, Leonid Sigal
We illustrate that such joint alternating refinement leads to the learning of better tokens for concepts and, as a by-product, latent masks.
no code implementations • 19 Dec 2023 • Shweta Mahajan, Tanzila Rahman, Kwang Moo Yi, Leonid Sigal
Further, we leverage the findings that different timesteps of the diffusion process cater to different levels of detail in an image.
1 code implementation • CVPR 2023 • Tanzila Rahman, Hsin-Ying Lee, Jian Ren, Sergey Tulyakov, Shweta Mahajan, Leonid Sigal
Our experiments for story generation on the MUGEN, PororoSV, and FlintstonesSV datasets show that our method not only outperforms the prior state of the art in generating frames with high visual quality that are consistent with the story, but also models appropriate correspondences between the characters and the background.
1 code implementation • NeurIPS 2021 • Tanzila Rahman, Mengyu Yang, Leonid Sigal
In this work, we introduce TriBERT -- a transformer-based architecture, inspired by ViLBERT, which enables contextual feature learning across three modalities: vision, pose, and audio, with the use of flexible co-attention.
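The tri-modal co-attention idea can be sketched minimally: queries come from one modality while keys and values come from another, so each stream is refined with context from the other two. The sketch below is an illustrative assumption, not the TriBERT implementation (which uses learned projections and transformer layers); all names and dimensions here are hypothetical.

```python
import numpy as np

def attention(q, k, v):
    # Scaled dot-product attention (illustrative, no learned weights).
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

def co_attend(x, y):
    # Co-attention: queries from modality x, keys/values from modality y.
    return attention(x, y, y)

rng = np.random.default_rng(0)
d = 8  # hypothetical feature dimension
vision, pose, audio = (rng.standard_normal((4, d)) for _ in range(3))

# Each modality gathers context from the other two; the real model
# combines these with learned projections rather than a plain sum.
vision_ctx = co_attend(vision, pose) + co_attend(vision, audio)
print(vision_ctx.shape)
```

The same `co_attend` pattern would be applied symmetrically to the pose and audio streams.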
no code implementations • 25 Mar 2021 • Tanzila Rahman, Leonid Sigal
Learning to localize and separate individual object sounds in the audio channel of a video is a difficult task.
1 code implementation • 4 Nov 2020 • Tanzila Rahman, Shih-Han Chou, Leonid Sigal, Giuseppe Carenini
We also propose a multimodal fusion module to combine both visual and textual information.
no code implementations • ICCV 2019 • Tanzila Rahman, Bicheng Xu, Leonid Sigal
Multi-modal learning, particularly among imaging and linguistic modalities, has made remarkable strides in many high-level fundamental visual understanding problems, ranging from language grounding to dense event captioning.
no code implementations • 9 Apr 2019 • Tanzila Rahman, Mrigank Rochan, Yang Wang
A common approach for person re-identification is to first extract image features for all frames in the video, then aggregate all the features to form a video-level feature.
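The simplest instance of this pipeline is temporal average pooling: extract one feature vector per frame, then mean-pool over time to get a single video-level feature. The snippet below is a minimal sketch of that baseline, assuming hypothetical frame features; the paper studies learned aggregation beyond this.

```python
import numpy as np

# Hypothetical per-frame features: T frames, each a D-dim embedding
# (in practice these would come from a CNN backbone).
T, D = 10, 16
rng = np.random.default_rng(1)
frame_feats = rng.standard_normal((T, D))

# Baseline aggregation: average pooling over the time axis collapses
# the T frame features into one video-level descriptor.
video_feat = frame_feats.mean(axis=0)
print(video_feat.shape)
```

Video-level features aggregated this way can then be compared with a distance metric (e.g. cosine or Euclidean) to match identities across cameras.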
1 code implementation • 26 Oct 2018 • Shivansh Rao, Tanzila Rahman, Mrigank Rochan, Yang Wang
The goal is to identify a person from videos captured under different cameras.