We present OOTDiffusion, a novel network architecture for realistic and controllable image-based virtual try-on (VTON).
In addition, we identify two anime-specific challenges: distorted and faint hand-drawn lines, and unwanted color artifacts.
This paper surveys where and how optimal transport is used in machine learning, with a focus on scalable optimal transport.
LLM2LLM (1) fine-tunes a baseline student LLM on the initial seed data, (2) evaluates and extracts data points that the model gets wrong, and (3) uses a teacher LLM to generate synthetic data based on these incorrect data points, which are then added back into the training data.
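The three-step loop above can be sketched in a toy, runnable form. The real method fine-tunes actual LLMs; here the "student" is a counter that only masters a concept after seeing it twice, and `teacher_generate` is a hypothetical stand-in for the teacher LLM. All names are illustrative, not from the paper's code.

```python
def student_train(data):
    # Toy student: "mastery" of a concept grows with each exposure.
    model = {}
    for concept, _text in data:
        model[concept] = model.get(concept, 0) + 1
    return model

def student_is_correct(model, example):
    concept, _text = example
    return model.get(concept, 0) >= 2  # needs two exposures to answer correctly

def teacher_generate(example, n=1):
    # Stand-in for the teacher LLM: synthesize n new examples that
    # target the same concept the student got wrong.
    concept, text = example
    return [(concept, f"{text} (variant {i})") for i in range(n)]

def llm2llm(seed_data, rounds=2):
    data = list(seed_data)
    for _ in range(rounds):
        model = student_train(data)                    # (1) fine-tune student
        wrong = [x for x in seed_data
                 if not student_is_correct(model, x)]  # (2) extract errors
        for x in wrong:                                # (3) teacher augments
            data.extend(teacher_generate(x))
    return model, data

seed = [("capital_fr", "What is the capital of France?"),
        ("sum_2_2", "What is 2 + 2?")]
model, data = llm2llm(seed)
```

After two rounds, every seed concept the toy student initially missed has been reinforced with teacher-generated variants, mirroring the iterative augmentation the paper describes.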
Our approach reduces memory usage in optimizer states by up to 65.5% while maintaining both efficiency and performance, for pre-training LLaMA 1B and 7B architectures on the C4 dataset with up to 19.7B tokens, and for fine-tuning RoBERTa on GLUE tasks.
PSALM is a powerful extension of the Large Multi-modal Model (LMM) that addresses the challenges of segmentation tasks.
Tumor synthesis enables the creation of artificial tumors in medical images, facilitating the training of AI models for tumor detection and segmentation.
We present TextMonkey, a large multimodal model (LMM) tailored for text-centric tasks.
Foundation models, now powering most of the exciting applications in deep learning, are almost universally based on the Transformer architecture and its core attention module.
We introduce WavCraft, a collective system that leverages large language models (LLMs) to connect diverse task-specific models for audio content creation and editing.