PuLID: Pure and Lightning ID Customization via Contrastive Alignment

tothebeginning/pulid 24 Apr 2024

We propose Pure and Lightning ID customization (PuLID), a novel tuning-free ID customization method for text-to-image generation.

Text-to-Image Generation

678
0.78 stars / hour

Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models

assafelovic/gpt-researcher 6 May 2023

To address the calculation errors and improve the quality of generated reasoning steps, we extend PS prompting with more detailed instructions and derive PS+ prompting.

Math

10,203
0.67 stars / hour

Fishing for Magikarp: Automatically Detecting Under-trained Tokens in Large Language Models

cohere-ai/magikarp 8 May 2024

The disconnect between tokenizer creation and model training in language models has been known to allow for certain inputs, such as the infamous SolidGoldMagikarp token, to induce unwanted behaviour.

Language Modelling Large Language Model

61
0.61 stars / hour

ImageInWords: Unlocking Hyper-Detailed Image Descriptions

google/imageinwords 5 May 2024

To address these issues, we introduce ImageInWords (IIW), a carefully designed human-in-the-loop annotation framework for curating hyper-detailed image descriptions and a new dataset resulting from this process.

Specificity Text-to-Image Generation

134
0.61 stars / hour

Assisting in Writing Wikipedia-like Articles From Scratch with Large Language Models

stanford-oval/storm 22 Feb 2024

We study how to apply large language models to write grounded and organized long-form articles from scratch, with comparable breadth and depth to Wikipedia pages.

Retrieval

4,284
0.60 stars / hour

LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders

mcgill-nlp/llm2vec 9 Apr 2024

We outperform encoder-only models by a large margin on word-level tasks and reach a new unsupervised state-of-the-art performance on the Massive Text Embeddings Benchmark (MTEB).

Contrastive Learning Decoder

638
0.58 stars / hour

Speaker Embedding-aware Neural Diarization for Flexible Number of Speakers with Textual Information

alibaba-damo-academy/FunASR 28 Nov 2021

In this paper, we reformulate this task as a single-label prediction problem by encoding the multi-speaker labels with power set.

Action Detection Activity Detection +2

3,647
0.58 stars / hour

MasterWeaver: Taming Editability and Identity for Personalized Text-to-Image Generation

csyxwei/masterweaver 9 May 2024

In this work, we present MasterWeaver, a test-time tuning-free method designed to generate personalized images with both faithful identity fidelity and flexible editability.

Text-to-Image Generation

46
0.58 stars / hour

emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation

alibaba-damo-academy/FunASR 23 Dec 2023

To the best of our knowledge, emotion2vec is the first universal representation model in various emotion-related tasks, filling a gap in the field.

Self-Supervised Learning Sentiment Analysis +1

3,647
0.57 stars / hour

ID-Animator: Zero-Shot Identity-Preserving Human Video Generation

id-animator/id-animator 23 Apr 2024

Based on this pipeline, a random face reference training method is further devised to precisely capture the ID-relevant embeddings from reference images, thus improving the fidelity and generalization capacity of our model for ID-specific video generation.

Attribute Video Generation

182
0.55 stars / hour