CelebV-Text comprises 70,000 in-the-wild face video clips with diverse visual content, each paired with 20 texts generated using the proposed semi-automatic text generation strategy. The provided texts precisely describe both static and dynamic attributes.
Source: CelebV-Text: A Large-Scale Facial Text-Video Dataset