COCO-CN is a bilingual image description dataset enriching MS-COCO with manually written Chinese sentences and tags. The new dataset can be used for multiple tasks including image tagging, captioning and retrieval, all in a cross-lingual setting.
20 PAPERS • 3 BENCHMARKS
Perseus is a dataset for Cross-Lingual Summarization (CLS) which collects about 94K Chinese scientific documents paired with English summaries. The average length of documents in Perseus is more than two thousand tokens.
1 PAPER • NO BENCHMARKS YET