WDC SOTAB V2

SOTAB V2 features two annotation tasks: Column Type Annotation (CTA) and Columns Property Annotation (CPA). The goal of the CTA task is to annotate the columns of a table with one of 82 types from the Schema.org vocabulary, such as telephone, Duration, Mass, or Organization. The goal of the CPA task is to annotate pairs of table columns with one of 108 Schema.org properties, such as gtin, startDate, priceValidUntil, or recipeIngredient. The benchmark consists of 45,834 tables annotated for CTA and 30,220 tables annotated for CPA, originating from 55,511 different websites. For both tasks, the tables are split into training, validation, and test sets. The tables cover 17 popular Schema.org types, including Product, LocalBusiness, Event, and JobPosting.
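To make the difference between the two tasks concrete, the following is a minimal sketch of how CTA and CPA labels could be represented. The toy table, the column indices, and the `describe` helper are hypothetical and not part of the benchmark or its file format; only the label names (Mass, gtin, priceValidUntil) come from the description above.

```python
# Hypothetical toy table (not taken from the benchmark), stored column-wise.
columns = [
    ["Espresso Beans", "Green Tea"],        # column 0: product names
    ["04012345678901", "04098765432105"],   # column 1: product codes
    ["2024-12-31", "2025-06-30"],           # column 2: dates
    ["500 g", "100 g"],                     # column 3: weights
]

# CTA: one Schema.org type per annotated column (keyed by column index).
cta_labels = {3: "Mass"}

# CPA: one Schema.org property per annotated pair of columns
# (keyed by a pair of column indices; this representation is an assumption).
cpa_labels = {(0, 1): "gtin", (0, 2): "priceValidUntil"}


def describe(cta, cpa):
    """Return human-readable descriptions of the CTA and CPA annotations."""
    lines = [f"column {i} has type {t}" for i, t in sorted(cta.items())]
    lines += [
        f"columns {a} and {b} are related by {p}"
        for (a, b), p in sorted(cpa.items())
    ]
    return lines


for line in describe(cta_labels, cpa_labels):
    print(line)
```

In this sketch a CTA label attaches to a single column, while a CPA label attaches to a relationship between two columns, which is why the two tasks have separate label sets and separate column counts.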

Key statistics for both tasks are given in the table below, where "Columns" refers to the number of labeled columns (CTA) or column pairs (CPA) and "Classes" to the number of unique classes used for annotation.

       Train               Validation          Test                Classes
       Tables   Columns    Tables   Columns    Tables   Columns
CTA    44,769   116,887    456      1,769      609      1,851      82
CPA    29,158   109,994    497      2,459      565      2,340      108
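As a quick consistency check, the per-split table counts above can be summed and compared with the overall totals stated in the description (45,834 tables for CTA and 30,220 for CPA). The sketch below simply transcribes the numbers from the table; the names `splits`, `reported_tables`, and `totals` are illustrative.

```python
# (tables, columns) per split, transcribed from the statistics table.
splits = {
    "CTA": {"train": (44769, 116887), "validation": (456, 1769), "test": (609, 1851)},
    "CPA": {"train": (29158, 109994), "validation": (497, 2459), "test": (565, 2340)},
}

# Total number of annotated tables per task, as stated in the description.
reported_tables = {"CTA": 45834, "CPA": 30220}


def totals(task):
    """Sum tables and labeled columns (or column pairs) over all splits."""
    tables = sum(t for t, _ in splits[task].values())
    columns = sum(c for _, c in splits[task].values())
    return tables, columns


for task in splits:
    tables, columns = totals(task)
    matches = tables == reported_tables[task]
    print(f"{task}: {tables} tables, {columns} labeled columns, matches total: {matches}")
```

The table counts of the three splits add up to the stated totals for both tasks; the same sums give 120,507 labeled columns for CTA and 114,793 labeled column pairs for CPA.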
