Table annotation
20 papers with code • 0 benchmarks • 10 datasets
Table annotation is the task of annotating a table with terms or concepts from a knowledge graph or database schema. It is typically broken down into the following five subtasks:
- Cell Entity Annotation (CEA)
- Column Type Annotation (CTA)
- Column Property Annotation (CPA)
- Table Type Detection
- Row Annotation
The SemTab challenge is closely related to the table annotation problem. It is a yearly challenge that focuses on the first three subtasks (CEA, CTA, and CPA) and aims to benchmark different table annotation systems.
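To make the first three subtasks concrete, the sketch below shows one possible way to store their outputs for a small table. The `TableAnnotation` class, its field names, and the `entities_in_column` helper are hypothetical and are not taken from SemTab or any listed system. The identifiers are Wikidata-style IDs used for illustration and should be verified before reuse.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class TableAnnotation:
    """Hypothetical container for CEA, CTA, and CPA results on one table."""
    # CEA: (row index, column index) -> knowledge graph entity ID
    cea: Dict[Tuple[int, int], str] = field(default_factory=dict)
    # CTA: column index -> knowledge graph class ID
    cta: Dict[int, str] = field(default_factory=dict)
    # CPA: (subject column, object column) -> knowledge graph property ID
    cpa: Dict[Tuple[int, int], str] = field(default_factory=dict)


def entities_in_column(annotation: TableAnnotation, col: int) -> List[str]:
    """Return the annotated entity IDs of one column, ordered by row."""
    cells = [(row, ent) for (row, c), ent in annotation.cea.items() if c == col]
    return [ent for _, ent in sorted(cells)]


# Row 0 is the header; rows 1 and 2 hold the data cells.
table = [
    ["Country", "Capital"],
    ["France", "Paris"],
    ["Japan", "Tokyo"],
]

annotation = TableAnnotation(
    # Each data cell is linked to an entity (illustrative IDs).
    cea={(1, 0): "Q142", (1, 1): "Q90", (2, 0): "Q17", (2, 1): "Q1490"},
    # Each column is assigned a class.
    cta={0: "Q6256", 1: "Q5119"},
    # The column pair (0, 1) is linked by a property.
    cpa={(0, 1): "P36"},
)

print(entities_in_column(annotation, 1))
```

Keeping the three mappings separate mirrors how the subtasks are defined: cells map to entities, columns map to classes, and column pairs map to properties.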
Most implemented papers
TCN: Table Convolutional Network for Web Table Interpretation
Existing work linearizes table cells and relies heavily on modifying deep language models such as BERT, which only capture information from related cells in the same table.
bbw: Matching CSV to Wikidata via Meta-lookup
We present our publicly available semantic annotator bbw (boosted by wiki) tested at the second Semantic Web Challenge on Tabular Data to Knowledge Graph Matching (SemTab2020).
Annotating Columns with Pre-trained Language Models
Inferring meta information about tables, such as column headers or relationships between columns, is an active research topic in data management as we find many tables are missing some of this information.
MAGIC: Mining an Augmented Graph using INK, starting from a CSV
A large portion of structured data does not yet reap the benefits of the Semantic Web.
JenTab Meets SemTab 2021's New Challenges
While tables are a rich source of structured information, their automated use is oftentimes prevented by the inherent ambiguity contained within.
SOTAB: The WDC Schema.org Table Annotation Benchmark
This paper presents the WDC Schema.org Table Annotation Benchmark (SOTAB) for comparing the performance of table annotation systems.
BiodivTab: Semantic Table Annotation Benchmark Construction, Analysis, and New Additions
Individual cells and columns are assigned to KG entities and classes to disambiguate their meaning.
A large-scale dataset for end-to-end table recognition in the wild
To this end, we propose a new large-scale dataset named Table Recognition Set (TabRecSet) with diverse table forms sourced from multiple scenarios in the wild, providing complete annotation dedicated to end-to-end TR research.
Column Type Annotation using ChatGPT
Column type annotation is the task of annotating the columns of a relational table with the semantic type of the values contained in each column.
ArcheType: A Novel Framework for Open-Source Column Type Annotation using Large Language Models
We introduce ArcheType, a simple, practical method for context sampling, prompt serialization, model querying, and label remapping, which enables large language models to solve CTA problems in a fully zero-shot manner.
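The four stages named above (context sampling, prompt serialization, model querying, label remapping) can be pictured with a minimal sketch. This is not the ArcheType implementation: every function name, the prompt wording, and the fuzzy-matching remapping rule are assumptions made for illustration, and `stub_model` is a stand-in for a real language model call.

```python
import difflib
import random
from typing import Callable, List


def sample_context(column: List[str], k: int = 3, seed: int = 0) -> List[str]:
    """Context sampling: pick up to k distinct non-empty values."""
    values = sorted({v for v in column if v.strip()})
    rng = random.Random(seed)
    return rng.sample(values, min(k, len(values)))


def serialize_prompt(samples: List[str], labels: List[str]) -> str:
    """Prompt serialization: turn samples and the label set into text."""
    return (
        "Column values: " + ", ".join(samples) + "\n"
        "Choose one type from: " + ", ".join(labels) + "\n"
        "Type:"
    )


def remap_label(raw: str, labels: List[str]) -> str:
    """Label remapping: map a free-text answer to the closest allowed label.

    Assumes labels is non-empty.
    """
    cleaned = raw.strip().lower()
    lowered = {label.lower(): label for label in labels}
    if cleaned in lowered:
        return lowered[cleaned]
    # cutoff=0.0 always yields the most similar label, however weak the match.
    best = difflib.get_close_matches(cleaned, list(lowered), n=1, cutoff=0.0)
    return lowered[best[0]]


def annotate_column(
    column: List[str], labels: List[str], model: Callable[[str], str]
) -> str:
    """Run the four stages in order; model is any prompt -> text callable."""
    samples = sample_context(column)
    prompt = serialize_prompt(samples, labels)
    return remap_label(model(prompt), labels)


def stub_model(prompt: str) -> str:
    """Stand-in for a language model; returns a slightly off-format answer."""
    return " City."


labels = ["city", "country", "person"]
print(annotate_column(["Paris", "Tokyo", "", "Lima"], labels, stub_model))
```

The remapping step matters because a model's free-text answer may not match any allowed label exactly; here the closest label by string similarity is chosen, which is only one of several possible strategies.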