TaBERT is a pretrained language model (LM) that jointly learns representations for natural language sentences and (semi-)structured tables. TaBERT is trained on a large corpus of 26 million tables and their English contexts.
In summary, TaBERT's process for learning representations for NL sentences is as follows: Given an utterance $u$ and a table $T$, TaBERT first creates a content snapshot of $T$. This snapshot consists of sampled rows that summarize the information in $T$ most relevant to the input utterance. The model then linearizes each row in the snapshot, concatenates each linearized row with the utterance, and uses the concatenated string as input to a Transformer model, which outputs row-wise encoding vectors of utterance tokens and cells. The encodings for all the rows in the snapshot are fed into a series of vertical self-attention layers, where a cell representation (or an utterance token representation) is computed by attending to vertically-aligned vectors of the same column (or the same NL token). Finally, representations for each utterance token and column are generated from a pooling layer.
Source: TaBERT: Pretraining for Joint Understanding of Textual and Tabular DataPaper | Code | Results | Date | Stars |
---|
Task | Papers | Share |
---|---|---|
Semantic Parsing | 3 | 42.86% |
Question Answering | 2 | 28.57% |
Natural Language Understanding | 1 | 14.29% |
Text-To-SQL | 1 | 14.29% |
Component | Type |
|
---|---|---|
🤖 No Components Found | You can add them if they exist; e.g. Mask R-CNN uses RoIAlign |