Language Models

CANINE is a pre-trained encoder for language understanding that operates directly on character sequences, without explicit tokenization or vocabulary, paired with a pre-training strategy that uses soft inductive biases in place of hard token boundaries. To use this finer-grained input effectively and efficiently, CANINE combines downsampling, which reduces the input sequence length, with a deep transformer stack, which encodes context.

Source: CANINE: Pre-training an Efficient Tokenization-Free Encoder for Language Representation
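
Because CANINE consumes raw characters rather than subword tokens, "tokenization" reduces to mapping each character to its Unicode code point. Below is a minimal usage sketch, assuming the Hugging Face transformers implementation of CANINE and the public google/canine-s checkpoint; it is illustrative, not taken from the paper.

```python
# Minimal sketch of character-level encoding with CANINE.
# Assumes the `transformers` and `torch` packages and the
# google/canine-s checkpoint are available.
import torch
from transformers import CanineModel, CanineTokenizer

tokenizer = CanineTokenizer.from_pretrained("google/canine-s")
model = CanineModel.from_pretrained("google/canine-s")

# No subword vocabulary: each character is mapped to its Unicode code point.
inputs = tokenizer("CANINE is tokenization-free.", return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# Internally the model downsamples the character sequence before the deep
# transformer stack and upsamples afterward, so the output is one contextual
# vector per input character position.
print(outputs.last_hidden_state.shape)  # (1, sequence_length, hidden_size)
```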

Tasks


Task                        Papers  Share
Phishing Website Detection  1       25.00%
Document Classification    1       25.00%
Specificity                1       25.00%
Malware Classification     1       25.00%