Language Models

CANINE is a pre-trained encoder for language understanding that operates directly on character sequences, without explicit tokenization or vocabulary, paired with a pre-training strategy that uses soft inductive biases in place of hard token boundaries. To use this finer-grained input effectively and efficiently, CANINE combines downsampling, which reduces the input sequence length, with a deep transformer stack, which encodes context.

Source: CANINE: Pre-training an Efficient Tokenization-Free Encoder for Language Representation
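
Because CANINE consumes raw characters rather than subword tokens, "tokenization" reduces to mapping each character to its Unicode code point. Below is a minimal usage sketch, assuming the Hugging Face transformers implementation of CANINE and the public google/canine-s checkpoint; it is illustrative, not taken from the paper.

```python
# Minimal sketch of character-level encoding with CANINE.
# Assumes the `transformers` and `torch` packages and the
# google/canine-s checkpoint are available.
import torch
from transformers import CanineModel, CanineTokenizer

tokenizer = CanineTokenizer.from_pretrained("google/canine-s")
model = CanineModel.from_pretrained("google/canine-s")

# No subword vocabulary: each character is mapped to its Unicode code point.
inputs = tokenizer("CANINE is tokenization-free.", return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# Internally the model downsamples the character sequence before the deep
# transformer stack and upsamples afterward, so the output is one contextual
# vector per input character position.
print(outputs.last_hidden_state.shape)  # (1, sequence_length, hidden_size)
```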

Tasks


Task                        Papers  Share
Phishing Website Detection  1       25.00%
Document Classification    1       25.00%
Specificity                1       25.00%
Malware Classification     1       25.00%