
Charformer is a Transformer model that learns subword tokenization end-to-end as part of the model. Specifically, it uses a gradient-based subword tokenization (GBST) module that automatically learns latent subword representations from characters in a data-driven fashion. The soft subword sequence produced by GBST is then passed through Transformer layers.
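The following is a minimal NumPy sketch of the GBST idea, not the authors' implementation. It assumes the usual reading of the method: candidate blocks of several sizes are mean-pooled from character embeddings, each candidate is scored, a softmax over block sizes mixes the candidates at every character position, and the result is downsampled before the Transformer layers. The names `gbst_sketch`, `score_w`, `max_block`, and `downsample` are hypothetical, and refinements such as a convolution before pooling or any calibration of the scores are left out.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def gbst_sketch(char_emb, score_w, max_block=4, downsample=2):
    """Illustrative soft subword pooling (all names are hypothetical).

    char_emb: (L, d) character embeddings.
    score_w:  (d,) weights of a linear block scorer (assumed form).
    Returns the downsampled sequence and the per-position block weights.
    """
    L, d = char_emb.shape
    candidates = []
    for b in range(1, max_block + 1):
        # Pad so the length is a multiple of b (edge padding is an assumption).
        pad = (-L) % b
        padded = np.pad(char_emb, ((0, pad), (0, 0)), mode="edge")
        # Mean-pool non-overlapping blocks of size b.
        blocks = padded.reshape(-1, b, d).mean(axis=1)
        # Repeat each block so every character position has a candidate.
        candidates.append(np.repeat(blocks, b, axis=0)[:L])
    cand = np.stack(candidates, axis=1)            # (L, max_block, d)

    scores = cand @ score_w                        # (L, max_block)
    weights = softmax(scores, axis=1)              # soft choice of block size
    soft = (weights[..., None] * cand).sum(axis=1) # (L, d)

    # Downsample the soft subword sequence by mean pooling.
    pad = (-L) % downsample
    soft = np.pad(soft, ((0, pad), (0, 0)), mode="edge")
    pooled = soft.reshape(-1, downsample, d).mean(axis=1)
    return pooled, weights

rng = np.random.default_rng(0)
chars = rng.normal(size=(10, 8))   # 10 characters, 8-dim embeddings
w = rng.normal(size=8)
subwords, block_weights = gbst_sketch(chars, w)
print(subwords.shape, block_weights.shape)
```

Because the block choice is a softmax rather than a hard selection, the scorer receives gradients from the downstream loss, which is what allows the tokenization to be learned jointly with the rest of the model.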

Source: Charformer: Fast Character Transformers via Gradient-based Subword Tokenization
