fastText

Introduced by Bojanowski et al. in Enriching Word Vectors with Subword Information

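fastText embeddings exploit subword information to construct word embeddings. A vector is learnt for each character $n$-gram, and each word is represented as the sum of the vectors of its $n$-grams. Each word is wrapped in the boundary symbols < and >, so that prefixes and suffixes are distinguished from other character sequences. With $n = 3$, for example, the word "where" is represented by the $n$-grams <wh, whe, her, ere, re> plus the special sequence <where> for the whole word (the original paper uses all $n$-grams with $3 \le n \le 6$). This extends word2vec-type models with subword information: words that share prefixes, suffixes, or stems share $n$-gram vectors, so the embeddings capture morphology and can produce vectors for words never seen during training. Once words are represented by their character $n$-grams, a skipgram model is trained to learn the embeddings.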
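The representation described above can be sketched in a few lines of Python. This is a minimal, illustrative sketch under stated assumptions, not the reference fastText implementation. It extracts the bracketed character $n$-grams, maps each one to a row of a shared table by hashing (the paper hashes $n$-grams into a fixed number of buckets; the plain FNV-1a hash over UTF-8 bytes, `num_buckets=1000`, `dim=8` and the learning rate used here are assumptions chosen for a toy example), sums the rows to get the word vector, and computes the skipgram score $s(w, c) = \sum_{g \in \mathcal{G}_w} z_g^\top v_c$, where $\mathcal{G}_w$ is the set of $n$-grams of $w$, $z_g$ the vector of $n$-gram $g$ and $v_c$ a context vector. The class and method names are hypothetical. `sgd_step` performs a single binary logistic-loss update of the $n$-gram vectors only; actual training also updates the context vectors and draws negative samples over a corpus.

```python
import math
import random
from typing import List

BOW, EOW = "<", ">"  # boundary symbols marking the start and end of a word


def char_ngrams(word: str, min_n: int = 3, max_n: int = 6) -> List[str]:
    """Character n-grams of '<word>' with min_n <= n <= max_n, followed by
    the whole bracketed word, which is kept as its own special sequence."""
    padded = BOW + word + EOW
    grams = []
    for n in range(min_n, max_n + 1):
        for i in range(len(padded) - n + 1):
            gram = padded[i:i + n]
            if gram != padded:  # the full word is appended once, below
                grams.append(gram)
    grams.append(padded)
    return grams


def fnv1a_32(text: str) -> int:
    """32-bit FNV-1a hash over the UTF-8 bytes of `text` (assumed hashing
    scheme; may differ byte-for-byte from the reference implementation)."""
    h = 2166136261
    for byte in text.encode("utf-8"):
        h ^= byte
        h = (h * 16777619) & 0xFFFFFFFF
    return h


class SubwordEmbeddings:
    """Hypothetical toy model: hashed n-gram vectors z_g, where a word
    vector is the sum of the z_g of its n-grams."""

    def __init__(self, dim: int = 8, num_buckets: int = 1000, seed: int = 0):
        rng = random.Random(seed)
        self.dim = dim
        self.num_buckets = num_buckets
        # One row per hash bucket; n-grams that collide share a row.
        self.table = [[rng.uniform(-0.1, 0.1) for _ in range(dim)]
                      for _ in range(num_buckets)]

    def buckets(self, word: str) -> List[int]:
        """Table row index for every n-gram of `word`."""
        return [fnv1a_32(g) % self.num_buckets for g in char_ngrams(word)]

    def word_vector(self, word: str) -> List[float]:
        """Sum of the n-gram vectors; works for out-of-vocabulary words too."""
        vec = [0.0] * self.dim
        for b in self.buckets(word):
            for k in range(self.dim):
                vec[k] += self.table[b][k]
        return vec

    def score(self, word: str, context_vec: List[float]) -> float:
        """Skipgram score s(w, c): sum over n-grams g of z_g . v_c."""
        return sum(a * b for a, b in zip(self.word_vector(word), context_vec))

    def sgd_step(self, word: str, context_vec: List[float],
                 label: float, lr: float = 0.05) -> None:
        """One binary logistic-loss update of the n-gram rows for a
        (word, context) pair: label 1.0 for an observed pair, 0.0 for a
        negative sample. The context vector is held fixed for brevity."""
        s = self.score(word, context_vec)
        grad = 1.0 / (1.0 + math.exp(-s)) - label  # d(loss) / d(score)
        for b in self.buckets(word):
            for k in range(self.dim):
                self.table[b][k] -= lr * grad * context_vec[k]


if __name__ == "__main__":
    print(char_ngrams("where", min_n=3, max_n=3))
    # ['<wh', 'whe', 'her', 'ere', 're>', '<where>']
    emb = SubwordEmbeddings()
    context = [0.1] * emb.dim  # hypothetical context vector v_c
    before = emb.score("where", context)
    emb.sgd_step("where", context, label=1.0)
    print(emb.score("where", context) > before)  # True
```

Because an unseen word such as "whereabouts" shares $n$-grams (<wh, whe, <whe, ...) with "where", `emb.word_vector("whereabouts")` still returns a vector, assembled partly from the same table rows that "where" uses. Hashing keeps the table a fixed size at the cost of occasional collisions, where unrelated $n$-grams share a row.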

Source: Enriching Word Vectors with Subword Information

Latest Papers

PAPER DATE
Gender Prediction Based on Vietnamese Names with Machine Learning Techniques
Huy Quoc To, Kiet Van Nguyen, Ngan Luu-Thuy Nguyen, Anh Gia-Tuan Nguyen
2020-10-21
LT3 at SemEval-2020 Task 9: Cross-lingual Embeddings for Sentiment Analysis of Hinglish Social Media Text
Pranaydeep Singh, Els Lefever
2020-10-21
WNUT-2020 Task 2: Identification of Informative COVID-19 English Tweets
Dat Quoc Nguyen, Thanh Vu, Afshin Rahimi, Mai Hoang Dao, Linh The Nguyen, Long Doan
2020-10-16
gundapusunil at SemEval-2020 Task 9: Syntactic Semantic LSTM Architecture for SENTIment Analysis of Code-MIXed Data
Sunil Gundapu, Radhika Mamidi
2020-10-09
Intrinsic Probing through Dimension Selection
Lucas Torroba Hennigen, Adina Williams, Ryan Cotterell
2020-10-06
"Did you really mean what you said?" : Sarcasm Detection in Hindi-English Code-Mixed Data using Bilingual Word Embeddings
Akshita Aggarwal, Anshul Wadhawan, Anshima Chaudhary, Kavita Maurya
2020-10-01
Development of Word Embeddings for Uzbek Language
B. Mansurov, A. Mansurov
2020-09-30
FarsTail: A Persian Natural Language Inference Dataset
Hossein Amirkhani, Mohammad Azari Jafari, Azadeh Amirak, Zohreh Pourjafari, Soroush Faridan Jahromi, Zeinab Kouhkan
2020-09-18
EdinburghNLP at WNUT-2020 Task 2: Leveraging Transformers with Generalized Augmentation for Identifying Informativeness in COVID-19 Tweets
Nickil Maveli
2020-09-06
Going Beyond T-SNE: Exposing whatlies in Text Embeddings
Vincent D. Warmerdam, Thomas Kober, Rachael Tatman
2020-09-04
Beyond Next Item Recommendation: Recommending and Evaluating List of Sequences
Makbule Gulcin Ozsoy
2020-08-30
A Study of fastText Word Embedding Effects in Document Classification in Bangla Language
Pritom Mojumder, Mahmudul Hasan, Md. Faruque Hossain, K. M. Azharul Hasan
2020-07-30
COVID-19 therapy target discovery with context-aware literature mining
Matej Martinc, Blaž Škrlj, Sergej Pirkmajer, Nada Lavrač, Bojan Cestnik, Martin Marzidovšek, Senja Pollak
2020-07-30
Word Embeddings: Stability and Semantic Change
Lucas Rettenmeier
2020-07-23
Morphological Skip-Gram: Using morphological knowledge to improve word representation
Flávio Santos, Hendrik Macedo, Thiago Bispo, Cleber Zanchettin
2020-07-20
Sosed: a tool for finding similar software projects
Egor Bogomolov, Yaroslav Golubev, Artyom Lobanov, Vladimir Kovalenko, Timofey Bryksin
2020-07-06
Estimating the effect of COVID-19 on mental health: Linguistic indicators of depression during a global pandemic
JT Wolohan
2020-07-01
On the Learnability of Concepts: With Applications to Comparing Word Embedding Algorithms
Adam Sutton, Nello Cristianini
2020-06-17
qDKT: Question-centric Deep Knowledge Tracing
Shashank Sonkar, Andrew E. Waters, Andrew S. Lan, Phillip J. Grimaldi, Richard G. Baraniuk
2020-05-25
Categorical Vector Space Semantics for Lambek Calculus with a Relevant Modality
Lachlan McPheat, Mehrnoosh Sadrzadeh, Hadi Wazni, Gijs Wijnholds
2020-05-06
TALN/LS2N Participation at the BUCC Shared Task: Bilingual Dictionary Induction from Comparable Corpora
Martin Laville, Amir Hazem, Emmanuel Morin
2020-05-01
UNIOR NLP at MWSA Task - GlobaLex 2020: Siamese LSTM with Attention for Word Sense Alignment
Raffaele Manna, Giulia Speranza, Maria Pia di Buono, Johanna Monti
2020-05-01
AI_ML_NIT_Patna @ TRAC - 2: Deep Learning Approach for Multi-lingual Aggression Identification
Kirti Kumari, Jyoti Prakash Singh
2020-05-01
A First Dataset for Film Age Appropriateness Investigation
Emad Mohamed, Le An Ha
2020-05-01
Word Embedding Evaluation for Sinhala
Dimuthu Lakmal, Surangika Ranathunga, Saman Peramuna, Indu Herath
2020-05-01
Evaluating the Impact of Sub-word Information and Cross-lingual Word Embeddings on Mi'kmaq Language Modelling
Jeremie Boudreau, Akankshya Patra, Ashima Suvarna, Paul Cook
2020-05-01
Facilitating Corpus Usage: Making Icelandic Corpora More Accessible for Researchers and Language Users
Steinþór Steingrímsson, Starkaður Barkarson, Gunnar Thor Örnólfsson
2020-05-01
Identifying Cognates in English-Dutch and French-Dutch by means of Orthographic Information and Cross-lingual Word Embeddings
Els Lefever, Sofie Labat, Pranaydeep Singh
2020-05-01
Czech Historical Named Entity Corpus v 1.0
Helena Hubková, Pavel Kral, Eva Pettersson
2020-05-01
High Quality ELMo Embeddings for Seven Less-Resourced Languages
Matej Ulčar, Marko Robnik-Šikonja
2020-05-01
CBOW-tag: a Modified CBOW Algorithm for Generating Embedding Models from Annotated Corpora
Attila Novák, László Laki, Borbála Novák
2020-05-01
KLEJ: Comprehensive Benchmark for Polish Language Understanding
Piotr Rybak, Robert Mroczkowski, Janusz Tracz, Ireneusz Gawlik
2020-05-01
DeepSentiPers: Novel Deep Learning Models Trained Over Proposed Augmented Persian Sentiment Corpus
Javad PourMostafa Roshan Sharami, Parsa Abbasi Sarabestani, Seyed Abolghasem Mirroshandel
2020-04-11
Word Sense Disambiguation for 158 Languages using Word Embeddings Only
Varvara Logacheva, Denis Teslenko, Artem Shelmanov, Steffen Remus, Dmitry Ustalov, Andrey Kutuzov, Ekaterina Artemova, Chris Biemann, Simone Paolo Ponzetto, Alexander Panchenko
2020-03-14
Multi-SimLex: A Large-Scale Evaluation of Multilingual and Cross-Lingual Lexical Semantic Similarity
Ivan Vulić, Simon Baker, Edoardo Maria Ponti, Ulla Petti, Ira Leviant, Kelly Wing, Olga Majewska, Eden Bar, Matt Malone, Thierry Poibeau, Roi Reichart, Anna Korhonen
2020-03-10
Discovering linguistic (ir)regularities in word embeddings through max-margin separating hyperplanes
Noel Kennedy, Imogen Schofield, Dave C. Brodbelt, David B. Church, Dan G. O'Neill
2020-03-07
Comparison of Turkish Word Representations Trained on Different Morphological Forms
Gökhan Güler, A. Cüneyd Tantuğ
2020-02-13
Generating Sense Embeddings for Syntactic and Semantic Analogy for Portuguese
Jessica Rodrigues da Silva, Helena de Medeiros Caseli
2020-01-21
ExEm: Expert Embedding using dominating set theory with deep learning approaches
N. Nikzad-Khasmakhi, M. A. Balafar, M. Reza Feizi-Derakhshi, Cina Motamed
2020-01-16
A Unified System for Aggression Identification in English Code-Mixed and Uni-Lingual Texts
Anant Khandelwal, Niraj Kumar
2020-01-15
Character 3-gram Mover's Distance: An Effective Method for Detecting Near-duplicate Japanese-language Recipes
Masaki Oguni, Yohei Seki, Yu Hirate
2019-12-11
A New Corpus for Low-Resourced Sindhi Language with Word Embeddings
Wazir Ali, Jay Kumar, Junyu Lu, Zenglin Xu
2019-11-28
hauWE: Hausa Words Embedding for Natural Language Processing
Idris Abdulmumin, Bashir Shehu Galadanci
2019-11-25
High Quality ELMo Embeddings for Seven Less-Resourced Languages
Matej Ulčar, Marko Robnik-Šikonja
2019-11-22
Multilingual Culture-Independent Word Analogy Datasets
Matej Ulčar, Kristiina Vaik, Jessica Lindström, Milda Dailidėnaitė, Marko Robnik-Šikonja
2019-11-22
Towards non-toxic landscapes: Automatic toxic comment detection using DNN
Ashwin Geet D'Sa, Irina Illina, Dominique Fohr
2019-11-19
Event detection in Colombian security Twitter news using fine-grained latent topic analysis
Vladimir Vargas-Calderón, Nicolás Parra-A., Jorge E. Camargo, Herbert Vinck-Posada
2019-11-19
CCNet: Extracting High Quality Monolingual Datasets from Web Crawl Data
Guillaume Wenzek, Marie-Anne Lachaux, Alexis Conneau, Vishrav Chaudhary, Francisco Guzmán, Armand Joulin, Edouard Grave
2019-11-01
Sentence Embeddings for Russian NLU
Dmitry Popov, Alexander Pugachev, Polina Svyatokum, Elizaveta Svitanko, Ekaterina Artemova
2019-10-29
Using machine learning and information visualisation for discovering latent topics in Twitter news
Vladimir Vargas-Calderón, Marlon Steibeck Dominguez, N. Parra-A., Herbert Vinck-Posada, Jorge E. Camargo
2019-10-21
Privacy- and Utility-Preserving Textual Analysis via Calibrated Multivariate Perturbations
Oluwaseyi Feyisetan, Borja Balle, Thomas Drake, Tom Diethe
2019-10-20
Uncovering Flaming Events on News Media in Social Media
Praboda Rajapaksha, Reza Farahbakhsh, Noel Crespi, Bruno Defude
2019-09-16
Dialogue Act Classification in Team Communication for Robot Assisted Disaster Response
Tatiana Anikina, Ivana Kruijff-Korbayova
2019-09-01
Sparse Victory -- A Large Scale Systematic Comparison of count-based and prediction-based vectorizers for text classification
Rupak Chakraborty, Ashima Elhence, Kapil Arora
2019-09-01
Question Similarity in Community Question Answering: A Systematic Exploration of Preprocessing Methods and Models
Florian Kunneman, Thiago Castro Ferreira, Emiel Krahmer, Antal van den Bosch
2019-09-01
Tagger for Polish Computer Mediated Communication Texts
Wiktor Walentynowicz, Maciej Piasecki, Marcin Oleksy
2019-09-01
Evaluation of vector embedding models in clustering of text documents
Tomasz Walkowiak, Mateusz Gniewkowski
2019-09-01
Evaluation of Stacked Embeddings for Bulgarian on the Downstream Tasks POS and NERC
Iva Marinova
2019-09-01
Improving Word Embeddings Using Kernel PCA
Vishwani Gupta, Sven Giesselbach, Stefan Rüping, Christian Bauckhage
2019-08-01
Learning Word Embeddings without Context Vectors
Alexey Zobnin, Evgenia Elistratova
2019-08-01
RNN Embeddings for Identifying Difficult to Understand Medical Words
Hanna Pylieva, Artem Chernodub, Natalia Grabar, Thierry Hamon
2019-08-01
Robust to Noise Models in Natural Language Processing Tasks
Valentin Malykh
2019-07-01
Gated Embeddings in End-to-End Speech Recognition for Conversational-Context Fusion
Suyoun Kim, Siddharth Dalmia, Florian Metze
2019-06-27
Word Embeddings for the Armenian Language: Intrinsic and Extrinsic Evaluation
Karen Avetisyan, Tsolak Ghukasyan
2019-06-07
Sequence Tagging with Contextual and Non-Contextual Subword Representations: A Multilingual Evaluation
Benjamin Heinzerling, Michael Strube
2019-06-04
Beyond Context: A New Perspective for Word Embeddings
Yichu Zhou, Vivek Srikumar
2019-06-01
YNU NLP at SemEval-2019 Task 5: Attention and Capsule Ensemble for Identifying Hate Speech
Bin Wang, Haiyan Ding
2019-06-01
YNU_DYX at SemEval-2019 Task 5: A Stacked BiGRU Model Based on Capsule Network in Detection of Hate
Yunxia Ding, Xiaobing Zhou, Xuejie Zhang
2019-06-01
Deep Learning Techniques for Humor Detection in Hindi-English Code-Mixed Tweets
Sushmitha Reddy Sane, Suraj Tripathi, Koushik Reddy Sane, Radhika Mamidi
2019-06-01
Medical Word Embeddings for Spanish: Development and Evaluation
Felipe Soares, Marta Villegas, Aitor Gonzalez-Agirre, Martin Krallinger, Jordi Armengol-Estapé
2019-06-01
How Well Do Embedding Models Capture Non-compositionality? A View from Multiword Expressions
Navnita Nandakumar, Timothy Baldwin, Bahar Salehi
2019-06-01
Misspelling Oblivious Word Embeddings
Bora Edizel, Aleksandra Piktus, Piotr Bojanowski, Rui Ferreira, Edouard Grave, Fabrizio Silvestri
2019-05-23
Taming Pretrained Transformers for Extreme Multi-label Text Classification
Wei-Cheng Chang, Hsiang-Fu Yu, Kai Zhong, Yiming Yang, Inderjit Dhillon
2019-05-07
Personalized Query Auto-Completion Through a Lightweight Representation of the User Context
Manojkumar Rangasamy Kannadasan, Grigor Aslanyan
2019-05-03
An Empirical Evaluation of Text Representation Schemes on Multilingual Social Web to Filter the Textual Aggression
Sandip Modha, Prasenjit Majumder
2019-04-16
Text2Node: a Cross-Domain System for Mapping Arbitrary Phrases to a Taxonomy
Rohollah Soltani, Alexandre Tomberg
2019-04-11
[email protected] at SemEval-2019 Task 6 and Task 5: Linguistically enhanced deep learning offensive sentence classifier
Alessandro Seganti, Helena Sobol, Iryna Orlova, Hannam Kim, Jakub Staniszewski, Tymoteusz Krumholc, Krystian Koziel
2019-04-10
Exploring Fine-Tuned Embeddings that Model Intensifiers for Emotion Analysis
Laura Bostan, Roman Klinger
2019-04-05
Question Embeddings Based on Shannon Entropy: Solving intent classification task in goal-oriented dialogue system
Aleksandr Perevalov, Daniil Kurushin, Rustam Faizrakhmanov, Farida Khabibrakhmanova
2019-03-25
Column2Vec: Structural Understanding via Distributed Representations of Database Schemas
Michael J. Mior, Alexander G. Ororbia II
2019-03-20
Exploiting Synchronized Lyrics And Vocal Features For Music Emotion Detection
Loreto Parisi, Simone Francia, Silvio Olivastri, Maria Stella Tavella
2019-01-15
Multilingual Constituency Parsing with Self-Attention and Pre-Training
Nikita Kitaev, Steven Cao, Dan Klein
2018-12-31
Evaluating Architectural Choices for Deep Learning Approaches for Question Answering over Knowledge Bases
Sherzod Hakimov, Soufian Jebbara, Philipp Cimiano
2018-12-06
Magnitude: A Fast, Efficient Universal Vector Embedding Utility Package
Ajay Patel, Alexander Sands, Chris Callison-Burch, Marianna Apidianaki
2018-10-26
An Analysis of Hierarchical Text Classification Using Word Embeddings
Roger A. Stein, Patricia A. Jaques, Joao F. Valiati
2018-09-06
Joint Aspect and Polarity Classification for Aspect-based Sentiment Analysis with End-to-End Neural Networks
Martin Schmitt, Simon Steinheber, Konrad Schreiber, Benjamin Roth
2018-08-28
A Hassle-Free Machine Learning Method for Cohort Selection of Clinical Trials
Liu Man
2018-08-10
Cyberbullying Detection -- Technical Report 2/2018, Department of Computer Science AGH, University of Science and Technology
Michał Ptaszyński, Gniewosz Leliwa, Mateusz Piech, Aleksander Smywiński-Pohl
2018-08-02
Aggression Identification and Multi Lingual Word Embeddings
Thiago Galery, Efstathios Charitos, Ye Tian
2018-08-01
Tree-structured multi-stage principal component analysis (TMPCA): theory and applications
Yuanhang Su, Ruiyuan Lin, C.-C. Jay Kuo
2018-07-22
Sub-word information in pre-trained biomedical word representations: evaluation and hyper-parameter optimization
Dieter Galea, Ivan Laponogov, Kirill Veselkov
2018-07-01
Probabilistic FastText for Multi-Sense Word Embeddings
Ben Athiwaratkun, Andrew Gordon Wilson, Anima Anandkumar
2018-06-07
CENNLP at SemEval-2018 Task 2: Enhanced Distributed Representation of Text using Target Classes for Emoji Prediction Representation
Naveen J R, Hariharan V, Barathi Ganesh H. B., Anand Kumar M, Soman K P
2018-06-01
Entropy-Based Subword Mining with an Application to Word Embeddings
Ahmed El-Kishky, Frank Xu, Aston Zhang, Stephen Macke, Jiawei Han
2018-06-01
A Novel Way of Identifying Cyber Predators
Dan Liu, Ching Yee Suen, Olga Ormandjieva
2017-12-11
IIIT-H at IJCNLP-2017 Task 4: Customer Feedback Analysis using Machine Learning and Neural Network Approaches
Prathyusha Danda, Pruthwik Mishra, Silpa Kanneganti, Soujanya Lanka
2017-12-01
Evaluation of Croatian Word Embeddings
Lukas Svoboda, Slobodan Beliga
2017-11-06
Fast Linear Model for Knowledge Graph Embeddings
Armand Joulin, Edouard Grave, Piotr Bojanowski, Maximilian Nickel, Tomas Mikolov
2017-10-30
OhioState at IJCNLP-2017 Task 4: Exploring Neural Architectures for Multilingual Customer Feedback Analysis
Dushyanta Dhyani
2017-10-18
Synapse at CAp 2017 NER challenge: Fasttext CRF
Damien Sileo, Camille Pradel, Philippe Muller, Tim Van de Cruys
2017-09-14
Which Encoding is the Best for Text Classification in Chinese, English, Japanese and Korean?
Xiang Zhang, Yann LeCun
2017-08-08
When does a compliment become sexist? Analysis and classification of ambivalent sexism using twitter data
Akshita Jha, Radhika Mamidi
2017-08-01
Rotations and Interpretability of Word Embeddings: the Case of the Russian Language
Alexey Zobnin
2017-07-14
Learning Convolutional Text Representations for Visual Question Answering
Zhengyang Wang, Shuiwang Ji
2017-05-18
Analysis and Optimization of fastText Linear Text Classifier
Vladimir Zolotov, David Kung
2017-02-17
Multi-level Representations for Fine-Grained Typing of Knowledge Base Entities
Yadollah Yaghoobzadeh, Hinrich Schütze
2017-01-08
FastText.zip: Compressing text classification models
Armand Joulin, Edouard Grave, Piotr Bojanowski, Matthijs Douze, Hervé Jégou, Tomas Mikolov
2016-12-12
Enriching Word Vectors with Subword Information
Piotr Bojanowski, Edouard Grave, Armand Joulin, Tomas Mikolov
2016-07-15
