Search Results for author: Amaru Cuba Gyllensten

Found 12 papers, 3 papers with code

The Nordic Pile: A 1.2TB Nordic Dataset for Language Modeling

no code implementations 30 Mar 2023 Joey Öhman, Severine Verlinden, Ariel Ekgren, Amaru Cuba Gyllensten, Tim Isbister, Evangelia Gogoulou, Fredrik Carlsson, Magnus Sahlgren

Pre-training Large Language Models (LLMs) requires massive amounts of text data, and the performance of the LLMs typically correlates with the scale and quality of the datasets.

Language Modelling

A comparative evaluation and analysis of three generations of Distributional Semantic Models

1 code implementation 20 May 2021 Alessandro Lenci, Magnus Sahlgren, Patrick Jeuniaux, Amaru Cuba Gyllensten, Martina Miliani

In this paper, we perform a comprehensive evaluation of type distributional vectors, either produced by static DSMs or obtained by averaging the contextualized vectors generated by BERT.

Deep Representational Re-tuning using Contrastive Tension

1 code implementation ICLR 2021 Fredrik Carlsson, Amaru Cuba Gyllensten, Evangelia Gogoulou, Erik Ylipää Hellqvist, Magnus Sahlgren

Extracting semantically useful natural language sentence representations from pre-trained deep neural networks such as Transformers remains a challenge.

Semantic Similarity, Semantic Textual Similarity

SenseCluster at SemEval-2020 Task 1: Unsupervised Lexical Semantic Change Detection

no code implementations SEMEVAL 2020 Amaru Cuba Gyllensten, Evangelia Gogoulou, Ariel Ekgren, Magnus Sahlgren

We (Team Skurt) propose a simple method to detect lexical semantic change by clustering contextualized embeddings produced by XLM-R, using K-Means++.

Change Detection, Clustering

Distributional Term Set Expansion

no code implementations LREC 2018 Amaru Cuba Gyllensten, Magnus Sahlgren

This paper is a short empirical study of the performance of centrality and classification based iterative term set expansion methods for distributional semantic models.

Active Learning, Classification

The Gavagai Living Lexicon

no code implementations LREC 2016 Magnus Sahlgren, Amaru Cuba Gyllensten, Fredrik Espinoza, Ola Hamfors, Jussi Karlgren, Fredrik Olsson, Per Persson, Akshay Viswanathan, Anders Holst

This paper presents the Gavagai Living Lexicon, which is an online distributional semantic model currently available in 20 different languages.

Navigating the Semantic Horizon using Relative Neighborhood Graphs

no code implementations EMNLP 2015 Amaru Cuba Gyllensten, Magnus Sahlgren

We also argue that the topology of the neighborhoods in semantic space can be used to determine the semantic horizon of a point, which we define as the set of neighbors that have a direct connection to the point.

Word Sense Induction
