no code implementations • EMNLP (MRQA) 2021 • Fredrik Carlsson, Magnus Sahlgren, Fredrik Olsson, Amaru Cuba Gyllensten
This paper introduces a long-range multiple-choice Question Answering (QA) dataset, based on full-length fiction book texts.
no code implementations • LREC 2022 • Ariel Ekgren, Amaru Cuba Gyllensten, Evangelia Gogoulou, Alice Heiman, Severine Verlinden, Joey Öhman, Fredrik Carlsson, Magnus Sahlgren
We present GPT-SW3, a 3.5 billion parameter autoregressive language model, trained on a newly created 100 GB Swedish corpus.
no code implementations • 22 May 2023 • Ariel Ekgren, Amaru Cuba Gyllensten, Felix Stollenwerk, Joey Öhman, Tim Isbister, Evangelia Gogoulou, Fredrik Carlsson, Alice Heiman, Judit Casademont, Magnus Sahlgren
This paper details the process of developing the first native large generative language model for the Nordic languages, GPT-SW3.
no code implementations • 30 Mar 2023 • Joey Öhman, Severine Verlinden, Ariel Ekgren, Amaru Cuba Gyllensten, Tim Isbister, Evangelia Gogoulou, Fredrik Carlsson, Magnus Sahlgren
Pre-training Large Language Models (LLMs) requires massive amounts of text data, and the performance of the LLMs typically correlates with the scale and quality of the datasets.
1 code implementation • 20 May 2021 • Alessandro Lenci, Magnus Sahlgren, Patrick Jeuniaux, Amaru Cuba Gyllensten, Martina Miliani
In this paper, we perform a comprehensive evaluation of type distributional vectors, either produced by static DSMs or obtained by averaging the contextualized vectors generated by BERT.
1 code implementation • ICLR 2021 • Fredrik Carlsson, Amaru Cuba Gyllensten, Evangelia Gogoulou, Erik Ylipää Hellqvist, Magnus Sahlgren
Extracting semantically useful natural language sentence representations from pre-trained deep neural networks such as Transformers remains a challenge.
no code implementations • SEMEVAL 2020 • Amaru Cuba Gyllensten, Evangelia Gogoulou, Ariel Ekgren, Magnus Sahlgren
We (Team Skurt) propose a simple method to detect lexical semantic change by clustering contextualized embeddings produced by XLM-R, using K-Means++.
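The clustering step described above can be illustrated with a minimal sketch. It is not the team's implementation: the embeddings are hand-made 2-D toy vectors standing in for XLM-R outputs, the helper names (`kmeanspp_init`, `kmeans`, `usage_by_period`) are hypothetical, and comparing per-period cluster counts is an assumption about how clusters might be used to flag change.

```python
import random
from collections import Counter

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(p, centres):
    """Index of the centre closest to p (ties go to the lowest index)."""
    return min(range(len(centres)), key=lambda j: sq_dist(p, centres[j]))

def kmeanspp_init(points, k, rng):
    """K-Means++ seeding: the first centre is uniform at random, each later
    centre is sampled with probability proportional to its squared distance
    from the nearest centre chosen so far."""
    centres = [rng.choice(points)]
    while len(centres) < k:
        weights = [min(sq_dist(p, c) for c in centres) for p in points]
        if sum(weights) == 0:  # fewer than k distinct points
            break
        centres.append(rng.choices(points, weights=weights, k=1)[0])
    return centres

def kmeans(points, k, iters=20, seed=0):
    """Lloyd's algorithm with K-Means++ seeding; returns one label per point."""
    rng = random.Random(seed)
    centres = kmeanspp_init(points, k, rng)
    for _ in range(iters):
        labels = [nearest(p, centres) for p in points]
        for j in range(len(centres)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centre if a cluster is empty
                centres[j] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return [nearest(p, centres) for p in points]

def usage_by_period(period_a, period_b, k=2, seed=0):
    """Cluster all usages jointly, then count cluster membership per period
    (assumed comparison step, not taken from the paper)."""
    labels = kmeans(period_a + period_b, k, seed=seed)
    n = len(period_a)
    return Counter(labels[:n]), Counter(labels[n:])

# Toy "contextualized embeddings" of one word in two time periods.
period_a = [(0.0, 0.1), (0.1, 0.0), (0.0, 0.0)]
period_b = [(5.0, 5.1), (5.1, 5.0), (0.05, 0.05)]
counts_a, counts_b = usage_by_period(period_a, period_b)
```

In this toy data the first period falls into a single cluster while the second period is split across two, which is the kind of shift in cluster distribution that would suggest a change in usage.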
no code implementations • WS 2018 • Amaru Cuba Gyllensten, Magnus Sahlgren
Sentiment and topic analysis are common methods used for social media monitoring.
1 code implementation • WS 2019 • Ariel Ekgren, Amaru Cuba Gyllensten, Magnus Sahlgren
This paper investigates data-driven segmentation using Re-Pair or Byte Pair Encoding techniques.
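As a rough illustration of Byte Pair Encoding style segmentation, the sketch below learns merge rules by repeatedly fusing the most frequent adjacent symbol pair and then applies them to a new word. It is a generic textbook-style version under simplifying assumptions (character-level start, lexicographic tie-breaking, no end-of-word marker), not the paper's code; the function names `learn_bpe`, `merge_pair`, and `segment` are hypothetical.

```python
from collections import Counter

def merge_pair(symbols, pair):
    """Replace every non-overlapping occurrence of `pair` with one fused symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)

def learn_bpe(words, num_merges):
    """Learn up to `num_merges` merge rules from a list of words."""
    # Each distinct word is a tuple of symbols with its corpus frequency.
    vocab = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        # Most frequent pair; ties are broken lexicographically so the
        # result is deterministic.
        best = min(pairs, key=lambda p: (-pairs[p], p))
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges

def segment(word, merges):
    """Segment a word by applying the learned merges in order."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)

merges = learn_bpe(["low", "low", "lower", "lowest"], num_merges=2)
pieces = segment("lowly", merges)
```

With this toy corpus the two learned merges build the unit `low`, so `lowly` is segmented into `low`, `l`, `y`.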
no code implementations • LREC 2018 • Amaru Cuba Gyllensten, Magnus Sahlgren
This paper is a short empirical study of the performance of centrality-based and classification-based iterative term set expansion methods for distributional semantic models.
no code implementations • LREC 2016 • Magnus Sahlgren, Amaru Cuba Gyllensten, Fredrik Espinoza, Ola Hamfors, Jussi Karlgren, Fredrik Olsson, Per Persson, Akshay Viswanathan, Anders Holst
This paper presents the Gavagai Living Lexicon, which is an online distributional semantic model currently available in 20 different languages.
no code implementations • EMNLP 2015 • Amaru Cuba Gyllensten, Magnus Sahlgren
We also argue that the topology of the neighborhoods in semantic space can be used to determine the semantic horizon of a point, which we define as the set of neighbors that have a direct connection to the point.