Search Results for author: Caio Corro

Found 19 papers, 4 papers with code

Auto-encodeurs variationnels : contrecarrer le problème de posterior collapse grâce à la régularisation du décodeur (Variational auto-encoders: prevent posterior collapse via decoder regularization)

no code implementations • JEP/TALN/RECITAL 2021 • Alban Petit, Caio Corro

Variational auto-encoders are generative models that are useful for learning latent representations.

Un algorithme d’analyse sémantique fondée sur les graphes via le problème de l’arborescence généralisée couvrante (A graph-based semantic parsing algorithm via the generalized spanning arborescence problem)

no code implementations • JEP/TALN/RECITAL 2022 • Alban Petit, Caio Corro

We propose a new algorithm for graph-based semantic parsing via the generalized spanning arborescence problem.

Semantic Parsing

Span-based discontinuous constituency parsing: a family of exact chart-based algorithms with time complexities from O(n^6) down to O(n^3)

no code implementations • EMNLP 2020 • Caio Corro

We introduce a novel chart-based algorithm for span-based parsing of discontinuous constituency trees of block degree two, including ill-nested structures.

Constituency Parsing Word Embeddings

Ré-ordonnancement via programmation dynamique pour l’adaptation cross-lingue d’un analyseur en dépendances (Sentence reordering via dynamic programming for cross-lingual dependency parsing)

no code implementations • JEP/TALN/RECITAL 2022 • Nicolas Devatine, Caio Corro, François Yvon

This article addresses the cross-lingual transfer of dependency parsers and studies methods for limiting the potentially harmful effect on transfer of word-order divergences between the source and target languages.

Dependency Parsing Sentence

Sparse Logistic Regression with High-order Features for Automatic Grammar Rule Extraction from Treebanks

1 code implementation • 26 Mar 2024 • Santiago Herrera, Caio Corro, Sylvain Kahane

More specifically, we extract descriptions and rules across different languages for two linguistic phenomena, agreement and word order, using a large search space and paying special attention to the ranking order of the extracted rules.

Descriptive

SaulLM-7B: A pioneering Large Language Model for Law

no code implementations • 6 Mar 2024 • Pierre Colombo, Telmo Pessoa Pires, Malik Boudiaf, Dominic Culver, Rui Melo, Caio Corro, Andre F. T. Martins, Fabrizio Esposito, Vera Lúcia Raposo, Sofia Morgado, Michael Desa

In this paper, we introduce SaulLM-7B, a large language model (LLM) tailored for the legal domain.

Language Modelling Large Language Model +1

CroissantLLM: A Truly Bilingual French-English Language Model

1 code implementation • 1 Feb 2024 • Manuel Faysse, Patrick Fernandes, Nuno M. Guerreiro, António Loison, Duarte M. Alves, Caio Corro, Nicolas Boizard, João Alves, Ricardo Rei, Pedro H. Martins, Antoni Bigata Casademunt, François Yvon, André F. T. Martins, Gautier Viaud, Céline Hudelot, Pierre Colombo

We introduce CroissantLLM, a 1.3B language model pretrained on a set of 3T English and French tokens, to bring to the research and industrial community a high-performance, fully open-sourced bilingual model that runs swiftly on consumer-grade local hardware.

Language Modelling Large Language Model

Structural generalization in COGS: Supertagging is (almost) all you need

no code implementations • 21 Oct 2023 • Alban Petit, Caio Corro, François Yvon

In many Natural Language Processing applications, neural networks have been found to fail to generalize on out-of-distribution examples.

Semantic Parsing

On graph-based reentrancy-free semantic parsing

no code implementations • 15 Feb 2023 • Alban Petit, Caio Corro

We propose a novel graph-based approach for semantic parsing that resolves two problems observed in the literature: (1) seq2seq models fail on compositional generalization tasks; (2) previous work using phrase structure parsers cannot cover all the semantic parses observed in treebanks.

Semantic Parsing TAG +1

On the inconsistency of separable losses for structured prediction

no code implementations • 25 Jan 2023 • Caio Corro

In this paper, we prove that separable negative log-likelihood losses for structured prediction are not necessarily Bayes consistent, or, in other words, minimizing these losses may not result in a model that predicts the most probable structure in the data distribution for a given input.

Structured Prediction

A dynamic programming algorithm for span-based nested named-entity recognition in O(n^2)

no code implementations • 10 Oct 2022 • Caio Corro

Span-based nested named-entity recognition (NER) has a cubic-time complexity using a variant of the CYK algorithm.

Named Entity Recognition +2

Preventing posterior collapse in variational autoencoders for text generation via decoder regularization

no code implementations • 28 Oct 2021 • Alban Petit, Caio Corro

Variational autoencoders trained to minimize the reconstruction error are sensitive to the posterior collapse problem, that is, the proposal posterior distribution is always equal to the prior.

Decoder Text Generation

Sur l'impact des contraintes structurelles pour l'analyse en dépendances profondes fondée sur les graphes (On the impact of structural constraints for graph-based deep dependency parsing)

no code implementations • JEP/TALN/RECITAL 2020 • Caio Corro

Existing algorithms for graph-based deep dependency parsing that can guarantee the connectedness of the structures they produce do not cover the French corpora.

Dependency Parsing

Span-based discontinuous constituency parsing: a family of exact chart-based algorithms with time complexities from O(n^6) down to O(n^3)

1 code implementation • 30 Mar 2020 • Caio Corro

We introduce a novel chart-based algorithm for span-based parsing of discontinuous constituency trees of block degree two, including ill-nested structures.

Constituency Parsing Word Embeddings

Learning Latent Trees with Stochastic Perturbations and Differentiable Dynamic Programming

1 code implementation • ACL 2019 • Caio Corro, Ivan Titov

We treat projective dependency trees as latent variables in our probabilistic model and induce them in such a way as to be beneficial for a downstream task, without relying on any direct tree supervision.

Natural Language Inference Sentiment Analysis