Search Results for author: Martin Gerlach

Found 10 papers, 4 papers with code

Multilayer Networks for Text Analysis with Multiple Data Types

1 code implementation · 30 Jun 2021 · Charles C. Hyland, Yuanming Tao, Lamiae Azizi, Martin Gerlach, Tiago P. Peixoto, Eduardo G. Altmann

We are interested in the widespread problem of clustering documents and finding topics in large collections of written documents in the presence of metadata and hyperlinks.

A standardized Project Gutenberg corpus for statistical analysis of natural language and quantitative linguistics

3 code implementations · 19 Dec 2018 · Martin Gerlach, Francesc Font-Clos

The use of Project Gutenberg (PG) as a text corpus has been extremely popular in statistical analysis of language for more than 25 years.

Information Retrieval, Retrieval

A network approach to topic models

1 code implementation · 4 Aug 2017 · Martin Gerlach, Tiago P. Peixoto, Eduardo G. Altmann

By adapting existing community-detection methods -- using a stochastic block model (SBM) with non-parametric priors -- we obtain a more versatile and principled framework for topic modeling (e.g., it automatically detects the number of topics and hierarchically clusters both the words and documents).

Community Detection, Model Selection

Generalized Entropies and the Similarity of Texts

no code implementations · 11 Nov 2016 · Eduardo G. Altmann, Laercio Dias, Martin Gerlach

This finding allows us to identify the contribution of specific words (and word frequencies) for the different generalized entropies and also to estimate the size of the databases needed to obtain a reliable estimation of the divergences.

Similarity of symbol frequency distributions with heavy tails

no code implementations · 1 Oct 2015 · Martin Gerlach, Francesc Font-Clos, Eduardo G. Altmann

Quantifying the similarity between symbolic sequences is a traditional problem in Information Theory which requires comparing the frequencies of symbols in different sequences.

Statistical laws in linguistics

no code implementations · 11 Feb 2015 · Eduardo G. Altmann, Martin Gerlach

Zipf's law is just one out of many universal laws proposed to describe statistical regularities in language.

Text Generation

Scaling laws and fluctuations in the statistics of word frequencies

no code implementations · 17 Jun 2014 · Martin Gerlach, Eduardo G. Altmann

In this paper we combine statistical analysis of large text databases and simple stochastic models to explain the appearance of scaling laws in the statistics of word frequencies.

Topic Models

Extracting information from S-curves of language change

no code implementations · 17 Jun 2014 · Fakhteh Ghanbarnejad, Martin Gerlach, Jose M. Miotto, Eduardo G. Altmann

Combining data analysis with simulations of simple models (e.g., the Bass dynamics on complex networks), we identify signatures of endogenous and exogenous factors in the S-curves of adoption.

Stochastic model for the vocabulary growth in natural languages

no code implementations · 6 Dec 2012 · Martin Gerlach, Eduardo G. Altmann

We propose a stochastic model for the number of different words in a given database which incorporates the dependence on the database size and historical changes.

Descriptive