Search Results for author: Bela Gipp

Found 62 papers, 38 papers with code

Citation Amnesia: NLP and Other Academic Fields Are in a Citation Age Recession

1 code implementation • 19 Feb 2024 • Jan Philip Wahle, Terry Ruas, Mohamed Abdalla, Bela Gipp, Saif M. Mohammad

This study examines the tendency to cite older work across 20 fields of study over 43 years (1980--2023).

Text-Guided Image Clustering

1 code implementation • 5 Feb 2024 • Andreas Stephan, Lukas Miklautz, Kevin Sidak, Jan Philip Wahle, Bela Gipp, Claudia Plant, Benjamin Roth

We, therefore, propose Text-Guided Image Clustering, i.e., generating text using image captioning and visual question-answering (VQA) models and subsequently clustering the generated text.

Clustering Image Captioning +3

Taxonomy of Mathematical Plagiarism

1 code implementation • 30 Jan 2024 • Ankit Satpute, Andre Greiner-Petter, Noah Gießing, Isabel Beckenbach, Moritz Schubotz, Olaf Teschke, Akiko Aizawa, Bela Gipp

Second, we analyze the best-performing approaches to detect plagiarism and mathematical content similarity on the newly established taxonomy.

Math Question Answering +1

The Media Bias Taxonomy: A Systematic Literature Review on the Forms and Automated Detection of Media Bias

1 code implementation • 26 Dec 2023 • Timo Spinde, Smi Hinterreiter, Fabian Haak, Terry Ruas, Helge Giese, Norman Meuschke, Bela Gipp

However, we have identified a lack of interdisciplinarity in existing projects, and a need for more awareness of the various types of media bias to support methodologically thorough performance evaluations of media bias detection systems.

Bias Detection

Paraphrase Types for Generation and Detection

1 code implementation • 23 Oct 2023 • Jan Philip Wahle, Bela Gipp, Terry Ruas

Current approaches in paraphrase generation and detection heavily rely on a single general similarity score, ignoring the intricate linguistic properties of language.

Binary Classification Paraphrase Generation

We are Who We Cite: Bridges of Influence Between Natural Language Processing and Other Academic Fields

1 code implementation • 23 Oct 2023 • Jan Philip Wahle, Terry Ruas, Mohamed Abdalla, Bela Gipp, Saif M. Mohammad

We analyzed ~77k NLP papers, ~3.1m citations from NLP papers to other papers, and ~1.8m citations from other papers to NLP papers.

Math

Generative User-Experience Research for Developing Domain-specific Natural Language Processing Applications

no code implementations • 28 Jun 2023 • Anastasia Zhukova, Lukas von Sperl, Christian E. Matt, Bela Gipp

Generative UX research employs domain users at the initial stages of prototype development, i.e., ideation and concept evaluation, and the last stage for evaluating system usefulness and user utility.

Neural Machine Translation for Mathematical Formulae

no code implementations • 25 May 2023 • Felix Petersen, Moritz Schubotz, Andre Greiner-Petter, Bela Gipp

We tackle the problem of neural machine translation of mathematical formulae between ambiguous presentation languages and unambiguous content languages.

Machine Translation Translation

TEIMMA: The First Content Reuse Annotator for Text, Images, and Math

1 code implementation • 22 May 2023 • Ankit Satpute, André Greiner-Petter, Moritz Schubotz, Norman Meuschke, Akiko Aizawa, Olaf Teschke, Bela Gipp

This demo paper presents the first tool to annotate the reuse of text, images, and mathematical formulae in a document pair -- TEIMMA.

Math

Methods and Tools to Advance the Retrieval of Mathematical Knowledge from Digital Libraries for Search-, Recommendation-, and Assistance-Systems

no code implementations • 12 May 2023 • Bela Gipp, André Greiner-Petter, Moritz Schubotz, Norman Meuschke

This project investigated new approaches and technologies to enhance the accessibility of mathematical content and its semantic information for a broad range of information retrieval applications.

Information Retrieval Retrieval

Introducing MBIB -- the first Media Bias Identification Benchmark Task and Dataset Collection

1 code implementation • 25 Apr 2023 • Martin Wessel, Tomáš Horych, Terry Ruas, Akiko Aizawa, Bela Gipp, Timo Spinde

A unified benchmark encourages the development of more robust systems and shifts the current paradigm in media bias detection evaluation towards solutions that tackle not one but multiple media bias types simultaneously.

Bias Detection

Paraphrase Detection: Human vs. Machine Content

1 code implementation • 24 Mar 2023 • Jonas Becker, Jan Philip Wahle, Terry Ruas, Bela Gipp

Additionally, we identify four datasets as the most diverse and challenging for paraphrase detection.

A Benchmark of PDF Information Extraction Tools using a Multi-Task and Multi-Domain Evaluation Framework for Academic Documents

no code implementations • 17 Mar 2023 • Norman Meuschke, Apurva Jagdale, Timo Spinde, Jelena Mitrović, Bela Gipp

Using the new framework, we benchmark ten freely available tools in extracting document metadata, bibliographic references, tables, and other content elements from academic PDF documents.

Retrieval Table Extraction

Discovery and Recognition of Formula Concepts using Machine Learning

1 code implementation • 3 Mar 2023 • Philipp Scharpf, Moritz Schubotz, Howard S. Cohl, Corinna Breitinger, Bela Gipp

Our long-term goal is to generalize citation-based IR methods and apply this generalized method to both classical references and mathematical concepts.

Information Retrieval Question Answering +2

Collaborative and AI-aided Exam Question Generation using Wikidata in Education

1 code implementation • 15 Nov 2022 • Philipp Scharpf, Moritz Schubotz, Andreas Spitz, Andre Greiner-Petter, Bela Gipp

To address this need, we propose a multilingual Wikimedia framework that allows for collaborative worldwide teacher knowledge engineering and subsequent AI-aided question generation, test, and correction.

Question Generation Question-Generation

Mining Mathematical Documents for Question Answering via Unsupervised Formula Labeling

1 code implementation • 12 Nov 2022 • Philipp Scharpf, Moritz Schubotz, Bela Gipp

In this paper, we aim to bridge the gap by presenting data mining methods and benchmark results to employ Mathematical Entity Linking (MathEL) and Unsupervised Formula Labeling (UFL) for semantic formula search and mathematical question answering (MathQA) on the arXiv preprint repository, Wikipedia, and Wikidata, which is part of the Wikimedia ecosystem of free knowledge.

Entity Linking Knowledge Graphs +2

Caching and Reproducibility: Making Data Science experiments faster and FAIRer

no code implementations • 8 Nov 2022 • Moritz Schubotz, Ankit Satpute, Andre Greiner-Petter, Akiko Aizawa, Bela Gipp

In that case, the overall effort to iteratively improve the software and rerun the experiments creates significant time pressure on the researchers.

Information Retrieval Retrieval

Analyzing Multi-Task Learning for Abstractive Text Summarization

1 code implementation • 26 Oct 2022 • Frederic Kirstein, Jan Philip Wahle, Terry Ruas, Bela Gipp

Further, we find that choice and combinations of task families influence downstream performance more than the training scheme, supporting the use of task families for abstractive text summarization.

Abstractive Text Summarization Multi-Task Learning +3

CS-Insights: A System for Analyzing Computer Science Research

2 code implementations • 13 Oct 2022 • Terry Ruas, Jan Philip Wahle, Lennart Küll, Saif M. Mohammad, Bela Gipp

This paper presents CS-Insights, an interactive web application to analyze computer science publications from DBLP through multiple perspectives.

How Large Language Models are Transforming Machine-Paraphrased Plagiarism

3 code implementations • 7 Oct 2022 • Jan Philip Wahle, Terry Ruas, Frederic Kirstein, Bela Gipp

The recent success of large language models for text generation poses a severe threat to academic integrity, as plagiarists can generate realistic paraphrases indistinguishable from original work.

Paraphrase Generation

Neural Media Bias Detection Using Distant Supervision With BABE -- Bias Annotations By Experts

1 code implementation • 29 Sep 2022 • Timo Spinde, Manuel Plank, Jan-David Krieger, Terry Ruas, Bela Gipp, Akiko Aizawa

Fine-tuning and evaluating the model on our proposed supervised data set, we achieve a macro F1-score of 0.804, outperforming existing methods.

Bias Detection Sentence

A Domain-adaptive Pre-training Approach for Language Bias Detection in News

1 code implementation • 22 May 2022 • Jan-David Krieger, Timo Spinde, Terry Ruas, Juhi Kulshrestha, Bela Gipp

We present DA-RoBERTa, a new state-of-the-art transformer-based model adapted to the media bias domain which identifies sentence-level bias with an F1 score of 0.814.

Bias Detection Decision Making +1

D3: A Massive Dataset of Scholarly Metadata for Analyzing the State of Computer Science Research

1 code implementation • LREC 2022 • Jan Philip Wahle, Terry Ruas, Saif M. Mohammad, Bela Gipp

We present an initial analysis focused on the volume of computer science research (e.g., number of papers, authors, research activity), trends in topics of interest, and citation patterns.

Specialized Document Embeddings for Aspect-based Similarity of Research Papers

1 code implementation • 28 Mar 2022 • Malte Ostendorff, Till Blume, Terry Ruas, Bela Gipp, Georg Rehm

We compare and analyze three generic document embeddings, six specialized document embeddings and a pairwise classification baseline in the context of research paper recommendations.

Document Classification Recommendation Systems +1

Neighborhood Contrastive Learning for Scientific Document Representations with Citation Embeddings

1 code implementation • 14 Feb 2022 • Malte Ostendorff, Nils Rethmeier, Isabelle Augenstein, Bela Gipp, Georg Rehm

Learning scientific document representations can be substantially improved through contrastive learning objectives, where the challenge lies in creating positive and negative training samples that encode the desired similarity semantics.

Citation Prediction Contrastive Learning +3

Assisted Text Annotation Using Active Learning to Achieve High Quality with Little Effort

no code implementations • 15 Dec 2021 • Franziska Weeber, Felix Hamborg, Karsten Donnay, Bela Gipp

Large amounts of annotated data have become more important than ever, especially since the rise of deep learning techniques.

Active Learning Language Modelling +1

Do You Think It's Biased? How To Ask For The Perception Of Media Bias

no code implementations • 14 Dec 2021 • Timo Spinde, Christina Kreuter, Wolfgang Gaissmaier, Felix Hamborg, Bela Gipp, Helge Giese

To name an example: Intending to measure bias in a news article, should we ask, "How biased is the article?"

Towards A Reliable Ground-Truth For Biased Language Detection

no code implementations • 14 Dec 2021 • Timo Spinde, David Krieger, Manuel Plank, Bela Gipp

Our results demonstrate the existing crowdsourcing approaches' lack of data quality, underlining the need for a trained expert framework to gather a more reliable dataset.

Bias Detection

TASSY -- A Text Annotation Survey System

no code implementations • 14 Dec 2021 • Timo Spinde, Kanishka Sinha, Norman Meuschke, Bela Gipp

We present a free and open-source tool for creating web-based surveys that include text annotation tasks.

text annotation

Identification of Biased Terms in News Articles by Comparison of Outlet-specific Word Embeddings

no code implementations • 14 Dec 2021 • Timo Spinde, Lada Rudnitckaia, Felix Hamborg, Bela Gipp

The underlying idea is that the context of biased words in different news outlets varies more strongly than the one of non-biased words, since the perception of a word as being biased differs depending on its context.

Word Embeddings

ANEA: Automated (Named) Entity Annotation for German Domain-Specific Texts

1 code implementation • 13 Dec 2021 • Anastasia Zhukova, Felix Hamborg, Bela Gipp

Named entity recognition (NER) is an important task that aims to resolve universal categories of named entities, e.g., persons, locations, organizations, and times.

Descriptive named-entity-recognition +2

Detecting Cross-Language Plagiarism using Open Knowledge Graphs

1 code implementation • 18 Nov 2021 • Johannes Stegmüller, Fabian Bauer-Marquart, Norman Meuschke, Terry Ruas, Moritz Schubotz, Bela Gipp

Identifying cross-language plagiarism is challenging, especially for distant language pairs and sense-for-sense translations.

Knowledge Graphs Machine Translation +1

Testing the Generalization of Neural Language Models for COVID-19 Misinformation Detection

1 code implementation • 15 Nov 2021 • Jan Philip Wahle, Nischal Ashok, Terry Ruas, Norman Meuschke, Tirthankar Ghosal, Bela Gipp

We expect that evaluating a broad spectrum of datasets and models will benefit future research in developing misinformation detection systems.

Misinformation

Newsalyze: Effective Communication of Person-Targeting Biases in News Articles

no code implementations • 18 Oct 2021 • Felix Hamborg, Kim Heinser, Anastasia Zhukova, Karsten Donnay, Bela Gipp

Our study further suggests that our content-driven identification method detects groups of similarly slanted news articles due to substantial biases present in individual news articles.

Natural Language Understanding

How to Effectively Identify and Communicate Person-Targeting Media Bias in Daily News Consumption?

no code implementations • 18 Oct 2021 • Felix Hamborg, Timo Spinde, Kim Heinser, Karsten Donnay, Bela Gipp

We present an in-progress system for news recommendation that is the first to automate the manual procedure of content analysis to reveal person-targeting biases in news articles reporting on policy issues.

News Recommendation

A Qualitative Evaluation of User Preference for Link-based vs. Text-based Recommendations of Wikipedia Articles

1 code implementation • 16 Sep 2021 • Malte Ostendorff, Corinna Breitinger, Bela Gipp

We conclude that users of literature recommendation systems can benefit most from hybrid approaches that combine both link- and text-based approaches, where the user's information needs and preferences should control the weighting for the approaches used.

Recommendation Systems

Towards Evaluation of Cross-document Coreference Resolution Models Using Datasets with Diverse Annotation Schemes

1 code implementation • LREC 2022 • Anastasia Zhukova, Felix Hamborg, Bela Gipp

In this paper, we qualitatively and quantitatively compare the annotation schemes of ECB+, a CDCR dataset with identity coreference relations, and NewsWCL50, a CDCR dataset with a mix of loose context-dependent and strict coreference relations.

coreference-resolution Cross Document Coreference Resolution

Towards Explaining STEM Document Classification using Mathematical Entity Linking

no code implementations • 2 Sep 2021 • Philipp Scharpf, Moritz Schubotz, Bela Gipp

The results indicate that mathematical entities have the potential to provide high explainability as they are a crucial part of a STEM document.

Classification Document Classification +1

Concept Identification of Directly and Indirectly Related Mentions Referring to Groups of Persons

no code implementations • 2 Jul 2021 • Anastasia Zhukova, Felix Hamborg, Karsten Donnay, Bela Gipp

Specifically, the approach clusters mentions of groups of persons that act as non-named entity actors in the texts, e.g., "migrant families" = "asylum-seekers."

Clustering Dimensionality Reduction +1

Incorporating Word Sense Disambiguation in Neural Language Models

2 code implementations • 15 Jun 2021 • Jan Philip Wahle, Terry Ruas, Norman Meuschke, Bela Gipp

We present two supervised (pre-)training methods to incorporate gloss definitions from lexical resources into neural language models (LMs).

Word Sense Disambiguation

Towards Target-dependent Sentiment Classification in News Articles

1 code implementation • 20 May 2021 • Felix Hamborg, Karsten Donnay, Bela Gipp

Extensive research on target-dependent sentiment classification (TSC) has led to strong classification performances in domains where authors tend to explicitly express sentiment about specific entities or topics, such as in reviews or on social media.

Classification Decision Making +3

Evaluating Document Representations for Content-based Legal Literature Recommendations

1 code implementation • 28 Apr 2021 • Malte Ostendorff, Elliott Ash, Terry Ruas, Bela Gipp, Julian Moreno-Schneider, Georg Rehm

Simultaneously, legal recommender systems are typically evaluated in small-scale user studies without any publicly available benchmark datasets.

Recommendation Systems Representation Learning +1

Identifying Machine-Paraphrased Plagiarism

2 code implementations • 22 Mar 2021 • Jan Philip Wahle, Terry Ruas, Tomáš Foltýnek, Norman Meuschke, Bela Gipp

Employing paraphrasing tools to conceal plagiarized text is a severe threat to academic integrity.

Text Matching

Aspect-based Document Similarity for Research Papers

1 code implementation • COLING 2020 • Malte Ostendorff, Terry Ruas, Till Blume, Bela Gipp, Georg Rehm

Our findings motivate future research of aspect-based document similarity and the development of a recommender system based on the evaluated techniques.

Document Classification Recommendation Systems

AutoMSC: Automatic Assignment of Mathematics Subject Classification Labels

1 code implementation • 25 May 2020 • Moritz Schubotz, Philipp Scharpf, Olaf Teschke, Andreas Kuehnemund, Corinna Breitinger, Bela Gipp

Moreover, we find that the method's confidence score allows for reducing the effort by 86% compared to the manual coarse-grained classification effort while maintaining a precision of 81% for automatically classified articles.

Classification General Classification +2

A First Step Towards Content Protecting Plagiarism Detection

1 code implementation • 23 May 2020 • Cornelius Ihle, Moritz Schubotz, Norman Meuschke, Bela Gipp

Plagiarism detection systems are essential tools for safeguarding academic and educational integrity.

Classification and Clustering of arXiv Documents, Sections, and Abstracts, Comparing Encodings of Natural and Mathematical Language

no code implementations • 22 May 2020 • Philipp Scharpf, Moritz Schubotz, Abdou Youssef, Felix Hamborg, Norman Meuschke, Bela Gipp

In this paper, we show how selecting and combining encodings of natural and mathematical language affect classification and clustering of documents with mathematical content.

Classification Clustering +3

Mathematical Formulae in Wikimedia Projects 2020

no code implementations • 20 Mar 2020 • Moritz Schubotz, André Greiner-Petter, Norman Meuschke, Olaf Teschke, Bela Gipp

This poster summarizes our contributions to Wikimedia's processing pipeline for mathematical formulae.

Discovering Mathematical Objects of Interest -- A Study of Mathematical Notations

1 code implementation • 7 Feb 2020 • Andre Greiner-Petter, Moritz Schubotz, Fabian Mueller, Corinna Breitinger, Howard S. Cohl, Akiko Aizawa, Bela Gipp

The contributions of our presented research are as follows: (1) we present the first distributional analysis of mathematical formulae on arXiv and zbMATH; (2) we retrieve relevant mathematical objects for given textual search queries (e.g., linking $P_{n}^{(\alpha, \beta)}\!\left(x\right)$ with `Jacobi polynomial'); (3) we extend zbMATH's search engine by providing relevant mathematical formulae; and (4) we exemplify the applicability of the results by presenting auto-completion for math inputs as the first contribution to math recommendation systems.

Information Retrieval Math +2

NewsDeps: Visualizing the Origin of Information in News Articles

no code implementations • 23 Sep 2019 • Felix Hamborg, Philipp Meschenmoser, Moritz Schubotz, Bela Gipp

In scientific publications, citations allow readers to assess the authenticity of the presented information and verify it in the original context.

Giveme5W1H: A Universal System for Extracting Main Events from News Articles

2 code implementations • 6 Sep 2019 • Felix Hamborg, Corinna Breitinger, Bela Gipp

Event extraction from news articles is a commonly required prerequisite for various tasks, such as article summarization, article clustering, and news aggregation.

Clustering Event Extraction

Improving Academic Plagiarism Detection for STEM Documents by Analyzing Mathematical Content and Citations

no code implementations • 27 Jun 2019 • Norman Meuschke, Vincent Stange, Moritz Schubotz, Michael Karmer, Bela Gipp

Overall, we show that analyzing the similarity of mathematical content and academic citations is a striking supplement for conventional text-based detection approaches for academic literature in the STEM disciplines.

Math

Why Machines Cannot Learn Mathematics, Yet

no code implementations • 20 May 2019 • André Greiner-Petter, Terry Ruas, Moritz Schubotz, Akiko Aizawa, William Grosky, Bela Gipp

Nowadays, Machine Learning (ML) is seen as the universal solution to improve the effectiveness of information retrieval (IR) methods.

BIG-bench Machine Learning Information Retrieval +1

Towards Formula Translation using Recursive Neural Networks

no code implementations • 10 Nov 2018 • Felix Petersen, Moritz Schubotz, Bela Gipp

We implemented the first translator for mathematical formulae based on recursive neural networks.

Clustering Position +1
