Data Integration

73 papers with code • 0 benchmarks • 7 datasets

Data integration (also called information integration) is the process of consolidating data from a set of heterogeneous data sources into a single uniform data set (materialized integration) or view on the data (virtual integration). Data integration pipelines involve subtasks such as schema matching, table annotation, entity resolution, value normalization, data cleansing, and data fusion. Application domains of data integration include data warehousing, data lakes, and knowledge base consolidation.

Surveys on data integration:

Dong, Srivastava: Big data integration, 2013.
Doan, Halevy, Ives: Principles of Data Integration, 2012.
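The pipeline subtasks named above can be illustrated with a minimal sketch of materialized integration over two small in-memory sources. Everything in it is an assumption made for illustration: the sample records, the hard-coded attribute mapping standing in for schema matching, the string-similarity threshold used for entity resolution, and the averaging rule used for data fusion. Real systems typically learn or tune each of these steps.

```python
from difflib import SequenceMatcher

# Two hypothetical sources describing overlapping products with different schemas.
source_a = [
    {"name": "Acme Widget 3000", "price_usd": "20.00"},
    {"name": "Bolt Cutter XL", "price_usd": "45.00"},
]
source_b = [
    {"title": "ACME widget-3000", "cost": 19.5},
    {"title": "Garden Hose 10m", "cost": 12.0},
]

# Schema matching, hard-coded here: source attribute -> unified attribute.
SCHEMA_MAP = {"name": "name", "title": "name", "price_usd": "price", "cost": "price"}


def to_unified(record):
    """Rename attributes to the unified schema and normalize their values."""
    out = {SCHEMA_MAP[key]: value for key, value in record.items()}
    # Value normalization: lowercase, treat hyphens as spaces, collapse whitespace.
    out["name"] = " ".join(out["name"].lower().replace("-", " ").split())
    out["price"] = float(out["price"])
    return out


def same_entity(a, b, threshold=0.9):
    """Entity resolution via string similarity of the normalized names."""
    return SequenceMatcher(None, a["name"], b["name"]).ratio() >= threshold


def fuse(a, b):
    """Data fusion: keep one name and resolve the price conflict by averaging."""
    return {"name": a["name"], "price": round((a["price"] + b["price"]) / 2, 2)}


def integrate(left, right):
    """Build a single uniform data set from two heterogeneous sources."""
    left = [to_unified(r) for r in left]
    right = [to_unified(r) for r in right]
    result, used = [], set()
    for a in left:
        match = next(
            (i for i, b in enumerate(right) if i not in used and same_entity(a, b)),
            None,
        )
        if match is None:
            result.append(a)
        else:
            used.add(match)
            result.append(fuse(a, right[match]))
    # Records that only appear in the second source are kept as they are.
    result.extend(b for i, b in enumerate(right) if i not in used)
    return result


integrated = integrate(source_a, source_b)
for row in integrated:
    print(row)
```

The two Acme records are merged into one row, while the unmatched records from each source pass through unchanged. A virtual integration would instead leave the sources in place and apply the same mapping at query time.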

Benchmarks

These leaderboards are used to track progress in Data Integration.

You can find evaluation results in the subtasks. You can also submit evaluation metrics for this task.

Libraries

Use these libraries to find Data Integration models and implementations

morph-kgc/morph-kgc • 4 papers • 159

Most implemented papers

GripNet: Graph Information Propagation on Supergraph for Heterogeneous Graphs

NYXFLOWER/GripNet • 29 Oct 2020

Heterogeneous graph representation learning aims to learn low-dimensional vector representations of different types of entities and relations to empower downstream tasks.

DANAE: a denoising autoencoder for underwater attitude estimation

fabidicia/DANAE • 13 Nov 2020

One of the main issues for underwater robots navigation is their accurate positioning, which heavily depends on the orientation estimation phase.

Learning to Characterize Matching Experts

shraga89/MED • 2 Dec 2020

Matching is a task at the heart of any data integration process, aimed at identifying correspondences among data elements.

An attention model to analyse the risk of agitation and urinary tract infections in people with dementia

RoonakR/rational_model • 18 Jan 2021

We have developed an integrated platform to collect in-home sensor data and performed an observational study to apply machine learning models for agitation and UTI risk analysis.

A Variational Information Bottleneck Approach to Multi-Omics Data Integration

chl8856/DeepIMV • 5 Feb 2021

Due to non-uniformity and technical limitations in omics platforms, such integrative analyses on multiple omics, which we refer to as views, involve learning from incomplete observations with various view-missing patterns.

VeeAlign: Multifaceted Context Representation using Dual Attention for Ontology Alignment

remorax/veealign • EMNLP 2021

Ontology Alignment is an important research problem applied to various fields such as data integration, data transfer, data preparation, etc.

Dual-Objective Fine-Tuning of BERT for Entity Matching

wbsg-uni-mannheim/jointbert • Proceedings of the VLDB Endowment 2021

The task can be approached by learning a binary classifier which distinguishes pairs of entity descriptions for the same real-world entity from descriptions of different entities.

Contrastive Mixture of Posteriors for Counterfactual Inference, Data Integration and Fairness

benevolentai/comp • NeurIPS 2021

Learning meaningful representations of data that can address challenges such as batch effect correction and counterfactual inference is a central problem in many domains including computational biology.

Integrated community occupancy models: A framework to assess occurrence and biodiversity dynamics using multiple data sources

zipkinlab/doser_etal_2021_inreview • 4 Sep 2021

We present an "integrated community occupancy model" (ICOM) that unites principles of data integration and hierarchical community modeling in a single framework to provide inferences on species-specific and community occurrence dynamics using multiple data sources.

PoWareMatch: a Quality-aware Deep Learning Approach to Improve Human Schema Matching

shraga89/powarematch • 15 Sep 2021

Schema matching is a core task of any data integration process.
