Data Integration

73 papers with code • 0 benchmarks • 7 datasets

Data integration (also called information integration) is the process of consolidating data from a set of heterogeneous data sources into a single uniform data set (materialized integration) or a view over the data (virtual integration). Data integration pipelines involve subtasks such as schema matching, table annotation, entity resolution, value normalization, data cleansing, and data fusion. Application domains of data integration include data warehousing, data lakes, and knowledge base consolidation.
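As a rough illustration of how several of these subtasks fit together, the following is a minimal sketch of a materialized integration over two toy sources. Everything in it is an assumption made for illustration: the records, the attribute names, the hand-written mapping `SCHEMA_MAP_B`, and the rule that two records with the same normalized email describe the same entity. Real pipelines typically learn or infer these steps rather than hard-coding them.

```python
from collections import defaultdict

# Two hypothetical sources describing customers with different schemas.
source_a = [
    {"name": "Ada Lovelace", "email": "ADA@example.org", "city": "London"},
    {"name": "Alan Turing", "email": "alan@example.org", "city": None},
]
source_b = [
    {"full_name": "ada lovelace", "mail": "ada@example.org ", "town": None},
    {"full_name": "Grace Hopper", "mail": "grace@example.org", "town": "Arlington"},
]

# Schema matching (hand-written here): source_b attribute -> target attribute.
SCHEMA_MAP_B = {"full_name": "name", "mail": "email", "town": "city"}


def to_target_schema(record, mapping):
    """Rename attributes so the record follows the target schema."""
    return {mapping.get(key, key): value for key, value in record.items()}


def normalize(record):
    """Value normalization: trim whitespace, lowercase emails, title-case names."""
    out = {}
    for key, value in record.items():
        if isinstance(value, str):
            value = value.strip()
            if key == "email":
                value = value.lower()
            elif key == "name":
                value = value.title()
        out[key] = value
    return out


def integrate(a, b):
    records = [normalize(r) for r in a]
    records += [normalize(to_target_schema(r, SCHEMA_MAP_B)) for r in b]

    # Entity resolution (assumed rule): same normalized email -> same entity.
    clusters = defaultdict(list)
    for record in records:
        clusters[record["email"]].append(record)

    # Data fusion: per attribute, keep the first non-missing value in the cluster.
    fused = []
    for email in sorted(clusters):
        merged = {}
        for record in clusters[email]:
            for key, value in record.items():
                if merged.get(key) is None:
                    merged[key] = value
        fused.append(merged)
    return fused


integrated = integrate(source_a, source_b)
for row in integrated:
    print(row)
```

The result is a single data set in one schema, which corresponds to the materialized variant; a virtual integration would instead keep the sources in place and apply the same mapping at query time.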

Most implemented papers

GripNet: Graph Information Propagation on Supergraph for Heterogeneous Graphs

NYXFLOWER/GripNet 29 Oct 2020

Heterogeneous graph representation learning aims to learn low-dimensional vector representations of different types of entities and relations to empower downstream tasks.

DANAE: a denoising autoencoder for underwater attitude estimation

fabidicia/DANAE 13 Nov 2020

One of the main issues in underwater robot navigation is accurate positioning, which depends heavily on the orientation estimation phase.

Learning to Characterize Matching Experts

shraga89/MED 2 Dec 2020

Matching is a task at the heart of any data integration process, aimed at identifying correspondences among data elements.

An attention model to analyse the risk of agitation and urinary tract infections in people with dementia

RoonakR/rational_model 18 Jan 2021

We have developed an integrated platform to collect in-home sensor data and performed an observational study to apply machine learning models for agitation and UTI risk analysis.

A Variational Information Bottleneck Approach to Multi-Omics Data Integration

chl8856/DeepIMV 5 Feb 2021

Due to non-uniformity and technical limitations in omics platforms, such integrative analyses on multiple omics, which we refer to as views, involve learning from incomplete observations with various view-missing patterns.

VeeAlign: Multifaceted Context Representation using Dual Attention for Ontology Alignment

remorax/veealign EMNLP 2021

Ontology Alignment is an important research problem applied to various fields such as data integration, data transfer, data preparation, etc.

Dual-Objective Fine-Tuning of BERT for Entity Matching

wbsg-uni-mannheim/jointbert Proceedings of the VLDB Endowment 2021

Entity matching can be approached by learning a binary classifier that distinguishes pairs of entity descriptions referring to the same real-world entity from pairs describing different entities.
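To make the pairwise framing concrete, here is a minimal sketch of a binary match/non-match classifier over pairs of descriptions. It is not the paper's BERT-based method: it swaps in a single Jaccard token-overlap feature and a threshold fitted on a handful of labeled pairs. The function names and the product strings are hypothetical and exist only for illustration.

```python
def jaccard(a, b):
    """Token-level Jaccard similarity between two entity descriptions."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def fit_threshold(pairs, labels):
    """Pick the similarity threshold that maximizes accuracy on the labeled pairs."""
    scores = [jaccard(a, b) for a, b in pairs]
    best_threshold, best_accuracy = 0.0, -1.0
    for candidate in sorted(set(scores)):
        correct = sum((score >= candidate) == label
                      for score, label in zip(scores, labels))
        accuracy = correct / len(labels)
        if accuracy > best_accuracy:
            best_threshold, best_accuracy = candidate, accuracy
    return best_threshold


def is_match(a, b, threshold):
    """Binary decision: do the two descriptions refer to the same entity?"""
    return jaccard(a, b) >= threshold


# Hypothetical labeled pairs: True means both descriptions name the same product.
train_pairs = [
    ("apple iphone 12 64gb black", "iphone 12 apple 64gb black"),
    ("samsung galaxy s21 128gb", "samsung galaxy s21 5g 128gb"),
    ("apple iphone 12 64gb black", "apple iphone 11 64gb white"),
    ("sony wh-1000xm4 headphones", "bose qc35 headphones"),
]
train_labels = [True, True, False, False]

threshold = fit_threshold(train_pairs, train_labels)
print(threshold)
print(is_match("dell xps 13 laptop", "dell xps 13 laptop 16gb", threshold))
print(is_match("dell xps 13 laptop", "hp spectre laptop", threshold))
```

A learned model replaces the hand-picked similarity feature with representations of the two descriptions, but the input (a pair) and the output (match or non-match) keep the same shape.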

Contrastive Mixture of Posteriors for Counterfactual Inference, Data Integration and Fairness

benevolentai/comp NeurIPS 2021

Learning meaningful representations of data that can address challenges such as batch effect correction and counterfactual inference is a central problem in many domains including computational biology.

Integrated community occupancy models: A framework to assess occurrence and biodiversity dynamics using multiple data sources

zipkinlab/doser_etal_2021_inreview 4 Sep 2021

We present an "integrated community occupancy model" (ICOM) that unites principles of data integration and hierarchical community modeling in a single framework to provide inferences on species-specific and community occurrence dynamics using multiple data sources.

PoWareMatch: a Quality-aware Deep Learning Approach to Improve Human Schema Matching

shraga89/powarematch 15 Sep 2021

Schema matching is a core task of any data integration process.