Data Integration
73 papers with code • 0 benchmarks • 7 datasets
Data integration (also called information integration) is the process of consolidating data from a set of heterogeneous data sources into a single uniform data set (materialized integration) or view on the data (virtual integration). Data integration pipelines involve subtasks such as schema matching, table annotation, entity resolution, value normalization, data cleansing, and data fusion. Application domains of data integration include data warehousing, data lakes, and knowledge base consolidation.
Most implemented papers
GripNet: Graph Information Propagation on Supergraph for Heterogeneous Graphs
Heterogeneous graph representation learning aims to learn low-dimensional vector representations of different types of entities and relations to empower downstream tasks.
DANAE: a denoising autoencoder for underwater attitude estimation
One of the main issues for underwater robot navigation is accurate positioning, which depends heavily on the orientation estimation phase.
Learning to Characterize Matching Experts
Matching is a task at the heart of any data integration process, aimed at identifying correspondences among data elements.
An attention model to analyse the risk of agitation and urinary tract infections in people with dementia
We have developed an integrated platform to collect in-home sensor data and performed an observational study to apply machine learning models for agitation and UTI risk analysis.
A Variational Information Bottleneck Approach to Multi-Omics Data Integration
Due to non-uniformity and technical limitations in omics platforms, integrative analyses of multiple omics, which we refer to as views, involve learning from incomplete observations with various view-missing patterns.
VeeAlign: Multifaceted Context Representation using Dual Attention for Ontology Alignment
Ontology Alignment is an important research problem applied to various fields such as data integration, data transfer, and data preparation.
Dual-Objective Fine-Tuning of BERT for Entity Matching
Entity matching can be approached by learning a binary classifier that distinguishes pairs of entity descriptions for the same real-world entity from descriptions of different entities.
Contrastive Mixture of Posteriors for Counterfactual Inference, Data Integration and Fairness
Learning meaningful representations of data that can address challenges such as batch effect correction and counterfactual inference is a central problem in many domains including computational biology.
Integrated community occupancy models: A framework to assess occurrence and biodiversity dynamics using multiple data sources
We present an "integrated community occupancy model" (ICOM) that unites principles of data integration and hierarchical community modeling in a single framework to provide inferences on species-specific and community occurrence dynamics using multiple data sources.
PoWareMatch: a Quality-aware Deep Learning Approach to Improve Human Schema Matching
Schema matching is a core task of any data integration process.
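Several of the papers above treat schema matching as finding correspondences between the attributes of two schemas. The sketch below is a deliberately simple name-based matcher. It scores every attribute pair with `difflib.SequenceMatcher` and then greedily keeps the best-scoring pairs as a 1:1 matching. It is not the method of any listed paper. The column names, the 0.6 threshold, and the helper names (`name_similarity`, `match_schemas`) are all hypothetical choices made for illustration.

```python
from difflib import SequenceMatcher


def _norm(name: str) -> str:
    # Lower-case and drop underscores so "cust_name" and "custName" compare alike.
    return name.lower().replace("_", "")


def name_similarity(a: str, b: str) -> float:
    # Ratio in [0, 1]: 2 * (matched characters) / (total characters).
    return SequenceMatcher(None, _norm(a), _norm(b)).ratio()


def match_schemas(source_cols, target_cols, threshold=0.6):
    """Greedy 1:1 schema matching on attribute-name similarity (hypothetical helper)."""
    # Score every (source, target) pair, highest score first.
    scored = sorted(
        ((name_similarity(s, t), s, t) for s in source_cols for t in target_cols),
        reverse=True,
    )
    used_src, used_tgt, matches = set(), set(), {}
    for score, s, t in scored:
        if score < threshold:
            break  # all remaining pairs score even lower
        if s not in used_src and t not in used_tgt:
            matches[s] = t
            used_src.add(s)
            used_tgt.add(t)
    return matches


print(match_schemas(["cust_name", "phone_no", "city"], ["customer_name", "phone", "town"]))
# {'phone_no': 'phone', 'cust_name': 'customer_name'}
```

Here `city` and `town` stay unmatched, because they share almost no characters. This is the typical failure of purely lexical matchers and one motivation for the learned approaches listed above.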
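The task description lists the usual pipeline subtasks: schema matching, value normalization, entity resolution, and data fusion. The sketch below strings minimal versions of them together into a materialized integration of two small sources. It rests on several assumptions:

- The source records and the schema mappings are hypothetical and written by hand.
- Entity resolution is a simple binary pair decision: an equal phone number, or a token Jaccard similarity of at least 0.8.
- Fusion keeps the first non-null value per attribute.

None of this is the method of a specific paper above.

```python
# Hypothetical sources describing the same kind of entity with different schemas.
SOURCE_A = [
    {"cust_name": "Ada  Lovelace", "tel": "+44 20 7946 0000", "city": None},
    {"cust_name": "Alan Turing", "tel": None, "city": "Wilmslow"},
]
SOURCE_B = [
    {"fullName": "ada lovelace", "phone": "442079460000", "town": "London"},
    {"fullName": "Grace Hopper", "phone": "12025550100", "town": "Arlington"},
]

# Schema matching output, assumed here: source attribute -> target attribute.
MAPPING_A = {"cust_name": "name", "tel": "phone", "city": "city"}
MAPPING_B = {"fullName": "name", "phone": "phone", "town": "city"}


def to_target_schema(record, mapping):
    # Rename source attributes into the uniform target schema.
    return {target: record.get(source) for source, target in mapping.items()}


def normalize(record):
    # Value normalization: lower-case names, collapse whitespace, keep phone digits only.
    out = dict(record)
    if out["name"]:
        out["name"] = " ".join(out["name"].lower().split())
    if out["phone"]:
        out["phone"] = "".join(ch for ch in out["phone"] if ch.isdigit())
    return out


def jaccard(a, b):
    ta, tb = set(a.split()), set(b.split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def is_match(r1, r2, threshold=0.8):
    # Entity resolution as a binary decision on a pair of records.
    if r1["phone"] and r1["phone"] == r2["phone"]:
        return True
    return jaccard(r1["name"], r2["name"]) >= threshold


def fuse(r1, r2):
    # Data fusion: prefer r1's value, fall back to r2's when r1's is missing.
    return {k: r1[k] if r1[k] is not None else r2[k] for k in r1}


def integrate(sources):
    """Materialize one uniform data set from (records, mapping) pairs."""
    records = [normalize(to_target_schema(r, m)) for rs, m in sources for r in rs]
    merged = []
    for rec in records:
        for i, existing in enumerate(merged):
            if is_match(existing, rec):
                merged[i] = fuse(existing, rec)
                break
        else:  # no existing entity matched: keep as a new entity
            merged.append(rec)
    return merged


for row in integrate([(SOURCE_A, MAPPING_A), (SOURCE_B, MAPPING_B)]):
    print(row)
```

The two Ada Lovelace records agree on the normalized phone number, so they are merged and the missing city is filled from the second source. The other two records remain separate entities. Real pipelines replace the hand-written mapping and the threshold rule with learned matchers, such as those in the papers listed above.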