Search Results for author: Yeye He

Found 9 papers, 3 papers with code

Auto-Formula: Recommend Formulas in Spreadsheets using Contrastive Learning for Table Representations

1 code implementation | 19 Apr 2024 | Sibei Chen, Yeye He, Weiwei Cui, Ju Fan, Song Ge, Haidong Zhang, Dongmei Zhang, Surajit Chaudhuri

Spreadsheets are widely recognized as the most popular end-user programming tools, blending the power of formula-based computation with an intuitive table-based interface.

Table-GPT: Table-tuned GPT for Diverse Table Tasks

no code implementations | 13 Oct 2023 | Peng Li, Yeye He, Dror Yashar, Weiwei Cui, Song Ge, Haidong Zhang, Danielle Rifinski Fainman, Dongmei Zhang, Surajit Chaudhuri

Language models, such as GPT-3.5 and ChatGPT, demonstrate remarkable abilities to follow diverse human instructions and perform a wide range of tasks.

Auto-Tables: Synthesizing Multi-Step Transformations to Relationalize Tables without Using Examples

1 code implementation | 27 Jul 2023 | Peng Li, Yeye He, Cong Yan, Yue Wang, Surajit Chaudhuri

Relational tables, where each row corresponds to an entity and each column corresponds to an attribute, have been the standard for tables in relational databases.

Auto-Validate by-History: Auto-Program Data Quality Constraints to Validate Recurring Data Pipelines

no code implementations | 4 Jun 2023 | Dezhan Tu, Yeye He, Weiwei Cui, Song Ge, Haidong Zhang, Han Shi, Dongmei Zhang, Surajit Chaudhuri

Data pipelines are widely employed in modern enterprises to power a variety of Machine-Learning (ML) and Business-Intelligence (BI) applications.

Ground Truth Inference for Weakly Supervised Entity Matching

no code implementations | 13 Nov 2022 | Renzhi Wu, Alexander Bendeck, Xu Chu, Yeye He

We also show that a deep learning EM end model (DeepMatcher) trained on labels generated from our weak supervision approach is comparable to an end model trained using tens of thousands of ground-truth labels, demonstrating that our approach can significantly reduce the labeling efforts required in EM.

Auto-Tag: Tagging-Data-By-Example in Data Lakes

no code implementations | 11 Dec 2021 | Yeye He, Jie Song, Yue Wang, Surajit Chaudhuri, Vishal Anil, Blake Lassiter, Yaron Goland, Gaurav Malhotra

As data lakes become increasingly popular in large enterprises today, there is a growing need to tag or classify data assets (e.g., files and databases) in data lakes with additional metadata (e.g., semantic column-types), as the inferred metadata can enable a range of downstream applications like data governance (e.g., GDPR compliance) and dataset search.

Auto-Pipeline: Synthesizing Complex Data Pipelines By-Target Using Reinforcement Learning and Search

1 code implementation | 25 Jun 2021 | Junwen Yang, Yeye He, Surajit Chaudhuri

In this work, we propose to automate multiple such steps end-to-end by synthesizing complex data pipelines with both string transformations and table-manipulation operators.

Demonstration of Panda: A Weakly Supervised Entity Matching System

no code implementations | 21 Jun 2021 | Renzhi Wu, Prem Sakala, Peng Li, Xu Chu, Yeye He

Panda's IDE includes many novel features purpose-built for EM, such as smart data sampling, a built-in library of EM utility functions, automatically generated LFs, visual debugging of LFs, and an EM-specific labeling model.

Auto-Validate: Unsupervised Data Validation Using Data-Domain Patterns Inferred from Data Lakes

no code implementations | 10 Apr 2021 | Jie Song, Yeye He

Complex data pipelines are increasingly common in diverse applications such as BI reporting and ML modeling.
