Search Results for author: Surajit Chaudhuri

Found 10 papers, 3 papers with code

Auto-Formula: Recommend Formulas in Spreadsheets using Contrastive Learning for Table Representations

1 code implementation · 19 Apr 2024 · Sibei Chen, Yeye He, Weiwei Cui, Ju Fan, Song Ge, Haidong Zhang, Dongmei Zhang, Surajit Chaudhuri

Spreadsheets are widely recognized as the most popular end-user programming tools, which blend the power of formula-based computation with an intuitive table-based interface.

Contrastive Learning

Table-GPT: Table-tuned GPT for Diverse Table Tasks

No code implementations · 13 Oct 2023 · Peng Li, Yeye He, Dror Yashar, Weiwei Cui, Song Ge, Haidong Zhang, Danielle Rifinski Fainman, Dongmei Zhang, Surajit Chaudhuri

Language models, such as GPT-3.5 and ChatGPT, demonstrate remarkable abilities to follow diverse human instructions and perform a wide range of tasks.

Probing Language Models

Auto-Tables: Synthesizing Multi-Step Transformations to Relationalize Tables without Using Examples

1 code implementation · 27 Jul 2023 · Peng Li, Yeye He, Cong Yan, Yue Wang, Surajit Chaudhuri

Relational tables, where each row corresponds to an entity and each column corresponds to an attribute, have been the standard for tables in relational databases.

Attribute
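
The relational form described in the Auto-Tables snippet (one entity per row, one attribute per column) can be illustrated with one common relationalizing step, an unpivot (also called melt). The following is a minimal Python sketch, not the Auto-Tables algorithm: the `unpivot` helper and the sample table are hypothetical, and the sketch assumes the non-relational input stores a single attribute (here, the year) across its column headers.

```python
from typing import Dict, List

Row = Dict[str, object]


def unpivot(rows: List[Row], id_cols: List[str], var_name: str, value_name: str) -> List[Row]:
    """Hypothetical helper: move every non-id column into (var_name, value_name) pairs.

    Each input row yields one output row per non-id column, so the attribute that
    was spread across column headers becomes an ordinary column.
    """
    out: List[Row] = []
    for row in rows:
        ids = {c: row[c] for c in id_cols}
        for col, val in row.items():
            if col in id_cols:
                continue
            out.append({**ids, var_name: col, value_name: val})
    return out


# Hypothetical non-relational table: the "year" attribute lives in the column headers.
wide = [
    {"country": "A", "2021": 10, "2022": 12},
    {"country": "B", "2021": 7, "2022": 9},
]

# Relational result: each row describes one (country, year) entity, each column one attribute.
long_rows = unpivot(wide, id_cols=["country"], var_name="year", value_name="sales")
for r in long_rows:
    print(r)
```

Real tables may need several such steps in sequence (for example, splitting or transposing before unpivoting), which is why the paper's title speaks of multi-step transformations.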

Auto-Validate by-History: Auto-Program Data Quality Constraints to Validate Recurring Data Pipelines

No code implementations · 4 Jun 2023 · Dezhan Tu, Yeye He, Weiwei Cui, Song Ge, Haidong Zhang, Han Shi, Dongmei Zhang, Surajit Chaudhuri

Data pipelines are widely employed in modern enterprises to power a variety of Machine-Learning (ML) and Business-Intelligence (BI) applications.

Auto-Tag: Tagging-Data-By-Example in Data Lakes

No code implementations · 11 Dec 2021 · Yeye He, Jie Song, Yue Wang, Surajit Chaudhuri, Vishal Anil, Blake Lassiter, Yaron Goland, Gaurav Malhotra

As data lakes become increasingly popular in large enterprises today, there is a growing need to tag or classify data assets (e.g., files and databases) in data lakes with additional metadata (e.g., semantic column-types), as the inferred metadata can enable a range of downstream applications like data governance (e.g., GDPR compliance), and dataset search.

TAG

Auto-Pipeline: Synthesizing Complex Data Pipelines By-Target Using Reinforcement Learning and Search

1 code implementation · 25 Jun 2021 · Junwen Yang, Yeye He, Surajit Chaudhuri

In this work, we propose to automate multiple data-preparation steps end-to-end by synthesizing complex data pipelines that combine string transformations and table-manipulation operators.

Reinforcement Learning (RL)
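
The idea of synthesizing a pipeline by-target can be sketched as a naive search: enumerate short sequences of candidate operators and keep the first sequence whose output matches the target table. This is a simplified Python illustration, not the reinforcement-learning and search method of Auto-Pipeline. The operators `upper_dept` (a string transformation) and `group_sum` (a table-manipulation operator), the sample tables, and the brute-force enumeration are all hypothetical, and the sketch assumes, for simplicity, that the exact expected output is known.

```python
from itertools import product
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Table = List[Dict[str, object]]
Op = Tuple[str, Callable[[Table], Table]]


def upper_dept(t: Table) -> Table:
    # Hypothetical string transformation: upper-case the "dept" column.
    return [{**r, "dept": str(r["dept"]).upper()} for r in t]


def group_sum(t: Table) -> Table:
    # Hypothetical table-manipulation operator: GROUP BY dept, SUM(amount).
    totals: dict = {}
    for r in t:
        totals[r["dept"]] = totals.get(r["dept"], 0) + r["amount"]
    return [{"dept": d, "amount": a} for d, a in sorted(totals.items())]


def synthesize(source: Table, target: Table, ops: Sequence[Op], max_len: int = 2) -> Optional[List[str]]:
    """Naive search: try every operator sequence up to max_len, shortest first."""
    for length in range(1, max_len + 1):
        for seq in product(ops, repeat=length):
            t = source
            try:
                for _, op in seq:
                    t = op(t)
            except (KeyError, TypeError):
                continue  # operator not applicable to this intermediate table
            if t == target:
                return [name for name, _ in seq]
    return None


source = [
    {"name": "ada", "dept": "eng", "amount": 3},
    {"name": "bob", "dept": "ops", "amount": 2},
    {"name": "cy", "dept": "eng", "amount": 4},
]
# Simplifying assumption: the exact expected output table is given.
target = [{"dept": "ENG", "amount": 7}, {"dept": "OPS", "amount": 2}]

ops = [("upper_dept", upper_dept), ("group_sum", group_sum)]
print(synthesize(source, target, ops))  # → ['upper_dept', 'group_sum']
```

The number of candidate sequences grows as |ops|^k with pipeline length k, so exhaustive enumeration only works for toy cases; this is the cost that a learned or guided search aims to reduce.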

ABC: Efficient Selection of Machine Learning Configuration on Large Dataset

No code implementations · 8 Nov 2018 · Silu Huang, Chi Wang, Bolin Ding, Surajit Chaudhuri

A machine learning configuration refers to a combination of preprocessor, learner, and hyperparameters.

BIG-bench Machine Learning
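
The definition of a machine learning configuration (preprocessor, learner, and hyperparameters) maps naturally onto a small data structure. The Python sketch below enumerates a hypothetical configuration space and picks the configuration with the highest score under a caller-supplied scoring function. It is an exhaustive baseline for illustration only, not the ABC selection method; the names `MLConfig`, `enumerate_configs`, `select_best`, `toy_score`, and the grid values are assumptions.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class MLConfig:
    # Hypothetical container for one configuration: preprocessor + learner + hyperparameters.
    preprocessor: str
    learner: str
    hyperparameters: Tuple[Tuple[str, float], ...]  # (name, value) pairs sorted by name; hashable


def enumerate_configs(
    preprocessors: Sequence[str],
    learners: Sequence[str],
    hp_grid: Dict[str, Dict[str, List[float]]],
) -> List[MLConfig]:
    """Cartesian product of preprocessors, learners, and each learner's hyperparameter grid."""
    configs: List[MLConfig] = []
    for pre, learner in product(preprocessors, learners):
        grid = hp_grid.get(learner, {})
        names = sorted(grid)
        for values in product(*(grid[n] for n in names)):
            configs.append(MLConfig(pre, learner, tuple(zip(names, values))))
    return configs


def select_best(configs: Sequence[MLConfig], score: Callable[[MLConfig], float]) -> MLConfig:
    # Exhaustive baseline: evaluate every configuration and keep the highest-scoring one.
    return max(configs, key=score)


def toy_score(c: MLConfig) -> float:
    # Hypothetical stand-in for validation accuracy; a real score would train and evaluate a model.
    hp = dict(c.hyperparameters)
    return hp.get("max_depth", 0) + (1 if c.preprocessor == "standardize" else 0)


configs = enumerate_configs(
    ["none", "standardize"],
    ["logreg", "tree"],
    {"logreg": {"C": [0.1, 1.0]}, "tree": {"max_depth": [3, 5]}},
)
print(len(configs))  # → 8
print(select_best(configs, toy_score))
```

Because every configuration is scored, the cost of this baseline grows with both the size of the configuration space and the cost of each evaluation on a large dataset, which is the setting the paper's title targets.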
