Search Results for author: Surajit Chaudhuri

Found 10 papers, 3 papers with code

Auto-Formula: Recommend Formulas in Spreadsheets using Contrastive Learning for Table Representations

1 code implementation • 19 Apr 2024 • Sibei Chen, Yeye He, Weiwei Cui, Ju Fan, Song Ge, Haidong Zhang, Dongmei Zhang, Surajit Chaudhuri

Spreadsheets are widely recognized as the most popular end-user programming tools, blending the power of formula-based computation with an intuitive table-based interface.

2k Contrastive Learning +1

Table-GPT: Table-tuned GPT for Diverse Table Tasks

no code implementations • 13 Oct 2023 • Peng Li, Yeye He, Dror Yashar, Weiwei Cui, Song Ge, Haidong Zhang, Danielle Rifinski Fainman, Dongmei Zhang, Surajit Chaudhuri

Language models, such as GPT-3.5 and ChatGPT, demonstrate remarkable abilities to follow diverse human instructions and perform a wide range of tasks.

Probing Language Models

Auto-Tables: Synthesizing Multi-Step Transformations to Relationalize Tables without Using Examples

1 code implementation • 27 Jul 2023 • Peng Li, Yeye He, Cong Yan, Yue Wang, Surajit Chaudhuri

Relational tables, where each row corresponds to an entity and each column corresponds to an attribute, have been the standard for tables in relational databases.

Attribute

Auto-Validate by-History: Auto-Program Data Quality Constraints to Validate Recurring Data Pipelines

no code implementations • 4 Jun 2023 • Dezhan Tu, Yeye He, Weiwei Cui, Song Ge, Haidong Zhang, Han Shi, Dongmei Zhang, Surajit Chaudhuri

Data pipelines are widely employed in modern enterprises to power a variety of Machine-Learning (ML) and Business-Intelligence (BI) applications.

Auto-Tag: Tagging-Data-By-Example in Data Lakes

no code implementations • 11 Dec 2021 • Yeye He, Jie Song, Yue Wang, Surajit Chaudhuri, Vishal Anil, Blake Lassiter, Yaron Goland, Gaurav Malhotra

As data lakes become increasingly popular in large enterprises today, there is a growing need to tag or classify data assets (e.g., files and databases) in data lakes with additional metadata (e.g., semantic column-types), as the inferred metadata can enable a range of downstream applications like data governance (e.g., GDPR compliance) and dataset search.

TAG

Auto-Pipeline: Synthesizing Complex Data Pipelines By-Target Using Reinforcement Learning and Search

1 code implementation • 25 Jun 2021 • Junwen Yang, Yeye He, Surajit Chaudhuri

In this work, we propose to automate multiple such steps end-to-end by synthesizing complex data pipelines with both string transformations and table-manipulation operators.

reinforcement-learning Reinforcement Learning (RL)

TableQnA: Answering List Intent Queries With Web Tables

no code implementations • 10 Jan 2020 • Kaushik Chakrabarti, Zhimin Chen, Siamak Shakeri, Guihong Cao, Surajit Chaudhuri

For (ii), we develop novel features to compute structure-aware match and train a machine learning model.

BIG-bench Machine Learning Word Embeddings

Petabytes to Science

no code implementations • 13 May 2019 • Amanda E. Bauer, Eric C. Bellm, Adam S. Bolton, Surajit Chaudhuri, A. J. Connolly, Kelle L. Cruz, Vandana Desai, Alex Drlica-Wagner, Frossie Economou, Niall Gaffney, J. Kavelaars, J. Kinney, Ting S. Li, B. Lundgren, R. Margutti, G. Narayan, B. Nord, Dara J. Norman, W. O'Mullane, S. Padhi, J. E. G. Peek, C. Schafer, Megan E. Schwamb, Arfon M. Smith, Erik J. Tollerud, Anne-Marie Weijmans, Alexander S. Szalay

A Kavli Foundation-sponsored workshop on the theme "Petabytes to Science" was held from the 12th to the 14th of February 2019 in Las Vegas.

Instrumentation and Methods for Astrophysics

Selectivity Estimation for Range Predicates using Lightweight Models

no code implementations • Proceedings of the VLDB Endowment 2019 • Anshuman Dutt, Chi Wang, Azade Nazi, Srikanth Kandula, Vivek Narasayya, Surajit Chaudhuri

Query optimizers depend on selectivity estimates of query predicates to produce a good execution plan.

Feature Engineering regression

ABC: Efficient Selection of Machine Learning Configuration on Large Dataset

no code implementations • 8 Nov 2018 • Silu Huang, Chi Wang, Bolin Ding, Surajit Chaudhuri

A machine learning configuration refers to a combination of preprocessor, learner, and hyperparameters.

BIG-bench Machine Learning
