1 code implementation • 19 Jun 2023 • Wenqi Jiang, Shigang Li, Yu Zhu, Johannes De Fine Licht, Zhenhao He, Runbin Shi, Cedric Renggli, Shuai Zhang, Theodoros Rekatsinas, Torsten Hoefler, Gustavo Alonso
Vector search has emerged as the foundation for large-scale information retrieval and machine learning systems, with search engines like Google and Bing processing tens of thousands of queries per second on petabyte-scale document datasets by evaluating vector similarities between encoded query texts and web documents.
no code implementations • 28 May 2023 • Patrik Okanovic, Roger Waleffe, Vasilis Mageirakos, Konstantinos E. Nikolakakis, Amin Karbasi, Dionysis Kalogerias, Nezihe Merve Gürel, Theodoros Rekatsinas
Methods for carefully selecting or generating a small set of training data to learn from, i.e., data pruning, coreset selection, and data distillation, have been shown to be effective in reducing the ever-increasing cost of training neural networks.
no code implementations • 16 May 2023 • Ihab F. Ilyas, JP Lacerda, Yunyao Li, Umar Farooq Minhas, Ali Mousavi, Jeffrey Pound, Theodoros Rekatsinas, Chiraag Sumanth
We then describe how our platform, including graph embeddings, can be leveraged to create a Semantic Annotation service that links unstructured Web documents to entities in our KG.
no code implementations • 4 Apr 2023 • Jason Mohoney, Anil Pacaci, Shihabur Rahman Chowdhury, Ali Mousavi, Ihab F. Ilyas, Umar Farooq Minhas, Jeffrey Pound, Theodoros Rekatsinas
Motivated by the tasks of finding related KG queries and entities for past KG query workloads, we focus on hybrid vector similarity search (hybrid queries for short) where part of the query corresponds to vector similarity search and part of the query corresponds to predicates over relational attributes associated with the underlying data vectors.
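As a rough illustration of what a hybrid query looks like, the sketch below filters rows with a relational predicate and then ranks the survivors by cosine similarity to the query vector. It is a brute-force, filter-then-rank baseline written only for illustration, not the method proposed in the paper; the names `hybrid_search`, `rows`, and the `type` attribute are hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

# Each row pairs a data vector with its relational attributes (hypothetical layout).
Row = Tuple[Sequence[float], Dict[str, object]]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def hybrid_search(
    rows: List[Row],
    query_vec: Sequence[float],
    predicate: Callable[[Dict[str, object]], bool],
    k: int,
) -> List[Tuple[int, float]]:
    """Return up to k (row index, similarity) pairs among rows passing the predicate."""
    # Relational part of the query: keep only rows whose attributes match.
    candidates = [
        (i, cosine(vec, query_vec))
        for i, (vec, attrs) in enumerate(rows)
        if predicate(attrs)
    ]
    # Vector part of the query: rank the remaining rows by similarity.
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return candidates[:k]


rows = [
    ([1.0, 0.0], {"type": "person"}),
    ([0.9, 0.1], {"type": "place"}),
    ([0.0, 1.0], {"type": "person"}),
]
result = hybrid_search(rows, [1.0, 0.0], lambda attrs: attrs["type"] == "person", k=2)
```

Here the second row is close to the query vector but is excluded because it fails the predicate. Scanning every row this way is exact but scales linearly with the dataset, which is the cost that index-based approaches aim to avoid.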
no code implementations • 15 Apr 2022 • Ihab F. Ilyas, Theodoros Rekatsinas, Vishnu Konda, Jeffrey Pound, Xiaoguang Qi, Mohamed Soliman
We introduce Saga, a next-generation knowledge construction and serving platform for powering knowledge-based applications at industrial scale.
1 code implementation • 4 Feb 2022 • Roger Waleffe, Jason Mohoney, Theodoros Rekatsinas, Shivaram Venkataraman
We study training of Graph Neural Networks (GNNs) for large-scale graphs.
1 code implementation • 2 Jun 2021 • Sahaana Suri, Ihab F. Ilyas, Christopher Ré, Theodoros Rekatsinas
Context enrichment, or rebuilding fragmented context, using keyless joins is an implicit or explicit step in machine learning (ML) pipelines over structured data sources.
1 code implementation • 20 Jan 2021 • Jason Mohoney, Roger Waleffe, Yiheng Xu, Theodoros Rekatsinas, Shivaram Venkataraman
We propose a new framework for computing the embeddings of large-scale graphs on a single machine.
no code implementations • Findings of the Association for Computational Linguistics 2020 • Ankur Goswami, Akshata Bhat, Hadar Ohana, Theodoros Rekatsinas
We show that state-of-the-art self-supervised language models can be readily used to extract relations from a corpus without the need to train a fine-tuned extractive head.
no code implementations • 23 Jun 2020 • Roger Waleffe, Theodoros Rekatsinas
Recent works show that overparameterized networks contain small subnetworks that exhibit comparable accuracy to the full model when trained in isolation.
no code implementations • 18 Jun 2020 • Alireza Heidari, George Michalopoulos, Shrinu Kushagra, Ihab F. Ilyas, Theodoros Rekatsinas
We use this feature vector along with the ground-truth information to learn a classifier for each of the attributes of the database.
1 code implementation • 8 Jun 2020 • Zifan Liu, Zhechun Zhou, Theodoros Rekatsinas
Picket is designed as a plugin that can increase the robustness of any machine learning pipeline.
no code implementations • 6 Jun 2020 • Deepan Das, Haley Massa, Abhimanyu Kulkarni, Theodoros Rekatsinas
The generalization performance of deep learning models trained using empirical risk minimization can be improved significantly by using data augmentation strategies, such as simple transformations or mixed samples.
no code implementations • 10 Feb 2020 • Zifan Liu, Jongho Park, Theodoros Rekatsinas, Christos Tzamos
We study the problem of robust mean estimation and introduce a novel Hamming distance-based measure of distribution shift for coordinate-level corruptions.
no code implementations • 28 Oct 2019 • Ankur Goswami, Joshua McGrath, Shanan Peters, Theodoros Rekatsinas
We also present a region embedding model that uses the convolutional maps of a proposal's neighbors as context to produce an embedding for each proposal.
no code implementations • 29 Jun 2019 • Alireza Heidari, Ihab F. Ilyas, Theodoros Rekatsinas
We study the problem of recovering the latent ground truth labeling of a structured instance with categorical random variables in the presence of noisy observations.
no code implementations • ICML 2020 • Amrita Roy Chowdhury, Theodoros Rekatsinas, Somesh Jha
Our solution optimizes for the utility of inference queries over the DGM and adds noise that is customized to the properties of the private input dataset and the graph structure of the DGM.
no code implementations • 4 May 2019 • Zhihan Guo, Theodoros Rekatsinas
We show that discovering FDs from a noisy dataset is equivalent to learning the structure of a graphical model over binary random variables, where each random variable corresponds to a functional of the dataset attributes.
no code implementations • 29 Mar 2019 • Alexander Ratner, Dan Alistarh, Gustavo Alonso, David G. Andersen, Peter Bailis, Sarah Bird, Nicholas Carlini, Bryan Catanzaro, Jennifer Chayes, Eric Chung, Bill Dally, Jeff Dean, Inderjit S. Dhillon, Alexandros Dimakis, Pradeep Dubey, Charles Elkan, Grigori Fursin, Gregory R. Ganger, Lise Getoor, Phillip B. Gibbons, Garth A. Gibson, Joseph E. Gonzalez, Justin Gottschlich, Song Han, Kim Hazelwood, Furong Huang, Martin Jaggi, Kevin Jamieson, Michael I. Jordan, Gauri Joshi, Rania Khalaf, Jason Knight, Jakub Konečný, Tim Kraska, Arun Kumar, Anastasios Kyrillidis, Aparna Lakshmiratan, Jing Li, Samuel Madden, H. Brendan McMahan, Erik Meijer, Ioannis Mitliagkas, Rajat Monga, Derek Murray, Kunle Olukotun, Dimitris Papailiopoulos, Gennady Pekhimenko, Theodoros Rekatsinas, Afshin Rostamizadeh, Christopher Ré, Christopher De Sa, Hanie Sedghi, Siddhartha Sen, Virginia Smith, Alex Smola, Dawn Song, Evan Sparks, Ion Stoica, Vivienne Sze, Madeleine Udell, Joaquin Vanschoren, Shivaram Venkataraman, Rashmi Vinayak, Markus Weimer, Andrew Gordon Wilson, Eric Xing, Matei Zaharia, Ce Zhang, Ameet Talwalkar
Machine learning (ML) techniques are enjoying rapidly increasing adoption.
no code implementations • ICLR Workshop LLD 2019 • Zhihan Guo, Theodoros Rekatsinas
We study the problem of functional dependency (FD) discovery to impose domain knowledge for downstream data preparation tasks.
1 code implementation • SIGMOD: International Conference on Management of Data 2018 • Sidharth Mudgal, Han Li, Theodoros Rekatsinas, AnHai Doan, Youngchoon Park, Ganesh Krishnan, Rohit Deep, Esteban Arcaute, Vijay Raghavendra
Entity matching (EM) finds data instances that refer to the same real-world entity.
Ranked #8 on Entity Resolution on Amazon-Google
1 code implementation • 15 Mar 2017 • Sen Wu, Luke Hsiao, Xiao Cheng, Braden Hancock, Theodoros Rekatsinas, Philip Levis, Christopher Ré
We focus on knowledge base construction (KBC) from richly formatted data.
no code implementations • 7 Mar 2013 • Ben London, Theodoros Rekatsinas, Bert Huang, Lise Getoor
For the typical cases of real-valued functions and binary relations, we propose several loss functions and derive the associated parameter gradients.