no code implementations • 30 Oct 2023 • Mayana Pereira, Meghana Kshirsagar, Sumit Mukherjee, Rahul Dodhia, Juan Lavista Ferres, Rafael de Sousa
To the best of our knowledge, our work is the first that: (i) proposes a training and evaluation framework that does not assume that real data is available for testing the utility and fairness of machine learning models trained on synthetic data; (ii) presents the most extensive analysis of synthetic data set generation algorithms in terms of utility and fairness when used for training machine learning models; and (iii) encompasses several different definitions of fairness.
no code implementations • 28 Sep 2023 • Sumit Mukherjee, Bodhisattva Sen, Subhabrata Sen
We study empirical Bayes estimation in high-dimensional linear regression.
no code implementations • 15 Jun 2021 • Mayana Pereira, Meghana Kshirsagar, Sumit Mukherjee, Rahul Dodhia, Juan Lavista Ferres
Differentially private (DP) synthetic datasets are a powerful approach for training machine learning models while respecting the privacy of individual data providers.
no code implementations • 9 Jun 2021 • Sumit Mukherjee, Tina Sederholm, Anthony C. Roman, Ria Sankar, Sherrie Caltagirone, Juan Lavista Ferres
Child trafficking is a serious problem around the world.
no code implementations • 25 Apr 2021 • Sumit Mukherjee, Subhabrata Sen
Using the nascent theory of non-linear large deviations (Chatterjee and Dembo, 2016), we derive sufficient conditions for the leading-order correctness of the naive mean-field approximation to the log-normalizing constant of the posterior distribution.
no code implementations • 18 Jan 2021 • Jean-Francois Rajotte, Sumit Mukherjee, Caleb Robinson, Anthony Ortiz, Christopher West, Juan Lavista Ferres, Raymond T Ng
We show that by using the FELICIA mechanism, a data owner with limited image samples can generate high-quality synthetic images with high utility while neither data owner has to provide access to its data.
no code implementations • 10 Dec 2020 • Nabarun Deb, Rajarshi Mukherjee, Sumit Mukherjee, Ming Yuan
In this paper, we study the effect of dependence on detecting a class of signals in Ising models, where the signals are present in a structured way.
Probability • Statistics Theory • 62G10, 62G20, 62C20
no code implementations • 11 Sep 2020 • Yixi Xu, Sumit Mukherjee, Xiyang Liu, Shruti Tople, Rahul Dodhia, Juan Lavista Ferres
In this work, we propose the first formal framework for membership privacy estimation in generative models.
1 code implementation • 31 Dec 2019 • Sumit Mukherjee, Yixi Xu, Anusua Trivedi, Juan Lavista Ferres
It has been shown that such synthetic data can be used for a variety of downstream tasks such as training classifiers that would otherwise require the original dataset to be shared.
no code implementations • 4 Oct 2019 • Anusua Trivedi, Sumit Mukherjee, Edmund Tse, Anne Ewing, Juan Lavista Ferres
As a result, we can often end up using data that is not representative of the problem we are trying to solve.