The Implicit Bias of Heterogeneity towards Invariance and Causality

3 Mar 2024  ·  Yang Xu, Yihong Gu, Cong Fang ·

It is observed empirically that the large language models (LLM), trained with a variant of regression loss using numerous corpus from the Internet, can unveil causal associations to some extent. This is contrary to the traditional wisdom that ``association is not causation'' and the paradigm of traditional causal inference in which prior causal knowledge should be carefully incorporated into the design of methods. It is a mystery why causality, in a higher layer of understanding, can emerge from the regression task that pursues associations. In this paper, we claim the emergence of causality from association-oriented training can be attributed to the coupling effects from the heterogeneity of the source data, stochasticity of training algorithms, and over-parameterization of the learning models. We illustrate such an intuition using a simple but insightful model that learns invariance, a quasi-causality, using regression loss. To be specific, we consider multi-environment low-rank matrix sensing problems where the unknown r-rank ground-truth d*d matrices diverge across the environments but contain a lower-rank invariant, causal part. In this case, running pooled gradient descent will result in biased solutions that only learn associations in general. We show that running large-batch Stochastic Gradient Descent, whose each batch being linear measurement samples randomly selected from a certain environment, can successfully drive the solution towards the invariant, causal solution under certain conditions. This step is related to the relatively strong heterogeneity of the environments, the large step size and noises in the optimization algorithm, and the over-parameterization of the model. In summary, we unveil another implicit bias that is a result of the symbiosis between the heterogeneity of data and modern algorithms, which is, to the best of our knowledge, first in the literature.

PDF Abstract
No code implementations yet. Submit your code now

Datasets


  Add Datasets introduced or used in this paper

Results from the Paper


  Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.

Methods