Time Series Modules

Temporal Distribution Matching

Introduced by Du et al. in AdaRNN: Adaptive Learning and Forecasting of Time Series

Temporal Distribution Matching, or TDM, is a module used in the AdaRNN architecture to match the distributions of the discovered periods in order to build a time series prediction model $\mathcal{M}$. Given the learned time periods, the TDM module is designed to learn the common knowledge shared by different periods by matching their distributions. Thus, the learned model $\mathcal{M}$ is expected to generalize well on unseen test data compared with methods that rely only on local or statistical information.

Within the context of AdaRNN, Temporal Distribution Matching aims to adaptively match the distributions between the RNN cells of two periods while capturing the temporal dependencies. TDM introduces the importance vector $\mathbf{\alpha} \in \mathbb{R}^{V}$ to learn the relative importance of the $V$ hidden states inside the RNN, where all the hidden states are weighted by a normalized $\mathbf{\alpha}$. Note that there is one $\mathbf{\alpha}$ for each pair of periods; we omit the subscript when there is no confusion. In this way, the distribution divergence across periods can be reduced dynamically.

Given a period-pair $\left(\mathcal{D}_{i}, \mathcal{D}_{j}\right)$, the loss of temporal distribution matching is formulated as:

$$ \mathcal{L}_{t d m}\left(\mathcal{D}_{i}, \mathcal{D}_{j} ; \theta\right)=\sum_{t=1}^{V} \alpha_{i, j}^{t} d\left(\mathbf{h}_{i}^{t}, \mathbf{h}_{j}^{t} ; \theta\right) $$

where $\alpha_{i, j}^{t}$ denotes the distribution importance between the periods $\mathcal{D}_{i}$ and $\mathcal{D}_{j}$ at state $t$.
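The weighted loss above can be written as a minimal NumPy sketch. The distance `mean_sq_dist` used here is only an illustrative stand-in for $d(\cdot, \cdot; \theta)$; the actual choice of divergence (e.g., MMD, CORAL, or cosine) is a separate design decision, and all names below are hypothetical:

```python
import numpy as np

def tdm_loss(h_i, h_j, alpha, dist):
    """Temporal distribution matching loss for one period pair.

    h_i, h_j : (V, batch, dim) hidden states of periods D_i and D_j
    alpha    : (V,) normalized importance weights over the V hidden states
    dist     : callable measuring distribution divergence at one state t
    """
    return sum(alpha[t] * dist(h_i[t], h_j[t]) for t in range(len(alpha)))

def mean_sq_dist(a, b):
    # Illustrative stand-in distance: squared difference of batch means.
    return float(np.sum((a.mean(axis=0) - b.mean(axis=0)) ** 2))

rng = np.random.default_rng(0)
V, B, D = 4, 8, 16                     # states, batch size, hidden dim
h_i = rng.normal(size=(V, B, D))
h_j = rng.normal(size=(V, B, D))
alpha = np.full(V, 1.0 / V)            # uniform importance as a placeholder
loss = tdm_loss(h_i, h_j, alpha, mean_sq_dist)
```

In the full architecture $\alpha$ is learned rather than fixed; the uniform vector here only makes the weighting explicit.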

All the hidden states of the RNN can be easily computed by following the standard RNN computation. Denote by $\delta(\cdot)$ the computation of the next hidden state from the current input and the previous state. The state computation can be formulated as

$$ \mathbf{h}_{i}^{t}=\delta\left(\mathbf{x}_{i}^{t}, \mathbf{h}_{i}^{t-1}\right) $$
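The recurrence above can be sketched with a vanilla RNN cell standing in for $\delta(\cdot)$ (AdaRNN can use GRU or LSTM cells as well; the weights and dimensions here are hypothetical):

```python
import numpy as np

def delta(x_t, h_prev, W_x, W_h, b):
    # One step of a vanilla RNN cell: h^t = tanh(W_x x^t + W_h h^{t-1} + b)
    return np.tanh(W_x @ x_t + W_h @ h_prev + b)

rng = np.random.default_rng(1)
d_in, d_h, V = 3, 5, 4                 # input dim, hidden dim, number of states
W_x = rng.normal(size=(d_h, d_in))
W_h = rng.normal(size=(d_h, d_h))
b = np.zeros(d_h)

h = np.zeros(d_h)                      # initial hidden state h^0
states = []
for t in range(V):
    x_t = rng.normal(size=d_in)        # input x_i^t from period D_i
    h = delta(x_t, h, W_x, W_h, b)     # h_i^t = delta(x_i^t, h_i^{t-1})
    states.append(h)
```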

The final objective of temporal distribution matching (one RNN layer) is:

$$ \mathcal{L}(\theta, \mathbf{\alpha})=\mathcal{L}_{\text {pred }}(\theta)+\lambda \frac{2}{K(K-1)} \sum_{i, j}^{i \neq j} \mathcal{L}_{t d m}\left(\mathcal{D}_{i}, \mathcal{D}_{j} ; \theta, \mathbf{\alpha}\right) $$

where $\lambda$ is a trade-off hyper-parameter. Note that the second term averages the distribution distances over all pairwise periods. For computation, we take a mini-batch from each of $\mathcal{D}_{i}$ and $\mathcal{D}_{j}$, perform a forward pass through the RNN layers, and concatenate all hidden features. Then, we can perform TDM using the above equation.
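The final objective can be sketched as follows, assuming per-pair hidden states and importance weights are already available; `mean_sq_dist` is again only an illustrative stand-in for the divergence, and all names are hypothetical:

```python
import numpy as np
from itertools import combinations

def total_loss(pred_loss, hidden, alphas, dist, lam):
    """Final objective: L_pred + lambda * (2 / (K(K-1))) * sum of pairwise TDM losses.

    hidden : dict mapping period index -> (V, batch, dim) hidden states
    alphas : dict mapping pair (i, j) -> (V,) importance weights
    """
    K = len(hidden)
    pair_sum = 0.0
    for i, j in combinations(sorted(hidden), 2):
        a = alphas[(i, j)]
        pair_sum += sum(a[t] * dist(hidden[i][t], hidden[j][t])
                        for t in range(len(a)))
    return pred_loss + lam * 2.0 / (K * (K - 1)) * pair_sum

def mean_sq_dist(a, b):
    # Illustrative stand-in distance: squared difference of batch means.
    return float(np.sum((a.mean(axis=0) - b.mean(axis=0)) ** 2))

rng = np.random.default_rng(2)
V, B, D, K = 4, 8, 16, 3
hidden = {k: rng.normal(size=(V, B, D)) for k in range(K)}
alphas = {(i, j): np.full(V, 1.0 / V)
          for i, j in combinations(range(K), 2)}
obj = total_loss(pred_loss=1.0, hidden=hidden, alphas=alphas,
                 dist=mean_sq_dist, lam=0.5)
```

In practice $\mathcal{L}_{\text{pred}}$ comes from the forecasting head and $\theta$, $\mathbf{\alpha}$ are optimized jointly; this sketch only makes the pairwise averaging explicit.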

Source: AdaRNN: Adaptive Learning and Forecasting of Time Series
