The Monte Carlo Transformer: a stochastic self-attention model for sequence prediction

This paper introduces the Sequential Monte Carlo Transformer, an original approach that naturally captures the distribution of observations in a recurrent architecture. The keys, queries, values, and attention vectors of the network are treated as the unobserved stochastic states of its hidden structure...
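As an illustrative sketch only (not the paper's exact model), the idea of treating keys, queries, and values as stochastic latent states can be approximated by sampling Gaussian perturbations of the projections per particle and averaging the resulting attention outputs; the noise model, function names, and particle-averaging scheme here are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def mc_self_attention(X, Wq, Wk, Wv, n_particles=32, sigma=0.1):
    """Monte Carlo self-attention sketch: sample stochastic Q, K, V per
    particle and average the attention outputs over particles."""
    T, d = X.shape
    outputs = []
    for _ in range(n_particles):
        # Latent states: deterministic projection plus Gaussian noise
        # (assumed noise model for illustration).
        Q = X @ Wq + sigma * rng.standard_normal((T, d))
        K = X @ Wk + sigma * rng.standard_normal((T, d))
        V = X @ Wv + sigma * rng.standard_normal((T, d))
        A = softmax(Q @ K.T / np.sqrt(d), axis=-1)  # attention weights
        outputs.append(A @ V)
    # Particle average approximates the expected attention output.
    return np.mean(outputs, axis=0)

# Tiny usage example with random data.
d, T = 4, 5
X = rng.standard_normal((T, d))
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
Y = mc_self_attention(X, Wq, Wk, Wv)
print(Y.shape)
```

Averaging over particles is one simple way to marginalize the stochastic states; the paper's actual method (a sequential Monte Carlo scheme) is more elaborate than this sketch.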
