ICLR 2020 • Hyeong-Seok Choi, Changdae Park, Kyogu Lee
We analyze the extent to which the network can naturally disentangle two latent factors that contribute to the generation of a face image, one that comes directly from a speech signal and one that is unrelated to it, and explore whether the network can learn to generate the distribution of natural human face images by modeling these factors.