This task deals with lip-syncing a video (or) an image to the desired target speech. Approaches in this task work only for a specific (limited set) of identities, languages, speech/voice. See also: Unconstrained lip-synchronization - https://paperswithcode.com/task/lip-sync
The best performing method, which is based on visual quality metrics and is often used in presentation attack detection domain, resulted in 8. 97% equal error rate on high quality Deepfakes.
Given an arbitrary face image and an arbitrary speech clip, the proposed work attempts to generating the talking face video with accurate lip synchronization while maintaining smooth transition of both lip and facial movement over the entire video clip.