Protecting World Leader Using Facial Speaking Pattern Against Deepfakes

Face forgeries targeting celebrities, and world leaders in particular, are on the rise, owing to the large number of their videos readily accessible on the Internet. While current face manipulation detectors achieve impressive results on several open datasets covering persons of various identities, their performance degrades on high-quality forgeries targeting celebrities. Moreover, these online videos usually undergo compression, making the detection task harder. Beyond face swapping, further manipulation techniques such as lip-syncing and image animation are applied to celebrities, yet most prior work has not addressed them. This paper proposes a dual-stream method that learns facial and speaking patterns to protect celebrities against deepfakes. We design an action unit module based on the Facial Action Coding System, together with an Action Unit Transformer (AUT), to extract facial expression embeddings. In addition, the dual-stream architecture uses a Temporal Convolutional Network (TCN) to extract lip motion patterns and learns the relatedness between facial and speaking patterns. Our method protects a person of interest (POI) against deepfakes in an end-to-end manner. Extensive experiments show that it achieves better performance and higher resistance to video compression than state-of-the-art detection models.
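The abstract only sketches the architecture, so the following is a minimal, hypothetical PyTorch sketch of the dual-stream idea it describes: one stream embeds per-frame facial action unit (AU) sequences with a Transformer encoder (standing in for the AUT), the other embeds lip-motion features with a dilated temporal convolutional network (TCN), and the two embeddings are fused for real/fake classification. All layer sizes, feature dimensions, class names, and the fusion scheme are assumptions, not the authors' implementation.

```python
# Hypothetical sketch of a dual-stream AU-Transformer + lip-motion TCN detector.
# Dimensions and fusion are illustrative assumptions, not the paper's exact design.
import torch
import torch.nn as nn


class AUTransformerStream(nn.Module):
    """Embed per-frame action-unit intensity vectors (e.g., 17 FACS AUs)."""
    def __init__(self, num_aus=17, d_model=128, nhead=4, num_layers=2):
        super().__init__()
        self.proj = nn.Linear(num_aus, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)

    def forward(self, au_seq):             # (B, T, num_aus)
        x = self.encoder(self.proj(au_seq))
        return x.mean(dim=1)                # temporal average pooling -> (B, d_model)


class LipTCNStream(nn.Module):
    """Embed per-frame lip-landmark/motion features with dilated 1-D convolutions."""
    def __init__(self, in_dim=40, channels=128, levels=3):
        super().__init__()
        blocks, c_in = [], in_dim
        for i in range(levels):
            blocks += [nn.Conv1d(c_in, channels, kernel_size=3,
                                 padding=2 ** i, dilation=2 ** i),
                       nn.ReLU()]
            c_in = channels
        self.tcn = nn.Sequential(*blocks)

    def forward(self, lip_seq):             # (B, T, in_dim)
        x = self.tcn(lip_seq.transpose(1, 2))
        return x.mean(dim=2)                # -> (B, channels)


class DualStreamDetector(nn.Module):
    """Fuse facial and speaking streams and classify real vs. manipulated video."""
    def __init__(self):
        super().__init__()
        self.face_stream = AUTransformerStream()
        self.lip_stream = LipTCNStream()
        self.classifier = nn.Sequential(nn.Linear(256, 64), nn.ReLU(),
                                        nn.Linear(64, 1))

    def forward(self, au_seq, lip_seq):
        fused = torch.cat([self.face_stream(au_seq),
                           self.lip_stream(lip_seq)], dim=1)
        return self.classifier(fused)       # logit; > 0 suggests manipulation


if __name__ == "__main__":
    model = DualStreamDetector()
    logit = model(torch.randn(2, 64, 17), torch.randn(2, 64, 40))
    print(logit.shape)                      # torch.Size([2, 1])
```

In practice the AU intensities and lip features would come from a face-analysis toolkit applied to the POI's video, and the end-to-end training described in the abstract would optimize both streams jointly; this sketch only illustrates the two-stream structure and late fusion.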
