At LinkedIn, we put our members first, which includes making sure that they have real and authentic interactions on our platform. To do so, we constantly evaluate the ways that those interactions might be discredited or undermined and then be proactive about safeguarding them. That’s why we use cutting-edge models to find and remove fake accounts and content, including accounts that use AI-generated profile images.

AI-generated images, also known as deepfake images, are becoming more realistic and harder to spot. Distinguishing accounts that use synthetic faces from those that use authentic photos is a significant technical challenge at LinkedIn's scale of more than a billion members. Beyond detection accuracy, our models must also maintain an extremely low false positive rate and remain robust to different, and rapidly evolving, image generation technologies.

One way we do this is by collaborating with academic experts through the LinkedIn Scholars program. We have worked closely with Professor Hany Farid on new concepts and techniques for AI-generated image detection. In 2022, we co-created the industry's first large-scale detector for AI-generated images. In addition to our production model, we also developed a simple, novel approach that detects 99.6% of StyleGAN-generated profile photos and published our first paper on AI image detection, "Exposing GAN-Generated Profile Photos from Compact Embeddings." A summary of these previous results is available in the blog post "New Approaches For Detecting AI-Generated Profile Photos."

In the nearly one year since we released our first paper, this field of study, and our own research, have remained dynamic and fast paced. We are excited to share our latest research paper, "Finding AI-Generated Faces in the Wild," in which we introduce a new concept for a model that can detect AI-generated images produced by a variety of different generative algorithms. This new concept can tell the difference between real profile photos and those produced by different types of AI models: adversarial (GAN)-based generators such as StyleGAN, generated.photos, and EG3D, and diffusion-based generators such as Stable Diffusion, DALL-E 2, and Midjourney. In this post, we will share some of the highlights of our research and this new concept for deepfake profile photo detection.

Details on Our Research

This new model concept focuses on the specific task of differentiating between real human faces and those generated by AI algorithms. By concentrating on facial images, we are able to develop detection methods that are robust across various synthesis engines, resolutions, and image qualities. By training this model on a diverse range of synthesis engines, including GANs and diffusion models, we aim to avoid overfitting to the artifacts of any single engine while accurately detecting AI-generated faces.

Datasets 

Our training and evaluation leverage 18 datasets consisting of 120,000 real LinkedIn profile photos and 105,900 AI-generated images spanning five different GAN engines (StyleGAN versions 1, 2, and 3; EG3D; and generated.photos) and five different diffusion engines (Stable Diffusion versions 1, 2, and XL; DALL-E 2; and Midjourney). Figure 1 shows a sample of the AI-generated photos from each of these algorithms.

Figure 1. Representative examples of AI-generated images used in our training and evaluation. Some synthesis engines were used to generate faces only, and others were used to synthesize both faces and non-faces.

The dataset is partitioned into training and evaluation subsets. The training subset includes 30,000 real faces and 30,000 AI-generated faces drawn from various synthesis engines, with varying proportions from each engine. The model is then evaluated against distinct sets of images: images from synthesis engines used in training (in-engine) and images from engines not included in training (out-of-engine). This structured framework ensures rigorous testing across a diverse range of image sources and types, enabling a robust assessment of the model's performance.
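To make the in-engine/out-of-engine protocol concrete, here is a minimal sketch of how such a split could be organized. The directory layout, engine folder names, and file extensions are illustrative assumptions, not our production setup; the train/holdout assignment follows the engines listed in the Results section below.

```python
from pathlib import Path

# Engines whose images appear in training versus engines held out entirely
# for out-of-engine evaluation (per the Results section of this post).
TRAIN_ENGINES = ["stylegan1", "stylegan2", "stylegan3",
                 "stable_diffusion_1", "stable_diffusion_2", "dalle2"]
HOLDOUT_ENGINES = ["generated_photos", "eg3d", "midjourney", "stable_diffusion_xl"]

def list_images(root: Path, engine: str) -> list[Path]:
    # Hypothetical layout: one subdirectory of JPEGs per synthesis engine.
    return sorted((root / engine).glob("*.jpg"))

root = Path("data/ai_generated")  # illustrative path
train_fakes = [p for e in TRAIN_ENGINES for p in list_images(root, e)]
out_of_engine = [p for e in HOLDOUT_ENGINES for p in list_images(root, e)]
```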

Type        Model                 Category   Number
Real        -                     Faces      120,000
GAN         generated.photos      Faces      10,000
GAN         StyleGAN 1            Faces      10,000
GAN         StyleGAN 2            Faces      10,000
GAN         StyleGAN 3            Faces      10,000
GAN         EG3D                  Faces      10,000
GAN         StyleGAN 1            Bedrooms   5,000
GAN         StyleGAN 1            Cars       5,000
GAN         StyleGAN 1            Cats       5,000
Diffusion   DALL-E 2              Faces      9,000
Diffusion   Midjourney            Faces      9,000
Diffusion   Stable Diffusion 1    Faces      9,000
Diffusion   Stable Diffusion 2    Faces      9,000
Diffusion   Stable Diffusion XL   Faces      900
Diffusion   DALL-E 2              Random     1,000
Diffusion   Midjourney            Random     1,000
Diffusion   Stable Diffusion 1    Random     1,000
Diffusion   Stable Diffusion 2    Random     1,000

Table 1. A breakdown of the number of real and AI-generated images used in our training and evaluation.

Model Concept

We trained several different deep-learning models. Our modeling pipeline comprises three stages: image preprocessing, image embedding, and scoring.

At the outset, the input image undergoes preprocessing, where it is resized to a standardized resolution of 512×512 pixels and normalized. The normalized color image is then fed into an EfficientNet-B1 convolutional neural network (CNN), used as a transfer-learning backbone. With its 7.8 million internal parameters pre-trained on the ImageNet-1K dataset, EfficientNet-B1 offered superior performance compared to other contemporary architectures such as Swin-T, ResNet50, and XceptionNet. For scoring, we stack two fully-connected layers of width 2048, a dropout layer, and a final fully-connected layer that produces the image score.
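For illustration, here is a minimal PyTorch sketch of this three-stage pipeline. The layer widths follow the description above; the dropout rate, activation functions, and normalization statistics are assumptions, since this post does not specify them.

```python
import torch
import torch.nn as nn
from torchvision import models, transforms

# Stage 1: preprocessing -- resize to 512x512 and normalize
# (ImageNet statistics are an assumption).
preprocess = transforms.Compose([
    transforms.Resize((512, 512)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

class FaceScoreModel(nn.Module):
    def __init__(self, dropout: float = 0.5):  # dropout rate is an assumption
        super().__init__()
        # Stage 2: EfficientNet-B1 backbone pre-trained on ImageNet-1K.
        backbone = models.efficientnet_b1(weights="IMAGENET1K_V1")
        embed_dim = backbone.classifier[1].in_features  # 1280 for EfficientNet-B1
        backbone.classifier = nn.Identity()  # keep only the image embedding
        self.backbone = backbone
        # Stage 3: scoring head -- two 2048-wide FC layers, dropout, final FC.
        self.head = nn.Sequential(
            nn.Linear(embed_dim, 2048), nn.ReLU(),
            nn.Linear(2048, 2048), nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(2048, 1),  # single score per image
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(self.backbone(x)).squeeze(-1)
```

Replacing the backbone's classifier with an identity keeps only the 1280-dimensional embedding, which the scoring head then maps to a single score per image.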

Results

We evaluate our models using established metrics: the true positive rate (TPR), the proportion of correctly identified deepfake faces to the total number of deepfake faces in the dataset (also known as recall), and the false positive rate (FPR), the ratio of real faces incorrectly classified as deepfakes to the total number of real faces. Our evaluation covers both in-engine images (i.e., images from the AI-generation algorithm being validated were used in training the model) and out-of-engine images (i.e., no images from that algorithm were used in training), ensuring a comprehensive and robust assessment across a diverse array of synthesis engines.
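As a concrete reference, these two rates can be computed from model scores as follows; this is a generic sketch, not our production evaluation code.

```python
import numpy as np

def tpr_fpr(scores_fake: np.ndarray, scores_real: np.ndarray, threshold: float):
    # TPR (recall): fraction of AI-generated faces scored at or above the threshold.
    tpr = float(np.mean(scores_fake >= threshold))
    # FPR: fraction of real faces incorrectly scored at or above the threshold.
    fpr = float(np.mean(scores_real >= threshold))
    return tpr, fpr
```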

Condition                    Images      TPR (Recall)
Training                     Faces       100%
Evaluation (in-engine)       Faces       98%
Evaluation (out-of-engine)   Faces       84%
Evaluation (in/out-engine)   Non-faces   0%

Table 2. Baseline training and evaluation true positive rate (TPR), averaged across all synthesis engines, at the threshold for which FPR = 0.5%. This conservative threshold ensures the detector has minimal negative impact on real members.

For each trained model, we set the detection threshold to achieve a conservative FPR of 0.5%. Our primary model effectively classifies AI-generated faces during both training and evaluation, achieving a recall of 98% across the synthesis engines used in training (StyleGAN 1, 2, and 3; Stable Diffusion 1 and 2; and DALL-E 2). However, when assessing faces generated by synthesis engines not used in training (out-of-engine), the TPR decreases to 84.5%, indicating good but not flawless out-of-domain generalization. This constraint can likely be alleviated by integrating out-of-engine images into the initial training phase, which restores the TPR to 98% across all engines.
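Operationally, such a threshold can be calibrated from scores on a held-out set of real faces. The sketch below, with an assumed array of real-face scores, chooses the cutoff that only 0.5% of real faces exceed.

```python
import numpy as np

def threshold_for_fpr(scores_real: np.ndarray, target_fpr: float = 0.005) -> float:
    # Pick the score cutoff so that only target_fpr of real faces score above it.
    return float(np.quantile(scores_real, 1.0 - target_fpr))
```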

An intriguing finding is that only the AI-generated faces trigger detections; for instance, our model categorizes AI-generated images without faces as "real." This phenomenon likely arises because some of our real images contain non-faces, whereas all AI-generated images used in training include faces. This characteristic enables users to alter the image background using AI tools without impacting our classification as long as the real face remains unchanged. Moreover, this outcome implies that our classifier has identified a particular property unique to AI-generated faces rather than some minor artifact from the synthesis process (such as a noise fingerprint).

Another crucial aspect we examined is the model's resilience to JPEG compression artifacts, which are frequently encountered on real-world image-sharing platforms, and to changes in image resolution. Our systematic experiments show that the model sustains a high TPR for detecting AI-generated faces even when images are significantly compressed or resized, underscoring its robustness and practical utility.
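One simple way to run such a stress test is to re-encode each evaluation image at a range of JPEG qualities and measure the TPR at each level. This Pillow-based sketch is illustrative, not the exact procedure from the paper.

```python
from io import BytesIO
from PIL import Image

def jpeg_compress(img: Image.Image, quality: int) -> Image.Image:
    # Re-encode the image in memory at the given JPEG quality (e.g., 20-100).
    buf = BytesIO()
    img.convert("RGB").save(buf, format="JPEG", quality=quality)
    buf.seek(0)
    return Image.open(buf).copy()  # .copy() decodes before the buffer is freed
```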

Figure 2. Recall (TPR) for correctly classifying an AI-generated face (with a fixed FPR of 0.5%) as a function of: (a) resolution, where the model is trained on 512×512 images and evaluated against different resolutions (solid blue), or trained and evaluated on a single N×N resolution (dashed red); and (b) JPEG quality, where the model is trained on uncompressed images and a range of JPEG-compressed images and evaluated on JPEG qualities between 20 (lowest) and 100 (highest).

The model's performance on horizontally-flipped faces remains consistent with that on the original images. For vertically-inverted faces, however, the TPR drops by roughly 20 percentage points, from 98.0% to 77.7%. This pair of results, together with the model's resilience to changes in resolution and compression quality, suggests that the model has not fixated on a minor detail but may instead have identified a structural or semantic characteristic that sets AI-generated faces apart from real ones.
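This kind of probe is easy to reproduce: flip each AI-generated face and re-score it at the same fixed threshold. The helper below and its inputs are assumptions for illustration, not our evaluation harness.

```python
import torch
import torchvision.transforms.functional as TF

def tpr_on(model, images, threshold: float) -> float:
    # Hypothetical helper: fraction of AI-generated inputs flagged at the threshold.
    with torch.no_grad():
        scores = model(torch.stack(images))
    return (scores >= threshold).float().mean().item()

# images: a list of preprocessed AI-generated face tensors (C, H, W).
tpr_hflip = tpr_on(model, [TF.hflip(x) for x in images], threshold)  # ~unchanged
tpr_vflip = tpr_on(model, [TF.vflip(x) for x in images], threshold)  # drops sharply
```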

To further understand and visualize what the model is learning, we use a method called integrated gradients, which measures how important each pixel in the image is to the model's decision. Across all cases, the most relevant pixels, those with the largest gradients, are primarily concentrated around facial regions and other areas of skin.
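Integrated gradients accumulates the model's gradients along a path from a baseline (here, a black image) to the input. The sketch below uses the Captum library as one possible implementation; the library choice and the channel-aggregation step are assumptions.

```python
import torch
from captum.attr import IntegratedGradients

ig = IntegratedGradients(model)        # model: the trained detector
baseline = torch.zeros_like(x)         # x: a preprocessed image batch
attributions = ig.attribute(x, baselines=baseline, n_steps=50)
saliency = attributions.abs().sum(dim=1)  # per-pixel importance map
```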

Figure 3. Strength of these gradients averaged over 100 StyleGAN 2 images (top row, a), which have consistent facial features due to alignment; subsequent rows show example images and their gradients for DALL-E 2 (b), Midjourney (c), Stable Diffusion 1 (d), and Stable Diffusion 2 (e).

Please see the full paper for a complete quantitative discussion of our results.

Conclusions

Our collaboration with academic experts has culminated in the development of a cutting-edge AI-generated photo detector. Our work stands out from prior research in the field by focusing specifically on synthetic faces, a departure from approaches that predominantly utilized generic synthetic images. We can validate our technique using a vast sample of real LinkedIn member profile photos, providing insights into its real-world performance. Our model exhibits superior detection capabilities for faces created by both GAN- and diffusion-based generators, achieving state-of-the-art performance. This pioneering research continues to provide LinkedIn the ability to enhance its best-in-class automated anti-abuse defenses, improving how we detect and eliminate fake accounts before they can pose a threat to our members.

Acknowledgements

This work is the product of a collaboration between Professor Hany Farid and the Trust Data team at LinkedIn. We thank the LinkedIn Scholars program for enabling this collaboration. We also thank Ya Xu, Daniel Olmedilla, Jenelle Bray, Dinesh Palanivelu, Milinda Lakkam, Kim Capps-Tanaka, Shaunak Chatterjee, Vidit Jain, Ting Chen, Vipin Gupta, and Natesh Pillai for their support of this work, and Matyas Bohacek for helping us generate some of the AI-generated datasets. We appreciate Katherine Vaiente and Jon Adams for their input, and Siddharth Dangi, Rohit Pitke, and Smitkumar Narotambhai Marvaniya for their valuable technical input while reviewing the paper. We are grateful to David Luebke, Margaret Albrecht, Edwin Nieda, Koki Nagano, George Chellapa, Burak Yoldemir, and Ankit Patel at NVIDIA for facilitating our work by making the StyleGAN generation software, trained models, and synthesized images publicly available, and for their valuable suggestions. Finally, we thank DARPA's Semantic Forensics program for providing valuable inspiration, proof-of-concept examples, and benchmarks for comparison.