In the world of augmented reality (AR) and face editing, there has been a surge of interest from both consumers and researchers. People want to create integrated AR experiences and enhance their mobile applications, videos, virtual reality, and gaming with real-time face features and editing functions. Developing lightweight, high-quality face generation and editing models, however, has not been easy.
Most models rely on generative adversarial network (GAN) techniques, which can be computationally complex and require large training datasets. Responsible AI practices matter here too: these models need to be developed with Google’s AI Principles in mind.
That’s where MediaPipe FaceStylizer comes in. It’s designed to tackle the challenges of complexity and data efficiency while adhering to responsible AI principles. The model consists of a face generator and a face encoder that uses GAN inversion to map input images into the generator’s latent space.
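To make the encoder/generator relationship concrete, here is a toy sketch of the GAN inversion idea in plain Python. It is not the actual MediaPipe implementation: the real encoder and generator are neural networks, while here they are simple linear maps so the round trip image → latent code → reconstructed image fits in a few lines.

```python
# Toy illustration of GAN inversion (not the real MediaPipe models):
# the "generator" and "encoder" are stand-in linear maps so the
# image -> latent -> image round trip can be shown with plain Python.

def generator(latent):
    # Maps a latent code to a synthetic "image" (here: scale by 2).
    return [2.0 * z for z in latent]

def encoder(image):
    # Approximate inverse of the generator: image -> latent code.
    return [0.5 * x for x in image]

def reconstruction_error(image):
    # GAN inversion quality: how close is G(E(image)) to the input?
    recon = generator(encoder(image))
    return sum(abs(a - b) for a, b in zip(image, recon))

face = [0.1, 0.4, 0.9]        # stand-in for image pixels
latent = encoder(face)         # latent code fed to the generator
print(reconstruction_error(face))
```

In the real pipeline the encoder is trained so that this reconstruction error (plus perceptual terms) is small, which is what lets a user's photo be mapped into the latent space and re-rendered in a new style.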
We’ve also created a mobile-friendly synthesis network for the face generator. It converts features to RGB at each level of the network and aggregates them into the final high-quality image. We’ve carefully designed loss functions and combined them with common GAN loss functions to distill the student generator from the teacher StyleGAN model. The result is a lightweight model that maintains high-quality generation.
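The per-level RGB idea can be sketched as follows. This is a plain-Python toy with 1-D “images” rather than the real convolutional network: each synthesis level converts its features to an RGB contribution, and the running output is upsampled and summed with the next level’s contribution. The `to_rgb` scaling and the level contents are illustrative only.

```python
# Toy sketch of per-level feature-to-RGB synthesis (1-D "images"):
# every level emits an RGB contribution; coarse outputs are upsampled
# and accumulated into the final image, as in StyleGAN-style skips.

def to_rgb(features):
    # Stand-in for a learned 1x1 convolution mapping features to RGB.
    return [0.1 * f for f in features]

def upsample2x(signal):
    # Nearest-neighbour upsampling: repeat each sample twice.
    return [s for s in signal for _ in range(2)]

def synthesize(levels):
    """levels: per-level feature maps, coarsest first, each 2x longer."""
    rgb = to_rgb(levels[0])
    for features in levels[1:]:
        rgb = [r + c for r, c in zip(upsample2x(rgb), to_rgb(features))]
    return rgb

levels = [[1.0, 2.0],              # 2-"pixel" coarse level
          [1.0, 1.0, 2.0, 2.0],    # 4-pixel middle level
          [0.0] * 8]               # 8-pixel fine level
print(synthesize(levels))
```

Because every level contributes RGB directly, the fine levels only need to add detail on top of an already-plausible coarse image, which is part of what keeps the mobile synthesis network small.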
Now, let’s talk about the pipeline. Our goal is to create a pipeline that allows users to adapt MediaPipe FaceStylizer to different styles by fine-tuning the model with just a few examples. We’ve built the pipeline with a GAN inversion encoder and an efficient face generator model. Users can send a few style samples to MediaPipe Model Maker for fine-tuning. The training process involves optimizing a joint adversarial loss function to reconstruct a person’s face in the desired style.
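The joint objective can be sketched like this. The sketch combines an adversarial term (does the output match the target style?) with a reconstruction term (is it still recognizably the same face?); the specific formulas and the `adv_weight`/`rec_weight` values are illustrative assumptions, not MediaPipe’s actual hyperparameters.

```python
# Toy sketch of a joint loss for few-shot style fine-tuning:
# adversarial term + weighted reconstruction term. The weights and
# exact formulas are illustrative, not MediaPipe's actual values.
import math

def adversarial_loss(discriminator_score):
    # Non-saturating generator loss: -log D(stylized image).
    return -math.log(discriminator_score)

def reconstruction_loss(target, output):
    # Mean absolute error between the input face and the stylized face.
    return sum(abs(t - o) for t, o in zip(target, output)) / len(target)

def joint_loss(disc_score, target, output, adv_weight=1.0, rec_weight=10.0):
    return (adv_weight * adversarial_loss(disc_score)
            + rec_weight * reconstruction_loss(target, output))

face = [0.2, 0.5, 0.8]          # stand-in for the input face pixels
stylized = [0.25, 0.45, 0.8]    # stand-in for the generator's output
print(joint_loss(0.9, face, stylized))
```

Weighting the reconstruction term heavily is a common way to keep the subject’s identity while the adversarial term pulls the output toward the few provided style samples.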
Our generator, BlazeStyleGAN, is based on StyleGAN but with a more efficient synthesis network. It reduces the model complexity while maintaining high visual quality. We’ve also introduced an efficient GAN inversion encoder to support image-to-image stylization.
In terms of performance, BlazeStyleGAN achieves real-time performance on various high-end mobile devices, running in less than 10 ms on a high-end phone’s GPU. We’ve also taken steps toward fairness by training the model on a diverse dataset of human faces.
To make all of this accessible to users, we’re releasing the MediaPipe FaceStylizer in MediaPipe Solutions. Users can train their own customized face stylization models using MediaPipe Model Maker and deploy them to different platforms using the MediaPipe Tasks FaceStylizer API.
We’re excited to see how users will leverage these tools to create amazing face stylization experiences. It’s a new era of AR and mobile applications, and we’re here to make it accessible and responsible.