
Creating Deep Fakes


How much AI has developed, and how much it can do, is simply astonishing; truly, the mind boggles. One of these astonishing developments, if you choose to look at it that way, has been deep fakes. Have you ever wondered how in the hell AI synthetic media can create a face so perfect that it makes you believe it is an actual person, when that person doesn't even exist?

If your answer is yes, consider this article the answer to that wonder. Let me first touch on the what before we get into the how:

What Are Deep Fakes?

A deep fake is a media file, typically depicting a human subject, that has been doctored using deep neural networks to alter a person's identity (pardon all the technicality).

The most notorious application of deep fakes has been face swaps, where the identity of a source subject is transferred onto a destination subject.

How Deep Fakes Are Made

Gathering of Source and Destination Video

Several minutes of high-resolution (ideally 4K) source video are required, as well as comparable destination footage. Ideally, both videos should cover similar ranges of facial expressions, eye movements, and head turns. One final note: the source and destination identities should already bear some resemblance in head and face shape, facial-hair pattern, and skin tone.

If these conditions are not met, even the best AI facial reenactment cannot compensate. The swapping process will expose the differences as visual artifacts, and even significant post-processing may be unable to remove them.

Extraction

In this step, each video is broken down into individual frames, and within each frame around 30 facial landmarks are identified. These landmarks serve as anchor points from which the model learns the locations of facial features.
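
To make this concrete, below is a minimal sketch of the extraction step using OpenCV and dlib. The exact landmark count varies by tool; this sketch uses dlib's standard 68-point predictor, and every file path is a placeholder.

import cv2
import dlib

# Standard dlib face detector and 68-point landmark predictor.
# The .dat model file must be downloaded separately from dlib's model zoo.
detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

cap = cv2.VideoCapture("source.mp4")  # placeholder path
frame_idx = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    for face in detector(gray):
        shape = predictor(gray, face)
        # (x, y) anchor points used later to align and crop the face.
        landmarks = [(shape.part(i).x, shape.part(i).y)
                     for i in range(shape.num_parts)]
        cv2.imwrite(f"faces/frame_{frame_idx:05d}.png", frame)
    frame_idx += 1
cap.release()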

Training

Here is where the magic of video facial reenactment happens. The training setup is an encoder-decoder network, and the same architecture is used for both training and conversion. Batches of aligned and masked input faces from both identities are fed into the same encoder.

The output of the encoder is a representation of all the input faces in a lower-dimensional vector space. This latent representation is then passed through a separate decoder for each identity, and each decoder attempts to reconstruct its own set of faces.

The generated faces are compared to the originals, the loss function is calculated, backpropagation runs, and the weights of the encoder and decoder networks are updated. This loop repeats until the desired number of epochs is reached; you can see just how complex AI synthetic media is.
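
The following PyTorch sketch illustrates the shared-encoder, dual-decoder training loop described above. The layer sizes, learning rate, and random placeholder batches are illustrative assumptions, not the settings of any particular tool.

import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 5, stride=2, padding=2), nn.LeakyReLU(0.1),
            nn.Conv2d(64, 128, 5, stride=2, padding=2), nn.LeakyReLU(0.1),
            nn.Flatten(),
            nn.Linear(128 * 16 * 16, 512),  # 64x64 face -> latent vector
        )
    def forward(self, x):
        return self.net(x)

class Decoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(512, 128 * 16 * 16)
        self.net = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.1),
            nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )
    def forward(self, z):
        x = self.fc(z).view(-1, 128, 16, 16)
        return self.net(x)

# One shared encoder, one decoder per identity.
encoder, dec_a, dec_b = Encoder(), Decoder(), Decoder()
params = (list(encoder.parameters()) + list(dec_a.parameters())
          + list(dec_b.parameters()))
opt = torch.optim.Adam(params, lr=5e-5)
loss_fn = nn.MSELoss()

# Placeholder batches; a real tool would load aligned, masked face crops.
faces_a = [torch.rand(8, 3, 64, 64) for _ in range(4)]
faces_b = [torch.rand(8, 3, 64, 64) for _ in range(4)]
num_epochs = 10  # chosen by the user

for epoch in range(num_epochs):
    for batch_a, batch_b in zip(faces_a, faces_b):
        # Each decoder tries to reconstruct its own identity's faces.
        recon_a = dec_a(encoder(batch_a))
        recon_b = dec_b(encoder(batch_b))
        loss = loss_fn(recon_a, batch_a) + loss_fn(recon_b, batch_b)
        opt.zero_grad()
        loss.backward()
        opt.step()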

Conversion

This step is where the deepfake is actually created, using the networks trained in the previous step. Faces from the destination footage are passed through the shared encoder and then through identity A's decoder, which renders the destination's pose and expressions with the identity of A. No training is involved in this step; conversion is a one-way pass of a set of input faces through the encoder-decoder network.
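
Continuing the hypothetical training sketch above, conversion is just a forward pass with the decoders swapped: destination faces go through the shared encoder and come out of identity A's decoder.

with torch.no_grad():  # inference only, no gradients
    for batch_b in faces_b:  # aligned faces from the destination footage
        # B's pose and expression, rendered with A's identity.
        swapped = dec_a(encoder(batch_b))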

The output of this step is a set of frames, which must be handed off to other software to be converted into a video.
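
As one illustration, OpenCV's VideoWriter can stitch the generated frames back into a clip; the file names, codec, and frame rate here are assumptions.

import glob
import cv2

frames = sorted(glob.glob("swapped/frame_*.png"))  # placeholder pattern
first = cv2.imread(frames[0])
height, width = first.shape[:2]
writer = cv2.VideoWriter("deepfake.mp4",
                         cv2.VideoWriter_fourcc(*"mp4v"),
                         30.0, (width, height))  # assumed 30 fps
for path in frames:
    writer.write(cv2.imread(path))
writer.release()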

Post Processing

This last step requires a great deal of skill and time. The video is polished here to make the rendered result look as real as possible. Minor artifacts from the AI facial reenactment can be removed, but larger differences between the identities are much harder to fix.

You can also leverage the compositing tools built into the deepfake framework itself for post-processing, although the results will be less polished than what dedicated editing software can achieve.
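
One compositing technique commonly used at this stage is Poisson blending, exposed in OpenCV as seamlessClone. The sketch below pastes a generated face crop back into a destination frame; the paths, the all-white mask, and the paste position are placeholders, and real pipelines usually derive the mask from the facial landmarks.

import cv2
import numpy as np

frame = cv2.imread("destination_frame.png")  # placeholder: original frame
face = cv2.imread("swapped_face.png")        # placeholder: generated crop

# Blend the whole crop; a real mask would trace the face outline.
mask = 255 * np.ones(face.shape[:2], dtype=np.uint8)
center = (frame.shape[1] // 2, frame.shape[0] // 2)  # assumed paste point

blended = cv2.seamlessClone(face, frame, mask, center, cv2.NORMAL_CLONE)
cv2.imwrite("blended_frame.png", blended)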

Each open-source tool on the market has its own set of options and neural network, with some similarities between tools. The differences between tools stem mainly from differences in the neural-network architecture.
