What is a deepfake?
A deepfake is manipulated media in which a person's voice, face or body is altered or replaced with someone else's using Artificial Intelligence (AI) and Machine Learning. The technology can create highly convincing fake videos and audio recordings that are used in the entertainment industry, but it has also raised significant concerns about misinformation and privacy breaches.
How do deepfake images and videos get created?
To create a deepfake video, AI neural networks like Convolutional Neural Networks (CNNs) are trained on a varied dataset of videos and images. Here’s a simplified breakdown of the process. 👇
- Data Collection - a large dataset of images or videos of the target person is collected to train the neural network.
- Training the CNN - the collected data is fed into a CNN, which analyses it frame by frame and learns to recognise and replicate patterns in the target person's facial features, expressions and movements.
- Encoder and decoder - an autoencoder's encoder compresses each input face into a simpler numerical representation (the latent space), and its decoder reconstructs a face image from that representation (see the sketch after this list).
- Swapping - by pairing one shared encoder with a separate decoder for each person, the system can superimpose the facial expressions and other features of the source onto the target while maintaining the target's distinct facial characteristics.
- Refinement - the output often undergoes further refinement to improve realism, such as adjusting lighting, smoothing edges and ensuring that facial expressions align naturally with the body (a simple blending step is sketched below).
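To make the encoder/decoder and swapping steps more concrete, here is a minimal sketch of the classic shared-encoder, two-decoder setup, assuming PyTorch and two batches of pre-cropped, aligned face images. The layer sizes, dummy data and single training step are illustrative, not a production architecture.

```python
import torch
import torch.nn as nn

# Shared encoder: compresses a 3x64x64 face crop into a small latent vector.
class Encoder(nn.Module):
    def __init__(self, latent_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1),   # 64 -> 32
            nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1),  # 32 -> 16
            nn.ReLU(),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), # 16 -> 8
            nn.ReLU(),
            nn.Flatten(),
            nn.Linear(128 * 8 * 8, latent_dim),
        )

    def forward(self, x):
        return self.net(x)

# Per-identity decoder: reconstructs a face from the shared latent space.
class Decoder(nn.Module):
    def __init__(self, latent_dim=256):
        super().__init__()
        self.fc = nn.Linear(latent_dim, 128 * 8 * 8)
        self.net = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1),  # 8 -> 16
            nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1),   # 16 -> 32
            nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1),    # 32 -> 64
            nn.Sigmoid(),
        )

    def forward(self, z):
        x = self.fc(z).view(-1, 128, 8, 8)
        return self.net(x)

encoder = Encoder()
decoder_a = Decoder()  # trained only on person A's faces
decoder_b = Decoder()  # trained only on person B's faces

# Dummy batches standing in for cropped, aligned face images of A and B.
faces_a = torch.rand(8, 3, 64, 64)
faces_b = torch.rand(8, 3, 64, 64)

loss_fn = nn.MSELoss()
optimiser = torch.optim.Adam(
    list(encoder.parameters()) + list(decoder_a.parameters()) + list(decoder_b.parameters()),
    lr=1e-4,
)

# One training step: each decoder learns to reconstruct its own person
# from the shared latent representation.
optimiser.zero_grad()
loss = loss_fn(decoder_a(encoder(faces_a)), faces_a) + \
       loss_fn(decoder_b(encoder(faces_b)), faces_b)
loss.backward()
optimiser.step()

# The "swap": encode person A's expression, then decode with person B's
# decoder, producing B's face wearing A's expression.
with torch.no_grad():
    swapped = decoder_b(encoder(faces_a))
```

Because the encoder is shared, both decoders learn to read the same latent representation of pose and expression, which is what makes the swap at the end possible.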
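The refinement step often comes down to blending the generated face back into the original frame. Here is a small, illustrative sketch using OpenCV's Poisson blending (cv2.seamlessClone) on synthetic arrays; in a real pipeline, swapped_face would be the decoder's output warped back onto the frame.

```python
import cv2
import numpy as np

# Synthetic stand-ins: a 256x256 video frame and a 100x100 generated face patch.
frame = np.full((256, 256, 3), 90, dtype=np.uint8)          # original frame
swapped_face = np.full((100, 100, 3), 180, dtype=np.uint8)  # decoder output (resized)

# Circular mask marking which pixels of the patch should be blended in.
mask = np.zeros((100, 100), dtype=np.uint8)
cv2.circle(mask, (50, 50), 40, 255, -1)

# Poisson blending matches colours and lighting at the seam, hiding the
# hard edges that often give deepfakes away.
centre = (128, 128)  # where the patch centre lands in the frame
blended = cv2.seamlessClone(swapped_face, frame, mask, centre, cv2.NORMAL_CLONE)

# Optional: light smoothing to reduce high-frequency artefacts.
blended = cv2.GaussianBlur(blended, (3, 3), 0)
```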
How does deepfake audio get created?
Here's a step-by-step explanation of how deepfake audio, often referred to as voice cloning, is created. 👇
- Data collection - audio recordings of the target speaker are gathered, covering various speaking styles, tones and emotions for a well-rounded clone.
- Preprocessing - the audio data is cleaned to remove background noise, normalise volume levels and enhance clarity.
- Feature extraction - key features like spectrograms, pitch and rhythm are extracted from the audio to help the model understand the unique characteristics of the target speaker's voice (see the sketch after this list).
- Model training - a deep learning model, typically a generative model such as a Generative Adversarial Network (GAN) or a sequence-to-sequence model, is trained on the extracted features.
- Voice synthesis - post-training, the model can generate new audio by synthesising the learned features.
- Post-processing - the generated audio is fine-tuned to improve its quality and naturalness. This may involve additional processing steps to smooth out any inconsistencies in the synthesised voice.
- Evaluation - the quality and authenticity of the generated audio are assessed by measuring the similarity between the generated deepfake audio and the target speaker's voice (a crude similarity check is sketched below).
- Iteration - based on the evaluation, the earlier steps are repeated, which may involve collecting more data, refining the model or adjusting the training parameters.
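To ground the preprocessing and feature extraction steps, here is a minimal sketch using librosa. The file name target_voice.wav is a placeholder, and the exact features and parameters vary between voice-cloning systems.

```python
import librosa
import numpy as np

# Load a recording of the target speaker (file name is a placeholder).
audio, sr = librosa.load("target_voice.wav", sr=16000)

# Preprocessing: trim leading/trailing silence and normalise the volume.
audio, _ = librosa.effects.trim(audio, top_db=30)
audio = audio / (np.max(np.abs(audio)) + 1e-8)

# Feature extraction: a mel spectrogram capturing the voice's timbre...
mel = librosa.feature.melspectrogram(y=audio, sr=sr, n_mels=80)
log_mel = librosa.power_to_db(mel)

# ...and a pitch (fundamental frequency) contour capturing intonation.
f0, voiced_flag, _ = librosa.pyin(
    audio,
    fmin=librosa.note_to_hz("C2"),
    fmax=librosa.note_to_hz("C7"),
    sr=sr,
)

print("log-mel shape:", log_mel.shape)   # (80, number_of_frames)
print("voiced frames:", int(np.sum(voiced_flag)))
```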
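For the evaluation step, a very crude similarity check can compare averaged MFCC features of the real and generated audio. Real systems use dedicated speaker-verification embeddings, so treat this purely as an illustration; both file names are placeholders.

```python
import librosa
import numpy as np

def voice_fingerprint(path, sr=16000):
    """Very rough speaker 'fingerprint': the mean MFCC vector of a recording."""
    audio, sr = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=20)
    return mfcc.mean(axis=1)

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

# Placeholder file names for a real recording and a synthesised one.
real = voice_fingerprint("target_voice.wav")
fake = voice_fingerprint("generated_voice.wav")

# Closer to 1.0 means the clone's overall spectral character is nearer to
# the target speaker; a low score suggests more data or training is needed.
print("similarity:", cosine_similarity(real, fake))
```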
How do you spot a deepfake?
Here are some signs that can help you spot manipulated media (a simple automated check is also sketched after this list)...
- Unnatural facial features - inconsistencies such as blurred edges around the lips or eyes, or noticeable differences in skin tone.
- Inconsistent lighting and shadows - deepfakes often struggle to replicate lighting and shadows that match the surroundings.
- Poor eye movement and lip-sync - lip movements that don't sync naturally with the spoken words, and unnatural or uncoordinated eye movements.
- Audio irregularities - discrepancies in the voice that might not match the person’s usual tone, rhythm or accent.
- Contextual clues - if the source or context seems out of character or unusual for the person featured in the video, it might be a deepfake.
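Automated detectors complement these manual checks. As a rough illustration (not a production detector), a common approach is to fine-tune an image classifier on frames labelled real or fake. The sketch below assumes PyTorch and torchvision, with random tensors standing in for a labelled dataset.

```python
import torch
import torch.nn as nn
from torchvision import models

# Start from a standard CNN and give it a two-class head: real vs fake.
model = models.resnet18(weights=None)  # or weights="IMAGENET1K_V1" if downloads are allowed
model.fc = nn.Linear(model.fc.in_features, 2)

# Random tensors stand in for a batch of labelled face frames
# (0 = real, 1 = fake); a real detector trains on a large labelled dataset.
frames = torch.rand(8, 3, 224, 224)
labels = torch.randint(0, 2, (8,))

loss_fn = nn.CrossEntropyLoss()
optimiser = torch.optim.Adam(model.parameters(), lr=1e-4)

# One illustrative training step.
optimiser.zero_grad()
loss = loss_fn(model(frames), labels)
loss.backward()
optimiser.step()

# At inference time, the softmax score for class 1 is the "fake" probability.
with torch.no_grad():
    fake_prob = torch.softmax(model(frames), dim=1)[:, 1]
print(fake_prob)
```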
What are the potential impacts of deepfakes?
Deepfake technology, while innovative, carries a mix of potential positives and negatives that impact various aspects of society.
Positive impact of deepfakes
Let’s briefly look at the positives of deepfake media…
- Entertainment and media - deepfakes can revolutionise the entertainment industry by allowing filmmakers and content creators to produce more creative and visually engaging content. For example, they can bring historical figures to life in movies.
- Education - it can create interactive and engaging learning experiences. For example, historical speeches could be recreated, allowing students to witness events as they happened.
- Personalisation - in marketing, deepfakes can offer personalised experiences to customers, such as virtual try-ons or advertisements featuring the customer themselves, enhancing customer satisfaction.
- Corporate training and conferences - it can be used to create personalised training videos where a CEO or industry expert appears to be speaking directly to employees, making the training more engaging.
Negative impact of deepfakes
The advent of deepfake technology challenges AI ethics because it enables the creation of convincingly false media, compromising trust and violating privacy.
Let’s dive into the potential negatives of deepfake media…
- Misinformation - deepfakes can produce convincing yet entirely false content that can be used to spread misinformation or propaganda and influence public opinion.
- Privacy violations - deepfakes can be made from images and videos of individuals without their consent, leading to privacy breaches and potential harm to those individuals’ reputations.
- Fraud - the ability to mimic someone’s voice and appearance can lead to fraud and scams by tricking people into believing they’re interacting with trusted individuals or entities.
- Psychological impact - deepfakes can cause distress and confusion among the public, as distinguishing between real and fake content becomes increasingly difficult, eroding trust.
- Legal and ethical issues - they pose new challenges for laws and regulations related to AI ethics, particularly concerning privacy, consent, intellectual property rights and defamation.