Repairing Audio Artifacts via Independent Component Analysis (ICA)
Explore ICA in theory and practice for enhancing YouTube audio
By Kuriko IWAI

Table of Contents
Introduction
What is Independent Component Analysis (ICA)

Introduction
Repairing audio artifacts is a common task in audio engineering and sound design, essential for cleaning up recordings and ensuring high-quality sound.
But the process is challenging due to the complex, overlapping nature of unwanted sounds, making Independent Component Analysis (ICA) a handy tool for separation.
In this article, I'll walk you through how ICA works with a practical example of repairing audio artifacts from YouTube audio.
What is Independent Component Analysis (ICA)
Independent Component Analysis (ICA) is a computational method for separating a multivariate signal into statistically independent components (ICs).
The diagram below illustrates the core concept of ICA:

Kernel Labs | Kuriko IWAI | kuriko-iwai.com
Figure A. Core concept of ICA (Created by Kuriko IWAI)
ICA separates a combined signal from Generators A to C into ICs based on four core assumptions:
The observed signal (input) is a linear combination of statistically independent components (ICs),
Generators and recording sites are stationary throughout the recording,
The propagation delays (the traveling time of a signal from its source to the destination) are negligible, and
The probability distributions of the individual ICs are non-Gaussian.
◼ Competitive Advantages and Use Cases
Traditional separation methods rely on weaker statistical assumptions.
For example:
Filtering relies on the frequency content and timing of ICs, making it ineffective when ICs overlap spectrally or temporally,
Principal Component Analysis (PCA) relies on separation by maximizing variance, a statistically weak criterion, and
Simple subtraction requires a known reference pattern of the noise to be removed.
ICA can achieve blind source separation without any prior knowledge of the source waveforms, making it efficient in isolating a specific unwanted source linearly mixed with the desired signal.
Its typical use cases include:
EEG/MEG artifact removal: Separate non-brain signals like eye blinks (EEG) or muscle activity (EMG) from neurological recordings to isolate pure brain data.
“Cocktail party problem”: Separate individual voices from a single audio recording where multiple sources are mixed, as in a crowded room.
Image feature learning: Extract statistically independent features from images, such as local, oriented edge detectors.
Financial factor modeling: Decompose observed stock price movements into a distinct set of economic risk factors.
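Before turning to the math, the cocktail-party idea can be sketched in a few lines of Python. This is a toy demonstration, not the article's pipeline: the sine and square-wave sources and the mixing matrix are my own placeholder choices, and any mixing matrix would work as long as it is invertible.

```python
import numpy as np
from sklearn.decomposition import FastICA

t = np.linspace(0, 1, 2000)

# two statistically independent, non-Gaussian sources
s1 = np.sin(2 * np.pi * 5 * t)            # sinusoid
s2 = np.sign(np.sin(2 * np.pi * 3 * t))   # square wave
S = np.c_[s1, s2]                         # shape (n_samples, n_sources)

# mix them linearly with an arbitrary (unknown to ICA) mixing matrix A
A = np.array([[1.0, 0.5],
              [0.4, 1.0]])
X = S @ A.T                               # observed mixtures, shape (n_samples, 2)

# recover the sources blindly, i.e. without knowing A or S
ica = FastICA(n_components=2, random_state=0)
S_est = ica.fit_transform(X)
print(S_est.shape)                        # (2000, 2)
```

Plotting `S_est` against `S` shows that each recovered component tracks one original source (up to sign and scale), even though ICA saw only the mixtures.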
How ICA Works - Mathematical Formulation
We’ve learned that ICA can decompose a linear combination of non-Gaussian ICs.
Mathematically, the fundamental goal of ICA is to find an unmixing matrix (W) that transforms the input (x, observed data) into a set of IC activations (a).
These IC activations are maximally statistically independent of one another.
Let us see each element.
◼ The Observed Data
The observed (input) data is represented as mixed-channel data x in a 2-D matrix:

x ∈ ℝ^{N × T}

where:
N: The number of channels, individual sensors used to record a signal from a specific spatial location, and
T: The number of samples, each a discrete measurement of the signal amplitude taken at a specific instant in time.
In the case of audio data, the raw continuous signal is converted into a sequence of discrete samples during the digitization process, creating the digital representation x:

Figure B. Processing audio data (Created by Kuriko IWAI)
For example, a 1-second stereo audio clip sampled at 16 kHz has 2 × 16k dimensions:
N = 2 corresponds to the left and right channels of stereo audio, and
T = 16k corresponds to the total number of samples collected in one second with the sampling rate of 16kHz.
In Figure B, the Analog-to-Digital Converter handles sampling with a sampling rate of 14 hertz (Hz), as shown in the yellow graph.
In reality, typical sampling rates are:
192,000 Hz (192 kHz): High-resolution audio.
44,100 Hz (44.1 kHz): CD-quality audio.
16,000 Hz (16 kHz): For training speech models.
8,000 Hz (8 kHz): Human speech.
So, the total number of samples T depends on the sampling rate and the duration of the audio clip:

T = sampling rate (Hz) × duration (seconds)

If the audio clip lasts 2 seconds at a sampling rate of 16 kHz, the total number of samples is T = 32k, hence the dimension of x becomes 2 × 32k.
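As a quick sanity check, the arithmetic maps directly to code (the variable names here are my own):

```python
# total samples T from sampling rate and clip duration
sample_rate = 16_000    # Hz
duration_s = 2          # seconds
n_channels = 2          # stereo

T = sample_rate * duration_s
print(T)                # 32000
print((n_channels, T))  # shape of x: (2, 32000)
```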
◼ Core ICA Model: The Unmixing Matrix
This unmixing process is defined by a linear transformation of the input x:

a = W x

where:
W: The unmixing matrix with N × N dimensions (N: the number of components extracted from the N channels),
x: The observed data (input), and
a: The IC activation matrix such that a ∈ ℝ^{N × T},
where the columns represent T samples, and the rows represent N ICs.
A specific entry of the IC activation matrix, a_{i,j} (the activation of component i at time j), is calculated as a weighted sum of channel activations:

a_{i,j} = Σ_{k=1}^{N} W_{i,k} · x_{k,j}

where:
W_{i,k}: the entry in the i-th row (component) and k-th column (channel) of the unmixing matrix W, and
x_{k,j}: the j-th sample (column) of channel k in the input x.
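The matrix form maps directly onto NumPy. A minimal sketch, with random W and x standing in for real audio:

```python
import numpy as np

rng = np.random.default_rng(42)
N, T = 2, 5                        # 2 channels, 5 samples (toy sizes)
W = rng.standard_normal((N, N))    # unmixing matrix, N × N
x = rng.standard_normal((N, T))    # observed data, N × T

a = W @ x                          # IC activations, N × T

# a[i, j] equals the weighted sum over channels k of W[i, k] * x[k, j]
i, j = 1, 3
assert np.isclose(a[i, j], sum(W[i, k] * x[k, j] for k in range(N)))
```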
◼ Back-Projection
Since ICA is a decomposition technique extracting N components from N channels, it can reconstruct the original data by reversing the process:

x = W^{-1} a

where:
x: The original data (reconstructed),
W^{-1}: The inverse of the unmixing matrix W, whose columns contain the relative weights and polarities for the back-projection, and
a: IC activation matrix.
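Back-projection is also how artifact removal works in practice: zero out the unwanted component's activations, then back-project. A minimal sketch with toy random data (in a real pipeline, W comes from a fitted ICA model):

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 2, 100
W = rng.standard_normal((N, N))   # toy unmixing matrix
x = rng.standard_normal((N, T))   # toy observed data

a = W @ x                         # unmix: IC activations
x_rec = np.linalg.inv(W) @ a      # back-project with W^{-1}
assert np.allclose(x, x_rec)      # perfect reconstruction

# artifact removal: zero out an unwanted IC, then back-project
a_clean = a.copy()
a_clean[1, :] = 0.0               # suppose IC 1 is the artifact
x_clean = np.linalg.inv(W) @ a_clean
```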
◼ Computational Goal of ICA: Maximizing Independence
The computational goal of ICA is to find the optimal unmixing matrix W that can maximize statistical independence among ICs.
Because ICA assumes these independent components are non-Gaussians, it attempts to secure the independence by maximizing non-Gaussianity measured by kurtosis or negentropy of the components.
Kurtosis measures how far a distribution deviates from a Gaussian distribution (kurtosis = 0):

kurt(y) = E[y^4] − 3(E[y^2])^2

where:
y: A zero-mean random variable, and
E[y^k]: The expected value of the k-th moment of the random variable y about its mean.
ICA algorithms seek a large kurtosis magnitude |kurt(y)|, which indicates that y is strongly non-Gaussian.
Let us see a quick example of a normalized (E[y²] = 1) random variable y.
The Gaussian baseline: E[y^4] = 3 → kurt(y) = 3 - 3 = 0
IC A: E[y^4] = 15 → kurt(y) = 15 - 3 = 12 > 0 (Super-Gaussian)
IC B: E[y^4] = 1 → kurt(y)= 1 - 3 = -2 < 0 (Sub-Gaussian)
IC A is called super-Gaussian whose kurtosis value is positive, while IC B is called sub-Gaussian whose kurtosis value is negative.
The ICA algorithm would favor IC A, as its large kurtosis magnitude of 12 indicates strong non-Gaussianity.
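These kurtosis values can be estimated empirically. A minimal sketch using standard distributions as stand-ins for super- and sub-Gaussian ICs (the helper name is my own):

```python
import numpy as np

rng = np.random.default_rng(0)

def excess_kurtosis(y):
    # kurt(y) = E[y^4] - 3 for a zero-mean, unit-variance variable
    y = (y - y.mean()) / y.std()
    return np.mean(y ** 4) - 3.0

gaussian = rng.standard_normal(200_000)       # baseline: excess kurtosis ≈ 0
laplace = rng.laplace(size=200_000)           # super-Gaussian: ≈ +3
uniform = rng.uniform(-1, 1, size=200_000)    # sub-Gaussian: ≈ -1.2

print(round(excess_kurtosis(gaussian), 2))
print(round(excess_kurtosis(laplace), 2))
print(round(excess_kurtosis(uniform), 2))
```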
◼ Various ICA Algorithms and Optimization Process
There are various ICA algorithms to address multiple aspects of the ICA problem.
Each algorithm implements its unique optimization process to find the optimal unmixing matrix W.
For example:
FastICA iteratively applies a fixed-point algorithm to an initial random guess of W until convergence,
Infomax ICA leverages stochastic gradient ascent on an information-maximization objective, and
JADE (Joint Approximate Diagonalization of Eigenmatrices) applies closed-form calculation and spectral decomposition instead of iteration.
These algorithms have the same goal of finding the optimal W, but their approaches are different.
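To make the FastICA approach concrete, here is a minimal sketch of its one-unit fixed-point update (w ← E[x g(wᵀx)] − E[g′(wᵀx)] w with g = tanh), assuming the input has already been whitened. The function name and parameters are my own, not scikit-learn's API:

```python
import numpy as np

def fastica_one_unit(X_white, max_iter=200, tol=1e-6, seed=0):
    """Sketch of the one-unit FastICA fixed-point iteration.
    X_white: pre-whitened data, shape (N, T)."""
    rng = np.random.default_rng(seed)
    N = X_white.shape[0]
    w = rng.standard_normal(N)
    w /= np.linalg.norm(w)                 # random unit-norm initial guess
    for _ in range(max_iter):
        wx = w @ X_white                   # projections, shape (T,)
        g = np.tanh(wx)                    # nonlinearity g
        g_prime = 1.0 - g ** 2             # its derivative g'
        # fixed-point update: w <- E[x g(w^T x)] - E[g'(w^T x)] w
        w_new = (X_white * g).mean(axis=1) - g_prime.mean() * w
        w_new /= np.linalg.norm(w_new)
        if abs(abs(w_new @ w) - 1.0) < tol:  # converged (up to sign flip)
            return w_new
        w = w_new
    return w
```

The full algorithm extracts several such vectors with a decorrelation step between them; scikit-learn's `FastICA` (used below) wraps all of this, including the whitening.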
ICA in Action: Audio Artifact Removal
In this section, I’ll demonstrate how ICA works for audio artifact removal by applying the FastICA algorithm to audio extracted from YouTube videos.
◼ Step 1. Fetching Audio from YouTube Content
The first step is to define the fetch_youtube_audio_for_ica function to prepare stereo audio from a YouTube URL using yt-dlp.
The function returns the digital representation of the audio as a NumPy ndarray with shape N × T.
import os
import subprocess

import librosa
import numpy as np

def fetch_youtube_audio_for_ica(
        youtube_url: str,
        output_path: str,
        sample_rate: int = 16000,
        n_channels: int = 2
) -> tuple[bool, np.ndarray | None]:
    # ensure the output directory exists and any previous file is removed
    os.makedirs(os.path.dirname(output_path), exist_ok=True)
    if os.path.exists(output_path):
        os.remove(output_path)

    try:
        # define the yt-dlp command to download and convert to wav
        command = [
            'yt-dlp',
            '--extract-audio',
            '--audio-format', 'wav',
            '--postprocessor-args', f'-ar {sample_rate} -ac {n_channels}',
            '-o', output_path,
            youtube_url
        ]

        # execute the command
        result = subprocess.run(command, capture_output=True, text=True)

        if result.returncode != 0:
            print(f"❌ yt-dlp failed (return code {result.returncode})\n--- yt-dlp stderr output: ---\n{result.stderr}")
            return False, None
        print(f"✅ downloaded successfully. audio saved to {output_path}")

        # load the downloaded wav file for plotting and ica preparation
        y_loaded, sr = librosa.load(output_path, sr=sample_rate, mono=False)
        return True, y_loaded

    except FileNotFoundError:
        print("❌ 'yt-dlp' command not found. cannot proceed with external download.")
        return False, None

    except Exception as e:
        print(f"❌ error during audio fetch: {e}")
        return False, None
◼ Step 2. Instantiating FastICA
Next, I’ll define the run_fastica function to instantiate and fit a FastICA instance from the Scikit-learn library.
import os

import librosa
import numpy as np
import soundfile as sf
from sklearn.decomposition import FastICA

def run_fastica(
        input_audio_path: str,
        output_dir: str,
        n_components: int = 2,
        y_mixed=None  # optional mixed signal (not used in the separation itself)
):
    os.makedirs(output_dir, exist_ok=True)

    try:
        # load the audio file
        y_loaded, sr = librosa.load(input_audio_path, sr=None, mono=False)

        # check if the audio is stereo (n_channels = 2)
        if y_loaded.ndim == 1:
            print(f"⚠️ audio loaded as mono (shape: {y_loaded.shape}). duplicating channel to form stereo input for ica.")
            y_loaded = np.stack([y_loaded, y_loaded], axis=0)

        if y_loaded.ndim != 2 or y_loaded.shape[0] != 2:
            print(f"❌ downloaded file is not stereo. found shape: {y_loaded.shape}")
            return

        # transpose y_loaded (shape: n_channels, n_samples) as fastica expects (n_samples, n_channels)
        X = y_loaded.T

        # instantiate fastica
        ica = FastICA(n_components=n_components, random_state=42, max_iter=500)

        # fit fastica and separate the sources
        X_separated = ica.fit_transform(X)

        # save each separated signal as a wav file
        n_sources = X_separated.shape[1]
        for i in range(n_sources):
            y_source = X_separated[:, i]
            output_path = os.path.join(output_dir, f'sep_{i}.wav')
            sf.write(output_path, y_source, sr, format='WAV')
            print(f"✅ saved separated source {i+1} to {output_path}")

    except Exception as e:
        print(f"❌ error during fastica separation: {e}")
◼ Step 3. Execution
Lastly, execute the functions:
if __name__ == "__main__":
    import os

    # args
    DEFAULT_YOUTUBE_URL = "https://www.youtube.com/watch?v=kPa7bsKwL-c"
    DEFAULT_SAMPLE_RATE = 16000
    DEFAULT_N_CHANNELS = 2
    DEFAULT_N_COMPONENTS = 2

    # filepaths (project_root anchors all outputs to this script's directory)
    project_root = os.path.dirname(os.path.abspath(__file__))
    TEMP_ORIGINAL_AUDIO_PATH = os.path.join(project_root, 'audio_ica', "original.wav")

    os.makedirs(os.path.dirname(TEMP_ORIGINAL_AUDIO_PATH), exist_ok=True)

    # fetch audio
    success, y_mixed = fetch_youtube_audio_for_ica(
        youtube_url=DEFAULT_YOUTUBE_URL,
        output_path=TEMP_ORIGINAL_AUDIO_PATH,
        sample_rate=DEFAULT_SAMPLE_RATE,
        n_channels=DEFAULT_N_CHANNELS
    )

    # perform fastica separation on the downloaded audio
    # (run_fastica writes the separated sources to sep_0.wav, sep_1.wav, ...)
    if success:
        run_fastica(
            input_audio_path=TEMP_ORIGINAL_AUDIO_PATH,
            output_dir=os.path.join(project_root, 'audio_ica'),
            n_components=DEFAULT_N_COMPONENTS,
            y_mixed=y_mixed,
        )
◼ Results
I ran the scripts on two different YouTube videos: a music video and a TED speech.
1) Music Video
librosa’s loaded sound (N=2, T=16k):

Figure C-1. Loaded original signals (music video)

Figure C-2. ICs signals (music video)
These two components are statistically independent of each other and correspond to two distinct sounds.
2) TED Speech
librosa’s loaded sound (N=2, T=16k):

Figure D-1. Loaded original signals (TED speech)

Figure D-2. ICs signals (TED speech)
Figures C-2 and D-2 have significantly different peak amplitudes due to the scale ambiguity and whitening process inherent in ICA.
The input x is generated from underlying true source signals:

x = A S

where:
x: The input,
A: The mixing matrix, an unknown way that the true sources S were combined, and
S: The true source signals.
The scale ambiguity in ICA comes from the infinite factoring scenarios of A and S.
For example:
x = A * S → IC activation a = S
x = ½ A * 2 S → IC activation a = 2S
x = 3A * 1/3 S → IC activation a = 1/3 S
and so on.
The resulting IC activations therefore recover the true source signals S only up to an unknown scaling factor.
ICA algorithms also perform initial whitening to standardize the data to have unit variance.
This normalization step further ensures that the final ICs will have normalized amplitudes rather than their original physical scale.
These two reasons mean the absolute peak amplitudes in Figures C-2 and D-2 cannot be trusted as measures of relative volume.
However, the shape and temporal patterns of the separated ICs are trustworthy, as these features are preserved by ICA.
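This ambiguity is easy to verify on synthetic data. The sketch below (my own toy sources and mixing matrix, not the YouTube audio) mixes a loud and a quiet source, separates them with FastICA, and shows that the recovered amplitudes are normalized while the waveform shapes survive:

```python
import numpy as np
from sklearn.decomposition import FastICA

t = np.linspace(0, 1, 4000)
S = np.c_[3.0 * np.sin(2 * np.pi * 5 * t),           # loud source (peak 3.0)
          0.2 * np.sign(np.sin(2 * np.pi * 3 * t))]  # quiet source (peak 0.2)
A = np.array([[1.0, 0.5],
              [0.4, 1.0]])
X = S @ A.T                                          # observed mixtures

S_est = FastICA(n_components=2, random_state=0).fit_transform(X)

# peak amplitudes are NOT preserved: whitening normalizes each IC
print(np.abs(S).max(axis=0))      # true peaks differ by a factor of 15
print(np.abs(S_est).max(axis=0))  # recovered peaks are comparable in size

# but waveform shapes ARE preserved: each IC correlates almost perfectly
# (up to sign) with exactly one true source
corr = np.abs(np.corrcoef(np.c_[S, S_est].T)[:2, 2:])
print(np.round(corr, 2))
```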
Wrapping Up
Independent Component Analysis (ICA) is a competitive tool for source separation, providing a powerful means to untangle complex audio mixtures.
In the demonstration, we observed how ICA works both mathematically by maximizing the statistical independence of components, and in a practical example of removing artifacts from a mixed audio track.
However, ICA has some inherent challenges, like requiring at least as many observed signals as sources.
To successfully separate N independent sources, we ideally need N or more mixed signals like recordings from multiple microphones.
When working with a single audio channel, ICA's effectiveness is limited, and more advanced techniques like Short-Time ICA or Non-Negative Matrix Factorization (NMF) are required.
Moving forward, it is expected that deep learning methods like deep clustering or permutation-invariant training will continue to complement or even replace traditional ICA for further refining artifact removal, especially in single-channel or highly complex audio scenarios.
Continue Your Learning
If you enjoyed this blog, these related entries will complete the picture:
Mastering the Bias-Variance Trade-Off: An Empirical Study of VC Dimension and Generalization Bounds
Achieving Accuracy
Dimensionality Reduction Unveiled: LLM Fine-tuning and Mechanics of SVD and PCA
Related Books for Further Understanding
These books cover a wide range of theory and practice, from fundamentals to PhD level.

Linear Algebra Done Right

Foundations of Machine Learning, second edition (Adaptive Computation and Machine Learning series)

Designing Machine Learning Systems: An Iterative Process for Production-Ready Applications

Machine Learning Design Patterns: Solutions to Common Challenges in Data Preparation, Model Building, and MLOps
Share What You Learned
Kuriko IWAI, "Repairing Audio Artifacts via Independent Component Analysis (ICA)" in Kernel Labs
https://kuriko-iwai.com/repair-audio-artifacts-with-ica
Looking for Solutions?
- Deploying ML Systems 👉 Book a briefing session
- Hiring an ML Engineer 👉 Drop an email
- Learn by Doing 👉 Enroll in the AI Engineering Masterclass
Written by Kuriko IWAI. All images, unless otherwise noted, are by the author. All experimentations on this blog utilize synthetic or licensed data.

