
What Are Advanced Feature Engineering Techniques Like PCA and LDA?

You’re staring at a dataset with dozens of features—some critical, some redundant, some pure chaos. Your goal? Cut through the noise, simplify the data, and make your model perform. This is where PCA and LDA step in. PCA summarizes the data; LDA separates the classes. Both reduce dimensionality, but their purpose and approach are entirely distinct.

Tega Adeyemi

You’re handed a dataset with dozens of features — some useful, some redundant, and some pure noise. Your task? Find what matters, simplify the data, and get your model to shine. Overwhelming? Not when you’ve got tools like Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA).

Both PCA and LDA are dimensionality reduction techniques, but they work differently. PCA focuses on summarizing the data, while LDA prioritizes separating classes. In this article, we’ll explore both techniques using a single dataset and compare them side by side for better intuition.

Introducing the Dataset

We’ll use a simple dataset to keep things relatable. Imagine you’re working with a flower classification problem:

                                                                                                                               
| Feature | Description |
| --- | --- |
| Sepal Length | Length of the flower’s sepal (cm). |
| Sepal Width | Width of the flower’s sepal (cm). |
| Petal Length | Length of the flower’s petal (cm). |
| Petal Width | Width of the flower’s petal (cm). |
| Species | The flower’s type (Setosa, Versicolor, or Virginica). |

Here’s a sample of what the data looks like:

                                                                                                                                                               
| Sepal Length | Sepal Width | Petal Length | Petal Width | Species |
| --- | --- | --- | --- | --- |
| 5.1 | 3.5 | 1.4 | 0.2 | Setosa |
| 6.0 | 2.2 | 4.0 | 1.0 | Versicolor |
| 6.3 | 3.3 | 6.0 | 2.5 | Virginica |
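If you want to follow along in code, here is a minimal sketch for loading the data. It assumes the flower dataset described here is the classic Iris dataset that ships with scikit-learn; the article does not name its source, so treat that as an assumption.

```python
from sklearn.datasets import load_iris

# Assumption: the article's flower data corresponds to scikit-learn's Iris dataset.
iris = load_iris()
X = iris.data                # shape (150, 4): sepal length, sepal width, petal length, petal width (cm)
y = iris.target              # integer labels 0, 1, 2
species = iris.target_names  # label names, indexed by the integers in y

print(iris.feature_names)
print(X[0], species[y[0]])   # first flower and its species
```

The variables `X`, `y`, and `species` are just illustrative names for the feature matrix, the labels, and the label names.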

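Goal: use PCA to summarize these four measurements in fewer dimensions, and use LDA to find the dimensions that best separate the three species.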

Principal Component Analysis (PCA): Simplifying Data

PCA is like organizing your closet by finding the most popular colors and arranging everything along those shades. It identifies the directions (called principal components) where the data varies the most, reducing the dimensionality while preserving as much variability as possible.

How PCA Works (Step-by-Step)

1. Standardize the Data:

Since PCA is influenced by scale, all features are standardized (e.g., z-score transformation).

2. Find Principal Components:

PCA identifies directions in the data where the variance is maximized. Each principal component (PC) is a linear combination of the original features.

3. Rank Components by Variance:

The first principal component explains the most variance, the second explains the next most, and so on.

4. Transform the Data:

Project the dataset onto the top components, reducing its dimensionality.
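The four steps above can be sketched in a few lines of NumPy. This is a teaching sketch rather than a production implementation: the function name `pca_from_scratch` and the synthetic data are made up for illustration, and it finds the components through an eigendecomposition of the covariance matrix, which is one common way to compute them.

```python
import numpy as np

def pca_from_scratch(X, n_components):
    """Project X (n_samples, n_features) onto its top principal components."""
    # 1. Standardize each feature to zero mean and unit variance (z-scores).
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # 2. The principal components are the eigenvectors of the covariance matrix.
    cov = np.cov(Z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh handles symmetric matrices
    # 3. Rank components by the variance (eigenvalue) each one explains.
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained_ratio = eigvals / eigvals.sum()
    # 4. Project the standardized data onto the top components.
    scores = Z @ eigvecs[:, :n_components]
    return scores, explained_ratio[:n_components]

# Hypothetical synthetic data: 4 correlated features driven by 2 hidden factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
X = latent @ rng.normal(size=(2, 4)) + 0.1 * rng.normal(size=(200, 4))

scores, ratio = pca_from_scratch(X, n_components=2)
print(scores.shape)        # (200, 2)
print(ratio, ratio.sum())  # share of variance kept by each component, and in total
```

Each column of `scores` is one principal component, and `ratio` tells you how much of the total variance each one carries.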

PCA Example: Simplifying the Flower Dataset

Using PCA, let’s reduce the 4 original features (sepal length, sepal width, petal length, petal width) into 2 principal components.

                                               
| Original Features | Principal Components |
| --- | --- |
| Sepal Length, Sepal Width... | PC1, PC2 |

| PC1 | PC2 |
| --- | --- |
| 2.92 | 0.15 |
| -2.68 | 0.40 |
| 3.00 | -1.05 |

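Here, each row is one flower described by its PC1 and PC2 coordinates instead of the four original measurements.

So, by keeping just 2 components, we’ve reduced the dataset’s dimensionality from 4 to 2 while retaining about 95% of the variance!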
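In practice you would likely lean on a library for this. Here is a short sketch using scikit-learn’s `StandardScaler` and `PCA`, again assuming the flower data is scikit-learn’s Iris dataset. The `explained_variance_ratio_` attribute is how you check how much variance the kept components retain; on this copy of the data the two ratios should sum to a value close to the 95% figure quoted above.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Assumption: the article's flower data corresponds to scikit-learn's Iris dataset.
X, y = load_iris(return_X_y=True)

X_std = StandardScaler().fit_transform(X)  # PCA is scale-sensitive, so standardize first
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_std)           # note: the labels y are never passed to PCA

print(X_pca.shape)                          # (150, 2)
print(pca.explained_variance_ratio_)        # variance share of PC1 and PC2
print(pca.explained_variance_ratio_.sum())  # total variance retained
```

If the total comes out lower than you need on your own data, raise `n_components` and check the sum again.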

What PCA Tells Us

PCA focuses solely on summarizing data. While it helps simplify datasets for tasks like clustering or visualization, it doesn’t care about the flower species (the target variable). If you plotted PC1 vs. PC2, you might see some clusters, but PCA doesn’t guarantee they’ll align with the species.

Linear Discriminant Analysis (LDA): Separating Classes

LDA, on the other hand, is like organizing your closet by finding colors that separate work clothes from casual wear. It creates a new feature space where the classes (e.g., Setosa, Versicolor, Virginica) are as distinct as possible.

How LDA Works (Step-by-Step)

1. Compute Class Means:

Calculate the mean of each feature for each class.

2. Maximize Between-Class Variance:

LDA maximizes the distance between the class means to separate them clearly.

3. Minimize Within-Class Variance:

Simultaneously, it minimizes the spread of data within each class.

4. Transform the Data:

The original dataset is projected onto the directions (linear discriminants) that maximize class separability.
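Here is a minimal NumPy sketch of those four steps. The names `lda_from_scratch` and `fisher_ratio` and the synthetic three-class data are invented for illustration. Steps 2 and 3 are handled together: the sketch builds a between-class scatter matrix and a within-class scatter matrix, then takes the leading eigenvectors of inv(S_w) @ S_b, which is a standard textbook formulation and not necessarily what any particular library does internally.

```python
import numpy as np

def lda_from_scratch(X, y, n_components):
    """Project X onto the directions that best separate the classes in y."""
    n_features = X.shape[1]
    overall_mean = X.mean(axis=0)
    S_w = np.zeros((n_features, n_features))  # within-class scatter
    S_b = np.zeros((n_features, n_features))  # between-class scatter
    for c in np.unique(y):
        X_c = X[y == c]
        mean_c = X_c.mean(axis=0)                   # 1. class mean
        diff = (mean_c - overall_mean).reshape(-1, 1)
        S_b += len(X_c) * (diff @ diff.T)           # 2. spread of the class means
        centered = X_c - mean_c
        S_w += centered.T @ centered                # 3. spread inside each class
    # Directions with large between-class and small within-class scatter
    # are the leading eigenvectors of inv(S_w) @ S_b.
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(S_w, S_b))
    order = np.argsort(eigvals.real)[::-1][:n_components]
    W = eigvecs[:, order].real
    return X @ W, W                                 # 4. project the data

def fisher_ratio(v, y):
    """Between-class over within-class scatter for a single 1-D feature v."""
    classes = np.unique(y)
    between = sum((y == c).sum() * (v[y == c].mean() - v.mean()) ** 2 for c in classes)
    within = sum(((v[y == c] - v[y == c].mean()) ** 2).sum() for c in classes)
    return between / within

# Hypothetical synthetic data: 3 classes, 4 features, different class means.
rng = np.random.default_rng(0)
means = np.array([[0, 0, 0, 0], [3, 1, 0, 0], [0, 3, 1, 0]], dtype=float)
X = np.vstack([rng.normal(loc=m, scale=1.0, size=(50, 4)) for m in means])
y = np.repeat([0, 1, 2], 50)

Z, W = lda_from_scratch(X, y, n_components=2)
print(Z.shape)  # (150, 2)
# The first discriminant separates the classes at least as well as any single raw feature.
print(fisher_ratio(Z[:, 0], y), max(fisher_ratio(X[:, j], y) for j in range(4)))
```

The last line is a quick sanity check: the separation score of the first discriminant should never be lower than that of the best original feature.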

LDA Example: Separating the Flower Dataset

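Using LDA, let’s reduce the 4 features into 2 linear discriminants (LD1, LD2). Two is also the maximum here: LDA yields at most one fewer discriminant than there are classes, and we have three species.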

                                               
| Original Features | Linear Discriminants |
| --- | --- |
| Sepal Length, Sepal Width... | LD1, LD2 |

| LD1 | LD2 |
| --- | --- |
| 4.20 | 0.18 |
| -3.50 | -0.12 |
| 5.10 | 0.30 |

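Here, each row is one flower described by its LD1 and LD2 coordinates instead of the four original measurements.

When plotted, flowers from different species form distinct clusters. This makes LDA especially powerful for classification tasks.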
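With scikit-learn, the same reduction is a couple of lines using `LinearDiscriminantAnalysis`. As before, this sketch assumes the flower data is scikit-learn’s Iris dataset. Notice that, unlike PCA, the labels `y` are passed to `fit_transform`.

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Assumption: the article's flower data corresponds to scikit-learn's Iris dataset.
X, y = load_iris(return_X_y=True)

lda = LinearDiscriminantAnalysis(n_components=2)
X_lda = lda.fit_transform(X, y)       # supervised: the species labels guide the projection

print(X_lda.shape)                    # (150, 2)
print(lda.explained_variance_ratio_)  # share of between-class variance per discriminant
print(lda.score(X, y))                # training accuracy when LDA is used as a classifier
```

`LinearDiscriminantAnalysis` doubles as a classifier, which is why it has a `score` method. The number printed here is accuracy on the training data, so treat it as optimistic.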

What LDA Tells Us

Unlike PCA, LDA uses the target variable (species) to guide the feature transformation. It ensures that the new features (LD1, LD2) make it easier for your model to classify the flowers correctly.

PCA vs. LDA: Key Differences

Here’s how PCA and LDA compare when applied to the same flower dataset:

                                                                                                                                         
| Aspect | PCA | LDA |
| --- | --- | --- |
| Focus | Maximizes variance in the data. | Maximizes class separability. |
| Type of Learning | Unsupervised (ignores target variable). | Supervised (uses target variable). |
| Output | Principal components (uncorrelated). | Linear discriminants (class-focused). |
| Use Case | Simplifying datasets, clustering. | Classification tasks. |
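One way to make the comparison concrete is to feed both 2-D representations to the same classifier and compare cross-validated accuracy. The sketch below does that on scikit-learn’s Iris dataset (an assumption about the data, as before). The choice of logistic regression and 5-fold cross-validation is arbitrary and only for illustration; the result on one small dataset says nothing general about which technique is better.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: the article's flower data corresponds to scikit-learn's Iris dataset.
X, y = load_iris(return_X_y=True)

# Same scaler and classifier in both pipelines; only the reduction step differs.
pca_pipe = make_pipeline(
    StandardScaler(), PCA(n_components=2), LogisticRegression(max_iter=1000)
)
lda_pipe = make_pipeline(
    StandardScaler(), LinearDiscriminantAnalysis(n_components=2), LogisticRegression(max_iter=1000)
)

# Putting the reduction inside the pipeline means it is re-fit on each training fold.
pca_acc = cross_val_score(pca_pipe, X, y, cv=5).mean()
lda_acc = cross_val_score(lda_pipe, X, y, cv=5).mean()

print(f"PCA (2 components) + logistic regression: {pca_acc:.3f}")
print(f"LDA (2 discriminants) + logistic regression: {lda_acc:.3f}")
```

Keeping the reduction step inside the pipeline matters most for LDA: because it uses the labels, fitting it on the full dataset before cross-validating would leak information from the test folds.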

Visualizing the Difference

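If you plot the results of PCA and LDA side by side, the contrast is easy to see: the PCA plot (PC1 vs. PC2) may show some clusters, but they aren’t guaranteed to line up with the species, while the LDA plot (LD1 vs. LD2) shows the species as distinct clusters.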
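To draw that side-by-side view yourself, here is a sketch using matplotlib and scikit-learn. The function name `plot_pca_vs_lda` and the output file name are made up for illustration, and the data is again assumed to be scikit-learn’s Iris dataset.

```python
import matplotlib
matplotlib.use("Agg")  # render to a file; drop this line to get an interactive window
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.preprocessing import StandardScaler

def plot_pca_vs_lda():
    """Scatter the 2-D PCA and LDA projections next to each other, colored by species."""
    iris = load_iris()  # assumption: stands in for the article's flower data
    X, y = iris.data, iris.target
    X_pca = PCA(n_components=2).fit_transform(StandardScaler().fit_transform(X))
    X_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)

    fig, axes = plt.subplots(1, 2, figsize=(10, 4))
    panels = [(X_pca, "PCA", "PC"), (X_lda, "LDA", "LD")]
    for ax, (Z, title, prefix) in zip(axes, panels):
        for label, name in enumerate(iris.target_names):
            ax.scatter(Z[y == label, 0], Z[y == label, 1], label=name, s=15)
        ax.set_title(title)
        ax.set_xlabel(f"{prefix}1")
        ax.set_ylabel(f"{prefix}2")
    axes[0].legend()
    fig.tight_layout()
    return fig

fig = plot_pca_vs_lda()
fig.savefig("pca_vs_lda.png", dpi=150)
```

The species colors are added to both panels only so you can judge the result; PCA itself never sees them.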

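When to Use Which?

Reach for PCA when you want to simplify or visualize a dataset, or prepare it for clustering, and labels are missing or beside the point. Reach for LDA when you have class labels and the end goal is classification.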

Conclusion: PCA and LDA Are Your Power Tools

PCA and LDA are like two sides of the same coin. While PCA simplifies data by focusing on variance, LDA transforms it to highlight class distinctions. By understanding when and how to use these techniques, you can turn messy datasets into structured, insightful inputs for your models.

Tega Adeyemi, December 6, 2024