What Are Advanced Feature Engineering Techniques Like PCA and LDA?

You’re handed a dataset with dozens of features — some useful, some redundant, and some pure noise. Your task? Find what matters, simplify the data, and get your model to shine. Overwhelming? Not when you’ve got tools like Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA).

Both PCA and LDA are dimensionality reduction techniques, but they work differently. PCA focuses on summarizing the data, while LDA prioritizes separating classes. In this article, we’ll explore both techniques using a single dataset and compare them side by side for better intuition.

Introducing the Dataset

We’ll use a simple dataset to keep things relatable. Imagine you’re working with a flower classification problem:

| Feature      | Description                                           |
|--------------|-------------------------------------------------------|
| Sepal Length | Length of the flower’s sepal (cm).                    |
| Sepal Width  | Width of the flower’s sepal (cm).                     |
| Petal Length | Length of the flower’s petal (cm).                    |
| Petal Width  | Width of the flower’s petal (cm).                     |
| Species      | The flower’s type (Setosa, Versicolor, or Virginica). |

Here’s a sample of what the data looks like:

| Sepal Length | Sepal Width | Petal Length | Petal Width | Species    |
|--------------|-------------|--------------|-------------|------------|
| 5.1          | 3.5         | 1.4          | 0.2         | Setosa     |
| 6.0          | 2.2         | 4.0          | 1.0         | Versicolor |
| 6.3          | 3.3         | 6.0          | 2.5         | Virginica  |

Goal:

  • Use PCA to simplify the dataset by reducing the number of features while retaining variance.
  • Use LDA to create new features that maximize class separability (e.g., better distinguish Setosa, Versicolor, and Virginica).

Principal Component Analysis (PCA): Simplifying Data

PCA is like organizing your closet by finding the most popular colors and arranging everything along those shades. It identifies the directions (called principal components) where the data varies the most, reducing the dimensionality while preserving as much variability as possible.

How PCA Works (Step-by-Step)

1. Standardize the Data:

Since PCA is influenced by scale, all features are standardized (e.g., z-score transformation).

2. Find Principal Components:

PCA identifies directions in the data where the variance is maximized. Each principal component (PC) is a linear combination of the original features.

3. Rank Components by Variance:

The first principal component explains the most variance, the second explains the next most, and so on.

4. Transform the Data:

Project the dataset onto the top components, reducing its dimensionality.
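The four steps above can be sketched in a few lines with scikit-learn. This is a minimal sketch, assuming scikit-learn is installed; its built-in Iris dataset mirrors our flower table (150 samples, the same 4 features):

```python
# Minimal PCA sketch of the four steps above, using scikit-learn's Iris data.
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

X, y = load_iris(return_X_y=True)        # 150 flowers x 4 features

# Step 1: standardize (z-score) so no feature dominates purely by scale
X_std = StandardScaler().fit_transform(X)

# Steps 2-4: find the components, rank them by variance,
# and project the data onto the top 2
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_std)

print(X_pca.shape)                       # (150, 2)
print(pca.explained_variance_ratio_)     # fraction of variance per component
```

Note that `explained_variance_ratio_` lets you check how much information the kept components retain before committing to a choice of `n_components`.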

PCA Example: Simplifying the Flower Dataset

Using PCA, let’s reduce the 4 original features (sepal length, sepal width, petal length, petal width) into 2 principal components.

The four original features (Sepal Length, Sepal Width, Petal Length, Petal Width) are combined into two new columns, PC1 and PC2:

| PC1   | PC2   |
|-------|-------|
| 2.92  | 0.15  |
| -2.68 | 0.40  |
| 3.00  | -1.05 |

Here:

  • PC1 captures 90% of the variance in the dataset.
  • PC2 adds another 5%.

So, by keeping just 2 components, we’ve reduced the dataset’s dimensionality from 4 to 2 while retaining 95% of the variance!

What PCA Tells Us

PCA focuses solely on summarizing data. While it helps simplify datasets for tasks like clustering or visualization, it doesn’t care about the flower species (the target variable). If you plotted PC1 vs. PC2, you might see some clusters, but PCA doesn’t guarantee they’ll align with the species.

Linear Discriminant Analysis (LDA): Separating Classes

LDA, on the other hand, is like organizing your closet by finding colors that separate work clothes from casual wear. It creates a new feature space where the classes (e.g., Setosa, Versicolor, Virginica) are as distinct as possible.

How LDA Works (Step-by-Step)

1. Compute Class Means:

Calculate the mean of each feature for each class.

2. Maximize Between-Class Variance:

LDA maximizes the distance between the class means to separate them clearly.

3. Minimize Within-Class Variance:

Simultaneously, it minimizes the spread of data within each class.

4. Transform the Data:

The original dataset is projected onto the directions (linear discriminants) that maximize class separability.
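These steps can likewise be sketched with scikit-learn. A minimal sketch on the same Iris data; the key difference from PCA is that `fit_transform` also receives the labels `y`:

```python
# Minimal LDA sketch: a supervised projection that uses the class labels.
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)

# With 3 classes, LDA yields at most n_classes - 1 = 2 discriminants
lda = LinearDiscriminantAnalysis(n_components=2)
X_lda = lda.fit_transform(X, y)          # note: y is required here

print(X_lda.shape)                       # (150, 2)
```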

LDA Example: Separating the Flower Dataset

Using LDA, let’s reduce the 4 features into 2 linear discriminants (LD1, LD2). With three species, 2 is also the maximum: LDA can produce at most n_classes − 1 discriminants.

The four original features (Sepal Length, Sepal Width, Petal Length, Petal Width) are combined into two new columns, LD1 and LD2:

| LD1   | LD2   |
|-------|-------|
| 4.20  | 0.18  |
| -3.50 | -0.12 |
| 5.10  | 0.30  |

Here:

  • LD1 captures the separation between Setosa and the other two species.
  • LD2 captures the separation between Versicolor and Virginica.

When plotted, flowers from different species form distinct clusters. This makes LDA especially powerful for classification tasks.

What LDA Tells Us

Unlike PCA, LDA uses the target variable (species) to guide the feature transformation. It ensures that the new features (LD1, LD2) make it easier for your model to classify the flowers correctly.

PCA vs. LDA: Key Differences

Here’s how PCA and LDA compare when applied to the same flower dataset:

| Aspect           | PCA                                     | LDA                                   |
|------------------|-----------------------------------------|---------------------------------------|
| Focus            | Maximizes variance in the data.         | Maximizes class separability.         |
| Type of Learning | Unsupervised (ignores target variable). | Supervised (uses target variable).    |
| Output           | Principal components (uncorrelated).    | Linear discriminants (class-focused). |
| Use Case         | Simplifying datasets, clustering.       | Classification tasks.                 |

Visualizing the Difference

If you plot the results of PCA and LDA side by side:

  • PCA: The clusters (species) may overlap because it’s only concerned with variance.
  • LDA: The clusters are more distinct because it’s explicitly designed for class separation.
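You can quantify this difference directly. As a rough sketch (assuming scikit-learn), project the Iris data both ways and score a simple classifier on each 2-D projection; the supervised LDA projection typically gives the classifier an easier job:

```python
# Compare the two 2-D projections by how well a simple classifier
# separates the species in each.
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
X_std = StandardScaler().fit_transform(X)

X_pca = PCA(n_components=2).fit_transform(X_std)          # unsupervised
X_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X_std, y)  # supervised

clf = LogisticRegression(max_iter=1000)
pca_score = cross_val_score(clf, X_pca, y, cv=5).mean()
lda_score = cross_val_score(clf, X_lda, y, cv=5).mean()

print(f"Accuracy on PCA projection: {pca_score:.2f}")
print(f"Accuracy on LDA projection: {lda_score:.2f}")
```

The exact numbers vary with the dataset and classifier, but on Iris the LDA projection usually scores noticeably higher, which is exactly the "distinct clusters" effect described above.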

When to Use Which?

  • Use PCA when:
    • You have no target variable.
    • You’re exploring data or performing clustering.
    • You want to visualize high-dimensional data.
  • Use LDA when:
    • You’re working on a classification task.
    • You need to reduce dimensions while maintaining class separability.

Conclusion: PCA and LDA Are Your Power Tools

PCA and LDA are like two sides of the same coin. While PCA simplifies data by focusing on variance, LDA transforms it to highlight class distinctions. By understanding when and how to use these techniques, you can turn messy datasets into structured, insightful inputs for your models.

Cohorte Team

December 6, 2024