Feature Engineering Catalog

This page describes the 11 feature engineering methods available in EnsembleSet.

String Feature Encoding

1. One-Hot Encoding

Converts categorical string features into binary indicator columns.

Mathematical Description:

For a categorical feature with \(k\) unique categories, one-hot encoding creates \(k\) binary columns where each column represents one category. For a given sample, exactly one column has value 1 (the category present) and all others are 0.

Use Cases:

Nominal categorical features without inherent ordering
Features with low to moderate cardinality
When treating each category as independent is appropriate

Example:

# Input: ['A', 'B', 'A', 'C']
# Output:
#   A  B  C
#   1  0  0
#   0  1  0
#   1  0  0
#   0  0  1
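The example above can be reproduced with a few lines of plain Python. This is a minimal sketch, not EnsembleSet's implementation: the function name `one_hot_encode` is hypothetical, and the alphabetical column order is an assumption made here so the output is deterministic.

```python
def one_hot_encode(values):
    """Return (categories, rows) with one binary indicator column per category."""
    # Assumption: columns are ordered alphabetically by category label.
    categories = sorted(set(values))
    column_of = {cat: j for j, cat in enumerate(categories)}
    rows = []
    for value in values:
        row = [0] * len(categories)
        row[column_of[value]] = 1  # exactly one column is set per sample
        rows.append(row)
    return categories, rows


categories, rows = one_hot_encode(["A", "B", "A", "C"])
print(categories)
for row in rows:
    print(row)
```

Each row sums to 1, which is the defining property stated in the mathematical description.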

2. Ordinal Encoding

Converts categorical string features into integer codes.

Mathematical Description:

Each unique category is mapped to an integer. For \(k\) unique categories, integers from 0 to \(k-1\) are assigned.

Use Cases:

Ordinal categorical features with inherent ordering
High-cardinality categorical features where one-hot encoding would create too many columns
Tree-based models that can handle encoded categories

Example:

# Input: ['low', 'medium', 'high', 'low']
# Output: [0, 1, 2, 0]
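The codes in the example follow the order low, medium, high. The sketch below shows one way to get that mapping; the function name `ordinal_encode` and its `categories` argument are hypothetical, and the fallback to order of first appearance is an assumption. An encoder that sorts labels alphabetically would instead assign high=0, low=1, medium=2.

```python
def ordinal_encode(values, categories=None):
    """Map each category to an integer code in 0..k-1.

    If no explicit order is given, categories are numbered in order of
    first appearance (an assumption of this sketch).
    """
    if categories is None:
        categories = list(dict.fromkeys(values))  # keeps first-seen order
    mapping = {cat: code for code, cat in enumerate(categories)}
    return [mapping[v] for v in values], mapping


codes, mapping = ordinal_encode(
    ["low", "medium", "high", "low"],
    categories=["low", "medium", "high"],  # explicit, meaningful order
)
print(codes)
print(mapping)
```

Passing the order explicitly matters for truly ordinal features, because the integer codes then preserve the ranking of the categories.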

Numerical Feature Engineering

3. Polynomial Features

Generates polynomial and interaction features from existing features.

Mathematical Description:

For degree \(d\), polynomial features include all monomials of degree \(\leq d\). For two features \(x_1\) and \(x_2\) with degree 2:

\[[1, x_1, x_2, x_1^2, x_1 x_2, x_2^2]\]

Use Cases:

Capturing non-linear relationships
Modeling feature interactions
Polynomial regression models

Parameters:

Degree: 2 or 3
Interaction only: Include only cross-products
Include bias: Add constant term
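To make the monomial expansion concrete, here is a small sketch that enumerates the terms for one sample. The function name and keyword arguments are illustrative only (they mirror the parameters listed above but are not taken from EnsembleSet's API).

```python
from itertools import combinations, combinations_with_replacement
from math import prod


def polynomial_features(row, degree=2, interaction_only=False, include_bias=True):
    """Return all monomials of the entries of `row` up to `degree`."""
    # With interaction_only, an index may appear at most once per term,
    # so pure powers such as x_1^2 are skipped.
    picker = combinations if interaction_only else combinations_with_replacement
    features = [1.0] if include_bias else []
    for d in range(1, degree + 1):
        for idx in picker(range(len(row)), d):
            features.append(prod(row[i] for i in idx))
    return features


# Terms come out in the order [1, x1, x2, x1^2, x1*x2, x2^2].
print(polynomial_features([2.0, 3.0]))
print(polynomial_features([2.0, 3.0], interaction_only=True, include_bias=False))
```

The number of output columns grows quickly with the degree and the number of input features, which is one reason to keep the degree at 2 or 3.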

4. Spline Features

Applies spline basis transformations to features.

Mathematical Description:

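Spline transformations replace a feature with a set of piecewise polynomial basis functions. For B-splines of degree \(d\) with \(k\) knots, each basis function is nonzero only over \(d+1\) adjacent knot intervals and joins smoothly at the knots, so a weighted sum of the basis functions forms a smooth curve.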

Use Cases:

Flexible non-linear transformations
Smoother than polynomial features
Capturing complex non-linear patterns

Parameters:

Degree: 2, 3, or 4
Knots: Number and placement (uniform or quantile)
Extrapolation: Behavior outside knot range
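As an illustration of what a B-spline basis looks like numerically, the sketch below evaluates the basis functions at a single point with the Cox-de Boor recursion. The function name `bspline_basis` and the clamped knot vector (boundary knots repeated `degree + 1` times) are choices made for this example; the source does not say how EnsembleSet places knots internally.

```python
def bspline_basis(x, knots, degree):
    """Evaluate every B-spline basis function of `degree` at x (Cox-de Boor)."""
    n_intervals = len(knots) - 1
    # Degree 0: indicator functions of the half-open knot intervals.
    basis = [1.0 if knots[i] <= x < knots[i + 1] else 0.0 for i in range(n_intervals)]
    for d in range(1, degree + 1):
        raised = []
        for i in range(n_intervals - d):
            left_den = knots[i + d] - knots[i]
            right_den = knots[i + d + 1] - knots[i + 1]
            # Terms with a zero-width support (repeated knots) contribute 0.
            left = (x - knots[i]) / left_den * basis[i] if left_den > 0 else 0.0
            right = (knots[i + d + 1] - x) / right_den * basis[i + 1] if right_den > 0 else 0.0
            raised.append(left + right)
        basis = raised
    return basis


# Cubic basis on [0, 3) with interior knots at 1 and 2.
knots = [0, 0, 0, 0, 1, 2, 3, 3, 3, 3]
values = bspline_basis(1.5, knots, degree=3)
print(values)
```

Inside the knot range the basis values are non-negative and sum to 1, so each input value becomes a small set of smooth, localized weights. Because the intervals are half-open, a point exactly at the right boundary needs separate handling, which this sketch omits.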

5. Logarithmic Features

Applies logarithmic transformations to compress large value ranges.

Mathematical Description:

\[y = \log_b(x)\]

where \(b \in \{2, e, 10\}\)

Use Cases:

Features with exponential distributions or heavy right tails
Reducing the impact of outliers
Making multiplicative relationships additive

Parameters:

Base: 2, e (natural log), or 10

Note: The logarithm is undefined for zero and negative values, so such values are handled in a preprocessing step before the transform is applied.
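The source does not specify which preprocessing is used for non-positive values. One common option, shown here purely as an assumption, is to shift the feature so that the training minimum maps to 1 before taking the logarithm. The names `fit_log_shift` and `log_transform` are hypothetical.

```python
import math


def fit_log_shift(train_values):
    """Return a shift that moves the training minimum to 1 (0 if already positive)."""
    lowest = min(train_values)
    return 1.0 - lowest if lowest <= 0 else 0.0


def log_transform(values, shift=0.0, base=math.e):
    """Apply log_base(x + shift), flooring the argument at a tiny positive value."""
    # The floor guards against test values that fall below the training minimum.
    return [math.log(max(v + shift, 1e-12), base) for v in values]


train = [-3.0, 0.0, 5.0, 124.0]
shift = fit_log_shift(train)          # learned from training data only
print(shift)
print(log_transform(train, shift, base=2))
```

Learning the shift from the training data and reusing it for the test data keeps the transformation consistent across both sets.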

6. Ratio Features

Creates ratio features from all pairwise divisions of selected features.

Mathematical Description:

For features \(x_1, x_2, \ldots, x_n\), creates:

\[r_{ij} = \frac{x_i}{x_j} \quad \forall i \neq j\]

Use Cases:

Capturing relative relationships between features
Normalizing features by reference values
Financial ratios (e.g., price/earnings)

Parameters:

Division-by-zero value: Value substituted when the denominator is zero (default: NaN)
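A minimal sketch of the pairwise ratios for one sample follows. The function name `ratio_features` and the `zero_division_value` argument are hypothetical stand-ins for the parameter described above.

```python
import math
from itertools import permutations


def ratio_features(row, zero_division_value=math.nan):
    """Return {(i, j): row[i] / row[j]} for every ordered pair with i != j."""
    ratios = {}
    for i, j in permutations(range(len(row)), 2):
        if row[j] != 0:
            ratios[(i, j)] = row[i] / row[j]
        else:
            ratios[(i, j)] = zero_division_value  # replacement for x / 0
    return ratios


ratios = ratio_features([6.0, 3.0, 0.0])
for pair, value in ratios.items():
    print(pair, value)
```

For \(n\) selected features this produces \(n(n-1)\) ratios, since \(x_i / x_j\) and \(x_j / x_i\) are both included.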

7. Exponential Features

Applies exponential transformations to features.

Mathematical Description:

\[y = b^x\]

where \(b \in \{2, e\}\)

Use Cases:

Inverse of logarithmic transformation
Amplifying small differences
Modeling exponential growth

Parameters:

Base: 2 or e (natural exponential)

Note: Inputs large enough to overflow the floating-point range are handled in a preprocessing step before the transform is applied.
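How the overflow preprocessing works is not stated in the source. The sketch below assumes the simplest approach, clipping the exponent to the largest value that still yields a finite float; the function name `exp_transform` is hypothetical.

```python
import math
import sys


def exp_transform(values, base=math.e):
    """Compute base**x with the exponent clipped to avoid float overflow."""
    # Largest safe exponent for this base, minus a margin of 1 (assumed strategy).
    limit = math.log(sys.float_info.max) / math.log(base) - 1.0
    return [base ** min(max(v, -limit), limit) for v in values]


print(exp_transform([0.0, 1.0, 3.0], base=2.0))
print(exp_transform([1e6]))  # clipped, so the result stays finite
```

Clipping keeps the ordering of moderate values intact while mapping all extreme inputs to the same large value.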

8. Sum Features

Creates features by summing combinations of selected features.

Mathematical Description:

For \(n\) addends, creates one sum for every combination of \(n\) selected features:

\[s = x_{i_1} + x_{i_2} + \cdots + x_{i_n}\]

where \(n \in \{2, 3, 4\}\)

Use Cases:

Capturing aggregate effects
Total or cumulative values
Additive relationships

Parameters:

Number of addends: 2, 3, or 4
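The combinations can be enumerated directly, as in this short sketch. The function name `sum_features` and the `n_addends` argument are illustrative and mirror the parameter above.

```python
from itertools import combinations


def sum_features(row, n_addends=2):
    """Return {indices: sum of row at those indices} for every combination."""
    return {
        idx: sum(row[i] for i in idx)
        for idx in combinations(range(len(row)), n_addends)
    }


print(sum_features([1.0, 2.0, 4.0], n_addends=2))
print(sum_features([1.0, 2.0, 4.0], n_addends=3))
```

With \(m\) selected features and \(n\) addends there are \(\binom{m}{n}\) new columns, so the count grows quickly when many features are selected.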

9. Difference Features

Creates features by computing differences of feature combinations.

Mathematical Description:

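For \(n\) subtrahends (the \(n\) features in the expression, where the first is the minuend and the remaining \(n-1\) are subtracted from it), creates:

\[d = x_{i_1} - x_{i_2} - \cdots - x_{i_n}\]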

where \(n \in \{2, 3, 4\}\)

Use Cases:

Change or delta features
Comparing related measurements
Removing baseline effects

Parameters:

Number of subtrahends: 2, 3, or 4
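Unlike sums, differences depend on the order of the terms. The sketch below assumes that features are taken in their original column order, with the first selected feature as the starting value; the source does not state which ordering EnsembleSet uses, and the name `difference_features` is hypothetical.

```python
from itertools import combinations


def difference_features(row, n_terms=2):
    """Return {indices: first value minus all following values} per combination."""
    differences = {}
    for idx in combinations(range(len(row)), n_terms):
        first, *rest = idx
        # x_{i_1} - x_{i_2} - ... - x_{i_n}
        differences[idx] = row[first] - sum(row[i] for i in rest)
    return differences


print(difference_features([10.0, 3.0, 2.0], n_terms=2))
print(difference_features([10.0, 3.0, 2.0], n_terms=3))
```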

10. Gaussian KDE Smoothing

Applies Gaussian kernel density estimation to smooth features.

Mathematical Description:

For each feature value \(x\), estimates the probability density:

\[\hat{f}(x) = \frac{1}{nh} \sum_{i=1}^{n} K\left(\frac{x - x_i}{h}\right)\]

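where \(x_1, \ldots, x_n\) are the samples used to fit the estimate, \(K(u) = \frac{1}{\sqrt{2\pi}} e^{-u^2/2}\) is the Gaussian kernel, and \(h\) is the bandwidth.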

Use Cases:

Noise reduction
Identifying underlying distributions
Smoothing irregular patterns

Parameters:

Bandwidth: 'scott' or 'silverman' method
Sample size: Number of samples for KDE calculation

Note: The estimator is fitted on training data only, then applied to both training and test data.
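The density formula can be written out directly for a single feature. This is a sketch under stated assumptions: the bandwidth rules are the usual one-dimensional forms, \(h = \sigma n^{-1/5}\) for Scott and \(h = \sigma (3n/4)^{-1/5}\) for Silverman, with \(\sigma\) the sample standard deviation. The function name `fit_gaussian_kde` is hypothetical, and the "Sample size" parameter is not modeled.

```python
import math
import statistics


def fit_gaussian_kde(train_values, bandwidth="scott"):
    """Fit a 1-D Gaussian KDE on training values and return a density function."""
    n = len(train_values)
    sigma = statistics.stdev(train_values)
    if bandwidth == "scott":
        factor = n ** (-1 / 5)
    elif bandwidth == "silverman":
        factor = (3 * n / 4) ** (-1 / 5)
    else:
        raise ValueError(f"unknown bandwidth rule: {bandwidth!r}")
    h = sigma * factor

    def density(x):
        # (1 / (n h)) * sum of K((x - x_i) / h) with the standard normal kernel K
        total = sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in train_values)
        return total / (n * h * math.sqrt(2 * math.pi))

    return density


density = fit_gaussian_kde([1.0, 2.0, 2.5, 3.0, 4.0])  # fit on training data
print(density(2.5), density(10.0))  # the same function is reused for test values
```

Returning a closure keeps the training samples and bandwidth fixed, so test values are scored against the training distribution only.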

11. K-Bins Quantization

Discretizes continuous features into bins.

Mathematical Description:

Divides the feature range into \(k\) bins and assigns each value to a bin:

\[y = \text{bin}(x) \in \{0, 1, \ldots, k-1\}\]

Use Cases:

Converting continuous to categorical features
Reducing sensitivity to small variations
Handling non-linear relationships with linear models

Parameters:

Number of bins: 4, 8, or 16
Strategy: uniform, quantile, or k-means
Encoding: ordinal (each value is replaced by its integer bin index)
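The sketch below illustrates the uniform and quantile strategies with ordinal output. The function names are hypothetical, the quantile edges use a simple nearest-rank rule chosen for this example, and the k-means strategy is left out for brevity.

```python
from bisect import bisect_right


def fit_bin_edges(train_values, n_bins=4, strategy="uniform"):
    """Return the n_bins - 1 interior bin edges learned from training values."""
    ordered = sorted(train_values)
    if strategy == "uniform":
        lo, hi = ordered[0], ordered[-1]
        return [lo + (hi - lo) * b / n_bins for b in range(1, n_bins)]
    if strategy == "quantile":
        # Nearest-rank quantiles (an assumption of this sketch).
        last = len(ordered) - 1
        return [ordered[min(last, len(ordered) * b // n_bins)] for b in range(1, n_bins)]
    raise ValueError(f"unsupported strategy: {strategy!r}")


def assign_bins(values, interior_edges):
    """Map each value to a bin index in 0..k-1; out-of-range values go to the end bins."""
    return [bisect_right(interior_edges, v) for v in values]


edges = fit_bin_edges([0.0, 8.0, 4.0, 2.0, 6.0], n_bins=4)
print(edges)
print(assign_bins([0.0, 1.9, 2.0, 5.0, 8.0, 100.0, -5.0], edges))
```

Using only interior edges means test values outside the training range fall into the first or last bin rather than raising an error.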

Feature Engineering Pipeline

During ensemble generation, these methods are:

Randomly selected - Each dataset gets its own randomly drawn sequence of methods
Applied in sequence - Methods build on previous transformations
Applied to random subsets - Only a fraction of features are transformed at each step
Fitted on training data - All transformations use training data statistics to prevent leakage
Applied to test data - The same fitted transformations are applied to test data

This randomization strategy creates diverse datasets suitable for training ensemble models while maintaining consistent transformations between training and testing data.
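The steps above can be sketched as a small loop. Everything in this example is illustrative: the two transforms, the `METHODS` registry, and `build_dataset` are hypothetical names, and transformed columns overwrite the originals for simplicity, whereas a real pipeline may append new columns instead.

```python
import math
import random


def fit_log(train_column):
    """Fit a shifted natural log; the shift comes from the training minimum."""
    shift = 1.0 - min(train_column)
    return lambda v: math.log(max(v + shift, 1e-12))


def fit_uniform_bins(train_column, n_bins=4):
    """Fit uniform-width bins over the training range."""
    lo, hi = min(train_column), max(train_column)
    width = (hi - lo) / n_bins or 1.0  # avoid a zero width for constant columns
    return lambda v: min(n_bins - 1, max(0, int((v - lo) / width)))


METHODS = {"log": fit_log, "kbins": fit_uniform_bins}


def build_dataset(train, test, n_steps=2, fraction=0.5, seed=0):
    """Apply a random sequence of methods to random column subsets."""
    rng = random.Random(seed)
    train = [row[:] for row in train]  # work on copies
    test = [row[:] for row in test]
    n_cols = len(train[0])
    steps = []
    for _ in range(n_steps):
        name = rng.choice(sorted(METHODS))                 # randomly selected
        cols = rng.sample(range(n_cols), max(1, round(fraction * n_cols)))
        for c in cols:
            transform = METHODS[name]([row[c] for row in train])  # fit on train only
            for row in train:
                row[c] = transform(row[c])
            for row in test:
                row[c] = transform(row[c])                 # same fitted transform
        steps.append((name, sorted(cols)))
    return train, test, steps


train = [[1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0], [3.0, 6.0, 9.0, 12.0]]
test = [[1.5, 3.0, 4.5, 6.0]]
new_train, new_test, steps = build_dataset(train, test, seed=1)
print(steps)
```

Different seeds yield different method sequences and column subsets, which is the source of diversity across ensemble members, while a fixed seed makes a given dataset reproducible.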