Feature Engineering Catalog

This page describes the 11 feature engineering methods available in EnsembleSet. Each entry gives a short mathematical description, typical use cases, and the available parameters.

String Feature Encoding

1. One-Hot Encoding

Converts categorical string features into binary indicator columns.

Mathematical Description:

For a categorical feature with \(k\) unique categories, one-hot encoding creates \(k\) binary columns where each column represents one category. For a given sample, exactly one column has value 1 (the category present) and all others are 0.

Use Cases:

  • Nominal categorical features without inherent ordering

  • Features with low to moderate cardinality

  • When treating each category as independent is appropriate

Example:

# Input: ['A', 'B', 'A', 'C']
# Output:
#   A  B  C
#   1  0  0
#   0  1  0
#   1  0  0
#   0  0  1
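
The table above can be reproduced with a few lines of plain Python. This is a minimal sketch for illustration, not EnsembleSet's implementation; the helper name `one_hot` and the choice to sort categories alphabetically for the column order are assumptions.

```python
def one_hot(values):
    """Return (categories, rows); each row is a list of 0/1 indicators."""
    # Sorting the unique categories gives a stable column order (an assumption).
    categories = sorted(set(values))
    column_of = {cat: j for j, cat in enumerate(categories)}
    rows = []
    for value in values:
        row = [0] * len(categories)
        row[column_of[value]] = 1  # exactly one column is set per sample
        rows.append(row)
    return categories, rows


categories, rows = one_hot(["A", "B", "A", "C"])
print(categories)
for row in rows:
    print(row)
```

Because each category gets its own column, the output width grows linearly with cardinality, which is why the method suits low to moderate cardinality.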

2. Ordinal Encoding

Converts categorical string features into integer codes.

Mathematical Description:

Each unique category is mapped to an integer. For \(k\) unique categories, integers from 0 to \(k-1\) are assigned.

Use Cases:

  • Categorical features with an inherent ordering (e.g., low < medium < high)

  • High-cardinality categorical features where one-hot encoding would create too many columns

  • Tree-based models that can handle encoded categories

Example:

# Input: ['low', 'medium', 'high', 'low']
# Output: [0, 1, 2, 0]
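
The codes in this example follow the natural order low < medium < high. Whether an encoder produces them depends on how it orders the categories; an encoder that sorts labels alphabetically would assign different integers. The sketch below illustrates the difference. The helper `ordinal_encode` and its `order` argument are hypothetical and are not part of EnsembleSet's API.

```python
def ordinal_encode(values, order=None):
    """Map each category to its position in `order`.

    If no explicit order is given, fall back to sorted unique categories.
    """
    categories = list(order) if order is not None else sorted(set(values))
    code_of = {cat: i for i, cat in enumerate(categories)}
    return [code_of[v] for v in values]


values = ["low", "medium", "high", "low"]

# Explicit order reproduces the example above.
explicit = ordinal_encode(values, order=["low", "medium", "high"])

# Alphabetical order ('high' < 'low' < 'medium') yields different codes.
alphabetical = ordinal_encode(values)

print(explicit)
print(alphabetical)
```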

Numerical Feature Engineering

3. Polynomial Features

Generates polynomial and interaction features from existing features.

Mathematical Description:

For degree \(d\), polynomial features include all monomials of degree \(\leq d\). For two features \(x_1\) and \(x_2\) with degree 2:

\[[1, x_1, x_2, x_1^2, x_1 x_2, x_2^2]\]

Use Cases:

  • Capturing non-linear relationships

  • Modeling feature interactions

  • Polynomial regression models

Parameters:

  • Degree: 2 or 3

  • Interaction only: Include only products of distinct features (e.g., \(x_1 x_2\)) and omit pure powers (e.g., \(x_1^2\))

  • Include bias: Add constant term
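
As an illustration of how the three parameters interact, the following sketch enumerates the monomials for a single sample in plain Python. The function name and argument names are assumptions chosen to mirror the parameter list above; they are not taken from EnsembleSet.

```python
from itertools import combinations, combinations_with_replacement
from math import prod


def polynomial_features(x, degree=2, interaction_only=False, include_bias=True):
    """Return all monomials of the entries of x up to the given degree."""
    # With interaction_only, no feature index may repeat inside a monomial.
    pick = combinations if interaction_only else combinations_with_replacement
    out = [1.0] if include_bias else []
    for d in range(1, degree + 1):
        for idx in pick(range(len(x)), d):
            out.append(float(prod(x[i] for i in idx)))
    return out


x = [2.0, 3.0]
full = polynomial_features(x)                       # [1, x1, x2, x1^2, x1*x2, x2^2]
cross = polynomial_features(x, interaction_only=True)
no_bias = polynomial_features(x, include_bias=False)
print(full)
print(cross)
print(no_bias)
```

The number of output columns grows combinatorially with both the degree and the number of input features, which is one reason to restrict the degree to small values.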

4. Spline Features

Applies spline basis transformations to features.

Mathematical Description:

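A spline is a piecewise polynomial whose pieces join smoothly at the knots. Each feature is expanded into a B-spline basis of degree \(d\) over \(k\) knots. Every basis function is nonzero over only \(d+1\) adjacent knot intervals, so a model fitted on these columns can bend locally without affecting the rest of the feature range.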

Use Cases:

  • Flexible non-linear transformations

  • Cases where a smoother alternative to polynomial features is needed

  • Capturing complex non-linear patterns

Parameters:

  • Degree: 2, 3, or 4

  • Knots: Number and placement (uniform or quantile)

  • Extrapolation: Behavior outside knot range
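
To make the basis expansion concrete, here is a minimal sketch that evaluates a B-spline basis with the Cox-de Boor recursion. It assumes uniformly spaced knots that are extended by `degree` extra knots on each side; the function names are hypothetical, and extrapolation outside the knot range is not handled.

```python
def uniform_knots(lo, hi, n_knots, degree):
    """Knots evenly spaced on [lo, hi], extended by `degree` knots per side."""
    step = (hi - lo) / (n_knots - 1)
    return [lo + step * (j - degree) for j in range(n_knots + 2 * degree)]


def bspline_basis(x, knots, degree):
    """Evaluate every B-spline basis function at x (Cox-de Boor recursion)."""

    def basis(i, p):
        if p == 0:
            # Degree 0: indicator of the half-open knot interval.
            return 1.0 if knots[i] <= x < knots[i + 1] else 0.0
        left = (x - knots[i]) / (knots[i + p] - knots[i])
        right = (knots[i + p + 1] - x) / (knots[i + p + 1] - knots[i + 1])
        return left * basis(i, p - 1) + right * basis(i + 1, p - 1)

    n_basis = len(knots) - degree - 1
    return [basis(i, degree) for i in range(n_basis)]


knots = uniform_knots(0.0, 1.0, n_knots=5, degree=3)
row = bspline_basis(0.3, knots, degree=3)
print(len(row))  # 5 knots + degree 3 - 1 = 7 output columns
print(sum(row))  # the basis functions sum to 1 (up to rounding) inside [0, 1]
```

Under these assumptions a single feature becomes `n_knots + degree - 1` columns, and only `degree + 1` of them are nonzero for any given value.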

5. Logarithmic Features

Applies logarithmic transformations to compress large value ranges.

Mathematical Description:

\[y = \log_b(x)\]

where \(b \in \{2, e, 10\}\)

Use Cases:

  • Features with exponential distributions or heavy right tails

  • Reducing the impact of outliers

  • Making multiplicative relationships additive

Parameters:

  • Base: 2, e (natural log), or 10

Note: Handles zero and negative values by preprocessing.
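
The change-of-base identity \(\log_b(x) = \ln(x) / \ln(b)\) covers all three bases. The sketch below is illustrative only: the helper `log_feature` is hypothetical, and because the preprocessing for zero and negative values is not specified here, the sketch simply rejects such inputs rather than guessing at it.

```python
import math


def log_feature(x, base=math.e):
    """Compute log_b(x) via the change-of-base identity."""
    if x <= 0:
        # The catalog's preprocessing step is not reproduced here.
        raise ValueError("log is undefined for x <= 0; preprocess first")
    return math.log(x) / math.log(base)


a, b = 8.0, 32.0

# Multiplicative relationships become additive: log(a * b) = log(a) + log(b).
lhs = log_feature(a * b, base=2)
rhs = log_feature(a, base=2) + log_feature(b, base=2)
print(lhs, rhs)

# A range spanning six orders of magnitude is compressed to a span of about 6.
compressed = [log_feature(v, base=10) for v in (1.0, 1_000.0, 1_000_000.0)]
print(compressed)
```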

6. Ratio Features

Creates ratio features from all pairwise divisions of selected features.

Mathematical Description:

For features \(x_1, x_2, \ldots, x_n\), creates one ratio for every ordered pair of distinct features, \(n(n-1)\) ratios in total:

\[r_{ij} = \frac{x_i}{x_j} \quad \forall i \neq j\]

Use Cases:

  • Capturing relative relationships between features

  • Normalizing features by reference values

  • Financial ratios (e.g., price/earnings)

Parameters:

  • Division-by-zero value: Value substituted when the denominator is zero (default: NaN)
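
A minimal sketch of the pairwise construction for one sample follows. The function name `ratio_features`, the `zero_value` argument, and the column naming scheme are assumptions made for illustration.

```python
import math
from itertools import permutations


def ratio_features(row, zero_value=math.nan):
    """Return x_i / x_j for every ordered pair of distinct features."""
    out = {}
    for num, den in permutations(row, 2):
        # Substitute the replacement value when the denominator is zero.
        out[f"{num}/{den}"] = row[num] / row[den] if row[den] != 0 else zero_value
    return out


sample = {"price": 30.0, "earnings": 2.0, "debt": 0.0}
ratios = ratio_features(sample)
print(len(ratios))  # 3 features give 3 * 2 = 6 ordered pairs
print(ratios["price/earnings"])
print(ratios["price/debt"])
```

Since the number of ratios grows quadratically with the number of selected features, it helps to keep the selection small.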

7. Exponential Features

Applies exponential transformations to features.

Mathematical Description:

\[y = b^x\]

where \(b \in \{2, e\}\)

Use Cases:

  • Inverse of logarithmic transformation

  • Amplifying small differences

  • Modeling exponential growth

Parameters:

  • Base: 2 or e (natural exponential)

Note: Handles overflow by preprocessing.
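
Overflow matters because a double-precision float cannot represent \(e^x\) for \(x\) above roughly 709, or \(2^x\) for \(x \geq 1024\). How EnsembleSet preprocesses inputs is not specified here; the sketch below clips the exponent as one possible approach, shown purely as an assumption. Both helper names are hypothetical.

```python
import math
import sys


def max_safe_exponent(base):
    """Largest exponent (minus a safety margin of 1) that base**x can represent."""
    return math.log(sys.float_info.max) / math.log(base) - 1.0


def exp_feature(x, base=math.e):
    """Compute base**x, clipping x so the result stays finite (an assumption)."""
    return base ** min(x, max_safe_exponent(base))


print(exp_feature(3.0, base=2))     # 2**3
print(exp_feature(10_000.0))        # finite thanks to clipping
print(max_safe_exponent(2))         # just under 1023
```

Without the clipping step, `math.e ** 10_000.0` raises `OverflowError` in Python.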

8. Sum Features

Creates features by summing combinations of selected features.

Mathematical Description:

For each combination of \(n\) selected features (the addends), creates the sum:

\[s = x_{i_1} + x_{i_2} + \cdots + x_{i_n}\]

where \(n \in \{2, 3, 4\}\)

Use Cases:

  • Capturing aggregate effects

  • Total or cumulative values

  • Additive relationships

Parameters:

  • Number of addends: 2, 3, or 4
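
The combinations can be enumerated with `itertools.combinations`. This is a sketch for a single sample; the function name and the output naming scheme are assumptions.

```python
from itertools import combinations


def sum_features(row, n_addends=2):
    """Sum every combination of n_addends features from one sample."""
    return {
        "+".join(names): sum(row[name] for name in names)
        for names in combinations(row, n_addends)
    }


sample = {"a": 1.0, "b": 2.0, "c": 4.0}
pairs = sum_features(sample, n_addends=2)
triples = sum_features(sample, n_addends=3)
print(pairs)
print(triples)
```

Addition is commutative, so unordered combinations are enough: \(m\) selected features yield \(\binom{m}{n}\) sums.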

9. Difference Features

Creates features by computing differences of feature combinations.

Mathematical Description:

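For a combination of \(n\) features, subtracts the last \(n-1\) from the first, so \(n\) counts every term in the expression, including the leading one:

\[d = x_{i_1} - x_{i_2} - \cdots - x_{i_n}\]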

where \(n \in \{2, 3, 4\}\)

Use Cases:

  • Change or delta features

  • Comparing related measurements

  • Removing baseline effects

Parameters:

  • Number of subtrahends: 2, 3, or 4
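
Unlike sums, differences depend on which feature comes first. The sketch below assumes that combinations are taken in column order and that the first feature of each combination is the one subtracted from; the actual ordering used by EnsembleSet is not stated here, and the function name is hypothetical.

```python
from itertools import combinations


def difference_features(row, n_terms=2):
    """Compute x_1 - x_2 - ... - x_n for every combination of n_terms features."""
    out = {}
    for names in combinations(row, n_terms):
        first, *rest = names
        # Subtract every remaining feature from the first one.
        out["-".join(names)] = row[first] - sum(row[name] for name in rest)
    return out


sample = {"a": 10.0, "b": 3.0, "c": 2.0}
pairs = difference_features(sample, n_terms=2)
triples = difference_features(sample, n_terms=3)
print(pairs)
print(triples)
```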

10. Gaussian KDE Smoothing

Applies Gaussian kernel density estimation to smooth features.

Mathematical Description:

For each feature value \(x\), estimates the probability density:

\[\hat{f}(x) = \frac{1}{nh} \sum_{i=1}^{n} K\left(\frac{x - x_i}{h}\right)\]

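where \(K(u) = \frac{1}{\sqrt{2\pi}} e^{-u^2/2}\) is the Gaussian kernel, \(x_1, \ldots, x_n\) are the \(n\) samples used to fit the estimate, and \(h\) is the bandwidth.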

Use Cases:

  • Noise reduction

  • Identifying underlying distributions

  • Smoothing irregular patterns

Parameters:

  • Bandwidth: 'scott' or 'silverman' method

  • Sample size: Number of samples for KDE calculation

Note: Fitted on training data only, then applied to both train and test.
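
The estimator can be written directly from the formula. In the sketch below, the bandwidth rules follow the one-dimensional convention used by SciPy's `gaussian_kde` (sample standard deviation times \(n^{-1/5}\) for Scott, times \((3n/4)^{-1/5}\) for Silverman); whether EnsembleSet uses the same convention is an assumption, and the function names are hypothetical.

```python
import math
import statistics


def bandwidth(train, method="scott"):
    """Rule-of-thumb bandwidth for 1-D data (SciPy-style convention)."""
    n = len(train)
    sigma = statistics.stdev(train)  # sample standard deviation
    if method == "scott":
        factor = n ** (-1 / 5)
    elif method == "silverman":
        factor = (n * 3 / 4) ** (-1 / 5)
    else:
        raise ValueError(f"unknown method: {method}")
    return sigma * factor


def kde(x, train, h):
    """Gaussian kernel density estimate at x, fitted on the training samples."""
    norm = 1.0 / (len(train) * h * math.sqrt(2 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in train)


train = [1.0, 2.0, 2.5, 4.0, 7.0]
test = [0.0, 3.0]
h = bandwidth(train, "scott")

# The estimate is fitted on training data only, then applied to both splits.
train_density = [kde(x, train, h) for x in train]
test_density = [kde(x, train, h) for x in test]
print(h)
print(train_density)
print(test_density)
```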

11. K-Bins Quantization

Discretizes continuous features into bins.

Mathematical Description:

Divides the feature range into \(k\) bins and assigns each value to a bin:

\[y = \text{bin}(x) \in \{0, 1, \ldots, k-1\}\]

Use Cases:

  • Converting continuous to categorical features

  • Reducing sensitivity to small variations

  • Handling non-linear relationships with linear models

Parameters:

  • Number of bins: 4, 8, or 16

  • Strategy: uniform, quantile, or k-means

  • Encoding: ordinal
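
The choice of strategy matters most for skewed data. The following sketch compares the uniform and quantile strategies using only the standard library; the k-means strategy is omitted, and the function names are hypothetical. Bin edges are learned from training values and then reused for any other data.

```python
import statistics
from bisect import bisect_right


def fit_inner_edges(train, n_bins=4, strategy="uniform"):
    """Learn the n_bins - 1 interior bin edges from training values."""
    if strategy == "uniform":
        lo, hi = min(train), max(train)
        return [lo + (hi - lo) * j / n_bins for j in range(1, n_bins)]
    if strategy == "quantile":
        return statistics.quantiles(train, n=n_bins, method="inclusive")
    raise ValueError(f"unsupported strategy: {strategy}")


def to_bins(values, inner_edges):
    """Ordinal bin index in {0, ..., k-1}; out-of-range values fall in the end bins."""
    return [bisect_right(inner_edges, v) for v in values]


train = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 100.0]  # one large outlier

uniform_bins = to_bins(train, fit_inner_edges(train, 4, "uniform"))
quantile_bins = to_bins(train, fit_inner_edges(train, 4, "quantile"))
print(uniform_bins)   # the outlier stretches the range, crowding the first bin
print(quantile_bins)  # each bin receives the same number of samples
```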

Feature Engineering Pipeline

During ensemble generation, these methods are:

  1. Randomly selected - Each dataset gets its own sequence of methods

  2. Applied in sequence - Methods build on previous transformations

  3. Applied to random subsets - Only a fraction of features are transformed at each step

  4. Fitted on training data - All transformations use training data statistics to prevent leakage

  5. Applied to test data - The same fitted transformations are applied to test data

This randomization strategy creates diverse datasets suitable for training ensemble models while maintaining consistent transformations between training and testing data.
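
The five steps above can be sketched as a small loop. Everything in this example is hypothetical: the two toy step classes, the function `build_and_apply`, and its parameters are illustrative stand-ins rather than EnsembleSet's API. The point is the order of operations: each step is fitted on a training column, and the fitted step is then applied to both splits.

```python
import math
import random
from bisect import bisect_right


class Log1p:
    """Stateless toy step: ln(1 + x); assumes non-negative inputs."""

    def fit(self, column):
        return self

    def transform(self, column):
        return [math.log1p(v) for v in column]


class UniformBins:
    """Toy step whose bin edges are learned from the training column."""

    def __init__(self, n_bins=4):
        self.n_bins = n_bins

    def fit(self, column):
        lo, hi = min(column), max(column)
        self.edges = [lo + (hi - lo) * j / self.n_bins for j in range(1, self.n_bins)]
        return self

    def transform(self, column):
        return [float(bisect_right(self.edges, v)) for v in column]


def build_and_apply(train, test, n_steps=3, fraction=0.5, seed=0):
    rng = random.Random(seed)
    train = {k: list(v) for k, v in train.items()}  # work on copies
    test = {k: list(v) for k, v in test.items()}
    plan = []
    for _ in range(n_steps):
        step_cls = rng.choice([Log1p, UniformBins])      # 1. random method
        n_cols = max(1, round(fraction * len(train)))
        cols = rng.sample(sorted(train), n_cols)         # 3. random feature subset
        for col in cols:
            step = step_cls().fit(train[col])            # 4. fit on training data
            train[col] = step.transform(train[col])      # 2. builds on earlier steps
            test[col] = step.transform(test[col])        # 5. same fitted step on test
        plan.append((step_cls.__name__, cols))
    return train, test, plan


train = {"a": [1.0, 2.0, 3.0, 4.0], "b": [0.0, 10.0, 20.0, 30.0],
         "c": [5.0, 5.0, 6.0, 7.0], "d": [0.5, 1.5, 2.5, 3.5]}
test = {"a": [2.5, 9.0], "b": [5.0, 15.0], "c": [6.5, 4.0], "d": [0.0, 3.0]}

new_train, new_test, plan = build_and_apply(train, test, seed=0)
print(plan)
```

Varying `seed` yields a different plan and therefore a different derived dataset, while a fixed seed keeps the train and test transformations reproducible.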