EnsembleSet Documentation
EnsembleSet generates dataset ensembles by applying a randomized sequence of feature engineering methods to a randomized subset of input features.
Version: 1.0-alpha.23
Overview
EnsembleSet is a Python package for generating ensemble datasets through randomized feature engineering. It's particularly useful for training ensemble machine learning models in prediction and modeling projects on tabular data.
Key features:
Generates multiple dataset variations from a single input dataset
Applies 11 different feature engineering techniques in random sequences
Supports both training and testing datasets with minimal data leakage
Outputs ensembles to HDF5 format for efficient storage
Uses multiprocessing for parallel dataset generation
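To make the core idea concrete, the following is a minimal sketch in plain Python and not EnsembleSet's actual API. Every name in it (`METHODS`, `make_member`, `make_ensemble`, the three toy transforms) is hypothetical and chosen for illustration only. It assumes that limiting leakage means fitting each transform's parameters on the training data alone and reusing them on the test data. HDF5 output and multiprocessing are left out to keep the example short.

```python
import math
import random

# Hypothetical stand-ins for feature engineering methods. The package itself
# lists 11 techniques; these three are illustrative assumptions only.
def fit_standardize(values):
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    std = math.sqrt(var) or 1.0  # avoid dividing by zero on constant columns
    return lambda xs: [(x - mean) / std for x in xs]

def fit_min_max(values):
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0
    return lambda xs: [(x - lo) / span for x in xs]

def fit_signed_log(values):
    # Stateless transform: nothing is learned from the data.
    return lambda xs: [math.copysign(math.log1p(abs(x)), x) for x in xs]

METHODS = {
    "standardize": fit_standardize,
    "min_max": fit_min_max,
    "signed_log": fit_signed_log,
}

def make_member(train, test, rng, max_steps=3):
    """Build one ensemble member from a random feature subset and method sequence."""
    columns = sorted(train)
    subset = rng.sample(columns, rng.randint(1, len(columns)))
    steps = [rng.choice(sorted(METHODS)) for _ in range(rng.randint(1, max_steps))]
    train_out, test_out = {}, {}
    for col in subset:
        tr, te = list(train[col]), list(test[col])
        for name in steps:
            transform = METHODS[name](tr)  # parameters come from training values only
            tr, te = transform(tr), transform(te)
        train_out[col], test_out[col] = tr, te
    return {"features": subset, "steps": steps, "train": train_out, "test": test_out}

def make_ensemble(train, test, n_members, seed=0):
    """Generate n_members dataset variations; the seed makes the run repeatable."""
    rng = random.Random(seed)
    return [make_member(train, test, rng) for _ in range(n_members)]

train = {"age": [20.0, 30.0, 40.0], "income": [1.0, 10.0, 100.0], "score": [-1.0, 0.0, 1.0]}
test = {"age": [25.0], "income": [50.0], "score": [0.5]}
ensemble = make_ensemble(train, test, n_members=4, seed=42)
```

Each entry in `ensemble` records which features and which method sequence produced it, so the same recipe can be inspected or replayed later. Because every member is built independently, this kind of loop is a natural fit for parallel generation.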
User Guide