EnsembleSet Documentation


EnsembleSet generates dataset ensembles by applying a randomized sequence of feature engineering methods to a randomized subset of input features.

Version: 1.0-alpha.23

Overview

EnsembleSet is a Python package for generating ensemble datasets through randomized feature engineering. It is particularly useful for training ensemble machine learning models for prediction and modeling tasks on tabular data.

Key features:

  • Generates multiple dataset variations from a single input dataset

  • Applies 11 different feature engineering techniques in random sequences

  • Supports both training and testing datasets with minimal data leakage

  • Outputs ensembles to HDF5 format for efficient storage

  • Uses multiprocessing for parallel dataset generation
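
The core idea in the list above can be illustrated with a minimal, standard-library-only sketch. This is not EnsembleSet's API or implementation: every name here (`fit_standardize`, `fit_log`, `fit_square`, `METHODS`, `make_member`, `make_ensemble`) is hypothetical, only three stand-in transforms are shown rather than the package's 11 methods, and HDF5 output and multiprocessing are left out. Each transform is fitted on the training column and then applied unchanged to the test column, which is one common way to limit data leakage.

```python
import math
import random
import statistics

# Hypothetical helpers for illustration; none of these are EnsembleSet functions.

def fit_standardize(train_col):
    """Learn mean and standard deviation from the training column only."""
    mean = statistics.fmean(train_col)
    std = statistics.pstdev(train_col) or 1.0  # fall back to 1.0 for constant columns
    return lambda col: [(x - mean) / std for x in col]

def fit_log(train_col):
    """Learn a shift from the training column so its log argument is at least 1."""
    shift = 1.0 - min(train_col)
    # Test values below the training minimum are clamped to keep the log defined.
    return lambda col: [math.log(max(x + shift, 1e-12)) for x in col]

def fit_square(train_col):
    """Stateless transform; nothing is learned from the data."""
    return lambda col: [x * x for x in col]

METHODS = {"standardize": fit_standardize, "log": fit_log, "square": fit_square}

def make_member(train, test, rng, n_features=2, n_steps=2):
    """Build one ensemble member: random feature subset, random method sequence."""
    features = rng.sample(sorted(train), n_features)
    steps = [rng.choice(sorted(METHODS)) for _ in range(n_steps)]
    new_train = {f: list(train[f]) for f in features}
    new_test = {f: list(test[f]) for f in features}
    for name in steps:
        for f in features:
            transform = METHODS[name](new_train[f])  # fitted on training data only
            new_train[f] = transform(new_train[f])
            new_test[f] = transform(new_test[f])
    return {"features": features, "steps": steps, "train": new_train, "test": new_test}

def make_ensemble(train, test, n_members=3, seed=0):
    """Generate several dataset variations from one input dataset."""
    rng = random.Random(seed)
    return [make_member(train, test, rng) for _ in range(n_members)]

# Toy tabular data stored as column name -> list of values.
train = {"a": [1.0, 2.0, 3.0, 4.0], "b": [10.0, 20.0, 30.0, 40.0], "c": [0.5, 0.1, 0.9, 0.3]}
test = {"a": [2.5, 5.0], "b": [15.0, 45.0], "c": [0.2, 0.7]}

ensemble = make_ensemble(train, test)
for member in ensemble:
    print(member["features"], member["steps"])
```

Each member records which features and which method sequence produced it, so a model trained on one variation can be matched to the corresponding transformed test set.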
