Aggregate features
Quantile bin summary statistic features.
featurely.aggregate
Bin-level aggregate features.
Binning a driver feature and attaching per-bin summary statistics gives each row context about the group it belongs to: for example, the mean of one feature across all rows in the same quantile bin of another. Only feature columns are aggregated, so the candidates are computed without the target and cannot leak it.
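The idea can be sketched in a few lines of plain pandas. This is an illustration only, not the library's code; the columns `income` and `age` and the output name `age_mean_by_income_bin` are made up for the example.

```python
import pandas as pd

# Toy frame; the column names are hypothetical.
df = pd.DataFrame({
    "income": [10, 20, 30, 40, 50, 60, 70, 80],
    "age":    [25, 35, 30, 40, 45, 55, 50, 60],
})

# Assign each row to one of 4 equal-count bins of the driver feature.
bins = pd.qcut(df["income"], q=4, labels=False)

# Broadcast the per-bin mean of another feature back onto every row.
df["age_mean_by_income_bin"] = df["age"].groupby(bins).transform("mean")
```

Each pair of neighbouring incomes shares a bin here, so both rows in a pair receive the same value: the mean age of that pair.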
compute_bin_aggregates(df, bin_feature, agg_features, n_bins=10, stats=('mean',))
Return per-bin summary statistic candidates for one binned feature.
Rows are assigned to quantile bins of bin_feature (equal-count bins,
so sparse regions do not produce empty groups), then each row receives
the bin-level statistic of every column in agg_features.
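To see why equal-count bins matter, the following sketch compares equal-width binning (`pd.cut`) with quantile binning (`pd.qcut`) on a small skewed series. The data is invented for illustration, and the comparison uses pandas directly rather than anything from this module.

```python
import pandas as pd

# Skewed toy data: most values are small, one is a large outlier.
values = pd.Series([1, 2, 3, 4, 5, 6, 7, 100])

# Equal-width bins: the range 1..100 is split into 4 intervals of equal width.
width_counts = (
    pd.cut(values, bins=4, labels=False)
    .value_counts()
    .reindex(range(4), fill_value=0)  # make empty bins visible as 0
)

# Equal-count (quantile) bins: edges are placed at the quartiles.
quantile_counts = (
    pd.qcut(values, q=4, labels=False)
    .value_counts()
    .reindex(range(4), fill_value=0)
)

print(width_counts.tolist())     # rows per equal-width bin
print(quantile_counts.tolist())  # rows per quantile bin
```

With equal-width bins the two middle intervals are empty, while the quantile bins each hold two rows. Note that `pd.qcut` raises an error when heavy ties make the bin edges non-unique unless `duplicates="drop"` is passed; how this function handles that case is not stated here.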
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `df` | `DataFrame` | Input frame; not modified. | required |
| `bin_feature` | `str` | Column whose quantile bins define the groups. | required |
| `agg_features` | `list[str]` | Columns to summarize within each bin. | required |
| `n_bins` | `int` | Number of quantile bins. | `10` |
| `stats` | `tuple[str, ...]` | Summary statistics to compute, by pandas name, e.g. … | `('mean',)` |
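The way these parameters fit together can be sketched with a stand-in function that mirrors the documented signature. It is an assumption-laden illustration, not the code in `src/featurely/aggregate.py`: the name `bin_aggregates_sketch` and the output column naming scheme are hypothetical.

```python
import pandas as pd


def bin_aggregates_sketch(df, bin_feature, agg_features, n_bins=10, stats=("mean",)):
    """Illustrative stand-in, not the featurely implementation."""
    # Quantile bin index per row; labels=False yields integer codes.
    bins = pd.qcut(df[bin_feature], q=n_bins, labels=False)
    # Build a new frame so the input frame is left untouched.
    out = pd.DataFrame(index=df.index)
    for col in agg_features:
        grouped = df[col].groupby(bins)
        for stat in stats:
            # Hypothetical naming scheme, chosen only for this sketch.
            out[f"{col}_{stat}_by_{bin_feature}_bin"] = grouped.transform(stat)
    return out


df = pd.DataFrame({"x": [1, 2, 3, 4, 5, 6], "y": [10, 20, 30, 40, 50, 60]})
cands = bin_aggregates_sketch(df, "x", ["y"], n_bins=3, stats=("mean", "max"))
```

Each entry in `stats` is passed to `transform` as a pandas aggregation name, so one candidate column is produced per combination of aggregated column and statistic.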
Returns:
| Type | Description |
|---|---|
| `DataFrame` | A frame of candidate columns named … |
Source code in src/featurely/aggregate.py