Outliers
IQR-based outlier clipping, log1p transformation, and KNN imputation.
featurely.outliers
impute_outliers_with_knn(df, features, n_neighbors=7, threshold=1.5)
Replace IQR outliers with NaN, then impute them with KNN.
Values outside [Q1 - threshold * IQR, Q3 + threshold * IQR] are
treated as missing and reconstructed from the n_neighbors most
similar rows, which preserves multivariate structure better than
clipping when outliers are recording errors rather than real extremes.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| df | DataFrame | Input frame; not modified. | required |
| features | list[str] | Columns to screen for outliers and impute. | required |
| n_neighbors | int | Number of neighbor rows used by the KNN imputer. | 7 |
| threshold | float | IQR multiplier that defines the outlier fences. | 1.5 |
Returns:
| Type | Description |
|---|---|
| DataFrame | A copy of |
Source code in src/featurely/outliers.py
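The following is a minimal sketch of the "mark outliers as NaN, then KNN-impute" idea described above. It is a hypothetical re-implementation, not featurely's source, which is not reproduced on this page. It assumes pandas' default linear-interpolation quartiles and scikit-learn's `KNNImputer`. The name `impute_outliers_with_knn_sketch` is illustrative only.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer


def impute_outliers_with_knn_sketch(
    df: pd.DataFrame, features: list[str], n_neighbors: int = 7, threshold: float = 1.5
) -> pd.DataFrame:
    """Hypothetical sketch; assumes pandas quartiles and sklearn's KNNImputer."""
    out = df.copy()  # leave the caller's frame untouched
    out[features] = out[features].astype(float)  # NaN requires a float dtype
    for col in features:
        q1, q3 = out[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        lower, upper = q1 - threshold * iqr, q3 + threshold * iqr
        # Treat values outside the IQR fences as missing.
        out.loc[(out[col] < lower) | (out[col] > upper), col] = np.nan
    # Each NaN is filled with the mean of that column over the n_neighbors
    # nearest rows, where distance uses the other (non-missing) features.
    imputer = KNNImputer(n_neighbors=n_neighbors)
    out[features] = imputer.fit_transform(out[features])
    return out
```

Because the distance is computed from the row's other features, a replaced value reflects what similar rows look like. Clipping, by contrast, only knows the fence of its own column.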
clip_outliers(df, features, threshold=1.5)
Clip feature values to their IQR fences.
Winsorizes each column to [Q1 - threshold * IQR, Q3 + threshold * IQR].
Clipping keeps every row and caps the influence of extreme values, at the
cost of piling clipped observations onto the fence values.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| df | DataFrame | Input frame; not modified. | required |
| features | list[str] | Columns to clip. | required |
| threshold | float | IQR multiplier that defines the clip bounds. | 1.5 |
|
Returns:
| Type | Description |
|---|---|
| DataFrame | A copy of |
Source code in src/featurely/outliers.py
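A minimal sketch of IQR-fence clipping follows. It is a hypothetical illustration, not the library's source, and assumes pandas' default quartiles and `Series.clip`. The name `clip_outliers_sketch` is made up for this example.

```python
import pandas as pd


def clip_outliers_sketch(
    df: pd.DataFrame, features: list[str], threshold: float = 1.5
) -> pd.DataFrame:
    """Hypothetical sketch of clipping each feature to its IQR fences."""
    out = df.copy()  # do not modify the input frame
    for col in features:
        q1, q3 = out[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        # Values below the lower fence become the lower fence, and likewise above.
        out[col] = out[col].clip(lower=q1 - threshold * iqr, upper=q3 + threshold * iqr)
    return out
```

For the column `[1, 2, 3, 4, 5, 100]`, the quartiles are 2.25 and 4.75, so the IQR is 2.5 and the upper fence is 4.75 + 1.5 × 2.5 = 8.5. The value 100 is replaced by 8.5 while all six rows are kept. This is the "piling onto the fence" effect noted above.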
transform_outliers(df, features, threshold=1.5)
Log-transform features that contain IQR outliers and are non-negative.
Applies log1p only to columns where outliers are present and all
values are non-negative, compressing long right tails instead of
discarding or capping them.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| df | DataFrame | Input frame; not modified. | required |
| features | list[str] | Columns to screen and potentially transform. | required |
| threshold | float | IQR multiplier that defines the outlier fences. | 1.5 |
|
Returns:
| Type | Description |
|---|---|
| DataFrame | A copy of |
Source code in src/featurely/outliers.py
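The conditional log1p rule can be sketched as below. This is a hypothetical illustration assuming pandas quartiles and NumPy's `log1p`, not featurely's own code. The name `transform_outliers_sketch` is made up, and how the real function treats NaN is not stated on this page.

```python
import numpy as np
import pandas as pd


def transform_outliers_sketch(
    df: pd.DataFrame, features: list[str], threshold: float = 1.5
) -> pd.DataFrame:
    """Hypothetical sketch: log1p only where outliers exist and values are >= 0."""
    out = df.copy()  # keep the input frame unchanged
    for col in features:
        q1, q3 = out[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        outside = (out[col] < q1 - threshold * iqr) | (out[col] > q3 + threshold * iqr)
        # Series.min() skips NaN, so missing values do not block the transform.
        if outside.any() and out[col].min() >= 0:
            out[col] = np.log1p(out[col])
    return out
```

Requiring non-negative values keeps log1p well defined, since log1p(x) needs x > -1. Columns without outliers are left on their original scale.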