
Diagnostics

Variance inflation factor calculation.

featurely.diagnostics

compute_vif(df, cols)

Compute variance inflation factors for the selected columns.

VIF measures how much a coefficient's variance is inflated by collinearity with the other columns; values above roughly 10 signal problematic redundancy. Columns whose VIF cannot be computed are reported as infinity.
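To make the definition concrete, here is a minimal sketch of the textbook formula VIF = 1 / (1 - R²), where R² comes from regressing one column on all the others. It is an illustration only: `vif_by_regression` is a hypothetical helper, not part of featurely, and it is not the `variance_inflation_factor` routine that `compute_vif` calls. Adding an intercept term is a choice made for this sketch.

```python
import numpy as np


def vif_by_regression(x: np.ndarray, i: int) -> float:
    """Hypothetical helper: VIF of column i as 1 / (1 - R^2)."""
    y = x[:, i]
    others = np.delete(x, i, axis=1)
    # Assumption of this sketch: include an intercept so that R^2 is
    # measured around the mean of the target column.
    design = np.column_stack([np.ones(len(y)), others])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    if ss_tot == 0.0 or ss_res == 0.0:
        # Constant column or exact linear dependence: no finite VIF.
        return float("inf")
    r_squared = 1.0 - ss_res / ss_tot
    return 1.0 / (1.0 - r_squared)


# Synthetic data: c is almost a linear combination of a and b, so every
# column is well explained by the other two.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
c = a + b + rng.normal(scale=0.05, size=200)
x = np.column_stack([a, b, c])

vifs = [vif_by_regression(x, i) for i in range(x.shape[1])]
```

With this synthetic data all three values land far above the rough threshold of 10, because each column can be reconstructed almost exactly from the other two.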

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| df | DataFrame | Input frame. NaNs are filled with 0 before computation. | required |
| cols | list[str] | Columns to evaluate. | required |

Returns:

| Type | Description |
| --- | --- |
| DataFrame | A frame with feature and VIF columns, sorted descending by VIF. |
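As a rough sketch of how the returned frame might be consumed, the snippet below filters features whose VIF exceeds the rule-of-thumb threshold. The `result` frame is built by hand in the documented shape, and its feature names and numbers are made up for illustration; in practice it would come from `compute_vif(df, cols)`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the output of compute_vif(df, cols):
# "feature" and "VIF" columns, already sorted descending by VIF.
# These values are invented for the example.
result = pd.DataFrame(
    {"feature": ["c", "a", "b"], "VIF": [np.inf, 12.4, 1.3]}
)

threshold = 10.0  # rough cutoff mentioned in the description

# Infinite VIFs (columns whose VIF could not be computed) also exceed
# the threshold, so they are flagged along with highly collinear columns.
flagged = result.loc[result["VIF"] > threshold, "feature"].tolist()
```

Because non-computable columns are reported as infinity, a simple greater-than comparison flags them without special handling.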

Source code in src/featurely/diagnostics.py
def compute_vif(df: pd.DataFrame, cols: list[str]) -> pd.DataFrame:
    """Compute variance inflation factors for the selected columns.

    VIF measures how much a coefficient's variance is inflated by collinearity
    with the other columns; values above roughly 10 signal problematic
    redundancy. Columns whose VIF cannot be computed are reported as infinity.

    Args:
        df: Input frame. NaNs are filled with 0 before computation.
        cols: Columns to evaluate.

    Returns:
        A frame with ``feature`` and ``VIF`` columns, sorted descending by VIF.
    """

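    # Restrict to the requested columns and zero-fill missing values.
    x = df[cols].fillna(0).values.astype(float)
    vifs: list[float] = []

    for i in range(len(cols)):
        try:
            value = variance_inflation_factor(x, i)
        except Exception:
            # Any column whose VIF cannot be computed is reported as infinity.
            value = np.inf
        vifs.append(value)

    # Highest VIF first, with a clean 0..n-1 index.
    result = pd.DataFrame({"feature": cols, "VIF": vifs})
    return result.sort_values("VIF", ascending=False).reset_index(drop=True)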