Skip to content

Examples

1. Simulated data -- null is true (MAR)

When conditional independence holds, the test should produce a non-significant p-value.

from citest import CIMissTest
from citest.data import MAR1

# ci=True means conditional independence holds
dataset = MAR1(n=1000, ci=True)

test = CIMissTest(
    dataset,
    m=10,
    n_folds=10,
    classifier_args={"n_estimators": 20, "target_n_jobs": 8},
)
test.run()
test.summary()

Expected output: a large p-value (e.g. > 0.05), indicating no evidence against conditional independence.

2. Simulated data -- alternative is true (MNAR)

When the outcome influences missingness, the test should reject.

from citest.data import MAR1

# ci=False means missingness depends on Y
dataset = MAR1(n=1000, ci=False)

test = CIMissTest(
    dataset,
    m=10,
    n_folds=10,
    classifier_args={"n_estimators": 20, "target_n_jobs": 8},
)
test.run()
test.summary()

Expected output: a small p-value (e.g. < 0.05), indicating evidence against conditional independence.

3. Real data -- UCI Adult

Test conditional independence on the Adult income dataset, with missingness imposed on education columns.

from citest.data import adult

dataset = adult(n=1000, ci=True, mcar_prop=0.5)

test = CIMissTest(
    dataset,
    m=10,
    n_folds=10,
    classifier_args={"n_estimators": 20, "target_n_jobs": 8},
)
test.run()
test.summary()

The adult DGP downloads the UCI Adult dataset and applies controlled MAR missingness. Set ci=False to impose outcome-dependent missingness instead.

4. Custom imputer and classifier

Swap in a different imputer and classifier:

from citest import CIMissTest
from citest.data import MAR1
from citest.imputer import IterativeImputer
from citest.classifier import LogisticClassifier

dataset = MAR1(n=500, ci=False)

test = CIMissTest(
    dataset,
    imputer=IterativeImputer,
    classifier=LogisticClassifier,
    m=10,
    n_folds=10,
    imputer_args={"max_iter": 20},
)
test.run()
test.summary()

The IterativeImputer is faster than the default MIDAS imputer and works well for moderate-sized numeric data. LogisticClassifier assumes a linear relationship between features and missingness.

5. Kappa calibration

Use the kappa diagnostic to assess potential imputation bias:

from citest import kappa_calibration_table, print_calibration_pivot

# Generate a full calibration table
table = kappa_calibration_table()
print(table.head(10))

# View a pivot for a fixed beta_yx
pivot = print_calibration_pivot(beta_yx=0.3)
print(pivot)

The pivot table shows kappa values with R-squared as rows and gamma as columns, making it easy to assess whether imputation bias is a concern for your data.

You can also compute kappa for specific parameter values:

from citest import compute_kappa

kappa = compute_kappa(r2_x_z=0.5, beta_yx=0.3, gamma_x=0.2)
print(f"kappa = {kappa:.4f}")

Small absolute values of kappa (e.g. < 0.05) suggest that imputation bias is unlikely to affect the test result.