Classifiers

Per-column probabilistic classifiers used to predict missingness indicators. Each classifier wraps a scikit-learn estimator and fits a separate model per target column.

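======================  ===================  ============================================
Class                   Estimator            Notes
======================  ===================  ============================================
RFClassifier (default)  Random forest        Auto-tunes max_features and min_samples_leaf
ETClassifier            Extra trees          Faster training; more variance
LogisticClassifier      Logistic regression  Assumes linear relationships
======================  ===================  ============================================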
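
The per-column pattern described above can be sketched directly with scikit-learn. This is a minimal illustration, not the library's implementation: the helper ``fit_missingness_models`` and the synthetic data are hypothetical, and only standard scikit-learn and NumPy calls are used.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def fit_missingness_models(X, mask, random_state=0):
    """Fit one classifier per column of `mask` (1 = missing, 0 = observed).

    Hypothetical helper for illustration; not part of the documented API.
    """
    models = {}
    for j in range(mask.shape[1]):
        y = mask[:, j]
        if np.unique(y).size < 2:
            continue  # skip columns that are fully observed or fully missing
        clf = RandomForestClassifier(
            n_estimators=100, class_weight="balanced", random_state=random_state
        )
        models[j] = clf.fit(X, y)
    return models


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
# Target column 0 is missing when the first feature is large; column 1 is never missing.
mask = np.column_stack([(X[:, 0] > 0.5).astype(int), np.zeros(200, dtype=int)])

models = fit_missingness_models(X, mask)
proba = models[0].predict_proba(X)[:, 1]  # estimated P(missing) for each row
```

Only column 0 gets a model here, because column 1 has a single class and nothing to predict.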

RFClassifier

RFClassifier(n_estimators=100, max_features='auto', min_samples_leaf='auto', class_weight='balanced', n_features=None, target_n_jobs=1, n_jobs=None, random_state=None, **kwargs)

Bases: ProbClassifier

sklearn.ensemble.RandomForestClassifier_ wrapper for CI testing.

With max_features='auto' (the default), max_features is chosen by a piecewise heuristic based on n_features. With min_samples_leaf='auto' (also the default), the leaf size is chosen adaptively.

.. _sklearn.ensemble.RandomForestClassifier: https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html
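
To make "piecewise heuristic based on n_features" concrete, here is a sketch of what such a rule could look like. The function ``pick_max_features`` and its thresholds are assumptions invented for illustration; the actual breakpoints used by RFClassifier are not stated on this page.

```python
import math

from sklearn.ensemble import RandomForestClassifier


def pick_max_features(n_features):
    """Hypothetical piecewise rule; the thresholds are illustrative assumptions."""
    if n_features <= 4:
        return n_features  # few features: consider all of them at each split
    if n_features <= 100:
        return max(1, round(math.sqrt(n_features)))  # mid-range: square-root rule
    return max(1, round(math.log2(n_features)))  # wide data: log2 rule


# scikit-learn accepts an integer max_features, so the result can be passed through.
clf = RandomForestClassifier(n_estimators=100, max_features=pick_max_features(25))
```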

ETClassifier

ETClassifier(n_estimators=100, max_features='auto', min_samples_leaf='auto', class_weight='balanced', n_features=None, target_n_jobs=1, n_jobs=None, random_state=None, **kwargs)

Bases: ProbClassifier

sklearn.ensemble.ExtraTreesClassifier_ wrapper for CI testing.

With max_features='auto' (the default), max_features is chosen by a piecewise heuristic based on n_features. With min_samples_leaf='auto' (also the default), the leaf size is chosen adaptively.

.. _sklearn.ensemble.ExtraTreesClassifier: https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.ExtraTreesClassifier.html

LogisticClassifier

LogisticClassifier(penalty='l2', C=1000000.0, solver='lbfgs', max_iter=5000, random_state=None, n_features=None, target_n_jobs=1, **kwargs)

Bases: ProbClassifier

sklearn.linear_model.LogisticRegression_ wrapper for CI testing.

.. _sklearn.linear_model.LogisticRegression: https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html
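
In scikit-learn, C is the inverse of the regularization strength, so the default C=1000000.0 in the signature above makes the L2 penalty nearly negligible. The sketch below uses plain scikit-learn on synthetic data (the data and variable names are illustrative assumptions) to compare that setting with a strongly penalized fit.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
# Binary labels drawn from a logistic model with known coefficients.
logits = 1.5 * X[:, 0] - 1.0 * X[:, 1]
y = (rng.random(500) < 1.0 / (1.0 + np.exp(-logits))).astype(int)

# Large C: almost no shrinkage, close to an unpenalized fit.
weak = LogisticRegression(C=1e6, solver="lbfgs", max_iter=5000).fit(X, y)
# Small C: strong L2 shrinkage toward zero.
strong = LogisticRegression(C=0.01, solver="lbfgs", max_iter=5000).fit(X, y)

weak_norm = np.linalg.norm(weak.coef_)
strong_norm = np.linalg.norm(strong.coef_)
```

The strongly penalized model ends up with a smaller coefficient norm, which is the shrinkage that the large default C avoids.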