Dataset

Container for observed data, the missingness mask, and (optionally) the complete pre-amputation data used for simulation evaluation.

Dataset

Bases: BaseModel

Container for observed data, missingness mask, and (optionally) the complete pre-amputation data.

Call make(data, y, ...) to populate the dataset from a DataFrame.

Attributes:

miss_data (DataFrame): Data with missing values (NaN).

mask (ndarray): Boolean array, True where data is observed.

n (int): Number of observations.

full_data (DataFrame or None): Complete data (simulation only).

expl_vars (list[str]): Column names included in the analysis model.
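The relationship between miss_data, mask, and n can be illustrated with plain pandas and NumPy. This is a minimal sketch on hypothetical toy data; it does not call the Dataset class and is not necessarily how the class builds these attributes internally.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; NaN marks a missing entry.
miss_data = pd.DataFrame(
    {
        "y": [1.0, 2.0, np.nan],
        "x1": [0.5, np.nan, 1.5],
        "x2": [3.0, 4.0, 5.0],
    }
)

# Boolean array that is True where a value is observed,
# matching the convention documented for the mask attribute.
mask = miss_data.notna().to_numpy()

# Number of observations (rows).
n = len(miss_data)
```

Note the polarity: True means observed, so the missing entries are found with ~mask.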

make(data, y, expl_vars=None, _onehot=True)

Populate the dataset from a pandas DataFrame.

Parameters:

data (DataFrame, required): Data with missing values as NaN.

y (str, required): Name of the outcome variable (moved to the first column).

expl_vars (list[str], default None): Columns for the analysis model. Defaults to all non-outcome columns.

_onehot (bool, default True): One-hot encode categorical columns.
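The preprocessing described by these parameters (outcome moved to the first column, default expl_vars, one-hot encoding) might look roughly like the sketch below. The function prepare_frame and the toy data are hypothetical stand-ins written with plain pandas; they are not the library's make() implementation. Restoring NaN in the dummy columns is an assumption about how missingness should survive encoding.

```python
import numpy as np
import pandas as pd


def prepare_frame(data, y, expl_vars=None, onehot=True):
    """Hypothetical stand-in for the preprocessing described for make()."""
    if expl_vars is None:
        # Default: every non-outcome column enters the analysis model.
        expl_vars = [c for c in data.columns if c != y]

    # Outcome first, remaining columns in their original order.
    ordered = data[[y] + [c for c in data.columns if c != y]]
    if not onehot:
        return ordered, expl_vars

    pieces = []
    for name in ordered.columns:
        col = ordered[name]
        if pd.api.types.is_numeric_dtype(col):
            pieces.append(col)
            continue
        dummies = pd.get_dummies(col, prefix=name, dtype=float)
        # get_dummies encodes a missing category as all zeros;
        # put NaN back so the missingness is not silently lost (assumption).
        dummies.loc[col.isna(), :] = np.nan
        pieces.append(dummies)
    return pd.concat(pieces, axis=1), expl_vars


# Hypothetical toy data with one categorical column.
data = pd.DataFrame(
    {
        "x1": [0.5, np.nan, 1.5],
        "color": ["red", None, "blue"],
        "y": [1.0, 2.0, 3.0],
    }
)
wide, expl = prepare_frame(data, y="y")
```

Here wide has the outcome in the first column and one column per category of color, while expl still lists the raw variable names.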

get_predictor_cols_idx()

Return the column indices of the predictors used in the test: the outcome Y followed by the expanded explanatory variables.
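As a rough illustration of what "expanded" could mean here, the sketch below resolves raw variable names to positions in a one-hot encoded column layout. The column names, the expanded mapping, and the helper predictor_cols_idx are all hypothetical; the actual method takes no arguments and may order or select columns differently.

```python
# Hypothetical wide (one-hot encoded) column layout.
wide_columns = ["y", "x1", "x2", "color_blue", "color_red"]

# Hypothetical mapping from each raw variable to its wide columns.
expanded = {
    "y": ["y"],
    "x1": ["x1"],
    "x2": ["x2"],
    "color": ["color_blue", "color_red"],
}


def predictor_cols_idx(wide_columns, expanded, y, expl_vars):
    """Indices of the outcome followed by the expanded explanatory variables."""
    names = expanded[y] + [w for v in expl_vars for w in expanded[v]]
    return [wide_columns.index(name) for name in names]


# x2 is not in expl_vars, so its column is skipped.
idx = predictor_cols_idx(wide_columns, expanded, "y", ["x1", "color"])
```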

get_target_mask(level='column')

level="column": targets are defined per wide column (one-hot encoded columns).

level="variable": targets are defined per raw variable (outcome + expl_vars).
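The difference between the two levels can be sketched by collapsing a wide-column mask to one column per raw variable. Everything below is an assumption for illustration: the toy mask, the groups mapping, and in particular the rule that a raw variable counts as observed only when all of its one-hot columns are observed. The real get_target_mask may define targets differently.

```python
import numpy as np

# Hypothetical wide mask (True = observed).
# Columns: y, x1, color_blue, color_red
wide_mask = np.array(
    [
        [True, True, True, True],
        [True, False, False, False],
        [False, True, True, True],
    ]
)

# Hypothetical mapping from raw variable to wide column positions.
groups = {"y": [0], "x1": [1], "color": [2, 3]}

# Assumed rule: a raw variable is observed in a row only if
# every one of its wide columns is observed in that row.
variable_mask = np.column_stack(
    [wide_mask[:, cols].all(axis=1) for cols in groups.values()]
)
```

Under this rule the column-level mask has four columns and the variable-level mask has three, one per entry in groups.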

get_target_weights(level='column')

Weights aligned with the targets returned by get_target_mask(level=...). With level="variable", the weights are computed from variable-level missingness.

compute_kappa(r2_x_z, beta_yx, gamma_x)

Compute theoretical imputation bias kappa for CI testing.

See module-level compute_kappa() for details.