Dataset
Container for observed data, missingness masks, and (optionally) complete pre-amputation data used for simulation evaluation.
Dataset
Bases: BaseModel
Container for observed data, missingness mask, and (optionally) the complete pre-amputation data.
Call make(data, y, ...) to populate the dataset from a DataFrame.
Attributes:
| Name | Type | Description |
|---|---|---|
| miss_data | DataFrame | Data with missing values (NaN). |
| mask | ndarray | Boolean array, True where data is observed. |
| n | int | Number of observations. |
| full_data | DataFrame or None | Complete data (simulation only). |
| expl_vars | list[str] | Column names included in the analysis model. |
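The mask convention above (True where data is observed) is easy to get backwards, so here is a minimal sketch of how miss_data, mask, and n relate. It uses plain pandas and NumPy rather than the Dataset class itself, and the toy values and column names are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Toy data with missing entries encoded as NaN (illustrative values only).
miss_data = pd.DataFrame(
    {
        "y": [1.0, np.nan, 3.0],
        "x1": [0.5, 0.2, np.nan],
    }
)

# Observation mask: True where a value is observed, False where it is NaN.
mask = miss_data.notna().to_numpy()

# Number of observations (rows).
n = len(miss_data)

print(mask)
print(n)
```

Note that this is the opposite of a missingness indicator: `miss_data.isna()` would be True where data is missing.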
make(data, y, expl_vars=None, _onehot=True)
Populate the dataset from a pandas DataFrame.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | DataFrame | Data with missing values as NaN. | required |
| y | str | Name of the outcome variable (moved to the first column). | required |
| expl_vars | list[str] | Columns for the analysis model. Defaults to all non-outcome columns. | None |
| _onehot | bool | One-hot encode categorical columns (default True). | True |
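The parameters above describe two preprocessing steps: moving the outcome to the first column and one-hot encoding categorical columns. The sketch below shows one way those steps could look in plain pandas. The helper name prepare_frame is hypothetical and this is not the library's make() implementation; in particular, how make() treats missing categorical values is an assumption here.

```python
import numpy as np
import pandas as pd


def prepare_frame(data, y, expl_vars=None, onehot=True):
    """Hypothetical helper (not the library's make()): reorder and encode columns."""
    if expl_vars is None:
        # Documented default: all non-outcome columns.
        expl_vars = [c for c in data.columns if c != y]
    # Outcome first, then the explanatory variables.
    frame = data[[y] + list(expl_vars)]
    if onehot:
        cat_cols = [
            c for c in expl_vars
            if not pd.api.types.is_numeric_dtype(frame[c])
        ]
        for col in cat_cols:
            dummies = pd.get_dummies(frame[col], prefix=col, dtype=float)
            # get_dummies encodes a missing category as all zeros;
            # restore NaN so the missingness is not silently lost (assumption).
            dummies.loc[frame[col].isna(), :] = np.nan
            frame = pd.concat([frame.drop(columns=col), dummies], axis=1)
    return frame


raw = pd.DataFrame(
    {
        "x1": [0.5, np.nan, 0.1],
        "color": ["red", None, "blue"],
        "y": [1.0, 2.0, np.nan],
    }
)
wide = prepare_frame(raw, y="y")
print(list(wide.columns))
```

In this sketch each categorical column expands into several wide columns, which is why later methods distinguish between column-level and variable-level targets.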
get_predictor_cols_idx()
Return the indices of the predictors used in the test: [Y] followed by the expanded explanatory variables.
get_target_mask(level='column')
level="column": targets are per wide column (one-hot encoded columns).
level="variable": targets are per raw variable (outcome + expl_vars).
get_target_weights(level='column')
Weights aligned with get_target_mask(level=...). For level="variable", weights are computed from variable-level missingness.
compute_kappa(r2_x_z, beta_yx, gamma_x)
Compute theoretical imputation bias kappa for CI testing.
See module-level compute_kappa() for details.