Dataset

Container for observed data, missingness masks, and (optionally) complete pre-amputation data used for simulation evaluation.

`Dataset`

Bases: BaseModel

Container for observed data, missingness mask, and (optionally) the complete pre-amputation data.

Call `make(data, y, ...)` to populate the dataset from a pandas DataFrame.

Attributes:

Name	Type	Description
`miss_data`	`DataFrame`	Data with missing values (NaN).
`mask`	`ndarray`	Boolean array, True where data is observed.
`n`	`int`	Number of observations.
`full_data`	`DataFrame or None`	Complete data (simulation only).
`expl_vars`	`list[str]`	Column names included in the analysis model.
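
To make the relationship between these attributes concrete, here is a minimal sketch using plain pandas and NumPy rather than the `Dataset` class itself. It assumes a toy MCAR amputation (each predictor entry dropped with probability 0.3), which is an illustrative choice and not necessarily how the library amputates data. The name `drop` is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical complete data, playing the role of `full_data` in a simulation.
rng = np.random.default_rng(0)
full_data = pd.DataFrame({
    "y": rng.normal(size=6),
    "x1": rng.normal(size=6),
    "x2": rng.normal(size=6),
})

# Illustrative MCAR amputation (an assumption, not the library's scheme):
# each entry is dropped with probability 0.3, but the outcome is kept observed.
drop = rng.random(full_data.shape) < 0.3
drop[:, 0] = False
miss_data = full_data.mask(drop)  # dropped entries become NaN

# How the documented attributes relate to each other:
mask = miss_data.notna().to_numpy()  # True where a value is observed
n = len(miss_data)                   # number of observations (rows)
```

Where `mask` is True, `miss_data` and `full_data` agree; `full_data` is only available because the data were simulated.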

`make(data, y, expl_vars=None, _onehot=True)`

Populate the dataset from a pandas DataFrame.

Parameters:

Name	Type	Description	Default
`data`	`DataFrame`	Data with missing values as NaN.	required
`y`	`str`	Name of the outcome variable (moved to the first column).	required
`expl_vars`	`list[str]`	Columns for the analysis model. Defaults to all non-outcome columns.	`None`
`_onehot`	`bool`	One-hot encode categorical columns.	`True`
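
The parameters above suggest a preprocessing pipeline along these lines. This is a hedged sketch, not the library's implementation: `make_like` is a hypothetical stand-in for `make`, and treating every non-numeric column as categorical, as well as keeping one-hot rows missing (NaN) when the source value is missing, are assumptions.

```python
import numpy as np
import pandas as pd


def make_like(data, y, expl_vars=None, onehot=True):
    """Hypothetical sketch of the preprocessing documented for `make`."""
    if expl_vars is None:
        # Default: every column except the outcome.
        expl_vars = [c for c in data.columns if c != y]
    # Outcome first, then the analysis-model columns.
    frame = data[[y] + list(expl_vars)]
    if onehot:
        parts = []
        for name in frame.columns:
            col = frame[name]
            if not pd.api.types.is_numeric_dtype(col):
                # Assumed categorical: expand into one float column per level.
                dummies = pd.get_dummies(col, prefix=name, dtype=float)
                # get_dummies encodes NaN as all zeros; restore the missingness.
                dummies.loc[col.isna(), :] = np.nan
                parts.append(dummies)
            else:
                parts.append(col.to_frame())
        frame = pd.concat(parts, axis=1)
    mask = frame.notna().to_numpy()  # True where observed
    return frame, mask, list(expl_vars)


df = pd.DataFrame({
    "x": [1.0, np.nan, 3.0],
    "g": ["a", "b", None],
    "out": [0.5, 1.5, np.nan],
})
frame, mask, ev = make_like(df, y="out")
```

Here `frame` has the columns `out`, `x`, `g_a`, `g_b`, and the third row is missing in `out`, `g_a` and `g_b`.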

Indices of the predictors used in the test: the outcome `Y` followed by the expanded explanatory variables (`expl_vars` after one-hot encoding).
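
One way to compute these indices, assuming the wide (one-hot encoded) columns are stored in a known order and that an explicit mapping from each raw variable to its wide columns is available. `predictor_indices` and `var_to_cols` are hypothetical names used only for illustration.

```python
def predictor_indices(wide_columns, y, expl_vars, var_to_cols):
    """Hypothetical: positions of [Y] + expanded expl_vars among the wide columns."""
    pos = {col: i for i, col in enumerate(wide_columns)}
    idx = [pos[y]]  # the outcome always comes first
    for var in expl_vars:
        # A categorical variable contributes all of its one-hot columns.
        idx.extend(pos[col] for col in var_to_cols[var])
    return idx


wide = ["out", "x", "z", "g_a", "g_b"]
var_to_cols = {"out": ["out"], "x": ["x"], "z": ["z"], "g": ["g_a", "g_b"]}
idx = predictor_indices(wide, "out", ["g", "x"], var_to_cols)
```

With `expl_vars=["g", "x"]` the column `z` is left out, and `g` expands to both of its one-hot columns, so `idx` is `[0, 3, 4, 1]`.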

`level="column"`: one target per wide column, i.e. per one-hot-encoded (OHE) column.

`level="variable"`: one target per raw variable, i.e. the outcome plus `expl_vars`.
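
A sketch of how the two levels could relate, under two assumptions: the targets are the missing entries, and a raw variable counts as missing in a row when any of its one-hot columns is missing. `target_mask` is a hypothetical helper and does not reproduce the signature of the library's `get_target_mask`.

```python
import numpy as np


def target_mask(observed, wide_columns, var_to_cols, variables, level="column"):
    """Hypothetical: boolean missingness mask with one column per target."""
    missing = ~np.asarray(observed, dtype=bool)
    if level == "column":
        # One target per wide (OHE) column.
        return missing
    if level == "variable":
        pos = {col: i for i, col in enumerate(wide_columns)}
        # A raw variable is missing if any of its wide columns is missing.
        cols = [missing[:, [pos[c] for c in var_to_cols[v]]].any(axis=1)
                for v in variables]
        return np.column_stack(cols)
    raise ValueError(f"unknown level: {level!r}")


observed = np.array([
    [True, True, True, True],
    [True, False, True, True],
    [False, True, False, False],
])
wide = ["out", "x", "g_a", "g_b"]
var_to_cols = {"out": ["out"], "x": ["x"], "g": ["g_a", "g_b"]}
t_col = target_mask(observed, wide, var_to_cols, ["out", "x", "g"], level="column")
t_var = target_mask(observed, wide, var_to_cols, ["out", "x", "g"], level="variable")
```

`t_col` has four columns (one per OHE column), while `t_var` collapses `g_a` and `g_b` into a single column for `g`.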

Weights aligned with the targets of `get_target_mask(level=...)`, one weight per target. For `level="variable"`, the weights are computed from variable-level missingness.
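
The weighting formula is not given here, so this sketch assumes one plausible choice: each target's weight is its share of all missing entries at the requested level. `target_weights` is a hypothetical name, and the two masks are toy inputs (one row per observation, one column per target).

```python
import numpy as np


def target_weights(target_missing):
    """Hypothetical: weight of each target = its share of all missing entries."""
    counts = np.asarray(target_missing, dtype=bool).sum(axis=0).astype(float)
    total = counts.sum()
    if total == 0:
        return np.zeros_like(counts)  # nothing missing: all weights zero
    return counts / total


# Toy missingness masks for wide columns [out, x, g_a, g_b]
# and for raw variables [out, x, g].
col_missing = np.array([
    [False, False, False, False],
    [False, True, False, False],
    [True, False, True, True],
])
var_missing = np.array([
    [False, False, False],
    [False, True, False],
    [True, False, True],
])
w_col = target_weights(col_missing)
w_var = target_weights(var_missing)
```

The example illustrates why variable-level weights are computed from variable-level missingness rather than by summing column-level weights: summing `w_col` over `g_a` and `g_b` would give `g` a weight of 0.5, whereas `w_var` gives each of the three variables 1/3.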

Compute the theoretical imputation bias, kappa, used in CI testing.

See the module-level `compute_kappa()` for details.