avatars.processors.GroupModalitiesProcessor
- class avatars.processors.GroupModalitiesProcessor(*, variable_thresholds: Dict[str, int] | None = None, min_unique: int | None = None, global_threshold: int | None = None, new_category: str = 'other')
Processor to group modalities in order to reduce the dataframe dimension.
Use the parameter variable_thresholds if you want to apply a custom threshold to each variable. Use the parameters min_unique and global_threshold if you want to apply a generic threshold.
- Keyword Arguments:
variable_thresholds – dictionary mapping each variable name to the threshold to apply to it; see global_threshold below.
min_unique – number of unique modalities a variable must have in order to be transformed.
global_threshold – maximum number of individuals a category may contain for it to be renamed.
new_category – new modality name (default="other").
Examples
>>> df = pd.DataFrame(
...     {
...         "variable_1": ["red", "blue", "blue", "green"],
...         "variable_2": ["red", "blue", "blue", "red"],
...         "variable_3": ["green", "green", "green", "green"],
...     }
... )
>>> df
  variable_1 variable_2 variable_3
0        red        red      green
1       blue       blue      green
2       blue       blue      green
3      green        red      green
>>> processor = GroupModalitiesProcessor(
...     min_unique=2,
...     global_threshold=1,
...     new_category="other"
... )
>>> processor.preprocess(df)
  variable_1 variable_2 variable_3
0      other        red      green
1       blue       blue      green
2       blue       blue      green
3      other        red      green
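The grouping rule in the example above can be sketched in plain pandas. This is an illustrative approximation, not the library's implementation: the helper name group_rare_modalities is hypothetical, and the comparisons (a variable qualifies when it has at least min_unique modalities, and a modality is grouped when its count is at or below the threshold) are assumptions inferred from the example output. How the library combines variable_thresholds with the generic threshold is not documented here; the sketch simply lets the per-variable value take precedence.

```python
import pandas as pd


def group_rare_modalities(
    df,
    variable_thresholds=None,
    min_unique=None,
    global_threshold=None,
    new_category="other",
):
    """Hypothetical helper mimicking the documented behaviour (a sketch only)."""
    result = df.copy()
    # Per-variable thresholds are used as given.
    thresholds = dict(variable_thresholds or {})

    # Generic threshold: applied to every other variable with enough modalities.
    if min_unique is not None and global_threshold is not None:
        for column in df.columns:
            # Assumption: "enough" means at least min_unique unique values.
            if column not in thresholds and df[column].nunique() >= min_unique:
                thresholds[column] = global_threshold

    for column, threshold in thresholds.items():
        counts = df[column].value_counts()
        # Assumption: modalities with count <= threshold are grouped.
        rare = counts[counts <= threshold].index
        # Keep frequent modalities, replace rare ones with new_category.
        result[column] = df[column].where(~df[column].isin(rare), new_category)
    return result


df = pd.DataFrame(
    {
        "variable_1": ["red", "blue", "blue", "green"],
        "variable_2": ["red", "blue", "blue", "red"],
        "variable_3": ["green", "green", "green", "green"],
    }
)

# Generic threshold, same settings as the documented example.
grouped = group_rare_modalities(df, min_unique=2, global_threshold=1)

# Custom threshold for a single variable.
custom = group_rare_modalities(df, variable_thresholds={"variable_2": 2})
```

With the generic settings, only variable_1 changes: variable_2 has no modality rare enough, and variable_3 has a single modality, which is below min_unique. With the custom threshold, only variable_2 is touched.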
- preprocess(df: DataFrame) → DataFrame
- postprocess(source: DataFrame, dest: DataFrame) → DataFrame