avatars.processors.GroupModalitiesProcessor

class avatars.processors.GroupModalitiesProcessor(*, variable_thresholds: Dict[str, int] | None = None, min_unique: int | None = None, global_threshold: int | None = None, new_category: str = 'other')

Processor to group the rare modalities of categorical variables into a single new category, in order to reduce the dataframe dimension.

Use the parameter variable_thresholds to apply a custom threshold to each variable. Use the parameters min_unique and global_threshold to apply a generic threshold to all variables.
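The sketch below shows one way the threshold for each variable could be resolved. It is not the library's code. It assumes that a variable listed in variable_thresholds uses its own threshold and that every other variable falls back to global_threshold. The helper name resolve_thresholds is hypothetical.

```python
from typing import Dict, Iterable, Optional


def resolve_thresholds(
    columns: Iterable[str],
    variable_thresholds: Optional[Dict[str, int]] = None,
    global_threshold: Optional[int] = None,
) -> Dict[str, Optional[int]]:
    """Hypothetical helper: map each column to the threshold assumed to apply.

    Assumption: a per-variable threshold takes precedence; otherwise the
    generic global_threshold is used (None means the column is left alone).
    """
    variable_thresholds = variable_thresholds or {}
    return {col: variable_thresholds.get(col, global_threshold) for col in columns}
```

For example, `resolve_thresholds(["variable_1", "variable_2"], {"variable_1": 2}, 1)` gives `variable_1` its custom threshold of 2 and `variable_2` the generic threshold of 1.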

Keyword Arguments:
  • variable_thresholds – dictionary mapping variable names to the threshold to apply to each of them (see global_threshold below).

  • min_unique – minimum number of unique modalities a variable must have to be transformed.

  • global_threshold – maximum number of individuals a modality can have to be renamed to new_category.

  • new_category – name of the modality that replaces the grouped modalities (default: "other").
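The following is a minimal pandas sketch of the grouping rule for a single column. It is not the library's implementation. It assumes that a variable with fewer than min_unique distinct modalities is left unchanged, and that any modality whose count is at most the threshold is replaced by new_category. The second assumption is consistent with the example below. The function name group_column_sketch is hypothetical.

```python
from typing import Hashable, Optional

import pandas as pd


def group_column_sketch(
    series: pd.Series,
    threshold: int,
    min_unique: Optional[int] = None,
    new_category: Hashable = "other",
) -> pd.Series:
    """Hypothetical re-implementation of the grouping rule for one column."""
    # Assumption: skip variables with too few distinct modalities.
    if min_unique is not None and series.nunique() < min_unique:
        return series
    counts = series.value_counts()
    # Modalities observed at most `threshold` times are considered rare.
    rare = counts[counts <= threshold].index
    # Keep frequent modalities; replace rare ones with the new category.
    return series.where(~series.isin(rare), new_category)
```

Applied column by column with `df.apply(lambda col: group_column_sketch(col, threshold=1, min_unique=2))` on the dataframe from the example below, this sketch reproduces the documented output.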

Examples

>>> import pandas as pd
>>> from avatars.processors import GroupModalitiesProcessor
>>> df = pd.DataFrame(
...    {
...        "variable_1": ["red", "blue", "blue", "green"],
...        "variable_2": ["red", "blue", "blue", "red"],
...        "variable_3": ["green", "green", "green", "green"],
...    }
... )
>>> df
  variable_1 variable_2 variable_3
0        red        red      green
1       blue       blue      green
2       blue       blue      green
3      green        red      green
>>> processor = GroupModalitiesProcessor(
...     min_unique=2,
...     global_threshold=1,
...     new_category="other"
... )
>>> processor.preprocess(df)
  variable_1 variable_2 variable_3
0      other        red      green
1       blue       blue      green
2       blue       blue      green
3      other        red      green

preprocess(df: DataFrame) → DataFrame

postprocess(source: DataFrame, dest: DataFrame) → DataFrame