avatars.processors.ToCategoricalProcessor
- class avatars.processors.ToCategoricalProcessor(to_categorical_threshold: int, *, keep_continuous: bool = False, continuous_suffix: str = '__cont', category: str = 'other')
Processor to model selected numeric variables as categorical variables.
- Parameters:
to_categorical_threshold – threshold on the number of distinct values: a numeric variable with at most this many distinct values is treated as categorical.
- Keyword Arguments:
keep_continuous – if True, continuous variables will be kept and suffixed with continuous_suffix.
continuous_suffix – suffix for the continuous variable created during preprocess.
category – name of the new category when keep_continuous=True; needed for some specific avatarization cases that use the group_modalities processor.
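To make the threshold concrete, the selection rule can be approximated in plain pandas. This is a minimal sketch, not the library's implementation: the helper name `columns_below_threshold` is hypothetical, and the inclusive comparison (`<=`) is an assumption inferred from the first example in the Examples section, where a column with two distinct values is converted when to_categorical_threshold=2.

```python
import pandas as pd


def columns_below_threshold(df: pd.DataFrame, threshold: int) -> list[str]:
    """Return numeric columns with at most `threshold` distinct values.

    Hypothetical helper for illustration only; the inclusive comparison
    is an assumption, not confirmed library behaviour.
    """
    numeric = df.select_dtypes(include="number")
    return [col for col in numeric.columns if numeric[col].nunique() <= threshold]


df = pd.DataFrame(
    {
        "variable_1": [1, 7, 7, 1],  # 2 distinct values
        "variable_2": [1, 2, 7, 1],  # 3 distinct values
    }
)
selected = columns_below_threshold(df, threshold=2)
print(selected)  # ['variable_1']
```

Under this assumption, only variable_1 would be modelled as categorical, which matches the dtypes shown in the examples.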
Examples
With keep_continuous=False, the processor only converts the variable to the object dtype. This ensures that all values are kept during the avatarization.
>>> import pandas as pd
>>> from avatars.processors import ToCategoricalProcessor
>>> df = pd.DataFrame(
...     {
...         "variable_1": [1, 7, 7, 1],
...         "variable_2": [1, 2, 7, 1]
...     }
... )
>>> processor = ToCategoricalProcessor(to_categorical_threshold=2)
>>> processor.preprocess(df).dtypes
variable_1    object
variable_2     int64
dtype: object
>>> avatar = pd.DataFrame(
...     {
...         "variable_1": [2, 1, 4, 1],
...         "variable_2": [2, 1, 4, 1]
...     }
... )
>>> avatar["variable_1"] = avatar["variable_1"].astype('object')
>>> avatar.dtypes
variable_1    object
variable_2     int64
dtype: object
>>> processor.postprocess(df, avatar).dtypes
variable_1    int64
variable_2    int64
dtype: object
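The dtype round trip in the example above can be approximated outside the library. The sketch below is an assumption about the general idea (cast each avatar column back to the dtype of the matching source column); `restore_dtypes` is a hypothetical helper and does not claim to reproduce what `postprocess` actually does internally.

```python
import pandas as pd


def restore_dtypes(source: pd.DataFrame, avatar: pd.DataFrame) -> pd.DataFrame:
    """Cast avatar columns back to the dtypes of the matching source columns.

    Hypothetical helper for illustration only; the input frame is not modified.
    """
    restored = avatar.copy()
    for col in restored.columns.intersection(source.columns):
        restored[col] = restored[col].astype(source[col].dtype)
    return restored


source = pd.DataFrame({"variable_1": [1, 7, 7, 1], "variable_2": [1, 2, 7, 1]})
avatar = pd.DataFrame({"variable_1": [2, 1, 4, 1], "variable_2": [2, 1, 4, 1]})
# Mimic the categorical representation used during avatarization.
avatar["variable_1"] = avatar["variable_1"].astype("object")

restored = restore_dtypes(source, avatar)
print(restored.dtypes)
```

After the cast, both columns carry the same dtypes as the source frame, while the values produced by the avatarization are left unchanged.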
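With keep_continuous=True, the variable is duplicated: the original column is converted to categorical, and a copy suffixed with continuous_suffix is kept as continuous. This can be useful when the numeric values are still needed elsewhere.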
>>> df = pd.DataFrame(
...     {
...         "variable_1": [1, 7, 7, 1],
...         "variable_2": [1, 2, 7, 1]
...     }
... )
>>> processor = ToCategoricalProcessor(to_categorical_threshold=2, keep_continuous=True)
>>> processor.preprocess(df).dtypes
variable_1          object
variable_2           int64
variable_1__cont     int64
dtype: object
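The effect of keep_continuous=True on the column layout can be imitated in plain pandas. This is a hedged sketch of the idea only: `to_categorical_keep_copy` is a hypothetical helper, the list of columns to convert is passed in explicitly rather than derived from a threshold, and the default suffix simply mirrors the documented continuous_suffix default.

```python
import pandas as pd


def to_categorical_keep_copy(
    df: pd.DataFrame, columns: list[str], suffix: str = "__cont"
) -> pd.DataFrame:
    """Convert `columns` to object dtype and keep a numeric copy of each.

    Hypothetical helper for illustration only, not the library's code.
    """
    out = df.copy()
    for col in columns:
        out[col + suffix] = out[col]          # numeric copy, appended last
        out[col] = out[col].astype("object")  # categorical representation
    return out


df = pd.DataFrame({"variable_1": [1, 7, 7, 1], "variable_2": [1, 2, 7, 1]})
result = to_categorical_keep_copy(df, ["variable_1"])
print(result.dtypes)
```

The numeric copy is appended after the existing columns, which matches the column order shown in the example above.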