avatars.processors.ToCategoricalProcessor
- class avatars.processors.ToCategoricalProcessor(to_categorical_threshold: int, *, keep_continuous: bool = False, continuous_suffix: str = '__cont', category: str = 'other')
Processor to model selected numeric variables as categorical variables.
- Parameters:
to_categorical_threshold – threshold on the number of distinct values below which a continuous variable is treated as categorical.
- Keyword Arguments:
keep_continuous – if True, continuous variables will be kept and suffixed with continuous_suffix.
continuous_suffix – suffix for the continuous variable created during preprocess.
category – if keep_continuous=True, name of the new category; needed for some specific avatarization cases that use the group_modalities processor.
Examples
With keep_continuous=False, the processor only converts the variable to object. This ensures that all values are kept during the avatarization.
>>> import pandas as pd
>>> from avatars.processors import ToCategoricalProcessor
>>> df = pd.DataFrame(
...     {
...         "variable_1": [1, 7, 7, 1],
...         "variable_2": [1, 2, 7, 1]
...     }
... )
>>> processor = ToCategoricalProcessor(to_categorical_threshold=2)
>>> processor.preprocess(df).dtypes
variable_1    object
variable_2     int64
dtype: object
>>> avatar = pd.DataFrame(
...     {
...         "variable_1": [2, 1, 4, 1],
...         "variable_2": [2, 1, 4, 1]
...     }
... )
>>> avatar["variable_1"] = avatar["variable_1"].astype('object')
>>> avatar.dtypes
variable_1    object
variable_2     int64
dtype: object
>>> processor.postprocess(df, avatar).dtypes
variable_1    int64
variable_2    int64
dtype: object
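The selection rule at work in this example can be sketched in plain Python, without pandas or the avatars package. This is an illustration only, not the library's implementation: the helper name columns_to_convert is hypothetical, and the inclusive comparison (<=) is an assumption inferred from the example, where variable_1 (two distinct values, threshold 2) is converted and variable_2 (three distinct values) is not.

```python
def columns_to_convert(columns, threshold):
    """Return the names of columns with at most `threshold` distinct values.

    Hypothetical helper; the inclusive comparison is assumed from the example.
    """
    return [
        name
        for name, values in columns.items()
        if len(set(values)) <= threshold
    ]


# Same data as the example, held as a plain dict of lists.
data = {"variable_1": [1, 7, 7, 1], "variable_2": [1, 2, 7, 1]}
selected = columns_to_convert(data, threshold=2)
print(selected)  # ['variable_1']
```

Raising the threshold to 3 in this sketch would select both columns, since variable_2 has exactly three distinct values.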
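With keep_continuous=True, the variable is duplicated: the original is converted to categorical, and a copy suffixed with continuous_suffix is kept as continuous. The continuous copy remains available for other uses.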
>>> df = pd.DataFrame(
...     {
...         "variable_1": [1, 7, 7, 1],
...         "variable_2": [1, 2, 7, 1]
...     }
... )
>>> processor = ToCategoricalProcessor(to_categorical_threshold=2, keep_continuous=True)
>>> processor.preprocess(df).dtypes
variable_1          object
variable_2           int64
variable_1__cont     int64
dtype: object
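The duplication performed with keep_continuous=True can be mimicked in plain Python. The sketch below is a hypothetical stand-in, not the processor's code: the function name is invented, strings stand in for the object dtype, and the inclusive threshold comparison is assumed from the example output.

```python
def to_categorical_keep_continuous(columns, threshold, suffix="__cont"):
    """Convert low-cardinality columns to strings and keep a numeric copy.

    Hypothetical stand-in: strings play the role of the object dtype, and
    the copy is stored under the original name plus `suffix`.
    """
    result = {}
    copies = {}
    for name, values in columns.items():
        if len(set(values)) <= threshold:  # assumed inclusive comparison
            result[name] = [str(v) for v in values]
            copies[name + suffix] = list(values)
        else:
            result[name] = list(values)
    # Append the copies after the original columns, as in the example output.
    result.update(copies)
    return result


data = {"variable_1": [1, 7, 7, 1], "variable_2": [1, 2, 7, 1]}
out = to_categorical_keep_continuous(data, threshold=2)
print(list(out))  # ['variable_1', 'variable_2', 'variable_1__cont']
```

The untouched copy under variable_1__cont still holds the original integers, while variable_1 itself is now categorical.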
- preprocess(df: DataFrame) → DataFrame
Transform numeric variables into categorical variables.
- Parameters:
df – dataframe to transform
- Returns:
transformed dataframe
- Return type:
DataFrame
- postprocess(source: DataFrame, dest: DataFrame) → DataFrame
Transform converted categorical variables back to numeric.
- Parameters:
source – reference dataframe
dest – dataframe to transform
- Returns:
transformed dataframe
- Return type:
DataFrame
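The round trip can be pictured with a small plain-Python sketch. It is an assumption-laden illustration rather than the library's logic: restore_numeric is a hypothetical name, and using the source column's values as the type reference is inferred from the description of source as the reference dataframe.

```python
def restore_numeric(source, dest):
    """Cast dest columns back to int where the source column holds only ints.

    Hypothetical stand-in for the round trip; `source` serves purely as the
    type reference, and `dest` supplies the values.
    """
    restored = {}
    for name, values in dest.items():
        reference = source.get(name, [])
        if reference and all(isinstance(v, int) for v in reference):
            restored[name] = [int(v) for v in values]
        else:
            restored[name] = list(values)
    return restored


source = {"variable_1": [1, 7, 7, 1], "variable_2": [1, 2, 7, 1]}
dest = {"variable_1": ["2", "1", "4", "1"], "variable_2": [2, 1, 4, 1]}
restored = restore_numeric(source, dest)
print(restored["variable_1"])  # [2, 1, 4, 1]
```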