avatars.processors.ProportionProcessor

class avatars.processors.ProportionProcessor(variable_names: List[str], reference: str, *, sum_to_one: bool = True, decimal_count: int = 1)

Processor to express numeric variables as a proportion of another variable.

This transformation preserves additive relations between variables, such as variable_1 = variable_2 + variable_3.

Parameters:
  • variable_names – variables to transform

  • reference – the variable of reference

Keyword Arguments:
  • sum_to_one – set to True to ensure that the transformed variables sum to 1. Default: True

  • decimal_count – the number of decimals that postprocessed variables should have. Default: 1

Examples

>>> import pandas as pd
>>> from avatars.processors import ProportionProcessor
>>> df = pd.DataFrame(
...        {
...            "variable_1": [100, 10],
...            "variable_2": [10, 10],
...            "variable_3": [90, 30],
...        }
...    )
>>> processor = ProportionProcessor(
...    variable_names=["variable_2", "variable_3"],
...    reference="variable_1",
... )
>>> processor.preprocess(df=df)
   variable_1  variable_2  variable_3
0         100        0.10        0.90
1          10        0.25        0.75

The processor expresses the selected variables as proportions of another variable. By default (sum_to_one=True), the proportions of variable_names are scaled so that they sum to 1 in each row. The avatar dataframe below holds proportions that no longer sum to 1.

>>> avatar = pd.DataFrame(
...        {
...            "variable_1": [60, 15],
...            "variable_2": [0.15, 0.88],
...            "variable_3": [0.18, 0.77],
...        }
...    )
>>> avatar
   variable_1  variable_2  variable_3
0          60        0.15        0.18
1          15        0.88        0.77

postprocess then converts the proportions back to the original unit of the variables.

>>> processor.postprocess(df, avatar)
   variable_1  variable_2  variable_3
0          60        27.3        32.7
1          15         8.0         7.0

This preserves the mathematical relation variable_1 = variable_2 + variable_3.
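
The arithmetic behind the example above can be reproduced in plain Python. This is only an illustrative sketch, not the avatars implementation: the helper names to_proportions and from_proportions are hypothetical, and the renormalisation step is inferred from the example output.

```python
# Illustrative sketch only: hypothetical helpers, not the avatars implementation.
def to_proportions(values):
    """Divide each value by the row total so the results sum to 1."""
    total = sum(values)
    return [v / total for v in values]


def from_proportions(proportions, reference, decimal_count=1):
    """Rescale proportions to sum to 1, then multiply by the reference."""
    total = sum(proportions)
    return [round(reference * p / total, decimal_count) for p in proportions]


# Second row of df: variable_2 = 10, variable_3 = 30.
print(to_proportions([10, 30]))  # [0.25, 0.75]

# First row of avatar: proportions 0.15 and 0.18, variable_1 = 60.
restored = from_proportions([0.15, 0.18], reference=60)
print(restored)  # [27.3, 32.7]
```

Because the proportions are rescaled before being multiplied by the reference, the restored values add up to variable_1 again, up to rounding.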

With sum_to_one=False, each variable is divided by the reference and the proportions are not required to sum to 1:

>>> processor = ProportionProcessor(
...    variable_names=["variable_2", "variable_3"],
...    reference="variable_1",
...    sum_to_one=False,
... )
>>> processor.preprocess(df=df)
   variable_1  variable_2  variable_3
0         100         0.1         0.9
1          10         1.0         3.0
>>> avatar = pd.DataFrame(
...        {
...            "variable_1": [60, 15],
...            "variable_2": [0.15, 0.88],
...            "variable_3": [1.5, 2.8],
...        }
...    )
>>> avatar
   variable_1  variable_2  variable_3
0          60        0.15         1.5
1          15        0.88         2.8
>>> processor.postprocess(df, avatar)
   variable_1  variable_2  variable_3
0          60         9.0        90.0
1          15        13.2        42.0
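
With sum_to_one=False, the example output is consistent with a plain division by the reference on the way in and a multiplication on the way out. The sketch below assumes exactly that; the helper names to_ratio and from_ratio are hypothetical and are not part of the avatars API.

```python
# Illustrative sketch only: hypothetical helpers, not the avatars implementation.
def to_ratio(values, reference):
    """Express each value as a fraction of the reference."""
    return [v / reference for v in values]


def from_ratio(ratios, reference, decimal_count=1):
    """Multiply each ratio by the reference and round the result."""
    return [round(r * reference, decimal_count) for r in ratios]


# Second row of df: variable_1 = 10, variable_2 = 10, variable_3 = 30.
print(to_ratio([10, 30], reference=10))  # [1.0, 3.0]

# Second row of avatar: ratios 0.88 and 2.8, variable_1 = 15.
print(from_ratio([0.88, 2.8], reference=15))  # [13.2, 42.0]
```

In this mode nothing forces the restored values to add up to the reference, as the second row shows (13.2 + 42.0 is not 15).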

preprocess(df: DataFrame) → DataFrame

Transform numeric variables into proportions of another variable.

If a value of a variable to transform is NaN, it stays NaN after the transformation and counts as a 0% proportion of the reference when the values of the other variables are transformed.
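
One way to read this NaN rule, for the default sum_to_one=True case, is sketched below in plain Python. The function to_proportions_with_nan is hypothetical, and treating NaN as a zero contribution to the row total is an assumption drawn from the description above, not a confirmed detail of the library.

```python
import math


# Illustrative sketch only: hypothetical helper, not the avatars implementation.
def to_proportions_with_nan(values):
    """Keep NaN entries as NaN and count them as 0 in the row total."""
    total = sum(v for v in values if not math.isnan(v))
    return [math.nan if math.isnan(v) else v / total for v in values]


result = to_proportions_with_nan([10.0, math.nan, 30.0])
print(result)  # [0.25, nan, 0.75]
```

The two known values still sum to 1, while the missing one is carried through unchanged.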

Parameters:
  • df – dataframe to transform

Returns:

a dataframe with the transformed version of the selected columns

Return type:

DataFrame

postprocess(source: DataFrame, dest: DataFrame) → DataFrame

Transform proportions of another variable back into absolute numeric values.

Parameters:
  • source – not used

  • dest – dataframe to transform

Returns:

a dataframe with the transformed version of the selected columns

Return type:

DataFrame