avatars.processors.GeolocationNormalizationProcessor

class avatars.processors.GeolocationNormalizationProcessor(*, latitude_variable: str, longitude_variable: str, n_reference_lat: int, n_bins: int)

Processor that normalizes longitude values separately for different latitudes.

This processor is recommended for data with latitude and longitude variables when the covered area contains regions without points (e.g. lakes, forests) or has non-rectangular bounds (e.g. country borders), and this absence of points must be preserved in the anonymized data.

Keyword Arguments:
  • latitude_variable – latitude variable name

  • longitude_variable – longitude variable name

  • n_reference_lat – number of discretized reference latitudes. A higher number of reference latitudes yields higher fidelity in the latitude dimension.

  • n_bins – number of discretized longitude bins at each reference latitude. A higher number of bins yields higher fidelity in the longitude dimension.
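
The parameter descriptions above do not say how latitudes are discretized. As a rough illustration only, the sketch below assumes (hypothetically) that the n_reference_lat reference latitudes are evenly spaced between the minimum and maximum observed latitude, and that each point is assigned to its nearest reference. The helper `assign_reference_latitudes` is not part of the avatars API. It shows why more references give finer latitude resolution: with a single reference, every point falls into the same band.

```python
import numpy as np


def assign_reference_latitudes(lat: np.ndarray, n_reference_lat: int) -> np.ndarray:
    """Return, for each point, the index of its nearest reference latitude.

    Hypothetical helper: reference latitudes are assumed to be evenly spaced
    between the minimum and maximum observed latitude.
    """
    references = np.linspace(lat.min(), lat.max(), n_reference_lat)
    # Pairwise |lat - reference| distances: one row per point, one column per reference.
    distances = np.abs(lat[:, None] - references[None, :])
    return distances.argmin(axis=1)


lat = np.array([49.1, 49.2, 49.3, 49.3])
print(assign_reference_latitudes(lat, n_reference_lat=1))  # [0 0 0 0]
print(assign_reference_latitudes(lat, n_reference_lat=3))  # [0 1 2 2]
```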

Examples

>>> import pandas as pd
>>> from avatars.processors import GeolocationNormalizationProcessor
>>> df = pd.DataFrame(
...    {
...        'lat': [49.1, 49.2, 49.3, 49.3],
...        'lon': [3.21, 3.19, 3.11, 3.18]
...    }
... )
>>> df
    lat   lon
0  49.1  3.21
1  49.2  3.19
2  49.3  3.11
3  49.3  3.18
>>> processor = GeolocationNormalizationProcessor(
...    latitude_variable='lat',
...    longitude_variable='lon',
...    n_reference_lat=3,
...    n_bins=5
... )
>>> processed = processor.preprocess(df)
>>> processed
    lat  lon
0  49.1  0.5
1  49.2  0.5
2  49.3  0.0
3  49.3  1.0

The preprocessing step rescales one coordinate dimension (here, lon) to values between 0 and 1. The original range of lon values differs from one latitude to another.
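
A simplified NumPy sketch of this kind of per-latitude rescaling follows. It is not the library's implementation: it assumes nearest-reference latitude bands, ignores n_bins entirely, and maps a band whose longitudes are all equal to 0.5 (an assumption). The function `normalize_longitude` is hypothetical.

```python
import numpy as np


def normalize_longitude(
    lat: np.ndarray, lon: np.ndarray, n_reference_lat: int
) -> np.ndarray:
    """Rescale longitudes to [0, 1] separately within each latitude band.

    Hypothetical sketch: bands are the nearest of n_reference_lat evenly
    spaced reference latitudes; n_bins is not modelled; a band with no
    longitude spread is mapped to 0.5.
    """
    references = np.linspace(lat.min(), lat.max(), n_reference_lat)
    band = np.abs(lat[:, None] - references[None, :]).argmin(axis=1)

    normalized = np.empty_like(lon, dtype=float)
    for b in np.unique(band):
        mask = band == b
        lo, hi = lon[mask].min(), lon[mask].max()
        if hi > lo:
            # Min-max scaling within this latitude band only.
            normalized[mask] = (lon[mask] - lo) / (hi - lo)
        else:
            normalized[mask] = 0.5  # degenerate band: a single longitude value
    return normalized


lat = np.array([49.1, 49.2, 49.3, 49.3])
lon = np.array([3.21, 3.19, 3.11, 3.18])
print(normalize_longitude(lat, lon, n_reference_lat=3))  # [0.5 0.5 0.  1. ]
```

On these four rows the sketch yields the same values as the documented output, but agreement on such a small example does not show that the processor uses this exact rule.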

>>> processor.postprocess(source=df, dest=processed)
    lat   lon
0  49.1  3.21
1  49.2  3.19
2  49.3  3.11
3  49.3  3.18

The postprocessing step maps the coordinates back to their original ranges.
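
The inverse step can be sketched in the same simplified way. This sketch assumes, hypothetically, that the per-band longitude ranges are recomputed from the source data, which is one plausible reading of why postprocess takes a source argument alongside dest. The functions `normalize` and `denormalize` are illustrations, not avatars API, and n_bins is again ignored.

```python
import numpy as np


def _bands(lat: np.ndarray, references: np.ndarray) -> np.ndarray:
    # Index of the nearest reference latitude for each point.
    return np.abs(lat[:, None] - references[None, :]).argmin(axis=1)


def normalize(lat: np.ndarray, lon: np.ndarray, n_reference_lat: int) -> np.ndarray:
    """Hypothetical forward step: per-band min-max scaling of longitude."""
    references = np.linspace(lat.min(), lat.max(), n_reference_lat)
    band = _bands(lat, references)
    out = np.empty(len(lon))
    for b in np.unique(band):
        mask = band == b
        lo, hi = lon[mask].min(), lon[mask].max()
        out[mask] = (lon[mask] - lo) / (hi - lo) if hi > lo else 0.5
    return out


def denormalize(
    source_lat: np.ndarray,
    source_lon: np.ndarray,
    dest_lat: np.ndarray,
    dest_lon: np.ndarray,
    n_reference_lat: int,
) -> np.ndarray:
    """Hypothetical inverse step: rescale dest longitudes to the source's per-band ranges."""
    references = np.linspace(source_lat.min(), source_lat.max(), n_reference_lat)
    source_band = _bands(source_lat, references)
    dest_band = _bands(dest_lat, references)
    out = np.empty(len(dest_lon))
    for b in np.unique(dest_band):
        src = source_lon[source_band == b]
        if src.size == 0:
            raise ValueError(f"no source points in latitude band {b}")
        lo, hi = src.min(), src.max()
        mask = dest_band == b
        # A degenerate band (lo == hi) maps back to its single longitude.
        out[mask] = lo + dest_lon[mask] * (hi - lo)
    return out


lat = np.array([49.1, 49.2, 49.3, 49.3])
lon = np.array([3.21, 3.19, 3.11, 3.18])
restored = denormalize(lat, lon, lat, normalize(lat, lon, 3), 3)
print(np.allclose(restored, lon))  # True
```

Recomputing the ranges from the source avoids storing state between the two calls, at the cost of requiring every destination band to contain at least one source point.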

preprocess(df: DataFrame) → DataFrame
postprocess(source: DataFrame, dest: DataFrame) → DataFrame