Client

The ApiClient object

The Python client is the interface to the avatarization engine. For more information about avatarization and its concepts, check out our main docs.

The Manager is the main interface that you should use. Instantiate it, then authenticate to the engine with your credentials.

The Runner contains the main functionality for your anonymization. Use avatars.manager.Manager.create_runner() to create a runner.
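As a rough sketch of how the two objects fit together, the helper below takes an already-authenticated Manager, creates a Runner, and drives one table through add_table, set_parameters, run, and shuffled. The helper name avatarize_table, the set_name keyword on create_runner, and the value k=5 are illustrative assumptions rather than documented API; authentication is left out because its signature is not covered in this section.

```python
from typing import Any


def avatarize_table(
    manager: Any, set_name: str, table_name: str, data_path: str, k: int = 5
) -> Any:
    """Run one table through the engine and return its shuffled avatars.

    `manager` is expected to be an authenticated avatars.manager.Manager.
    """
    # Assumption: create_runner accepts the set name as a keyword argument.
    runner = manager.create_runner(set_name=set_name)

    # Register the table; `data` may be a file path or a pandas DataFrame.
    runner.add_table(table_name, data_path)

    # Set the number of nearest neighbors (k=5 is an arbitrary example value).
    runner.set_parameters(table_name, k=k)

    # Launch the default jobs, then fetch the shuffled avatar table.
    runner.run()
    return runner.shuffled(table_name)
```

A call might look like avatarize_table(manager, "demo", "iris", "iris.csv"), where the set name, table name, and file path are placeholders.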

Methods

The methods below communicate with the engine. Their API uses pydantic objects to help you pass the correct arguments.

class avatars.runner.Results(*values)

ADVICE = 'advice'

SHUFFLED = 'shuffled'

UNSHUFFLED = 'unshuffled'

PRIVACY_METRICS = 'privacy_metrics'

SIGNAL_METRICS = 'signal_metrics'

REPORT_IMAGES = 'report_images'

PROJECTIONS_ORIGINAL = 'projections-original'

PROJECTIONS_AVATARS = 'projections-avatars'

METADATA = 'run_metadata'

REPORT = 'report'

META_METRICS = 'meta_metrics'
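Member names and string values do not always match: METADATA maps to 'run_metadata', and the two projection members use hyphens rather than underscores. The sketch below uses a local stand-in enum that copies the values listed above, purely to illustrate lookup by value; in real code you would import Results from avatars.runner. The helper result_from_value is hypothetical.

```python
from enum import Enum


class Results(Enum):
    """Local stand-in that copies the values listed above; not the real class."""

    ADVICE = "advice"
    SHUFFLED = "shuffled"
    UNSHUFFLED = "unshuffled"
    PRIVACY_METRICS = "privacy_metrics"
    SIGNAL_METRICS = "signal_metrics"
    REPORT_IMAGES = "report_images"
    PROJECTIONS_ORIGINAL = "projections-original"
    PROJECTIONS_AVATARS = "projections-avatars"
    METADATA = "run_metadata"
    REPORT = "report"
    META_METRICS = "meta_metrics"


def result_from_value(value: str) -> Results:
    """Look up a member by its string value, with a readable error on failure."""
    try:
        return Results(value)
    except ValueError:
        valid = ", ".join(member.value for member in Results)
        raise ValueError(f"Unknown result {value!r}; expected one of: {valid}") from None


print(result_from_value("run_metadata").name)  # METADATA
```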

class avatars.runner.Runner(api_client: ApiClient, set_name: str, seed: int | None = None)

add_table(table_name: str, data: str | DataFrame, primary_key: str | None = None, foreign_keys: list | None = None, time_series_time: str | None = None, types: dict[str, ColumnType] = {}, individual_level: bool | None = None, avatar_data: str | None = None)

Add a table to the config and upload the data to the server.

Parameters:

table_name – The name of the table.
data – The data to add to the table. Can be a path to a file or a pandas DataFrame.
primary_key – The primary key of the table.
foreign_keys – Foreign keys of the table.
time_series_time – The name of the time column in the table (time series case).
types – A dictionary of column types with the column name as the key and the type as the value.
individual_level – True if the table is at individual level, False otherwise. An individual-level table is one where each row corresponds to an individual (e.g. a patient or a customer).
avatar_data – The avatar table if there is one. Can be a path to a file or a pandas DataFrame.
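A minimal sketch of registering a single individual-level table follows. The helper name register_patient_table, the table name "patient", and the column name "patient_id" are hypothetical; only the add_table keywords come from the signature above.

```python
from typing import Any


def register_patient_table(runner: Any, csv_path: str) -> None:
    """Add one individual-level table to the runner's config.

    `runner` is expected to be an avatars.runner.Runner.
    """
    runner.add_table(
        "patient",                 # hypothetical table name
        csv_path,                  # a file path; a pandas DataFrame also works
        primary_key="patient_id",  # hypothetical column name
        individual_level=True,     # each row describes one patient
    )
```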

advise_parameters(table_name: str | None = None) → None

Fill the parameters set with the server recommendation.

Parameters:

table_name – The name of the table. If None, all tables will be used.

upload_file(table_name: str, data: str | DataFrame, avatar_data: str | DataFrame | None = None)

Upload a file to the server.

Parameters:

table_name – The name of the table.
data – The data to upload. Can be a path to a file or a pandas DataFrame.
avatar_data – The avatar table if there is one. Can be a path to a file or a pandas DataFrame.

add_link(parent_table_name: str, parent_field: str, child_table_name: str, child_field: str, method: LinkMethod = LinkMethod())

Add a table link to the config.

Parameters:

parent_table_name – The name of the parent table.
child_table_name – The name of the child table.
parent_field – The parent link key field (primary key) in the parent table.
child_field – The child link key field (foreign key) in the child table.
method – The method to use for linking the tables. Defaults to “linear_sum_assignment”.
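The sketch below shows one way a parent and a child table might be registered and then linked. The table names, column names, and helper name are hypothetical, and passing foreign_keys as a list of column names is an assumption, since the signature only says list. The method argument is left at its default.

```python
from typing import Any


def link_patient_visits(runner: Any, patient_csv: str, visit_csv: str) -> None:
    """Register a parent and a child table, then link them on patient_id.

    `runner` is expected to be an avatars.runner.Runner.
    """
    # Parent table: one row per patient.
    runner.add_table(
        "patient", patient_csv, primary_key="patient_id", individual_level=True
    )

    # Child table: several visits may point to the same patient.
    # Assumption: foreign_keys is a list of column names.
    runner.add_table(
        "visit",
        visit_csv,
        primary_key="visit_id",
        foreign_keys=["patient_id"],
        individual_level=False,
    )

    # Link the parent's primary key to the child's foreign key.
    runner.add_link(
        parent_table_name="patient",
        parent_field="patient_id",
        child_table_name="visit",
        child_field="patient_id",
    )
```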

set_parameters(table_name: str, k: int | None = None, ncp: int | None = None, use_categorical_reduction: bool | None = None, column_weights: dict[str, float] | None = None, exclude_variable_names: list[str] | None = None, exclude_replacement_strategy: ExcludeVariablesMethod | None = None, imputation_method: ImputeMethod | None = None, imputation_k: int | None = None, imputation_training_fraction: float | None = None, dp_epsilon: float | None = None, dp_preprocess_budget_ratio: float | None = None, time_series_nf: int | None = None, time_series_projection_type: ProjectionType | None = None, time_series_nb_points: int | None = None, time_series_method: AlignmentMethod | None = None, known_variables: list[str] | None = None, target: str | None = None, closest_rate_percentage_threshold: float | None = None, closest_rate_ratio_threshold: float | None = None, categorical_hidden_rate_variables: list[str] | None = None)

Set the parameters for the table.

Parameters:

table_name – The name of the table.
k – Number of nearest neighbors to consider for KNN-based methods.
ncp – Number of dimensions to consider for the KNN algorithm.
use_categorical_reduction – Whether to transform categorical variables into a latent numerical space before projection.
column_weights – Dictionary mapping column names to their respective weights, indicating the importance of each variable during the projection process.
exclude_variable_names – List of variable names to exclude from the projection.
exclude_replacement_strategy (ExcludeVariablesMethod, optional) – Strategy for replacing excluded variables. Options: ROW_ORDER, COORDINATE_SIMILARITY.
imputation_method – Method for imputing missing values. Options: ImputeMethod.KNN, ImputeMethod.MODE, ImputeMethod.MEDIAN, ImputeMethod.MEAN, ImputeMethod.FAST_KNN.
imputation_k – Number of neighbors to use for imputation if the method is KNN or FAST_KNN.
imputation_training_fraction – Fraction of the dataset to use for training the imputation model when using KNN or FAST_KNN.
dp_epsilon – Epsilon value for differential privacy.
dp_preprocess_budget_ratio – Budget ratio to allocate when using differential privacy avatarization.
time_series_nf – In time series context, number of degrees of freedom to retain in time series projections.
time_series_projection_type – In time series context, type of projection for time series. Options: ProjectionType.FCPA, ProjectionType.FLATTEN. Default is FCPA.
time_series_method – In time series context, method for aligning series. Options: AlignmentMethod.SPECIFIED, AlignmentMethod.MAX, AlignmentMethod.MIN, AlignmentMethod.MEAN.
time_series_nb_points – In time series context, number of points to generate for time series.
known_variables – List of known variables to be used for privacy metrics. These are variables that could be easily known by an attacker.
target – Target variable to predict, used for signal metrics.
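As an illustration, the helper below sets a handful of the parameters described above in one call. The helper name, the column names, and all numeric values are placeholders chosen for the example, not recommended settings.

```python
from typing import Any


def configure_table(runner: Any, table_name: str) -> None:
    """Set a few avatarization and metric parameters for one table.

    `runner` is expected to be an avatars.runner.Runner.
    """
    runner.set_parameters(
        table_name,
        k=20,                                 # nearest neighbors (example value)
        ncp=5,                                # projection dimensions (example value)
        column_weights={"age": 2.0},          # hypothetical column, weighted more heavily
        known_variables=["age", "zip_code"],  # hypothetical columns an attacker may know
        target="diagnosis",                   # hypothetical column used for signal metrics
    )
```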

update_parameters(table_name: str, **kwargs) → None

Update specific parameters for the table while preserving other existing parameters. Only updates the parameters that are provided, keeping existing values for others.

Parameters:

table_name – The name of the table.
**kwargs – The parameters to update. Only parameters that are provided will be updated. See set_parameters for the full list of available parameters.
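The "update only what is provided" behavior described above can be pictured with plain dictionaries. This is a conceptual model under that description only, not the library's implementation; merge_parameters and the parameter values are hypothetical.

```python
from typing import Any


def merge_parameters(existing: dict[str, Any], **updates: Any) -> dict[str, Any]:
    """Return a copy of `existing` with only the provided keys overwritten."""
    merged = dict(existing)  # copy, so the original mapping is left untouched
    merged.update(updates)   # keys that are not passed keep their old values
    return merged


current = {"k": 20, "ncp": 5, "target": "diagnosis"}
updated = merge_parameters(current, k=10)
print(updated)  # {'k': 10, 'ncp': 5, 'target': 'diagnosis'}
```

On a real runner, the equivalent call would be runner.update_parameters(table_name, k=10).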

delete_parameters(table_name: str, parameters_names: list[str] | None = None)

Delete parameters from the config.

Parameters:

table_name – The name of the table.
parameters_names – The names of the parameters to delete. If None, all parameters will be deleted.

delete_link(parent_table_name: str, child_table_name: str)

Delete a link from the config.

Parameters:

parent_table_name – The name of the parent table.
child_table_name – The name of the child table.

delete_table(table_name: str)

Delete a table from the config.

Parameters:

table_name – The name of the table.

get_yaml(path: str | None = None)

Get the YAML config.

Parameters:

path – The path to the YAML file. If None, the default config will be returned.

run(jobs_to_run: list[JobKind] = [JobKind.standard, JobKind.signal_metrics, JobKind.privacy_metrics, JobKind.report])

get_status(job_name: JobKind)

Get the status of a job by name.

Parameters:

job_name – The name of the job to get.

get_specific_result(table_name: str, job_name: JobKind, result: Results = Results.SHUFFLED) → dict | DataFrame | str | list[dict]

Download a file from the results.

Parameters:

table_name – The name of the table to search for.
job_name – The name of the job to search for.
result – The result to search for.

Returns:

A pandas DataFrame, a dictionary, or a list of dictionaries, depending on the result type.

Return type:

TypeResults

get_all_results()

Get all results.

Returns:

dict – A dictionary with the results of each job on every table.
Each job is a dictionary with the table name as key and the results as value.
The results are a dictionary with the result name as key and the data as value.
The data can be a pandas DataFrame or a dictionary depending on the result type.
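The nesting described above (job, then table, then result name) can be walked with three loops. The sketch below runs on a hand-written stand-in dictionary; the keys and values in example are made up for illustration, and the real key types returned by get_all_results may differ (for example enum members rather than strings), which is why keys are passed through str().

```python
from typing import Any


def summarize_results(all_results: dict[Any, dict[Any, dict[Any, Any]]]) -> list[tuple[str, str, str, str]]:
    """Flatten job -> table -> result name -> data into one row per result."""
    rows = []
    for job, tables in all_results.items():
        for table_name, results in tables.items():
            for result_name, data in results.items():
                # Record the Python type of each payload (DataFrame, dict, ...).
                rows.append(
                    (str(job), str(table_name), str(result_name), type(data).__name__)
                )
    return rows


# Hand-written stand-in with the same nesting; not real engine output.
example = {
    "standard": {"patient": {"shuffled": [{"age": 42}]}},
    "privacy_metrics": {"patient": {"privacy_metrics": {"some_metric": 0.1}}},
}
for row in summarize_results(example):
    print(row)
```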

download_report(path: str | None = None)

Download the report.

Parameters:

path – The path to save the report.

print_parameters(table_name: str | None = None) → None

Print the parameters for a table.

Parameters:

table_name – The name of the table. If None, all parameters will be printed.

kill()

Method not implemented yet.

shuffled(table_name: str) → DataFrame

Get the shuffled data.

Parameters:

table_name – The name of the table to get the shuffled data from.

Returns:

The shuffled data as a pandas DataFrame.

Return type:

pd.DataFrame

sensitive_unshuffled(table_name: str) → DataFrame

Get the unshuffled data. This is sensitive data and should be used with caution.

Parameters:

table_name – The name of the table to get the unshuffled data from.

Returns:

The unshuffled data as a pandas DataFrame.

Return type:

pd.DataFrame

privacy_metrics(table_name: str) → list[dict]

Get the privacy metrics.

Parameters:

table_name – The name of the table to get the privacy metrics from.

Returns:

The privacy metrics as a list of dictionaries.

Return type:

list[dict]

signal_metrics(table_name: str) → list[dict]

Get the signal metrics.

Parameters:

table_name – The name of the table to get the signal metrics from.

Returns:

The signal metrics as a list of dictionaries.

Return type:

list[dict]
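When several tables are configured, the two metric accessors can be gathered in one pass. The sketch below is one possible helper; collect_metrics and the "privacy"/"signal" keys of its output are hypothetical names, and the content of each metric dictionary is whatever the engine returns.

```python
from typing import Any


def collect_metrics(runner: Any, table_names: list[str]) -> dict[str, dict[str, list[dict]]]:
    """Gather privacy and signal metrics for each table into one mapping.

    `runner` is expected to be an avatars.runner.Runner whose jobs have run.
    """
    return {
        name: {
            "privacy": runner.privacy_metrics(name),  # list of dictionaries
            "signal": runner.signal_metrics(name),    # list of dictionaries
        }
        for name in table_names
    }
```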

projections(table_name: str, job_name: JobKind = JobKind.standard) → tuple[DataFrame, DataFrame]

Get the projections.

Parameters:

table_name – The name of the table to get the projections from.
job_name – The name of the job to get the projections from. Defaults to the avatarization job (JobKind.standard).

Returns:

The projections as a tuple of two pandas DataFrames.

Return type:

tuple[pd.DataFrame, pd.DataFrame]

from_yaml(yaml_path: str) → None

Create a Runner object from a YAML configuration.

Parameters:

yaml_path – The path to the YAML file to transform.

Returns:

A Runner object configured based on the YAML content.

Return type:

Runner