Runner

The Runner orchestrates an avatarization workflow: data upload, parameter configuration, job submission, status polling, and result retrieval. It encapsulates a configuration object (avatar_yaml.Config) that mirrors the YAML structure used by batch operations.

Responsibilities

  • Collect and upload source (and optional avatar) tables (add_table)

  • Advise or customize parameters (advise_parameters / set_parameters)

  • Create links between tables, needed only for multi-table setups (add_link)

  • Launch jobs individually or end-to-end (run and specialized methods)

  • Download results (get_all_results and specialized methods)

Design notes

The Runner is stateful: it remembers tables, generated parameters, created jobs and downloaded results. You usually create one per anonymization project (“set”) via manager.create_runner(set_name=...). Internally it delegates network calls to the shared authenticated ApiClient.

Minimal flow

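# Assumes `manager` is an already-authenticated manager object.
runner = manager.create_runner("demo")
runner.add_table("wbcd", "fixtures/wbcd.csv")
runner.advise_parameters()              # optional: fill in server recommendations
runner.update_parameters("wbcd", k=15)  # override k, keep the advised values
runner.run()                            # runs the full pipeline
runner.get_all_results()                # downloads all results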

Detailed reference

class avatars.runner.Runner(api_client: ApiClient, display_name: str, seed: int | None = None, max_distribution_plots: int | None = None)

Bases: object

add_annotations(annotations: dict[str, str]) -> None

add_table(table_name: str, data: str | DataFrame, primary_key: str | None = None, foreign_keys: list | None = None, time_series_time: str | None = None, types: dict[str, ColumnType] = {}, individual_level: bool | None = None, avatar_data: str | DataFrame | None = None)

Add a table to the config and upload its data to the server.

Parameters:
  • table_name – The name of the table.

  • data – The data to add to the table. Can be a path to a file or a pandas DataFrame.

  • primary_key – The primary key of the table.

  • foreign_keys – Foreign keys of the table.

  • time_series_time – Name of the time column in the table (time series tables only).

  • types – A dictionary of column types with the column name as the key and the type as the value.

  • individual_level – Whether the table is at individual level, i.e. each row corresponds to one individual (e.g. a patient or a customer).

  • avatar_data – The avatar table if there is one. Can be a path to a file or a pandas DataFrame.

advise_parameters(table_name: str | None = None) -> None

Fill in the table parameters with the values recommended by the server.

Parameters:

table_name – The name of the table. If None, all tables will be used.

upload_file(table_name: str, data: str | DataFrame, avatar_data: str | DataFrame | None = None)

Upload a file to the server.

Parameters:
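  • table_name – The name of the table the data belongs to.

  • data – The data to upload. Can be a path to a file or a pandas DataFrame.

  • avatar_data – The avatar table, if there is one. Can be a path to a file or a pandas DataFrame.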

add_link(parent_table_name: str, parent_field: str, child_table_name: str, child_field: str, method: LinkMethod = LinkMethod())

Add a table link to the config.

Parameters:
  • parent_table_name – The name of the parent table.

  • parent_field – The parent link key field (primary key) in the parent table.

  • child_table_name – The name of the child table.

  • child_field – The child link key field (foreign key) in the child table.

  • method – The method to use for linking the tables. Defaults to “linear_sum_assignment”.
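
In a multi-table project, add_table and add_link can be combined as in this minimal sketch, which only uses the signatures documented above. The table names, column names, and file paths are hypothetical, and the sketch assumes that foreign_keys takes a list of column names.

```python
def configure_patients_and_visits(runner, patients_path, visits_path):
    """Register a hypothetical two-table dataset on an existing Runner.

    `runner` is any object exposing the Runner methods add_table and add_link.
    Table names, column names, and paths are illustrative placeholders.
    """
    # Parent table: assumed to hold one row per patient, so it is individual level.
    runner.add_table(
        "patients",
        patients_path,
        primary_key="patient_id",
        individual_level=True,
    )
    # Child table: several visits per patient, each referencing the parent key.
    # Assumption: foreign_keys is a list of column names.
    runner.add_table(
        "visits",
        visits_path,
        primary_key="visit_id",
        foreign_keys=["patient_id"],
        individual_level=False,
    )
    # Link the parent primary key to the child foreign key, keeping the default method.
    runner.add_link(
        parent_table_name="patients",
        parent_field="patient_id",
        child_table_name="visits",
        child_field="patient_id",
    )
```

Call it on a Runner obtained from manager.create_runner(...), for example configure_patients_and_visits(runner, "data/patients.csv", "data/visits.csv"), where both paths are placeholders.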

set_parameters(table_name: str, k: int | None = None, ncp: int | None = None, use_categorical_reduction: bool | None = None, column_weights: dict[str, float] | None = None, exclude_variable_names: list[str] | None = None, exclude_replacement_strategy: ExcludeVariablesMethod | None = None, imputation_method: ImputeMethod | None = None, imputation_k: int | None = None, imputation_training_fraction: float | None = None, imputation_return_data_imputed: bool | None = None, dp_epsilon: float | None = None, dp_preprocess_budget_ratio: float | None = None, time_series_nf: int | None = None, time_series_projection_type: ProjectionType | None = None, time_series_nb_points: int | None = None, time_series_method: AlignmentMethod | None = None, known_variables: list[str] | None = None, target: str | None = None, quantile_threshold: int | None = None, categorical_hidden_rate_variables: list[str] | None = None)

Set the parameters for a given table.

This overwrites any existing parameters for the table, including parameters set using advise_parameters().

Parameters:
  • table_name – The name of the table.

  • k – Number of nearest neighbors to consider for KNN-based methods.

  • ncp – Number of dimensions to consider for the KNN algorithm.

  • use_categorical_reduction – Whether to transform categorical variables into a latent numerical space before projection.

  • column_weights – Dictionary mapping column names to their respective weights, indicating the importance of each variable during the projection process.

  • exclude_variable_names – List of variable names to exclude from the projection.

  • exclude_replacement_strategy – Strategy for replacing excluded variables. Options: ExcludeVariablesMethod.ROW_ORDER, ExcludeVariablesMethod.COORDINATE_SIMILARITY.

  • imputation_method – Method for imputing missing values. Options: ImputeMethod.KNN, ImputeMethod.MODE, ImputeMethod.MEDIAN, ImputeMethod.MEAN, ImputeMethod.FAST_KNN.

  • imputation_k – Number of neighbors to use for imputation if the method is KNN or FAST_KNN.

  • imputation_training_fraction – Fraction of the dataset to use for training the imputation model when using KNN or FAST_KNN.

  • imputation_return_data_imputed – Whether to return the data with imputed values.

  • dp_epsilon – Epsilon value for differential privacy.

  • dp_preprocess_budget_ratio – Ratio of the privacy budget to allocate to preprocessing when using differential privacy avatarization.

  • time_series_nf – In time series context, number of degrees of freedom to retain in time series projections.

  • time_series_projection_type – In time series context, type of projection for time series. Options: ProjectionType.FPCA (default) or ProjectionType.FLATTEN.

  • time_series_method – In time series context, method for aligning series. Options: AlignmentMethod.SPECIFIED, AlignmentMethod.MAX, AlignmentMethod.MIN, AlignmentMethod.MEAN.

  • time_series_nb_points – In time series context, number of points to generate for time series.

  • known_variables – List of known variables to be used for privacy metrics. These are variables that could be easily known by an attacker.

  • target – Target variable to predict, used for signal metrics.

update_parameters(table_name: str, **kwargs) -> None

Update only the parameters provided for the table, keeping the existing values of all the others.

Parameters:
  • table_name – The name of the table.

  • **kwargs – The parameters to update. Only parameters that are provided will be updated. See set_parameters for the full list of available parameters.

delete_parameters(table_name: str, parameters_names: list[str] | None = None)

Delete parameters from the config.

Parameters:
  • table_name – The name of the table.

  • parameters_names – The names of the parameters to delete. If None, all parameters will be deleted.
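
The overwrite, merge, and delete behaviours of set_parameters, update_parameters, and delete_parameters can be modelled with a plain dictionary. This is a hypothetical sketch of the documented semantics, not the library's implementation; set_params, update_params, delete_params, and the store dictionary are made-up names.

```python
def set_params(store, table_name, **params):
    """Model of set_parameters: replace everything stored for the table."""
    store[table_name] = dict(params)


def update_params(store, table_name, **params):
    """Model of update_parameters: change only the keys given, keep the rest."""
    store.setdefault(table_name, {}).update(params)


def delete_params(store, table_name, parameters_names=None):
    """Model of delete_parameters: drop the named keys, or all of them when None."""
    if parameters_names is None:
        store.pop(table_name, None)
        return
    for name in parameters_names:
        store.get(table_name, {}).pop(name, None)


store = {}
set_params(store, "wbcd", k=20, ncp=5)  # stand-in for advised values
update_params(store, "wbcd", k=15)      # store["wbcd"] == {"k": 15, "ncp": 5}
set_params(store, "wbcd", k=15)         # store["wbcd"] == {"k": 15}; ncp is gone
```

With a real Runner, calling runner.advise_parameters("wbcd") and then runner.update_parameters("wbcd", k=15) keeps the other advised values, whereas runner.set_parameters("wbcd", k=15) replaces them.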

delete_link(parent_table_name: str, child_table_name: str)

Delete a link from the config.

Parameters:
  • parent_table_name – The name of the parent table.

  • child_table_name – The name of the child table.

delete_table(table_name: str)

Delete a table from the config.

Parameters:

table_name – The name of the table.

get_yaml(path: str | None = None)

Get the YAML configuration.

Parameters:

path – The path to the YAML file. If None, the default config will be returned.

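run(jobs_to_run: list[JobKind] = [JobKind.standard, JobKind.signal_metrics, JobKind.privacy_metrics, JobKind.report])

Run the listed jobs. The default list runs the full pipeline: the standard avatarization job, signal metrics, privacy metrics, and the report.

Parameters:

jobs_to_run – The jobs to run.
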
get_status(job_name: JobKind)

Get the status of a job.

Parameters:

job_name – The job whose status to get.
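
The overview lists status polling as part of the workflow. If you need to wait on a job yourself, a generic loop such as the one below can wrap get_status. It is a sketch: this page does not document what get_status returns, so the caller supplies the is_done predicate, and the wait_until name, interval, and timeout values are arbitrary choices.

```python
import time


def wait_until(fetch_status, is_done, interval=2.0, timeout=600.0):
    """Call fetch_status() until is_done(status) is true, then return that status.

    Raises TimeoutError if the condition is not met within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        if is_done(status):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"status not final after {timeout} s: {status!r}")
        time.sleep(interval)
```

With a Runner this might look like wait_until(lambda: runner.get_status(JobKind.standard), is_done=job_finished), where job_finished is a hypothetical predicate written against whatever get_status actually returns.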

get_specific_result_urls(job_name: JobKind, result: Results = Results.SHUFFLED) -> list[str]

get_specific_result(table_name: str, job_name: JobKind, result: Results = Results.SHUFFLED) -> dict | DataFrame | str | list[dict[str, Any]] | None | HTML

get_all_results()

Get all results.

Returns:

dict – A dictionary with the results of each job on every table, keyed by job. Each job entry maps table names to that table's results, and each table's results map result names to data. The data can be a pandas DataFrame or a dictionary, depending on the result type.
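
Given the nested layout of this return value (job, then table, then result name), a small generator can flatten it for logging or saving. This sketch relies only on that documented nesting; iter_results is a hypothetical helper, and the sample dictionary uses made-up keys, with a string standing in for a DataFrame.

```python
from typing import Any, Iterator


def iter_results(results: dict) -> Iterator[tuple[Any, Any, Any, Any]]:
    """Yield (job, table, result_name, data) for every entry of the nested dict."""
    for job, tables in results.items():
        for table, table_results in tables.items():
            for result_name, data in table_results.items():
                yield job, table, result_name, data


# Hypothetical shape only; real keys come from runner.get_all_results().
sample = {
    "standard": {"wbcd": {"shuffled": "<DataFrame placeholder>"}},
    "privacy_metrics": {"wbcd": {"metrics": {"score": 0.9}}},
}
for job, table, name, data in iter_results(sample):
    print(job, table, name, type(data).__name__)
```

On a real Runner, the loop would iterate over iter_results(runner.get_all_results()) instead of the sample.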

download_report(path: str | None = None)

Download the report.

Parameters:

path – The path to save the report.

print_parameters(table_name: str | None = None) -> None

Print the parameters for a table.

Parameters:

table_name – The name of the table. If None, all parameters will be printed.

kill()

Method not implemented yet.

shuffled(table_name: str) -> DataFrame

Get the shuffled data.

Parameters:

table_name – The name of the table to get the shuffled data from.

Returns:

The shuffled data as a pandas DataFrame.

Return type:

pd.DataFrame

sensitive_unshuffled(table_name: str) -> DataFrame

Get the unshuffled data. This is sensitive data and should be used with caution.

Parameters:

table_name – The name of the table to get the unshuffled data from.

Returns:

The unshuffled data as a pandas DataFrame.

Return type:

pd.DataFrame

privacy_metrics(table_name: str) -> list[dict]

Get the privacy metrics.

Parameters:

table_name – The name of the table to get the privacy metrics from.

Returns:

The privacy metrics as a list of dictionaries.

Return type:

list[dict]

signal_metrics(table_name: str) -> list[dict]

Get the signal metrics.

Parameters:

table_name – The name of the table to get the signal metrics from.

Returns:

The signal metrics as a list of dictionaries.

Return type:

list[dict]

render_plot(table_name: str, plot_kind: PlotKind, open_in_browser: bool = False)

Render a plot for a given table. The different plot kinds are defined in the PlotKind enum.

Parameters:
  • table_name – The name of the table to get the plot from.

  • plot_kind – The kind of plot to render.

  • open_in_browser – Whether to save the plot to a file and open it in a browser.

projections(table_name: str) -> tuple[DataFrame, DataFrame]

Get the projections.

Parameters:

table_name – The name of the table to get the projections from.

Returns:

The projections as a tuple of two pandas DataFrames.

Return type:

tuple[pd.DataFrame, pd.DataFrame]

table_summary(table_name: str) -> DataFrame

Get the table summary.

Parameters:

table_name – The name of the table to get the summary from.

Returns:

The table summary as a pandas DataFrame.

Return type:

pd.DataFrame

from_yaml(yaml_path: str) -> None

Configure the Runner from a YAML configuration file.

Parameters:

yaml_path – The path to the YAML file to load.

Returns:

None. The configuration read from the YAML content is applied to the current Runner.

Return type:

None