Runner

The Runner orchestrates an avatarization workflow: data upload, parameter configuration, job submission, status polling, and result retrieval. It encapsulates a configuration object (avatar_yaml.Config) that mirrors the YAML structure used by batch operations.

Responsibilities

Collect and upload source (and optional avatar) tables (add_table)
Advise or customize parameters (advise_parameters / set_parameters)
Create links between tables, needed only for multitable projects (add_link)
Launch jobs individually or end-to-end (run and specialized methods)
Download results (get_all_results and specialized methods)

Design notes

The Runner is stateful: it remembers tables, generated parameters, created jobs, and downloaded results. You usually create one per anonymization project (“set”) via manager.create_runner(set_name=...). Internally, it delegates network calls to the shared authenticated ApiClient.

Minimal flow

runner = manager.create_runner("demo")
runner.add_table("wbcd", "fixtures/wbcd.csv")
runner.advise_parameters()         # optional
runner.set_parameters("wbcd", k=15)
runner.run()                       # runs the full pipeline
runner.get_all_results()          # downloads all results

Detailed reference

class avatars.runner.Runner(api_client: ApiClient, display_name: str, seed: int | None = None, max_distribution_plots: int | None = None)

Bases: object

add_annotations(annotations: dict[str, str]) → None

add_table(table_name: str, data: str | DataFrame, primary_key: str | None = None, foreign_keys: list | None = None, time_series_time: str | None = None, types: dict[str, ColumnType] = {}, individual_level: bool | None = None, avatar_data: str | DataFrame | None = None)

Add a table to the config and upload the data to the server.

Parameters:

table_name – The name of the table.
data – The data to add to the table. Can be a path to a file or a pandas DataFrame.
primary_key – The primary key of the table.
foreign_keys – Foreign keys of the table.
time_series_time – Name of the time column in the table (time series case).
types – A dictionary of column types with the column name as the key and the type as the value.
individual_level – Whether the table is at the individual level. An individual-level table is one where each row corresponds to an individual (e.g. a patient or a customer).
avatar_data – The avatar table if there is one. Can be a path to a file or a pandas DataFrame.

advise_parameters(table_name: str | None = None) → None

Fill the parameter set with the server's recommendations.

Parameters: table_name – The name of the table. If None, all tables will be used.

upload_file(table_name: str, data: str | DataFrame, avatar_data: str | DataFrame | None = None)

Upload a file to the server.

Parameters:

table_name – The name of the table.
data – The data to upload. Can be a path to a file or a pandas DataFrame.
avatar_data – The avatar table if there is one. Can be a path to a file or a pandas DataFrame.

add_link(parent_table_name: str, parent_field: str, child_table_name: str, child_field: str, method: LinkMethod = LinkMethod())

Add a table link to the config.

Parameters:

parent_table_name – The name of the parent table.
child_table_name – The name of the child table.
parent_field – The parent link key field (primary key) in the parent table.
child_field – The child link key field (foreign key) in the child table.
method – The method to use for linking the tables. Defaults to “linear_sum_assignment”.
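
For a multitable project, the calls documented above are typically made in order: register the parent table with its primary key, register the child table with its foreign key, then link the two. The sketch below wraps that sequence in a helper that accepts any object exposing add_table and add_link with the signatures shown in this reference. The table names, column names, file paths, and the RecordingRunner stand-in are all hypothetical; the stand-in exists only so the example runs without a server, and in practice you would pass the Runner returned by manager.create_runner.

```python
from typing import Any


class RecordingRunner:
    """Hypothetical stand-in that records calls instead of contacting a server."""

    def __init__(self) -> None:
        self.calls: list[tuple[str, dict[str, Any]]] = []

    def add_table(self, table_name: str, data: Any, **options: Any) -> None:
        self.calls.append(
            ("add_table", {"table_name": table_name, "data": data, **options})
        )

    def add_link(
        self,
        parent_table_name: str,
        parent_field: str,
        child_table_name: str,
        child_field: str,
    ) -> None:
        self.calls.append(
            (
                "add_link",
                {
                    "parent_table_name": parent_table_name,
                    "parent_field": parent_field,
                    "child_table_name": child_table_name,
                    "child_field": child_field,
                },
            )
        )


def register_parent_child(runner: Any) -> None:
    """Register two related tables, then declare the link between them."""
    # Parent table: one row per individual, identified by its primary key.
    runner.add_table(
        "patient",
        "fixtures/patient.csv",  # hypothetical path
        primary_key="patient_id",
        individual_level=True,
    )
    # Child table: several rows may point at the same parent row.
    runner.add_table(
        "visit",
        "fixtures/visit.csv",  # hypothetical path
        primary_key="visit_id",
        foreign_keys=["patient_id"],
        individual_level=False,
    )
    # The link pairs the parent's primary key with the child's foreign key.
    runner.add_link(
        parent_table_name="patient",
        parent_field="patient_id",
        child_table_name="visit",
        child_field="patient_id",
    )


recorder = RecordingRunner()
register_parent_child(recorder)
call_order = [name for name, _ in recorder.calls]
```

The method argument of add_link is left at its default here; pass a LinkMethod value explicitly if the default linking method is not appropriate for your data.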

set_parameters(table_name: str, k: int | None = None, ncp: int | None = None, use_categorical_reduction: bool | None = None, column_weights: dict[str, float] | None = None, exclude_variable_names: list[str] | None = None, exclude_replacement_strategy: ExcludeVariablesMethod | None = None, imputation_method: ImputeMethod | None = None, imputation_k: int | None = None, imputation_training_fraction: float | None = None, imputation_return_data_imputed: bool | None = None, dp_epsilon: float | None = None, dp_preprocess_budget_ratio: float | None = None, time_series_nf: int | None = None, time_series_projection_type: ProjectionType | None = None, time_series_nb_points: int | None = None, time_series_method: AlignmentMethod | None = None, known_variables: list[str] | None = None, target: str | None = None, quantile_threshold: int | None = None, categorical_hidden_rate_variables: list[str] | None = None)

Set the parameters for a given table.

This will overwrite any existing parameters for the table, including parameters set using advise_parameters().

Parameters:

table_name – The name of the table.
k – Number of nearest neighbors to consider for KNN-based methods.
ncp – Number of dimensions to consider for the KNN algorithm.
use_categorical_reduction – Whether to transform categorical variables into a latent numerical space before projection.
column_weights – Dictionary mapping column names to their respective weights, indicating the importance of each variable during the projection process.
exclude_variable_names – List of variable names to exclude from the projection.
exclude_replacement_strategy (ExcludeVariablesMethod, optional) – Strategy for replacing excluded variables. Options: ROW_ORDER, COORDINATE_SIMILARITY.
imputation_method – Method for imputing missing values. Options: ImputeMethod.KNN, ImputeMethod.MODE, ImputeMethod.MEDIAN, ImputeMethod.MEAN, ImputeMethod.FAST_KNN.
imputation_k – Number of neighbors to use for imputation if the method is KNN or FAST_KNN.
imputation_training_fraction – Fraction of the dataset to use for training the imputation model when using KNN or FAST_KNN.
imputation_return_data_imputed – Whether to return the data with imputed values.
dp_epsilon – Epsilon value for differential privacy.
dp_preprocess_budget_ratio – Budget ratio to allocate when using differential privacy avatarization.
time_series_nf – In time series context, number of degrees of freedom to retain in time series projections.
time_series_projection_type – In time series context, type of projection for time series. Options: ProjectionType.FCPA (default) or ProjectionType.FLATTEN.
time_series_method – In time series context, method for aligning series. Options: AlignmentMethod.SPECIFIED, AlignmentMethod.MAX, AlignmentMethod.MIN, AlignmentMethod.MEAN.
time_series_nb_points – In time series context, number of points to generate for time series.
known_variables – List of known variables to be used for privacy metrics. These are variables that could be easily known by an attacker.
target – Target variable to predict, used for signal metrics.

update_parameters(table_name: str, **kwargs) → None

Update specific parameters for the table while preserving other existing parameters. Only updates the parameters that are provided, keeping existing values for others.

Parameters:

table_name – The name of the table.
**kwargs – The parameters to update. Only parameters that are provided will be updated. See set_parameters for the full list of available parameters.
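
The difference between set_parameters (replace everything) and update_parameters (change only what is provided) can be modelled with plain dictionaries. The snippet below is a sketch of the documented semantics, not the library's implementation; the helper names set_params and update_params and the parameter values are illustrative assumptions.

```python
def set_params(store: dict[str, dict], table_name: str, **params) -> None:
    """Model of set_parameters: the table's previous parameters are discarded."""
    store[table_name] = dict(params)


def update_params(store: dict[str, dict], table_name: str, **params) -> None:
    """Model of update_parameters: only the provided parameters change."""
    store.setdefault(table_name, {}).update(params)


store: dict[str, dict] = {}

set_params(store, "wbcd", k=15, ncp=5)
update_params(store, "wbcd", k=20)  # ncp keeps its earlier value
after_update = dict(store["wbcd"])

set_params(store, "wbcd", k=10)  # ncp is dropped along with everything else
after_set = dict(store["wbcd"])

print(after_update)
print(after_set)
```

In practice this means update_parameters is the safer choice for adjusting one value after advise_parameters, while set_parameters is appropriate when you want to start from a clean parameter set.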

delete_parameters(table_name: str, parameters_names: list[str] | None = None)

Delete parameters from the config.

Parameters:

table_name – The name of the table.
parameters_names – The names of the parameters to delete. If None, all parameters will be deleted.

delete_link(parent_table_name: str, child_table_name: str)

Delete a link from the config.

Parameters:

parent_table_name – The name of the parent table.
child_table_name – The name of the child table.

delete_table(table_name: str)

Delete a table from the config.

Parameters: table_name – The name of the table.

get_yaml(path: str | None = None)

Get the YAML config.

Parameters: path – The path to the YAML file. If None, the default config will be returned.

run(jobs_to_run: list[JobKind] = [JobKind.standard, JobKind.signal_metrics, JobKind.privacy_metrics, JobKind.report])

get_status(job_name: JobKind)

Get the status of a job by name.

Parameters: job_name – The name of the job to get.

get_specific_result_urls(job_name: JobKind, result: Results = Results.SHUFFLED) → list[str]

get_specific_result(table_name: str, job_name: JobKind, result: Results = Results.SHUFFLED) → dict | DataFrame | str | list[dict[str, Any]] | None | HTML

get_all_results()

Get all results.

Returns:

dict – A dictionary with the results of each job on every table.
Each job is a dictionary with the table name as key and the results as value.
The results are a dictionary with the result name as key and the data as value.
The data can be a pandas DataFrame or a dictionary depending on the result type.
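
The nesting described above (job, then table, then result name) can be traversed with three loops. The sketch below uses a hand-written dictionary in place of a real return value: the job, table, and result names in it are placeholders rather than the keys the library actually produces, and real entries may hold pandas DataFrames instead of the plain lists and dictionaries used here.

```python
from typing import Any


def summarize_results(
    all_results: dict[str, dict[str, dict[str, Any]]],
) -> list[tuple[str, str, str, str]]:
    """Flatten job -> table -> result name -> data into (job, table, result, type) rows."""
    rows = []
    for job_name, tables in all_results.items():
        for table_name, results in tables.items():
            for result_name, data in results.items():
                rows.append((job_name, table_name, result_name, type(data).__name__))
    return rows


# Placeholder structure standing in for the value returned by get_all_results().
sample = {
    "standard": {"wbcd": {"shuffled": [{"age": 41}], "table_summary": {"rows": 1}}},
    "privacy_metrics": {"wbcd": {"metrics": [{"score": 0.9}]}},
}

summary = summarize_results(sample)
for row in summary:
    print(row)
```

Checking the type of each entry before using it is a reasonable precaution, since the result type varies with the kind of result.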

download_report(path: str | None = None)

Download the report.

Parameters: path – The path to save the report.

print_parameters(table_name: str | None = None) → None

Print the parameters for a table.

Parameters: table_name – The name of the table. If None, all parameters will be printed.

kill()

Method not implemented yet.

shuffled(table_name: str) → DataFrame

Get the shuffled data.

Parameters: table_name – The name of the table to get the shuffled data from.
Returns: The shuffled data as a pandas DataFrame.
Return type: pd.DataFrame

sensitive_unshuffled(table_name: str) → DataFrame

Get the unshuffled data. This is sensitive data and should be used with caution.

Parameters: table_name – The name of the table to get the unshuffled data from.
Returns: The unshuffled data as a pandas DataFrame.
Return type: pd.DataFrame

privacy_metrics(table_name: str) → list[dict]

Get the privacy metrics.

Parameters: table_name – The name of the table to get the privacy metrics from.
Returns: The privacy metrics as a list of dictionaries.
Return type: list[dict]

signal_metrics(table_name: str) → list[dict]

Get the signal metrics.

Parameters: table_name – The name of the table to get the signal metrics from.
Returns: The signal metrics as a list of dictionaries.
Return type: list[dict]

render_plot(table_name: str, plot_kind: PlotKind, open_in_browser: bool = False)

Render a plot for a given table. The different plot kinds are defined in the PlotKind enum.

Parameters:

table_name – The name of the table to get the plot from.
plot_kind – The kind of plot to render.
open_in_browser – Whether to save the plot to a file and open it in a browser.

projections(table_name: str) → tuple[DataFrame, DataFrame]

Get the projections.

Parameters: table_name – The name of the table to get the projections from.
Returns: The projections as a tuple of two pandas DataFrames.
Return type: tuple[pd.DataFrame, pd.DataFrame]

table_summary(table_name: str) → DataFrame

Get the table summary.

Parameters: table_name – The name of the table to get the summary from.
Returns: The table summary as a pandas DataFrame.
Return type: pd.DataFrame

from_yaml(yaml_path: str) → None

Create a Runner object from a YAML configuration.

Parameters: yaml_path – The path to the YAML file to transform.
Returns: A Runner object configured based on the YAML content.
Return type: Runner