Runner
The Runner orchestrates an avatarization workflow: data upload, parameter
configuration, job submission, status polling, and result retrieval.
It encapsulates a configuration object (avatar_yaml.Config) that mirrors the
YAML structure used by batch operations.
Responsibilities
- Collect and upload source (and optional avatar) tables (add_table)
- Advise or customize parameters (advise_parameters / set_parameters)
- Create links between tables, only needed for multitable (add_link)
- Launch jobs individually or end-to-end (run and specialized methods)
- Download results (get_all_results and specialized methods)
Design notes
The Runner is stateful: it remembers tables, generated parameters, created jobs, and
downloaded results. You usually create one per anonymization project (“set”) via
manager.create_runner(set_name=...). Internally it delegates network calls to the
shared authenticated ApiClient.
Minimal flow
runner = manager.create_runner("demo")
runner.add_table("wbcd", "fixtures/wbcd.csv")
runner.advise_parameters()  # optional: fill parameters from the server recommendation
runner.set_parameters("wbcd", k=15)  # overwrites any advised parameters for "wbcd"
runner.run() # runs the full pipeline
runner.get_all_results() # downloads all results
Detailed reference
- class avatars.runner.Runner(api_client: ApiClient, display_name: str, seed: int | None = None, max_distribution_plots: int | None = None)[source]
Bases: object
- add_annotations(annotations: dict[str, str]) None[source]
- add_table(table_name: str, data: str | DataFrame, primary_key: str | None = None, foreign_keys: list | None = None, time_series_time: str | None = None, types: dict[str, ColumnType] = {}, individual_level: bool | None = None, avatar_data: str | DataFrame | None = None)[source]
Add a table to the config and upload the data to the server.
- Parameters:
table_name – The name of the table.
data – The data to add to the table. Can be a path to a file or a pandas DataFrame.
primary_key – The primary key of the table.
foreign_keys – Foreign keys of the table.
time_series_time – The name of the time column in the table (time series case).
types – A dictionary of column types with the column name as the key and the type as the value.
individual_level – Whether the table is at individual level. An individual-level table is a table where each row corresponds to an individual (e.g. patient, customer).
avatar_data – The avatar table if there is one. Can be a path to a file or a pandas DataFrame.
- advise_parameters(table_name: str | None = None) None[source]
Fill the parameter set with the server's recommendation.
- Parameters:
table_name – The name of the table. If None, all tables will be used.
- upload_file(table_name: str, data: str | DataFrame, avatar_data: str | DataFrame | None = None)[source]
Upload a file to the server.
- Parameters:
table_name – The name of the table.
data – The data to upload. Can be a path to a file or a pandas DataFrame.
avatar_data – The avatar table if there is one. Can be a path to a file or a pandas DataFrame.
- add_link(parent_table_name: str, parent_field: str, child_table_name: str, child_field: str, method: LinkMethod = LinkMethod())[source]
Add a table link to the config.
- Parameters:
parent_table_name – The name of the parent table.
child_table_name – The name of the child table.
parent_field – The parent link key field (primary key) in the parent table.
child_field – The child link key field (foreign key) in the child table.
method – The method to use for linking the tables. Defaults to “linear_sum_assignment”.
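For a multitable project, each table is added first and the link between them is declared afterwards. The sketch below uses a small recording stand-in (RecordingRunner, hypothetical) in place of a real Runner so that it runs without a server. The table names, file paths, and column names are illustrative assumptions. With a runner obtained from manager.create_runner, the same add_table and add_link calls would be made with the arguments documented here.

```python
class RecordingRunner:
    """Hypothetical stand-in that records calls instead of contacting a server."""

    def __init__(self):
        self.calls = []

    def add_table(self, table_name, data, primary_key=None, foreign_keys=None):
        self.calls.append(("add_table", table_name, primary_key, foreign_keys))

    def add_link(self, parent_table_name, parent_field, child_table_name, child_field):
        self.calls.append(
            ("add_link", parent_table_name, parent_field, child_table_name, child_field)
        )


def declare_parent_child(runner):
    # Parent table, identified by its primary key (names are assumptions).
    runner.add_table("patient", "fixtures/patient.csv", primary_key="patient_id")
    # Child table: its own primary key plus a foreign key pointing at the parent.
    runner.add_table(
        "visit",
        "fixtures/visit.csv",
        primary_key="visit_id",
        foreign_keys=["patient_id"],
    )
    # The link pairs the parent's primary key with the child's foreign key.
    runner.add_link(
        parent_table_name="patient",
        parent_field="patient_id",
        child_table_name="visit",
        child_field="patient_id",
    )


runner = RecordingRunner()
declare_parent_child(runner)
```

Passing the link arguments by keyword keeps the parent and child sides from being swapped by accident, since all four are plain strings.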
- set_parameters(table_name: str, k: int | None = None, ncp: int | None = None, use_categorical_reduction: bool | None = None, column_weights: dict[str, float] | None = None, exclude_variable_names: list[str] | None = None, exclude_replacement_strategy: ExcludeVariablesMethod | None = None, imputation_method: ImputeMethod | None = None, imputation_k: int | None = None, imputation_training_fraction: float | None = None, imputation_return_data_imputed: bool | None = None, dp_epsilon: float | None = None, dp_preprocess_budget_ratio: float | None = None, time_series_nf: int | None = None, time_series_projection_type: ProjectionType | None = None, time_series_nb_points: int | None = None, time_series_method: AlignmentMethod | None = None, known_variables: list[str] | None = None, target: str | None = None, quantile_threshold: int | None = None, categorical_hidden_rate_variables: list[str] | None = None)[source]
Set the parameters for a given table.
This will overwrite any existing parameters for the table, including parameters set using advise_parameters().
- Parameters:
table_name – The name of the table.
k – Number of nearest neighbors to consider for KNN-based methods.
ncp – Number of dimensions to consider for the KNN algorithm.
use_categorical_reduction – Whether to transform categorical variables into a latent numerical space before projection.
column_weights – Dictionary mapping column names to their respective weights, indicating the importance of each variable during the projection process.
exclude_variable_names – List of variable names to exclude from the projection.
exclude_replacement_strategy (ExcludeVariablesMethod, optional) – Strategy for replacing excluded variables. Options: ROW_ORDER, COORDINATE_SIMILARITY.
imputation_method – Method for imputing missing values. Options: ImputeMethod.KNN, ImputeMethod.MODE, ImputeMethod.MEDIAN, ImputeMethod.MEAN, ImputeMethod.FAST_KNN.
imputation_k – Number of neighbors to use for imputation if the method is KNN or FAST_KNN.
imputation_training_fraction – Fraction of the dataset to use for training the imputation model when using KNN or FAST_KNN.
imputation_return_data_imputed – Whether to return the data with imputed values.
dp_epsilon – Epsilon value for differential privacy.
dp_preprocess_budget_ratio – Budget ratio to allocate when using differential privacy avatarization.
time_series_nf – In time series context, number of degrees of freedom to retain in time series projections.
time_series_projection_type – In time series context, type of projection for time series. Options: ProjectionType.FCPA (default) or ProjectionType.FLATTEN.
time_series_method – In time series context, method for aligning series. Options: AlignmentMethod.SPECIFIED, AlignmentMethod.MAX, AlignmentMethod.MIN, AlignmentMethod.MEAN.
time_series_nb_points – In time series context, number of points to generate for time series.
known_variables – List of known variables to be used for privacy metrics. These are variables that could be easily known by an attacker.
target – Target variable to predict, used for signal metrics.
- update_parameters(table_name: str, **kwargs) None[source]
Update specific parameters for the table while preserving other existing parameters. Only updates the parameters that are provided, keeping existing values for others.
- Parameters:
table_name – The name of the table.
**kwargs – The parameters to update. Only parameters that are provided will be updated. See set_parameters for the full list of available parameters.
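The difference between overwriting (set_parameters) and updating (update_parameters) can be modelled with plain dictionaries. The following is a minimal sketch of the documented semantics, not the library's implementation: ParameterStore is a hypothetical in-memory stand-in, and only two parameters (k and ncp) are shown.

```python
class ParameterStore:
    """Hypothetical model of per-table parameters (not the real Runner)."""

    def __init__(self):
        self.params = {}

    def set_parameters(self, table_name, **kwargs):
        # Overwrite: anything previously stored for this table is discarded.
        self.params[table_name] = dict(kwargs)

    def update_parameters(self, table_name, **kwargs):
        # Merge: only the provided keys change, the others are preserved.
        self.params.setdefault(table_name, {}).update(kwargs)


store = ParameterStore()
store.set_parameters("wbcd", k=15, ncp=5)

store.update_parameters("wbcd", k=20)
after_update = dict(store.params["wbcd"])  # k changed, ncp preserved

store.set_parameters("wbcd", k=10)
after_set = dict(store.params["wbcd"])  # ncp is gone
```

Under this reading, update_parameters is the safer call after advise_parameters when only one or two values should change, since set_parameters would discard the advised values.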
- delete_parameters(table_name: str, parameters_names: list[str] | None = None)[source]
Delete parameters from the config.
- Parameters:
table_name – The name of the table.
parameters_names – The names of the parameters to delete. If None, all parameters will be deleted.
- delete_link(parent_table_name: str, child_table_name: str)[source]
Delete a link from the config.
- Parameters:
parent_table_name – The name of the parent table.
child_table_name – The name of the child table.
- delete_table(table_name: str)[source]
Delete a table from the config.
- Parameters:
table_name – The name of the table.
- get_yaml(path: str | None = None)[source]
Get the yaml config.
- Parameters:
path – The path to the yaml file. If None, the default config will be returned.
- run(jobs_to_run: list[JobKind] = [JobKind.standard, JobKind.signal_metrics, JobKind.privacy_metrics, JobKind.report])[source]
- get_status(job_name: JobKind)[source]
Get the status of a job by name.
- Parameters:
job_name – The name of the job to get.
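Status polling can be wrapped in a small helper. This is a sketch under assumptions: this page does not list the status values that get_status returns, so the caller supplies its own is_finished predicate. The names wait_for_job and is_finished, and the status strings in the usage lines, are hypothetical.

```python
import time


def wait_for_job(get_status, job_name, is_finished, interval=2.0, timeout=600.0):
    """Poll get_status(job_name) until is_finished(status) is true or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status(job_name)
        if is_finished(status):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_name!r} still {status!r} after {timeout}s")
        time.sleep(interval)


# Usage with a fake status source standing in for runner.get_status.
fake_statuses = iter(["pending", "started", "finished"])
final = wait_for_job(
    get_status=lambda job_name: next(fake_statuses),
    job_name="standard",
    is_finished=lambda status: status == "finished",
    interval=0.0,
)
```

With a real runner, runner.get_status and a JobKind member would be passed in place of the fake status source and the "standard" string.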
- get_specific_result(table_name: str, job_name: JobKind, result: Results = Results.SHUFFLED) dict | DataFrame | str | list[dict[str, Any]] | None | HTML[source]
- get_all_results()[source]
Get all results.
- Returns:
dict – A dictionary with the results of each job on every table.
Each job is a dictionary with the table name as key and the results as value.
The results are a dictionary with the result name as key and the data as value.
The data can be a pandas DataFrame or a dictionary depending on the result type.
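The nested layout described above (job, then table, then result name) can be walked with three loops. In this sketch the dictionary is a hand-written placeholder, not real output: the key names and values are assumptions chosen for illustration.

```python
def list_results(all_results):
    """Return (job, table, result_name, type_name) for every entry in the nested dict."""
    rows = []
    for job_name, tables in all_results.items():
        for table_name, results in tables.items():
            for result_name, data in results.items():
                rows.append((job_name, table_name, result_name, type(data).__name__))
    return rows


# Placeholder shaped like the documented return value (all keys are hypothetical).
example = {
    "standard": {"wbcd": {"shuffled": [{"age": 42}]}},
    "privacy_metrics": {"wbcd": {"metrics": {"metric_a": 1.0}}},
}
rows = list_results(example)
```

Listing the type of each entry first is a cheap way to see which results are DataFrames and which are dictionaries before processing them.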
- download_report(path: str | None = None)[source]
Download the report.
- Parameters:
path – The path to save the report.
- print_parameters(table_name: str | None = None) None[source]
Print the parameters for a table.
- Parameters:
table_name – The name of the table. If None, all parameters will be printed.
- kill()[source]
Method not implemented yet.
- shuffled(table_name: str) DataFrame[source]
Get the shuffled data.
- Parameters:
table_name – The name of the table to get the shuffled data from.
- Returns:
The shuffled data as a pandas DataFrame.
- Return type:
pd.DataFrame
- sensitive_unshuffled(table_name: str) DataFrame[source]
Get the unshuffled data. This is sensitive data and should be used with caution.
- Parameters:
table_name – The name of the table to get the unshuffled data from.
- Returns:
The unshuffled data as a pandas DataFrame.
- Return type:
pd.DataFrame
- privacy_metrics(table_name: str) list[dict][source]
Get the privacy metrics.
- Parameters:
table_name – The name of the table to get the privacy metrics from.
- Returns:
The privacy metrics as a list of dictionaries.
- Return type:
list[dict]
- signal_metrics(table_name: str) list[dict][source]
Get the signal metrics.
- Parameters:
table_name – The name of the table to get the signal metrics from.
- Returns:
The signal metrics as a list of dictionaries.
- Return type:
list[dict]
- render_plot(table_name: str, plot_kind: PlotKind, open_in_browser: bool = False)[source]
Render a plot for a given table. The different plot kinds are defined in the PlotKind enum.
- Parameters:
table_name – The name of the table to get the plot from.
plot_kind – The kind of plot to render.
open_in_browser – Whether to save the plot to a file and open it in a browser.
- projections(table_name: str) tuple[DataFrame, DataFrame][source]
Get the projections.
- Parameters:
table_name – The name of the table to get the projections from.
- Returns:
The projections as a tuple of two pandas DataFrames.
- Return type:
tuple[pd.DataFrame, pd.DataFrame]
- table_summary(table_name: str) DataFrame[source]
Get the table summary.
- Parameters:
table_name – The name of the table to get the summary from.
- Returns:
The table summary as a dataframe.
- Return type:
pd.DataFrame
- from_yaml(yaml_path: str) None[source]
Create a Runner object from a YAML configuration.
- Parameters:
yaml_path – The path to the YAML file to load.
- Returns:
A Runner object configured based on the YAML content.
- Return type:
Runner