Changelog

NEXT RELEASE

  • fix: use specific avatar-yaml version

1.0.4 - 2025/05/20

  • feat: support generating multitable reports

  • feat: add advisor functionality

1.0.3 - 2025/04/30

  • feat: make set_name mandatory in the runner

1.0.2 - 2025/04/29

  • BREAKING: feat: release of the Python client for API 1.0.0 🚀 🥳.

  • feat: new documentation for the Python client.

0.15.0 - 2024/08/26

  • feat: add tutorial on job management

  • feat: add GeolocationNormalizationProcessor

  • chore: remove timeout to avoid re-POST

  • BREAKING: chore: remove all batch processing from the client side

0.14.0 - 2024/08/07

  • BREAKING: remove deprecated persistance_job_id

  • BREAKING: remove deprecated to_categorical_threshold

0.13.0 - 2024/07/24

  • BREAKING: send the total size of the stream at the start of the stream

  • Remove dependency on libmagic

0.12.0 - 2024/07/05

  • BREAKING: refactor: Dataset.columns is required

0.11.0 - 2024/07/01

  • BREAKING: fix dataset upload

0.10.0 - 2024/06/18

  • BREAKING: fix dataset upload

0.9.2 - 2024/06/11

  • feat: retry any kind of network error

0.9.1 - 2024/06/10

  • feat: retry on DNS resolution errors

0.9.0 - 2024/06/06

  • feat: add categorical hidden rate variable to privacy parameters

  • BREAKING refactor: categorical hidden rate is optional in PrivacyMetrics

0.8.0 - 2024/06/05

  • BREAKING feat: add linkage methods to TableLink and make linear sum assignment the default method.

  • BREAKING refactor: remove ExcludeCategoricalParameters and replace it by ExcludeVariablesParameters

0.7.4 - 2024/05/15

  • Add advice for choosing avatarization parameters

  • Speed up projector load and save

  • Remove dataset_id from get_variable_contributions

  • Add size-agnostic, bidirectional Arrow/Parquet streaming utilities

0.7.3 - 2024/04/29

  • Allow passing filetype in datasets.download_dataset and pandas_integration.download_dataframe to change the format of the retrieved data

  • Deprecate datasets.download_dataset_as_stream and datasets.create_dataset_from_stream

  • Deprecate the ‘should_stream’ argument from pandas_integration.upload_dataframe and pandas_integration.download_dataframe

  • Deprecate ‘request’ argument from datasets.create_dataset in favor of ‘source’ argument

  • Add ‘destination’ argument to datasets.download_dataset (see the sketch below)
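
  A minimal sketch of the reworked download call, assuming an authenticated
  client; the import path, URL, dataset id, and filetype literal below are
  illustrative, not confirmed API:

    from avatars.client import ApiClient  # assumed import path

    client = ApiClient(base_url="https://avatar.example.com")  # hypothetical URL
    client.authenticate(username="user", password="secret")    # assumed call

    # 'filetype' selects the format of the retrieved data; 'destination'
    # writes the result to a local file instead of returning it.
    client.datasets.download_dataset(
        "some-dataset-id",           # hypothetical id
        filetype="parquet",          # assumed literal; may be an enum
        destination="data.parquet",
    )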

0.7.2 - 2024/04/12

  • fix: remove retry logic around Job.last_updated_at

0.7.1 - 2024/04/11

  • feat: overhaul client architecture

0.7.0 - 2024/04/05

  • fix: change shuffle multi-table process to return the right dataframe

  • fix: return metric parameter error to user

  • feat: return an error to the user if data contains negative infinity (ninf) values

  • feat: improve multi-table anonymization quality (utility)

  • feat: verify compatibility with server on client init

  • feat: add dataset name in the multitable privacy metrics

  • feat: create privacy geolocation assessment feature

  • refactor: add custom methods for Datasets

  • refactor: move the seed into the avatarization and metrics job parameters to guarantee reproducibility

0.6.2

  • feat: add should_verify_ssl to ApiClient to bypass SSL certificate verification

  • refactor: revert to AvatarizationParameters.dataset_id being required

  • feat: add pydantic constraints to privacy metrics fields

  • feat: add multi-table avatarization and privacy metrics jobs

  • feat: add ‘name’ keyword argument to create_dataset (see the sketch below)
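
  A short sketch combining two of these additions; the import path and call
  shapes below are assumptions, not confirmed by this changelog:

    from avatars.client import ApiClient  # assumed import path

    # should_verify_ssl=False skips certificate verification, e.g. for a
    # server using a self-signed certificate (see also 0.3.1).
    client = ApiClient(
        base_url="https://avatar.example.com",  # hypothetical URL
        should_verify_ssl=False,
    )
    client.authenticate(username="user", password="secret")  # assumed call

    # 'name' is the new keyword argument on create_dataset.
    with open("data.csv", "rb") as source:  # hypothetical local file
        dataset = client.datasets.create_dataset(source, name="my-dataset")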

0.6.1

  • feat: enable parquet format for dataset upload

  • feat: use pydantic v2

  • feat: add InterRecordBoundedCumulatedDifferenceProcessor

  • fix: max file size error message

0.6.0

  • feat: detect potential id columns

  • feat: add created_at, kind to Jobs

  • feat: add time series

0.5.2

  • feat: add InterRecordBoundedRangeDifferenceProcessor

0.5.1

  • fix: update the compatibility mapping following a breaking change

BREAKING CHANGE

  • remove broken endpoint /projections

0.4.0

  • feat: limit the size of nb_days in find_all_jobs_by_user (see the sketch after this list)

  • feat: implement anonymization, metrics and report generation as a batch

  • feat: apply license check only during anonymization, not during upload

  • fix: prevent the user from uploading a dataframe with bool dtype

  • fix: Correctly handle error on missing job

  • fix: standardize metrics in the anonymization report
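
  A hedged sketch of the capped job listing; the client setup and the Job
  attributes printed below are assumptions:

    from avatars.client import ApiClient  # assumed import path

    client = ApiClient(base_url="https://avatar.example.com")  # hypothetical URL
    client.authenticate(username="user", password="secret")    # assumed call

    # nb_days is now validated; overly large look-back windows are
    # rejected (the exact cap is not stated in this changelog).
    for job in client.jobs.find_all_jobs_by_user(nb_days=7):
        print(job.id, job.status)  # assumed Job attributes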

BREAKING CHANGE

  • remove patch parameter from create_dataset

0.3.3

  • Add should_stream parameter to {upload,download}_dataframe and {create,download}_dataset. This should prevent issues with timeouts during upload and download, as well as lessen the load on the server for big files. See the sketch after this list.

  • Add jobs.cancel_job method to cancel a job

  • Add use_categorical_reduction parameter

  • Add maximum password length of 128 characters

  • Add report creation without avatarization job

  • Remove re-raise of JSONDecodeError

  • Add commit hash to generated files

  • Fix: verify that known_variables and target refer to existing variables when launching a privacy metrics job

  • Fix: call analyze_dataset only once in notebooks
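
  A sketch of the streaming upload and job cancellation, assuming the import
  paths and call shapes below:

    import pandas as pd
    from avatars.client import ApiClient  # assumed import path

    client = ApiClient(base_url="https://avatar.example.com")  # hypothetical URL
    client.authenticate(username="user", password="secret")    # assumed call

    df = pd.DataFrame({"age": [32, 45], "city": ["Paris", "Lyon"]})  # toy data

    # should_stream=True uploads in chunks, avoiding timeouts and
    # lessening server load on big files.
    dataset = client.pandas_integration.upload_dataframe(df, should_stream=True)

    # A running job can now be cancelled by id ('job' is hypothetical here):
    # client.jobs.cancel_job(job.id)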

0.3.2

  • catch JSONDecodeError and re-raise with more info

0.3.1

  • add should_verify_ssl to allow usage of self-signed certificate on server side

  • add InterRecordCumulatedDifferenceProcessor

  • add InterRecordRangeDifferenceProcessor

  • improve logging and error handling in avatarization_pipeline to make resuming after a failure easier

0.3.0

BREAKING

  • ReportCreate now takes required avatarization_job_id, signal_job_id, and privacy_job_id parameters (see the sketch after this list)

  • Mark AvatarizationParameters.to_categorical_threshold as deprecated

  • client.jobs.create_avatarization_job no longer computes metrics. Use client.jobs.create_full_avatarization_job instead

  • AvatarizationResult's signal_metrics and privacy_metrics properties are now Optional

  • Verify dataset size on upload. This will prevent you from uploading a dataset that is too big for the server to handle

  • The direct_match_protection privacy metric was renamed to column_direct_match_protection

  • dataset_id from AvatarizationParameters is now required

  • dataset_id was removed from AvatarizationJob, SignalMetricsJob, and PrivacyMetricsJob

  • client.users.get_user now accepts an id rather than a username

  • SignalMetricsParameters.job_id was renamed to persistance_job_id

  • CreateUser no longer takes is_email_confirmed as a parameter

  • Processors are now imported from avatars.processors instead of avatars.processor.{processor_name}

    • Example: from avatars.processors.expected_mean import ExpectedMeanProcessor becomes from avatars.processors import ExpectedMeanProcessor
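
  A sketch of the new ReportCreate shape; the import path, the id values, and
  the report call in the comment are assumptions:

    from avatars.models import ReportCreate  # assumed import path

    report_create = ReportCreate(
        avatarization_job_id="avatarization-job-id",  # hypothetical ids
        signal_job_id="signal-metrics-job-id",
        privacy_job_id="privacy-metrics-job-id",
    )
    # Assumed call shape for generating the report from the three jobs:
    # report = client.reports.create_report(report_create)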

Others

  • feat: add more metrics and graphs to report

  • feat: add client.compatibility.is_client_compatible to verify client-server compatibility

  • feat: enable avatarization without calculating metrics via client.jobs.create_avatarization_job

  • feat: add nb_dimensions property to Dataset

  • feat: add User object

  • feat: use patch in client.datasets.create_dataset to patch dataset columns on upload

  • feat: add correlation_protection_rate, inference_continuous, inference_categorical, row_direct_match_protection and closest_rate privacy metrics

  • feat: add known_variables, target, closest_rate_percentage_threshold, and closest_rate_ratio_threshold to PrivacyMetricsParameters (see the sketch after this list)

  • docs: add multiple versions of the documentation

  • feat: each user now belongs to an organization and gets a new field: organization_id

  • fix: a bug where computing privacy metrics on data with distinct missing values was impossible
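
  A sketch of the extended PrivacyMetricsParameters; the import path, the two
  dataset-id field names, and the threshold values are assumptions:

    from avatars.models import PrivacyMetricsParameters  # assumed import path

    parameters = PrivacyMetricsParameters(
        original_id="original-dataset-id",           # assumed field name
        unshuffled_avatars_id="avatars-dataset-id",  # assumed field name
        known_variables=["age", "city"],        # columns an attacker may know
        target="salary",                        # variable to protect
        closest_rate_percentage_threshold=0.3,  # illustrative values
        closest_rate_ratio_threshold=0.3,
    )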

0.2.2

  • Improve method type hints

  • Update tutorial notebooks with smaller datasets

  • Fix bugs in tutorial notebooks

  • Improve error message when the call to the API times out

  • Add jobs.find_all_jobs_by_user

  • Add two new privacy metrics: direct_match_protection and categorical_hidden_rate

  • Add the DatetimeProcessor

0.2.1

  • Fix a processor taking the wrong number of arguments

  • Make the toolz package a mandatory dependency

  • Fix handling of a target variable equal to zero

0.2.0

  • Drop support for Python 3.8 # BREAKING CHANGE

  • Drop jobs.get_job and jobs.create_job. # BREAKING CHANGE

  • Rename DatasetResponse to Dataset # BREAKING CHANGE

  • Rename client.pandas to client.pandas_integration # BREAKING CHANGE

  • Add separate endpoints to compute metrics using jobs.create_signal_metrics_job and jobs.create_privacy_metrics_job (see the sketch after this list).

  • Add separate endpoints to access metrics jobs using jobs.get_signal_metrics and jobs.get_privacy_metrics

  • Add processors to pre- and post-process your data before and after avatarization for custom use cases. These are accessible under avatars.processors.

  • Handle errors more gracefully

  • Add ExcludeCategoricalParameters to use embedded processor on the server side
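
  A sketch of the split metrics workflow; the *JobCreate argument shapes are
  elided and the client setup is an assumption:

    from avatars.client import ApiClient  # assumed import path

    client = ApiClient(base_url="https://avatar.example.com")  # hypothetical URL
    client.authenticate(username="user", password="secret")    # assumed call

    # Metrics are now computed by dedicated jobs instead of as part of the
    # avatarization job itself; arguments are elided here.
    signal_job = client.jobs.create_signal_metrics_job(...)    # assumed arguments
    privacy_job = client.jobs.create_privacy_metrics_job(...)  # assumed arguments

    # ...and retrieved through the matching endpoints:
    signal_metrics = client.jobs.get_signal_metrics(signal_job.id)
    privacy_metrics = client.jobs.get_privacy_metrics(privacy_job.id)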

0.1.16

  • Add forgotten password endpoint

  • Add reset password endpoint

  • JobParameters becomes AvatarizationParameters

  • Add DCR and NNDR to privacy metrics

0.1.15

  • Handle category dtype

  • Fix dtype casting of datetime columns

  • Add ability to login with email

  • Add filtering options to find_users

  • Avatarizations are now called with create_avatarization_job and AvatarizationJobCreate. create_job and JobCreate are deprecated but still work (see the sketch after this list).

  • dataset_id is now passed to AvatarizationParameters and not AvatarizationJobCreate.

  • Job.dataset_id is deprecated. Use Job.parameters.dataset_id instead.
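
  A sketch of the renamed entry point; the import paths, the 'parameters'
  field name, and the k value are assumptions:

    from avatars.client import ApiClient                                         # assumed
    from avatars.models import AvatarizationJobCreate, AvatarizationParameters   # import paths

    client = ApiClient(base_url="https://avatar.example.com")  # hypothetical URL
    client.authenticate(username="user", password="secret")    # assumed call

    job = client.jobs.create_avatarization_job(
        AvatarizationJobCreate(
            parameters=AvatarizationParameters(  # assumed field name
                dataset_id="some-dataset-id",    # now lives on the parameters,
                k=20,                            # not on AvatarizationJobCreate;
            ),                                   # 'k' is an assumed parameter
        )
    )
    # Job.dataset_id is deprecated; read the id from the parameters instead:
    print(job.parameters.dataset_id)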

BREAKING

  • Remove get_health_config call.

0.1.14

  • Give access to the unshuffled avatars dataset

0.1.13

  • Remove default value for to_categorical_threshold

  • Use logger.info instead of print