Changelog
=========

NEXT RELEASE
------------

- feat: support generating multitable reports
- feat: add advisor functionality

1.0.3 - 2025/04/30
------------------

- feat: make ``set_name`` mandatory in the runner

.. _section-1:

1.0.2 - 2025/04/29
------------------

- BREAKING: feat: release of the Python client for API 1.0.0 🚀 🥳
- feat: new documentation for the Python client

.. _section-2:

0.15.0 - 2024/08/26
-------------------

- feat: add tutorial on job management
- feat: add GeolocationNormalizationProcessor
- chore: remove timeout to avoid re-POST
- BREAKING: chore: remove all batch processing from the client side

.. _section-3:

0.14.0 - 2024/08/07
-------------------

- BREAKING: remove deprecated ``persistance_job_id``
- BREAKING: remove deprecated ``to_categorical_threshold``

.. _section-4:

0.13.0 - 2024/07/24
-------------------

- BREAKING: send the total size of the stream at the start of the stream
- Remove dependency on libmagic

.. _section-5:

0.12.0 - 2024/07/05
-------------------

- BREAKING: refactor: Dataset.columns is required

.. _section-6:

0.11.0 - 2024/07/01
-------------------

- BREAKING: fix dataset upload

.. _section-7:

0.10.0 - 2024/06/18
-------------------

- BREAKING: fix dataset upload

.. _section-8:

0.9.2 - 2024/06/11
------------------

- feat: retry any kind of network error

.. _section-9:

0.9.1 - 2024/06/10
------------------

- feat: retry on DNS resolution errors

.. _section-10:

0.9.0 - 2024/06/06
------------------

- feat: add categorical hidden rate variable to privacy parameters
- BREAKING refactor: categorical hidden rate is optional in
  PrivacyMetrics

.. _section-11:

0.8.0 - 2024/06/05
------------------

- BREAKING feat: add linkage methods to TableLink and make linear sum
  assignment the default method.
- BREAKING refactor: remove ``ExcludeCategoricalParameters`` and replace
  it with ``ExcludeVariablesParameters``

.. _section-12:

0.7.4 - 2024/05/15
------------------

- Add advice for choosing avatarization parameters
- Speed up projector load and save
- Remove ``dataset_id`` from ``get_variable_contributions``
- Add size-agnostic bidirectional Arrow/Parquet streaming utilities

.. _section-13:

0.7.3 - 2024/04/29
------------------

- Allow passing ``filetype`` to ``datasets.download_dataset`` and
  ``pandas_integration.download_dataframe`` to change the format of the
  retrieved data
- Deprecate ``datasets.download_dataset_as_stream`` and
  ``datasets.create_dataset_from_stream``
- Deprecate the ``should_stream`` argument of
  ``pandas_integration.upload_dataframe`` and
  ``pandas_integration.download_dataframe``
- Deprecate the ``request`` argument of ``datasets.create_dataset`` in
  favor of the ``source`` argument
- Add a ``destination`` argument to ``datasets.download_dataset``

.. _section-14:

0.7.2 - 2024/04/12
------------------

- fix: remove retry logic around Job.last_updated_at

.. _section-15:

0.7.1 - 2024/04/11
------------------

- feat: overhaul client architecture

.. _section-16:

0.7.0 - 2024/04/05
------------------

- fix: make the multi-table shuffle process return the correct dataframe
- fix: return metric parameter error to user
- feat: return an error to the user if the data contains negative
  infinity (``-inf``)
- feat: improve multi-table anonymization quality (utility)
- feat: verify compatibility with server on client init
- feat: add dataset name in the multitable privacy metrics
- feat: create privacy geolocation assessment feature
- refactor: add custom methods for Datasets
- refactor: move the seed within the avatarization and metrics job
  parameters to guarantee reproducibility

.. _section-17:

0.6.2
-----

- feat: add ``should_verify_ssl`` to ``ApiClient`` to bypass SSL
  certificate verification
- refactor: revert to AvatarizationParameters.dataset_id being required
- feat: add pydantic constraints to privacy metrics fields
- feat: add multi table avatarization and privacy metrics jobs
- feat: add ``name`` keyword argument to ``create_dataset``

.. _section-18:

0.6.1
-----

- feat: enable parquet format for dataset upload
- feat: use pydantic v2
- feat: add InterRecordBoundedCumulatedDifferenceProcessor
- fix: max file size error message

.. _section-19:

0.6.0
-----

- feat: detect potential id columns
- feat: add ``created_at`` and ``kind`` to Jobs
- feat: add time series

.. _section-20:

0.5.2
-----

- feat: add InterRecordBoundedRangeDifferenceProcessor

.. _section-21:

0.5.1
-----

- fix: compatibility mapping due to breaking change

BREAKING CHANGE
~~~~~~~~~~~~~~~

- remove broken endpoint ``/projections``

.. _section-22:

0.4.0
-----

- feat: Limit the size of ``nb_days`` in ``find_all_jobs_by_user``
- feat: implement anonymization, metrics and report generation as a
  batch
- feat: apply license check only during anonymization, not during upload
- fix: Prevent users from uploading a dataframe with ``bool`` dtype
- fix: Correctly handle error on missing job
- fix: standardize metrics in the anonymization report

.. _breaking-change-1:

BREAKING CHANGE
~~~~~~~~~~~~~~~

- remove ``patch`` parameter from ``create_dataset``

.. _section-23:

0.3.3
-----

- Add ``should_stream`` parameter to ``{upload,download}_dataframe`` and
  ``{create,download}_dataset``. This should prevent issues with
  timeouts during upload and download, as well as lessen the load on the
  server for big files.
- Add ``jobs.cancel_job`` method to cancel a job
- Add ``use_categorical_reduction`` parameter
- Add maximum password length of 128 characters
- Add report creation without avatarization job
- Remove re-raise of JSONDecodeError
- Add commit hash to generated files
- Fix: verify that ``known_variables`` and ``target`` are known when
  launching a privacy metrics job
- Fix: call analyze_dataset only once in notebooks

.. _section-24:

0.3.2
-----

- catch JSONDecodeError and re-raise with more info

.. _section-25:

0.3.1
-----

- add ``should_verify_ssl`` to allow the use of self-signed certificates
  on the server side
- add ``InterRecordCumulatedDifferenceProcessor``
- add ``InterRecordRangeDifferenceProcessor``
- improve logging and error handling in ``avatarization_pipeline`` to
  make it easier to resume after a failure

.. _section-26:

0.3.0
-----

BREAKING
~~~~~~~~

- ``ReportCreate`` now takes required ``avatarization_job_id``,
  ``signal_job_id``, and ``privacy_job_id`` parameters
- Mark ``AvatarizationParameters.to_categorical_threshold`` as
  deprecated
- ``client.jobs.create_avatarization_job`` no longer computes metrics.
  Use ``client.jobs.create_full_avatarization_job`` instead
- ``AvatarizationResult`` now has ``signal_metrics`` and
  ``privacy_metrics`` properties as ``Optional``
- Verify dataset size on upload. This prevents you from uploading a
  dataset that is too big for the server to handle
- The ``direct_match_protection`` privacy metric was renamed to
  ``column_direct_match_protection``
- ``dataset_id`` from ``AvatarizationParameters`` is now required
- ``dataset_id`` was removed from ``AvatarizationJob``,
  ``SignalMetricsJob`` and ``PrivacyMetricsJob``
- ``client.users.get_user`` now accepts an ``id`` rather than a
  ``username``
- ``SignalMetricsParameters.job_id`` was renamed to
  ``persistance_job_id``
- ``CreateUser`` no longer takes ``is_email_confirmed`` as a parameter
- Processors are now imported from ``avatars.processors`` instead of
  ``avatars.processor.{processor_name}``

  - Example:
    ``from avatars.processors.expected_mean import ExpectedMeanProcessor``
    becomes ``from avatars.processors import ExpectedMeanProcessor``

Others
~~~~~~

- feat: add more metrics and graphs to report
- feat: add ``client.compatibility.is_client_compatible`` to verify
  client-server compatibility
- feat: allow avatarizing without computing metrics using
  ``client.jobs.create_avatarization_job``
- feat: add ``nb_dimensions`` property to ``Dataset``
- feat: add ``User`` object
- feat: use ``patch`` in ``client.datasets.create_dataset`` to patch
  dataset columns on upload
- feat: add ``correlation_protection_rate``, ``inference_continuous``,
  ``inference_categorical``, ``row_direct_match_protection`` and
  ``closest_rate`` privacy metrics
- feat: add ``known_variables``, ``target``,
  ``closest_rate_percentage_threshold``, and
  ``closest_rate_ratio_threshold`` to ``PrivacyMetricsParameters``
- docs: add multiple versions of the documentation
- feat: each user now belongs to an organization and gets a new field:
  ``organization_id``
- fix: fix a bug that made it impossible to compute privacy metrics with
  distinct missing values

.. _section-27:

0.2.2
-----

- Improve type hints of the method
- Update tutorial notebooks with smaller datasets
- Fix bugs in tutorial notebooks
- Improve error message when the call to the API times out
- Add ``jobs.find_all_jobs_by_user``
- Add two new privacy metrics: ``direct_match_protection`` and
  ``categorical_hidden_rate``
- Add the ``DatetimeProcessor``

.. _section-28:

0.2.1
-----

- Fix a processor taking the wrong number of arguments
- Make the ``toolz`` package a mandatory dependency
- Fix the handling of a target variable equal to zero

.. _section-29:

0.2.0
-----

- BREAKING: Drop support for Python 3.8
- BREAKING: Drop ``jobs.get_job`` and ``jobs.create_job``
- BREAKING: Rename ``DatasetResponse`` to ``Dataset``
- BREAKING: Rename ``client.pandas`` to ``client.pandas_integration``
- Add separate endpoints to compute metrics using
  ``jobs.create_signal_metrics_job`` and
  ``jobs.create_privacy_metrics_job``
- Add separate endpoints to access metrics jobs using
  ``jobs.get_signal_metrics`` and ``jobs.get_privacy_metrics``
- Add processors to pre- and post-process your data before and after
  avatarization for custom use cases. These are accessible under
  ``avatars.processors``.
- Handle errors more gracefully
- Add ``ExcludeCategoricalParameters`` to use an embedded processor on
  the server side

.. _section-30:

0.1.16
------

- Add forgotten password endpoint
- Add reset password endpoint
- Rename ``JobParameters`` to ``AvatarizationParameters``
- Add DCR (distance to closest record) and NNDR (nearest neighbor
  distance ratio) to privacy metrics

.. _section-31:

0.1.15
------

- Handle the ``category`` dtype
- Fix dtype casting of datetime columns
- Add ability to login with email
- Add filtering options to ``find_users``
- Avatarizations are now called with ``create_avatarization_job`` and
  ``AvatarizationJobCreate``. ``create_job`` and ``JobCreate`` are
  deprecated but still work.
- ``dataset_id`` is now passed to ``AvatarizationParameters`` and not
  ``AvatarizationJobCreate``.
- ``Job.dataset_id`` is deprecated. Use ``Job.parameters.dataset_id``
  instead.

.. _breaking-1:

BREAKING
~~~~~~~~

- Remove ``get_health_config`` call.

.. _section-32:

0.1.14
------

- Give access to the unshuffled avatars dataset

.. _section-33:

0.1.13
------

- Remove default value for ``to_categorical_threshold``
- Use ``logger.info`` instead of ``print``