Tutorial¶
This Python client communicates with the Avatar platform.
For more information about the Avatar method and process, check out our main docs at https://docs.octopize.io
Step 1: Setup and Authentication¶
First, import the necessary libraries and set up your authentication. If you have the correct credentials, you can start interacting with the Avatarization API right away.
import os
from avatars.manager import Manager
# Set up the URL and your credentials
url = os.environ.get("AVATAR_BASE_API_URL", "https://scaleway-prod.octopize.app/api")
username = os.environ.get("AVATAR_USERNAME")
password = os.environ.get("AVATAR_PASSWORD")
# Initialize the Manager and authenticate
manager = Manager(base_url=url)
manager.authenticate(username, password)
Make sure to replace username and password with your actual login credentials, or set the AVATAR_USERNAME and AVATAR_PASSWORD environment variables before running the script.
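Because os.environ.get returns None when a variable is unset, a missing credential would otherwise only surface when authenticate is called. The following is a minimal sketch of a fail-fast check, assuming the same variable names as above. read_credentials is a hypothetical helper written for this tutorial, not part of the avatars package.

```python
import os


def read_credentials(env=None):
    """Return (username, password), or raise if either variable is unset or empty."""
    # Hypothetical helper: defaults to the process environment
    env = os.environ if env is None else env
    names = ("AVATAR_USERNAME", "AVATAR_PASSWORD")
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return env["AVATAR_USERNAME"], env["AVATAR_PASSWORD"]
```

The returned pair can then be passed to manager.authenticate(username, password) as in the snippet above.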
Step 2: Uploading Your Data¶
Once authenticated, you can upload your data table to the server. In this example, we’ll use a CSV file, wbcd.csv.
# Initialize the runner
runner = manager.create_runner()
# Add the table to the runner
runner.add_table("wbcd", "fixtures/wbcd.csv")
Here, the wbcd dataset is uploaded and is now ready for anonymization.
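Before uploading, it can help to confirm that the file exists and that its header looks as expected. A minimal sketch using only the standard library follows. preview_csv is a hypothetical helper, not part of the avatars package, and it assumes a comma-separated, UTF-8 encoded file.

```python
import csv
from pathlib import Path


def preview_csv(path, n_rows=5):
    """Return the header and the first n_rows data rows of a CSV file."""
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(f"No such file: {path}")
    with path.open(newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])  # empty list if the file has no lines
        rows = [row for _, row in zip(range(n_rows), reader)]
    return header, rows
```

Calling preview_csv("fixtures/wbcd.csv") before runner.add_table would raise early if the path is wrong.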
Step 3: Avatarization Parameters and Execution¶
Next, set the avatarization parameters and run the avatarization process:
# Set anonymization parameters
runner.set_parameters("wbcd", k=10) # Adjust k value for privacy level
# Run the anonymization pipeline
runner.run()
# Get all results after running the anonymization
results = runner.get_all_results()
Here, k=10 is the number of nearest neighbors used in the KNN-based anonymization. You can adjust this value based on your desired privacy level. If you want to know more about anonymization parameters, please refer to the parameter documentation.
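To give an intuition for what k counts, the sketch below finds the k nearest neighbors of one record in a toy numeric table using Euclidean distance. It illustrates only the neighbor lookup and is not the Avatar method. k_nearest and the toy data are made up for this example.

```python
import math


def k_nearest(records, index, k):
    """Return the indices of the k records closest to records[index]."""
    target = records[index]
    others = [i for i in range(len(records)) if i != index]
    # Sort the remaining records by Euclidean distance to the target
    others.sort(key=lambda i: math.dist(target, records[i]))
    return others[:k]


# Toy two-column table, invented for illustration only
toy = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (5.0, 5.0)]
neighbors = k_nearest(toy, index=0, k=2)
print(neighbors)
```

With this toy data, the two nearest neighbors of record 0 are records 1 and 2. Increasing k widens the set of neighbors considered for each record.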
Step 4: Retrieve Avatarized Data¶
Once the anonymization is complete, you can retrieve and inspect the avatarized data:
# Print a preview of the avatarized data
print("Avatar data:")
print(runner.shuffled("wbcd").head())
Step 5: Retrieve Privacy and Signal Metrics¶
Octopize provides privacy and signal metrics to evaluate the quality of the synthetic data produced:
# Print privacy metrics
for key, value in runner.privacy_metrics("wbcd").items():
    print(f"{key}: {value}")
# Print signal metrics
for key, value in runner.signal_metrics("wbcd").items():
    print(f"{key}: {value}")
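To keep the metrics alongside a run, for example when comparing several values of k, they can be written to a file. The following is a minimal sketch, assuming the metrics behave like a dictionary, as the loops above suggest. save_metrics is a hypothetical helper, not part of the avatars package.

```python
import json
from pathlib import Path


def save_metrics(metrics, path):
    """Write a dict-like metrics object to a JSON file and return the path."""
    path = Path(path)
    # default=str keeps the dump from failing on values JSON cannot encode natively
    text = json.dumps(dict(metrics), indent=2, default=str)
    path.write_text(text, encoding="utf-8")
    return path
```

Under that assumption, save_metrics(runner.privacy_metrics("wbcd"), "privacy_metrics.json") would store the privacy metrics next to the script.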
Step 6: Download Avatarization Report¶
Once you have found the optimal set of parameters for your use case, you can download a comprehensive report of the anonymization process. This report compiles the privacy and utility metrics obtained, providing evidence of both anonymity and the preservation of statistical properties.
# Download the anonymization report as a PDF
runner.download_report('my_report.pdf')
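After the download, a quick sanity check can confirm that a PDF file was actually written. The sketch below relies only on the fact that PDF files begin with the bytes %PDF-. looks_like_pdf is a hypothetical helper, not part of the avatars package.

```python
from pathlib import Path


def looks_like_pdf(path):
    """Return True if the file exists and starts with the PDF signature."""
    path = Path(path)
    if not path.is_file():
        return False
    with path.open("rb") as handle:
        return handle.read(5) == b"%PDF-"
```

For the call above, looks_like_pdf("my_report.pdf") should return True once the report has been saved.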
Once you’ve followed these steps, you’re ready to explore further and fine-tune your anonymization process!