Tutorial
This Python client communicates with the Avatar platform.
For more information about the Avatar method and process, check out our main docs at https://docs.octopize.io
Step 1: Setup and Authentication
First, import the necessary libraries and set up your authentication.
Option A: API Key Authentication (Recommended)
import os
from avatars.manager import Manager
# Initialize the Manager with an API key
manager = Manager(api_key=os.environ.get("AVATAR_API_KEY"))
Option B: Username/Password Authentication
import os
from avatars.manager import Manager
# Read your credentials from environment variables
username = os.environ.get("AVATAR_USERNAME")
password = os.environ.get("AVATAR_PASSWORD")
# Initialize the Manager and authenticate
manager = Manager() # or manager = Manager(base_url="https://your-server.com")
manager.authenticate(username, password)
Make sure to configure your credentials in your environment variables.
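Note that os.environ.get returns None when a variable is missing, so a forgotten credential may only show up later as a less obvious error. A minimal sketch of a fail-fast check is shown below. The helper name require_env is hypothetical and is not part of the avatars package.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a clear error.

    Hypothetical helper for illustration; not part of the avatars package.
    """
    value = os.environ.get(name)
    if not value:
        # Covers both an unset variable and an empty string.
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


# Example usage before creating the Manager:
# api_key = require_env("AVATAR_API_KEY")
```

The returned value can then be passed to Manager(api_key=...) exactly as in Option A.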
Step 2: Uploading Your Data
Once authenticated, you can upload your data table to the server. This
example uses the CSV file fixtures/wbcd.csv.
# Initialize the runner
runner = manager.create_runner("test_wbcd")
# Add the table to the runner
runner.add_table("wbcd", "fixtures/wbcd.csv")
Here, the wbcd dataset is uploaded and is now ready for anonymization.
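Before uploading, it can help to confirm locally that the file is readable and has the columns you expect. The sketch below uses only the Python standard library. The function summarize_csv is a hypothetical helper, and it assumes a UTF-8 encoded, comma-separated file with a header row.

```python
import csv
from pathlib import Path


def summarize_csv(path: str) -> tuple[list[str], int]:
    """Return the header columns and the number of data rows of a CSV file.

    Hypothetical helper for illustration; assumes the first row is a header.
    """
    with Path(path).open(newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])  # empty list if the file has no rows
        row_count = sum(1 for _ in reader)
    return header, row_count


# Example usage:
# columns, n_rows = summarize_csv("fixtures/wbcd.csv")
# print(columns, n_rows)
```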
Step 3: Avatarization Parameters and Execution
Next, set the avatarization parameters and run the avatarization process:
# Set anonymization parameters
runner.set_parameters("wbcd", k=10) # Adjust k value for privacy level
# Run the anonymization pipeline
runner.run()
# Get all results after running the anonymization
results = runner.get_all_results()
Here, k=10 is the number of nearest neighbors used in the KNN-based
anonymization.
You can adjust this value based on your desired privacy level. To learn
more about anonymization parameters, please refer to the user guide.
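To compare several values of k, one possible approach is to repeat the steps above once per value and keep the privacy metrics of each run. The sketch below relies only on the calls shown in this tutorial. The function sweep_k and the runner names are hypothetical, and it assumes that creating a fresh runner per value is acceptable on your server.

```python
def sweep_k(manager, table_name, csv_path, k_values):
    """Run one anonymization per k and collect the first privacy-metrics entry.

    Hypothetical helper: it only chains the calls used in this tutorial.
    The "sweep_k_<k>" runner names are an assumption, chosen to keep runs apart.
    """
    metrics_by_k = {}
    for k in k_values:
        runner = manager.create_runner(f"sweep_k_{k}")
        runner.add_table(table_name, csv_path)
        runner.set_parameters(table_name, k=k)
        runner.run()
        runner.get_all_results()
        # privacy_metrics returns a sequence; the tutorial reads its first entry.
        metrics_by_k[k] = dict(runner.privacy_metrics(table_name)[0])
    return metrics_by_k


# Example usage with an authenticated manager:
# metrics = sweep_k(manager, "wbcd", "fixtures/wbcd.csv", [5, 10, 20])
```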
Step 4: Retrieve Avatarized Data
Once the anonymization is complete, you can retrieve and inspect the avatarized data:
# Print a preview of the avatarized data
print("Avatar data:")
print(runner.shuffled("wbcd").head())
Step 5: Retrieve Privacy and Signal Metrics
Octopize provides privacy and signal metrics to evaluate the quality of the synthetic data produced:
# Print privacy metrics
for key, value in runner.privacy_metrics("wbcd")[0].items():
    print(f"{key}: {value}")
# Print signal metrics
for key, value in runner.signal_metrics("wbcd")[0].items():
    print(f"{key}: {value}")
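When there are many metrics, aligned and sorted output can be easier to scan than the raw loop. The following is a minimal sketch of such a formatter. The helper format_metrics is hypothetical, and it only assumes that each metrics entry behaves like a dictionary, as the loops above suggest.

```python
def format_metrics(metrics: dict) -> str:
    """Render a metrics mapping as aligned 'name : value' lines, sorted by name.

    Hypothetical helper for illustration; not part of the avatars package.
    """
    if not metrics:
        return ""
    # Pad every name to the width of the longest one so the colons line up.
    width = max(len(str(key)) for key in metrics)
    lines = [
        f"{str(key):<{width}} : {metrics[key]}"
        for key in sorted(metrics, key=str)
    ]
    return "\n".join(lines)


# Example usage:
# print(format_metrics(runner.privacy_metrics("wbcd")[0]))
```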
Step 6: Download Avatarization Report
Once you have found the optimal set of parameters for your use case, you can download a comprehensive report of the anonymization process. This report compiles the privacy and utility metrics obtained, providing evidence of both anonymity and the preservation of statistical properties.
# Download the anonymization report as a PDF
runner.download_report('my_report.pdf')
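After the download, a quick local check can confirm that the file was written and looks like a PDF. PDF files begin with the bytes %PDF-, which the sketch below tests for. The helper looks_like_pdf is hypothetical and uses only the standard library. It does not validate the rest of the file.

```python
from pathlib import Path


def looks_like_pdf(path: str) -> bool:
    """Return True if the file exists and starts with the PDF magic bytes.

    Hypothetical helper for illustration; checks only the file header.
    """
    file_path = Path(path)
    if not file_path.is_file():
        return False
    with file_path.open("rb") as handle:
        return handle.read(5) == b"%PDF-"


# Example usage:
# if not looks_like_pdf("my_report.pdf"):
#     print("The report was not downloaded correctly")
```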
Once you’ve followed these steps, you’re ready to explore further and fine-tune your anonymization process!
Further Resources
You can explore more features of the Avatar solution by following our notebook tutorials.