Models¶

These models are used for argument and return value validation. They are based on the pydantic package.

Definitions¶

class avatars.models.ApiKey[source]¶

Response model for an API key.

created_at: Annotated[AwareDatetime, Field(title='Created At')] [Required]¶

expires_at: Annotated[AwareDatetime, Field(title='Expires At')] [Required]¶

id: Annotated[UUID, Field(title='Id')] [Required]¶

last_used_at: Annotated[AwareDatetime | None, Field(title='Last Used At')] = None¶

name: Annotated[str, Field(title='Name')] [Required]¶

revoked_at: Annotated[AwareDatetime | None, Field(title='Revoked At')] = None¶

class avatars.models.ApiKeyWithPlaintext[source]¶

API Key response model that includes the secret for creation.

created_at: Annotated[AwareDatetime, Field(title='Created At')] [Required]¶

expires_at: Annotated[AwareDatetime, Field(title='Expires At')] [Required]¶

id: Annotated[UUID, Field(title='Id')] [Required]¶

last_used_at: Annotated[AwareDatetime | None, Field(title='Last Used At')] = None¶

name: Annotated[str, Field(title='Name')] [Required]¶

plaintext: Annotated[str, Field(title='Plaintext')] [Required]¶

revoked_at: Annotated[AwareDatetime | None, Field(title='Revoked At')] = None¶

class avatars.models.BulkDeleteRequest[source]¶

Request model for bulk job deletion.

job_names: Annotated[list[str], Field(max_length=100, title='Job Names')] [Required]¶

Constraints:

max_length = 100
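Because job_names accepts at most 100 entries, a caller holding a longer list would presumably need to split it across several requests. The sketch below shows one way to do that in plain Python; chunk_job_names and MAX_JOB_NAMES are hypothetical names introduced for illustration and are not part of avatars.

```python
from typing import Iterator

MAX_JOB_NAMES = 100  # mirrors max_length on BulkDeleteRequest.job_names


def chunk_job_names(job_names: list[str], size: int = MAX_JOB_NAMES) -> Iterator[list[str]]:
    """Yield consecutive slices of job_names, each holding at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(job_names), size):
        yield job_names[start:start + size]


names = [f"job-{i}" for i in range(250)]
batches = list(chunk_job_names(names))
print([len(batch) for batch in batches])  # [100, 100, 50]
```

Each batch could then be passed as the job_names of its own BulkDeleteRequest.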

class avatars.models.CompatibilityStatus(*values)[source]¶

compatible = 'compatible'¶

incompatible = 'incompatible'¶

unknown = 'unknown'¶

class avatars.models.ExpirationDays(*values)[source]¶

Expiration preset in days (choose from 30/60/120/365/3650)

integer_30 = 30¶

integer_60 = 60¶

integer_120 = 120¶

integer_365 = 365¶

integer_3650 = 3650¶

class avatars.models.CreateApiKeyRequest[source]¶

Request body for creating an API key.

expiration_days: Annotated[ExpirationDays, Field(description='Expiration preset in days (choose from 30/60/120/365/3650)', title='Expiration Days')] [Required]¶: Expiration preset in days (choose from 30/60/120/365/3650)

name: Annotated[str, Field(description='Human-readable label for the API key', max_length=255, min_length=1, title='Name')] [Required]¶

Human-readable label for the API key

Constraints:

min_length = 1
max_length = 255
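The constraints above can be checked on the client side before a request is built. This is a minimal sketch using only the standard library; check_api_key_request is a hypothetical helper, and the real validation is performed by the pydantic model itself.

```python
ALLOWED_EXPIRATION_DAYS = (30, 60, 120, 365, 3650)  # values of ExpirationDays


def check_api_key_request(name: str, expiration_days: int) -> list[str]:
    """Return a list of problems; an empty list means the input looks valid."""
    problems = []
    if not 1 <= len(name) <= 255:
        problems.append("name must contain between 1 and 255 characters")
    if expiration_days not in ALLOWED_EXPIRATION_DAYS:
        problems.append("expiration_days must be one of 30, 60, 120, 365, 3650")
    return problems


print(check_api_key_request("ci-pipeline", 30))  # []
print(len(check_api_key_request("", 45)))        # 2
```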

class avatars.models.CreateApiKeyResponse[source]¶

Response for API key creation.

api_key: ApiKeyWithPlaintext [Required]¶

message: Annotated[str, Field(title='Message')] [Required]¶

class avatars.models.CreditsInfo[source]¶

credits: Annotated[int | None, Field(title='Credits')] [Required]¶

is_credit_enabled: Annotated[bool, Field(title='Is Credit Enabled')] [Required]¶

class avatars.models.EnvironmentInfo[source]¶

Resolved environment values for the current user.

dataset_expiration_days: Annotated[int, Field(description='Number of days before a dataset expires.', title='Dataset Expiration Days')] [Required]¶: Number of days before a dataset expires.

max_allowed_dimensions_per_dataset: Annotated[int, Field(description='Maximum number of dimensions (columns) allowed per dataset.', title='Max Allowed Dimensions Per Dataset')] [Required]¶: Maximum number of dimensions (columns) allowed per dataset.

max_allowed_lines_per_dataset: Annotated[int, Field(description='Maximum number of rows allowed per dataset.', title='Max Allowed Lines Per Dataset')] [Required]¶: Maximum number of rows allowed per dataset.

class avatars.models.EventLogResponse[source]¶

A single audit-trail entry visible to the user.

created_at: Annotated[AwareDatetime, Field(title='Created At')] [Required]¶

id: Annotated[UUID, Field(title='Id')] [Required]¶

object_id: Annotated[UUID | None, Field(title='Object Id')] = None¶

object_type: Annotated[str, Field(title='Object Type')] [Required]¶

verb: Annotated[str, Field(title='Verb')] [Required]¶

class avatars.models.FeatureScope(*values)[source]¶

avatar_parameters = 'avatar_parameters'¶

single_table = 'single_table'¶

multi_table = 'multi_table'¶

time_series = 'time_series'¶

report = 'report'¶

geolocalization = 'geolocalization'¶

privacy_assessment = 'privacy_assessment'¶

differential_privacy = 'differential_privacy'¶

class avatars.models.FeaturesInfo[source]¶

feature_scopes: Annotated[list[FeatureScope], Field(title='Feature Scopes')] [Required]¶

class avatars.models.FileCredentials[source]¶

access_key_id: Annotated[str, Field(title='Access Key Id')] [Required]¶

secret_access_key: Annotated[str, Field(title='Secret Access Key')] [Required]¶

class avatars.models.ForgottenPasswordRequest[source]¶

email: Annotated[str, Field(title='Email')] [Required]¶

class avatars.models.JobCreateRequest[source]¶

depends_on: Annotated[list[str] | None, Field(title='Depends On')] = []¶

parameters_name: Annotated[str, Field(title='Parameters Name')] [Required]¶

set_name: Annotated[UUID, Field(title='Set Name')] [Required]¶

class avatars.models.JobCreateResponse[source]¶

Location: Annotated[str, Field(title='Location')] [Required]¶

name: Annotated[str, Field(title='Name')] [Required]¶

class avatars.models.JobKind(*values)[source]¶

standard = 'standard'¶

privacy_metrics = 'privacy_metrics'¶

signal_metrics = 'signal_metrics'¶

report = 'report'¶

advice = 'advice'¶

class avatars.models.JobStatus(*values)[source]¶

Status of a job in its lifecycle.

Typical happy-path order: QUEUED → CREATED → PENDING → FINISHED

Error paths:

PARENT_ERROR (a dependency job failed)
ERROR (the job itself failed)
LOST (worker disappeared)
ORPHANED (worker lost contact before running)

DEFAULT (“”), exposed on this enum as field_, is the initial value before any status is assigned.

queued = 'queued'¶

created = 'created'¶

orphaned = 'orphaned'¶

parent_error = 'parent_error'¶

error = 'error'¶

finished = 'finished'¶

field_ = ''¶

pending = 'pending'¶

lost = 'lost'¶
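The lifecycle described in the JobStatus docstring can be turned into a small classifier. The sketch below works on the raw string values listed above rather than on the avatars enum; HAPPY_PATH, ERROR_STATUSES and describe_status are hypothetical names used only for illustration.

```python
HAPPY_PATH = ("queued", "created", "pending", "finished")
ERROR_STATUSES = frozenset({"parent_error", "error", "lost", "orphaned"})


def describe_status(value: str) -> str:
    """Classify a raw JobStatus value according to the documented lifecycle."""
    if value == "":
        return "unassigned"  # the DEFAULT value, before any status is set
    if value in ERROR_STATUSES:
        return "error path"
    if value in HAPPY_PATH:
        step = HAPPY_PATH.index(value) + 1
        return f"happy path, step {step} of {len(HAPPY_PATH)}"
    raise ValueError(f"unknown status: {value!r}")


print(describe_status("pending"))  # happy path, step 3 of 4
print(describe_status("lost"))     # error path
```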

class avatars.models.LoginResponse[source]¶

access_token: Annotated[str, Field(title='Access Token')] [Required]¶

refresh_token: Annotated[str | None, Field(title='Refresh Token')] = None¶

token_type: Annotated[str, Field(title='Token Type')] [Required]¶

class avatars.models.ResetPasswordRequest[source]¶

email: Annotated[str, Field(title='Email')] [Required]¶

new_password: Annotated[str, Field(title='New Password')] [Required]¶

new_password_repeated: Annotated[str, Field(title='New Password Repeated')] [Required]¶

token: Annotated[UUID, Field(title='Token')] [Required]¶

class avatars.models.ResourceSetResponse[source]¶

display_name: Annotated[str, Field(title='Display Name')] [Required]¶

set_name: Annotated[UUID, Field(title='Set Name')] [Required]¶

class avatars.models.UserRole(*values)[source]¶

admin = 'admin'¶

user = 'user'¶

class avatars.models.ValidationError[source]¶

ctx: Annotated[dict[str, Any] | None, Field(title='Context')] = None¶

input: Annotated[Any | None, Field(title='Input')] = None¶

loc: Annotated[list[str | int], Field(title='Location')] [Required]¶

msg: Annotated[str, Field(title='Message')] [Required]¶

type: Annotated[str, Field(title='Error Type')] [Required]¶

class avatars.models.GrantType[source]¶

root: Annotated[str, Field(pattern='^password$', title='Grant Type')] [Required]¶

Constraints:

pattern = ^password$

class avatars.models.Login[source]¶

client_id: Annotated[str | None, Field(title='Client Id')] = None¶

client_secret: Annotated[SecretStr | None, Field(title='Client Secret')] = None¶

grant_type: Annotated[GrantType | None, Field(title='Grant Type')] = None¶

password: Annotated[SecretStr, Field(title='Password')] [Required]¶

scope: Annotated[str | None, Field(title='Scope')] = ''¶

username: Annotated[str, Field(title='Username')] [Required]¶

class avatars.models.CompatibilityResponse[source]¶

message: Annotated[str, Field(title='Message')] [Required]¶

most_recent_compatible_client: Annotated[str | None, Field(title='Most Recent Compatible Client')] = None¶

status: CompatibilityStatus [Required]¶

class avatars.models.CreateUser[source]¶

Create a user with an email.

email: Annotated[str, Field(title='Email')] [Required]¶

password: Annotated[str | None, Field(title='Password')] = None¶

role: UserRole | None = UserRole.user¶

class avatars.models.FileAccess[source]¶

credentials: FileCredentials [Required]¶

url: Annotated[str, Field(title='Url')] [Required]¶

class avatars.models.HTTPValidationError[source]¶

detail: Annotated[list[ValidationError] | None, Field(title='Detail')] = None¶

class avatars.models.JobResponse[source]¶

created_at: Annotated[AwareDatetime, Field(title='Created At')] [Required]¶

deleted_at: Annotated[AwareDatetime | None, Field(title='Deleted At')] = None¶

display_name: Annotated[str, Field(title='Display Name')] [Required]¶

done: Annotated[bool, Field(title='Done')] [Required]¶

exception: Annotated[str, Field(title='Exception')] [Required]¶

kind: JobKind [Required]¶

name: Annotated[str, Field(title='Name')] [Required]¶

parameters_name: Annotated[str, Field(title='Parameters Name')] [Required]¶

progress: Annotated[float | None, Field(title='Progress')] [Required]¶

set_name: Annotated[UUID, Field(title='Set Name')] [Required]¶

status: JobStatus [Required]¶

class avatars.models.JobResponseList[source]¶

jobs: Annotated[list[JobResponse], Field(title='Jobs')] [Required]¶

class avatars.models.MeUser[source]¶

email: Annotated[str, Field(title='Email')] [Required]¶

environment: EnvironmentInfo [Required]¶

id: Annotated[UUID, Field(title='Id')] [Required]¶

organization_id: Annotated[UUID, Field(title='Organization Id')] [Required]¶

role: UserRole | None = UserRole.user¶

class avatars.models.User[source]¶

email: Annotated[str, Field(title='Email')] [Required]¶

id: Annotated[UUID, Field(title='Id')] [Required]¶

organization_id: Annotated[UUID, Field(title='Organization Id')] [Required]¶

role: UserRole | None = UserRole.user¶

class avatars.models.BulkDeleteResponse[source]¶

Response model for bulk job deletion.

deleted_jobs: Annotated[list[JobResponse], Field(title='Deleted Jobs')] [Required]¶

failed_jobs: Annotated[list[str], Field(title='Failed Jobs')] [Required]¶

class avatars.models.Processor(*args, **kwargs)[source]¶

preprocess(df: DataFrame) → DataFrame[source]¶

postprocess(source: DataFrame, dest: DataFrame) → DataFrame[source]¶

class avatar_yaml.models.parameters.AlignmentMethod(*values)[source]¶

Bases: str, Enum

SPECIFIED = 'specified'¶

MAX = 'max'¶

MIN = 'min'¶

MEAN = 'mean'¶

class avatar_yaml.models.parameters.ExcludeVariablesMethod(*values)[source]¶

Bases: str, Enum

The method used to exclude a column.

ROW_ORDER = 'row_order'¶: SENSITIVE. The excluded column will be linked using the original row order. This is a violation of privacy.

COORDINATE_SIMILARITY = 'coordinate_similarity'¶: The excluded column will be linked by individual similarity.

class avatar_yaml.models.parameters.ImputeMethod(*values)[source]¶

Bases: str, Enum

KNN = 'knn'¶

MODE = 'mode'¶

MEDIAN = 'median'¶

MEAN = 'mean'¶

FAST_KNN = 'fast_knn'¶

class avatar_yaml.models.parameters.ProjectionType(*values)[source]¶

Bases: str, Enum

FPCA = 'fpca'¶

FLATTEN = 'flatten'¶

class avatar_yaml.models.schema.ColumnType(*values)[source]¶

Bases: StrEnum

INT = 'int'¶

BOOL = 'bool'¶

CATEGORY = 'category'¶

NUMERIC = 'float'¶

DATETIME = 'datetime'¶

DATETIME_TZ = 'datetime_tz'¶

class avatar_yaml.models.schema.LinkMethod(*values)[source]¶

Bases: StrEnum

Available assignment methods to link a child to its parent table after the anonymization.

LINEAR_SUM_ASSIGNMENT = 'linear_sum_assignment'¶: Assign using the linear sum assignment algorithm. This method is a good privacy and utility trade-off. The algorithm consumes lots of resources.

MINIMUM_DISTANCE_ASSIGNMENT = 'minimum_distance_assignment'¶: Assign using the minimum distance assignment algorithm. This method assigns the closest child to the parent. It is an acceptable privacy and utility trade-off. This algorithm consumes fewer resources than the linear sum assignment.

SENSITIVE_ORIGINAL_ORDER_ASSIGNMENT = 'sensitive_original_order_assignment'¶: Assign the child to the parent using the original order. WARNING: this method is a HIGH PRIVACY BREACH because it keeps the original order to assign the child to the parent. It is not recommended for privacy reasons, but it consumes fewer resources than the other methods.

TIME_SERIES = 'time_series'¶: Specific assignment method for time series data. It is used to link time series data to the parent table.
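To make the difference between the first two methods concrete, here is a generic illustration of the two underlying ideas on a tiny, made-up distance matrix. It is not the library's implementation: the matrix, the function names, and the brute-force search are all assumptions chosen for readability (a production solver would use a dedicated algorithm instead of enumerating permutations).

```python
from itertools import permutations

# Hypothetical distances: rows are child records, columns are parent records.
cost = [
    [1.0, 2.0, 9.0],
    [1.5, 8.0, 9.0],
    [7.0, 3.0, 4.0],
]


def linear_sum_assignment_bruteforce(cost):
    """One-to-one choice of a column per row that minimises the total cost."""
    n = len(cost)
    best = min(
        permutations(range(n)),
        key=lambda cols: sum(cost[row][col] for row, col in enumerate(cols)),
    )
    return list(best)


def minimum_distance_assignment(cost):
    """Each row independently picks its closest column; columns may repeat."""
    return [min(range(len(row)), key=row.__getitem__) for row in cost]


print(linear_sum_assignment_bruteforce(cost))  # [1, 0, 2]
print(minimum_distance_assignment(cost))       # [0, 0, 1]
```

The global optimum spreads rows across distinct columns, while the per-row minimum is cheaper to compute but lets several rows share the same column.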

class avatars.constants.PlotKind(*values)[source]¶

Bases: StrEnum

Available plot types for visualization.

CORRELATION = 'correlation'¶: A correlation heatmap of the original and avatar data.

CORRELATION_DIFFERENCE = 'correlation_difference'¶: A heatmap of the difference between the correlations of the original and avatar data.

CONTRIBUTION = 'contribution'¶: A bar chart showing the contribution of each feature in the model.

PROJECTION_2D = '2d_projection'¶: A 2D projection of the original and avatar data.

PROJECTION_3D = '3d_projection'¶: A 3D projection of the original and avatar data.

DISTRIBUTION = 'distribution'¶: Distribution plots of the original and avatar data, with one plot per column.

AGGREGATE_STATS = 'aggregate_stats'¶: A table containing the mean and standard deviation of the original and avatar data (for the first 10 columns).

RAW_SERIES = 'raw_series'¶: A line plot of the original and avatar time series over time.

NORMALIZED_SERIES = 'normalized_series'¶: A line plot of the normalized original and avatar time series over time.

CLASS_PROJECTION_2D = 'class_projection_2d'¶: A 2D projection colored by the target class (only available with class balancing augmentation).

METRICS_SUMMARY = 'metrics_summary'¶: A summary table of privacy metrics.

class avatars.constants.Results(*values)[source]¶

Bases: StrEnum

ADVICE = 'advice'¶

SHUFFLED = 'shuffled'¶

UNSHUFFLED = 'unshuffled'¶

PRIVACY_METRICS = 'privacy_metrics'¶

SIGNAL_METRICS = 'signal_metrics'¶

REPORT_IMAGES = 'report_images'¶

PROJECTIONS_ORIGINAL = 'original_projections'¶

PROJECTIONS_AVATARS = 'avatar_projections'¶

METADATA = 'run_metadata'¶

REPORT = 'report'¶

META_PRIVACY_METRIC = 'meta_privacy_metric'¶

META_SIGNAL_METRIC = 'meta_signal_metric'¶

FIGURES = 'figures'¶

FIGURES_METADATA = 'figures_metadata'¶

PRIVACY_METRICS_SUMMARY = 'privacy_metrics_summary'¶

SIGNAL_METRICS_SUMMARY = 'signal_metrics_summary'¶

class avatar_yaml.models.parameters.AugmentationStrategy(*values)[source]¶

Bases: StrEnum

minority = 'minority'¶

not_majority = 'not_majority'¶

class avatar_yaml.models.parameters.AvatarizationProcessorParameters[source]¶

Bases: object

Base class for all avatarization processor parameters.

Subclass this to define parameters for a specific server-side processor that runs within the avatarization pipeline.

class avatar_yaml.models.parameters.InterRecordRangeDifferenceParameters(id_variable: str, target_start_variable: str, target_end_variable: str)[source]¶

Bases: AvatarizationProcessorParameters

Parameters for inter-record range difference processor.

The processor transforms start/end column pairs into an internal representation. This can lead to better semantic avatarization. The transformation is transparent to the user: input and output have the same column structure at the end.

Records are automatically sorted by the target_start_variable for processing.

id_variable: str¶

target_start_variable: str¶

target_end_variable: str¶
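The internal representation is not described here. One plausible reading of "inter-record range difference" is that, for each id, records are sorted by start and each range is expressed as the gap since the previous record's end plus its own duration. The sketch below illustrates that reading only; the encoding, the column names, and to_gap_and_duration are assumptions, not the processor's actual behaviour.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical records: one row per stay, identified by "patient".
records = [
    {"patient": "a", "start": 10, "end": 12},
    {"patient": "a", "start": 3, "end": 5},
    {"patient": "b", "start": 1, "end": 4},
]


def to_gap_and_duration(records, id_variable, start, end):
    """Sort by (id, start), then encode each range relative to the previous one."""
    ordered = sorted(records, key=itemgetter(id_variable, start))
    encoded = []
    for key, group in groupby(ordered, key=itemgetter(id_variable)):
        previous_end = None
        for record in group:
            gap = None if previous_end is None else record[start] - previous_end
            encoded.append(
                {id_variable: key, "gap": gap, "duration": record[end] - record[start]}
            )
            previous_end = record[end]
    return encoded


encoded = to_gap_and_duration(records, "patient", "start", "end")
for row in encoded:
    print(row)
```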

class avatar_yaml.models.parameters.RelativeDifferenceParameters(target: str, references: List[str], scaling_unit: int | None = None)[source]¶

Bases: AvatarizationProcessorParameters

Parameters for relative difference processor.

The processor transforms a numeric variable into a difference relative to the sum of other variables. This can lead to better mathematical relation retention between correlated variables. The transformation is transparent to the user - input and output have the same column structure at the end.

Parameters:

target (str) – variable to transform
references (List[str]) – the variables of reference

Keyword Arguments:

scaling_unit (int | None) – divide the difference by this factor to handle unit variation. E.g. if scaling_unit=1000, a difference in meters will be expressed in kilometers.

target: str¶

references: List[str]¶

scaling_unit: int | None = None¶
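As a rough illustration of the description above, the sketch below computes the target minus the sum of the references, optionally divided by scaling_unit, and then inverts the transformation. The exact formula and sign convention used by the server-side processor are not given here, so this is an assumption; the function names and the sample row are hypothetical.

```python
def to_relative_difference(row, target, references, scaling_unit=None):
    """Express row[target] as a difference relative to the sum of the references."""
    difference = row[target] - sum(row[name] for name in references)
    if scaling_unit is not None:
        difference /= scaling_unit
    return difference


def from_relative_difference(difference, row, references, scaling_unit=None):
    """Invert to_relative_difference and recover the original target value."""
    if scaling_unit is not None:
        difference *= scaling_unit
    return difference + sum(row[name] for name in references)


row = {"total_m": 12500, "leg1_m": 4000, "leg2_m": 6000}
diff = to_relative_difference(row, "total_m", ["leg1_m", "leg2_m"], scaling_unit=1000)
print(diff)  # 2.5 (kilometers, since the inputs are in meters)
print(from_relative_difference(diff, row, ["leg1_m", "leg2_m"], scaling_unit=1000))  # 12500.0
```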

class avatar_yaml.models.avatar_metadata.SensitivityLevel(*values)[source]¶

Bases: str, Enum

Evaluation of the sensitivity level of the personal data being processed.

This assessment is based on factors such as the nature of the data and potential risks to data subjects. It applies to three categories of data:

Sensitive personal data (GDPR Art. 9): Special categories including health, racial/ethnic origin, political opinions, religious beliefs, trade union membership, genetic data, biometric data, sex life, or sexual orientation. These typically require VERY_HIGH or HIGH sensitivity levels.
Personal data (GDPR Art. 4): Any information relating to an identified or identifiable natural person (e.g., name, identification number, location data, online identifiers). Sensitivity level varies based on context and combination with other data.
Demographic data: Non-sensitive characteristics such as age, gender, geographic location, education level. These are typically LOW to MEDIUM sensitivity, but can increase when combined with other identifying information.

The sensitivity level should reflect potential harm to data subjects if the data were compromised or re-identified.

VERY_HIGH = 'Very High'¶

HIGH = 'High'¶

MEDIUM = 'Medium'¶

LOW = 'Low'¶

VERY_LOW = 'Very Low'¶

NEGLIGIBLE = 'Negligible'¶

UNDEFINED = 'Undefined'¶

class avatar_yaml.models.avatar_metadata.DataType(*values)[source]¶

Bases: str, Enum

Categories of personal data being processed, based on the context and sector of the data processing activity.

UNKNOWN = 'unknown'¶: The processing involves personal data of an unspecified type. The exact nature of the data has not been determined or categorized at this stage.

HEALTH = 'health'¶: The data processed originate from health-related datasets containing information on patients or study participants. These datasets typically include demographic, clinical, and behavioural variables, such as age, gender, diagnosis codes, treatment details, medical outcomes, and follow-up data.

HR = 'hr'¶: The personal data processed concern employees, job applicants, contractors, or trainees. The datasets generally include professional information such as identification data, employment history, remuneration details, performance evaluations, and training records. Certain datasets may also include information relating to health or diversity monitoring.

MOBILITY = 'mobility'¶: The personal data processed typically relate to users of transport systems, vehicle operators, or mobility service subscribers. These datasets may include identifiers, geolocation traces, timestamps, usage frequency, travel routes, and behavioural metrics. Depending on the context, they may also contain information derived from connected vehicles or smart ticketing systems.

INSURANCE = 'insurance'¶: The personal data processed typically relate to policyholders, beneficiaries, or claimants. The datasets may include demographic characteristics, contract details, claim histories, financial indicators, and, in some cases, health-related information.

FINANCE = 'finance'¶: The personal data processed concern clients, investors, account holders, or financial service users. Typical datasets may include identification data, transaction histories, account balances, income levels, credit ratings, and investment portfolios. In certain contexts, they may also contain data classified as sensitive, such as information revealing financial hardship or vulnerability.

EDUCATION = 'education'¶: The personal data processed relate to students, teachers, or administrative staff within educational institutions. The datasets may include demographic information, academic performance records, attendance logs, course enrolments, and, where relevant, special educational needs or socio-economic indicators.

class avatar_yaml.models.avatar_metadata.DataSubject(*values)[source]¶

Bases: str, Enum

Categories of individuals whose personal data are being processed, based on the context and purpose of the data processing activity.

UNKNOWN = 'unknown'¶

PATIENTS = 'patients'¶: The data subjects are patients whose personal data are processed in the context of medical research, healthcare provision, or clinical trials. Such data may include information directly or indirectly identifying individuals, together with health-related or demographic variables.

EMPLOYEES = 'employees'¶: The data subjects are employees, job applicants, or contractors whose personal data are processed for human resources management, organisational analysis, or workforce studies. Such data may encompass professional identifiers, career trajectories, remuneration details, performance indicators, and training records.

CLIENTS = 'clients'¶: The data subjects are clients, customers, or insured persons whose personal data are processed for the purposes of service provision, product analysis, or contractual performance. These data may include identifying information, transaction or claim histories, contact details, and, in some contexts, financial or health-related information.

USERS = 'users'¶: The data subjects are users of digital, public, or mobility services whose personal data are processed for analytical, operational, or optimisation purposes. The data may include identifiers, behavioural indicators, service usage patterns, or geolocation data.

STUDENTS = 'students'¶: The data subjects are students enrolled in educational institutions whose personal data are processed for pedagogical, administrative, or research purposes. The datasets may include demographic information, academic performance, attendance records, or socio-economic indicators.

class avatar_yaml.models.avatar_metadata.DataRecipient(*values)[source]¶

Bases: str, Enum

Categories of recipients for the anonymised data, based on their relationship to the Data Controller and the context of data sharing.

UNKNOWN = 'unknown'¶: The recipients of the anonymized data have not been specifically identified at this stage. The data recipient category will need to be determined to properly assess the privacy risks associated with data sharing and ensure appropriate safeguards are in place.

OPENDATA = 'opendata'¶: The recipients of the anonymised data are the general public, through publication in an open data repository or public research platform. Such dissemination aims to promote scientific collaboration, innovation, or public transparency. To guarantee full compliance with data protection requirements, the datasets released as open data have undergone an anonymisation process.

CONTRACTUAL_THIRDPARTY = 'contractual_thirdparty'¶: The recipients of the anonymised data are third parties with whom the Data Controller maintains a contractual relationship, such as research partners, insurers, data analytics firms, or other commercial entities. These transfers occur within a controlled legal framework ensuring compliance with the principles of confidentiality, data minimisation, and purpose limitation.

INTERNAL = 'internal'¶: The recipients of the anonymised data are exclusively internal stakeholders of the Data Controller, such as authorised employees, researchers, or analysts operating within the same organisation. The synthetic datasets are used for internal analytical, research, or operational purposes, in strict compliance with the principles of data protection by design and by default.

OUTSIDE_EU = 'outside_eu'¶: The recipients of the anonymised data are entities established outside the European Union, including international research institutions or commercial partners.

TRUSTED_THIRDPARTY = 'trusted_thirdparty'¶: The recipients of the anonymised data are trusted third parties operating under a contractual or institutional framework that ensures compliance with data protection and ethical standards. These may include subcontractors providing technical services, scientific publishers, or data repositories managing peer-reviewed research outputs. The sharing of anonymised datasets with such entities is governed by confidentiality agreements and data processing clauses that explicitly prohibit any attempt at re-identification.

class avatar_yaml.models.schema.PiiType(*values)[source]¶

Bases: StrEnum

Category of personally identifiable information (PII) to generate.

Used with FakeDataStrategy to select the kind of realistic fake data to produce. The generated values are locale-aware (default en_US).

EMAIL = 'EMAIL'¶: A syntactically valid email address (e.g. john.doe@example.com).

FIRST_NAME = 'FIRST_NAME'¶: A given/first name (e.g. Alice).

LAST_NAME = 'LAST_NAME'¶: A family/last name (e.g. Smith).

FULL_NAME = 'FULL_NAME'¶: A full personal name combining first and last name (e.g. Alice Smith).

PHONE = 'PHONE'¶: A phone number in a locale-appropriate format (e.g. +1-800-555-0100).

SSN = 'SSN'¶: A Social Security Number formatted string (e.g. 123-45-6789). Use only when the source data contains SSN-like identifiers.

ADDRESS = 'ADDRESS'¶: A multi-line postal address (e.g. 123 Main St, Springfield, IL 62701).

FREE_TEXT = 'FREE_TEXT'¶: A paragraph of random lorem-ipsum-style text. Use for unstructured text columns (notes, comments, descriptions) that must be replaced wholesale.

class avatar_yaml.models.schema.FakeDataStrategy(pii_type: PiiType, consistent: bool = True, high_variability: bool = False, kind: Literal['FAKER'] = 'FAKER')[source]¶

Bases: object

Replace PII values with realistic fake data of the same type.

The generated values are locale-aware and structurally valid (e.g. an email replacement is a real-looking email address). This strategy is ideal when you need pseudonymized data that still looks natural to downstream consumers.

Example Python:

FakeDataStrategy(pii_type=PiiType.EMAIL)

pii_type¶

The category of PII to generate. See PiiType for the full list of supported types.

Type: PiiType

consistent¶

When True (default), the same source value always maps to the same fake value within a pipeline run — i.e. if alice@corp.com appears on three rows it will be replaced with the same fake email on all three rows. Set to False to generate a fresh independent value for every row (useful when uniqueness matters more than cross-row consistency).

Type: bool

high_variability¶

When False (default), a pool of pre-generated fake values is sampled for each row (~70× faster). Because rows draw from that shared pool, distinct source values can collide and receive the same fake value. Set to True for fully independent per-row generation — slower but collision-free.

Type: bool

pii_type: PiiType¶

consistent: bool = True¶

high_variability: bool = False¶

kind: Literal['FAKER'] = 'FAKER'¶
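The interaction between consistent and the shared pool can be pictured with a small standard-library sketch. Everything here is an assumption made for illustration: the pool size, the fake email format, and the helper names are hypothetical, and the real strategy relies on its own generator.

```python
import random


def build_pool(size, rng):
    """Pre-generate `size` hypothetical fake email addresses."""
    return [f"user{rng.randrange(10**6):06d}@example.com" for _ in range(size)]


def replace_from_pool(values, pool, rng):
    """Consistent replacement: one pool draw per distinct source value."""
    mapping = {}
    for value in values:
        if value not in mapping:
            mapping[value] = rng.choice(pool)
    return [mapping[value] for value in values]


rng = random.Random(0)
pool = build_pool(3, rng)  # deliberately tiny so that collisions are unavoidable
sources = ["a@corp.com", "b@corp.com", "c@corp.com", "d@corp.com", "a@corp.com"]
fakes = replace_from_pool(sources, pool, rng)

print(fakes[0] == fakes[4])                    # True: same source, same fake value
print(len(set(fakes)) < len(set(sources)))     # True: 4 sources share at most 3 fakes
```

With four distinct sources and a pool of three values, at least two sources must receive the same fake value, which is the collision the high_variability description warns about.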

class avatar_yaml.models.schema.HashSha256Strategy(kind: Literal['HASH_SHA256'] = 'HASH_SHA256')[source]¶

Bases: object

Replace each value with its SHA-256 hex digest (deterministic, one-way).

The hash is irreversible and consistent: the same source value always produces the same 64-character hex string, across runs and pipeline instances. No consistent flag is needed — SHA-256 is inherently deterministic.

Note

If the source column contains low-entropy values (e.g. integers or short codes), a brute-force dictionary attack on the hash is feasible. Prefer Uuid4Strategy or FakeDataStrategy in those cases.

Example Python:

HashSha256Strategy()

kind: Literal['HASH_SHA256'] = 'HASH_SHA256'¶
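The determinism and the dictionary-attack risk mentioned in the note can both be demonstrated with hashlib. This sketch assumes values are UTF-8 encoded and hashed without a salt, which is not stated here; sha256_hex is a hypothetical helper.

```python
import hashlib


def sha256_hex(value: str) -> str:
    """Return the SHA-256 hex digest of a string (UTF-8 encoding assumed)."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


digest = sha256_hex("42")
print(len(digest))                 # 64
print(digest == sha256_hex("42"))  # True: same input, same digest

# Low-entropy source values can be recovered by hashing every candidate.
lookup = {sha256_hex(str(n)): str(n) for n in range(1000)}
recovered = lookup[digest]
print(recovered)                   # 42
```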

class avatar_yaml.models.schema.Uuid4Strategy(consistent: bool = True, kind: Literal['UUID4'] = 'UUID4')[source]¶

Bases: object

Replace each unique value with a randomly generated UUID (version 4).

UUIDs are opaque, globally unique, and carry no information about the original value. Use this strategy for primary/foreign keys or any identifier where structural realism is not required.

Example Python:

Uuid4Strategy()
Uuid4Strategy(consistent=False)  # fresh UUID per row

consistent¶

When True (default), the same source value always maps to the same UUID within a run, preserving referential integrity across tables. Set to False to generate a new UUID for every row independently.

Type: bool

consistent: bool = True¶

kind: Literal['UUID4'] = 'UUID4'¶
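The referential-integrity point can be sketched as follows: when one mapping is shared, a key that appears in two tables receives the same UUID in both. Uuid4Mapper is a hypothetical class written for this illustration and says nothing about how the strategy is implemented.

```python
import uuid


class Uuid4Mapper:
    """Map source values to random version 4 UUIDs, optionally remembering them."""

    def __init__(self, consistent: bool = True):
        self.consistent = consistent
        self._seen = {}

    def __call__(self, value):
        if not self.consistent:
            return str(uuid.uuid4())  # fresh UUID on every call
        if value not in self._seen:
            self._seen[value] = str(uuid.uuid4())
        return self._seen[value]


mapper = Uuid4Mapper()
customers = [mapper(key) for key in ["c1", "c2"]]        # primary keys
orders = [mapper(key) for key in ["c2", "c1", "c2"]]     # foreign keys

print(orders[0] == customers[1])  # True: "c2" maps to the same UUID in both tables
```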

class avatar_yaml.models.schema.ConstantStrategy(value: str, kind: Literal['CONSTANT'] = 'CONSTANT')[source]¶

Bases: object

Replace every value in the column with a single fixed string.

All rows receive the same replacement value regardless of their original content. This is the simplest strategy and is useful when the column must be fully suppressed or redacted.

Example Python:

ConstantStrategy(value="REDACTED")
ConstantStrategy(value="***")

value¶

The string that replaces every value in the column.

Type: str

value: str¶

kind: Literal['CONSTANT'] = 'CONSTANT'¶

class avatar_yaml.models.schema.IntegerStrategy(consistent: bool = True, kind: Literal['INTEGER'] = 'INTEGER')[source]¶

Bases: object

Map each unique source value to a unique pseudonymous integer.

The mapping is randomized (not sequential) so that the original sort order is not preserved. Useful for numeric identifiers (e.g. customer IDs, account numbers) when downstream code expects an integer type.

Example Python:

IntegerStrategy()
IntegerStrategy(consistent=False)  # independent integer per row

consistent¶

When True (default), the same source value always maps to the same integer within a run, preserving referential integrity. Set to False to assign an independent integer to every row.

Type: bool

consistent: bool = True¶

kind: Literal['INTEGER'] = 'INTEGER'¶
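A randomized yet collision-free mapping of this kind can be sketched by sampling integers without replacement. The range of generated integers and the helper name integer_pseudonyms are assumptions for illustration; the actual strategy may choose its integers differently.

```python
import random


def integer_pseudonyms(values, seed=None):
    """Give each distinct value a distinct random integer (consistent mapping)."""
    rng = random.Random(seed)
    distinct = list(dict.fromkeys(values))  # distinct values, first-seen order
    # Sampling without replacement guarantees the integers are unique.
    codes = rng.sample(range(10 * len(distinct) + 1), k=len(distinct))
    mapping = dict(zip(distinct, codes))
    return [mapping[value] for value in values]


ids = integer_pseudonyms(["cust-1", "cust-2", "cust-1", "cust-3"], seed=1)
print(ids[0] == ids[2])  # True: same source value, same integer
print(len(set(ids)))     # 3: one integer per distinct source value
```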

class avatar_yaml.models.schema.SpecificIdLetterCase(*values)[source]¶

Bases: StrEnum

Controls the case of letters produced by the ? placeholder in SpecificIdStrategy patterns.

UPPER = 'upper'¶: All generated letters are upper-case (e.g. A, B, Z).

LOWER = 'lower'¶: All generated letters are lower-case (e.g. a, b, z).

BOTH = 'both'¶: Letters are drawn from the full mixed-case alphabet (default).

class avatar_yaml.models.schema.SpecificIdStrategy(pattern: str, letter_case: SpecificIdLetterCase = SpecificIdLetterCase.BOTH, consistent: bool = True, kind: Literal['SPECIFIC_ID'] = 'SPECIFIC_ID')[source]¶

Bases: object

Generate structured identifiers from a user-defined pattern.

Patterns combine literal characters with placeholders and optional references to other (already-pseudonymized) columns.

Placeholders

Prefix any placeholder with a single backslash in the pattern to include it literally (e.g. \# outputs #, not a digit; in a Python string literal write "\\#" or the raw string r"\#").

Example Python:

SpecificIdStrategy(pattern="EMP-####")
SpecificIdStrategy(pattern="{{department}}-???##", letter_case=SpecificIdLetterCase.UPPER)
SpecificIdStrategy(pattern="USR-^^^^", consistent=False)

pattern¶

The format string defining the structure of the generated ID. Must be a non-empty string. Combine literal text, placeholders (?, #, ^), and column references ({{col_name}}).

Type: str

letter_case¶

Controls the letter case for ? placeholders. Defaults to SpecificIdLetterCase.BOTH (mixed case).

Type: SpecificIdLetterCase

consistent¶

When True (default), the same source value always maps to the same generated ID within a run. Set to False to produce a fresh ID for every row.

Type: bool

pattern: str¶

letter_case: SpecificIdLetterCase = 'both'¶

consistent: bool = True¶

kind: Literal['SPECIFIC_ID'] = 'SPECIFIC_ID'¶
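A partial sketch of pattern expansion, covering only what this page states: # yields a digit, ? yields a letter whose case follows letter_case, and a backslash makes the next character literal. The ^ placeholder and {{col_name}} references are deliberately not handled because their exact behaviour is not described here. expand_pattern is a hypothetical helper, not the library's function.

```python
import random
import string

# Keys mirror the values of SpecificIdLetterCase.
LETTERS = {
    "upper": string.ascii_uppercase,
    "lower": string.ascii_lowercase,
    "both": string.ascii_letters,
}


def expand_pattern(pattern, letter_case="both", rng=None):
    """Expand '#' and '?' placeholders; a backslash escapes the next character."""
    rng = rng or random.Random()
    out = []
    chars = iter(pattern)
    for ch in chars:
        if ch == "\\":
            out.append(next(chars, ""))  # keep the escaped character literally
        elif ch == "#":
            out.append(rng.choice(string.digits))
        elif ch == "?":
            out.append(rng.choice(LETTERS[letter_case]))
        elif ch in "^{":
            raise NotImplementedError("'^' and '{{...}}' are outside this sketch")
        else:
            out.append(ch)
    return "".join(out)


employee_id = expand_pattern("EMP-####")
escaped = expand_pattern(r"\#-??", letter_case="upper")
print(employee_id)  # e.g. EMP-0427 (digits vary between runs)
print(escaped)      # e.g. #-QK (letters vary between runs)
```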