PRIVACY ARCHITECTURE
> DIMENSIONAL MODEL // DATA CLASSIFICATION // SCHEMA DESIGN
The schema implements privacy by design at the data layer.
Every column carries a data_classification meta tag
(PUBLIC / INTERNAL / CONFIDENTIAL / RESTRICTED) enforced by a custom dbt test.
PII fields are hashed via SHA-256 in the mart layer
using a reusable hash_pii() macro — email,
full name, and IP address are never stored in cleartext beyond staging. Age is generalized
to bands (e.g., 25_34) instead of exact values.
PUBLIC
Content metadata, genre info, aggregated metrics
No access restrictions
INTERNAL
Surrogate keys, timestamps, device types, hashed PII
Internal use only
CONFIDENTIAL
Email, name, IP, viewing history, age, user_id
Requires consent basis
RESTRICTED
Payment records, support transcripts with PII
Maximum protection required
Each jurisdiction has distinct consent models (GDPR: opt-in, CCPA: opt-out, LGPD: legitimate interest), different DSAR deadlines (10-45 days), and varying minor age thresholds (10-18 years).
erDiagram
fct_consent_events ||--o{ dim_users_privacy : "user_key"
fct_consent_events ||--o{ dim_jurisdictions : "jurisdiction_key"
fct_consent_events ||--o{ dim_consent_purposes : "purpose_key"
fct_consent_events ||--o{ dim_date_privacy : "date_key"
fct_dsar_requests ||--o{ dim_users_privacy : "user_key"
fct_dsar_requests ||--o{ dim_jurisdictions : "jurisdiction_key"
fct_dsar_requests ||--o{ dim_date_privacy : "date_key"
fct_retention_actions ||--o{ dim_jurisdictions : "jurisdiction_key"
fct_retention_actions ||--o{ dim_data_categories : "category_key"
dim_users_privacy {
string user_key PK
string user_id UK
string hashed_email
string hashed_name
string hashed_ip
string jurisdiction_code
string age_band
boolean is_minor
date signup_date
}
dim_jurisdictions {
string jurisdiction_key PK
string jurisdiction_code UK
string jurisdiction_name
string privacy_regime
int dsar_sla_days
int minor_age_threshold
boolean data_residency_requirement
}
dim_consent_purposes {
string purpose_key PK
string purpose_name UK
boolean is_required_for_service
string legal_basis
}
dim_data_categories {
string category_key PK
string category_name UK
string classification_level
}
fct_consent_events {
string consent_event_id PK
string user_key FK
string jurisdiction_key FK
string purpose_key FK
int date_key FK
string consent_action
string source
timestamp event_timestamp
}
fct_dsar_requests {
string dsar_id PK
string user_key FK
string jurisdiction_key FK
int date_key_submitted FK
string request_type
string status
int sla_days
int days_elapsed
boolean is_overdue
}
fct_retention_actions {
string action_id PK
string policy_id
string category_key FK
string jurisdiction_key FK
string action_type
int records_affected
int retention_period_days
} {%- macro hash_pii(column_name) -%}
{# SHA-256 hashing for pseudonymization.
Produces consistent, non-reversible identifier.
Used in mart layer — staging retains raw PII. #}
encode(sha256({{ column_name }}::bytea), 'hex')
{%- endmacro -%}
-- Usage in dim_users_privacy.sql:
select
{{ hash_pii('email') }} as hashed_email,
{{ hash_pii('full_name') }} as hashed_name,
{{ hash_pii('ip_address') }} as hashed_ip,
-- Age generalized to band (not exact value)
case
when age < 18 then 'under_18'
when age < 25 then '18_24'
...
end as age_band {%- macro suppress_small_groups(group_cols, count_col, threshold=5) -%}
{# k-anonymity enforcement.
Flags groups with fewer than threshold members
to prevent identification of individuals in small cohorts. #}
case
when {{ count_col }} < {{ threshold }} then true
else false
end as is_suppressed
{%- endmacro -%}
-- Usage in mart_privacy_safe_viewing.sql:
select
jurisdiction_code, genre, view_month,
count(distinct user_id) as user_count,
{{ suppress_small_groups(['jurisdiction_code', 'genre'],
'user_count', 5) }}
from aggregated
-- Groups with < 5 users are flagged for suppression