PRIVACY ARCHITECTURE

> DIMENSIONAL MODEL // DATA CLASSIFICATION // SCHEMA DESIGN

PRIVACY BY DESIGN

The schema implements privacy by design at the data layer. Every column carries a data_classification meta tag (PUBLIC / INTERNAL / CONFIDENTIAL / RESTRICTED) enforced by a custom dbt test.

PII fields are hashed via SHA-256 in the mart layer using a reusable hash_pii() macro — email, full name, and IP address are never stored in cleartext beyond staging. Age is generalized to bands (e.g., 25_34) instead of exact values.

DATA CLASSIFICATION MATRIX

PUBLIC

Content metadata, genre info, aggregated metrics

No access restrictions

INTERNAL

Surrogate keys, timestamps, device types, hashed PII

Internal use only

CONFIDENTIAL

Email, name, IP, viewing history, age, user_id

Requires consent basis

RESTRICTED

Payment records, support transcripts with PII

Maximum protection required

JURISDICTION COVERAGE — 20 GLOBAL REGIMES
GDPR — DE, FR, ES, IT, NL, PL
UK-GDPR — GB
CCPA / VCDPA / CPA — US-CA, US-VA, US-CO
LGPD — BR
PIPA / APPI — KR, JP
PDPA / DPDPA / POPIA — SG, TH, IN, ZA

Each jurisdiction has distinct consent models (GDPR: opt-in, CCPA: opt-out, LGPD: legitimate interest), different DSAR deadlines (10-45 days), and varying minor age thresholds (10-18 years).

ERD — PRIVACY ENGINE DIMENSIONAL MODEL
erDiagram
    fct_consent_events ||--o{ dim_users_privacy : "user_key"
    fct_consent_events ||--o{ dim_jurisdictions : "jurisdiction_key"
    fct_consent_events ||--o{ dim_consent_purposes : "purpose_key"
    fct_consent_events ||--o{ dim_date_privacy : "date_key"

    fct_dsar_requests ||--o{ dim_users_privacy : "user_key"
    fct_dsar_requests ||--o{ dim_jurisdictions : "jurisdiction_key"
    fct_dsar_requests ||--o{ dim_date_privacy : "date_key"

    fct_retention_actions ||--o{ dim_jurisdictions : "jurisdiction_key"
    fct_retention_actions ||--o{ dim_data_categories : "category_key"

    dim_users_privacy {
        string user_key PK
        string user_id UK
        string hashed_email
        string hashed_name
        string hashed_ip
        string jurisdiction_code
        string age_band
        boolean is_minor
        date signup_date
    }

    dim_jurisdictions {
        string jurisdiction_key PK
        string jurisdiction_code UK
        string jurisdiction_name
        string privacy_regime
        int dsar_sla_days
        int minor_age_threshold
        boolean data_residency_requirement
    }

    dim_consent_purposes {
        string purpose_key PK
        string purpose_name UK
        boolean is_required_for_service
        string legal_basis
    }

    dim_data_categories {
        string category_key PK
        string category_name UK
        string classification_level
    }

    fct_consent_events {
        string consent_event_id PK
        string user_key FK
        string jurisdiction_key FK
        string purpose_key FK
        int date_key FK
        string consent_action
        string source
        timestamp event_timestamp
    }

    fct_dsar_requests {
        string dsar_id PK
        string user_key FK
        string jurisdiction_key FK
        int date_key_submitted FK
        string request_type
        string status
        int sla_days
        int days_elapsed
        boolean is_overdue
    }

    fct_retention_actions {
        string action_id PK
        string policy_id
        string category_key FK
        string jurisdiction_key FK
        string action_type
        int records_affected
        int retention_period_days
    }
dbt — hash_pii.sql (macro)
{%- macro hash_pii(column_name) -%}
{# SHA-256 hashing for pseudonymization.
   Produces consistent, non-reversible identifier.
   Used in mart layer — staging retains raw PII. #}
encode(sha256({{ column_name }}::bytea), 'hex')
{%- endmacro -%}

-- Usage in dim_users_privacy.sql:
select
    {{ hash_pii('email') }} as hashed_email,
    {{ hash_pii('full_name') }} as hashed_name,
    {{ hash_pii('ip_address') }} as hashed_ip,
    -- Age generalized to band (not exact value)
    case
        when age < 18 then 'under_18'
        when age < 25 then '18_24'
        ...
    end as age_band
dbt — suppress_small_groups.sql (k-anonymity)
{%- macro suppress_small_groups(group_cols, count_col, threshold=5) -%}
{# k-anonymity enforcement.
   Flags groups with fewer than threshold members
   to prevent identification of individuals in small cohorts. #}
case
    when {{ count_col }} < {{ threshold }} then true
    else false
end as is_suppressed
{%- endmacro -%}

-- Usage in mart_privacy_safe_viewing.sql:
select
    jurisdiction_code, genre, view_month,
    count(distinct user_id) as user_count,
    {{ suppress_small_groups(['jurisdiction_code', 'genre'],
                              'user_count', 5) }}
from aggregated
-- Groups with < 5 users are flagged for suppression
ACCESS CONTROL — LEAST PRIVILEGE
>
privacy_reader — SELECT on privacy_mart and privacy_analytics only. Cannot read privacy_staging (raw PII).
>
portfolio_reader — SELECT on all privacy schemas. Used by Metabase dashboards.
>
Schema isolation — privacy_staging, privacy_mart, privacy_analytics are separate schemas. Raw PII never leaks to analytical consumers.