ENGINEERING RIGOR

> CODE QUALITY // DATA INTEGRITY // INFRASTRUCTURE DISCIPLINE

QUALITY METRICS

143 dbt Tests
0 npm CVEs
6 Services
100% Pinned Deps

DATA INTEGRITY: dbt TEST SUITE

Every model in the pipeline is tested for correctness. 143 tests across 3 schema layers check that transformations produce valid, complete, referentially intact data.

NOT_NULL

Every column that must be populated is tested — no silent NULLs propagating through the pipeline.

UNIQUE

All primary and surrogate keys verified unique — catches fan-out bugs before they corrupt aggregations.

RELATIONSHIPS

Every foreign key validated against its parent dimension — no orphaned fact rows, no broken joins.

ACCEPTED_VALUES

Enums constrained at the data layer — billing tiers, regions, HTTP codes, size bands all validated.
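
As a rough illustration of what the four test types check, here is a minimal Python sketch over in-memory rows. It is not how dbt runs them (dbt compiles each test to SQL that returns failing rows), and the table contents and the billing_tier values are hypothetical. Like a dbt test, each function returns the failures, so an empty list means the test passes. In this sketch NULLs are only reported by not_null; the other three checks skip them.

```python
from collections import Counter


def not_null(rows, column):
    """Rows where the column is missing or None."""
    return [r for r in rows if r.get(column) is None]


def unique(rows, column):
    """Non-null values that appear more than once."""
    counts = Counter(r[column] for r in rows if r.get(column) is not None)
    return [value for value, n in counts.items() if n > 1]


def relationships(child_rows, column, parent_rows, parent_column):
    """Child rows whose non-null key has no match in the parent."""
    parent_keys = {p[parent_column] for p in parent_rows}
    return [
        r for r in child_rows
        if r.get(column) is not None and r[column] not in parent_keys
    ]


def accepted_values(rows, column, allowed):
    """Rows whose non-null value is outside the allowed set."""
    allowed = set(allowed)
    return [
        r for r in rows
        if r.get(column) is not None and r[column] not in allowed
    ]


# Hypothetical sample data; only company_key is a name taken from the text.
dim_companies = [{"company_key": 1}, {"company_key": 2}, {"company_key": 2}]
fct_usage = [
    {"company_key": 1, "billing_tier": "free"},
    {"company_key": 3, "billing_tier": "gold"},
    {"company_key": None, "billing_tier": "pro"},
]

failures = {
    "unique": unique(dim_companies, "company_key"),
    "not_null": not_null(fct_usage, "company_key"),
    "relationships": relationships(
        fct_usage, "company_key", dim_companies, "company_key"
    ),
    "accepted_values": accepted_values(
        fct_usage, "billing_tier", ["free", "pro", "enterprise"]
    ),
}
```

Every entry in failures is non-empty for this sample: the duplicated key 2, the row with a missing key, the orphaned key 3, and the unknown tier "gold" are each caught by exactly one check.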

BUG CAUGHT BY TESTS — dim_companies FAN-OUT

During development, a SELECT DISTINCT company_name, industry query produced 6,457 rows instead of the expected ~4,900. Faker had assigned multiple industries to some company names, so those companies appeared more than once in the dimension. Joining facts to the duplicated rows caused a fan-out that silently inflated the fact table from 500K to 640K rows.

The unique test on company_key caught this immediately. Fix: aggregate with mode() WITHIN GROUP to pick the most common industry per company, so each company_name yields exactly one row.

dim_companies.sql — the fix
companies as (
    select
        company_name,
        -- Pick the most common industry per company to avoid fan-out
        mode() within group (order by industry) as industry
    from users
    group by company_name
)
-- Result: exactly one row per company, fact table stays at 500K
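
The same before-and-after can be sketched in Python on a handful of rows. The sample users are hypothetical, and breaking ties toward the alphabetically first industry is this sketch's own choice, meant to resemble an ordered mode rather than to reproduce PostgreSQL exactly.

```python
from collections import Counter


def distinct_pairs(users):
    """Equivalent of SELECT DISTINCT company_name, industry."""
    return sorted({(u["company_name"], u["industry"]) for u in users})


def most_common_industry(users):
    """One industry per company: the most frequent, ties broken alphabetically."""
    counts_by_company = {}
    for u in users:
        counts = counts_by_company.setdefault(u["company_name"], Counter())
        counts[u["industry"]] += 1

    result = {}
    for name, counts in counts_by_company.items():
        top = max(counts.values())
        # Several industries can share the top count; take the first in sort order.
        result[name] = min(ind for ind, n in counts.items() if n == top)
    return result


# Hypothetical rows: "Acme" was generated with two different industries.
users = [
    {"company_name": "Acme", "industry": "Retail"},
    {"company_name": "Acme", "industry": "Finance"},
    {"company_name": "Acme", "industry": "Retail"},
    {"company_name": "Globex", "industry": "Energy"},
]

pairs = distinct_pairs(users)            # 3 rows for 2 companies: the fan-out
one_per_company = most_common_industry(users)  # exactly 2 entries
```

Here distinct_pairs returns two rows for "Acme", which is what would duplicate every fact row joined to that company, while most_common_industry keeps one row per company.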

DEPENDENCY MANAGEMENT

Every dependency is pinned to an exact version: no ranges, no floating tags. Lock files are committed, and dependencies are audited for CVEs on a monthly cycle.
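
One way to enforce the exact-pin rule for pip requirements is a small check that flags any line that is not of the form name==version. The sketch below is an assumption about how such a check could look, not the project's actual tooling. It handles only simple requirement lines, so option lines such as -r other.txt and lines with environment markers would also be reported.

```python
import re

# "name==version", with optional extras; ranges and wildcards do not match.
PINNED = re.compile(
    r"^[A-Za-z0-9][A-Za-z0-9._-]*"      # package name
    r"(\[[A-Za-z0-9._,-]+\])?"          # optional extras, e.g. [binary]
    r"==[A-Za-z0-9][A-Za-z0-9.!+_-]*$"  # exact version, no "*"
)


def unpinned_requirements(text):
    """Return the requirement lines that are not pinned to an exact version."""
    offenders = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if line and not PINNED.match(line):
            offenders.append(line)
    return offenders


# Hypothetical requirements file mixing pinned and unpinned entries.
sample = """\
psycopg2-binary==2.9.10
pandas==2.2.3  # exact pin
numpy>=2.2
faker
scipy==1.15.*
"""

offenders = unpinned_requirements(sample)
```

Run in CI, a non-empty result from unpinned_requirements could fail the build before a floating version reaches the lock file.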

Frontend (npm): 0 vulnerabilities

Astro 5.17+, Tailwind 4.2+, Mermaid 11.13.0 (upgraded from 11.6.0 to resolve 7 moderate CVEs), Prism 1.30.0

Data Pipeline (pip): pinned

psycopg2-binary 2.9.10, pandas 2.2.3, numpy 2.2.3, faker 37.1.0, scipy 1.15.2

dbt (pip): pinned

dbt-core 1.9.4, dbt-postgres 1.9.1, dbt_utils 1.3.0 (a dbt package, installed with dbt deps rather than pip)

Infrastructure (Docker): pinned

PostgreSQL 16, Metabase v0.59.1, SonarQube 26.3.0, Streamlit 1.44.0

ACCESS CONTROL & ISOLATION

> Read-only role portfolio_reader has SELECT-only access. INSERT/UPDATE/DELETE verified denied at load time.

> Schema isolation: raw_staging, public_star, public_snowflake are separate schemas. OLTP and OLAP never share a schema.

> No hardcoded credentials: all secrets via .env (gitignored). Docker services use environment variable injection.

> Network isolation: Traefik reverse proxy as sole entry point. Databases and internal services on isolated Docker network. Only ports 80/443 exposed.

> Path-based access control: Metabase admin UI blocked from external access. Only /public/ dashboard endpoints routed through Traefik.

> Self-hosted fonts: no Google CDN requests. Zero external runtime dependencies. Verify in DevTools Network tab.
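
For the no-hardcoded-credentials item above, a loader along these lines reads secrets from the environment and fails fast when one is missing. It is a minimal sketch: the variable names are hypothetical and the project's own .env keys may differ.

```python
import os

# Hypothetical variable names; the real .env keys are not shown in this document.
REQUIRED = ("POSTGRES_USER", "POSTGRES_PASSWORD", "POSTGRES_DB")


def load_db_settings(env=None):
    """Read database settings from the environment, never from source code."""
    env = os.environ if env is None else env
    missing = [key for key in REQUIRED if not env.get(key)]
    if missing:
        # Name the missing keys, but never echo secret values.
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {key: env[key] for key in REQUIRED}
```

Passing a plain dict as env keeps the function easy to test without touching the real process environment.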

INFRASTRUCTURE DISCIPLINE

DOCKER HEALTHCHECKS

All 6 services have container-level healthchecks: pg_isready for databases, HTTP endpoints for apps. Compose orders startup through depends_on entries with the service_healthy condition, so a service starts only after its dependencies report healthy.

IDEMPOTENT OPERATIONS

Data loading uses DROP IF EXISTS + CREATE. dbt models are fully rerunnable. No manual state, no drift — tear down and rebuild in minutes.
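
The drop-and-recreate pattern can be illustrated with Python's built-in sqlite3 module, used here only as a stand-in so the sketch runs without a PostgreSQL server. The raw_users table and its columns are hypothetical.

```python
import sqlite3


def load_users(conn, rows):
    """Rebuild raw_users from scratch so every run ends in the same state."""
    conn.execute("DROP TABLE IF EXISTS raw_users")
    conn.execute(
        "CREATE TABLE raw_users ("
        "user_id INTEGER PRIMARY KEY, company_name TEXT NOT NULL)"
    )
    with conn:  # commits the inserts on success, rolls them back on error
        conn.executemany(
            "INSERT INTO raw_users (user_id, company_name) VALUES (?, ?)", rows
        )
    return conn.execute("SELECT COUNT(*) FROM raw_users").fetchone()[0]


conn = sqlite3.connect(":memory:")
rows = [(1, "Acme"), (2, "Globex")]

first_run = load_users(conn, rows)
second_run = load_users(conn, rows)  # rerun: no duplicates, no key conflicts
```

Because the table is dropped before it is recreated, running the load twice leaves the same rows as running it once, which is what makes a full teardown and rebuild safe.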

STATIC ANALYSIS

SonarQube Community Build 26.3.0 runs self-hosted. Quality gates enforce zero critical issues, coverage thresholds, no unreviewed security hotspots, and duplication limits.

TECHNOLOGY SOVEREIGNTY

Every component is open-source and self-hostable. No runtime dependency on big-tech hosted services. Full stack reproducible from docker compose up.

KNOWN LIMITATIONS

All known upstream dependency vulnerabilities are triaged, documented internally, and tracked with remediation timelines. Accepted risks are justified per the CVE response policy in the project's engineering standards.

PQC: This project uses no custom cryptographic operations. TLS at the transport layer is the only crypto dependency. Post-quantum readiness (ML-KEM, ML-DSA, SHA-3) is a documented architectural principle for future systems.