ENGINEERING RIGOR
> CODE QUALITY // DATA INTEGRITY // INFRASTRUCTURE DISCIPLINE
Every model in the pipeline is tested for correctness. 143 tests across 3 schema layers ensure that transformations produce valid, complete, referentially intact data.
NOT_NULL
Every column that must be populated is tested — no silent NULLs propagating through the pipeline.
UNIQUE
All primary and surrogate keys verified unique — catches fan-out bugs before they corrupt aggregations.
RELATIONSHIPS
Every foreign key validated against its parent dimension — no orphaned fact rows, no broken joins.
ACCEPTED_VALUES
Enums constrained at the data layer — billing tiers, regions, HTTP codes, size bands all validated.
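As a rough illustration of what these four test types check, here is a minimal Python sketch over in-memory rows. It is not the project's dbt configuration. The table contents, column names other than company_key, and the billing tier names are hypothetical. Each function returns the offending rows or values, so an empty result means the test passes.

```python
from collections import Counter

def not_null(rows, column):
    """Return rows where `column` is missing or None."""
    return [r for r in rows if r.get(column) is None]

def unique(rows, column):
    """Return non-NULL values of `column` that occur more than once."""
    counts = Counter(r[column] for r in rows if r[column] is not None)
    return sorted(v for v, n in counts.items() if n > 1)

def relationships(child_rows, fk, parent_rows, pk):
    """Return child rows whose non-NULL foreign key has no parent row."""
    parent_keys = {p[pk] for p in parent_rows}
    return [c for c in child_rows
            if c[fk] is not None and c[fk] not in parent_keys]

def accepted_values(rows, column, allowed):
    """Return values of `column` that fall outside the allowed set."""
    return sorted({r[column] for r in rows} - set(allowed))

# Hypothetical sample data with one deliberate defect per test.
dim_company = [
    {"company_key": 1, "billing_tier": "free"},
    {"company_key": 2, "billing_tier": "pro"},
    {"company_key": 2, "billing_tier": "gold"},  # duplicate key, unknown tier
]
fact_events = [
    {"event_id": 10, "company_key": 1},
    {"event_id": 11, "company_key": 3},     # orphan: no company 3
    {"event_id": 12, "company_key": None},  # NULL foreign key
]

null_rows = not_null(fact_events, "company_key")
duplicate_keys = unique(dim_company, "company_key")
orphans = relationships(fact_events, "company_key", dim_company, "company_key")
bad_tiers = accepted_values(dim_company, "billing_tier", ["free", "pro", "enterprise"])
```

The NULL foreign key is reported by not_null rather than relationships, which keeps each failure attributable to exactly one test.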
During development, a SELECT DISTINCT company_name, industry query produced 6,457 rows instead of the expected ~4,900. Faker assigned multiple industries per company name, causing a fan-out that silently inflated the fact table from 500K to 640K rows. The unique test on company_key caught this immediately. Fix: aggregate with mode() WITHIN GROUP to pick the most common industry per company.
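The fan-out is easy to reproduce in miniature. The following Python sketch uses hypothetical user rows (the names and industries are invented for illustration) to show how distinct (company_name, industry) pairs outnumber companies, and how choosing the most common industry collapses them to one row each. Breaking ties by taking the alphabetically first industry is an assumption made here for determinism.

```python
from collections import Counter, defaultdict

# Hypothetical rows: the generator gave "Acme" two different industries.
users = [
    {"company_name": "Acme", "industry": "Retail"},
    {"company_name": "Acme", "industry": "Retail"},
    {"company_name": "Acme", "industry": "Finance"},
    {"company_name": "Globex", "industry": "Energy"},
]

def distinct_pairs(rows):
    """Python equivalent of SELECT DISTINCT company_name, industry."""
    return sorted({(r["company_name"], r["industry"]) for r in rows})

def one_industry_per_company(rows):
    """Keep the most frequent industry per company (ties: first alphabetically)."""
    by_company = defaultdict(Counter)
    for r in rows:
        by_company[r["company_name"]][r["industry"]] += 1
    result = {}
    for name, counts in by_company.items():
        top = max(counts.values())
        result[name] = min(ind for ind, n in counts.items() if n == top)
    return result

pairs = distinct_pairs(users)              # 3 pairs for only 2 companies
companies = one_industry_per_company(users)  # exactly 1 entry per company
```

Joining facts to the three-row version would duplicate every Acme fact row, which is the inflation described above. The SQL that follows applies the same idea inside the database.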
companies as (
    select
        company_name,
        -- Pick the most common industry per company to avoid fan-out
        mode() within group (order by industry) as industry
    from users
    group by company_name
)
-- Result: exactly one row per company, fact table stays at 500K

Every dependency is pinned to an exact version. No ranges, no floating. Lock files are committed. Monthly audit cycle for CVEs.
Astro 5.17+, Tailwind 4.2+, Mermaid 11.13.0 (upgraded from 11.6.0 to resolve 7 moderate CVEs), Prism 1.30.0
psycopg2-binary 2.9.10, pandas 2.2.3, numpy 2.2.3, faker 37.1.0, scipy 1.15.2
dbt-core 1.9.4, dbt-postgres 1.9.1, dbt_utils 1.3.0
PostgreSQL 16, Metabase v0.59.1, SonarQube 26.3.0, Streamlit 1.44.0
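A pinning policy like this can be checked mechanically. The sketch below assumes the Python dependencies live in a requirements-style list of name==version lines, which the text above does not confirm. The function name and regular expression are illustrative only. It flags any line that uses a range, a wildcard, or no version at all.

```python
import re

# Matches "name==version" where the version is exact (no wildcard, no range).
PINNED = re.compile(r"^[A-Za-z0-9_.\-]+==[0-9][A-Za-z0-9.]*$")

def unpinned(requirements):
    """Return requirement specs that are not pinned to one exact version."""
    problems = []
    for line in requirements:
        spec = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if spec and not PINNED.match(spec):
            problems.append(spec)
    return problems

# The Python versions listed above, written in the assumed requirements format.
reqs = [
    "psycopg2-binary==2.9.10",
    "pandas==2.2.3",
    "numpy==2.2.3",
    "faker==37.1.0",
    "scipy==1.15.2",
]

violations = unpinned(reqs)
loose = unpinned(["pandas>=2.2", "numpy==2.*", "scipy", "faker==37.1.0  # ok"])
```

A check of this kind could run in CI ahead of the monthly audit, so that a floating version never reaches the lock file unnoticed.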
portfolio_reader has SELECT-only access. INSERT/UPDATE/DELETE verified denied at load time. /public/ dashboard endpoints routed through Traefik.
DOCKER HEALTHCHECKS
All 6 services have container-level healthchecks. pg_isready for databases, HTTP endpoints for apps. Compose dependency ordering via service_healthy conditions.
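For the HTTP-based checks, a probe can be as small as the sketch below. It is an illustration, not the project's actual healthcheck command: the function name and the injectable opener parameter are hypothetical, and any URL passed to it would depend on the service. It returns 0 when the endpoint answers with a 2xx status and 1 otherwise, matching the exit codes Docker interprets as healthy and unhealthy.

```python
import urllib.error
import urllib.request

def probe(url, timeout=3.0, opener=urllib.request.urlopen):
    """Return 0 if `url` answers with a 2xx status, else 1."""
    try:
        with opener(url, timeout=timeout) as response:
            return 0 if 200 <= response.status < 300 else 1
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, HTTP error status, or malformed URL.
        return 1
```

A container healthcheck could call this from a short script and exit with the returned code. The opener argument exists only so the function can be exercised without a live server.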
IDEMPOTENT OPERATIONS
Data loading uses DROP IF EXISTS + CREATE. dbt models are fully rerunnable. No manual state, no drift — tear down and rebuild in minutes.
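The drop-and-recreate pattern can be sketched as follows. This example substitutes an in-memory SQLite database for PostgreSQL so that it runs anywhere, and the table name raw_users and its columns are hypothetical. The point is only that running the load twice leaves the same state as running it once.

```python
import sqlite3

def load_table(conn, rows):
    """Rebuild the table from scratch so every run converges to the same state."""
    conn.execute("DROP TABLE IF EXISTS raw_users")
    conn.execute(
        "CREATE TABLE raw_users (user_id INTEGER PRIMARY KEY, company_name TEXT)"
    )
    conn.executemany("INSERT INTO raw_users VALUES (?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")  # stand-in for the real database
rows = [(1, "Acme"), (2, "Globex")]

load_table(conn, rows)
load_table(conn, rows)  # the rerun replaces the table instead of appending

count = conn.execute("SELECT COUNT(*) FROM raw_users").fetchone()[0]
```

An append-only loader would leave four rows (or fail on the primary key) after the second call. Rebuilding the table is what makes a full teardown and rebuild safe.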
STATIC ANALYSIS
SonarQube Community Edition 26.3.0 runs self-hosted. Quality gates enforce: zero critical issues, coverage thresholds, no unreviewed security hotspots, duplication limits.
TECHNOLOGY SOVEREIGNTY
Every component is open-source and self-hostable. No runtime dependency on big-tech hosted services. Full stack reproducible from docker compose up.
All known upstream dependency vulnerabilities are triaged, documented internally, and tracked with remediation timelines. Accepted risks are justified per the CVE response policy in the project's engineering standards.
PQC: This project uses no custom cryptographic operations. TLS at the transport layer is the only crypto dependency. Post-quantum readiness (ML-KEM, ML-DSA, SHA-3) is a documented architectural principle for future systems.