ENGINEERING STANDARDS

> cat ./standards/engineering_philosophy.md

This is the living document that governs how I build software. Every project I work on starts with these principles. They aren't theoretical — they're the same standards applied to this site, enforced in CI, and visible in every commit.

01 // CLEAN ARCHITECTURE & DESIGN
  • > API contracts are sacred. Define them first, implement second.
  • > Simplicity wins. If it's hard to explain, it's probably wrong.
  • > Separation of concerns is non-negotiable. Business logic never leaks into transport layers. Data models never bleed across bounded contexts.
  • > Documentation is part of the deliverable, not an afterthought.
  • > Design for the interface, not the implementation.
02 // TEST-DRIVEN DEVELOPMENT
  • > Tests come first. Always. No exceptions.
  • > If it isn't tested, it doesn't exist.
  • > Tests are first-class citizens. Same review standard, same CI gate as production code.

WHAT I ACTUALLY TEST:

  • > Unit: against port traits with in-memory fakes. Fast, deterministic, runs on every save.
  • > Integration: against real databases, not mocks. Mock/prod divergence hides real bugs.
  • > Contract: at API boundaries, so provider and consumer agree explicitly.
  • > E2E: critical user flows only, not every path. Expensive tests are a tax.
  • > Data validation tests are a separate discipline — they catch different bugs than code tests, and I count them separately. See [14].
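A minimal Rust sketch of the unit tier: a domain function written against a port trait and exercised through an in-memory fake. `UserRepo`, `InMemoryUserRepo`, and `rename_user` are hypothetical names for illustration. They are not code from any project named on this page.

```rust
use std::collections::HashMap;

/// Port: domain code depends on this trait, never on a concrete database.
pub trait UserRepo {
    fn get(&self, id: u64) -> Option<String>;
    fn put(&mut self, id: u64, name: String);
}

/// In-memory fake adapter: no I/O, deterministic, fast enough to run on every save.
#[derive(Default)]
pub struct InMemoryUserRepo {
    rows: HashMap<u64, String>,
}

impl UserRepo for InMemoryUserRepo {
    fn get(&self, id: u64) -> Option<String> {
        self.rows.get(&id).cloned()
    }
    fn put(&mut self, id: u64, name: String) {
        self.rows.insert(id, name);
    }
}

/// Domain logic written against the port, so any adapter can be injected.
pub fn rename_user<R: UserRepo>(repo: &mut R, id: u64, new_name: &str) -> Result<(), String> {
    if repo.get(id).is_none() {
        return Err(format!("user {id} not found"));
    }
    repo.put(id, new_name.to_string());
    Ok(())
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn rename_updates_existing_user() {
        let mut repo = InMemoryUserRepo::default();
        repo.put(1, "Ada".to_string());
        assert!(rename_user(&mut repo, 1, "Ada L.").is_ok());
        assert_eq!(repo.get(1).as_deref(), Some("Ada L."));
    }

    #[test]
    fn rename_of_missing_user_is_an_explicit_error() {
        let mut repo = InMemoryUserRepo::default();
        assert!(rename_user(&mut repo, 42, "Grace").is_err());
    }
}
```

Swapping the fake for a real database adapter needs no change to `rename_user`. That is what lets the integration tier run the same logic against a real database.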

> see also: [14] Data Quality · [18] Code Review

03 // DON'T REPEAT YOURSELF
  • > Every piece of knowledge has one authoritative source.
  • > DRY applies to: code, configuration, documentation, test fixtures, data schemas, and API definitions.
  • > Caveat: DRY does not mean premature abstraction. Two instances may be coincidence. Three is a pattern. Abstract on three.
04 // DATA MODELING & ARCHITECTURE
  • > Access pattern drives schema design. Write-heavy: normalize (3NF/BCNF). Read-heavy: denormalize (Kimball).
  • > Kimball Dimensional Modeling is the default for analytical models. Star schema baseline, snowflake when hierarchy depth justifies join cost.
  • > OLTP and OLAP are separate concerns. Separate schemas, separate roles, separate access patterns.
  • > Schema changes are versioned migrations, never manual edits. Data contracts apply — downstream consumers must not break from upstream changes.

DBT CONVENTIONS:

  • > Layered modeling: staging → intermediate/marts → analytics. Each layer has a contract with the next.
  • > Naming: stg_, int_, fct_, dim_, mart_ prefixes. Prefix tells you where it lives and what it is.
  • > Minimum test coverage per model: not_null + unique on primary keys, relationships on foreign keys, accepted_values on categoricals. Anything less means the model isn't finished.
  • > Custom macros for repeated logic. Examples from this project: hash_pii(), suppress_small_groups(). Write it once.
  • > dbt docs generated after every schema change. Lineage graph is institutional memory.

> see also: [13] SQL Style · [14] Data Quality

05 // SECURITY & CRYPTOGRAPHY
  • > Security is phase 0, not phase 2.
  • > Post-Quantum Cryptography readiness is a first-class concern. Target CRYSTALS-Kyber (ML-KEM), CRYSTALS-Dilithium (ML-DSA), SHA-3/SHAKE.
  • > Crypto agility is mandatory. No algorithm hardcoded. Cipher suites, key sizes, and identifiers must be configurable and swappable.
  • > Security scanning is infrastructure: SAST, SCA, secret scanning, container scanning — on every commit.
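One way to keep algorithm choice out of the code is sketched below in Rust, under stated assumptions. Callers resolve a provider by the identifier held in configuration and never name a concrete algorithm. `KemProvider`, `CryptoRegistry`, and the two stub types are hypothetical. The stubs perform no cryptography; each carries only an identifier and a published public-key length (32 bytes for X25519, 1184 bytes for the ML-KEM-768 encapsulation key in FIPS 203).

```rust
use std::collections::HashMap;

/// Hypothetical provider interface: callers see only this trait.
pub trait KemProvider {
    /// Stable identifier that lives in configuration, e.g. "ml-kem-768".
    fn id(&self) -> &'static str;
    /// Public (encapsulation) key length in bytes.
    fn public_key_len(&self) -> usize;
}

/// Stub: identifier and key size only, no real key exchange.
struct X25519Stub;
impl KemProvider for X25519Stub {
    fn id(&self) -> &'static str { "x25519" }
    fn public_key_len(&self) -> usize { 32 }
}

/// Stub: identifier and key size only, no real encapsulation.
struct MlKem768Stub;
impl KemProvider for MlKem768Stub {
    fn id(&self) -> &'static str { "ml-kem-768" }
    fn public_key_len(&self) -> usize { 1184 }
}

/// Registry keyed by identifier; adding an algorithm is one `register` call.
pub struct CryptoRegistry {
    providers: HashMap<&'static str, Box<dyn KemProvider>>,
}

impl CryptoRegistry {
    pub fn with_defaults() -> Self {
        let mut reg = CryptoRegistry { providers: HashMap::new() };
        reg.register(Box::new(X25519Stub));
        reg.register(Box::new(MlKem768Stub));
        reg
    }

    pub fn register(&mut self, provider: Box<dyn KemProvider>) {
        self.providers.insert(provider.id(), provider);
    }

    /// Resolve the algorithm named in config; an unknown identifier is an explicit error.
    pub fn select(&self, configured_id: &str) -> Result<&dyn KemProvider, String> {
        match self.providers.get(configured_id) {
            Some(p) => Ok(&**p),
            None => Err(format!("unsupported KEM: {configured_id}")),
        }
    }
}
```

Moving a deployment from X25519 to ML-KEM-768 then becomes a config change. An unknown identifier fails loudly instead of silently falling back to a default.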

CVE RESPONSE SLA:

  • > CRITICAL: before next merge
  • > HIGH: this sprint
  • > MEDIUM: tracked
  • > LOW: triaged & documented
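The SLA can be encoded so that CI enforces the CRITICAL row mechanically. This is a minimal sketch, assuming findings arrive from the scanners as a list of severities. `Severity`, `SlaAction`, and `merge_allowed` are illustrative names, not any scanner's API.

```rust
/// Severity levels from the CVE response SLA.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Severity {
    Critical,
    High,
    Medium,
    Low,
}

/// What the SLA demands for each severity (hypothetical encoding of the table).
#[derive(Debug, PartialEq, Eq)]
pub enum SlaAction {
    BlockMerge,
    FixThisSprint,
    Track,
    TriageAndDocument,
}

pub fn sla_action(sev: Severity) -> SlaAction {
    match sev {
        Severity::Critical => SlaAction::BlockMerge,
        Severity::High => SlaAction::FixThisSprint,
        Severity::Medium => SlaAction::Track,
        Severity::Low => SlaAction::TriageAndDocument,
    }
}

/// CI gate: the merge proceeds only if no open finding requires blocking it.
pub fn merge_allowed(open_findings: &[Severity]) -> bool {
    !open_findings
        .iter()
        .any(|&s| sla_action(s) == SlaAction::BlockMerge)
}
```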
06 // DEPENDENCY MANAGEMENT
  • > Pin versions exactly. No floating specifiers. Lock files are committed and treated as source artifacts.
  • > Every new dependency is a liability. Justify it. Document it. Audit regularly for staleness, abandonment, and CVEs.
  • > Circular dependencies are bugs. Detect in CI, fail the build.
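The sketch below shows a check a CI step could run. It performs a depth-first search with three-state marking over the module dependency graph and returns the first cycle it finds. The `HashMap` graph and the module names are assumptions for illustration. In a Rust workspace, the edges could be extracted from `cargo metadata` output.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, PartialEq)]
enum Mark {
    Unvisited,
    InProgress,
    Done,
}

/// Returns the first dependency cycle found as a path of module names, or None.
/// `deps` maps each module to the modules it depends on.
pub fn find_cycle<'a>(deps: &HashMap<&'a str, Vec<&'a str>>) -> Option<Vec<&'a str>> {
    let mut marks: HashMap<&str, Mark> = HashMap::new();
    let mut stack: Vec<&str> = Vec::new();
    // Sort roots so the reported cycle is deterministic regardless of HashMap order.
    let mut roots: Vec<&str> = deps.keys().copied().collect();
    roots.sort();
    for root in roots {
        if let Some(cycle) = visit(root, deps, &mut marks, &mut stack) {
            return Some(cycle);
        }
    }
    None
}

fn visit<'a>(
    node: &'a str,
    deps: &HashMap<&'a str, Vec<&'a str>>,
    marks: &mut HashMap<&'a str, Mark>,
    stack: &mut Vec<&'a str>,
) -> Option<Vec<&'a str>> {
    match marks.get(node).copied().unwrap_or(Mark::Unvisited) {
        Mark::Done => return None,
        Mark::InProgress => {
            // Back edge: the cycle runs from the first occurrence of `node` on the stack.
            let start = stack.iter().position(|&n| n == node).unwrap();
            let mut cycle = stack[start..].to_vec();
            cycle.push(node);
            return Some(cycle);
        }
        Mark::Unvisited => {}
    }
    marks.insert(node, Mark::InProgress);
    stack.push(node);
    for &dep in deps.get(node).map(|v| v.as_slice()).unwrap_or(&[]) {
        if let Some(cycle) = visit(dep, deps, marks, stack) {
            return Some(cycle);
        }
    }
    stack.pop();
    marks.insert(node, Mark::Done);
    None
}
```

A non-empty result fails the build. The returned path tells the author exactly which edge to cut.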
07 // TECHNOLOGY SOVEREIGNTY
  • > Open-source, self-hostable solutions are the default.
  • > No runtime dependency on big-tech hosted services. Their OSS projects evaluated on technical merit only.
  • > Self-deployment capability is mandatory. Docker Compose for local dev, container-orchestration-ready for production.

CURRENT DEFAULTS (snapshot, not a law — these evolve):

  • > Infra: Docker Compose · Traefik · Cloudflare DNS/DDoS · Hetzner VPS
  • > Data: PostgreSQL · dbt Core · Metabase OSS · Streamlit
  • > Languages: Rust by default; Python when the ecosystem demands it; TypeScript for frontend
  • > Auth: self-hosted Keycloak (not Auth0/Clerk/etc.)
  • > Rendering: Astro (static) · self-hosted fonts (no Google CDN)
08 // PROGRAMMATIC EFFICIENCY
  • > Algorithmic complexity is a design concern, not an optimization concern. Always know the complexity.
  • > O(1) > O(log n) > O(n) > O(n log n) > O(n²). Verify no better alternative exists before accepting worse complexity.
  • > Premature optimization is still a sin: profile before micro-optimizing. But flag O(n²) or worse at design time.
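Here is a small example of flagging O(n²) at design time. It checks a list of IDs for duplicates two ways: by comparing every pair, and by making a single pass with a hash set. The function names are hypothetical. The linear version trades O(n) extra memory for O(n) expected time.

```rust
use std::collections::HashSet;

/// O(n²): compares each element with every later one. The kind of thing to flag in review.
pub fn has_duplicate_quadratic(ids: &[u64]) -> bool {
    ids.iter()
        .enumerate()
        .any(|(i, a)| ids[i + 1..].contains(a))
}

/// O(n) expected: one pass, with O(1) average hash-set insert/lookup.
pub fn has_duplicate_linear(ids: &[u64]) -> bool {
    let mut seen = HashSet::with_capacity(ids.len());
    // `insert` returns false when the value was already present.
    ids.iter().any(|id| !seen.insert(id))
}
```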
09 // CODE QUALITY & ANNOTATION
  • > Readable by a stranger in 6 months. Functions do one thing.
  • > No magic numbers or strings — named constants with clear origin.
  • > Error handling is explicit. Silent failures are bugs.
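This minimal sketch applies all three rules to a hypothetical config value that must parse into a TCP port. The limits are named constants with a stated origin. Every failure comes back as a typed error instead of a silent default. `parse_port`, `PortError`, and the refuse-privileged-ports policy are illustrative assumptions.

```rust
/// Origin: TCP/UDP port numbers are 16-bit unsigned integers.
const MAX_PORT: u32 = 65_535;
/// Origin: ports below 1024 are privileged on Unix-like systems.
/// Hypothetical policy for this sketch: refuse them.
const MIN_UNPRIVILEGED_PORT: u32 = 1_024;

#[derive(Debug, PartialEq)]
pub enum PortError {
    NotANumber(String),
    OutOfRange(u32),
}

/// Every failure path is returned to the caller; nothing is swallowed or defaulted.
pub fn parse_port(raw: &str) -> Result<u16, PortError> {
    let n: u32 = raw
        .trim()
        .parse()
        .map_err(|_| PortError::NotANumber(raw.to_string()))?;
    if !(MIN_UNPRIVILEGED_PORT..=MAX_PORT).contains(&n) {
        return Err(PortError::OutOfRange(n));
    }
    // Safe: the range check above guarantees n fits in a u16.
    Ok(n as u16)
}
```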

INLINE ANNOTATION TAGS:

// TECH DEBT:        Shortcut, needs resolution
// PERF:             Performance concern
// SECURITY:         Security-sensitive, review carefully
// PQC DEBT:         Classical crypto, needs migration
// FIXME:            Broken, must fix before release
// SCHEMA:           Fragile if schema changes
// VENDOR LOCK-IN:   Vendor dependency, needs escape path
// DEPENDENCY RISK:  Stale or under-maintained dep
10 // CONVENTIONAL COMMITS

Write the subject line in the imperative mood. It should complete the sentence: "If applied, this commit will..."

<type>(optional scope): short summary
feat:      New feature or capability
fix:       Bug fix
docs:      Documentation-only changes
refactor:  Neither fix nor feature
perf:      Performance improvement
test:      Adding or correcting tests
chore:     Routine tasks, build processes
style:     Formatting changes

EXAMPLES:

Backend: feat(api): add patient redaction endpoint

Data: fix(etl): resolve timestamp anomaly in incremental load

Frontend: refactor(ui): extract vaporwave button into reusable component
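A commit-msg hook or CI step can check subjects against the type table above. This is a minimal Rust sketch, assuming the subject line has already been isolated. `parse_subject` and `Commit` are hypothetical names. The check covers only structure and the allowed types, not whether the summary is written in the imperative mood.

```rust
/// Allowed types, mirroring the table above.
const ALLOWED_TYPES: [&str; 8] = [
    "feat", "fix", "docs", "refactor", "perf", "test", "chore", "style",
];

#[derive(Debug, PartialEq)]
pub struct Commit<'a> {
    pub kind: &'a str,
    pub scope: Option<&'a str>,
    pub summary: &'a str,
}

/// Parses "<type>(optional scope): short summary", returning a reason on failure.
pub fn parse_subject(subject: &str) -> Result<Commit<'_>, String> {
    let (header, summary) = subject
        .split_once(": ")
        .ok_or("missing \": \" separator")?;
    let (kind, scope) = match header.split_once('(') {
        Some((k, rest)) => {
            let scope = rest
                .strip_suffix(')')
                .ok_or("unclosed scope parenthesis")?;
            (k, Some(scope))
        }
        None => (header, None),
    };
    if !ALLOWED_TYPES.contains(&kind) {
        return Err(format!("unknown type {kind:?}"));
    }
    if summary.trim().is_empty() {
        return Err("empty summary".to_string());
    }
    Ok(Commit { kind, scope, summary })
}
```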

11 // DOCUMENTATION & ISSUE TRACKING
  • > READMEs evolve with the code, not written after. Required: quickstart, config reference, architecture overview, testing, deployment, troubleshooting.
  • > The Minimal Working Example is non-negotiable. Runnable in under 10 minutes using only the README.
  • > Bugs that consume 1+ hour of debugging get documented. Format: Date, Environment, Symptoms, Root Cause, Fix, Prevention.
  • > Three or more similar issues = architectural smell. Escalate to an Architecture Decision Record.
12 // PREFERRED PATTERNS & IDIOMS

FUNDAMENTALS:

> Hexagonal Architecture (Ports & Adapters)
> Repository Pattern — data access abstracted
> Strategy Pattern — behaviors swappable
> Factory / Builder — readable object creation
> CQRS — separate read/write when justified
> Event-driven — decoupling across contexts
> Kimball Dimensional Modeling (star default)
> Dependency Injection over hard dependencies

PATTERNS I USE REPEATEDLY (from my own projects):

  • > Dual storage backends — identical port trait, swap at startup. Example: Postgres ↔ flat JSON in ResumeForge. Lets the same code ship in a database-present and database-absent form.
  • > Crypto-agile provider interfaces: a single CryptoProvider trait, multiple implementations. Example: Signal-Lens, where adding post-quantum primitives means writing a new implementation, not a rewrite.
  • > Independent crate/module boundaries with clean API contracts. Example: Nexus's 9-crate architecture — each crate does one thing, contract is its public types.
  • > Monorepo for development, independent publication for consumption. Example: CivicLens/Sentinel — develop together for velocity, ship separately so each crate can be consumed without the rest.
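Here is a minimal sketch of the dual-backend pattern: one port trait, two adapters, and a single decision at startup. To stay within the standard library, it makes two substitutions. An in-memory map stands in for the Postgres adapter, and a tab-separated flat file stands in for flat JSON. `Store`, `open_store`, and both adapters are hypothetical, not ResumeForge code.

```rust
use std::collections::BTreeMap;
use std::fs;
use std::path::PathBuf;

/// Port: the rest of the application only sees this trait.
pub trait Store {
    fn save(&mut self, key: &str, value: &str) -> Result<(), String>;
    fn load(&self, key: &str) -> Result<Option<String>, String>;
}

/// Adapter 1: in-memory map (stands in for a database-backed adapter).
#[derive(Default)]
pub struct MemoryStore(BTreeMap<String, String>);

impl Store for MemoryStore {
    fn save(&mut self, key: &str, value: &str) -> Result<(), String> {
        self.0.insert(key.to_string(), value.to_string());
        Ok(())
    }
    fn load(&self, key: &str) -> Result<Option<String>, String> {
        Ok(self.0.get(key).cloned())
    }
}

/// Adapter 2: flat file, one "key<TAB>value" line per record, no database needed.
/// Assumes keys and values contain no tabs or newlines.
pub struct FlatFileStore {
    path: PathBuf,
}

impl FlatFileStore {
    fn read_all(&self) -> Result<BTreeMap<String, String>, String> {
        let text = match fs::read_to_string(&self.path) {
            Ok(t) => t,
            Err(e) if e.kind() == std::io::ErrorKind::NotFound => String::new(),
            Err(e) => return Err(e.to_string()),
        };
        Ok(text
            .lines()
            .filter_map(|line| line.split_once('\t'))
            .map(|(k, v)| (k.to_string(), v.to_string()))
            .collect())
    }
}

impl Store for FlatFileStore {
    fn save(&mut self, key: &str, value: &str) -> Result<(), String> {
        let mut all = self.read_all()?;
        all.insert(key.to_string(), value.to_string());
        let body: String = all.iter().map(|(k, v)| format!("{k}\t{v}\n")).collect();
        fs::write(&self.path, body).map_err(|e| e.to_string())
    }
    fn load(&self, key: &str) -> Result<Option<String>, String> {
        Ok(self.read_all()?.remove(key))
    }
}

/// Chosen once at startup (e.g. from config); everything downstream gets `Box<dyn Store>`.
pub fn open_store(backend: &str, path: PathBuf) -> Result<Box<dyn Store>, String> {
    match backend {
        "memory" => Ok(Box::new(MemoryStore::default())),
        "file" => Ok(Box::new(FlatFileStore { path })),
        other => Err(format!("unknown backend {other:?}")),
    }
}
```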
13 // SQL STYLE
  • > All lowercase keywords. SQL reads as prose, not shouting.
  • > Comma-first field lists. Any field after the first can be commented out or removed as a single line, which keeps cursor-driven debugging fast.
  • > No blank lines between CTEs. Editors that treat blank lines as statement boundaries then run the whole query from the cursor, not a broken fragment of it.
  • > CTEs over nested subqueries. Every named CTE is a unit of thought. Easier to read, easier to refactor, easier to debug.
  • > Explicit column lists. Never SELECT * in production — schema drift silently breaks downstream.
  • > snake_case for all identifiers. No ambiguity, no case-sensitivity surprises.
  • > No trailing comma after the last field. Comma-first already makes adding or removing a column a one-line diff, and PostgreSQL rejects a trailing comma in a select list.
  • > Table aliases always explicit, 1-3 chars. Every column reference names its table.
  • > Explicit JOIN type (INNER JOIN, LEFT JOIN), never bare JOIN — forces semantic intent.

> see also: [04] Data Modeling · [14] Data Quality

14 // DATA QUALITY & PIPELINE TESTING
  • > Pipelines get validation tests, not just code tests. A pipeline that runs green but produces wrong data is still broken.
  • > Edge cases and failure paths tested explicitly. What happens on empty input? Malformed rows? Duplicate keys? Late-arriving data?
  • > Data contracts between upstream sources and downstream consumers. Breaks are detected at the boundary before they propagate.
  • > Monitoring and alerting for pipeline health. Run duration, row counts, freshness, error rates — all tracked.
  • > Test counts are minimum coverage, not a target. "900+ tests" isn't the brag — "every boundary and edge case covered" is.
  • > Data tests and code tests counted separately. They catch different bugs and belong to different disciplines. Don't blend the numbers.
  • > Test at the layer boundary: staging gets source-freshness and row-count checks, marts get business-logic tests, analytics gets k-anonymity and small-group suppression checks.
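The analytics-layer check can be sketched in Rust as primary small-cell suppression. Any group whose count falls below a minimum size is reported as suppressed instead of as a number. `suppress_small_counts`, `GroupCount`, and the threshold of 5 are assumptions for illustration. This is not the project's `suppress_small_groups()` dbt macro.

```rust
/// Hypothetical minimum group size; the real threshold is a policy decision.
const MIN_GROUP_SIZE: u64 = 5;

#[derive(Debug, PartialEq)]
pub struct GroupCount {
    pub group: String,
    /// None means "suppressed": the true count is withheld.
    pub count: Option<u64>,
}

/// Replaces counts below `min_size` with None so small groups cannot be singled out.
pub fn suppress_small_counts(rows: &[(String, u64)], min_size: u64) -> Vec<GroupCount> {
    rows.iter()
        .map(|(group, n)| GroupCount {
            group: group.clone(),
            count: if *n < min_size { None } else { Some(*n) },
        })
        .collect()
}
```

This covers primary suppression only. If totals are also published, a single suppressed cell can be recovered by subtraction. In that case, complementary suppression of at least one other cell is needed as well.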

> see also: [02] TDD · [04] Data Modeling · [17] Observability

15 // AI & LLM INTEGRATION
  • > PII stripped before any outbound API call. If it's personal, it doesn't leave the machine without consent and without transformation.
  • > User sees the exact payload in a consent dialog before send. Nothing goes over the wire that the user didn't see first.
  • > Every call logged: model, prompt, response, tokens, cost, timestamp. No silent spend, full audit trail.
  • > Multi-model evaluation. Don't trust a single vendor. Test the same prompt across models before baking the choice into a system.
  • > Determinism boundaries are marked. Schema-validated outputs are one thing; probabilistic outputs are another. Readers of the code always know which is which.
  • > Cost awareness is first-class. Per-conversation cost tracking, split input/output pricing per model, alerts on anomalous spend.
  • > Fallback behavior defined for refusal, parse failure, and timeout. Real systems break at these boundaries; an LLM without a fallback is a production outage waiting to happen.
  • > AI-generated content tracked with provenance. Which model, which prompt, which version, which human approved it before it shipped.
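This minimal sketch logs per-call cost with split input/output pricing. It assumes prices are held as integers in nano-USD per token to avoid floating-point drift. `Pricing`, `CallRecord`, `Ledger`, and the fixed spend threshold are hypothetical; the threshold stands in for real anomaly detection. Prompt and response bodies are left out of `CallRecord` for brevity.

```rust
/// Prices in nano-USD per token, split by direction.
#[derive(Clone, Copy)]
pub struct Pricing {
    pub input_nano_usd: u64,
    pub output_nano_usd: u64,
}

/// One audit-log record per call.
#[derive(Debug, Clone)]
pub struct CallRecord {
    pub model: String,
    pub input_tokens: u64,
    pub output_tokens: u64,
    pub cost_nano_usd: u64,
    pub unix_ts: u64,
}

pub fn call_cost(p: Pricing, input_tokens: u64, output_tokens: u64) -> u64 {
    input_tokens * p.input_nano_usd + output_tokens * p.output_nano_usd
}

/// Per-conversation ledger with a simple spend alert.
pub struct Ledger {
    pub records: Vec<CallRecord>,
    pub alert_at_nano_usd: u64,
}

impl Ledger {
    /// Logs the call and returns true when cumulative spend reaches the alert threshold.
    pub fn record(&mut self, model: &str, p: Pricing, input_tokens: u64, output_tokens: u64, unix_ts: u64) -> bool {
        let cost_nano_usd = call_cost(p, input_tokens, output_tokens);
        self.records.push(CallRecord {
            model: model.to_string(),
            input_tokens,
            output_tokens,
            cost_nano_usd,
            unix_ts,
        });
        self.total() >= self.alert_at_nano_usd
    }

    pub fn total(&self) -> u64 {
        self.records.iter().map(|r| r.cost_nano_usd).sum()
    }
}
```

Take a hypothetical price of 3,000 nano-USD per input token and 15,000 per output token. A call with 1,000 input and 200 output tokens then costs 3,000,000 + 3,000,000 = 6,000,000 nano-USD, or $0.006.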

> see also: [05] Security · [16] Privacy · [17] Observability

16 // PRIVACY ENGINEERING
  • > Privacy is a design constraint, not a compliance checkbox. If you bolt it on at the end, you've already lost.
  • > Data minimization by default. Collect the least you need to do the job. Every field is a future liability.
  • > PII classified at the column level. Column-level tags drive tooling: masking, access control, auditing, retention.
  • > Jurisdiction-aware consent tracking where applicable. Consent granted in one jurisdiction doesn't grant access in another.
  • > Retention policies defined at schema level, not buried in application code. The schema knows when a record expires.
  • > Pseudonymization over anonymization where re-linkage may be needed. hash_pii() with a stable salt preserves joinability without exposing the original.
  • > Right-to-be-forgotten workflows: cascading deletes, tombstones where needed, and an audit trail of the deletion itself.
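This sketch shows a column-level tag driving masking. It assumes a three-level taxonomy (`Public`, `Quasi`, `Direct`) and a single caller permission flag; both are hypothetical simplifications. The point is that the tag attached to the column decides treatment, not per-field logic scattered through the application.

```rust
/// Column-level classification tags (hypothetical taxonomy).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum PiiClass {
    Public,
    Quasi,
    Direct,
}

pub struct Column {
    pub name: &'static str,
    pub class: PiiClass,
}

/// The tag alone decides treatment; no per-column special cases.
pub fn render(col: &Column, value: &str, caller_may_see_pii: bool) -> String {
    match (col.class, caller_may_see_pii) {
        (PiiClass::Public, _) | (_, true) => value.to_string(),
        (PiiClass::Quasi, false) => generalize(value),
        (PiiClass::Direct, false) => "***".to_string(),
    }
}

/// Hypothetical generalization: keep the first 3 characters (e.g. a postcode prefix).
fn generalize(value: &str) -> String {
    let prefix: String = value.chars().take(3).collect();
    format!("{prefix}*")
}
```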

> see also: [05] Security · [15] AI/LLM

17 // OBSERVABILITY & MONITORING
  • > Structured logs with consistent fields (timestamp, level, request_id, user_id where safe). Logs are a queryable stream, not a stack of strings.
  • > Log levels used intentionally. DEBUG / INFO / WARN / ERROR / FATAL each mean a specific thing. Don't INFO everything.
  • > Health endpoints on every long-running service. Orchestrator needs to know if the thing is alive AND ready.
  • > Metrics over logs for anything counted: request counts, latencies, error rates. Log lines for discrete events; metrics for aggregates.
  • > Dashboards answer questions, not collect metrics. Show the operator what they need to decide; hide everything else.
  • > Alerting thresholds tuned to avoid fatigue. Every page must be actionable. A page that can't be acted on is noise training people to ignore the real ones.
  • > PII never in logs. Cross-reference [16] — logs are long-lived and widely readable.
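This minimal sketch builds a structured log line with fixed leading fields, written as logfmt-style key=value pairs so it stays parseable. A denylist drops PII fields before they reach the stream. The field names in `PII_FIELDS` and the `log_line` helper are assumptions. A production service would more likely use a structured logging library.

```rust
use std::fmt::Write;

/// Hypothetical field names that must never reach a log line.
const PII_FIELDS: [&str; 3] = ["email", "name", "phone"];

/// Emits one logfmt-style line: fixed fields first, then caller-supplied fields.
/// Fields on the PII denylist are dropped rather than written.
pub fn log_line(ts: &str, level: &str, request_id: &str, fields: &[(&str, &str)]) -> String {
    let mut line = format!("ts={ts} level={level} request_id={request_id}");
    for (key, value) in fields {
        if PII_FIELDS.contains(key) {
            continue;
        }
        // Quote values so spaces do not break parsing; escape backslashes, then quotes.
        let escaped = value.replace('\\', "\\\\").replace('"', "\\\"");
        write!(line, " {key}=\"{escaped}\"").unwrap();
    }
    line
}
```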

> see also: [05] Security · [14] Data Quality · [16] Privacy

18 // CODE REVIEW
  • > Every non-trivial change gets review. Solo projects included: review your own PR as future-you would, working through a self-review checklist left in the PR.
  • > Reviewers check: does it do what the commit message says? Are tests present and correct? Is error handling explicit? Does it regress anything obvious?
  • > Reviewers don't check style (the linter does that) or personal preference (call it out as preference, not a block).
  • > Small PRs get faster review. Batch by concern, not by day. A huge multi-concern PR is a bad PR before anyone reads it.
  • > Authors respond to every comment, even if just to acknowledge. Silent dismissal is disrespectful and builds debt.
  • > Approvals require all conversations resolved. An open thread is an open question; ship the answer, not the question.
  • > No self-approval on shared codebases. Self-merge only after a cooling-off period and only with documented justification.

> see also: [02] TDD · [10] Conventional Commits