LLM TELEMETRY
> SYNTHETIC API TELEMETRY THROUGH A MODERN ANALYTICS PIPELINE
A full-stack data engineering demonstration: synthetic LLM API telemetry processed through a modern analytics pipeline. It runs from data generation with statistically realistic distributions, through Kimball dimensional modeling, to self-hosted BI dashboards and interactive forecasting.
The domain simulates a large language model API platform, modeled on real-world patterns from APIs like Anthropic's Claude. The dataset covers 500K API requests across 5,000 users, 5 models, and 5 endpoints over a 91-day window.
01 GENERATE
Python + NumPy/SciPy
Log-normal, Pareto, sinusoidal distributions
02 LOAD
PostgreSQL 16
COPY-based bulk ingest, read-only roles
03 TRANSFORM
dbt Core
Star + snowflake schemas, 143 tests
04 VISUALIZE
Metabase + Streamlit
Self-hosted dashboards + forecasting
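As a rough illustration of the GENERATE step, here is a minimal sketch of how the three named distributions could shape synthetic requests. It uses Python's standard-library `random` module rather than NumPy/SciPy to stay dependency-free. The mapping of distributions to fields (log-normal latency, Pareto user activity, sinusoidal weekly traffic) is an assumption, as are the function name `simulate_requests` and every parameter value; none of them are taken from the project.

```python
import math
import random


def simulate_requests(n_requests=1000, n_users=50, n_days=91, seed=42):
    """Hypothetical generator: returns a list of synthetic request rows."""
    rng = random.Random(seed)  # seeded so runs are reproducible

    # Pareto-distributed user weights: a few heavy users dominate traffic.
    # Shape 1.5 is an assumed value.
    user_weights = [rng.paretovariate(1.5) for _ in range(n_users)]

    # Sinusoidal weekly seasonality (assumed period 7 days, amplitude 0.3).
    # Weights stay positive, in the range 0.7 to 1.3.
    day_weights = [
        1.0 + 0.3 * math.sin(2 * math.pi * d / 7) for d in range(n_days)
    ]

    users = rng.choices(range(n_users), weights=user_weights, k=n_requests)
    days = rng.choices(range(n_days), weights=day_weights, k=n_requests)

    rows = []
    for user_id, day in zip(users, days):
        # Log-normal latency: assumed median 800 ms, sigma 0.6,
        # giving a long right tail.
        latency_ms = rng.lognormvariate(math.log(800), 0.6)
        rows.append({"user_id": user_id, "day": day, "latency_ms": latency_ms})
    return rows


if __name__ == "__main__":
    sample = simulate_requests(n_requests=5)
    for row in sample:
        print(row)
```

Scaling `n_requests` and `n_users` toward the 500K requests and 5,000 users described above only changes the arguments; model and endpoint columns would be sampled the same way.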
DATA LAYER
PostgreSQL 16 — analytical database
dbt Core 1.9.4 — transformation framework
Python + NumPy/SciPy — data generation
Faker — synthetic PII generation
PRESENTATION LAYER
Metabase OSS v0.59.1 — BI dashboards
Streamlit 1.44.0 — interactive forecasting
Astro + TypeScript — static site
Docker Compose v2 — infrastructure