LLM TELEMETRY

> SYNTHETIC API TELEMETRY THROUGH A MODERN ANALYTICS PIPELINE

SYSTEM TELEMETRY
500K Fact Rows
14 dbt Models
143 Test Cases
99.9% Uptime
PROJECT OVERVIEW

A full-stack data engineering demonstration: synthetic LLM API telemetry processed through a modern analytics pipeline. The project spans data generation with statistically realistic distributions, Kimball dimensional modeling in dbt, self-hosted BI dashboards, and interactive forecasting.

The domain simulates a large-language-model API platform, modeled on real-world usage patterns from APIs like Anthropic's Claude: 500K API requests across 5,000 users, 5 models, and 5 endpoints over a 91-day window.

PIPELINE

01 GENERATE

Python + NumPy/SciPy

Log-normal, Pareto, sinusoidal distributions
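A minimal sketch of how such distributions might be combined with NumPy; the specific parameters and variable names are illustrative assumptions, not the project's actual generator:

```python
import numpy as np

rng = np.random.default_rng(42)

# Log-normal latencies (ms): most requests are fast, with a heavy right tail.
latency_ms = rng.lognormal(mean=6.0, sigma=0.6, size=10_000)

# Pareto-distributed requests per user: a few heavy users dominate traffic.
requests_per_user = (rng.pareto(a=1.5, size=5_000) + 1) * 10

# Sinusoidal daily cycle over a 91-day window: traffic peaks mid-day.
hours = np.arange(91 * 24)
daily_cycle = 1 + 0.5 * np.sin(2 * np.pi * (hours % 24) / 24)
hourly_requests = rng.poisson(lam=200 * daily_cycle)
```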

02 LOAD

PostgreSQL 16

COPY-based bulk ingest, read-only roles
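The COPY pattern can be sketched as staging rows into an in-memory CSV buffer and streaming it to Postgres in one round trip, which is far faster than row-by-row INSERTs. Table and column names here are assumptions, not the project's actual schema:

```python
import csv
import io

# Hypothetical fact rows: (user_id, model, endpoint, latency_ms).
rows = [
    (1, "claude-sonnet", "/v1/messages", 412.5),
    (2, "claude-haiku", "/v1/messages", 98.1),
]

# Stage rows into an in-memory CSV buffer for a single COPY round trip.
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerows(rows)
buf.seek(0)

copy_sql = (
    "COPY raw.api_requests (user_id, model, endpoint, latency_ms) "
    "FROM STDIN WITH (FORMAT csv)"
)

# With a live psycopg2 connection, the ingest step would be:
# with conn.cursor() as cur:
#     cur.copy_expert(copy_sql, buf)
```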

03 TRANSFORM

dbt Core

Star + snowflake schemas, 143 tests
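The core Kimball move behind a star schema is replacing natural keys with compact surrogate keys that fact rows reference. A minimal Python sketch of that idea; names like `dim_model` and `fct_requests` mirror common dbt conventions and are not the project's actual models:

```python
# Hypothetical raw events with natural keys and one measure.
raw_events = [
    {"model": "claude-sonnet", "endpoint": "/v1/messages", "latency_ms": 412.5},
    {"model": "claude-haiku", "endpoint": "/v1/messages", "latency_ms": 98.1},
    {"model": "claude-sonnet", "endpoint": "/v1/complete", "latency_ms": 230.0},
]

def build_dim(events, column):
    """Assign a stable surrogate key to each distinct natural key."""
    keys = sorted({e[column] for e in events})
    return {natural: sk for sk, natural in enumerate(keys, start=1)}

dim_model = build_dim(raw_events, "model")
dim_endpoint = build_dim(raw_events, "endpoint")

# Fact rows carry only surrogate keys plus measures.
fct_requests = [
    {
        "model_key": dim_model[e["model"]],
        "endpoint_key": dim_endpoint[e["endpoint"]],
        "latency_ms": e["latency_ms"],
    }
    for e in raw_events
]
```

In dbt, the same idea is expressed as SQL models, and the 143 tests assert properties like key uniqueness and referential integrity between facts and dimensions.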

04 VISUALIZE

Metabase + Streamlit

Self-hosted dashboards + forecasting
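A stand-in for the kind of computation an interactive forecast panel performs: fit a trend to daily request counts and project it forward. The data and the linear-trend choice are illustrative assumptions, not the dashboard's actual method:

```python
import numpy as np

# Hypothetical daily request counts: linear growth plus noise.
rng = np.random.default_rng(0)
days = np.arange(91)
observed = 5_000 + 15 * days + rng.normal(0, 200, size=91)

# Fit a linear trend and project 14 days ahead.
slope, intercept = np.polyfit(days, observed, deg=1)
future_days = np.arange(91, 105)
forecast = intercept + slope * future_days
```

In the Streamlit app, the window length and horizon would typically be user-controlled widgets, with the observed and forecast series plotted together.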

TECHNOLOGY STACK

DATA LAYER

PostgreSQL 16 — analytical database

dbt Core 1.9.4 — transformation framework

Python + NumPy/SciPy — data generation

Faker — synthetic PII generation

PRESENTATION LAYER

Metabase OSS v0.59.1 — BI dashboards

Streamlit 1.44.0 — interactive forecasting

Astro + TypeScript — static site

Docker Compose v2 — infrastructure