Project layout¶

bakwas/
├── src/                   # Flask app, DB, summarizer, providers, subtitles
│   ├── app.py             # Flask routes, startup, rate limit config
│   ├── database.py        # SQLite access (get_all_summaries, save_summary, ...)
│   ├── providers.py       # Provider registry: YAML loader + model discovery
│   ├── summarizer.py      # Prompt templates + LiteLLM dispatch
│   ├── subtitles.py       # yt-dlp caption extraction + URL canonicalization
│   └── utils.py           # Small helpers (env parsing, is_local, rate-limit config)
├── templates/             # Jinja2 templates + partials
│   ├── base.html          # Page shell (nav, settings modal, scripts)
│   ├── index.html         # Homepage (form + summaries table)
│   ├── detail.html        # Individual summary view
│   └── partials/          # navbar, footer, modal, icons
├── static/                # CSS, JavaScript, images
│   ├── css/styles.css
│   └── js/                # theme, modal, preferences, index, detail, ...
├── config/
│   └── providers.yaml     # Shipped provider registry (git-tracked)
│                          # Override via providers.local.yaml (git-ignored)
├── data/                  # SQLite database (bind-mounted in Docker)
├── docs/                  # This documentation site (MkDocs Material)
├── run.py                 # Entry point (Werkzeug dev server)
├── Dockerfile             # Production image (runs Gunicorn)
├── docker-compose.yml
└── requirements.txt

Key architectural choices¶

Single Docker container. The whole app runs in one image. Database is SQLite in a bind-mounted volume.

Provider registry, not hardcoded SDKs. src/providers.py loads config/providers.yaml at startup. Any OpenAI-compatible endpoint can be added without code changes.

LiteLLM as the dispatch layer. One code path (litellm.completion) handles every provider type. Authentication is resolved per-call from the provider config.

Gunicorn in Docker, Werkzeug locally. The Dockerfile runs gunicorn --workers 2 --threads 4 for production. python run.py uses Flask's built-in dev server with hot reload.

URL-based caching. Summaries are keyed by canonical YouTube URL + model + summary length. Resubmitting the same combination returns the cached summary without calling the LLM.