Chautauqua
Self-hostable audiobook pipeline: raw text in, chaptered M4B out. Cast differentiation, incremental caching, multiple TTS backends.
System requirements
Minimum
| Component | Requirement |
|---|---|
| OS | macOS 13+, Ubuntu 22.04+, or Windows 11 (WSL2) |
| Python | 3.12+ |
| RAM | 4 GB (cloud backends only) |
| Disk | 5 GB (Python deps + Docker images) |
| Docker | 24+ with Compose V2 (for the full stack) |
Recommended
| Use case | RAM | Disk | Notes |
|---|---|---|---|
| Cloud TTS only (Voxtral, Gemini) | 4 GB | 5 GB | Fastest setup, needs API keys |
| Piper ONNX CPU | 4 GB | 6 GB | Small local CPU voices, downloaded on demand |
| CPU TTS (Kokoro via PyTorch) | 8 GB | 10 GB | No GPU needed, slower inference |
| MLX local (Kokoro) | 8 GB | 6 GB | Apple Silicon only, fast |
| MLX local (Chatterbox / Dia) | 16 GB | 10 GB | Voice cloning, expressive |
| MLX local (Voxtral 4B) | 16 GB | 15 GB | Multilingual, 20 voices |
| MLX local (kugelaudio 7B) | 32 GB | 25 GB | SOTA quality, 24 languages |
| vLLM remote (NVIDIA GPU) | 8 GB host | 5 GB host | GPU server needs 8+ GB VRAM |
MLX model weights and Piper ONNX voices are downloaded on first use to ~/.cache/huggingface/. The disk estimates above include model and voice weights.
How it works
text -> Ingest -> BNM -> Transform -> directed BNM -> Pre-plan -> voice map
                                                                      |
                                                          Render <----+
                                                             |
                                                       chaptered M4B
| Stage | Input | Output |
|---|---|---|
| Ingest | Plain text | book.bnm.md + book.lock.yaml |
| Transform | BNM | Directed BNM (LLM-enriched stage directions) |
| Pre-plan | Directed BNM | Approved voice map |
| Render | BNM + voice map | Chaptered M4B + per-cue WAVs |
Cache-aware: same text + model + voice = cache hit. Editing one sentence re-renders only that cue.
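The cache key is conceptually a hash over everything that affects a cue's audio. A minimal sketch of that idea (the field names and hashing scheme here are illustrative assumptions, not the project's actual implementation):

```python
import hashlib
import json

def cue_cache_key(text: str, backend: str, model: str, voice: str) -> str:
    """Illustrative cache key: any change to the cue text, backend, model,
    or voice produces a new key, so only edited cues are re-rendered."""
    payload = json.dumps(
        {"text": text, "backend": backend, "model": model, "voice": voice},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

# Same inputs -> same key -> cache hit; editing one sentence changes only that cue's key.
print(cue_cache_key("I am a rather elderly man.", "mlx", "kokoro", "am_adam"))
```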
Quick start
Prerequisites (all platforms)
- Python 3.12+
- uv (Python package manager)
- Docker and Docker Compose (for full stack)
- ffmpeg (for M4B assembly)
- SoX (optional fallback for WAV concatenation if ffmpeg concat fails)
- Node.js 20+ and pnpm (for the web UI)
macOS (Apple Silicon)
Apple Silicon Macs can run the MLX backend natively for fast local TTS with no cloud API keys needed.
# 1. Install system deps
brew install uv ffmpeg sox node
npm install -g pnpm
# 2. Clone and install
git clone <repo-url> && cd chautauqua
uv sync && uv sync --extra mlx
# 3. Set up environment
cp .env.example .env
# Edit .env — defaults work for local dev (see Environment Variables below)
# 4. Start the full stack (Docker services + host MLX workers)
./dev.sh up --mlx
# 5. Open the web UI
open http://localhost:5173
The --mlx flag tells dev.sh to start MLX TTS workers on the host (Metal GPU is not accessible inside Docker). Docker handles Redis, Temporal, MinIO, the API server, and the web UI.
CLI-only (no Docker):
uv sync && uv sync --extra mlx
uv run chautauqua ingest book.txt --auto --output-dir output
uv run chautauqua render output/book.bnm.md --backend mlx --model kokoro --output-dir output
Linux
Linux machines can use the local CPU backends, the vLLM backend with an NVIDIA GPU, or cloud backends (Voxtral, Gemini).
# 1. Install system deps
# Debian/Ubuntu:
sudo apt update && sudo apt install -y ffmpeg sox
curl -LsSf https://astral.sh/uv/install.sh | sh
# 2. Clone and install
git clone <repo-url> && cd chautauqua
uv sync
# 3. Set up environment
cp .env.example .env
# Edit .env — see Environment Variables below
# 4. Start the stack with the small Piper ONNX CPU TTS worker
docker compose --profile piper up -d
# 5. Open the web UI
xdg-open http://localhost:5173
With an NVIDIA GPU (vLLM):
# Start a vLLM server on the GPU host (see docs/guides/mlx.md for model setup)
# Then set VLLM_SERVER_URL in .env and:
docker compose --profile vllm up -d
With cloud TTS (no GPU needed):
# Voxtral (Mistral API) — set MISTRAL_API_KEY in .env
docker compose --profile voxtral up -d
# Gemini (Google) — set GEMINI_API_KEY in .env
docker compose --profile gemini up -d
With Piper ONNX CPU (small local voices):
docker compose --profile piper up -d
Piper voice files are resolved from rhasspy/piper-voices and downloaded on first render.
With Kokoro CPU (PyTorch):
docker compose --profile cpu up -d
Kokoro CPU uses CPU-only PyTorch wheels. It avoids CUDA packages, but still downloads Torch because Kokoro depends on PyTorch.
Windows
Windows support is through WSL2 (Windows Subsystem for Linux). Native Windows is not supported.
# 1. Install WSL2 (PowerShell as admin, then restart)
wsl --install
After restarting, open your WSL2 terminal (Ubuntu by default):
# 2. Install system deps inside WSL2
sudo apt update && sudo apt install -y ffmpeg sox
curl -LsSf https://astral.sh/uv/install.sh | sh
# 3. Install Docker Desktop for Windows and enable the WSL2 backend
# https://docs.docker.com/desktop/install/windows-install/
# In Docker Desktop Settings > Resources > WSL Integration, enable your distro.
# 4. Clone and install
git clone <repo-url> && cd chautauqua
uv sync
# 5. Set up environment
cp .env.example .env
# Edit .env — see Environment Variables below
# 6. Start the stack with the small Piper ONNX CPU TTS worker
docker compose --profile piper up -d
# 7. Open the web UI
explorer.exe http://localhost:5173
For NVIDIA GPU support on Windows, install the NVIDIA CUDA drivers for WSL2 and follow the Linux vLLM instructions above.
Environment variables
Copy .env.example to .env and configure:
Storage
| Variable | Default | Description |
|---|---|---|
| `STORAGE_BACKEND` | `local` | `local` for filesystem, `minio` for S3-compatible storage |
| `CHAUTAUQUA_STORAGE_ROOT` | `~/.chautauqua/storage` | Root directory when using `local` storage |
| `MINIO_ENDPOINT` | `localhost:9000` | MinIO server address (Docker sets this automatically) |
| `MINIO_ACCESS_KEY` | `minioadmin` | MinIO access key |
| `MINIO_SECRET_KEY` | `minioadmin` | MinIO secret key |
TTS backends
| Variable | Required for | Description |
|---|---|---|
| `MISTRAL_API_KEY` | Voxtral backend | API key from Mistral |
| `GEMINI_API_KEY` | Gemini backend | API key from Google AI Studio |
| `HF_TOKEN` | MLX model downloads | HuggingFace token for gated model access |
| `VLLM_SERVER_URL` | vLLM backend | URL of your vLLM inference server (e.g. http://gpu-host:8000) |
Infrastructure
| Variable | Default | Description |
|---|---|---|
| `TEMPORAL_ADDRESS` | `localhost:7233` | Temporal server gRPC endpoint |
| `TEMPORAL_NAMESPACE` | `default` | Temporal namespace |
| `REDIS_URL` | — | Redis connection string (e.g. redis://localhost:6379/0). Persists job state across restarts |
LLM (for Ingest and Transform)
| Variable | Default | Description |
|---|---|---|
| `LLM_BASE_URL` | — | OpenAI-compatible API base URL (e.g. https://api.openai.com/v1) |
| `LLM_API_KEY` | — | API key for the LLM endpoint |
| `LLM_MODEL` | — | Model name (e.g. gpt-4o-mini) |
Tip: For local dev without Docker, only `STORAGE_BACKEND=local` is required. Everything else is optional depending on which backends and features you use.
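As a sanity check, the two local-storage settings behave roughly like this (a sketch using the documented defaults; this is not the project's config loader):

```python
import os
from pathlib import Path

# Defaults copied from the Storage table above; the resolution logic is illustrative.
backend = os.environ.get("STORAGE_BACKEND", "local")
root = Path(os.environ.get("CHAUTAUQUA_STORAGE_ROOT", "~/.chautauqua/storage")).expanduser()

if backend == "local":
    root.mkdir(parents=True, exist_ok=True)
    print(f"local storage -> {root}")
else:
    print(f"minio storage -> {os.environ.get('MINIO_ENDPOINT', 'localhost:9000')}")
```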
Install extras
uv sync # core
uv sync --extra mlx # Apple Silicon TTS (Kokoro, Chatterbox, Qwen3-TTS)
uv sync --extra kokoro-cpu # CPU-only Kokoro (PyTorch, slower)
uv sync --extra kokoro-gpu # Kokoro via PyTorch on CUDA
uv sync --extra piper-cpu # Piper ONNX CPU voices
uv sync --extra vllm # remote CUDA vLLM server
uv sync --extra voxtral # Mistral Voxtral cloud API
uv sync --extra gemini # Google Gemini cloud TTS
uv sync --extra stt # Whisper STT for word alignment and batch splitting
uv sync --extra temporal # Temporal workflow engine
uv sync --extra ingest # spaCy NLP for ingest
uv sync --extra transform # LLM-based transform pipeline
uv sync --extra convert # PDF/EPUB -> text conversion
uv sync --extra convert-ocr # + OCR support (Tesseract)
uv sync --extra convert-ml # + ML-based conversion (Marker, Docling)
uv sync --extra redis # Redis job state persistence
uv sync --extra storage-minio # MinIO S3 storage
uv sync --extra dev # pytest, hypothesis
uv sync --extra all # everything (except convert-ocr and convert-ml)
CLI
uv run chautauqua ingest book.txt --auto # text -> BNM
uv run chautauqua render book.bnm.md --backend mlx --model kokoro # BNM -> audio
uv run chautauqua voices list --backend mlx --model kokoro
uv run chautauqua voices list --backend piper --model piper
uv run chautauqua validate book.bnm.md
uv run chautauqua doctor
uv run chautauqua serve # web UI + API on :8080
CLI smoke test
Use the included fixtures to verify the command-line generation path without Docker.
# Validate a known-good BNM file.
uv run chautauqua validate fixtures/tiny.bnm.md --summary
# Compile the BNM into render metadata and cue prompts.
uv run chautauqua compile fixtures/tiny.bnm.md \
--backend mlx \
--model kokoro \
--output-dir /tmp/chautauqua-cli-compile
# Exercise the render planner without loading a TTS model.
uv run chautauqua render fixtures/tiny.bnm.md \
--backend mlx \
--model kokoro \
--limit 1 \
--dry-run
On Apple Silicon with MLX installed, run one real cue render:
uv run chautauqua render fixtures/tiny.bnm.md \
--backend mlx \
--model kokoro \
--limit 1 \
--force \
--storage local \
--storage-root /tmp/chautauqua-cli-storage \
--output-dir /tmp/chautauqua-cli-render
Expected outputs include:
- Per-cue WAV: `/tmp/chautauqua-cli-render/<job_id>/cue-0001.wav`
- Stitched chapter WAV: `/tmp/chautauqua-cli-render/<job_id>/chapters/chapter-01.wav`
- Final M4B: `/tmp/chautauqua-cli-render/<job_id>/final/Tiny Test Book.m4b`
- Stored artifact copy: `/tmp/chautauqua-cli-storage/chautauqua-artifacts/<job_id>/cues/cue-0001.wav`
To test raw text to BNM generation:
uv run chautauqua ingest fixtures/simple-dialogue.txt \
--auto \
--output-dir /tmp/chautauqua-cli-ingest
uv run chautauqua validate /tmp/chautauqua-cli-ingest/simple-dialogue.bnm.md --summary
For the ONNX CPU backend:
uv sync --extra piper-cpu
uv run chautauqua voices list --backend piper --model piper
uv run chautauqua render fixtures/tiny.bnm.md \
--backend piper \
--model piper \
--limit 1
The default voice is en_US-lessac-medium. Other built-in aliases include amy, amy-low, and ryan; the corresponding .onnx and .onnx.json files download from Hugging Face on first use.
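To pre-fetch a voice instead of waiting for the first render, the files can be pulled from rhasspy/piper-voices with huggingface_hub. The directory layout inside that repo is an assumption here; check the repo if a path does not resolve:

```python
from huggingface_hub import hf_hub_download

# Assumed repo layout: <lang>/<locale>/<name>/<quality>/<voice>.onnx(.json).
# Verify the exact paths against rhasspy/piper-voices before relying on them.
for suffix in (".onnx", ".onnx.json"):
    local_path = hf_hub_download(
        repo_id="rhasspy/piper-voices",
        filename=f"en/en_US/lessac/medium/en_US-lessac-medium{suffix}",
    )
    print(local_path)
```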
Backends
| Backend | Flag | Hardware | API key needed |
|---|---|---|---|
| MLX | `--backend mlx` | Apple Silicon | No |
| Kokoro CPU | `--backend cpu` | Any CPU | No |
| Kokoro CUDA | `--backend cuda` | NVIDIA CUDA | No |
| Piper ONNX CPU | `--backend piper` | Any CPU | No |
| vLLM | `--backend vllm` | NVIDIA GPU (remote server) | No |
| Voxtral | `--backend voxtral` | Cloud | `MISTRAL_API_KEY` |
| Gemini | `--backend gemini` | Cloud | `GEMINI_API_KEY` |
Two render modes: Single (default) and Overlay (narrator base + character dialogue spliced via Whisper alignment).
Backend selector (UI)
The web UI groups backends by compute tier, then offers a model dropdown per tier. This differs from the flat --backend flag taxonomy above — at the CLI each row is its own backend; in the UI cloud vendors are split out and Piper sits under CPU as a model.
| UI tier | Wire backend | Models exposed |
|---|---|---|
| Cloud — Gemini | `gemini` | `gemini-3.1-flash-tts-preview`, `gemini-2.5-flash-preview-tts`, `gemini-2.5-pro-preview-tts` |
| Cloud — Voxtral | `voxtral` | `voxtral-mini-tts` |
| MLX (Apple GPU) | `mlx` | `kokoro`, `chatterbox`, `voxtral` (local) |
| CUDA (NVIDIA GPU) | `cuda` | `kokoro` |
| vLLM (remote GPU) | `vllm` | `kokoro`, `voxtral-mini-tts` |
| CPU | `cpu` | `kokoro`, `piper` |
Selecting CPU × piper in the UI translates to `--backend piper --model piper` on the wire at the API boundary, so jobs land on the existing gpu-tts-piper-piper queue. Everything else passes through with the wire backend matching the tier name. The taxonomy lives in ui/src/lib/backend-options.ts.
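The queue naming implied here and by the host worker example further down (gpu-tts-<backend>-<model>) can be sketched as a one-line mapping. This illustrates the routing described in this section, not the actual code in backend-options.ts:

```python
def task_queue(wire_backend: str, model: str) -> str:
    """Illustrative routing: the UI's CPU x piper selection becomes
    --backend piper --model piper, which lands on gpu-tts-piper-piper."""
    return f"gpu-tts-{wire_backend}-{model}"

assert task_queue("piper", "piper") == "gpu-tts-piper-piper"
assert task_queue("mlx", "kokoro") == "gpu-tts-mlx-kokoro"
```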
Which backend should I use?
- Just want to try it out? Use `gemini` or `voxtral` — cloud-based, no hardware requirements; sign up for a free API key and go.
- Apple Silicon Mac (M1/M2/M3/M4)? Use `mlx` with the `kokoro` model for the best speed/quality tradeoff. Upgrade to `chatterbox` for voice cloning or `voxtral` (the MLX model, not the API) for multilingual support.
- Linux with NVIDIA GPU? Either run `cuda` (Kokoro on PyTorch CUDA — `docker compose --profile cuda up -d`, see docs/guides/cuda.md) or set up a `vllm` server for higher-throughput batch rendering.
- No GPU, no API key? Use `piper` for the smallest local ONNX path, or `cpu` with `kokoro` for higher-quality PyTorch inference.
- Production audiobooks? Start with `kokoro` for drafting, then re-render final output with `chatterbox` or `kugelaudio` for higher quality.
Example: mid-range PC (16 GB RAM, integrated GPU, quad-core CPU)
AMD/Intel integrated graphics (Vega, UHD, etc.) are not supported by any TTS backend — MLX needs Apple Silicon and vLLM needs NVIDIA CUDA. Three good options:
Option A — Cloud TTS (recommended). Offload rendering to Gemini or Voxtral. Your PC runs only the orchestration stack (Docker), which is lightweight. Best quality-per-dollar and fastest turnaround.
cp .env.example .env
# Add your API key:
# GEMINI_API_KEY=your_key (free tier available at aistudio.google.com)
# — or —
# MISTRAL_API_KEY=your_key (free tier available at console.mistral.ai)
docker compose --profile gemini up -d # or --profile voxtral
Option B — Piper ONNX CPU. Runs small local Piper voices with no API key. This is the lightest local backend and downloads voice files on first use.
docker compose --profile piper up -d
Option C — Kokoro CPU. Runs Kokoro inference on your CPU with no API key. Expect ~5-10x real-time on a quad-core (a 1-hour audiobook takes 5-10 hours to render). Good for offline/batch work or if you prefer not to use cloud APIs.
docker compose --profile cpu up -d
You can also mix backends: use piper for fast local checks, cpu with kokoro for Kokoro previews, and gemini or voxtral for the final render.
BNM format
Intermediate representation — Markdown + YAML front matter:
---
bnm: "0.3"
title: "Bartleby, the Scrivener"
cast:
  narrator:
    preferred:
      kokoro: { voice: am_adam, lang_code: a }
---
:::chapter {#ch-001 title="Chapter I"}
:::cue {#cue-001 speaker="narrator"}
I am a rather elderly man.
:::
:::
Full spec: docs/SPEC_BNM.md
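To peek at what that header contains from a script, the front matter can be split off and read with PyYAML. This is a hand-inspection sketch, not the project's parser or validator:

```python
import yaml  # PyYAML

def read_front_matter(path: str) -> dict:
    """Split off the leading --- ... --- block and parse it as YAML."""
    text = open(path, encoding="utf-8").read()
    _, header, _body = text.split("---", 2)
    return yaml.safe_load(header)

# Keys mirror the example above; a generated BNM file may carry more metadata.
meta = read_front_matter("output/book.bnm.md")
print(meta["bnm"], meta["title"])
print(meta.get("cast", {}))
```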
Web UI + API
uv run chautauqua serve # API on :8080
cd ui && pnpm install && pnpm dev # UI dev server on :5173 (proxies /api to :8080)
| Endpoint | Method | Purpose |
|---|---|---|
| `/api/jobs` | GET / POST | List / create jobs |
| `/api/jobs/{id}/progress` | GET (SSE) | Live progress stream |
| `/api/jobs/{id}/pause` | POST | Pause / resume / cancel |
| `/api/ingest/upload` | POST | Upload text for ingest |
| `/api/preplan/{id}` | GET / POST | Preplan status / approve |
| `/api/voices` | GET | List voices |
| `/api/voices/sample` | POST | Render a voice sample |
| `/api/artifacts/{id}/{path}` | GET | Download artifacts |
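A minimal client for the two most common calls might look like the following. Only the endpoint paths come from the table above; the request body and the id field are illustrative assumptions, so check the actual API schema (for example via the UI's network tab) before relying on them:

```python
import json
import requests

API = "http://localhost:8080"

# Create a job. The payload shown here is hypothetical; the real schema is
# defined by the API server.
job = requests.post(f"{API}/api/jobs", json={"backend": "mlx", "model": "kokoro"}).json()
job_id = job["id"]  # assumed response field

# Follow live progress over SSE: events arrive as "data: {...}" lines.
with requests.get(f"{API}/api/jobs/{job_id}/progress", stream=True) as resp:
    for line in resp.iter_lines():
        if line.startswith(b"data:"):
            print(json.loads(line[len(b"data:"):]))
```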
Docker Compose
The docker-compose.yml provides the full stack. Core services start by default; TTS workers are activated via profiles:
docker compose up -d # core (Redis, Temporal, MinIO, API, UI, general worker)
docker compose --profile cpu up -d # + Kokoro CPU TTS worker (PyTorch)
docker compose --profile cuda up -d # + Kokoro CUDA worker (NVIDIA GPU, see docs/guides/cuda.md)
docker compose --profile piper up -d # + Piper ONNX CPU worker
docker compose --profile voxtral up -d # + Voxtral cloud TTS worker
docker compose --profile vllm up -d # + vLLM remote GPU worker
docker compose --profile gemini up -d # + Gemini cloud TTS worker
docker compose --profile stt up -d # + Whisper STT worker on CPU
docker compose --profile stt-cuda up -d # + Whisper STT worker on NVIDIA CUDA
The STT workers poll the audiobook-stt queue and are used for Whisper-heavy work such as listen-along word alignment and marker-based one-shot batch splitting.
| Service | Port | Description |
|---|---|---|
| API | `localhost:8080` | FastAPI server |
| UI | `localhost:5173` | Vite dev server |
| Redis | `localhost:6379` | Job state persistence |
| Temporal | `localhost:7233` | Workflow orchestration (gRPC) |
| Temporal UI | `localhost:8233` | Temporal web dashboard |
| MinIO S3 | `localhost:9000` | Object storage API |
| MinIO Console | `localhost:9001` | Object storage web UI |
Building images
Four Dockerfiles, all built with the chautauqua/ directory as the build context — the chautauqua subtree is fully self-contained (its own pyproject.toml and uv.lock) and builds without needing a parent workspace:
| Image | Dockerfile | Base | Size |
|---|---|---|---|
| `chautauqua-api` | `Dockerfile` | `python:3.12-slim-bookworm` | ~650 MB |
| `chautauqua-worker` | `Dockerfile.worker` | `python:3.12-slim-bookworm` | ~1 GB |
| `chautauqua-worker-cuda` | `Dockerfile.worker.cuda` | `nvidia/cuda:12.4.1-cudnn-runtime-ubuntu22.04` | ~6 GB |
| `chautauqua-ui` | `ui/Dockerfile` | `node:20-alpine` | ~690 MB |
# Build all images
docker compose build
# Build a single image
docker compose build api
docker compose build worker
docker compose build ui
# Rebuild after changing pyproject.toml, uv.lock, package.json, or pnpm-lock.yaml
docker compose build api ui worker && docker compose up -d
The API and worker images pin uv to v0.10.4 (matching the host lockfile format). If you upgrade uv locally, update the FROM ghcr.io/astral-sh/uv: line in both Dockerfiles.
The API and worker images include ffmpeg and SoX for recorder/STT normalization, audio stitching, and M4B composition fallback. The Kokoro CPU variant (--profile cpu) builds with INSTALL_KOKORO_CPU=true for PyTorch-based inference, which increases image size. The Piper variant (--profile piper) builds with INSTALL_PIPER_CPU=true for small ONNX Runtime CPU inference.
Lockfile management
This subtree carries its own uv.lock so it can be built standalone from a zip
or a subtree-only checkout — no parent workspace required. When this directory
is cloned as part of the larger audiobook-generator workspace, uv prefers the
parent's audiobook-generator/uv.lock (workspace rules) and the local
chautauqua/uv.lock is dormant. Inside Docker the build context is just the
chautauqua subtree, so the local lock is what actually pins versions.
When you change anything in chautauqua/pyproject.toml (deps, extras, sources),
regenerate both locks so they don't drift:
# 1. Parent workspace lock
cd /path/to/audiobook-generator
uv lock
# 2. Standalone chautauqua lock — copy to a temp dir so uv doesn't detect the
# parent workspace, then run uv lock and copy the result back.
TMP=$(mktemp -d)
cp chautauqua/pyproject.toml "$TMP/"
cp -r chautauqua/chautauqua "$TMP/"
( cd "$TMP" && uv lock )
cp "$TMP/uv.lock" chautauqua/uv.lock
rm -rf "$TMP"
Both locks should resolve cleanly with uv lock --check.
MLX workers cannot run in Docker on macOS — Metal GPU is inaccessible inside Docker's Linux VM. Run them on the host via ./dev.sh up --mlx or directly:
python -m chautauqua.temporal.worker gpu-tts-mlx-kokoro \
--backend mlx --model kokoro --temporal-address localhost:7233
Local orchestration (dev.sh)
./dev.sh up --mlx # Docker stack + host MLX workers
./dev.sh up --cpu # Docker stack + Kokoro CPU worker (PyTorch)
./dev.sh up --piper # Docker stack + Piper ONNX CPU worker
./dev.sh down # stop Docker + host workers
./dev.sh restart --mlx # full stop/start cycle
./dev.sh rebuild --mlx # rebuild Docker images, then restart
./dev.sh status # Docker + worker status
./dev.sh worker-restart kokoro
Host worker PIDs live under .dev/run/ and logs under .dev/logs/.
Development
uv run pytest # all tests
uv run pytest -m "not slow" # skip model-loading tests
cd ui && pnpm typecheck && pnpm test # frontend type check + vitest
See CLAUDE.md for architecture, conventions, and full docs index.
Docs
| File | What |
|---|---|
| docs/SPEC.md | Product spec (architecture, modules, phases) |
| docs/SPEC_BNM.md | BNM format (syntax, validation, plugins) |
| docs/FLOW.md | API lifecycle per workflow phase |
| docs/guides/transformer.md | T1-T6 LLM pipeline design |
| docs/guides/mlx.md | MLX backend setup + model presets |
| docs/guides/bnm-mvp.md | MVP BNM constraints |
License
See LICENSE.