Chautauqua: self-hostable audiobook pipeline (BNM, multi-backend TTS, M4B export). Mirror of github.com/elog08/chautauqua.

Chautauqua

Self-hostable audiobook pipeline: raw text in, chaptered M4B out. Cast differentiation, incremental caching, multiple TTS backends.

System requirements

Minimum

| Component | Requirement |
| --- | --- |
| OS | macOS 13+, Ubuntu 22.04+, or Windows 11 (WSL2) |
| Python | 3.12+ |
| RAM | 4 GB (cloud backends only) |
| Disk | 5 GB (Python deps + Docker images) |
| Docker | 24+ with Compose V2 (for the full stack) |

By use case

| Use case | RAM | Disk | Notes |
| --- | --- | --- | --- |
| Cloud TTS only (Voxtral, Gemini) | 4 GB | 5 GB | Fastest setup, needs API keys |
| Piper ONNX CPU | 4 GB | 6 GB | Small local CPU voices, downloaded on demand |
| CPU TTS (Kokoro via PyTorch) | 8 GB | 10 GB | No GPU needed, slower inference |
| MLX local (Kokoro) | 8 GB | 6 GB | Apple Silicon only, fast |
| MLX local (Chatterbox / Dia) | 16 GB | 10 GB | Voice cloning, expressive |
| MLX local (Voxtral 4B) | 16 GB | 15 GB | Multilingual, 20 voices |
| MLX local (kugelaudio 7B) | 32 GB | 25 GB | SOTA quality, 24 languages |
| vLLM remote (NVIDIA GPU) | 8 GB host | 5 GB host | GPU server needs 8+ GB VRAM |

MLX model weights and Piper ONNX voices are downloaded on first use to ~/.cache/huggingface/. The disk estimates above include model and voice weights.

How it works

text -> Ingest -> BNM -> Transform -> directed BNM -> Pre-plan -> voice map
                                                                      |
                                                        Render <------+
                                                          |
                                                    chaptered M4B

| Stage | Input | Output |
| --- | --- | --- |
| Ingest | Plain text | book.bnm.md + book.lock.yaml |
| Transform | BNM | Directed BNM (LLM-enriched stage directions) |
| Pre-plan | Directed BNM | Approved voice map |
| Render | BNM + voice map | Chaptered M4B + per-cue WAVs |

Cache-aware: same text + model + voice = cache hit. Editing one sentence re-renders only that cue.
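That invariant can be pictured as a content hash over the cue text plus the model and voice identity. The helper below is an illustrative sketch only — the function name, separator, and hash choice are assumptions, not Chautauqua's actual cache code:

```python
import hashlib

def cue_cache_key(text: str, backend: str, model: str, voice: str) -> str:
    """Hypothetical cache key: any change to the cue text, backend,
    model, or voice yields a new key, so only edited cues re-render."""
    payload = "\x1f".join((text, backend, model, voice))  # unambiguous join
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

same = cue_cache_key("I am a rather elderly man.", "mlx", "kokoro", "am_adam")
edited = cue_cache_key("I am a rather elderly man!", "mlx", "kokoro", "am_adam")
assert same != edited  # a one-character edit invalidates only that cue
```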

Quick start

Prerequisites (all platforms)

  • Python 3.12+
  • uv (Python package manager)
  • Docker and Docker Compose (for full stack)
  • ffmpeg (for M4B assembly)
  • SoX (optional fallback for WAV concatenation if ffmpeg concat fails)
  • Node.js 20+ and pnpm (for the web UI)

macOS (Apple Silicon)

Apple Silicon Macs can run the MLX backend natively for fast local TTS with no cloud API keys needed.

# 1. Install system deps
brew install uv ffmpeg sox node
npm install -g pnpm

# 2. Clone and install
git clone <repo-url> && cd chautauqua
uv sync && uv sync --extra mlx

# 3. Set up environment
cp .env.example .env
# Edit .env — defaults work for local dev (see Environment Variables below)

# 4. Start the full stack (Docker services + host MLX workers)
./dev.sh up --mlx

# 5. Open the web UI
open http://localhost:5173

The --mlx flag tells dev.sh to start MLX TTS workers on the host (Metal GPU is not accessible inside Docker). Docker handles Redis, Temporal, MinIO, the API server, and the web UI.

CLI-only (no Docker):

uv sync && uv sync --extra mlx
uv run chautauqua ingest book.txt --auto --output-dir output
uv run chautauqua render output/book.bnm.md --backend mlx --model kokoro --output-dir output

Linux

Linux machines can use the local CPU backends, the vLLM backend with an NVIDIA GPU, or cloud backends (Voxtral, Gemini).

# 1. Install system deps
# Debian/Ubuntu:
sudo apt update && sudo apt install -y ffmpeg sox
curl -LsSf https://astral.sh/uv/install.sh | sh

# 2. Clone and install
git clone <repo-url> && cd chautauqua
uv sync

# 3. Set up environment
cp .env.example .env
# Edit .env — see Environment Variables below

# 4. Start the stack with the small Piper ONNX CPU TTS worker
docker compose --profile piper up -d

# 5. Open the web UI
xdg-open http://localhost:5173

With an NVIDIA GPU (vLLM):

# Start a vLLM server on the GPU host (see docs/guides/mlx.md for model setup)
# Then set VLLM_SERVER_URL in .env and:
docker compose --profile vllm up -d

With cloud TTS (no GPU needed):

# Voxtral (Mistral API) — set MISTRAL_API_KEY in .env
docker compose --profile voxtral up -d

# Gemini (Google) — set GEMINI_API_KEY in .env
docker compose --profile gemini up -d

With Piper ONNX CPU (small local voices):

docker compose --profile piper up -d

Piper voice files are resolved from rhasspy/piper-voices and downloaded on first render.

With Kokoro CPU (PyTorch):

docker compose --profile cpu up -d

Kokoro CPU uses CPU-only PyTorch wheels. It avoids CUDA packages, but still downloads PyTorch because Kokoro depends on it.

Windows

Windows support is through WSL2 (Windows Subsystem for Linux). Native Windows is not supported.

# 1. Install WSL2 (PowerShell as admin, then restart)
wsl --install

After restarting, open your WSL2 terminal (Ubuntu by default):

# 2. Install system deps inside WSL2
sudo apt update && sudo apt install -y ffmpeg sox
curl -LsSf https://astral.sh/uv/install.sh | sh

# 3. Install Docker Desktop for Windows and enable the WSL2 backend
#    https://docs.docker.com/desktop/install/windows-install/
#    In Docker Desktop Settings > Resources > WSL Integration, enable your distro.

# 4. Clone and install
git clone <repo-url> && cd chautauqua
uv sync

# 5. Set up environment
cp .env.example .env
# Edit .env — see Environment Variables below

# 6. Start the stack with the small Piper ONNX CPU TTS worker
docker compose --profile piper up -d

# 7. Open the web UI
explorer.exe http://localhost:5173

For NVIDIA GPU support on Windows, install the NVIDIA CUDA drivers for WSL2 and follow the Linux vLLM instructions above.

Environment variables

Copy .env.example to .env and configure:

Storage

| Variable | Default | Description |
| --- | --- | --- |
| STORAGE_BACKEND | local | local for filesystem, minio for S3-compatible storage |
| CHAUTAUQUA_STORAGE_ROOT | ~/.chautauqua/storage | Root directory when using local storage |
| MINIO_ENDPOINT | localhost:9000 | MinIO server address (Docker sets this automatically) |
| MINIO_ACCESS_KEY | minioadmin | MinIO access key |
| MINIO_SECRET_KEY | minioadmin | MinIO secret key |

TTS backends

| Variable | Required for | Description |
| --- | --- | --- |
| MISTRAL_API_KEY | Voxtral backend | API key from Mistral |
| GEMINI_API_KEY | Gemini backend | API key from Google AI Studio |
| HF_TOKEN | MLX model downloads | HuggingFace token for gated model access |
| VLLM_SERVER_URL | vLLM backend | URL of your vLLM inference server (e.g. http://gpu-host:8000) |

Infrastructure

| Variable | Default | Description |
| --- | --- | --- |
| TEMPORAL_ADDRESS | localhost:7233 | Temporal server gRPC endpoint |
| TEMPORAL_NAMESPACE | default | Temporal namespace |
| REDIS_URL | (unset) | Redis connection string (e.g. redis://localhost:6379/0). Persists job state across restarts |

LLM (for Ingest and Transform)

| Variable | Default | Description |
| --- | --- | --- |
| LLM_BASE_URL | (unset) | OpenAI-compatible API base URL (e.g. https://api.openai.com/v1) |
| LLM_API_KEY | (unset) | API key for the LLM endpoint |
| LLM_MODEL | (unset) | Model name (e.g. gpt-4o-mini) |

Tip: For local dev without Docker, only STORAGE_BACKEND=local is required. Everything else is optional depending on which backends and features you use.
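For example, a minimal .env for local development against a cloud backend might look like this (the key value is a placeholder):

```ini
STORAGE_BACKEND=local
CHAUTAUQUA_STORAGE_ROOT=~/.chautauqua/storage
# Needed only if you render with the Gemini backend:
GEMINI_API_KEY=your_key_here
```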

Install extras

uv sync                      # core
uv sync --extra mlx          # Apple Silicon TTS (Kokoro, Chatterbox, Qwen3-TTS)
uv sync --extra kokoro-cpu   # CPU-only Kokoro (PyTorch, slower)
uv sync --extra kokoro-gpu   # Kokoro via PyTorch on CUDA
uv sync --extra piper-cpu    # Piper ONNX CPU voices
uv sync --extra vllm         # remote CUDA vLLM server
uv sync --extra voxtral      # Mistral Voxtral cloud API
uv sync --extra gemini       # Google Gemini cloud TTS
uv sync --extra stt          # Whisper STT for word alignment and batch splitting
uv sync --extra temporal     # Temporal workflow engine
uv sync --extra ingest       # spaCy NLP for ingest
uv sync --extra transform    # LLM-based transform pipeline
uv sync --extra convert      # PDF/EPUB -> text conversion
uv sync --extra convert-ocr  # + OCR support (Tesseract)
uv sync --extra convert-ml   # + ML-based conversion (Marker, Docling)
uv sync --extra redis        # Redis job state persistence
uv sync --extra storage-minio # MinIO S3 storage
uv sync --extra dev          # pytest, hypothesis
uv sync --extra all          # everything (except convert-ocr and convert-ml)

CLI

uv run chautauqua ingest book.txt --auto                           # text -> BNM
uv run chautauqua render book.bnm.md --backend mlx --model kokoro  # BNM -> audio
uv run chautauqua voices list --backend mlx --model kokoro
uv run chautauqua voices list --backend piper --model piper
uv run chautauqua validate book.bnm.md
uv run chautauqua doctor
uv run chautauqua serve                                            # web UI + API on :8080

CLI smoke test

Use the included fixtures to verify the command-line generation path without Docker.

# Validate a known-good BNM file.
uv run chautauqua validate fixtures/tiny.bnm.md --summary

# Compile the BNM into render metadata and cue prompts.
uv run chautauqua compile fixtures/tiny.bnm.md \
  --backend mlx \
  --model kokoro \
  --output-dir /tmp/chautauqua-cli-compile

# Exercise the render planner without loading a TTS model.
uv run chautauqua render fixtures/tiny.bnm.md \
  --backend mlx \
  --model kokoro \
  --limit 1 \
  --dry-run

On Apple Silicon with MLX installed, run one real cue render:

uv run chautauqua render fixtures/tiny.bnm.md \
  --backend mlx \
  --model kokoro \
  --limit 1 \
  --force \
  --storage local \
  --storage-root /tmp/chautauqua-cli-storage \
  --output-dir /tmp/chautauqua-cli-render

Expected outputs include:

  • Per-cue WAV: /tmp/chautauqua-cli-render/<job_id>/cue-0001.wav
  • Stitched chapter WAV: /tmp/chautauqua-cli-render/<job_id>/chapters/chapter-01.wav
  • Final M4B: /tmp/chautauqua-cli-render/<job_id>/final/Tiny Test Book.m4b
  • Stored artifact copy: /tmp/chautauqua-cli-storage/chautauqua-artifacts/<job_id>/cues/cue-0001.wav

To test raw text to BNM generation:

uv run chautauqua ingest fixtures/simple-dialogue.txt \
  --auto \
  --output-dir /tmp/chautauqua-cli-ingest

uv run chautauqua validate /tmp/chautauqua-cli-ingest/simple-dialogue.bnm.md --summary

For the ONNX CPU backend:

uv sync --extra piper-cpu
uv run chautauqua voices list --backend piper --model piper
uv run chautauqua render fixtures/tiny.bnm.md \
  --backend piper \
  --model piper \
  --limit 1

The default voice is en_US-lessac-medium. Other built-in aliases include amy, amy-low, and ryan; the corresponding .onnx and .onnx.json files download from Hugging Face on first use.
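The voice-name-to-file mapping can be sketched as follows. The directory layout shown is an assumption about the rhasspy/piper-voices repository, and the helper is illustrative rather than Chautauqua's actual resolver (it handles full names like en_US-lessac-medium, not short aliases):

```python
def piper_voice_files(voice: str) -> tuple[str, str]:
    """Map a Piper voice name like 'en_US-lessac-medium' to the assumed
    repo paths <lang>/<locale>/<name>/<quality>/<voice>.onnx[.json]."""
    locale, name, quality = voice.split("-", 2)
    lang = locale.split("_", 1)[0]
    base = f"{lang}/{locale}/{name}/{quality}/{voice}"
    return f"{base}.onnx", f"{base}.onnx.json"

onnx_path, config_path = piper_voice_files("en_US-lessac-medium")
```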

Backends

| Backend | Flag | Hardware | API key needed |
| --- | --- | --- | --- |
| MLX | --backend mlx | Apple Silicon | No |
| Kokoro CPU | --backend cpu | Any CPU | No |
| Kokoro CUDA | --backend cuda | NVIDIA CUDA | No |
| Piper ONNX CPU | --backend piper | Any CPU | No |
| vLLM | --backend vllm | NVIDIA GPU (remote server) | No |
| Voxtral | --backend voxtral | Cloud | MISTRAL_API_KEY |
| Gemini | --backend gemini | Cloud | GEMINI_API_KEY |

Two render modes: Single (default) and Overlay (narrator base + character dialogue spliced via Whisper alignment).

Backend selector (UI)

The web UI groups backends by compute tier, then offers a model dropdown per tier. This differs from the flat --backend flag taxonomy above — at the CLI each row is its own backend; in the UI cloud vendors are split out and Piper sits under CPU as a model.

| UI tier | Wire backend | Models exposed |
| --- | --- | --- |
| Cloud — Gemini | gemini | gemini-3.1-flash-tts-preview, gemini-2.5-flash-preview-tts, gemini-2.5-pro-preview-tts |
| Cloud — Voxtral | voxtral | voxtral-mini-tts |
| MLX (Apple GPU) | mlx | kokoro, chatterbox, voxtral (local) |
| CUDA (NVIDIA GPU) | cuda | kokoro |
| vLLM (remote GPU) | vllm | kokoro, voxtral-mini-tts |
| CPU | cpu | kokoro, piper |

Selecting CPU × piper in the UI translates to --backend piper --model piper on the wire at the API boundary, so jobs land on the existing gpu-tts-piper-piper queue. Every other selection passes through with the wire backend matching its tier name. The taxonomy lives in ui/src/lib/backend-options.ts.
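In Python terms, the translation amounts to a one-case lookup. This is only a sketch of the idea — the real implementation is the TypeScript taxonomy in ui/src/lib/backend-options.ts, and the function name here is hypothetical:

```python
def to_wire(tier: str, model: str) -> tuple[str, str]:
    """Map a UI (tier, model) selection to the wire (backend, model) pair."""
    if tier == "cpu" and model == "piper":
        return ("piper", "piper")  # special case: routes to gpu-tts-piper-piper
    return (tier, model)           # all other tiers pass through unchanged

assert to_wire("cpu", "piper") == ("piper", "piper")
assert to_wire("mlx", "kokoro") == ("mlx", "kokoro")
```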

Which backend should I use?

  • Just want to try it out? Use gemini or voxtral — cloud-based, no hardware requirements, sign up for a free API key and go.
  • Apple Silicon Mac (M1/M2/M3/M4)? Use mlx with the kokoro model for the best speed/quality tradeoff. Upgrade to chatterbox for voice cloning or voxtral (the MLX model, not the API) for multilingual support.
  • Linux with NVIDIA GPU? Either run cuda (Kokoro on PyTorch CUDA — docker compose --profile cuda up -d, see docs/guides/cuda.md) or set up a vllm server for higher-throughput batch rendering.
  • No GPU, no API key? Use piper for the smallest local ONNX path, or cpu with kokoro for higher quality PyTorch inference.
  • Production audiobooks? Start with kokoro for drafting, then re-render final output with chatterbox or kugelaudio for higher quality.

Example: mid-range PC (16 GB RAM, integrated GPU, quad-core CPU)

AMD/Intel integrated graphics (Vega, UHD, etc.) are not supported by any TTS backend — MLX needs Apple Silicon and vLLM needs NVIDIA CUDA. Three good options:

Option A — Cloud TTS (recommended). Offload rendering to Gemini or Voxtral. Your PC runs only the orchestration stack (Docker), which is lightweight. Best quality-per-dollar and fastest turnaround.

cp .env.example .env
# Add your API key:
#   GEMINI_API_KEY=your_key    (free tier available at aistudio.google.com)
#   — or —
#   MISTRAL_API_KEY=your_key   (free tier available at console.mistral.ai)

docker compose --profile gemini up -d    # or --profile voxtral

Option B — Piper ONNX CPU. Runs small local Piper voices with no API key. This is the lightest local backend and downloads voice files on first use.

docker compose --profile piper up -d

Option C — Kokoro CPU. Runs Kokoro inference on your CPU with no API key. Expect roughly 5-10x slower than real time on a quad-core (a 1-hour audiobook takes 5-10 hours to render). Good for offline/batch work or if you prefer not to use cloud APIs.

docker compose --profile cpu up -d

You can also mix backends: use piper for fast local checks, cpu with kokoro for Kokoro previews, and gemini or voxtral for the final render.

BNM format

Intermediate representation — Markdown + YAML front matter:

---
bnm: "0.3"
title: "Bartleby, the Scrivener"
cast:
  narrator:
    preferred:
      kokoro: { voice: am_adam, lang_code: a }
---
:::chapter {#ch-001 title="Chapter I"}
:::cue {#cue-001 speaker="narrator"}
I am a rather elderly man.
:::
:::

Full spec: docs/SPEC_BNM.md
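Because BNM is plain Markdown with a YAML header, tooling can split the two parts with a few lines of stdlib Python. This is an illustrative sketch, not Chautauqua's parser — feed the returned YAML string to any YAML library:

```python
def split_front_matter(doc: str) -> tuple[str, str]:
    """Split '---'-delimited YAML front matter from the Markdown body.
    Returns (raw_yaml, body); raw_yaml is '' when no front matter exists."""
    if not doc.startswith("---\n"):
        return "", doc
    _, raw_yaml, body = doc.split("---\n", 2)  # at most 2 splits keeps body intact
    return raw_yaml, body

raw_yaml, body = split_front_matter('---\nbnm: "0.3"\n---\n:::chapter\n')
assert 'bnm: "0.3"' in raw_yaml and body.startswith(":::chapter")
```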

Web UI + API

uv run chautauqua serve          # API on :8080
cd ui && pnpm install && pnpm dev # UI dev server on :5173 (proxies /api to :8080)

| Endpoint | Method | Purpose |
| --- | --- | --- |
| /api/jobs | GET / POST | List / create jobs |
| /api/jobs/{id}/progress | GET (SSE) | Live progress stream |
| /api/jobs/{id}/pause | POST | Pause / resume / cancel |
| /api/ingest/upload | POST | Upload text for ingest |
| /api/preplan/{id} | GET / POST | Preplan status / approve |
| /api/voices | GET | List voices |
| /api/voices/sample | POST | Render a voice sample |
| /api/artifacts/{id}/{path} | GET | Download artifacts |
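A client can follow the live progress stream with any HTTP library. The sketch below assumes the endpoint emits standard SSE data: lines carrying JSON — the payload schema and the requests-based consumer are assumptions, so check the API's actual event format:

```python
import json

def parse_sse_data(line: str):
    """Return the JSON payload of an SSE 'data:' line, or None otherwise."""
    if line.startswith("data:"):
        return json.loads(line[len("data:"):].strip())
    return None

def watch_progress(job_id: str, base: str = "http://localhost:8080"):
    """Yield decoded progress events from /api/jobs/{id}/progress."""
    import requests  # third-party: pip install requests
    url = f"{base}/api/jobs/{job_id}/progress"
    with requests.get(url, stream=True) as resp:
        resp.raise_for_status()
        for line in resp.iter_lines(decode_unicode=True):
            event = parse_sse_data(line) if line else None
            if event is not None:
                yield event
```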

Docker Compose

The docker-compose.yml provides the full stack. Core services start by default; TTS workers are activated via profiles:

docker compose up -d                        # core (Redis, Temporal, MinIO, API, UI, general worker)
docker compose --profile cpu up -d          # + Kokoro CPU TTS worker (PyTorch)
docker compose --profile cuda up -d         # + Kokoro CUDA worker (NVIDIA GPU, see docs/guides/cuda.md)
docker compose --profile piper up -d        # + Piper ONNX CPU worker
docker compose --profile voxtral up -d      # + Voxtral cloud TTS worker
docker compose --profile vllm up -d         # + vLLM remote GPU worker
docker compose --profile gemini up -d       # + Gemini cloud TTS worker
docker compose --profile stt up -d          # + Whisper STT worker on CPU
docker compose --profile stt-cuda up -d     # + Whisper STT worker on NVIDIA CUDA

The STT workers poll the audiobook-stt queue and handle Whisper-heavy work such as listen-along word alignment and marker-based one-shot batch splitting.

| Service | Port | Description |
| --- | --- | --- |
| API | localhost:8080 | FastAPI server |
| UI | localhost:5173 | Vite dev server |
| Redis | localhost:6379 | Job state persistence |
| Temporal | localhost:7233 | Workflow orchestration (gRPC) |
| Temporal UI | localhost:8233 | Temporal web dashboard |
| MinIO S3 | localhost:9000 | Object storage API |
| MinIO Console | localhost:9001 | Object storage web UI |

Building images

Four Dockerfiles, all built with the chautauqua/ directory as the build context — the chautauqua subtree is fully self-contained (its own pyproject.toml and uv.lock) and builds without needing a parent workspace:

| Image | Dockerfile | Base | Size |
| --- | --- | --- | --- |
| chautauqua-api | Dockerfile | python:3.12-slim-bookworm | ~650 MB |
| chautauqua-worker | Dockerfile.worker | python:3.12-slim-bookworm | ~1 GB |
| chautauqua-worker-cuda | Dockerfile.worker.cuda | nvidia/cuda:12.4.1-cudnn-runtime-ubuntu22.04 | ~6 GB |
| chautauqua-ui | ui/Dockerfile | node:20-alpine | ~690 MB |

# Build all images
docker compose build

# Build a single image
docker compose build api
docker compose build worker
docker compose build ui

# Rebuild after changing pyproject.toml, uv.lock, package.json, or pnpm-lock.yaml
docker compose build api ui worker && docker compose up -d

The API and worker images pin uv to v0.10.4 (matching the host lockfile format). If you upgrade uv locally, update the FROM ghcr.io/astral-sh/uv: line in both Dockerfiles.

The API and worker images include ffmpeg and SoX for recorder/STT normalization, audio stitching, and M4B composition fallback. The Kokoro CPU variant (--profile cpu) builds with INSTALL_KOKORO_CPU=true for PyTorch-based inference, which increases image size. The Piper variant (--profile piper) builds with INSTALL_PIPER_CPU=true for small ONNX Runtime CPU inference.

Lockfile management

This subtree carries its own uv.lock so it can be built standalone from a zip or a subtree-only checkout — no parent workspace required. When this directory is cloned as part of the larger audiobook-generator workspace, uv prefers the parent's audiobook-generator/uv.lock (workspace rules) and the local chautauqua/uv.lock is dormant. Inside Docker the build context is just the chautauqua subtree, so the local lock is what actually pins versions.

When you change anything in chautauqua/pyproject.toml (deps, extras, sources), regenerate both locks so they don't drift:

# 1. Parent workspace lock
cd /path/to/audiobook-generator
uv lock

# 2. Standalone chautauqua lock — copy to a temp dir so uv doesn't detect the
#    parent workspace, then run uv lock and copy the result back.
TMP=$(mktemp -d)
cp chautauqua/pyproject.toml "$TMP/"
cp -r chautauqua/chautauqua "$TMP/"
( cd "$TMP" && uv lock )
cp "$TMP/uv.lock" chautauqua/uv.lock
rm -rf "$TMP"

Both locks should resolve cleanly with uv lock --check.

MLX workers cannot run in Docker on macOS — Metal GPU is inaccessible inside Docker's Linux VM. Run them on the host via ./dev.sh up --mlx or directly:

python -m chautauqua.temporal.worker gpu-tts-mlx-kokoro \
    --backend mlx --model kokoro --temporal-address localhost:7233

Local orchestration (dev.sh)

./dev.sh up --mlx            # Docker stack + host MLX workers
./dev.sh up --cpu            # Docker stack + Kokoro CPU worker (PyTorch)
./dev.sh up --piper          # Docker stack + Piper ONNX CPU worker
./dev.sh down                # stop Docker + host workers
./dev.sh restart --mlx       # full stop/start cycle
./dev.sh rebuild --mlx       # rebuild Docker images, then restart
./dev.sh status              # Docker + worker status
./dev.sh worker-restart kokoro

Host worker PIDs live under .dev/run/ and logs under .dev/logs/.

Development

uv run pytest                               # all tests
uv run pytest -m "not slow"                 # skip model-loading tests
cd ui && pnpm typecheck && pnpm test        # frontend type check + vitest

See CLAUDE.md for architecture, conventions, and full docs index.

Docs

| File | What |
| --- | --- |
| docs/SPEC.md | Product spec (architecture, modules, phases) |
| docs/SPEC_BNM.md | BNM format (syntax, validation, plugins) |
| docs/FLOW.md | API lifecycle per workflow phase |
| docs/guides/transformer.md | T1-T6 LLM pipeline design |
| docs/guides/mlx.md | MLX backend setup + model presets |
| docs/guides/bnm-mvp.md | MVP BNM constraints |

License

See LICENSE.