Python wrapper around the claude CLI for subscription-mode (no API key) backends. It drives one long-running interactive claude process per conversation through a PTY and reads events from the JSONL session file. The public surface is shaped like the Anthropic Messages API, so a gateway in front of it needs little more than a thin serializer.

Not affiliated with Anthropic. You need a working subscription and the claude CLI on PATH, and you must have run claude /login once.

Install

As a library inside another project:

uv add "claude-code-api @ git+https://git.kotikot.com/beaver/claude-code-api"

The runtime needs only ptyprocess.

Use

import asyncio
from claude_code_api import BackendOptions, ClaudeCodeBackend

async def main() -> None:
    opts = BackendOptions(cwd="/path/to/project", dangerously_skip_permissions=True)
    async with ClaudeCodeBackend(opts) as backend:
        async for event in backend.complete(
            [{"role": "user", "content": "say hi"}]
        ):
            print(event)

asyncio.run(main())

Multi-turn works by construction: append the assistant reply and a fresh user message to the same messages list, then call complete() again. The backend fingerprints messages[:-1], finds the live PTY from the previous turn, and reuses it, which keeps the server-side prompt cache warm:

history = [{"role": "user", "content": "remember Beaver"}]
async for ev in backend.complete(history): ...

history += [
    {"role": "assistant", "content": [{"type": "text", "text": "OK"}]},
    {"role": "user", "content": "what was the codeword?"},
]
async for ev in backend.complete(history): ...
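
The prefix lookup described above can be illustrated with a minimal sketch. It assumes the fingerprint is a SHA-256 digest over a canonical JSON encoding of the messages; the names fingerprint and SessionTable are hypothetical and are not the library's own implementation, which may normalize messages differently.

```python
import hashlib
import json
from typing import Any, Dict, List, Optional

Message = Dict[str, Any]


def fingerprint(messages: List[Message]) -> str:
    """Return a stable digest of a message list (hypothetical helper)."""
    # Sorted keys and fixed separators make the encoding independent of dict order.
    canonical = json.dumps(
        messages, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class SessionTable:
    """Maps the fingerprint of a finished history to a live session id."""

    def __init__(self) -> None:
        self._live: Dict[str, str] = {}

    def register(self, history: List[Message], session_id: str) -> None:
        # Called once a turn has finished; history includes the assistant reply.
        self._live[fingerprint(history)] = session_id

    def lookup(self, messages: List[Message]) -> Optional[str]:
        # Everything except the new user message must match a finished history.
        return self._live.get(fingerprint(messages[:-1]))


table = SessionTable()
first_turn = [{"role": "user", "content": "remember Beaver"}]
reply = {"role": "assistant", "content": [{"type": "text", "text": "OK"}]}
table.register(first_turn + [reply], "session-1")

second_turn = first_turn + [reply, {"role": "user", "content": "what was the codeword?"}]
matched = table.lookup(second_turn)  # finds "session-1"
```

Under this assumption a caller that edits or reorders earlier messages gets a different digest, so the lookup misses and the conversation cannot reuse the live PTY.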

Public surface

Events (Anthropic-style, vendored so that no extra dependency is needed): AssistantMessage, UserMessage, SystemMessage, ResultMessage, TextBlock, ThinkingBlock, ToolUseBlock, ToolResultBlock.

Errors: BackendError (root), AuthError, ProcessError, CLINotFoundError, RateLimitError, SessionError, MessageParseError.
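
Because every error derives from BackendError, a caller can retry selectively. The sketch below is one possible pattern, not something the library ships: the exception classes are local stand-ins that only mirror the names listed above, and collect_with_retry with its parameters is a hypothetical helper. In real code the classes would be imported from claude_code_api instead.

```python
import asyncio
from typing import Any, AsyncIterator, Callable, List


# Local stand-ins mirroring the documented hierarchy (assumption: all errors
# subclass BackendError, as "root" suggests).
class BackendError(Exception):
    pass


class RateLimitError(BackendError):
    pass


class AuthError(BackendError):
    pass


async def collect_with_retry(
    make_stream: Callable[[], AsyncIterator[Any]],
    attempts: int = 3,
    base_delay: float = 1.0,
) -> List[Any]:
    """Collect all events from a stream, retrying only on RateLimitError."""
    for attempt in range(attempts):
        events: List[Any] = []
        try:
            async for event in make_stream():
                events.append(event)
            return events
        except RateLimitError:
            if attempt == attempts - 1:
                raise  # out of attempts, surface the error to the caller
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            await asyncio.sleep(base_delay * 2**attempt)
    raise ValueError("attempts must be at least 1")
```

Other BackendError subclasses, such as AuthError, propagate immediately, since retrying cannot fix a missing login. A caller would pass something like lambda: backend.complete(history) as make_stream.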

Backend: ClaudeCodeBackend(opts).complete(messages) returns an async generator of events. BackendOptions exposes the model, system prompt, allowed tools, MCP servers (mcp_servers), permission mode, and history injection mode.

Lower layers (PtyClaudeProcess, JsonlWatcher, TurnManager, normalize) are re-exported for callers that want to assemble their own session orchestration.

How a turn works

The backend looks up a live session by hash_history(messages[:-1]). If one matches, the new user message goes straight into its PTY.
If nothing matches and messages[:-1] is empty, a fresh claude is spawned with a brand-new --session-id.
If nothing matches and messages[:-1] is non-empty (a continuation with no live PTY, e.g. after a restart), the backend writes a hand-crafted JSONL transcript at ~/.claude/projects/<key>/<id>.jsonl and spawns claude --resume <id>. This is the native_jsonl injection mode; the fallback is concat_message, which folds the prior history into one large first prompt.
A background thread drains the PTY's stdout continuously, but events are never read from it. The JSONL file is polled every 100 ms, and each new record is normalized into a typed Event.
The turn closes on the first assistant record with stop_reason ∈ {end_turn, max_tokens, stop_sequence, refusal}. A ResultMessage is synthesized from its usage and yielded last.
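
The steps above can be sketched in two small pieces: the three-way choice of how to start a turn, and a polling loop that tails a JSONL file until a terminal assistant record appears. This is a simplified, synchronous illustration. The function names, the returned mode labels, and the record shape (a top-level "type" with a nested "message" carrying "stop_reason") are assumptions, not the library's implementation.

```python
import json
import time
from pathlib import Path
from typing import Any, Dict, Iterator

TERMINAL_STOP_REASONS = {"end_turn", "max_tokens", "stop_sequence", "refusal"}


def choose_start_mode(prefix_len: int, has_live_session: bool) -> str:
    """Pick how a turn starts; the labels are illustrative only."""
    if has_live_session:
        return "reuse_pty"  # write the new user message into the live PTY
    if prefix_len == 0:
        return "fresh_session"  # spawn claude with a new --session-id
    return "resume_from_jsonl"  # write a transcript, then claude --resume <id>


def is_terminal(record: Dict[str, Any]) -> bool:
    """True for an assistant record whose stop_reason ends the turn."""
    if record.get("type") != "assistant":
        return False
    message = record.get("message") or {}
    return message.get("stop_reason") in TERMINAL_STOP_REASONS


def tail_turn(
    path: Path, poll_interval: float = 0.1, timeout: float = 30.0
) -> Iterator[Dict[str, Any]]:
    """Yield records appended to path until the turn closes or timeout passes."""
    deadline = time.monotonic() + timeout
    offset = 0
    buffer = b""
    while time.monotonic() < deadline:
        if path.exists():
            with path.open("rb") as fh:
                fh.seek(offset)
                chunk = fh.read()
            offset += len(chunk)
            buffer += chunk
            # Parse only complete lines; a partial trailing line stays buffered.
            *lines, buffer = buffer.split(b"\n")
            for line in lines:
                if not line.strip():
                    continue
                record = json.loads(line)
                yield record
                if is_terminal(record):
                    return
        time.sleep(poll_interval)
    raise TimeoutError(f"no terminal assistant record in {path}")
```

An assistant record whose stop_reason is outside the terminal set, such as one that requests a tool call, is yielded but does not close the turn, so the loop keeps polling.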

Examples

examples/basic_usage.py — one turn, real claude.
examples/multi_turn.py — two turns sharing one live PTY.
examples/mcp_tool.py — wire up the bundled echo MCP server and let the model call it.

Tests

uv run pytest                  # unit tests (fast, no real claude)
RUN_CLAUDE_SMOKE=1 uv run pytest tests/test_pty.py tests/test_turn.py tests/test_backend.py

The smoke-marked tests spawn a real claude process and need a logged-in subscription on the host.