claude-code-api
Python wrapper around the claude CLI for subscription-mode (no API key)
backends. Drives one long-running interactive claude per conversation via
a PTY and reads events from the JSONL session file; the public surface is
Anthropic-Messages-API shaped so a gateway in front of it is a one-liner
serializer away.
Not affiliated with Anthropic. You need a working subscription, the
claude CLI on PATH, and to have run claude /login once.
Install
As a library inside another project:
uv add "claude-code-api @ git+https://git.kotikot.com/beaver/claude-code-api"
The runtime needs only ptyprocess.
Use
import asyncio

from claude_code_api import BackendOptions, ClaudeCodeBackend


async def main() -> None:
    opts = BackendOptions(cwd="/path/to/project", dangerously_skip_permissions=True)
    async with ClaudeCodeBackend(opts) as backend:
        async for event in backend.complete(
            [{"role": "user", "content": "say hi"}]
        ):
            print(event)


asyncio.run(main())
Multi-turn works by construction: append the assistant reply and a fresh
user message to the same messages list and call complete() again. The
backend fingerprints messages[:-1], finds the live PTY from the previous
turn, and reuses it (so the server-side prompt cache stays warm):
history = [{"role": "user", "content": "remember Beaver"}]
async for ev in backend.complete(history): ...
history += [
    {"role": "assistant", "content": [{"type": "text", "text": "OK"}]},
    {"role": "user", "content": "what was the codeword?"},
]
async for ev in backend.complete(history): ...
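The reuse rule can be illustrated with a small stand-alone sketch. It is not the library's code: fingerprint and SessionTable are hypothetical names, the canonical-JSON SHA-256 hashing is only an assumption about how a history fingerprint might be computed, and the table is assumed to be keyed by the history that the next call is expected to present as messages[:-1]. The "fresh" and "resume" outcomes mirror the fallbacks listed under How a turn works.

```python
import hashlib
import json
from typing import Optional


def fingerprint(messages: list) -> str:
    """Hash a message list via canonical JSON (an assumed scheme)."""
    canonical = json.dumps(messages, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class SessionTable:
    """Hypothetical stand-in for the backend's table of live PTY sessions."""

    def __init__(self) -> None:
        self._live = {}  # fingerprint -> session id

    def remember(self, history: list, session_id: str) -> None:
        # Called after a turn; `history` already includes the assistant reply.
        self._live[fingerprint(history)] = session_id

    def route(self, messages: list) -> tuple:
        """Return (action, session_id) for an incoming complete() call."""
        prefix = messages[:-1]  # everything except the new user message
        session_id: Optional[str] = self._live.get(fingerprint(prefix))
        if session_id is not None:
            return ("reuse", session_id)  # live PTY found for this prefix
        if not prefix:
            return ("fresh", None)  # first turn: spawn a new process
        return ("resume", None)  # known history, but no live PTY
```

Under this assumption, the assistant entry the caller appends has to serialize identically to what the table recorded; otherwise the lookup misses and the call falls through to the resume path.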
Public surface
Events (Anthropic-style, vendored to keep the dep tree empty):
AssistantMessage, UserMessage, SystemMessage, ResultMessage,
TextBlock, ThinkingBlock, ToolUseBlock, ToolResultBlock.
Errors: BackendError (root), AuthError, ProcessError,
CLINotFoundError, RateLimitError, SessionError, MessageParseError.
Backend: ClaudeCodeBackend(opts).complete(messages) is an async
generator of events. BackendOptions exposes model / system prompt /
allowed-tools / mcp_servers / permission mode / history injection mode.
Lower layers (PtyClaudeProcess, JsonlWatcher, TurnManager,
normalize) are re-exported for callers that want to assemble their own
session orchestration.
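To make the "serializer away" claim concrete, here is a rough sketch of a gateway-side serializer that folds a stream of events into a Messages-API-style response dict. The dataclasses below are local stand-ins, not the library's types: their field names (text, id, name, input, content, stop_reason, usage) are assumptions, and to_messages_response is a hypothetical helper.

```python
from dataclasses import dataclass, field
from typing import Any, Iterable, Optional


@dataclass
class TextBlock:  # stand-in; field name is an assumption
    text: str


@dataclass
class ToolUseBlock:  # stand-in; field names are assumptions
    id: str
    name: str
    input: dict


@dataclass
class AssistantMessage:  # stand-in
    content: list


@dataclass
class ResultMessage:  # stand-in
    stop_reason: str
    usage: dict = field(default_factory=dict)


def block_to_dict(block: Any) -> dict:
    """Convert one content block into its Messages-API-style dict."""
    if isinstance(block, TextBlock):
        return {"type": "text", "text": block.text}
    if isinstance(block, ToolUseBlock):
        return {"type": "tool_use", "id": block.id,
                "name": block.name, "input": block.input}
    raise TypeError(f"unsupported block: {type(block).__name__}")


def to_messages_response(events: Iterable[Any]) -> dict:
    """Fold a finished event stream into one response-shaped dict."""
    content: list = []
    stop_reason: Optional[str] = None
    usage: dict = {}
    for event in events:
        if isinstance(event, AssistantMessage):
            content.extend(block_to_dict(b) for b in event.content)
        elif isinstance(event, ResultMessage):
            stop_reason, usage = event.stop_reason, event.usage
    return {"type": "message", "role": "assistant", "content": content,
            "stop_reason": stop_reason, "usage": usage}
```

A real gateway would import the event classes from the package and adapt the attribute access to whatever fields they actually expose.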
How a turn works
- The backend looks up a live session by hash_history(messages[:-1]). If one matches, the new user message goes straight into its PTY.
- If nothing matches and messages[:-1] is empty, a fresh claude is spawned with a brand-new --session-id.
- If messages[:-1] is non-empty (a continuation we don't have a live PTY for, e.g. after restart), the backend writes a hand-crafted JSONL transcript at ~/.claude/projects/<key>/<id>.jsonl and spawns claude --resume <id>. That is the native_jsonl injection mode; the fallback is concat_message, which folds the prior history into one large first prompt.
- The PTY's stdout is drained continuously by a background thread; we never read events from there. The JSONL file is tailed at 100 ms cadence and each new record is normalized into a typed Event.
- The turn closes on the first assistant record with stop_reason ∈ {end_turn, max_tokens, stop_sequence, refusal}. A ResultMessage is synthesized from its usage and yielded last.
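The tail-and-close steps above can be sketched in a few lines. This is an illustration rather than the library's JsonlWatcher: tail_turn, is_terminal and synthesize_result are hypothetical names, the timeout parameter is added only so the sketch cannot hang, and the record layout (a top-level "type" plus a nested "message" carrying "stop_reason" and "usage") is an assumption about the session file.

```python
import json
import time
from pathlib import Path
from typing import Iterator

TERMINAL_STOP_REASONS = {"end_turn", "max_tokens", "stop_sequence", "refusal"}


def is_terminal(record: dict) -> bool:
    """True for an assistant record whose stop_reason closes the turn."""
    if record.get("type") != "assistant":
        return False
    message = record.get("message") or {}
    return message.get("stop_reason") in TERMINAL_STOP_REASONS


def tail_turn(path: Path, poll_interval: float = 0.1,
              timeout: float = 5.0) -> Iterator[dict]:
    """Yield records appended to `path` until the turn closes."""
    deadline = time.monotonic() + timeout
    buffer = ""
    with path.open("r", encoding="utf-8") as fh:
        while True:
            chunk = fh.read()
            if not chunk:
                if time.monotonic() > deadline:
                    raise TimeoutError(f"no terminal record in {path}")
                time.sleep(poll_interval)  # 100 ms cadence by default
                continue
            buffer += chunk
            # Parse complete lines only; a partial last line stays buffered.
            *lines, buffer = buffer.split("\n")
            for line in lines:
                if not line.strip():
                    continue
                record = json.loads(line)
                yield record
                if is_terminal(record):
                    return


def synthesize_result(record: dict) -> dict:
    """Build a result summary from the closing record (assumed layout)."""
    message = record["message"]
    return {"stop_reason": message["stop_reason"],
            "usage": message.get("usage", {})}
```

Buffering the trailing partial line matters because the writer may be mid-record when the file is polled; parsing only newline-terminated lines avoids feeding half a JSON object to the decoder.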
Examples
- examples/basic_usage.py: one turn, real claude.
- examples/multi_turn.py: two turns sharing one live PTY.
- examples/mcp_tool.py: wire up the bundled echo MCP server and let the model call it.
Tests
uv run pytest # unit tests (fast, no real claude)
RUN_CLAUDE_SMOKE=1 uv run pytest tests/test_pty.py tests/test_turn.py tests/test_backend.py
The smoke-marked tests spawn a real claude process and need a logged-in
subscription on the host.