llmadapter

Use-Case Matrix

This matrix answers a different question than docs/PROVIDER_MATRIX.md.

PROVIDER_MATRIX.md describes endpoint implementation evidence: what a provider endpoint can encode, decode, route, and smoke-test.

This document describes workload suitability: whether a specific model through a specific provider endpoint is suitable for a use case such as agentic coding.

The current implementation provides the compatibility vocabulary, evaluator, adapterconfig bridge, CLI inspection, library filtering helpers, and a live agentic-coding e2e matrix. The latest recorded live evidence is stored in docs/compatibility/agentic_coding.json.

The result table below is generated from that JSON artifact. Refresh it with:

go run ./cmd/llmadapter compatibility-record --use-case agentic_coding

Use Cases

| Use case | Purpose |
| --- | --- |
| agentic_coding | Coding-agent runtime requiring tools, tool continuation, prompt caching, structured output, and usage accounting; reasoning is optional evidence for thinking-model filters. |
| summarization | Text generation or summarization where tools, reasoning, and prompt caching are optional. |

Agentic Coding Requirements

| Feature | Requirement | Notes |
| --- | --- | --- |
| Streaming text | required | The client must receive incremental output. |
| Tools | required | The model/provider path must support tool calls. |
| Tool continuation | required | Tool results must be sendable back into the same API family. |
| Structured output | required | JSON mode/schema or tool schemas can carry structured data. |
| Reasoning | optional | Thinking/reasoning is recorded when observable and can be used by consumers that want reasoning-only model lists. |
| Prompt caching | required | llmadapter must be able to encode useful cache controls. |
| Usage | required | Usage events must be mapped when the provider reports usage. |
| Cache accounting | required | Provider-reported cache write/read counters are mandatory for agentic coding cost tracking. |
| Pricing | preferred | modeldb-backed pricing is preferred. |
| Gateway | optional | The mux/library path is enough for agentic coding; gateway coverage remains useful operator evidence. |
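The requirement levels above are plain data, so a consumer can check a candidate's recorded evidence against them directly. The following is a minimal sketch, not the llmadapter API: the feature keys and the `missingRequired` helper are illustrative names.

```go
package main

import "fmt"

// Requirement levels for the agentic_coding use case, mirroring the
// table above. Keys are illustrative, not llmadapter identifiers.
var agenticCodingRequirements = map[string]string{
	"streaming_text":    "required",
	"tools":             "required",
	"tool_continuation": "required",
	"structured_output": "required",
	"reasoning":         "optional",
	"prompt_caching":    "required",
	"usage":             "required",
	"cache_accounting":  "required",
	"pricing":           "preferred",
	"gateway":           "optional",
}

// missingRequired returns the required features that lack live evidence.
func missingRequired(evidence map[string]bool) []string {
	var missing []string
	for feature, level := range agenticCodingRequirements {
		if level == "required" && !evidence[feature] {
			missing = append(missing, feature)
		}
	}
	return missing
}

func main() {
	evidence := map[string]bool{
		"streaming_text": true, "tools": true, "tool_continuation": true,
		"structured_output": true, "prompt_caching": true,
		"usage": true, "cache_accounting": true,
	}
	// All required checks present, so nothing is missing.
	fmt.Println(len(missingRequired(evidence)))
}
```

Optional and preferred features are deliberately excluded from the gate: their absence changes status, not eligibility.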

Current CLI

Evaluate one model:

go run ./cmd/llmadapter compatibility --use-case agentic_coding --model anthropic/claude-haiku-4-5-20251001

Resolve and annotate candidates:

go run ./cmd/llmadapter resolve anthropic/claude-haiku-4-5-20251001 --use-case agentic_coding

Use JSON for consumers:

go run ./cmd/llmadapter compatibility --use-case agentic_coding --model anthropic/claude-haiku-4-5-20251001 --json

The CLI uses the same adapterconfig and modeldb-backed candidate resolution as resolve, infer, gateway, and mux construction. It does not perform a separate model lookup.

Strict approved-only selection is available through modeldb runtime views plus this live evidence artifact:

go run ./cmd/llmadapter resolve anthropic/claude-haiku-4-5-20251001 --use-case agentic_coding --approved-only

Library consumers can use adapterconfig.SelectModelForUseCase or AutoResult.SelectModelForUseCase with LoadCompatibilityEvidence. This fails closed unless a configured provider instance, API kind, and native model match an approved row.
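The exact signatures of SelectModelForUseCase and LoadCompatibilityEvidence are not shown here, so the following is only a sketch of the fail-closed matching rule: the `ApprovedRow` type and `selectApproved` helper are hypothetical stand-ins, not llmadapter types.

```go
package main

import (
	"errors"
	"fmt"
)

// ApprovedRow is a simplified stand-in for one approved row of the
// compatibility evidence artifact (field names are illustrative).
type ApprovedRow struct {
	ProviderInstance string
	APIKind          string
	NativeModel      string
}

// selectApproved fails closed: it returns an error unless the configured
// provider instance, API kind, and native model all match an approved row.
func selectApproved(rows []ApprovedRow, instance, apiKind, model string) (ApprovedRow, error) {
	for _, r := range rows {
		if r.ProviderInstance == instance && r.APIKind == apiKind && r.NativeModel == model {
			return r, nil
		}
	}
	return ApprovedRow{}, errors.New("no approved row matches; refusing to select")
}

func main() {
	rows := []ApprovedRow{{
		ProviderInstance: "anthropic_haiku",
		APIKind:          "anthropic",
		NativeModel:      "claude-haiku-4-5-20251001",
	}}
	_, err := selectApproved(rows, "anthropic_haiku", "anthropic", "claude-haiku-4-5-20251001")
	fmt.Println(err == nil)
}
```

The important property is the default: a partial match (right model, wrong API kind, say) yields an error rather than a best-effort fallback.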

The generated Transport column records the transport observed by the workload compatibility run. It is not a routing requirement unless the use case says so. Codex WebSocket continuation/cache behavior is tracked separately in docs/PROVIDER_MATRIX.md because it is a provider-internal optimization while the public Codex continuation contract remains replay.

llmadapter conformance now validates the approved agentic_coding rows as a strict contract. An approved row is valid only when all required workload checks are recorded as live evidence, cache accounting is live, and the artifact explicitly records consumer continuation, internal continuation, and transport. Reasoning remains recorded evidence, but non-thinking coding models can still be approved. This is the contract consumers such as agentsdk should trust when selecting models for coding-agent use.

Initial Candidate Set

These rows are covered by the live agentic-coding compatibility smoke test:

| Public model | Provider endpoint candidates |
| --- | --- |
| gpt-5.5 | openai_responses, codex_responses, openrouter_responses |
| gpt-5.4 | openai_responses, codex_responses, openrouter_responses |
| kimi-k2.6 | openrouter_responses |
| glm-4.6 | openrouter_responses |
| glm-4.7 | openrouter_responses |
| qwen3-coder | openrouter_responses |
| qwen3-coder-next | openrouter_responses |
| deepseek-v3.2 | openrouter_responses |
| haiku | claude, anthropic, openrouter_messages |
| sonnet | claude, anthropic, openrouter_messages |
| opus | claude, anthropic, openrouter_messages |
| bedrock-haiku | bedrock_converse |
| bedrock-sonnet-4-6 | bedrock_converse |
| bedrock-opus-4-6 | bedrock_converse |
| bedrock-opus-4-7 | bedrock_converse |
| minimax-latest | minimax_messages |

Short names in this generated evidence artifact are modeldb/catalog or test-harness public model names, not llmadapter-owned built-in aliases. Runtime docs prefer service-qualified names or explicit operator aliases.

Latest Agentic-Coding Result

Latest command:

env GOCACHE=/tmp/go-cache TEST_INTEGRATION=1 go test ./tests/e2e -run TestUseCaseAgenticCoding -count=1 -v

Total duration: 95.062 seconds.

| Candidate | Provider endpoint | Native model | Continuation | Transport | Required checks | Cache accounting | Status | Duration |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| anthropic_haiku | anthropic | claude-haiku-4-5-20251001 | replay | http_sse | pass | live | approved | 5.29s |
| anthropic_opus | anthropic | claude-opus-4-6 | replay | http_sse | pass | live | approved | 10.26s |
| anthropic_opus_4_7 | anthropic | claude-opus-4-7 | replay | http_sse | pass | live | approved | 9.12s |
| anthropic_sonnet | anthropic | claude-sonnet-4-6 | replay | http_sse | pass | live | approved | 7.98s |
| bedrock_converse_haiku | bedrock_converse | anthropic.claude-haiku-4-5-20251001-v1:0 | replay | http_sse | pass | live | approved | 7.20s |
| bedrock_converse_opus_4_6 | bedrock_converse | anthropic.claude-opus-4-6-v1 | replay | http_sse | pass | live | approved | 12.97s |
| bedrock_converse_opus_4_7 | bedrock_converse | anthropic.claude-opus-4-7 | replay | http_sse | pass | live | approved | 9.08s |
| bedrock_converse_sonnet_4_6 | bedrock_converse | anthropic.claude-sonnet-4-6 | replay | http_sse | pass | live | approved | 8.14s |
| claude_haiku | claude | claude-haiku-4-5-20251001 | replay | http_sse | pass | live | approved | 5.78s |
| claude_opus | claude | claude-opus-4-6 | replay | http_sse | pass | live | approved | 14.60s |
| claude_opus_4_7 | claude | claude-opus-4-7 | replay | http_sse | pass | live | approved | 8.46s |
| claude_sonnet | claude | claude-sonnet-4-6 | replay | http_sse | pass | live | approved | 16.83s |
| codex_gpt_5_4 | codex_responses | gpt-5.4 | replay | http_sse | pass | live | approved | 9.00s |
| codex_gpt_5_5 | codex_responses | gpt-5.5 | replay | http_sse | pass | live | approved | 13.29s |
| minimax_latest | minimax_messages | MiniMax-M2.7 | replay | http_sse | pass | live | approved | 27.49s |
| openai_gpt_5_4 | openai_responses | gpt-5.4 | previous_response_id | http_sse | pass | live | approved | 8.49s |
| openai_gpt_5_5 | openai_responses | gpt-5.5 | previous_response_id | http_sse | pass | live | approved | 12.19s |
| openrouter_deepseek_v3_2 | openrouter_responses | deepseek/deepseek-v3.2 | replay | http_sse | pass | live | approved | 37.37s |
| openrouter_glm_4_6 | openrouter_responses | z-ai/glm-4.6 | replay | http_sse | pass | live | approved | 71.69s |
| openrouter_glm_4_7 | openrouter_responses | z-ai/glm-4.7 | replay | http_sse | pass | live | approved | 52.58s |
| openrouter_gpt_5_4 | openrouter_responses | openai/gpt-5.4 | replay | http_sse | pass | live | approved | 8.45s |
| openrouter_gpt_5_5 | openrouter_responses | openai/gpt-5.5 | replay | http_sse | pass | live | approved | 17.68s |
| openrouter_haiku | openrouter_messages | anthropic/claude-haiku-4.5 | replay | http_sse | pass | live | approved | 6.48s |
| openrouter_kimi_k2_6 | openrouter_responses | moonshotai/kimi-k2.6 | replay | http_sse | pass | live | approved | 95.06s |
| openrouter_opus | openrouter_messages | anthropic/claude-opus-4.6 | replay | http_sse | pass | live | approved | 13.01s |
| openrouter_opus_4_7 | openrouter_messages | anthropic/claude-opus-4.7 | replay | http_sse | pass | live | approved | 7.75s |
| openrouter_qwen3_coder | openrouter_responses | qwen/qwen3-coder | replay | http_sse | pass | live | approved | 6.09s |
| openrouter_qwen3_coder_next | openrouter_responses | qwen/qwen3-coder-next | replay | http_sse | pass | live | approved | 6.59s |
| openrouter_sonnet | openrouter_messages | anthropic/claude-sonnet-4.6 | replay | http_sse | pass | live | approved | 29.28s |

Status Meaning

| Status | Meaning |
| --- | --- |
| approved | All required and preferred features have supporting evidence. |
| degraded | Required features pass, but at least one preferred feature is unavailable or untested. |
| failed | At least one required feature is unsupported. |
| untested | At least one required feature lacks evidence. |
| unavailable | The model/provider/API candidate cannot be resolved from the configured catalog/routes. |
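This taxonomy can be read as a precedence order over per-feature evidence. The sketch below is one plausible encoding of it, not the evaluator's actual code; `FeatureEvidence` and `statusFor` are illustrative names, and `unavailable` is omitted because it is decided during candidate resolution, before any evidence is consulted.

```go
package main

import "fmt"

// FeatureEvidence records whether a feature is supported and whether
// live evidence exists for it (a simplified model of the artifact).
type FeatureEvidence struct {
	Supported bool
	Tested    bool
}

// statusFor maps required/preferred feature evidence onto the status
// taxonomy: failed beats untested, which beats degraded, which beats
// approved.
func statusFor(required, preferred map[string]FeatureEvidence) string {
	for _, ev := range required {
		if ev.Tested && !ev.Supported {
			return "failed"
		}
	}
	for _, ev := range required {
		if !ev.Tested {
			return "untested"
		}
	}
	for _, ev := range preferred {
		if !ev.Tested || !ev.Supported {
			return "degraded"
		}
	}
	return "approved"
}

func main() {
	required := map[string]FeatureEvidence{"tools": {Supported: true, Tested: true}}
	preferred := map[string]FeatureEvidence{"pricing": {Supported: true, Tested: true}}
	fmt.Println(statusFor(required, preferred))
}
```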

Current Result

The latest live run on 2026-05-03 passed all required agentic-coding checks for every row above:

env GOCACHE=/tmp/go-cache TEST_INTEGRATION=1 go test ./tests/e2e -run TestUseCaseAgenticCoding -count=1 -v

Cache accounting is mandatory for agentic coding. Every approved row reported provider cache write or cache read counters in this run.

Reasoning is optional for agentic coding. Consumers that need a thinking-model-only list should filter the evidence table for reasoning=live.
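A consumer that wants the thinking-model-only list can apply that filter directly to the evidence rows. The sketch below assumes a simplified row shape; the actual field names in agentic_coding.json may differ.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Row models the fields a consumer needs from the evidence artifact;
// the real agentic_coding.json schema may name them differently.
type Row struct {
	Candidate string `json:"candidate"`
	Status    string `json:"status"`
	Reasoning string `json:"reasoning"`
}

// thinkingModels keeps only approved rows with live reasoning evidence.
func thinkingModels(rows []Row) []string {
	var out []string
	for _, r := range rows {
		if r.Status == "approved" && r.Reasoning == "live" {
			out = append(out, r.Candidate)
		}
	}
	return out
}

func main() {
	raw := `[{"candidate":"openai_gpt_5_5","status":"approved","reasoning":"live"},
	         {"candidate":"openrouter_haiku","status":"approved","reasoning":"none"}]`
	var rows []Row
	if err := json.Unmarshal([]byte(raw), &rows); err != nil {
		panic(err)
	}
	fmt.Println(thinkingModels(rows)) // only approved rows with reasoning=live
}
```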

The same artifact passes llmadapter conformance: every approved row is also a valid approved row, and no approved row is missing required feature, continuation, or transport evidence.

OpenRouter documentation says prompt caching can report cached_tokens and cache_write_tokens in detailed usage. The adapter now decodes both Responses-style input_tokens_details and Chat/Completions-style prompt_tokens_details, which is required because OpenRouter can expose the latter shape on Responses-compatible streams.
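Decoding both shapes amounts to accepting either detail object and normalizing it into one pair of cache counters. This is a sketch under assumed field names (the two public usage-detail shapes), not the adapter's actual decoder:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// tokenDetails holds the cache counters a detail object can carry.
type tokenDetails struct {
	CachedTokens     int `json:"cached_tokens"`
	CacheWriteTokens int `json:"cache_write_tokens"`
}

// usagePayload accepts both detail shapes OpenRouter can emit on
// Responses-compatible streams: the Responses-style key and the
// Chat/Completions-style key.
type usagePayload struct {
	InputTokensDetails  *tokenDetails `json:"input_tokens_details"`
	PromptTokensDetails *tokenDetails `json:"prompt_tokens_details"`
}

// cacheCounters normalizes either shape into (read, write) counters.
func cacheCounters(raw []byte) (read, write int, err error) {
	var u usagePayload
	if err = json.Unmarshal(raw, &u); err != nil {
		return 0, 0, err
	}
	d := u.InputTokensDetails
	if d == nil {
		d = u.PromptTokensDetails
	}
	if d == nil {
		return 0, 0, nil // provider reported no cache detail
	}
	return d.CachedTokens, d.CacheWriteTokens, nil
}

func main() {
	r, w, _ := cacheCounters([]byte(`{"prompt_tokens_details":{"cached_tokens":128,"cache_write_tokens":64}}`))
	fmt.Println(r, w)
}
```

Preferring the Responses-style key when both are present is one reasonable tie-break; the adapter may resolve that case differently.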

Kimi uses OpenRouter model moonshotai/kimi-k2.6. GLM, Qwen, and DeepSeek rows use OpenRouter Responses models z-ai/glm-4.6, z-ai/glm-4.7, qwen/qwen3-coder, qwen/qwen3-coder-next, and deepseek/deepseek-v3.2. Sonnet and Opus rows use catalog/test-harness public model names that resolve to claude-sonnet-4-6 and claude-opus-4-6; these are not llmadapter-owned built-in aliases.