This document describes the current llmadapter architecture, package boundaries, dependency shape, known shortcomings, and roadmap for hardening the design.
docs/API_SURFACE.md records the v1 public package boundary. This document is the practical architecture review of the code as it exists now. Root-level DESIGN.md and PLAN.md are internal background notes rather than primary user documentation.
llmadapter adapts requests between downstream API compatibility surfaces and upstream provider API surfaces through a canonical request/event model.
The core design avoids direct M x N conversions between every endpoint and every provider:
downstream endpoint wire API
-> adapt.Request
-> unified.Request / unified.Event stream
-> provider endpoint wire API
The same stateless routing path is used by the HTTP gateway endpoints and the in-process mux client.
Conversation/session state intentionally lives above this repository, for example in agentsdk.
unified defines the canonical request, messages, content parts, tools, response formats, usage/cost accounting, extension bag, event stream, unified.Client, and Collect.
adapt defines API kind/family identifiers, strict/best-effort mapping concepts, endpoint/provider request envelopes, warnings, and generic codec/processor interfaces.
These packages should stay small, provider-neutral, and free of concrete provider imports.
pipeline contains generic event/request processors used to transform or enrich streams, including pricing processors.
transport contains byte-stream transport primitives, HTTP request execution, fake transports for tests, SSE/NDJSON parsing, retry/rate-limit wrappers, and extended compression support.
Providers may depend on pipeline and transport. Higher-level routing should not depend on provider wire details.
providers/* packages implement upstream provider endpoint clients that satisfy unified.Client.
Current provider endpoint families include:
- anthropic, claude, openrouter_messages, minimax_messages
- openai_chat, openrouter_chat, minimax_chat
- openai_responses, openrouter_responses, codex_responses

Provider endpoint packages own provider-specific wire request encoding, streaming response decoding, auth behavior, and provider-specific request extensions.
Shared wire structs that are used on both sides of the adapter live outside provider implementation packages. Anthropic Messages wire types are in anthropicwire; the upstream Anthropic provider keeps aliases for compatibility, while the downstream /v1/messages endpoint imports the neutral wire package.
endpoints/* packages implement downstream HTTP compatibility surfaces:
- /v1/chat/completions
- /v1/responses
- /v1/messages

Each endpoint decodes inbound HTTP requests into adapt.Request, then encodes canonical unified.Event streams back into the downstream API wire format.
Endpoint codecs should not own provider selection or provider credentials.
router defines:
- ProviderEndpoint: provider name, exact API kind, API family, client, capabilities, priority, and tags.
- Route: selected provider endpoint plus public/native model mapping.
- StaticRouter: deterministic routing by source API, model, required capabilities, route weight, endpoint priority, and declaration order.

Routing is intentionally endpoint-based:
Provider = who we talk to.
API kind = exact upstream wire protocol.
API family = compatibility shape.
Provider endpoint = provider + API kind + family + client + capabilities.
This matters for providers such as OpenRouter, MiniMax, Azure, Bedrock, Vertex, or Ollama, where one provider can expose multiple protocol surfaces.
gateway contains the generic HTTP handler that decodes the inbound request, routes it to a provider endpoint, invokes the selected provider client, and writes the resulting event stream back in the endpoint's wire format.
gatewayserver wires the three implemented HTTP endpoints to a shared router built from adapterconfig.
adapterconfig is the main construction boundary. It loads and validates JSON/env config, builds provider endpoints through providerregistry, loads the modeldb catalog with overlays, resolves requested models against catalog offerings, applies modeldb metadata and pricing wrappers, builds routers, and constructs the in-process mux client.
providerregistry lists supported provider endpoint descriptors and builds direct provider clients for a configured provider type.
muxclient exposes a stateless unified.Client over the same router/provider endpoint path used by the gateway.
examples/llmadapter.example.json is a load-tested operator config that exercises this construction boundary without requiring provider credentials during inspection.
Provider packages own the wire format for the protocol they originate. OpenAI Responses is the canonical implementation for the Responses request/event shape in providers/openai/responses. Compatible providers such as OpenRouter Responses and Codex Responses depend on that OpenAI Responses base and apply provider-specific adjustments as overlays:
- previous_response_id

This prevents proprietary compatibility providers from becoming the accidental source of truth for the base OpenAI Responses schema.
llmadapter is stateless at the public API boundary, so consumers must know which continuation contract a selected route expects. Provider descriptors, router routes, mux unified.RouteEvents, config inspection, CLI resolution, conformance output, and compatibility artifacts expose:
- consumer_continuation: what the caller must provide, for example full replay or native previous_response_id.
- internal_continuation: diagnostic metadata describing what the provider endpoint actually used internally.
- transport: diagnostic metadata describing the transport class used or advertised by the provider endpoint, for example http_sse or websocket.

consumer_continuation is the only public projection-strategy signal. Consumers must not infer projection behavior from provider name, API family, transport, or internal_continuation; those latter fields are for observability, compatibility evidence, and debugging provider optimizations.
OpenAI Responses currently advertises public previous_response_id continuation. Codex Responses advertises public replay because its HTTP/SSE backend rejects previous_response_id; session-mode Codex requests may use the provider-internal WebSocket transport when available, but callers still send full replay projections and HTTP/SSE fallback remains safe. Codex only attaches an internal WebSocket previous_response_id after same-session/same-branch lineage checks pass for model, instructions, exact canonical input-prefix matching, append-only input growth, and an explicit session ID. The Codex provider keeps that WebSocket open per session so backend affinity is connection-scoped rather than relying on unsupported store:true behavior.
OpenAI platform Responses also has an official WebSocket mode for persistent /v1/responses connections with incremental inputs and previous_response_id. llmadapter exposes this as a direct OpenAI Responses client option, responses.WithWebSocketMode(...), and Codex uses the same mode vocabulary while adding Codex-specific auth/session behavior. Provider descriptors, JSON config, auto mux, and the workload matrix still default openai_responses to HTTP/SSE unless a direct client opts into WebSocket mode. The OpenAI Realtime API remains a separate WebSocket/WebRTC surface and should be modeled as its own realtime API kind/family if llmadapter adds it.
The shared OpenAI Responses WebSocket default transport enables compression and forces IPv4 because OpenAI-operated WebSocket connections have shown IPv6 stalls in practice. Shared session reuse/open-or-write mechanics live under the OpenAI provider internals so native OpenAI Responses and Codex use the same connection-affinity primitive while keeping request shaping separate. Providers can still inject a custom WebSocket transport for tests or specialized networks. OpenRouter Responses does not use this path unless it explicitly opts in later.
Codex WebSocket recovery is intentionally conservative. If the WebSocket fails before user-visible response output starts, llmadapter may fall back to HTTP/SSE replay for that same turn. If the WebSocket fails after output has started, llmadapter returns the stream error instead of retrying, because retrying could duplicate partial output. In both cases the provider-internal WebSocket session and continuation state are discarded, so the next request replays history until a fresh WebSocket turn completes and establishes a new internal response ID.
Provider-owned default HTTP/SSE transports retry transient pre-stream failures for 429, 500, 502, 503, and 504. Retries snapshot and replay the request body, use exponential backoff, and honor Retry-After when upstream provides it. Custom transports supplied via WithTransport(...) are not wrapped so tests and operator-provided transport semantics stay exact.
Prompt caching remains a request-level primitive even when Codex uses WebSocket internally. CachePolicy, CacheKey, and the Codex session/window headers are still derived from the canonical request, and the e2e matrix includes a Codex WebSocket prompt-cache smoke that requires both transport=websocket and provider-reported cache-read tokens.
Provider descriptors are defaults. Providers that can choose transport at request time emit unified.ProviderExecutionEvent; muxclient folds that event into the initial RouteEvent so consumers see the actual transport/internal continuation for the completed turn.
modelmeta maps modeldb offering exposure metadata into route capabilities and limits.
Model resolution is centralized in adapterconfig. CLI diagnostics, llmadapter infer, auto route summaries, mux routing, and gateway routing use the same catalog-backed route/native-model decision. Modeldb is the source of truth for whether a model or alias exists when modeldb-backed routing is enabled; dynamic routes reject catalog-missing models instead of falling through to provider defaults. Config inspection and model resolution expose capability provenance as provider_descriptor, config_override, or modeldb_exposure.
Provider instances are projected into modeldb runtime views for strict workload selection. In that path, modeldb View handles aliases, offerings, runtime access, and provider/service preference, while llmadapter compatibility evidence filters the view to provider/API/model rows that have passed the requested workload profile.
pricing enriches canonical usage events with modeldb-backed cost items.
compatibility evaluates route candidates against workload profiles such as agentic_coding and summarization. It consumes candidates produced by adapterconfig; it does not perform a separate modeldb lookup or instantiate providers. Live compatibility artifacts are workload certification evidence, not model identity data; they are joined with modeldb runtime views when consumers request approved-only selection.
conformance joins static provider descriptors with the latest compatibility artifact. For agentic_coding, approved rows are treated as a strict consumer contract: every required feature must have live evidence, cache accounting must be live, and continuation/transport evidence must be explicit in the artifact. Reasoning is optional workload evidence so consumers can filter for thinking models without excluding non-thinking coding models from the base agentic-coding list. This lets agentsdk-style consumers distinguish “configured route exists” from “route is certified for coding-agent workloads.”
Modeldb is metadata and pricing input. It must not secretly instantiate providers or own credentials.
flowchart TB
CLI[cmd/llmadapter] --> Config[adapterconfig]
CLI --> Server[gatewayserver]
CompatBin[cmd/llmadapter-gateway] --> Config
CompatBin --> Server
Server --> Gateway[gateway]
Server --> Endpoints[endpoints/*]
Gateway --> Router[router]
Gateway --> Adapt[adapt]
Gateway --> Unified[unified]
Config --> Registry[providerregistry]
Config --> Router
Config --> Mux[muxclient]
Config --> Pricing[pricing]
Config --> Meta[modelmeta]
Config --> Compat[compatibility]
Config --> ModelDB[github.com/codewandler/modeldb]
Mux --> Router
Mux --> Adapt
Mux --> Unified
Registry --> Providers[providers/*]
Providers --> Transport[transport]
Providers --> Pipeline[pipeline]
Providers --> Adapt
Providers --> Unified
Endpoints --> Adapt
Endpoints --> Unified
Router --> Adapt
Router --> Unified
Pipeline --> Adapt
Pipeline --> Unified
Pricing --> Unified
Meta --> Router
HTTP request
-> endpoint DecodeHTTP
-> adapt.Request
-> router candidates
-> selected provider endpoint
-> native model rewrite
-> provider unified.Client
-> unified.Event stream
-> endpoint WriteEvents
-> HTTP response
unified.Request
-> muxclient
-> adapt.Request with optional source API
-> router candidates
-> selected provider endpoint
-> native model rewrite
-> provider unified.Client
-> unified.RouteEvent + provider unified.Event stream
When the mux client source API is empty, routing is in auto-source mode: all configured source routes are eligible, and the router ranks higher-weight routes first, then source-native Anthropic Messages routes before OpenAI Responses and Chat routes. Compatibility gateways still pass an explicit source API derived from the inbound HTTP endpoint.
unified.Request
-> provider request mapping
-> provider wire HTTP request
-> transport byte stream
-> provider wire event decoder
-> unified.Event stream
- unified.APIError, including status, JSON error fields, raw provider body, and Retry-After hints.
- unified.ErrorEvent carrying unified.APIError where the provider exposes structured error fields. Gateway/mux fallback remains pre-stream/pre-response only; once streaming output begins, errors surface to the caller instead of retrying another provider.
- unsupported_field_dropped warnings in best-effort mode.
- unified.Request: policy/key/TTL intent is mapped by provider codecs, while session-level cache strategy and stable-prefix projection stay in agentsdk.
- unified.Request.Extensions instead of being added as core fields too early.
- unified.Collect preserves reasoning signatures, citations, and raw provider events for higher layers that need continuation or provider-specific metadata.
- cmd/llmadapter-gateway is now a thin compatibility binary over the shared adapterconfig and gatewayserver path.
- docs/PROVIDER_MATRIX.md records the exact v1 provider endpoint matrix and latest live result.
- providerregistry is intentionally static. Descriptors carry endpoint metadata and factories, so metadata and construction stay together without a central provider-type switch. This is stable for v1, but it is not a plugin system. External provider module loading remains post-v1 expansion.
Shared wire structs that cross endpoint/provider boundaries should live at the protocol owner or in neutral packages. Anthropic Messages follows this through anthropicwire because downstream and upstream packages both need the same Anthropic-shaped structs. OpenAI Responses follows the owner-package rule: providers/openai/responses owns the base Responses provider wire shape, and compatible providers wrap it with explicit provider overlays instead of copying it.
Gateway and mux client both implement route candidate fallback. Shared route-attempt mechanics live in internal/routeattempt: candidate lookup, native model rewrite, and provider/API error formatting.
The shared policy classifies request-shape validation failures as non-retryable, including adapt.UnsupportedFieldError and 400/422 provider API errors. Gateway config can set max_attempts; mux library consumers can set muxclient.WithMaxAttempts. The HTTP-specific response-start rule stays in gateway: once response bytes are written, the gateway cannot transparently retry another upstream.
Base capabilities are still partly endpoint-family/provider defaults. Modeldb narrows fixed-route capabilities and known dynamic model requests, and dynamic model IDs missing from the catalog are rejected instead of being rewritten to provider defaults. CLI/config inspection reports where effective capabilities came from:
- provider_descriptor: static provider endpoint metadata.
- config_override: explicit operator override in llmadapter config.
- modeldb_exposure: modeldb offering exposure metadata for the selected provider API.

Live tests are strong smoke coverage, and deterministic offline fixtures cover the currently known compatibility classes: endpoint decode edge cases, reasoning variants, citation metadata variants, provider error shapes, raw/unmapped events, prompt-cache accounting where exposed, and unsupported-media/built-in-tool policy. Remaining conformance gaps are future-facing:
Raw/unmapped event preservation exists for provider usage payloads and selected unmapped provider stream events. More provider-specific events should be preserved before broadening to APIs with richer event streams.
The current stable state is a stateless, stream-first adapter with shared adapterconfig construction for CLI, gateway, and mux client paths. Model resolution is centralized through modeldb catalog loading plus alias overlays when modeldb-backed routing is enabled. Provider support spans Anthropic Messages-compatible, OpenAI Chat-compatible, and OpenAI Responses-compatible endpoint families across Anthropic, Claude Code-compatible access, OpenAI, OpenRouter, MiniMax, and Codex endpoint variants.
Usage/cost accounting is canonical and structured, provider raw usage/error payloads are retained where exposed, prompt-cache controls are explicit request hints, and stateful conversation/session behavior remains outside llmadapter.
Stateful conversations intentionally live outside llmadapter. This is the right boundary, but llmadapter must continue exposing the stateless primitives that agentsdk needs: previous response IDs, prompt cache keys, response IDs, provider session hints, usage, and cost events.
Provider descriptors now carry static client factories, so client construction lives with descriptor metadata instead of in a growing provider-type switch. This remains static and deterministic for v1; a plugin-style registry is not required.
Only extract shared wire packages when there is real duplication or cross-boundary coupling. Anthropic Messages has been extracted to anthropicwire because both downstream /v1/messages and upstream Anthropic-compatible providers use the same wire shape.
Keep the HTTP-specific response-start behavior in gateway, but factor shared route attempt/error metadata where useful so mux and gateway report failures consistently.
Initial shared mechanics are implemented in internal/routeattempt: candidate lookup, native model rewrite, error formatting, retryability classification, and max-attempt checks. Remaining policy expansion, if needed, is post-v1 work such as backoff or richer production failure classification.
Focused offline fixture tests now cover the known v1 classes. Future work should add fixtures as providers expose new event shapes, error bodies, citation annotations, cache accounting details, or supported media/tool types.
Keep extension data namespaced, but add typed helper structs and validation for mature extension groups such as:
- unified.OpenRouterExtensions
- unified.OpenAIResponsesExtensions
- unified.CodexExtensions
- unified.AnthropicExtensions; modeldb-resolved adaptive-effort metadata maps canonical effort into adaptive thinking plus provider output_config.effort.

Typed extension readers now validate mature extension groups and return invalid_extension_dropped warnings for invalid values. Focused semantic checks cover OpenAI Responses cache retention, OpenRouter routing/provider/plugin/session controls, Anthropic beta header values, and Codex turn metadata. Provider encoders preserve valid extensions and drop invalid controls instead of silently sending malformed provider-specific fields.
Do not move replay history, durable session state, cache policy, or memory projection into llmadapter. Instead, keep improving the stateless primitives consumed by agentsdk and similar clients.
- Add new adapt.ApiKind values when their wire shape or event semantics differ; otherwise reuse an existing adapt.ApiFamily.
- Build new routes and clients through the adapterconfig construction path.
- Implement new providers behind unified.Client, not inside the gateway/router.