llmadapter

Architecture

This document describes the current llmadapter architecture, package boundaries, dependency shape, known shortcomings, and roadmap for hardening the design.

docs/API_SURFACE.md records the v1 public package boundary. This document is the practical architecture review of the code as it exists now. Root-level DESIGN.md and PLAN.md are internal background notes rather than primary user documentation.

Purpose

llmadapter adapts requests between downstream API compatibility surfaces and upstream provider API surfaces through a canonical request/event model.

The core design avoids direct M x N conversions between every endpoint and every provider:

downstream endpoint wire API
  -> adapt.Request
  -> unified.Request / unified.Event stream
  -> provider endpoint wire API
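
The hub-and-spoke shape above can be sketched with hypothetical minimal types; the real unified/adapt types are much richer, but the M+N (rather than M*N) converter count falls out of the same structure:

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical minimal shapes for illustration; the real canonical model differs.
type Request struct{ Model, Input string }
type Event struct{ Text string }

// Endpoint codec: downstream wire format <-> canonical model.
type EndpointCodec interface {
	Decode(wire string) (Request, error)
	Encode(events []Event) (string, error)
}

// Provider client: canonical request -> canonical event stream.
type ProviderClient interface {
	Do(req Request) ([]Event, error)
}

// Toy codec for a "model|input" wire format.
type pipeCodec struct{}

func (pipeCodec) Decode(wire string) (Request, error) {
	model, input, _ := strings.Cut(wire, "|")
	return Request{Model: model, Input: input}, nil
}

func (pipeCodec) Encode(events []Event) (string, error) {
	var b strings.Builder
	for _, e := range events {
		b.WriteString(e.Text)
	}
	return b.String(), nil
}

// Toy provider that echoes the input back as one event.
type echoProvider struct{}

func (echoProvider) Do(req Request) ([]Event, error) {
	return []Event{{Text: "echo:" + req.Input}}, nil
}

// Serve routes any codec to any provider through the canonical model, so
// M codecs and N providers need M+N converters, never M*N direct pairings.
func Serve(c EndpointCodec, p ProviderClient, wire string) (string, error) {
	req, err := c.Decode(wire)
	if err != nil {
		return "", err
	}
	events, err := p.Do(req)
	if err != nil {
		return "", err
	}
	return c.Encode(events)
}

func main() {
	out, _ := Serve(pipeCodec{}, echoProvider{}, "gpt|hello")
	fmt.Println(out) // echo:hello
}
```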

The same stateless routing path is used by the HTTP gateway and the in-process mux client.

Conversation/session state intentionally lives above this repository, for example in agentsdk.

Package Layers

Core Model

unified defines the canonical request, messages, content parts, tools, response formats, usage/cost accounting, extension bag, event stream, unified.Client, and Collect.

adapt defines API kind/family identifiers, strict/best-effort mapping concepts, endpoint/provider request envelopes, warnings, and generic codec/processor interfaces.

These packages should stay small, provider-neutral, and free of concrete provider imports.

Stream And Transport

pipeline contains generic event/request processors used to transform or enrich streams, including pricing processors.

transport contains byte-stream transport primitives, HTTP request execution, fake transports for tests, SSE/NDJSON parsing, retry/rate-limit wrappers, and extended compression support.

Providers may depend on pipeline and transport. Higher-level routing should not depend on provider wire details.
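
As an illustration of the SSE side of transport, a minimal data-line parser might look like the sketch below. This is not the transport package's parser: a full SSE implementation also handles event:, id:, and retry: fields, comment lines, and trailing-event edge cases.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// ParseSSE extracts "data:" payloads from a Server-Sent Events stream.
// Blank lines terminate events; multi-line data is joined with newlines.
func ParseSSE(stream string) []string {
	var events []string
	var data []string
	sc := bufio.NewScanner(strings.NewReader(stream))
	for sc.Scan() {
		line := sc.Text()
		switch {
		case line == "":
			// A blank line dispatches the accumulated event.
			if len(data) > 0 {
				events = append(events, strings.Join(data, "\n"))
				data = nil
			}
		case strings.HasPrefix(line, "data:"):
			// Strip the field name and one optional leading space.
			data = append(data, strings.TrimPrefix(strings.TrimPrefix(line, "data:"), " "))
		}
	}
	if len(data) > 0 {
		events = append(events, strings.Join(data, "\n"))
	}
	return events
}

func main() {
	fmt.Println(ParseSSE("data: {\"a\":1}\n\ndata: [DONE]\n\n"))
}
```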

Provider Clients

providers/* packages implement upstream provider endpoint clients that satisfy unified.Client.

Current provider endpoint families include Anthropic Messages-compatible, OpenAI Chat-compatible, and OpenAI Responses-compatible clients.

Provider endpoint packages own provider-specific wire request encoding, streaming response decoding, auth behavior, and provider-specific request extensions.

Shared wire structs that are used on both sides of the adapter live outside provider implementation packages. Anthropic Messages wire types are in anthropicwire; the upstream Anthropic provider keeps aliases for compatibility, while the downstream /v1/messages endpoint imports the neutral wire package.

Endpoint Codecs

endpoints/* packages implement downstream HTTP compatibility surfaces for the Anthropic Messages, OpenAI Chat, and OpenAI Responses endpoint families.

Each endpoint decodes inbound HTTP requests into adapt.Request, then encodes canonical unified.Event streams back into the downstream API wire format.

Endpoint codecs should not own provider selection or provider credentials.

Routing

router defines routes, route candidates, and the identity model used for provider endpoint selection.

Routing is intentionally endpoint-based:

Provider = who we talk to.
API kind = exact upstream wire protocol.
API family = compatibility shape.
Provider endpoint = provider + API kind + family + client + capabilities.

This matters for providers such as OpenRouter, MiniMax, Azure, Bedrock, Vertex, or Ollama, where one provider can expose multiple protocol surfaces.
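
The identity tuple above can be illustrated with hypothetical types; the names are not the real router identifiers, only the shape is:

```go
package main

import "fmt"

// Illustrative identity shapes, not the real router types.
type APIKind string   // exact upstream wire protocol, e.g. "openrouter_responses"
type APIFamily string // compatibility shape, e.g. "openai_responses"

// A provider endpoint is the routing unit: one provider speaking one protocol.
// Routing keys on the full tuple, never on provider name alone.
type ProviderEndpoint struct {
	Provider string
	Kind     APIKind
	Family   APIFamily
}

func main() {
	// One provider, two protocol surfaces, two distinct routing units.
	openrouter := []ProviderEndpoint{
		{Provider: "openrouter", Kind: "openrouter_chat", Family: "openai_chat"},
		{Provider: "openrouter", Kind: "openrouter_responses", Family: "openai_responses"},
	}
	for _, ep := range openrouter {
		fmt.Printf("%s via %s (%s)\n", ep.Provider, ep.Kind, ep.Family)
	}
}
```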

Gateway

gateway contains the generic HTTP handler that:

  1. Decodes a downstream request through an endpoint codec.
  2. Asks the router for route candidates.
  3. Rewrites the request model to the selected native model.
  4. Calls the selected provider endpoint client.
  5. Writes canonical events through the endpoint codec.
  6. Falls back to lower-ranked candidates when a provider fails before response bytes are written.
  7. Tracks temporary provider endpoint/model health.

gatewayserver wires the three implemented HTTP endpoints to a shared router built from adapterconfig.

Config And Construction

adapterconfig is the main construction boundary. It loads and validates JSON/env config, builds provider endpoints through providerregistry, loads the modeldb catalog with overlays, resolves requested models against catalog offerings, applies modeldb metadata and pricing wrappers, builds routers, and constructs the in-process mux client.
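
A load-and-validate construction boundary of this kind might look like the sketch below. The config schema here is hypothetical; the real adapterconfig schema is documented by examples/llmadapter.example.json:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Hypothetical config shapes for illustration only.
type ProviderConfig struct {
	Type    string `json:"type"`
	BaseURL string `json:"base_url,omitempty"`
}

type Config struct {
	Providers map[string]ProviderConfig `json:"providers"`
}

// LoadConfig decodes JSON and fails fast on incomplete provider entries,
// mirroring construction-time validation rather than request-time errors.
func LoadConfig(data []byte) (Config, error) {
	var c Config
	if err := json.Unmarshal(data, &c); err != nil {
		return Config{}, err
	}
	for name, p := range c.Providers {
		if p.Type == "" {
			return Config{}, fmt.Errorf("provider %q: missing type", name)
		}
	}
	return c, nil
}

func main() {
	c, err := LoadConfig([]byte(`{"providers":{"main":{"type":"openai_responses"}}}`))
	fmt.Println(c.Providers["main"].Type, err)
}
```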

providerregistry lists supported provider endpoint descriptors and builds direct provider clients for a configured provider type.

muxclient exposes a stateless unified.Client over the same router/provider endpoint path used by the gateway.

examples/llmadapter.example.json is a load-tested operator config that exercises this construction boundary without requiring provider credentials during inspection.

Provider Wire Ownership

Provider packages own the wire format for the protocol they originate. OpenAI Responses is the canonical implementation for the Responses request/event shape in providers/openai/responses. Compatible providers such as OpenRouter Responses and Codex Responses depend on that OpenAI Responses base and apply provider-specific adjustments as overlays.

This prevents proprietary compatibility providers from becoming the accidental source of truth for the base OpenAI Responses schema.

Continuation And Transport Metadata

llmadapter is stateless at the public API boundary, so consumers must know which continuation contract a selected route expects. Provider descriptors, router routes, mux unified.RouteEvents, config inspection, CLI resolution, conformance output, and compatibility artifacts all expose the route's continuation and transport metadata.

consumer_continuation is the only public projection-strategy signal. Consumers must not infer projection behavior from provider name, API family, transport, or internal_continuation; those latter fields are for observability, compatibility evidence, and debugging provider optimizations.

OpenAI Responses currently advertises public previous_response_id continuation. Codex Responses advertises public replay because its HTTP/SSE backend rejects previous_response_id; session-mode Codex requests may use the provider-internal WebSocket transport when available, but callers still send full replay projections and HTTP/SSE fallback remains safe. Codex attaches an internal WebSocket previous_response_id only after same-session/same-branch lineage checks pass: matching model and instructions, an exact canonical input-prefix match, append-only input growth, and an explicit session ID. The Codex provider keeps that WebSocket open per session so backend affinity is connection-scoped rather than relying on unsupported store:true behavior.
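
The lineage checks can be sketched as a pure predicate over hypothetical turn snapshots; the field names are illustrative, and the real provider operates on canonical request projections:

```go
package main

import "fmt"

// Hypothetical snapshot of one turn's continuation-relevant fields.
type turn struct {
	SessionID    string
	Model        string
	Instructions string
	Input        []string // canonical input items in order
}

// canAttachInternalContinuation mirrors the described policy: an internal
// previous_response_id is attached only when the new turn is a same-session,
// append-only extension of the previous turn.
func canAttachInternalContinuation(prev, next turn) bool {
	if next.SessionID == "" || next.SessionID != prev.SessionID {
		return false // explicit session ID required, same session required
	}
	if next.Model != prev.Model || next.Instructions != prev.Instructions {
		return false // model/instructions must match
	}
	if len(next.Input) < len(prev.Input) {
		return false // append-only growth
	}
	for i, item := range prev.Input {
		if next.Input[i] != item {
			return false // exact canonical input-prefix match
		}
	}
	return true
}

func main() {
	prev := turn{SessionID: "s1", Model: "m", Input: []string{"u1", "a1"}}
	next := turn{SessionID: "s1", Model: "m", Input: []string{"u1", "a1", "u2"}}
	fmt.Println(canAttachInternalContinuation(prev, next)) // true
}
```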

OpenAI platform Responses also has an official WebSocket mode for persistent /v1/responses connections with incremental inputs and previous_response_id. llmadapter exposes this as a direct OpenAI Responses client option, responses.WithWebSocketMode(...), and Codex uses the same mode vocabulary while adding Codex-specific auth/session behavior. Provider descriptors, JSON config, auto mux, and the workload matrix still default openai_responses to HTTP/SSE unless a direct client opts into WebSocket mode. The OpenAI Realtime API remains a separate WebSocket/WebRTC surface and should be modeled as its own realtime API kind/family if llmadapter adds it.

The shared OpenAI Responses WebSocket default transport enables compression and forces IPv4 because OpenAI-operated WebSocket connections have shown IPv6 stalls in practice. Shared session reuse/open-or-write mechanics live under the OpenAI provider internals so native OpenAI Responses and Codex use the same connection-affinity primitive while keeping request shaping separate. Providers can still inject a custom WebSocket transport for tests or specialized networks. OpenRouter Responses does not use this path unless it explicitly opts in later.

Codex WebSocket recovery is intentionally conservative. If the WebSocket fails before user-visible response output starts, llmadapter may fall back to HTTP/SSE replay for that same turn. If the WebSocket fails after output has started, llmadapter returns the stream error instead of retrying, because retrying could duplicate partial output. In both cases the provider-internal WebSocket session and continuation state are discarded, so the next request replays history until a fresh WebSocket turn completes and establishes a new internal response ID.

Provider-owned default HTTP/SSE transports retry transient pre-stream failures for 429, 500, 502, 503, and 504. Retries snapshot and replay the request body, use exponential backoff, and honor Retry-After when upstream provides it. Custom transports supplied via WithTransport(...) are not wrapped so tests and operator-provided transport semantics stay exact.

Prompt caching remains a request-level primitive even when Codex uses WebSocket internally. CachePolicy, CacheKey, and the Codex session/window headers are still derived from the canonical request, and the e2e matrix includes a Codex WebSocket prompt-cache smoke that requires both transport=websocket and provider-reported cache-read tokens.

Provider descriptors are defaults. Providers that can choose transport at request time emit unified.ProviderExecutionEvent; muxclient folds that event into the initial RouteEvent so consumers see the actual transport/internal continuation for the completed turn.

Metadata And Pricing

modelmeta maps modeldb offering exposure metadata into route capabilities and limits.

Model resolution is centralized in adapterconfig. CLI diagnostics, llmadapter infer, auto route summaries, mux routing, and gateway routing use the same catalog-backed route/native-model decision. Modeldb is the source of truth for whether a model or alias exists when modeldb-backed routing is enabled; dynamic routes reject catalog-missing models instead of falling through to provider defaults. Config inspection and model resolution expose capability provenance as provider_descriptor, config_override, or modeldb_exposure.

Provider instances are projected into modeldb runtime views for strict workload selection. In that path, modeldb View handles aliases, offerings, runtime access, and provider/service preference, while llmadapter compatibility evidence filters the view to provider/API/model rows that have passed the requested workload profile.

pricing enriches canonical usage events with modeldb-backed cost items.
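
As a sketch, this enrichment reduces to multiplying token counts by per-million-token rates; the shapes below are hypothetical, and modeldb-backed pricing carries more detail (cache tiers, per-offering rates):

```go
package main

import "fmt"

// Hypothetical usage and pricing shapes for illustration.
type Usage struct {
	InputTokens  int
	OutputTokens int
}

type Rates struct {
	InputPerMTok  float64 // USD per 1M input tokens
	OutputPerMTok float64 // USD per 1M output tokens
}

type CostItem struct {
	Kind string
	USD  float64
}

// CostItems annotates a usage event with structured cost items, mirroring
// how a pricing processor enriches the canonical stream.
func CostItems(u Usage, r Rates) []CostItem {
	return []CostItem{
		{Kind: "input", USD: float64(u.InputTokens) * r.InputPerMTok / 1e6},
		{Kind: "output", USD: float64(u.OutputTokens) * r.OutputPerMTok / 1e6},
	}
}

func main() {
	items := CostItems(Usage{InputTokens: 1000, OutputTokens: 500}, Rates{InputPerMTok: 3, OutputPerMTok: 15})
	fmt.Printf("%+v\n", items)
}
```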

compatibility evaluates route candidates against workload profiles such as agentic_coding and summarization. It consumes candidates produced by adapterconfig; it does not perform a separate modeldb lookup or instantiate providers. Live compatibility artifacts are workload certification evidence, not model identity data; they are joined with modeldb runtime views when consumers request approved-only selection.

conformance joins static provider descriptors with the latest compatibility artifact. For agentic_coding, approved rows are treated as a strict consumer contract: every required feature must have live evidence, cache accounting must be live, and continuation/transport evidence must be explicit in the artifact. Reasoning is optional workload evidence so consumers can filter for thinking models without excluding non-thinking coding models from the base agentic-coding list. This lets agentsdk-style consumers distinguish “configured route exists” from “route is certified for coding-agent workloads.”

Modeldb is metadata and pricing input. It must not secretly instantiate providers or own credentials.

Dependency Diagram

flowchart TB
  CLI[cmd/llmadapter] --> Config[adapterconfig]
  CLI --> Server[gatewayserver]
  Compat[cmd/llmadapter-gateway] --> Config
  Compat --> Server

  Server --> Gateway[gateway]
  Server --> Endpoints[endpoints/*]
  Gateway --> Router[router]
  Gateway --> Adapt[adapt]
  Gateway --> Unified[unified]

  Config --> Registry[providerregistry]
  Config --> Router
  Config --> Mux[muxclient]
  Config --> Pricing[pricing]
  Config --> Meta[modelmeta]
  Config --> Compatibility[compatibility]
  Config --> ModelDB[github.com/codewandler/modeldb]

  Mux --> Router
  Mux --> Adapt
  Mux --> Unified

  Registry --> Providers[providers/*]
  Providers --> Transport[transport]
  Providers --> Pipeline[pipeline]
  Providers --> Adapt
  Providers --> Unified

  Endpoints --> Adapt
  Endpoints --> Unified

  Router --> Adapt
  Router --> Unified
  Pipeline --> Adapt
  Pipeline --> Unified
  Pricing --> Unified
  Meta --> Router

Request Flow

HTTP Gateway Flow

HTTP request
  -> endpoint DecodeHTTP
  -> adapt.Request
  -> router candidates
  -> selected provider endpoint
  -> native model rewrite
  -> provider unified.Client
  -> unified.Event stream
  -> endpoint WriteEvents
  -> HTTP response

In-Process Mux Flow

unified.Request
  -> muxclient
  -> adapt.Request with optional source API
  -> router candidates
  -> selected provider endpoint
  -> native model rewrite
  -> provider unified.Client
  -> unified.RouteEvent + provider unified.Event stream

When the mux client source API is empty, routing is in auto-source mode: all configured source routes are eligible, and the router ranks higher-weight routes first, then source-native Anthropic Messages routes before OpenAI Responses and Chat routes. Compatibility gateways still pass an explicit source API derived from the inbound HTTP endpoint.
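
The auto-source ranking can be sketched as a stable sort; the route shape and family names here are illustrative, not the real router types:

```go
package main

import (
	"fmt"
	"sort"
)

// Illustrative route shape for auto-source ranking.
type route struct {
	Weight    int
	SourceAPI string // "anthropic_messages", "openai_responses", "openai_chat"
}

// familyRank orders source-native families in auto-source mode.
func familyRank(api string) int {
	switch api {
	case "anthropic_messages":
		return 0
	case "openai_responses":
		return 1
	default:
		return 2
	}
}

// rankAutoSource sorts higher-weight routes first, then by family
// preference at equal weight.
func rankAutoSource(routes []route) {
	sort.SliceStable(routes, func(i, j int) bool {
		if routes[i].Weight != routes[j].Weight {
			return routes[i].Weight > routes[j].Weight
		}
		return familyRank(routes[i].SourceAPI) < familyRank(routes[j].SourceAPI)
	})
}

func main() {
	rs := []route{
		{Weight: 1, SourceAPI: "openai_chat"},
		{Weight: 1, SourceAPI: "anthropic_messages"},
		{Weight: 5, SourceAPI: "openai_responses"},
	}
	rankAutoSource(rs)
	fmt.Println(rs[0].SourceAPI) // openai_responses
}
```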

Provider Flow

unified.Request
  -> provider request mapping
  -> provider wire HTTP request
  -> transport byte stream
  -> provider wire event decoder
  -> unified.Event stream

Current Strengths

Known Shortcomings

Static Provider Registry

providerregistry is intentionally static. Descriptors carry endpoint metadata and factories, so metadata and construction stay together without a central provider-type switch. This is stable for v1, but it is not a plugin system. External provider module loading remains post-v1 expansion.

Shared Wire Packages

Shared wire structs that cross endpoint/provider boundaries should live at the protocol owner or in neutral packages. Anthropic Messages follows this through anthropicwire because downstream and upstream packages both need the same Anthropic-shaped structs. OpenAI Responses follows the owner-package rule: providers/openai/responses owns the base Responses provider wire shape, and compatible providers wrap it with explicit provider overlays instead of copying it.

Gateway/Mux Fallback Boundary

Gateway and mux client both implement route candidate fallback. Shared route-attempt mechanics live in internal/routeattempt: candidate lookup, native model rewrite, and provider/API error formatting.

The shared policy classifies request-shape validation failures as non-retryable, including adapt.UnsupportedFieldError and 400/422 provider API errors. Gateway config can set max_attempts; mux library consumers can set muxclient.WithMaxAttempts. The HTTP-specific response-start rule stays in gateway: once response bytes are written, the gateway cannot transparently retry another upstream.
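
The non-retryable classification can be sketched with hypothetical error types standing in for adapt.UnsupportedFieldError and provider API errors:

```go
package main

import (
	"errors"
	"fmt"
)

// Stand-in for adapt.UnsupportedFieldError.
type UnsupportedFieldError struct{ Field string }

func (e *UnsupportedFieldError) Error() string { return "unsupported field: " + e.Field }

// Stand-in for a provider API error carrying an HTTP status.
type APIError struct{ Status int }

func (e *APIError) Error() string { return fmt.Sprintf("provider status %d", e.Status) }

// retryableAttempt: request-shape validation failures must not be retried
// on another route, because every route would reject the same request.
func retryableAttempt(err error) bool {
	var uf *UnsupportedFieldError
	if errors.As(err, &uf) {
		return false
	}
	var api *APIError
	if errors.As(err, &api) && (api.Status == 400 || api.Status == 422) {
		return false
	}
	return true
}

func main() {
	fmt.Println(retryableAttempt(&APIError{Status: 503}), retryableAttempt(&APIError{Status: 400}))
}
```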

Capability Provenance

Base capabilities are still partly endpoint-family/provider defaults. Modeldb narrows fixed-route capabilities and known dynamic model requests, and dynamic model IDs missing from the catalog are rejected instead of being rewritten to provider defaults. CLI/config inspection reports where effective capabilities came from: provider_descriptor, config_override, or modeldb_exposure.

Conformance Depth

Live tests provide strong smoke coverage, and deterministic offline fixtures cover the currently known compatibility classes: endpoint decode edge cases, reasoning variants, citation metadata variants, provider error shapes, raw/unmapped events, prompt-cache accounting where exposed, and unsupported-media/built-in-tool policy. Remaining conformance gaps are future-facing.

Raw Event Preservation

Raw/unmapped event preservation exists for provider usage payloads and selected unmapped provider stream events. More provider-specific events should be preserved before broadening to APIs with richer event streams.

Current Stable State

The current stable state is a stateless, stream-first adapter with shared adapterconfig construction for CLI, gateway, and mux client paths. Model resolution is centralized through modeldb catalog loading plus alias overlays when modeldb-backed routing is enabled. Provider support spans Anthropic Messages-compatible, OpenAI Chat-compatible, and OpenAI Responses-compatible endpoint families across Anthropic, Claude Code-compatible access, OpenAI, OpenRouter, MiniMax, and Codex endpoint variants.

Usage/cost accounting is canonical and structured, provider raw usage/error payloads are retained where exposed, prompt-cache controls are explicit request hints, and stateful conversation/session behavior remains outside llmadapter.

Stateful Conversation Policy

Stateful conversations intentionally live outside llmadapter. This is the right boundary, but llmadapter must continue exposing the stateless primitives that agentsdk needs: previous response IDs, prompt cache keys, response IDs, provider session hints, usage, and cost events.

Improvement Roadmap

1. Keep Provider Registry Static And Descriptor-Owned

Provider descriptors now carry static client factories, so client construction lives with descriptor metadata instead of in a growing provider-type switch. This remains static and deterministic for v1; a plugin-style registry is not required.
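
A descriptor-owned factory registry can be sketched like this; the real providerregistry descriptors carry more metadata (endpoint families, capabilities, continuation defaults):

```go
package main

import "fmt"

// Stand-in for unified.Client.
type Client interface{ Name() string }

// A descriptor keeps metadata and construction together, so no central
// provider-type switch is needed.
type Descriptor struct {
	Type   string
	Family string
	New    func(apiKey string) Client
}

type fakeClient struct{ name string }

func (c fakeClient) Name() string { return c.name }

// A static registry is just a fixed slice of descriptors.
var registry = []Descriptor{
	{Type: "openai_responses", Family: "responses", New: func(string) Client { return fakeClient{"openai"} }},
	{Type: "anthropic_messages", Family: "messages", New: func(string) Client { return fakeClient{"anthropic"} }},
}

// Build constructs a client for a configured provider type via its descriptor.
func Build(providerType, key string) (Client, error) {
	for _, d := range registry {
		if d.Type == providerType {
			return d.New(key), nil
		}
	}
	return nil, fmt.Errorf("unknown provider type %q", providerType)
}

func main() {
	c, _ := Build("openai_responses", "sk-test")
	fmt.Println(c.Name())
}
```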

2. Normalize Shared Wire Packages Where Needed

Only extract shared wire packages when there is real duplication or cross-boundary coupling. Anthropic Messages has been extracted to anthropicwire because both downstream /v1/messages and upstream Anthropic-compatible providers use the same wire shape.

3. Align Gateway And Mux Fallback Policy

Keep the HTTP-specific response-start behavior in gateway, but factor shared route attempt/error metadata where useful so mux and gateway report failures consistently.

Initial shared mechanics are implemented in internal/routeattempt: candidate lookup, native model rewrite, error formatting, retryability classification, and max-attempt checks. Remaining policy expansion, if needed, is post-v1 work such as backoff or richer production failure classification.

4. Broaden Codec Conformance

Focused offline fixture tests now cover the known v1 classes. Future work should add fixtures as providers expose new event shapes, error bodies, citation annotations, cache accounting details, or supported media/tool types.

5. Validate Provider Extensions

Keep extension data namespaced, but add typed helper structs and validation for mature extension groups such as OpenAI Responses cache retention, OpenRouter routing/provider/plugin/session controls, Anthropic beta header values, and Codex turn metadata.

Typed extension readers now validate mature extension groups and return invalid_extension_dropped warnings for invalid values. Focused semantic checks cover OpenAI Responses cache retention, OpenRouter routing/provider/plugin/session controls, Anthropic beta header values, and Codex turn metadata. Provider encoders preserve valid extensions and drop invalid controls instead of silently sending malformed provider-specific fields.
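
The drop-with-warning behavior can be sketched for a single integer-valued extension; the shapes are hypothetical, and the real readers are typed per extension group with richer semantic checks:

```go
package main

import "fmt"

// Hypothetical warning shape for illustration.
type Warning struct {
	Code  string
	Field string
}

// readPositiveInt validates one namespaced extension value: invalid entries
// are dropped with an invalid_extension_dropped warning instead of being
// forwarded as malformed provider-specific fields.
func readPositiveInt(ext map[string]any, key string, warnings *[]Warning) (int, bool) {
	raw, ok := ext[key]
	if !ok {
		return 0, false // absent extension: nothing to validate
	}
	n, ok := raw.(int)
	if !ok || n <= 0 {
		*warnings = append(*warnings, Warning{Code: "invalid_extension_dropped", Field: key})
		return 0, false
	}
	return n, true
}

func main() {
	ext := map[string]any{"openai.cache_retention_seconds": -5}
	var warns []Warning
	_, ok := readPositiveInt(ext, "openai.cache_retention_seconds", &warns)
	fmt.Println(ok, warns)
}
```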

6. Keep Conversation State Out Of Core

Do not move replay history, durable session state, cache policy, or memory projection into llmadapter. Instead, keep improving the stateless primitives consumed by agentsdk and similar clients.

Design Rules Going Forward