claude-code-proxy/proxy/internal/handler/openapi.go
sid 8e550b9785 Local fork: hardening + ops improvements (timeout knob, demotion, /livez, drain)
This commit captures both the prior accumulated work-in-progress
(framework migration web/→svelte/, postgres storage, conversation
viewer, dashboard auth, OpenAPI spec, integration tests) AND today's
operational improvements layered on top. History wasn't checkpointed
incrementally; happy to split it via interactive rebase if a reviewer
wants smaller commits.

Today's changes (in addition to the older WIP):

1. Configurable upstream response-header timeout
   - ANTHROPIC_RESPONSE_HEADER_TIMEOUT env (default 300s)
   - Replaces hardcoded 300s in provider/anthropic.go that was firing
     on opus + 1M-context + extended thinking non-streaming requests
   - Files: internal/config/config.go, internal/provider/anthropic.go
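   A minimal sketch of the timeout wiring (helper and variable names here are
   illustrative, not the proxy's actual code; only the env var name comes from
   this commit):

   ```go
   package main

   import (
   	"fmt"
   	"net/http"
   	"os"
   	"time"
   )

   // responseHeaderTimeout reads ANTHROPIC_RESPONSE_HEADER_TIMEOUT (seconds),
   // falling back to the previous hardcoded 300s when unset or invalid.
   func responseHeaderTimeout() time.Duration {
   	const def = 300 * time.Second
   	raw := os.Getenv("ANTHROPIC_RESPONSE_HEADER_TIMEOUT")
   	if raw == "" {
   		return def
   	}
   	d, err := time.ParseDuration(raw + "s")
   	if err != nil || d <= 0 {
   		return def
   	}
   	return d
   }

   // newTransport applies the knob where it matters: ResponseHeaderTimeout
   // bounds only the wait for upstream response *headers*, not body transfer,
   // which is why long non-streaming generations tripped it.
   func newTransport() *http.Transport {
   	return &http.Transport{
   		ResponseHeaderTimeout: responseHeaderTimeout(),
   	}
   }

   func main() {
   	fmt.Println(newTransport().ResponseHeaderTimeout)
   }
   ```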

2. Structured forward-error diagnostic logging
   - When a forward to Anthropic fails, log a single key=value line
     with request_id, model, stream, body_bytes, has_thinking,
     anthropic_beta, query, elapsed, ctx_err — alongside the existing
     human-readable error line for back-compat
   - Files: internal/handler/handlers.go (logForwardFailure)
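   The shape of the diagnostic line, sketched with an illustrative struct
   covering a subset of the fields named above (the real logForwardFailure
   signature is not shown in this commit):

   ```go
   package main

   import (
   	"fmt"
   	"strings"
   	"time"
   )

   // forwardFailure carries a subset of the fields from the commit message;
   // the struct itself is hypothetical.
   type forwardFailure struct {
   	RequestID   string
   	Model       string
   	Stream      bool
   	BodyBytes   int
   	HasThinking bool
   	Elapsed     time.Duration
   	CtxErr      string
   }

   // logLine renders one grep-able key=value line, emitted alongside the
   // existing human-readable error line rather than replacing it.
   func logLine(f forwardFailure) string {
   	parts := []string{
   		"event=forward_failure",
   		fmt.Sprintf("request_id=%s", f.RequestID),
   		fmt.Sprintf("model=%s", f.Model),
   		fmt.Sprintf("stream=%t", f.Stream),
   		fmt.Sprintf("body_bytes=%d", f.BodyBytes),
   		fmt.Sprintf("has_thinking=%t", f.HasThinking),
   		fmt.Sprintf("elapsed=%s", f.Elapsed),
   		fmt.Sprintf("ctx_err=%q", f.CtxErr),
   	}
   	return strings.Join(parts, " ")
   }

   func main() {
   	fmt.Println(logLine(forwardFailure{
   		RequestID: "req_123",
   		Model:     "claude-opus",
   		BodyBytes: 2048,
   		Elapsed:   300 * time.Second,
   		CtxErr:    "context deadline exceeded",
   	}))
   }
   ```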

3. Full SSE protocol passthrough + Flusher fix
   - handler/handlers.go: forward all SSE lines verbatim (event:, id:,
     retry:, : comments, blank-line terminators), not only data:.
     Previous code produced malformed SSE for strict parsers.
   - middleware/logging.go: explicit Flush() method on responseWriter.
     Embedding http.ResponseWriter (interface) does not auto-promote
     Flush(), so every w.(http.Flusher) check in the streaming
     handler was returning ok=false and SSE writes buffered in net/http
     until the body closed.
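   The Flusher fix in miniature (wrapper trimmed to the relevant part): a
   struct embedding the http.ResponseWriter *interface* gets only Header,
   Write, and WriteHeader promoted, so without an explicit Flush method the
   wrapper never satisfies http.Flusher.

   ```go
   package main

   import (
   	"fmt"
   	"net/http"
   	"net/http/httptest"
   )

   // responseWriter mirrors the logging middleware's wrapper. Embedding the
   // interface does not promote Flush from the concrete underlying writer.
   type responseWriter struct {
   	http.ResponseWriter
   	status int
   }

   // Flush forwards to the underlying writer when it supports flushing,
   // restoring the w.(http.Flusher) checks in the streaming handler.
   func (w *responseWriter) Flush() {
   	if f, ok := w.ResponseWriter.(http.Flusher); ok {
   		f.Flush()
   	}
   }

   func main() {
   	rec := httptest.NewRecorder() // implements http.Flusher
   	var w http.ResponseWriter = &responseWriter{ResponseWriter: rec}
   	_, ok := w.(http.Flusher)
   	fmt.Println("flusher:", ok) // true only because Flush is explicit
   }
   ```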

4. Non-streaming → streaming demotion (feature-flagged)
   - ANTHROPIC_DEMOTE_NONSTREAMING env (default false)
   - When enabled and the routed provider is anthropic, force stream=true
     upstream for clients that asked for stream=false. Receive SSE,
     accumulate via accumulateSSEToMessage (handles text, tool_use with
     partial_json reassembly, thinking, signature, citations_delta,
     usage merge), and synthesize a single non-streaming JSON response.
   - Eliminates the ResponseHeaderTimeout class of failure entirely.
   - Body rewrite uses json.Decoder + UseNumber() to preserve integer
     precision in unknown nested fields (tool inputs from prior turns).
   - Files: internal/config/config.go, internal/handler/handlers.go,
     cmd/proxy/main.go, cmd/proxy/main_test.go
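   The body-rewrite half of the demotion, sketched under the assumption of a
   free-standing helper (the accumulateSSEToMessage side is too large to
   excerpt): UseNumber keeps unknown nested integers as json.Number so
   Marshal re-emits them verbatim instead of rounding through float64.

   ```go
   package main

   import (
   	"bytes"
   	"encoding/json"
   	"fmt"
   )

   // forceStreaming is a hypothetical helper: it flips stream to true while
   // preserving integer precision everywhere else in the body.
   func forceStreaming(body []byte) ([]byte, error) {
   	dec := json.NewDecoder(bytes.NewReader(body))
   	dec.UseNumber() // decode numbers as json.Number, not float64
   	var req map[string]interface{}
   	if err := dec.Decode(&req); err != nil {
   		return nil, err
   	}
   	req["stream"] = true
   	return json.Marshal(req)
   }

   func main() {
   	// 9007199254740993 = 2^53+1: a float64 round trip would corrupt it.
   	in := []byte(`{"model":"claude-opus","stream":false,"metadata":{"big":9007199254740993}}`)
   	out, err := forceStreaming(in)
   	if err != nil {
   		panic(err)
   	}
   	fmt.Println(string(out))
   }
   ```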

5. Live operational state: /livez gauge + graceful drain
   - New internal/runtime package: atomic in-flight counter + draining flag
   - New middleware/inflight.go: increments runtime gauge, applied to
     /v1/* subrouter so Messages, ChatCompletions, and ProxyPassthrough
     are all counted
   - /v1/* moved to a gorilla/mux subrouter so the InFlight middleware
     applies surgically; /health, /livez, /openapi.* remain on parent
     router (unauthenticated, uncounted)
   - Health handler returns 503 draining when runtime.IsDraining() is
     true, so Traefik stops routing to a slot before drain begins
   - New /livez handler returns {status, in_flight, draining, timestamp}
   - SIGTERM handler in main.go: SetDraining(true), poll for in_flight==0
     with 32-min ceiling and 1s tick (logs every 10s), then srv.Shutdown
   - Auth bypass list extended with /livez
   - Files: internal/runtime/runtime.go (new),
     internal/middleware/inflight.go (new),
     internal/middleware/auth.go,
     internal/handler/handlers.go (Health, Livez, runtime import),
     cmd/proxy/main.go (subrouter, drain loop)
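   The runtime state and drain loop, condensed into a sketch (function names
   approximate the new internal/runtime package; the ceiling/tick values and
   10s progress logging from the real SIGTERM handler are omitted here):

   ```go
   package main

   import (
   	"fmt"
   	"sync/atomic"
   	"time"
   )

   // Process-wide state: an atomic in-flight gauge plus a draining flag.
   var (
   	inFlight int64
   	draining atomic.Bool
   )

   func IncInFlight() int64     { return atomic.AddInt64(&inFlight, 1) }
   func DecInFlight() int64     { return atomic.AddInt64(&inFlight, -1) }
   func InFlight() int64        { return atomic.LoadInt64(&inFlight) }
   func SetDraining(v bool)     { draining.Store(v) }
   func IsDraining() bool       { return draining.Load() }

   // drain flips the flag (so /health starts returning 503 and the LB stops
   // routing here), then polls until in-flight hits zero or the ceiling
   // elapses, after which srv.Shutdown would run.
   func drain(ceiling, tick time.Duration) {
   	SetDraining(true)
   	deadline := time.Now().Add(ceiling)
   	for InFlight() > 0 && time.Now().Before(deadline) {
   		time.Sleep(tick)
   	}
   }

   func main() {
   	IncInFlight()
   	go func() { // simulate an in-flight request finishing
   		time.Sleep(10 * time.Millisecond)
   		DecInFlight()
   	}()
   	drain(time.Second, time.Millisecond)
   	fmt.Println("in_flight:", InFlight(), "draining:", IsDraining())
   }
   ```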

6. OpenAPI spec updates
   - Document Health 503 response and new DrainingResponse schema
   - Add /livez path with LivezResponse schema
   - Files: internal/handler/openapi.go

Verified: go build ./... clean, go test ./... all pass, go vet clean.
Three rounds of codex peer review across changes 1-5; all feedback
addressed (citations_delta, json.Number precision, drain-loop logging
via lastLog timestamp, PathPrefix tightened to "/v1/").
2026-05-02 15:15:58 -06:00

package handler

import (
	"encoding/json"
	"net/http"

	"gopkg.in/yaml.v3"
)
// openAPISpec is the embedded OpenAPI 3.0 specification for the proxy API.
var openAPISpec = `
openapi: "3.0.3"
info:
  title: Claude Code Proxy API
  description: |
    An Anthropic API proxy that provides request logging, model routing, usage
    analytics, and a dashboard UI. The proxy exposes two groups of endpoints:

    **Proxy endpoints**: drop-in replacements for the upstream Anthropic API.
    Point Claude Code (or any Anthropic SDK client) at this proxy and all
    requests are forwarded, logged, and optionally re-routed to a different
    model or provider.

    **Dashboard endpoints**: read-only analytics and configuration APIs that
    power the built-in web dashboard. These are protected by HTTP Basic Auth
    when DASHBOARD_PASSWORD is set.
  version: "1.0.0"
  contact:
    name: Claude Code Proxy
  license:
    name: MIT
servers:
  - url: /
    description: This proxy instance
tags:
  - name: proxy
    description: |
      Drop-in Anthropic API proxy endpoints. Authenticate with the same
      x-api-key / Authorization header you use for the upstream Anthropic API.
  - name: dashboard
    description: |
      Analytics and configuration endpoints for the web dashboard.
      Protected by DASHBOARD_PASSWORD basic auth when configured.
  - name: health
    description: Health and discovery endpoints (no auth required).
paths:
  # ── Proxy endpoints ────────────────────────────────────────────────────
  /v1/messages:
    post:
      operationId: createMessage
      tags: [proxy]
      summary: Create a message (Anthropic Messages API)
      description: |
        Forwards the request to the upstream Anthropic (or routed) provider.
        Supports both streaming (SSE) and non-streaming responses. The proxy
        logs the request/response, applies any configured model routing rules
        and header rules, then returns the upstream response verbatim.
      requestBody:
        required: true
        content:
          application/json:
            schema:
              $ref: "#/components/schemas/AnthropicRequest"
      responses:
        "200":
          description: Successful message response (non-streaming)
          content:
            application/json:
              schema:
                $ref: "#/components/schemas/AnthropicResponse"
            text/event-stream:
              schema:
                type: string
                description: SSE stream of Anthropic streaming events
        "400":
          description: Invalid request
          content:
            application/json:
              schema:
                $ref: "#/components/schemas/ErrorResponse"
        "500":
          description: Upstream or internal error
          content:
            application/json:
              schema:
                $ref: "#/components/schemas/ErrorResponse"
  /v1/chat/completions:
    post:
      operationId: chatCompletions
      tags: [proxy]
      summary: Chat completions (OpenAI-compatible endpoint, not supported)
      description: |
        Returns a 400 error directing callers to use /v1/messages instead.
        This endpoint exists for compatibility detection only.
      requestBody:
        content:
          application/json:
            schema:
              type: object
      responses:
        "400":
          description: Not supported; use /v1/messages
          content:
            application/json:
              schema:
                $ref: "#/components/schemas/ErrorResponse"
  /v1/models:
    get:
      operationId: listModels
      tags: [proxy]
      summary: List available models
      description: |
        Returns the list of models known to the proxy. The proxy uses
        pattern-based routing so any model accepted by the upstream provider
        will work; this endpoint currently returns an empty list.
      responses:
        "200":
          description: Model list
          content:
            application/json:
              schema:
                $ref: "#/components/schemas/ModelsResponse"
  # ── Health & discovery ─────────────────────────────────────────────────
  /health:
    get:
      operationId: healthCheck
      tags: [health]
      summary: Health check (binary up/ready signal for load balancers)
      description: |
        Returns 200 with status=healthy while the process is accepting
        traffic, and 503 with status=draining once a SIGTERM has been
        received. Traefik (or any LB doing health-based routing) should
        treat 503 as "stop sending new requests to this backend", which is
        the signal the graceful-drain loop relies on.
      responses:
        "200":
          description: Service is healthy
          content:
            application/json:
              schema:
                $ref: "#/components/schemas/HealthResponse"
        "503":
          description: Service is draining (SIGTERM received). Stop routing here.
          content:
            application/json:
              schema:
                $ref: "#/components/schemas/DrainingResponse"
  /livez:
    get:
      operationId: livenessProbe
      tags: [health]
      summary: Live operational state (in-flight gauge + draining flag)
      description: |
        Always returns 200 with the current in-flight request count and
        draining flag. Distinct from /health, which is a binary up/ready
        signal — /livez is for observability and deploy-time orchestration
        ("how many requests are still active before I cycle this slot?").
      responses:
        "200":
          description: Operational state
          content:
            application/json:
              schema:
                $ref: "#/components/schemas/LivezResponse"
  /openapi.json:
    get:
      operationId: getOpenAPISpec
      tags: [health]
      summary: OpenAPI specification (JSON)
      responses:
        "200":
          description: The OpenAPI 3.0 spec for this API
          content:
            application/json:
              schema:
                type: object
  /openapi.yaml:
    get:
      operationId: getOpenAPISpecYAML
      tags: [health]
      summary: OpenAPI specification (YAML)
      responses:
        "200":
          description: The OpenAPI 3.0 spec for this API
          content:
            application/x-yaml:
              schema:
                type: string
  # ── Dashboard endpoints ────────────────────────────────────────────────
  /api/requests:
    get:
      operationId: getRequests
      tags: [dashboard]
      summary: List logged requests
      parameters:
        - name: page
          in: query
          schema: { type: integer, default: 1 }
        - name: limit
          in: query
          schema: { type: integer, default: 10 }
        - name: model
          in: query
          schema: { type: string, default: "all" }
          description: Filter by model name (substring match) or "all"
      responses:
        "200":
          description: Paginated request list
          content:
            application/json:
              schema:
                type: object
                properties:
                  requests:
                    type: array
                    items:
                      $ref: "#/components/schemas/RequestLog"
                  total:
                    type: integer
    delete:
      operationId: deleteRequests
      tags: [dashboard]
      summary: Clear all logged requests
      responses:
        "200":
          description: Requests cleared
          content:
            application/json:
              schema:
                type: object
                properties:
                  message: { type: string }
                  deleted: { type: integer }
  /api/requests/summary:
    get:
      operationId: getRequestsSummary
      tags: [dashboard]
      summary: Lightweight request summaries for fast list rendering
      parameters:
        - name: model
          in: query
          schema: { type: string, default: "all" }
        - name: start
          in: query
          schema: { type: string, format: date-time }
          description: Start of time range (UTC ISO 8601)
        - name: end
          in: query
          schema: { type: string, format: date-time }
          description: End of time range (UTC ISO 8601)
        - name: offset
          in: query
          schema: { type: integer, default: 0 }
        - name: limit
          in: query
          schema: { type: integer, default: 0 }
          description: Max results (0 = unlimited)
      responses:
        "200":
          description: Paginated request summaries
          content:
            application/json:
              schema:
                type: object
                properties:
                  requests:
                    type: array
                    items:
                      $ref: "#/components/schemas/RequestSummary"
                  total: { type: integer }
                  offset: { type: integer }
                  limit: { type: integer }
  /api/requests/latest-date:
    get:
      operationId: getLatestRequestDate
      tags: [dashboard]
      summary: Date of the most recent logged request
      responses:
        "200":
          description: Latest request date
          content:
            application/json:
              schema:
                type: object
                properties:
                  latestDate: { type: string, format: date-time }
  /api/requests/{id}:
    get:
      operationId: getRequestByID
      tags: [dashboard]
      summary: Get a single request by ID
      parameters:
        - name: id
          in: path
          required: true
          schema: { type: string }
          description: Short or full request ID
      responses:
        "200":
          description: Request detail
          content:
            application/json:
              schema:
                type: object
                properties:
                  request:
                    $ref: "#/components/schemas/RequestLog"
                  fullId: { type: string }
        "404":
          description: Request not found
  /api/stats:
    get:
      operationId: getStats
      tags: [dashboard]
      summary: Aggregated usage statistics
      parameters:
        - name: start_date
          in: query
          schema: { type: string }
        - name: end_date
          in: query
          schema: { type: string }
        - name: model
          in: query
          schema: { type: string }
        - name: org
          in: query
          schema: { type: string }
      responses:
        "200":
          description: Aggregated usage statistics
          content:
            application/json:
              schema:
                $ref: "#/components/schemas/UsageStats"
  /api/stats/dashboard:
    get:
      operationId: getDashboardStats
      tags: [dashboard]
      summary: Daily token usage for dashboard charts
      parameters:
        - name: start
          in: query
          schema: { type: string, format: date-time }
        - name: end
          in: query
          schema: { type: string, format: date-time }
        - name: org
          in: query
          schema: { type: string }
      responses:
        "200":
          description: Daily token usage
          content:
            application/json:
              schema:
                $ref: "#/components/schemas/DashboardStats"
  /api/stats/hourly:
    get:
      operationId: getHourlyStats
      tags: [dashboard]
      summary: Hourly token usage breakdown
      parameters:
        - name: start
          in: query
          required: true
          schema: { type: string, format: date-time }
        - name: end
          in: query
          required: true
          schema: { type: string, format: date-time }
        - name: bucket
          in: query
          schema: { type: integer, default: 60 }
          description: Bucket size in minutes
        - name: org
          in: query
          schema: { type: string }
      responses:
        "200":
          description: Hourly token usage
          content:
            application/json:
              schema:
                $ref: "#/components/schemas/HourlyStatsResponse"
  /api/stats/models:
    get:
      operationId: getModelStats
      tags: [dashboard]
      summary: Per-model token usage breakdown
      parameters:
        - name: start
          in: query
          required: true
          schema: { type: string, format: date-time }
        - name: end
          in: query
          required: true
          schema: { type: string, format: date-time }
        - name: org
          in: query
          schema: { type: string }
      responses:
        "200":
          description: Per-model token usage
          content:
            application/json:
              schema:
                $ref: "#/components/schemas/ModelStatsResponse"
  /api/stats/organizations:
    get:
      operationId: getOrganizations
      tags: [dashboard]
      summary: List distinct organization IDs
      responses:
        "200":
          description: Organization ID list
          content:
            application/json:
              schema:
                type: object
                properties:
                  organizations:
                    type: array
                    items: { type: string }
  /api/conversations:
    get:
      operationId: getConversations
      tags: [dashboard]
      summary: List conversations (grouped by session)
      parameters:
        - name: model
          in: query
          schema: { type: string, default: "all" }
        - name: page
          in: query
          schema: { type: integer, default: 1 }
        - name: limit
          in: query
          schema: { type: integer, default: 10 }
      responses:
        "200":
          description: Paginated conversation list
          content:
            application/json:
              schema:
                type: object
                properties:
                  conversations:
                    type: array
                    items:
                      type: object
                      properties:
                        id: { type: string }
                        requestCount: { type: integer }
                        startTime: { type: string, format: date-time }
                        lastActivity: { type: string, format: date-time }
                        duration: { type: integer, description: "Duration in ms" }
                        firstMessage: { type: string }
                        projectPath: { type: string }
                        projectName: { type: string }
                        model: { type: string }
                  hasMore: { type: boolean }
                  total: { type: integer }
                  page: { type: integer }
                  limit: { type: integer }
  /api/conversations/{id}:
    get:
      operationId: getConversationByID
      tags: [dashboard]
      summary: Get a single conversation by session ID
      parameters:
        - name: id
          in: path
          required: true
          schema: { type: string }
        - name: project
          in: query
          required: true
          schema: { type: string }
          description: Project path the conversation belongs to
      responses:
        "200":
          description: Conversation detail
          content:
            application/json:
              schema:
                type: object
        "404":
          description: Conversation not found
  /api/conversations/project:
    get:
      operationId: getConversationsByProject
      tags: [dashboard]
      summary: List conversations for a specific project
      parameters:
        - name: project
          in: query
          required: true
          schema: { type: string }
      responses:
        "200":
          description: Conversation list for the project
          content:
            application/json:
              schema:
                type: object
  /api/settings:
    get:
      operationId: getSettings
      tags: [dashboard]
      summary: Get current proxy settings
      responses:
        "200":
          description: Current proxy settings
          content:
            application/json:
              schema:
                $ref: "#/components/schemas/ProxySettings"
    put:
      operationId: saveSettings
      tags: [dashboard]
      summary: Update proxy settings
      requestBody:
        required: true
        content:
          application/json:
            schema:
              $ref: "#/components/schemas/ProxySettings"
      responses:
        "200":
          description: Updated proxy settings
          content:
            application/json:
              schema:
                $ref: "#/components/schemas/ProxySettings"
components:
  securitySchemes:
    apiKey:
      type: apiKey
      in: header
      name: x-api-key
      description: Anthropic API key (forwarded to upstream)
    bearerAuth:
      type: http
      scheme: bearer
      description: Bearer token authentication
    dashboardBasicAuth:
      type: http
      scheme: basic
      description: Dashboard password (username is ignored)
  schemas:
    ErrorResponse:
      type: object
      properties:
        error: { type: string }
        details: { type: string }
    HealthResponse:
      type: object
      properties:
        status: { type: string, example: "healthy" }
        timestamp: { type: string, format: date-time }
    DrainingResponse:
      type: object
      properties:
        status: { type: string, example: "draining" }
        timestamp: { type: string, format: date-time }
        in_flight: { type: integer, example: 3 }
    LivezResponse:
      type: object
      properties:
        status: { type: string, example: "ok" }
        timestamp: { type: string, format: date-time }
        in_flight: { type: integer, example: 0 }
        draining: { type: boolean, example: false }
    AnthropicRequest:
      type: object
      required: [model, messages, max_tokens]
      properties:
        model:
          type: string
          description: |
            Model ID to use. The proxy may re-route this to a different
            model/provider based on configured routing rules.
          example: "claude-sonnet-4-5-20250514"
        messages:
          type: array
          items:
            $ref: "#/components/schemas/AnthropicMessage"
        max_tokens:
          type: integer
          example: 1024
        temperature:
          type: number
          format: float
        system:
          type: array
          items:
            $ref: "#/components/schemas/SystemMessage"
        stream:
          type: boolean
          default: false
        tools:
          type: array
          items:
            $ref: "#/components/schemas/Tool"
        tool_choice:
          description: Tool choice configuration
    AnthropicMessage:
      type: object
      required: [role, content]
      properties:
        role:
          type: string
          enum: [user, assistant]
        content:
          description: String or array of content blocks
          oneOf:
            - type: string
            - type: array
              items:
                type: object
                properties:
                  type: { type: string }
                  text: { type: string }
    SystemMessage:
      type: object
      properties:
        type: { type: string, example: "text" }
        text: { type: string }
        cache_control:
          type: object
          properties:
            type: { type: string, example: "ephemeral" }
    Tool:
      type: object
      properties:
        name: { type: string }
        description: { type: string }
        input_schema:
          type: object
          properties:
            type: {}
            properties: { type: object }
            required:
              type: array
              items: { type: string }
    AnthropicResponse:
      type: object
      properties:
        id: { type: string }
        type: { type: string, example: "message" }
        role: { type: string, example: "assistant" }
        model: { type: string }
        stop_reason: { type: string }
        stop_sequence: { type: string, nullable: true }
        content:
          type: array
          items:
            type: object
            properties:
              type: { type: string }
              text: { type: string }
        usage:
          $ref: "#/components/schemas/AnthropicUsage"
    AnthropicUsage:
      type: object
      properties:
        input_tokens: { type: integer }
        output_tokens: { type: integer }
        cache_creation_input_tokens: { type: integer }
        cache_read_input_tokens: { type: integer }
        service_tier: { type: string }
    ModelsResponse:
      type: object
      properties:
        object: { type: string, example: "list" }
        data:
          type: array
          items:
            type: object
            properties:
              id: { type: string }
              object: { type: string }
              created: { type: integer }
              owned_by: { type: string }
    RequestLog:
      type: object
      properties:
        requestId: { type: string }
        timestamp: { type: string, format: date-time }
        method: { type: string }
        endpoint: { type: string }
        model: { type: string }
        originalModel: { type: string }
        routedModel: { type: string }
        userAgent: { type: string }
        contentType: { type: string }
        conversationHash: { type: string }
        messageCount: { type: integer }
        organizationId: { type: string }
        response:
          $ref: "#/components/schemas/ResponseLog"
    ResponseLog:
      type: object
      properties:
        statusCode: { type: integer }
        responseTime: { type: integer, description: "Response time in ms" }
        isStreaming: { type: boolean }
        completedAt: { type: string, format: date-time }
        streamError: { type: string }
        rateLimit:
          $ref: "#/components/schemas/RateLimitInfo"
    RateLimitInfo:
      type: object
      properties:
        organizationId: { type: string }
        requestsLimit: { type: integer }
        requestsRemaining: { type: integer }
        requestsReset: { type: string }
        tokensLimit: { type: integer }
        tokensRemaining: { type: integer }
        tokensReset: { type: string }
        unifiedStatus: { type: string }
        unifiedUtilization5h: { type: number }
        unifiedReset5h: { type: string }
        unifiedUtilization7d: { type: number }
        unifiedReset7d: { type: string }
    RequestSummary:
      type: object
      properties:
        requestId: { type: string }
        timestamp: { type: string, format: date-time }
        method: { type: string }
        endpoint: { type: string }
        model: { type: string }
        originalModel: { type: string }
        routedModel: { type: string }
        statusCode: { type: integer }
        responseTime: { type: integer }
        usage:
          $ref: "#/components/schemas/AnthropicUsage"
        conversationHash: { type: string }
        messageCount: { type: integer }
        stopReason: { type: string }
    UsageStats:
      type: object
      properties:
        total_requests: { type: integer }
        total_input_tokens: { type: integer, format: int64 }
        total_output_tokens: { type: integer, format: int64 }
        total_cache_tokens: { type: integer, format: int64 }
        requests_by_model:
          type: object
          additionalProperties:
            type: object
            properties:
              request_count: { type: integer }
              input_tokens: { type: integer, format: int64 }
              output_tokens: { type: integer, format: int64 }
              cache_tokens: { type: integer, format: int64 }
        start_date: { type: string }
        end_date: { type: string }
    DashboardStats:
      type: object
      properties:
        dailyStats:
          type: array
          items:
            type: object
            properties:
              date: { type: string }
              tokens: { type: integer, format: int64 }
              requests: { type: integer }
    HourlyStatsResponse:
      type: object
      properties:
        hourlyStats:
          type: array
          items:
            type: object
            properties:
              hour: { type: integer }
              label: { type: string }
              tokens: { type: integer, format: int64 }
              requests: { type: integer }
        todayTokens: { type: integer, format: int64 }
        todayRequests: { type: integer }
        avgResponseTime: { type: integer, format: int64 }
    ModelStatsResponse:
      type: object
      properties:
        modelStats:
          type: array
          items:
            type: object
            properties:
              model: { type: string }
              tokens: { type: integer, format: int64 }
              requests: { type: integer }
    ProxySettings:
      type: object
      properties:
        requestHeaderRules:
          type: array
          items:
            $ref: "#/components/schemas/HeaderRule"
        responseHeaderRules:
          type: array
          items:
            $ref: "#/components/schemas/HeaderRule"
    HeaderRule:
      type: object
      properties:
        header: { type: string, description: "Header name (case-insensitive)" }
        action:
          type: string
          enum: [block, set, replace]
        value: { type: string }
        find: { type: string, description: "For replace action: string to find" }
        enabled: { type: boolean }
security:
  - apiKey: []
  - bearerAuth: []
`
// OpenAPIJSON serves the OpenAPI spec as JSON.
func (h *Handler) OpenAPIJSON(w http.ResponseWriter, r *http.Request) {
	var spec interface{}
	if err := yaml.Unmarshal([]byte(openAPISpec), &spec); err != nil {
		writeErrorResponse(w, "Failed to parse OpenAPI spec", http.StatusInternalServerError)
		return
	}
	spec = convertYAMLToJSON(spec)
	w.Header().Set("Content-Type", "application/json")
	w.Header().Set("Access-Control-Allow-Origin", "*")
	json.NewEncoder(w).Encode(spec)
}

// OpenAPIYAML serves the OpenAPI spec as YAML.
func (h *Handler) OpenAPIYAML(w http.ResponseWriter, r *http.Request) {
	w.Header().Set("Content-Type", "application/x-yaml")
	w.Header().Set("Access-Control-Allow-Origin", "*")
	w.Write([]byte(openAPISpec))
}

// convertYAMLToJSON recursively walks the decoded spec and returns
// JSON-compatible types. yaml.v3 already decodes mappings as
// map[string]interface{} (unlike yaml.v2's map[interface{}]interface{}),
// so this is mostly a defensive deep copy over nested maps and slices.
func convertYAMLToJSON(v interface{}) interface{} {
	switch val := v.(type) {
	case map[string]interface{}:
		out := make(map[string]interface{}, len(val))
		for k, v2 := range val {
			out[k] = convertYAMLToJSON(v2)
		}
		return out
	case []interface{}:
		out := make([]interface{}, len(val))
		for i, v2 := range val {
			out[i] = convertYAMLToJSON(v2)
		}
		return out
	default:
		return v
	}
}