claude-code-proxy/proxy/internal/model/models.go
sid 8e550b9785 Local fork: hardening + ops improvements (timeout knob, demotion, /livez, drain)
This commit captures both the prior accumulated work-in-progress
(framework migration web/→svelte/, postgres storage, conversation
viewer, dashboard auth, OpenAPI spec, integration tests) AND today's
operational improvements layered on top. History wasn't checkpointed
incrementally; happy to split it via interactive rebase if a reviewer
wants smaller commits.

Today's changes (in addition to the older WIP):

1. Configurable upstream response-header timeout
   - ANTHROPIC_RESPONSE_HEADER_TIMEOUT env (default 300s)
   - Replaces the hardcoded 300s in provider/anthropic.go, which was
     firing on opus + 1M-context + extended-thinking non-streaming requests
   - Files: internal/config/config.go, internal/provider/anthropic.go
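
   A minimal sketch of the knob (helper and transport wiring are
   illustrative, and the env var is assumed to be whole seconds):

     package config

     import (
         "net/http"
         "os"
         "strconv"
         "time"
     )

     // ResponseHeaderTimeout honors ANTHROPIC_RESPONSE_HEADER_TIMEOUT
     // and falls back to the previous hardcoded 300s.
     func ResponseHeaderTimeout() time.Duration {
         if v := os.Getenv("ANTHROPIC_RESPONSE_HEADER_TIMEOUT"); v != "" {
             if secs, err := strconv.Atoi(v); err == nil && secs > 0 {
                 return time.Duration(secs) * time.Second
             }
         }
         return 300 * time.Second
     }

     // newAnthropicTransport shows where the value lands.
     func newAnthropicTransport() *http.Transport {
         return &http.Transport{ResponseHeaderTimeout: ResponseHeaderTimeout()}
     }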

2. Structured forward-error diagnostic logging
   - When a forward to Anthropic fails, log a single key=value line
     with request_id, model, stream, body_bytes, has_thinking,
     anthropic_beta, query, elapsed, ctx_err — alongside the existing
     human-readable error line for back-compat
   - Files: internal/handler/handlers.go (logForwardFailure)
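
   Roughly the shape of the new line (the actual signature in
   handlers.go may differ):

     package handler

     import (
         "context"
         "log"
         "time"
     )

     // logForwardFailure sketch: one grep-friendly key=value line,
     // emitted alongside the existing human-readable error.
     func logForwardFailure(ctx context.Context, reqID, model string,
         stream bool, bodyBytes int, hasThinking bool, beta, query string,
         start time.Time, err error) {
         log.Printf("forward_failure request_id=%s model=%s stream=%t"+
             " body_bytes=%d has_thinking=%t anthropic_beta=%q query=%q"+
             " elapsed=%s ctx_err=%v err=%v",
             reqID, model, stream, bodyBytes, hasThinking, beta, query,
             time.Since(start), ctx.Err(), err)
     }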

3. Full SSE protocol passthrough + Flusher fix
   - handler/handlers.go: forward all SSE lines verbatim (event:, id:,
     retry:, : comments, blank-line terminators), not only data:.
     Previous code produced malformed SSE for strict parsers.
   - middleware/logging.go: explicit Flush() method on responseWriter.
     Embedding the http.ResponseWriter interface promotes only that
     interface's methods, not the concrete writer's Flush(), so every
     w.(http.Flusher) check in the streaming handler was returning
     ok=false and SSE writes were buffered inside net/http until the
     body closed.
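
   The Flusher fix in miniature (wrapper fields are assumed):

     package middleware

     import "net/http"

     // responseWriter embeds the http.ResponseWriter interface, so only
     // that interface's methods are promoted; Flush has to be forwarded
     // explicitly or w.(http.Flusher) on the wrapper reports ok=false.
     type responseWriter struct {
         http.ResponseWriter
         status int
     }

     // Flush forwards to the underlying writer when it supports
     // flushing, so the streaming handler's assertion succeeds again.
     func (w *responseWriter) Flush() {
         if f, ok := w.ResponseWriter.(http.Flusher); ok {
             f.Flush()
         }
     }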

4. Non-streaming → streaming demotion (feature-flagged)
   - ANTHROPIC_DEMOTE_NONSTREAMING env (default false)
   - When enabled and the routed provider is anthropic, force stream=true
     upstream for clients that asked for stream=false. Receive SSE,
     accumulate via accumulateSSEToMessage (handles text, tool_use with
     partial_json reassembly, thinking, signature, citations_delta,
     usage merge), and synthesize a single non-streaming JSON response.
   - Eliminates the ResponseHeaderTimeout class of failure entirely.
   - Body rewrite uses json.Decoder + UseNumber() to preserve integer
     precision in unknown nested fields (tool inputs from prior turns).
   - Files: internal/config/config.go, internal/handler/handlers.go,
     cmd/proxy/main.go, cmd/proxy/main_test.go
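
   The body rewrite at the heart of the demotion path, sketched (the
   function name is illustrative; the SSE accumulation is omitted):

     package handler

     import (
         "bytes"
         "encoding/json"
     )

     // forceStreaming rewrites a non-streaming request body to
     // stream=true. UseNumber keeps large integers in unknown nested
     // fields (e.g. tool inputs from prior turns) from being mangled
     // through float64.
     func forceStreaming(body []byte) ([]byte, error) {
         dec := json.NewDecoder(bytes.NewReader(body))
         dec.UseNumber()
         var req map[string]interface{}
         if err := dec.Decode(&req); err != nil {
             return nil, err
         }
         req["stream"] = true
         return json.Marshal(req)
     }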

5. Live operational state: /livez gauge + graceful drain
   - New internal/runtime package: atomic in-flight counter + draining flag
   - New middleware/inflight.go: increments runtime gauge, applied to
     /v1/* subrouter so Messages, ChatCompletions, and ProxyPassthrough
     are all counted
   - /v1/* moved to a gorilla/mux subrouter so the InFlight middleware
     applies surgically; /health, /livez, /openapi.* remain on parent
     router (unauthenticated, uncounted)
   - Health handler returns 503 draining when runtime.IsDraining() is
     true, so Traefik stops routing to a slot before drain begins
   - New /livez handler returns {status, in_flight, draining, timestamp}
   - SIGTERM handler in main.go: SetDraining(true), poll for in_flight==0
     with 32-min ceiling and 1s tick (logs every 10s), then srv.Shutdown
   - Auth bypass list extended with /livez
   - Files: internal/runtime/runtime.go (new),
     internal/middleware/inflight.go (new),
     internal/middleware/auth.go,
     internal/handler/handlers.go (Health, Livez, runtime import),
     cmd/proxy/main.go (subrouter, drain loop)
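
   A plausible shape for the gauge and the drain loop (only
   SetDraining/IsDraining are named above; the rest is assumed):

     package runtime

     import "sync/atomic"

     var (
         inFlight atomic.Int64
         draining atomic.Bool
     )

     func Inc()               { inFlight.Add(1) }
     func Dec()               { inFlight.Add(-1) }
     func InFlight() int64    { return inFlight.Load() }
     func SetDraining(b bool) { draining.Store(b) }
     func IsDraining() bool   { return draining.Load() }

   And the SIGTERM path in main.go, per the description above (wrapped
   in an assumed helper; imports elided):

     func drainAndShutdown(srv *http.Server) {
         runtime.SetDraining(true)
         deadline := time.Now().Add(32 * time.Minute)
         var lastLog time.Time
         for runtime.InFlight() > 0 && time.Now().Before(deadline) {
             if time.Since(lastLog) >= 10*time.Second {
                 log.Printf("draining: in_flight=%d", runtime.InFlight())
                 lastLog = time.Now()
             }
             time.Sleep(time.Second)
         }
         _ = srv.Shutdown(context.Background())
     }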

6. OpenAPI spec updates
   - Document Health 503 response and new DrainingResponse schema
   - Add /livez path with LivezResponse schema
   - Files: internal/handler/openapi.go
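
   The documented /livez payload, as a hypothetical Go shape matching
   the handler output in change 5 (placement in package model is an
   assumption):

     package model

     import "time"

     // LivezResponse mirrors {status, in_flight, draining, timestamp}.
     type LivezResponse struct {
         Status    string    `json:"status"`
         InFlight  int64     `json:"in_flight"`
         Draining  bool      `json:"draining"`
         Timestamp time.Time `json:"timestamp"`
     }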

Verified: go build ./... clean, go test ./... all pass, go vet clean.
Three rounds of codex peer review across changes 1-5; all feedback
addressed (citations_delta, json.Number precision, drain-loop logging
via lastLog timestamp, PathPrefix tightened to "/v1/").
2026-05-02 15:15:58 -06:00

package model

import (
	"encoding/json"
	"time"
)

// ContextKey is a dedicated string type for request-context keys, so the
// proxy's values cannot collide with keys set by other packages.
type ContextKey string

// BodyBytesKey is the context key under which the buffered request body
// bytes are stored.
const BodyBytesKey ContextKey = "bodyBytes"

// ProxySettings holds dynamic proxy configuration (persisted in DB).
type ProxySettings struct {
	RequestHeaderRules  []HeaderRule `json:"requestHeaderRules"`
	ResponseHeaderRules []HeaderRule `json:"responseHeaderRules"`
}

// HeaderRule defines an action to take on a specific header.
// Actions: "block" (remove), "set" (override value), "replace" (find & replace in value).
type HeaderRule struct {
	Header  string `json:"header"`          // Header name (case-insensitive match)
	Action  string `json:"action"`          // "block", "set", "replace"
	Value   string `json:"value,omitempty"` // For "set": the new value. For "replace": the replacement string.
	Find    string `json:"find,omitempty"`  // For "replace": the string to find in the header value
	Enabled bool   `json:"enabled"`         // Toggle without deleting
}

// PromptGrade holds the result of grading a prompt, including
// per-criterion scores.
type PromptGrade struct {
	Score            int                      `json:"score"`
	MaxScore         int                      `json:"maxScore"`
	Feedback         string                   `json:"feedback"`
	ImprovedPrompt   string                   `json:"improvedPrompt"`
	Criteria         map[string]CriteriaScore `json:"criteria"`
	GradingTimestamp string                   `json:"gradingTimestamp"`
	IsProcessing     bool                     `json:"isProcessing"`
}

// CriteriaScore is the score and feedback for a single grading criterion.
type CriteriaScore struct {
	Score    int    `json:"score"`
	Feedback string `json:"feedback"`
}

// RequestLog is the full record of a proxied request, including the
// captured body and (once available) the response.
type RequestLog struct {
	RequestID        string              `json:"requestId"`
	Timestamp        string              `json:"timestamp"`
	Method           string              `json:"method"`
	Endpoint         string              `json:"endpoint"`
	Headers          map[string][]string `json:"headers"`
	Body             interface{}         `json:"body"`
	Model            string              `json:"model,omitempty"`
	OriginalModel    string              `json:"originalModel,omitempty"`
	RoutedModel      string              `json:"routedModel,omitempty"`
	UserAgent        string              `json:"userAgent"`
	ContentType      string              `json:"contentType"`
	PromptGrade      *PromptGrade        `json:"promptGrade,omitempty"`
	Response         *ResponseLog        `json:"response,omitempty"`
	ConversationHash string              `json:"conversationHash,omitempty"`
	MessageCount     int                 `json:"messageCount,omitempty"`
	OrganizationID   string              `json:"organizationId,omitempty"`
}

// ResponseLog captures the upstream response, including per-chunk data
// and timings when the request was streamed.
type ResponseLog struct {
	StatusCode      int                 `json:"statusCode"`
	Headers         map[string][]string `json:"headers"`
	Body            json.RawMessage     `json:"body,omitempty"`
	BodyText        string              `json:"bodyText,omitempty"`
	StreamError     string              `json:"streamError,omitempty"`
	ResponseTime    int64               `json:"responseTime"`
	StreamingChunks []string            `json:"streamingChunks,omitempty"`
	ChunkTimings    []ChunkTiming       `json:"chunkTimings,omitempty"`
	IsStreaming     bool                `json:"isStreaming"`
	CompletedAt     string              `json:"completedAt"`
	RateLimit       *RateLimitInfo      `json:"rateLimit,omitempty"`
}

// ChunkTiming records when each SSE chunk arrived during streaming.
type ChunkTiming struct {
	Index     int    `json:"index"`
	Timestamp string `json:"timestamp"`
	ByteSize  int    `json:"byteSize"`
	ElapsedMs int64  `json:"elapsedMs"`
}

// RateLimitInfo captures rate limit / quota data from upstream response headers.
type RateLimitInfo struct {
	// Organization
	OrganizationID string `json:"organizationId,omitempty"`

	// Legacy per-resource rate limits
	RequestsLimit     int    `json:"requestsLimit,omitempty"`
	RequestsRemaining int    `json:"requestsRemaining,omitempty"`
	RequestsReset     string `json:"requestsReset,omitempty"`
	TokensLimit       int    `json:"tokensLimit,omitempty"`
	TokensRemaining   int    `json:"tokensRemaining,omitempty"`
	TokensReset       string `json:"tokensReset,omitempty"`

	// Unified quota system (Anthropic's current model)
	UnifiedStatus              string  `json:"unifiedStatus,omitempty"`
	UnifiedUtilization5h       float64 `json:"unifiedUtilization5h,omitempty"`
	UnifiedReset5h             string  `json:"unifiedReset5h,omitempty"`
	UnifiedUtilization7d       float64 `json:"unifiedUtilization7d,omitempty"`
	UnifiedReset7d             string  `json:"unifiedReset7d,omitempty"`
	UnifiedFallbackPercentage  float64 `json:"unifiedFallbackPercentage,omitempty"`
	UnifiedOverageStatus       string  `json:"unifiedOverageStatus,omitempty"`
	UnifiedRepresentativeClaim string  `json:"unifiedRepresentativeClaim,omitempty"`
}

// ChatMessage is a single OpenAI-style chat message.
type ChatMessage struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

// ChatCompletionRequest is the OpenAI-compatible request accepted by the proxy.
type ChatCompletionRequest struct {
	Model    string        `json:"model"`
	Messages []ChatMessage `json:"messages"`
	Stream   bool          `json:"stream,omitempty"`
}

// AnthropicUsage is the token accounting block returned by the Anthropic API.
type AnthropicUsage struct {
	InputTokens              int    `json:"input_tokens"`
	OutputTokens             int    `json:"output_tokens"`
	CacheCreationInputTokens int    `json:"cache_creation_input_tokens,omitempty"`
	CacheReadInputTokens     int    `json:"cache_read_input_tokens,omitempty"`
	ServiceTier              string `json:"service_tier,omitempty"`
}

// AnthropicResponse is a non-streaming Anthropic Messages API response.
type AnthropicResponse struct {
	Content      []AnthropicContentBlock `json:"content"`
	ID           string                  `json:"id"`
	Model        string                  `json:"model"`
	Role         string                  `json:"role"`
	StopReason   string                  `json:"stop_reason"`
	StopSequence *string                 `json:"stop_sequence"`
	Type         string                  `json:"type"`
	Usage        AnthropicUsage          `json:"usage"`
}

// AnthropicContentBlock is a single content block within a message.
type AnthropicContentBlock struct {
	Type string `json:"type"`
	Text string `json:"text"`
}

// AnthropicMessage is a message whose content may be either a plain
// string or a list of content blocks, since the API accepts both forms.
type AnthropicMessage struct {
	Role    string      `json:"role"`
	Content interface{} `json:"content"`
}

// GetContentBlocks normalizes Content into a slice of AnthropicContentBlock.
// String content becomes a single text block; list content keeps only
// entries that carry both a "type" and a "text" field.
func (m *AnthropicMessage) GetContentBlocks() []AnthropicContentBlock {
	switch v := m.Content.(type) {
	case string:
		return []AnthropicContentBlock{{Type: "text", Text: v}}
	case []interface{}:
		var blocks []AnthropicContentBlock
		for _, item := range v {
			if block, ok := item.(map[string]interface{}); ok {
				if typ, hasType := block["type"].(string); hasType {
					if text, hasText := block["text"].(string); hasText {
						blocks = append(blocks, AnthropicContentBlock{Type: typ, Text: text})
					}
				}
			}
		}
		return blocks
	case []AnthropicContentBlock:
		return v
	default:
		return []AnthropicContentBlock{}
	}
}

// AnthropicSystemMessage is one block of the structured system prompt.
type AnthropicSystemMessage struct {
	Text         string        `json:"text"`
	Type         string        `json:"type"`
	CacheControl *CacheControl `json:"cache_control,omitempty"`
}

// CacheControl marks a block for prompt caching (e.g. type "ephemeral").
type CacheControl struct {
	Type string `json:"type"`
}

// Tool describes a tool the model may call.
type Tool struct {
	Name        string      `json:"name"`
	Description string      `json:"description"`
	InputSchema InputSchema `json:"input_schema"`
}

// InputSchema is the JSON Schema for a tool's input.
type InputSchema struct {
	Type       interface{}            `json:"type"`
	Properties map[string]interface{} `json:"properties"`
	Required   []string               `json:"required,omitempty"`
}

// AnthropicRequest is an Anthropic Messages API request.
type AnthropicRequest struct {
	Model       string                   `json:"model"`
	Messages    []AnthropicMessage       `json:"messages"`
	MaxTokens   int                      `json:"max_tokens"`
	Temperature *float64                 `json:"temperature,omitempty"`
	System      []AnthropicSystemMessage `json:"system,omitempty"`
	Stream      bool                     `json:"stream,omitempty"`
	Tools       []Tool                   `json:"tools,omitempty"`
	ToolChoice  interface{}              `json:"tool_choice,omitempty"`
}

// ModelsResponse is the OpenAI-style /v1/models listing.
type ModelsResponse struct {
	Object string      `json:"object"`
	Data   []ModelInfo `json:"data"`
}

// ModelInfo describes one model in a ModelsResponse.
type ModelInfo struct {
	ID      string `json:"id"`
	Object  string `json:"object"`
	Created int64  `json:"created"`
	OwnedBy string `json:"owned_by"`
}

// GradeRequest asks the grader to score a conversation's prompt.
type GradeRequest struct {
	Messages       []AnthropicMessage       `json:"messages"`
	SystemMessages []AnthropicSystemMessage `json:"systemMessages"`
	RequestID      string                   `json:"requestId,omitempty"`
}

// HealthResponse is the /health payload.
type HealthResponse struct {
	Status    string    `json:"status"`
	Timestamp time.Time `json:"timestamp"`
}

// ErrorResponse is the generic JSON error envelope.
type ErrorResponse struct {
	Error   string `json:"error"`
	Details string `json:"details,omitempty"`
}

// UsageStats represents aggregated token usage statistics.
type UsageStats struct {
	TotalRequests     int                   `json:"total_requests"`
	TotalInputTokens  int64                 `json:"total_input_tokens"`
	TotalOutputTokens int64                 `json:"total_output_tokens"`
	TotalCacheTokens  int64                 `json:"total_cache_tokens"`
	RequestsByModel   map[string]ModelStats `json:"requests_by_model"`
	StartDate         string                `json:"start_date,omitempty"`
	EndDate           string                `json:"end_date,omitempty"`
}

// ModelStats represents per-model usage statistics.
type ModelStats struct {
	RequestCount int   `json:"request_count"`
	InputTokens  int64 `json:"input_tokens"`
	OutputTokens int64 `json:"output_tokens"`
	CacheTokens  int64 `json:"cache_tokens"`
}

// RequestSummary is a lightweight version of RequestLog for fast list views.
type RequestSummary struct {
	RequestID        string          `json:"requestId"`
	Timestamp        string          `json:"timestamp"`
	Method           string          `json:"method"`
	Endpoint         string          `json:"endpoint"`
	Model            string          `json:"model,omitempty"`
	OriginalModel    string          `json:"originalModel,omitempty"`
	RoutedModel      string          `json:"routedModel,omitempty"`
	StatusCode       int             `json:"statusCode,omitempty"`
	ResponseTime     int64           `json:"responseTime,omitempty"`
	Usage            *AnthropicUsage `json:"usage,omitempty"`
	ConversationHash string          `json:"conversationHash,omitempty"`
	MessageCount     int             `json:"messageCount,omitempty"`
	StopReason       string          `json:"stopReason,omitempty"`
}

// Dashboard stats structures

// DashboardStats aggregates daily token usage for the dashboard.
type DashboardStats struct {
	DailyStats []DailyTokens `json:"dailyStats"`
}

// HourlyStatsResponse carries today's hourly breakdown plus headline totals.
type HourlyStatsResponse struct {
	HourlyStats     []HourlyTokens `json:"hourlyStats"`
	TodayTokens     int64          `json:"todayTokens"`
	TodayRequests   int            `json:"todayRequests"`
	AvgResponseTime int64          `json:"avgResponseTime"`
}

// ModelStatsResponse lists per-model totals for the dashboard.
type ModelStatsResponse struct {
	ModelStats []ModelTokens `json:"modelStats"`
}

// DailyTokens is one day's usage, optionally broken down by model.
type DailyTokens struct {
	Date     string                    `json:"date"`
	Tokens   int64                     `json:"tokens"`
	Requests int                       `json:"requests"`
	Models   map[string]DailyModelStat `json:"models,omitempty"`
}

// HourlyTokens is one hour's usage, optionally broken down by model.
type HourlyTokens struct {
	Hour     int                       `json:"hour"`
	Label    string                    `json:"label,omitempty"`
	Tokens   int64                     `json:"tokens"`
	Requests int                       `json:"requests"`
	Models   map[string]DailyModelStat `json:"models,omitempty"`
}

// DailyModelStat represents per-model stats for dashboard aggregation.
type DailyModelStat struct {
	Tokens   int64 `json:"tokens"`
	Requests int   `json:"requests"`
}

// ModelTokens is a per-model total for the dashboard.
type ModelTokens struct {
	Model    string `json:"model"`
	Tokens   int64  `json:"tokens"`
	Requests int    `json:"requests"`
}

// StreamingEvent is a decoded Anthropic SSE event.
type StreamingEvent struct {
	Type         string        `json:"type"`
	Index        *int          `json:"index,omitempty"`
	Delta        *Delta        `json:"delta,omitempty"`
	ContentBlock *ContentBlock `json:"content_block,omitempty"`
}

// Delta is the incremental payload of a content_block_delta event.
type Delta struct {
	Type  string          `json:"type,omitempty"`
	Text  string          `json:"text,omitempty"`
	Name  string          `json:"name,omitempty"`
	Input json.RawMessage `json:"input,omitempty"`
}

// ContentBlock is the initial block of a content_block_start event.
type ContentBlock struct {
	Type  string          `json:"type"`
	ID    string          `json:"id,omitempty"`
	Name  string          `json:"name,omitempty"`
	Input json.RawMessage `json:"input,omitempty"`
	Text  string          `json:"text,omitempty"`
}