claude-code-proxy/proxy/internal/service/conversation.go

2025-06-29 19:27:00 -04:00
package service
import (
"bufio"
"encoding/json"
"fmt"
"os"
"path/filepath"
"sort"
"strings"
"time"
)
type ConversationService interface {
GetConversations() (map[string][]*Conversation, error)
GetConversation(projectPath, sessionID string) (*Conversation, error)
GetConversationsByProject(projectPath string) ([]*Conversation, error)
}
type conversationService struct {
claudeProjectsPath string
}
func NewConversationService() ConversationService {
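// The error is deliberately ignored: if the home directory cannot be determined,
// homeDir is empty and the projects path resolves relative to the working directory.
homeDir, _ := os.UserHomeDir()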
return &conversationService{
claudeProjectsPath: filepath.Join(homeDir, ".claude", "projects"),
}
}
// ConversationMessage represents a single message in a Claude conversation
type ConversationMessage struct {
ParentUUID *string `json:"parentUuid"`
IsSidechain bool `json:"isSidechain"`
UserType string `json:"userType"`
CWD string `json:"cwd"`
SessionID string `json:"sessionId"`
Version string `json:"version"`
Type string `json:"type"`
Message json.RawMessage `json:"message"`
UUID string `json:"uuid"`
Timestamp string `json:"timestamp"`
ParsedTime time.Time `json:"-"`
}
// Conversation represents a complete conversation session
type Conversation struct {
SessionID string `json:"sessionId"`
ProjectPath string `json:"projectPath"`
ProjectName string `json:"projectName"`
Model string `json:"model,omitempty"`
Messages []*ConversationMessage `json:"messages"`
StartTime time.Time `json:"startTime"`
EndTime time.Time `json:"endTime"`
MessageCount int `json:"messageCount"`
FileModTime time.Time `json:"-"` // Used for sorting; omitted from JSON output
}
// GetConversations returns all conversations organized by project
func (cs *conversationService) GetConversations() (map[string][]*Conversation, error) {
conversations := make(map[string][]*Conversation)
var parseErrors []string
rootPath, err := cs.projectsRoot()
if err != nil {
return nil, fmt.Errorf("failed to resolve claude projects root: %w", err)
}
err = filepath.Walk(rootPath, func(path string, info os.FileInfo, err error) error {
if err != nil {
// Log but don't fail the entire walk
parseErrors = append(parseErrors, fmt.Sprintf("Error accessing %s: %v", path, err))
return nil
}
if !strings.HasSuffix(path, ".jsonl") {
return nil
}
// Skip paths that resolve, after following symlinks, to a location outside the projects root.
resolvedPath, err := cs.resolveExistingPathWithinProjectsRoot(path)
if err != nil {
parseErrors = append(parseErrors, fmt.Sprintf("Skipping %s: %v", path, err))
return nil
}
// Get the project path relative to the resolved root.
projectDir := filepath.Dir(resolvedPath)
projectRelPath, err := filepath.Rel(rootPath, projectDir)
if err != nil {
parseErrors = append(parseErrors, fmt.Sprintf("Skipping %s: %v", path, err))
return nil
}
// Skip files directly in the projects directory
if projectRelPath == "." || projectRelPath == "" {
return nil
}
conv, err := cs.parseConversationFile(resolvedPath, projectRelPath)
if err != nil {
// Log parsing errors but continue processing other files
parseErrors = append(parseErrors, fmt.Sprintf("Failed to parse %s: %v", path, err))
return nil
}
if conv != nil {
// Include conversations even if they have no messages (edge case)
conversations[projectRelPath] = append(conversations[projectRelPath], conv)
}
return nil
})
if err != nil {
return nil, fmt.Errorf("failed to walk claude projects: %w", err)
}
// Per-file errors collected in parseErrors are not returned; the affected
// files are simply left out of the result.
// Sort conversations within each project by file modification time (newest first)
for project := range conversations {
sort.Slice(conversations[project], func(i, j int) bool {
return conversations[project][i].FileModTime.After(conversations[project][j].FileModTime)
})
}
return conversations, nil
}
// GetConversation returns a specific conversation by project and session ID
func (cs *conversationService) GetConversation(projectPath, sessionID string) (*Conversation, error) {
filePath, resolvedProjectPath, err := cs.resolveConversationFile(projectPath, sessionID)
if err != nil {
return nil, fmt.Errorf("failed to resolve conversation path: %w", err)
}
conv, err := cs.parseConversationFile(filePath, resolvedProjectPath)
if err != nil {
return nil, fmt.Errorf("failed to parse conversation: %w", err)
}
return conv, nil
}
// GetConversationsByProject returns all conversations for a specific project
func (cs *conversationService) GetConversationsByProject(projectPath string) ([]*Conversation, error) {
var conversations []*Conversation
projectDir, resolvedProjectPath, err := cs.resolveProjectDir(projectPath)
if err != nil {
return nil, fmt.Errorf("failed to resolve project path: %w", err)
}
files, err := os.ReadDir(projectDir)
if err != nil {
return nil, fmt.Errorf("failed to read project directory: %w", err)
}
for _, file := range files {
if !strings.HasSuffix(file.Name(), ".jsonl") {
continue
}
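// Resolve each file so that a symlink pointing outside the projects root is
// skipped, matching the check GetConversations applies during its walk.
filePath, err := cs.resolveExistingPathWithinProjectsRoot(filepath.Join(projectDir, file.Name()))
if err != nil {
continue
}
conv, err := cs.parseConversationFile(filePath, resolvedProjectPath)
if err != nil {
// Skip files that cannot be parsed rather than failing the whole listing.
continue
}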
if conv != nil && len(conv.Messages) > 0 {
conversations = append(conversations, conv)
}
}
// Sort by file modification time (newest first)
sort.Slice(conversations, func(i, j int) bool {
return conversations[i].FileModTime.After(conversations[j].FileModTime)
})
return conversations, nil
}
func (cs *conversationService) projectsRoot() (string, error) {
root, err := filepath.Abs(cs.claudeProjectsPath)
if err != nil {
return "", fmt.Errorf("failed to make projects root absolute: %w", err)
}
resolvedRoot, err := filepath.EvalSymlinks(root)
if err != nil {
if os.IsNotExist(err) {
return root, nil
}
return "", fmt.Errorf("failed to resolve projects root symlinks: %w", err)
}
return resolvedRoot, nil
}
func (cs *conversationService) resolveProjectDir(projectPath string) (string, string, error) {
cleanedProjectPath, err := cleanRelativeConversationPath(projectPath)
if err != nil {
return "", "", err
}
rootPath, err := cs.projectsRoot()
if err != nil {
return "", "", err
}
candidate := filepath.Join(rootPath, cleanedProjectPath)
resolvedCandidate, err := cs.resolveExistingPathWithinProjectsRoot(candidate)
if err != nil {
return "", "", err
}
return resolvedCandidate, cleanedProjectPath, nil
}
func (cs *conversationService) resolveConversationFile(projectPath, sessionID string) (string, string, error) {
if sessionID == "" {
return "", "", fmt.Errorf("session ID is required")
}
if sessionID != filepath.Base(sessionID) || sessionID == "." || sessionID == ".." {
return "", "", fmt.Errorf("invalid session ID: %s", sessionID)
}
projectDir, cleanedProjectPath, err := cs.resolveProjectDir(projectPath)
if err != nil {
return "", "", err
}
candidate := filepath.Join(projectDir, sessionID+".jsonl")
resolvedCandidate, err := cs.resolveExistingPathWithinProjectsRoot(candidate)
if err != nil {
return "", "", err
}
return resolvedCandidate, cleanedProjectPath, nil
}
func (cs *conversationService) resolveExistingPathWithinProjectsRoot(path string) (string, error) {
rootPath, err := cs.projectsRoot()
if err != nil {
return "", err
}
absolutePath, err := filepath.Abs(path)
if err != nil {
return "", fmt.Errorf("failed to make path absolute: %w", err)
}
normalizedPath := filepath.Clean(absolutePath)
if !pathWithinRoot(normalizedPath, rootPath) {
return "", fmt.Errorf("path escapes projects root: %s", path)
}
resolvedPath, err := filepath.EvalSymlinks(normalizedPath)
if err != nil {
return "", fmt.Errorf("failed to resolve path symlinks: %w", err)
}
if !pathWithinRoot(resolvedPath, rootPath) {
return "", fmt.Errorf("path escapes projects root after symlink resolution: %s", path)
}
return resolvedPath, nil
}
func cleanRelativeConversationPath(p string) (string, error) {
if p == "" {
return "", fmt.Errorf("path is required")
}
if filepath.IsAbs(p) {
return "", fmt.Errorf("absolute paths are not allowed: %s", p)
}
cleaned := filepath.Clean(p)
if cleaned == "." || cleaned == ".." || strings.HasPrefix(cleaned, ".."+string(os.PathSeparator)) {
return "", fmt.Errorf("path escapes projects root: %s", p)
}
return cleaned, nil
}
func pathWithinRoot(candidatePath, rootPath string) bool {
relPath, err := filepath.Rel(rootPath, candidatePath)
if err != nil {
return false
}
if relPath == "." {
return true
}
return relPath != ".." && !strings.HasPrefix(relPath, ".."+string(os.PathSeparator))
}
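// A minimal sketch of why the containment check above goes through filepath.Rel
// rather than a plain string-prefix comparison. It assumes Unix-style paths; the
// root and the candidate paths are made-up examples.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// pathWithinRoot reports whether candidate is root itself or lies below it.
// It follows the same logic as the helper in this file.
func pathWithinRoot(candidate, root string) bool {
	rel, err := filepath.Rel(root, candidate)
	if err != nil {
		return false
	}
	if rel == "." {
		return true
	}
	return rel != ".." && !strings.HasPrefix(rel, ".."+string(filepath.Separator))
}

func main() {
	root := "/home/u/.claude/projects" // hypothetical projects root
	candidates := []string{
		root,
		root + "/-Users-u-dev/abc.jsonl",
		root + "-evil/abc.jsonl", // sibling directory that shares the root's prefix
		"/home/u/.ssh/id_ed25519",
	}
	for _, p := range candidates {
		// Print the Rel-based answer next to the naive prefix answer.
		fmt.Printf("%s\n  within root: %v, naive prefix match: %v\n",
			p, pathWithinRoot(p, root), strings.HasPrefix(p, root))
	}
}
```

// The sibling directory is the interesting case: a prefix comparison accepts it
// because the strings share a prefix, while the Rel-based check rejects it
// because the relative path starts with "..".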
// parseConversationFile reads and parses a JSONL conversation file
func (cs *conversationService) parseConversationFile(filePath, projectPath string) (*Conversation, error) {
// Get file info for modification time
fileInfo, err := os.Stat(filePath)
if err != nil {
return nil, fmt.Errorf("failed to stat file: %w", err)
}
file, err := os.Open(filePath)
if err != nil {
return nil, fmt.Errorf("failed to open file: %w", err)
}
defer file.Close()
var messages []*ConversationMessage
var parseErrors int
lineNum := 0
conversationModel := ""
scanner := bufio.NewScanner(file)
// Allow lines of up to 10 MiB. The buffer starts small and grows on demand,
// so short files do not each pay for a 10 MiB allocation.
const maxScanTokenSize = 10 * 1024 * 1024
scanner.Buffer(make([]byte, 0, 64*1024), maxScanTokenSize)
for scanner.Scan() {
lineNum++
line := scanner.Bytes()
// Skip empty lines
if len(line) == 0 {
continue
}
var msg ConversationMessage
if err := json.Unmarshal(line, &msg); err != nil {
// Count and skip malformed lines; one bad line does not fail the file.
parseErrors++
continue
}
// Parse the timestamp. time.Parse with the RFC3339 layout also accepts
// fractional seconds, so no separate RFC3339Nano attempt is needed.
// On failure ParsedTime stays zero: the message is kept, but it is ignored
// when the start and end times are computed.
if msg.Timestamp != "" {
if parsedTime, err := time.Parse(time.RFC3339, msg.Timestamp); err == nil {
msg.ParsedTime = parsedTime
}
}
messages = append(messages, &msg)
// Claude conversation JSONL records the assistant model inside the nested message object.
// Track the latest model we see so the list view can filter by the active model tier.
var messageMeta struct {
Model string `json:"model"`
}
if err := json.Unmarshal(msg.Message, &messageMeta); err == nil && messageMeta.Model != "" {
conversationModel = messageMeta.Model
}
}
if err := scanner.Err(); err != nil {
return nil, fmt.Errorf("scanner error: %w", err)
}
if parseErrors > 3 {
// Some lines failed to parse but were skipped
}
// Return empty conversation if no messages (caller can decide what to do)
if len(messages) == 0 {
// Extract session ID from filename
sessionID := filepath.Base(filePath)
sessionID = strings.TrimSuffix(sessionID, ".jsonl")
// The encoded project directory name (for example "-Users-seifghazi-dev-llm-proxy")
// is used as the display name unchanged.
projectName := projectPath
return &Conversation{
SessionID: sessionID,
ProjectPath: projectPath,
ProjectName: projectName,
Model: conversationModel,
Messages: messages,
StartTime: time.Time{},
EndTime: time.Time{},
MessageCount: 0,
FileModTime: fileInfo.ModTime(),
}, nil
}
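// Sort messages by timestamp. The sort is stable so that messages with equal
// or missing (zero) timestamps keep the order in which they appear in the file.
sort.SliceStable(messages, func(i, j int) bool {
return messages[i].ParsedTime.Before(messages[j].ParsedTime)
})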
// Extract session ID from filename
sessionID := filepath.Base(filePath)
sessionID = strings.TrimSuffix(sessionID, ".jsonl")
// Use the full project path as provided
projectName := projectPath
// Find first and last valid timestamps
var startTime, endTime time.Time
for _, msg := range messages {
if !msg.ParsedTime.IsZero() {
if startTime.IsZero() || msg.ParsedTime.Before(startTime) {
startTime = msg.ParsedTime
}
if endTime.IsZero() || msg.ParsedTime.After(endTime) {
endTime = msg.ParsedTime
}
}
}
// If no valid timestamps found, use file modification time
if startTime.IsZero() {
startTime = fileInfo.ModTime()
endTime = fileInfo.ModTime()
}
return &Conversation{
SessionID: sessionID,
ProjectPath: projectPath,
ProjectName: projectName,
Model: conversationModel,
Messages: messages,
StartTime: startTime,
EndTime: endTime,
MessageCount: len(messages),
FileModTime: fileInfo.ModTime(),
}, nil
}