Valet Agent Platform

🤖 Valet Agent Platform

A framework for building AI agents with memory, skills, and tool integration

Home Lab Project · 15 Builtin Tools · Multi-tier Memory

Valet Agent Chat Interface

Valet Agent Tool Execution

The Problem

Building AI agents that can actually do things is harder than it looks:

Stateless conversations - Chat APIs don't remember context between calls
Tool integration - Each tool needs its own calling convention
No visibility - What is the agent doing right now?
Fragile loops - Agents get stuck, repeat themselves, or run forever

I wanted a platform where I could wire up tools, give the agent memory, and watch it work in real-time.

The Solution

Valet Agent Platform is an orchestration layer that sits between your UI and the LLM. It handles sessions, memory, tool execution, and streams progress back to you.

# Simple agent interaction
async with agent.session(user_id="ben") as session:
    response = await session.send("Analyze my recent expenses")
    
    # Watch tools execute in real-time
    async for event in response.stream():
        if event.type == "tool_start":
            print(f"Running: {event.tool_name}")
        elif event.type == "content":
            print(event.delta, end="")

Key Features

🧠 Multi-tier Memory - Core facts, working context, and long-term storage
🔧 15 Builtin Tools - Web browsing, code execution, file ops, vector search, and more
📺 Real-time Streaming - Cursor-like progress events show what the agent is doing
📋 Task Planning - Agents create and execute step-by-step plans
🔌 Plugin System - Add caching, rate limiting, metrics, autonomous mode
💾 Persistent Sessions - Pick up conversations where you left off

Architecture

flowchart TB
    subgraph UI["Client"]
        A[Chat UI]
    end
    
    subgraph Platform["Valet Agent Platform"]
        B[API Gateway]
        C[Agent Service]
        D[Memory Service]
        E[Skills Service]
        F[Tool Executor]
    end
    
    subgraph Tools["Builtin Tools"]
        G[Code Executor]
        H[Web Browser]
        I[Vector Search]
        J[File Ops]
        K[+ 11 more...]
    end
    
    subgraph Backend["Backend Services"]
        L[Valet Runtime<br/>LLM Inference]
        M[PostgreSQL<br/>Sessions + Memory]
    end
    
    A -->|SSE Stream| B
    B --> C
    C --> D
    C --> E
    C --> F
    F --> G & H & I & J & K
    C --> L
    D --> M

Hexagonal Architecture

The platform uses a hexagonal (ports & adapters) architecture:

flowchart TB
    subgraph Adapters["Adapters (Infrastructure)"]
        direction TB
        A1[FastAPI<br/>REST/SSE]
        A2[SQLAlchemy<br/>PostgreSQL]
        A3[Valet Runtime<br/>LLM Client]
        A4[Milvus/Chroma<br/>Vector Store]
    end
    
    subgraph Ports["Ports (Interfaces)"]
        direction TB
        P1[API Port]
        P2[Storage Port]
        P3[LLM Port]
        P4[Embedding Port]
    end
    
    subgraph Domain["Domain (Core Logic)"]
        direction TB
        D1[Agent Service]
        D2[Memory Service]
        D3[Skills Service]
        D4[Task Service]
    end
    
    A1 --> P1 --> D1
    A2 --> P2 --> D2
    A3 --> P3 --> D1
    A4 --> P4 --> D2 & D3

Domain: Pure business logic with no I/O dependencies
Ports: Abstract interfaces for external services
Adapters: Concrete implementations (PostgreSQL, FastAPI, etc.)

This means I can swap PostgreSQL for SQLite for testing, or replace Valet Runtime with a mock—all without touching domain code.

Builtin Tools

Tool	What It Does
`code_executor`	Run Python/JS in a sandboxed environment
`web_browser`	Browse and extract content from web pages
`vector_search`	Semantic search across indexed documents
`knowledge_graph`	Query and update the knowledge graph
`file_ops`	Read, write, and manage files
`content_processor`	Process documents through the ingestion pipeline
`fin_agent`	Personal finance queries and analysis
`calculator`	Mathematical calculations
`image_gallery`	Manage and search images
`graph_query`	Execute graph database queries
`worksuite`	Integration with Work Suite tools
+ 4 more	Context info, DOCeater, and utilities

Chat UI

The platform includes a SvelteKit-based chat interface for interacting with agents:

Valet Agent Platform UI

Chat interface showing real-time tool execution and streaming responses

Memory System

The agent has four tiers of memory, each with different persistence and decay characteristics:

flowchart TB
    subgraph Input["User Message"]
        M[New Message]
    end
    
    subgraph Retrieval["Memory Retrieval"]
        direction LR
        C[Core Memory<br/>Always loaded]
        W[Working Memory<br/>Session scoped]
        L[Long-term<br/>Vector search]
        E[Episodic<br/>Time decayed]
    end
    
    subgraph Assembly["Context Assembly"]
        CTX[Assembled Context<br/>Token budget managed]
    end
    
    M --> Retrieval
    C --> CTX
    W --> CTX
    L -.->|Semantic match| CTX
    E -.->|Relevance + recency| CTX
    CTX --> LLM[LLM Call]

Tier	Persistence	Decay	Use Case
Core	Permanent	None	User identity, preferences, key facts
Working	Session	Cleared on end	Current task context, tool results
Long-term	Permanent	Slow	Conversation history, retrieved semantically
Episodic	Permanent	Time-based	Session events, fades over time

Memory retrieval is budget-aware - the system calculates how many tokens can be injected based on the model's context window (using the 25% rule from LSK principles).

Skills System (LSK)

Skills are structured knowledge units that enhance agent capabilities:

flowchart LR
    subgraph Skills["Skill Registry"]
        S1[Skill: Python Expert]
        S2[Skill: Finance Domain]
        S3[Skill: Code Review]
    end
    
    subgraph Matching["Skill Matching"]
        Q[User Query]
        M{Match<br/>Engine}
    end
    
    subgraph Injection["Context Injection"]
        B[Budget<br/>Calculator]
        CTX[Enhanced Context]
    end
    
    Q --> M
    S1 & S2 & S3 --> M
    M -->|Top matches| B
    B -->|Within budget| CTX

Skills are registered with activation patterns (keywords, semantic similarity)
The system dynamically calculates token budgets based on model context windows
Skills inject expert knowledge, procedures, and examples into the prompt
Usage is tracked for analytics and optimization

Real-time Progress

Unlike typical chat APIs, Valet streams what the agent is doing, not just the final text:

{"type": "thinking", "content": "Analyzing the request..."}
{"type": "plan", "steps": [{"step": 1, "description": "Search expenses"}]}
{"type": "tool_start", "tool": "fin_agent", "args": {"query": "expenses"}}
{"type": "tool_complete", "tool": "fin_agent", "result": "..."}
{"type": "content", "delta": "Based on your recent expenses..."}
{"type": "complete", "usage": {"tokens": 1247}}

Your UI can show progress bars, tool execution status, and partial responses as they happen.

Tech Stack

Component	Technology	Why
API	FastAPI	Async, streaming, OpenAPI
Storage	SQLAlchemy 2.0 + PostgreSQL	ORM with type safety
LLM	Valet Runtime	Unified gateway (see other post)
Events	HookRegistry	Plugin extensibility
Streaming	SSE	Real-time progress

Plugin System

The platform is designed around a hook registry that allows plugins to intercept and modify behavior at key points:

sequenceDiagram
    participant UI as Chat UI
    participant API as API Gateway
    participant Hooks as Hook Registry
    participant Agent as Agent Service
    participant LLM as Valet Runtime
    
    UI->>API: Send message
    API->>Hooks: on_request_start
    Hooks->>Agent: Process
    Agent->>Hooks: on_completion_start
    Note over Hooks: Cache plugin checks cache
    Hooks->>LLM: LLM call (if not cached)
    LLM->>Hooks: Response
    Hooks->>Hooks: on_completion_end
    Note over Hooks: Cache plugin stores result
    Hooks->>API: Final response
    API->>UI: Stream back

# Example: Add caching to all LLM calls
class CachePlugin(Plugin):
    async def on_completion_start(self, params):
        cached = await self.cache.get(params.hash)
        if cached:
            return cached  # Skip LLM call
    
    async def on_completion_end(self, params, result):
        await self.cache.set(params.hash, result)

Available Plugins

Plugin	Purpose
Cache	Avoid redundant LLM calls
Rate Limit	Per-client request throttling
Metrics	Prometheus-compatible metrics
Logging	Structured request logging
Propulsion	Auto-resume pending work
Autonomous	Goal-driven multi-step execution

Propulsion: Auto-Resume

The Propulsion Plugin implements a simple but powerful principle:

"If your hook has work, RUN IT."

On startup, the platform scans for paused or pending workflows and automatically resumes them. This means:

Interrupted tasks pick up where they left off
No orphaned sessions
Resilience to restarts

What I Learned

Building this taught me:

Memory is key - Agents without memory feel broken; users expect continuity
Visibility matters - Showing tool execution builds trust and helps debugging
Hexagonal works - Clean separation makes testing and swapping backends easy
Streaming is hard - SSE with partial JSON requires careful buffering

What's Next

More tool integrations (email, calendar)
Multi-agent orchestration
Better autonomous mode with goals
Improved memory retrieval (hybrid search)

Valet Agent Platform - AI Agent Orchestration Framework