Sentinel

Entities

The core data types in Sentinel — Suites, Cases, Runs, Results, Baselines, and more.

Sentinel entities carry a TypeID identifier, and most embed the sentinel.Entity base type.

Base entity

type Entity struct {
    CreatedAt time.Time `json:"created_at"`
    UpdatedAt time.Time `json:"updated_at"`
}

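Embedding sentinel.Entity gives an entity struct automatic timestamp tracking. Of the structs listed on this page, Baseline and PromptVersion declare their own CreatedAt field instead of embedding it.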
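As a rough illustration of what embedding provides, the sketch below embeds a local copy of Entity in a made-up Note type and marshals it to JSON. Note, sampleNote, and encodeNote are hypothetical names used only for this example; they are not part of Sentinel.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// Entity is a local copy of the base type shown above.
type Entity struct {
	CreatedAt time.Time `json:"created_at"`
	UpdatedAt time.Time `json:"updated_at"`
}

// Note is a hypothetical entity used only for illustration.
type Note struct {
	Entity
	Name string `json:"name"`
}

// sampleNote builds a Note with fixed timestamps so the output is stable.
func sampleNote() Note {
	ts := time.Date(2024, 1, 2, 3, 4, 5, 0, time.UTC)
	return Note{Entity: Entity{CreatedAt: ts, UpdatedAt: ts}, Name: "demo"}
}

// encodeNote marshals a Note. Fields promoted from the embedded Entity
// appear at the top level of the JSON object, not under a nested key.
func encodeNote(n Note) (string, error) {
	b, err := json.Marshal(n)
	return string(b), err
}

func main() {
	n := sampleNote()
	out, err := encodeNote(n)
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
	// Promoted fields are reachable directly on the outer struct.
	fmt.Println(n.CreatedAt.Equal(n.Entity.CreatedAt))
}
```

Because the fields are promoted, callers can read n.CreatedAt without going through n.Entity, and the timestamps serialize alongside the entity's own fields.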

Entity overview

Suite

A suite groups related test cases under a shared system prompt, model, and optional persona reference.

type Suite struct {
    sentinel.Entity
    ID           id.SuiteID
    Name         string
    Description  string
    AppID        string
    SystemPrompt string
    Model        string
    Temperature  float64
    PersonaRef   string         // optional agent persona name
    Metadata     map[string]any
}

Case

A test case defines a single evaluation scenario with input, expected output, scenario type, and scorer configuration.

type Case struct {
    sentinel.Entity
    ID           id.CaseID
    SuiteID      id.SuiteID
    Name         string
    Input        string
    Expected     string
    ScenarioType ScenarioType   // standard, skill_challenge, trait_probe, etc.
    Scorers      []ScorerConfig
    Tags         []string
    Context      map[string]any
    Metadata     map[string]any
}
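One common use of the Tags field is selecting a subset of cases to run. The sketch below assumes a trimmed stand-in for Case with only Name and Tags; filterByTag and hasTag are hypothetical helpers, not Sentinel functions, and the tag values are invented for the example.

```go
package main

import "fmt"

// Case is a trimmed stand-in for the Case struct above; only the fields
// needed for tag filtering are kept.
type Case struct {
	Name string
	Tags []string
}

// hasTag reports whether the case carries the given tag.
func hasTag(c Case, tag string) bool {
	for _, t := range c.Tags {
		if t == tag {
			return true
		}
	}
	return false
}

// filterByTag returns the cases that carry the given tag, in input order.
func filterByTag(cases []Case, tag string) []Case {
	var out []Case
	for _, c := range cases {
		if hasTag(c, tag) {
			out = append(out, c)
		}
	}
	return out
}

func main() {
	cases := []Case{
		{Name: "greets politely", Tags: []string{"smoke", "tone"}},
		{Name: "refuses unsafe request", Tags: []string{"safety"}},
		{Name: "answers in French", Tags: []string{"smoke"}},
	}
	for _, c := range filterByTag(cases, "smoke") {
		fmt.Println(c.Name)
	}
}
```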

See The Human Model for the 8 scenario types.

EvalRun

An evaluation run records the execution of a suite against a target. The struct is named Run and is identified by id.EvalRunID:

type Run struct {
    sentinel.Entity
    ID              id.EvalRunID
    SuiteID         id.SuiteID
    Model           string
    SystemPrompt    string
    Temperature     float64
    TotalCases      int
    Passed          int
    Failed          int
    PassRate        float64
    AvgScore        float64
    AvgLatencyMs    int
    TotalTokens     int
    TotalCost       float64
    AppID           string
    TargetTenantID  string
    PersonaRef      string
    State           RunState        // running, completed, failed, cancelled
    DimensionScores map[string]float64
    CompletedAt     *time.Time
}
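The State and CompletedAt fields together describe a run's lifecycle. The sketch below assumes RunState is a string-backed type whose values match the names in the comment above (running, completed, failed, cancelled) and that CompletedAt stays nil until the run stops; the constant names, IsTerminal, and finish are hypothetical and the real definitions may differ.

```go
package main

import (
	"fmt"
	"time"
)

// RunState is assumed to be string-backed; this is an illustration only.
type RunState string

// Hypothetical constants mirroring the states named in the Run listing.
const (
	RunRunning   RunState = "running"
	RunCompleted RunState = "completed"
	RunFailed    RunState = "failed"
	RunCancelled RunState = "cancelled"
)

// Run is a trimmed stand-in keeping only the lifecycle fields.
type Run struct {
	State       RunState
	CompletedAt *time.Time
}

// IsTerminal reports whether the run has stopped, for any reason.
func (r Run) IsTerminal() bool {
	switch r.State {
	case RunCompleted, RunFailed, RunCancelled:
		return true
	}
	return false
}

// finish returns a copy of the run in the given state with CompletedAt
// set to the current UTC time.
func finish(r Run, s RunState) Run {
	now := time.Now().UTC()
	r.State = s
	r.CompletedAt = &now
	return r
}

func main() {
	r := Run{State: RunRunning}
	fmt.Println(r.IsTerminal(), r.CompletedAt == nil)

	done := finish(r, RunCompleted)
	fmt.Println(done.IsTerminal(), done.CompletedAt != nil)
}
```

Using a pointer for CompletedAt lets a nil value mean "not finished yet", which a zero time.Time cannot express as clearly.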

EvalResult

The outcome of evaluating a single test case within a run. The struct is named Result and is identified by id.EvalResultID:

type Result struct {
    sentinel.Entity
    ID              id.EvalResultID
    RunID           id.EvalRunID
    CaseID          id.CaseID
    CaseName        string
    Status          ResultStatus    // pass, fail, error
    Score           float64
    Output          string
    LatencyMs       int
    TokensUsed      int
    Cost            float64
    ScorerResults   []ScorerResult
    DimensionScores map[string]float64
    RunTrace        *RunTrace       // optional agent execution trace
}

Baseline

A snapshot of a known-good evaluation run for regression detection:

type Baseline struct {
    ID              id.BaselineID
    SuiteID         id.SuiteID
    RunID           id.EvalRunID
    Name            string
    Results         []BaselineResult
    PassRate        float64
    AvgScore        float64
    DimensionScores map[string]float64
    IsCurrent       bool
    CreatedAt       time.Time
}
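The source does not spell out how a run is compared against a baseline, so the following is only a sketch of one possible regression check. It assumes a drop counts as a regression when the current value falls below the baseline value by more than a tolerance, and that a dimension missing from the current run is also flagged. Snapshot, regressions, and the dimension names are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// Snapshot holds the metrics that Baseline and Run have in common.
type Snapshot struct {
	PassRate        float64
	AvgScore        float64
	DimensionScores map[string]float64
}

// regressions lists the metrics where cur is worse than base by more
// than tol. Dimension entries are sorted by name for stable output.
func regressions(base, cur Snapshot, tol float64) []string {
	var out []string
	if base.PassRate-cur.PassRate > tol {
		out = append(out, "pass_rate")
	}
	if base.AvgScore-cur.AvgScore > tol {
		out = append(out, "avg_score")
	}
	var dims []string
	for name, b := range base.DimensionScores {
		c, ok := cur.DimensionScores[name]
		if !ok || b-c > tol { // a missing dimension is treated as a regression
			dims = append(dims, "dimension:"+name)
		}
	}
	sort.Strings(dims)
	return append(out, dims...)
}

func main() {
	base := Snapshot{
		PassRate:        0.75,
		AvgScore:        0.5,
		DimensionScores: map[string]float64{"tone": 0.75, "accuracy": 0.5},
	}
	cur := Snapshot{
		PassRate:        0.5,
		AvgScore:        0.5,
		DimensionScores: map[string]float64{"tone": 0.75, "accuracy": 0.25},
	}
	fmt.Println(regressions(base, cur, 0.125))
}
```

A tolerance keeps small run-to-run variation from being reported as a regression; the right value depends on how noisy the suite's scores are.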

PromptVersion

A versioned system prompt for tracking iterations and A/B testing:

type PromptVersion struct {
    ID           id.PromptVersionID
    SuiteID      id.SuiteID
    Version      int
    SystemPrompt string
    Changelog    string
    IsCurrent    bool
    RunID        string
    PassRate     *float64
    AvgScore     *float64
    CreatedAt    time.Time
}
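PassRate and AvgScore are pointers here, unlike on Run and Baseline. A plausible reading, which the source does not confirm, is that they stay nil until the version has been evaluated. The sketch below shows how a caller might handle that safely; the trimmed PromptVersion, floatPtr, and describe are hypothetical.

```go
package main

import "fmt"

// PromptVersion is a trimmed stand-in keeping only the fields used here.
type PromptVersion struct {
	Version  int
	PassRate *float64
	AvgScore *float64
}

// floatPtr returns a pointer to a copy of f, handy for optional fields.
func floatPtr(f float64) *float64 { return &f }

// describe formats a version, checking for nil before dereferencing.
func describe(v PromptVersion) string {
	if v.PassRate == nil {
		return fmt.Sprintf("v%d: not evaluated", v.Version)
	}
	return fmt.Sprintf("v%d: pass rate %.2f", v.Version, *v.PassRate)
}

func main() {
	fmt.Println(describe(PromptVersion{Version: 1}))
	fmt.Println(describe(PromptVersion{Version: 2, PassRate: floatPtr(0.75)}))
}
```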

Entity relationship diagram

Suite
  ├── Case[]
  │     └── ScorerConfig[]
  ├── Run[]
  │     ├── Result[]
  │     │     ├── ScorerResult[]
  │     │     └── RunTrace (optional)
  │     └── DimensionScores
  ├── Baseline[]
  │     └── BaselineResult[]
  └── PromptVersion[]

Store interfaces

Each entity type defines its own store interface. These are composed, by interface embedding, into a single store.Store interface:

type Store interface {
    suite.Store         // CreateSuite, GetSuite, GetSuiteByName, ...
    testcase.Store      // CreateCase, CreateCaseBatch, GetCase, ...
    evalrun.Store       // GetRun, ListRuns, ListResults, ...
    baseline.Store      // SaveBaseline, GetBaseline, GetLatestBaseline, ...
    promptversion.Store // CreatePromptVersion, GetPromptVersion, ...

    Migrate(ctx context.Context) error
    Ping(ctx context.Context) error
    Close() error
}
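To illustrate the composition pattern, the sketch below builds a small composite interface from two narrower ones and backs it with an in-memory implementation. The method signatures are simplified assumptions rather than Sentinel's real ones, and SuiteExists, Pinger, memStore, registerSuite, and demo are hypothetical names.

```go
package main

import (
	"context"
	"fmt"
)

// SuiteStore is a simplified, hypothetical per-entity interface.
type SuiteStore interface {
	CreateSuite(ctx context.Context, name string) error
	SuiteExists(ctx context.Context, name string) (bool, error)
}

// Pinger is a second narrow interface to compose.
type Pinger interface {
	Ping(ctx context.Context) error
}

// Store composes the narrow interfaces by embedding them.
type Store interface {
	SuiteStore
	Pinger
	Close() error
}

// memStore is an in-memory implementation that satisfies Store.
type memStore struct{ suites map[string]bool }

func newMemStore() *memStore { return &memStore{suites: map[string]bool{}} }

func (m *memStore) CreateSuite(_ context.Context, name string) error {
	m.suites[name] = true
	return nil
}

func (m *memStore) SuiteExists(_ context.Context, name string) (bool, error) {
	return m.suites[name], nil
}

func (m *memStore) Ping(_ context.Context) error { return nil }
func (m *memStore) Close() error                 { return nil }

// registerSuite depends only on SuiteStore, yet accepts any full Store.
func registerSuite(ctx context.Context, s SuiteStore, name string) (bool, error) {
	if err := s.CreateSuite(ctx, name); err != nil {
		return false, err
	}
	return s.SuiteExists(ctx, name)
}

// demo wires the pieces together and reports whether the suite was stored.
func demo() (bool, error) {
	var st Store = newMemStore()
	defer st.Close()
	ctx := context.Background()
	if err := st.Ping(ctx); err != nil {
		return false, err
	}
	return registerSuite(ctx, st, "smoke")
}

func main() {
	ok, err := demo()
	fmt.Println(ok, err)
}
```

The benefit of this layout is that code touching only suites can accept the narrow interface, which keeps test doubles small, while backends implement the composite once.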
