Sentinel

Eval Runs & Results

How Sentinel tracks evaluation execution: runs, results, and scoring.

Every execution of an evaluation suite is recorded as a Run, and each test case in that run produces a Result. Together they show how the suite performed against a target, from aggregate pass rates down to individual scorer verdicts and, for agent targets, execution traces.

Run

A Run represents a single execution of an evaluation suite against a target:

type Run struct {
    sentinel.Entity
    ID              id.EvalRunID
    SuiteID         id.SuiteID
    Model           string
    SystemPrompt    string
    Temperature     float64
    TotalCases      int
    Passed          int
    Failed          int
    PassRate        float64
    AvgScore        float64
    AvgLatencyMs    int
    TotalTokens     int
    TotalCost       float64
    AppID           string
    TargetTenantID  string
    PersonaRef      string
    State           RunState // running, completed, failed, cancelled
    DimensionScores map[string]float64
    CompletedAt     *time.Time
}

Run states

A run moves through a state machine with four states:

State       Description
running     Evaluation is actively processing cases
completed   All cases finished successfully
failed      Evaluation terminated with an error
cancelled   Evaluation was cancelled
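The state table can be modelled in Go as a string type with one constant per state. This is a minimal sketch: the string values match the documented state names, but the constant identifiers and the transition rule (only running can change, and only to one of the three terminal states) are assumptions inferred from the descriptions above, not Sentinel's published API.

```go
package main

import "fmt"

// RunState mirrors the four documented run states. The string values match
// the table; the constant names are assumptions for this sketch.
type RunState string

const (
	StateRunning   RunState = "running"
	StateCompleted RunState = "completed"
	StateFailed    RunState = "failed"
	StateCancelled RunState = "cancelled"
)

// IsTerminal reports whether a run in this state is finished.
// Assumption: every state except "running" is terminal.
func (s RunState) IsTerminal() bool {
	switch s {
	case StateCompleted, StateFailed, StateCancelled:
		return true
	default:
		return false
	}
}

// CanTransition reports whether moving from s to next is allowed under the
// assumed model: running may move to any terminal state, terminal states are final.
func (s RunState) CanTransition(next RunState) bool {
	return s == StateRunning && next.IsTerminal()
}

func main() {
	fmt.Println(StateRunning.CanTransition(StateCompleted)) // true
	fmt.Println(StateCompleted.CanTransition(StateRunning)) // false
	fmt.Println(StateCancelled.IsTerminal())                // true
}
```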

Result

A Result is the outcome of evaluating a single test case within a run:

type Result struct {
    sentinel.Entity
    ID              id.EvalResultID
    RunID           id.EvalRunID
    CaseID          id.CaseID
    CaseName        string
    Status          ResultStatus    // pass, fail, error
    Score           float64
    Output          string
    LatencyMs       int
    TokensUsed      int
    Cost            float64
    ScorerResults   []ScorerResult
    DimensionScores map[string]float64
    RunTrace        *RunTrace
}

Result status

Status   Description
pass     Score meets or exceeds the pass threshold
fail     Score is below the pass threshold
error    Evaluation failed (target error, scorer error, etc.)

ScorerResult

Each scorer produces a ScorerResult:

type ScorerResult struct {
    ScorerName string
    Score      float64
    Passed     bool
    Reason     string
    Dimension  string         // e.g., "skill", "trait"
    Details    map[string]any
}
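A Result carries both a list of ScorerResults and a DimensionScores map. One plausible way to derive the map from the list is to average scores per Dimension, as sketched below. DimensionAverages, the averaging rule, and the example scorer names are illustrative assumptions; Sentinel may weight or combine scores differently.

```go
package main

import "fmt"

// ScorerResult copies the documented struct so the sketch is self-contained.
type ScorerResult struct {
	ScorerName string
	Score      float64
	Passed     bool
	Reason     string
	Dimension  string
	Details    map[string]any
}

// DimensionAverages groups scorer results by Dimension and averages their
// scores. Results with an empty Dimension are skipped. Averaging is an
// assumption, not necessarily how Sentinel fills DimensionScores.
func DimensionAverages(results []ScorerResult) map[string]float64 {
	sums := make(map[string]float64)
	counts := make(map[string]int)
	for _, r := range results {
		if r.Dimension == "" {
			continue
		}
		sums[r.Dimension] += r.Score
		counts[r.Dimension]++
	}
	avgs := make(map[string]float64, len(sums))
	for dim, sum := range sums {
		avgs[dim] = sum / float64(counts[dim])
	}
	return avgs
}

func main() {
	// Hypothetical scorer names and scores.
	results := []ScorerResult{
		{ScorerName: "tone", Score: 0.75, Passed: true, Dimension: "trait"},
		{ScorerName: "politeness", Score: 0.25, Passed: false, Dimension: "trait"},
		{ScorerName: "sql", Score: 1.0, Passed: true, Dimension: "skill"},
	}
	fmt.Println(DimensionAverages(results)) // map[skill:1 trait:0.5]
}
```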

RunTrace

For persona-aware evaluations using AgentTarget, the RunTrace captures the agent's execution:

type RunTrace struct {
    Steps     []StepTrace
    ToolCalls []ToolTrace
}

type StepTrace struct {
    Index      int
    Type       string
    Output     string
    TokensUsed int
}

type ToolTrace struct {
    ToolName  string
    Arguments string
    Result    string
    Error     string
}
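Because RunTrace is a plain data structure, it is easy to post-process. The hypothetical Summarize helper below totals step tokens and lists tool calls that reported an error. It assumes RunTrace is nil for results that did not come from an AgentTarget, and the step Type values and tool names in the example are made up.

```go
package main

import "fmt"

type StepTrace struct {
	Index      int
	Type       string
	Output     string
	TokensUsed int
}

type ToolTrace struct {
	ToolName  string
	Arguments string
	Result    string
	Error     string
}

type RunTrace struct {
	Steps     []StepTrace
	ToolCalls []ToolTrace
}

// TraceSummary is a hypothetical digest of a RunTrace.
type TraceSummary struct {
	Steps       int
	StepTokens  int
	ToolCalls   int
	FailedTools []string
}

// Summarize totals step tokens and collects the names of tool calls whose
// Error field is non-empty. A nil trace yields a zero summary.
func Summarize(tr *RunTrace) TraceSummary {
	var s TraceSummary
	if tr == nil {
		return s
	}
	s.Steps = len(tr.Steps)
	s.ToolCalls = len(tr.ToolCalls)
	for _, st := range tr.Steps {
		s.StepTokens += st.TokensUsed
	}
	for _, tc := range tr.ToolCalls {
		if tc.Error != "" {
			s.FailedTools = append(s.FailedTools, tc.ToolName)
		}
	}
	return s
}

func main() {
	tr := &RunTrace{
		Steps: []StepTrace{
			{Index: 0, Type: "think", TokensUsed: 120},
			{Index: 1, Type: "tool", TokensUsed: 45},
		},
		ToolCalls: []ToolTrace{
			{ToolName: "search", Arguments: `{"q":"refund policy"}`, Result: "..."},
			{ToolName: "lookup_order", Arguments: `{"id":42}`, Error: "not found"},
		},
	}
	fmt.Printf("%+v\n", Summarize(tr))
	// {Steps:2 StepTokens:165 ToolCalls:2 FailedTools:[lookup_order]}
}
```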

ResultStats

ResultStats holds aggregate statistics for a run. Unlike Run, it reports errored cases in a separate Errored count:

type ResultStats struct {
    TotalCases      int
    Passed          int
    Failed          int
    Errored         int
    PassRate        float64
    AvgScore        float64
    AvgLatencyMs    int
    TotalTokens     int
    TotalCost       float64
    DimensionScores map[string]float64
}
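ResultStats can be read as a fold over a run's Results. The sketch below shows one way to compute it. ComputeStats is hypothetical, and so are its conventions: PassRate is taken as a 0 to 1 fraction of all cases, averages include errored cases, AvgLatencyMs is truncated to whole milliseconds, and each dimension is averaged only over the results that report it.

```go
package main

import "fmt"

type ResultStatus string

const (
	StatusPass  ResultStatus = "pass"
	StatusFail  ResultStatus = "fail"
	StatusError ResultStatus = "error"
)

// Result keeps only the fields of the documented Result this sketch reads.
type Result struct {
	Status          ResultStatus
	Score           float64
	LatencyMs       int
	TokensUsed      int
	Cost            float64
	DimensionScores map[string]float64
}

type ResultStats struct {
	TotalCases      int
	Passed          int
	Failed          int
	Errored         int
	PassRate        float64
	AvgScore        float64
	AvgLatencyMs    int
	TotalTokens     int
	TotalCost       float64
	DimensionScores map[string]float64
}

// ComputeStats aggregates per-case results under the assumptions above.
func ComputeStats(results []Result) ResultStats {
	st := ResultStats{TotalCases: len(results), DimensionScores: map[string]float64{}}
	if len(results) == 0 {
		return st
	}
	var scoreSum float64
	var latencySum int
	dimCounts := map[string]int{}
	for _, r := range results {
		switch r.Status {
		case StatusPass:
			st.Passed++
		case StatusFail:
			st.Failed++
		case StatusError:
			st.Errored++
		}
		scoreSum += r.Score
		latencySum += r.LatencyMs
		st.TotalTokens += r.TokensUsed
		st.TotalCost += r.Cost
		for dim, v := range r.DimensionScores {
			st.DimensionScores[dim] += v
			dimCounts[dim]++
		}
	}
	n := float64(len(results))
	st.PassRate = float64(st.Passed) / n
	st.AvgScore = scoreSum / n
	st.AvgLatencyMs = latencySum / len(results) // integer division truncates
	for dim, c := range dimCounts {
		st.DimensionScores[dim] /= float64(c)
	}
	return st
}

func main() {
	results := []Result{
		{Status: StatusPass, Score: 1.0, LatencyMs: 200, TokensUsed: 300, Cost: 0.25, DimensionScores: map[string]float64{"skill": 1.0}},
		{Status: StatusFail, Score: 0.25, LatencyMs: 300, TokensUsed: 200, Cost: 0.25, DimensionScores: map[string]float64{"skill": 0.5}},
		{Status: StatusError, LatencyMs: 100},
		{Status: StatusPass, Score: 0.75, LatencyMs: 400, TokensUsed: 100, Cost: 0.5},
	}
	st := ComputeStats(results)
	fmt.Printf("passed=%d failed=%d errored=%d passRate=%.2f avgScore=%.2f avgLatencyMs=%d tokens=%d cost=%.2f skill=%.2f\n",
		st.Passed, st.Failed, st.Errored, st.PassRate, st.AvgScore, st.AvgLatencyMs, st.TotalTokens, st.TotalCost, st.DimensionScores["skill"])
	// passed=2 failed=1 errored=1 passRate=0.50 avgScore=0.50 avgLatencyMs=250 tokens=600 cost=1.00 skill=0.75
}
```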

Engine methods

eng.GetRun(ctx, runID)
eng.ListRuns(ctx, filter)
eng.ListRunsBySuite(ctx, suiteID)
eng.ListResults(ctx, runID)
eng.GetResultStats(ctx, runID)

API routes

Method   Path                          Description
GET      /sentinel/runs                List runs
GET      /sentinel/runs/:id            Get a specific run
GET      /sentinel/runs/:id/results    List results for a run
GET      /sentinel/runs/:id/stats      Get aggregate stats
POST     /sentinel/suites/:id/run      Execute an evaluation
