Eval Runs & Results
How Sentinel tracks evaluation execution — runs, results, and scoring.
Every evaluation execution is tracked as a Run containing Results. Together they record how an evaluation suite performed against a target: per-case status and scores, latency, token usage, and cost.
Run
A Run represents a single execution of an evaluation suite against a target:
type Run struct {
    sentinel.Entity
    ID              id.EvalRunID
    SuiteID         id.SuiteID
    Model           string
    SystemPrompt    string
    Temperature     float64
    TotalCases      int
    Passed          int
    Failed          int
    PassRate        float64
    AvgScore        float64
    AvgLatencyMs    int
    TotalTokens     int
    TotalCost       float64
    AppID           string
    TargetTenantID  string
    PersonaRef      string
    State           RunState
    DimensionScores map[string]float64
    CompletedAt     *time.Time
}
Run states
Runs follow a state machine with 4 states:
| State | Description |
|---|---|
| running | Evaluation is actively processing cases |
| completed | All cases finished successfully |
| failed | Evaluation terminated with an error |
| cancelled | Evaluation was cancelled |
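As a rough illustration of the states in the table above, the sketch below models RunState as a string type. The constant names and the IsTerminal helper are hypothetical and not taken from Sentinel; only the four string values come from the table. It also assumes that every state other than running is terminal.

```go
package main

import "fmt"

// RunState mirrors the four documented states. The constant names are
// hypothetical; only the string values appear in the documentation.
type RunState string

const (
	StateRunning   RunState = "running"
	StateCompleted RunState = "completed"
	StateFailed    RunState = "failed"
	StateCancelled RunState = "cancelled"
)

// IsTerminal reports whether a run in this state has stopped processing
// cases. Treating everything except "running" as terminal is an assumption.
func (s RunState) IsTerminal() bool {
	switch s {
	case StateCompleted, StateFailed, StateCancelled:
		return true
	default:
		return false
	}
}

func main() {
	states := []RunState{StateRunning, StateCompleted, StateFailed, StateCancelled}
	for _, s := range states {
		fmt.Printf("%s terminal=%t\n", s, s.IsTerminal())
	}
}
```

A caller polling a run could use a check like this to decide when to stop waiting and read the results.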
Result
A Result is the outcome of evaluating a single test case within a run:
type Result struct {
    sentinel.Entity
    ID              id.EvalResultID
    RunID           id.EvalRunID
    CaseID          id.CaseID
    CaseName        string
    Status          ResultStatus // pass, fail, error
    Score           float64
    Output          string
    LatencyMs       int
    TokensUsed      int
    Cost            float64
    ScorerResults   []ScorerResult
    DimensionScores map[string]float64
    RunTrace        *RunTrace
}
Result status
| Status | Description |
|---|---|
| pass | Score meets or exceeds the pass threshold |
| fail | Score is below the pass threshold |
| error | Evaluation failed (target error, scorer error, etc.) |
ScorerResult
Each scorer produces a ScorerResult:
type ScorerResult struct {
    ScorerName string
    Score      float64
    Passed     bool
    Reason     string
    Dimension  string // e.g., "skill", "trait"
    Details    map[string]any
}
RunTrace
For persona-aware evaluations using AgentTarget, the RunTrace captures the agent's execution:
type RunTrace struct {
    Steps     []StepTrace
    ToolCalls []ToolTrace
}

type StepTrace struct {
    Index      int
    Type       string
    Output     string
    TokensUsed int
}

type ToolTrace struct {
    ToolName  string
    Arguments string
    Result    string
    Error     string
}
ResultStats
Aggregate statistics for a run:
type ResultStats struct {
    TotalCases      int
    Passed          int
    Failed          int
    Errored         int
    PassRate        float64
    AvgScore        float64
    AvgLatencyMs    int
    TotalTokens     int
    TotalCost       float64
    DimensionScores map[string]float64
}
Engine methods
eng.GetRun(ctx, runID)
eng.ListRuns(ctx, filter)
eng.ListRunsBySuite(ctx, suiteID)
eng.ListResults(ctx, runID)
eng.GetResultStats(ctx, runID)
API routes
| Method | Path | Description |
|---|---|---|
| GET | /sentinel/runs | List runs |
| GET | /sentinel/runs/:id | Get a specific run |
| GET | /sentinel/runs/:id/results | List results for a run |
| GET | /sentinel/runs/:id/stats | Get aggregate stats |
| POST | /sentinel/suites/:id/run | Execute an evaluation |
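To make the aggregate figures behind the stats route and eng.GetResultStats more concrete, here is a minimal sketch of how a ResultStats value could be derived from a run's results. It is not Sentinel's implementation. The aggregate function is hypothetical, the structs are trimmed copies that keep only the fields used here, and the sketch assumes PassRate is a fraction of all cases and that averages are plain arithmetic means.

```go
package main

import "fmt"

// Result is a trimmed copy of the documented struct with only the fields
// needed for aggregation. Status is a plain string here for brevity.
type Result struct {
	Status          string
	Score           float64
	LatencyMs       int
	TokensUsed      int
	Cost            float64
	DimensionScores map[string]float64
}

// ResultStats uses the documented field names.
type ResultStats struct {
	TotalCases      int
	Passed          int
	Failed          int
	Errored         int
	PassRate        float64
	AvgScore        float64
	AvgLatencyMs    int
	TotalTokens     int
	TotalCost       float64
	DimensionScores map[string]float64
}

// aggregate is a hypothetical helper. It assumes simple means over all
// results and a PassRate expressed as passed / total.
func aggregate(results []Result) ResultStats {
	stats := ResultStats{TotalCases: len(results), DimensionScores: map[string]float64{}}
	if len(results) == 0 {
		return stats // avoid dividing by zero for an empty run
	}
	var scoreSum float64
	var latencySum int
	dimCount := map[string]int{}
	for _, r := range results {
		switch r.Status {
		case "pass":
			stats.Passed++
		case "fail":
			stats.Failed++
		case "error":
			stats.Errored++
		}
		scoreSum += r.Score
		latencySum += r.LatencyMs
		stats.TotalTokens += r.TokensUsed
		stats.TotalCost += r.Cost
		for dim, v := range r.DimensionScores {
			stats.DimensionScores[dim] += v
			dimCount[dim]++
		}
	}
	n := len(results)
	stats.PassRate = float64(stats.Passed) / float64(n)
	stats.AvgScore = scoreSum / float64(n)
	stats.AvgLatencyMs = latencySum / n // integer division, matching the int field
	for dim, c := range dimCount {
		stats.DimensionScores[dim] /= float64(c) // mean per dimension
	}
	return stats
}

func main() {
	stats := aggregate([]Result{
		{Status: "pass", Score: 1.0, LatencyMs: 100, TokensUsed: 10, DimensionScores: map[string]float64{"skill": 1.0}},
		{Status: "fail", Score: 0.5, LatencyMs: 300, TokensUsed: 30, DimensionScores: map[string]float64{"skill": 0.5}},
	})
	fmt.Printf("%+v\n", stats)
}
```

Whether errored cases count toward the averages, and whether PassRate is a fraction or a percentage, are design choices the sketch fixes arbitrarily; the values returned by the real stats route may be computed differently.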