Baselines & Regression Detection
Track performance over time and detect regressions against known-good baselines.
Baselines capture the results of a known-good evaluation run. Future runs can be compared against the baseline to detect performance regressions.
Baseline
type Baseline struct {
	ID              id.BaselineID
	SuiteID         id.SuiteID
	RunID           id.EvalRunID
	Name            string
	Results         []BaselineResult
	PassRate        float64
	AvgScore        float64
	DimensionScores map[string]float64
	IsCurrent       bool
	CreatedAt       time.Time
}
BaselineResult
Per-case baseline data for comparison:
type BaselineResult struct {
	CaseID          id.CaseID
	CaseName        string
	Score           float64
	Status          string
	DimensionScores map[string]float64
}
Creating a baseline
Save a known-good run as the current baseline:
b := &baseline.Baseline{
	SuiteID:   suiteID,
	RunID:     runID,
	Name:      "v2.1-release",
	PassRate:  0.95,
	AvgScore:  0.88,
	IsCurrent: true,
}
if err := eng.SaveBaseline(ctx, b); err != nil {
	log.Fatal(err)
}
The IsCurrent flag marks which baseline is the active reference point. Only one baseline per suite should be current.
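The one-current-baseline rule can be illustrated with a small sketch. It assumes the caller is responsible for clearing the flag on older baselines; the source does not say whether the engine does this on save. The `Baseline` type here is a trimmed local stand-in with string IDs, and `promote` is a hypothetical helper, not part of the engine API.

```go
package main

import "fmt"

// Baseline is a trimmed, local stand-in for the Baseline struct above;
// string IDs replace the id.* types so the sketch compiles on its own.
type Baseline struct {
	ID        string
	SuiteID   string
	Name      string
	IsCurrent bool
}

// promote marks the baseline with the given ID as current and clears the
// flag on every other baseline of the same suite. It reports whether the
// target was found; if not, no flags are changed.
// (Hypothetical helper, not part of the engine API.)
func promote(all []*Baseline, suiteID, baselineID string) bool {
	found := false
	for _, b := range all {
		if b.SuiteID == suiteID && b.ID == baselineID {
			found = true
			break
		}
	}
	if !found {
		return false
	}
	for _, b := range all {
		if b.SuiteID != suiteID {
			continue // other suites keep their own current baseline
		}
		b.IsCurrent = b.ID == baselineID
	}
	return true
}

func main() {
	all := []*Baseline{
		{ID: "b1", SuiteID: "s1", Name: "v2.0-release", IsCurrent: true},
		{ID: "b2", SuiteID: "s1", Name: "v2.1-release"},
		{ID: "b3", SuiteID: "s2", Name: "other-suite", IsCurrent: true},
	}
	promote(all, "s1", "b2")
	for _, b := range all {
		fmt.Printf("%s current=%v\n", b.Name, b.IsCurrent)
	}
}
```

In a real integration the updated baselines would still need to be persisted, for example by saving each changed record.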
Regression detection
Compare a new run against the current baseline to detect performance drops:
- Pass rate regression — Overall pass rate dropped below the baseline
- Score regression — Average score dropped by more than the configured threshold
- Per-case regression — Individual cases that passed in the baseline now fail
- Dimension regression — Specific dimensions (e.g., skill, trait) degraded
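The first three checks above could be sketched roughly as follows. This is an illustration under stated assumptions, not the engine's actual comparison logic: the `Summary`, `CaseResult`, and `Report` types, the `compare` function, the `scoreThreshold` parameter, and the `"pass"` status value are all hypothetical names chosen for the example. Dimension scores are left out to keep it short.

```go
package main

import "fmt"

// CaseResult is a local stand-in for BaselineResult with a string ID.
type CaseResult struct {
	CaseID string
	Status string // "pass" is an assumed status value
	Score  float64
}

// Summary holds the fields of a baseline or run needed for comparison.
type Summary struct {
	PassRate float64
	AvgScore float64
	Results  []CaseResult
}

// Report lists which kinds of regression were found.
type Report struct {
	PassRateRegressed bool
	ScoreRegressed    bool
	RegressedCases    []string
}

// compare checks a new run against a baseline.
// scoreThreshold is the allowed drop in average score (hypothetical name).
func compare(base, run Summary, scoreThreshold float64) Report {
	rep := Report{
		PassRateRegressed: run.PassRate < base.PassRate,
		ScoreRegressed:    base.AvgScore-run.AvgScore > scoreThreshold,
	}
	// Index the new run's statuses by case ID.
	current := make(map[string]string, len(run.Results))
	for _, r := range run.Results {
		current[r.CaseID] = r.Status
	}
	// A case regresses if it passed in the baseline and no longer passes.
	// Cases absent from the new run are skipped.
	for _, b := range base.Results {
		status, ok := current[b.CaseID]
		if ok && b.Status == "pass" && status != "pass" {
			rep.RegressedCases = append(rep.RegressedCases, b.CaseID)
		}
	}
	return rep
}

func main() {
	base := Summary{PassRate: 0.95, AvgScore: 0.88, Results: []CaseResult{
		{CaseID: "c1", Status: "pass", Score: 0.9},
		{CaseID: "c2", Status: "pass", Score: 0.8},
	}}
	run := Summary{PassRate: 0.90, AvgScore: 0.80, Results: []CaseResult{
		{CaseID: "c1", Status: "pass", Score: 0.9},
		{CaseID: "c2", Status: "fail", Score: 0.4},
	}}
	fmt.Printf("%+v\n", compare(base, run, 0.05))
}
```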
When a regression is detected, the OnRegressionDetected plugin hook fires:
type RegressionDetected interface {
	OnRegressionDetected(ctx context.Context, suiteID id.SuiteID, baselineID id.BaselineID, delta float64) error
}
DimensionScores tracking
Baselines track per-dimension scores alongside overall metrics:
b.DimensionScores = map[string]float64{
	"skill":         0.92,
	"trait":         0.88,
	"communication": 0.95,
	"cognition":     0.85,
}
This enables per-dimension regression detection: a drop in skill evaluation can be caught even if the overall pass rate holds steady.
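A per-dimension comparison might look like the sketch below. The `dimensionDrops` function and its `tolerance` parameter are hypothetical; the source does not specify how the engine decides that a dimension has degraded.

```go
package main

import (
	"fmt"
	"sort"
)

// dimensionDrops returns the dimensions whose score fell by more than
// tolerance relative to the baseline, sorted by name. Dimensions missing
// from the new run are skipped rather than treated as regressions.
// (Hypothetical helper; the tolerance value is an assumption.)
func dimensionDrops(base, run map[string]float64, tolerance float64) []string {
	var dropped []string
	for dim, want := range base {
		got, ok := run[dim]
		if ok && want-got > tolerance {
			dropped = append(dropped, dim)
		}
	}
	sort.Strings(dropped) // map iteration order is random, so sort for stable output
	return dropped
}

func main() {
	base := map[string]float64{
		"skill":         0.92,
		"trait":         0.88,
		"communication": 0.95,
		"cognition":     0.85,
	}
	run := map[string]float64{
		"skill":         0.80,
		"trait":         0.89,
		"communication": 0.95,
		"cognition":     0.84,
	}
	fmt.Println(dimensionDrops(base, run, 0.05)) // prints [skill]
}
```

Here only `skill` is reported: it dropped by 0.12, while `cognition` dropped by 0.01, which is within the tolerance.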
Engine methods
eng.SaveBaseline(ctx, baseline)
eng.GetBaseline(ctx, baselineID)
eng.GetLatestBaseline(ctx, suiteID)
eng.ListBaselines(ctx, suiteID)
eng.DeleteBaseline(ctx, baselineID)
API routes
| Method | Path | Description |
|---|---|---|
| POST | /sentinel/baselines/:suiteId | Save a baseline |
| GET | /sentinel/baselines/:suiteId | Get the latest baseline |
| GET | /sentinel/baselines/:suiteId/baselines | List all baselines |
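As a rough illustration of calling the two GET routes from Go, the sketch below builds requests with the standard library. The base URL `http://localhost:8080` and the suite ID `suite_123` are placeholders, and the helper names are hypothetical. Response bodies and authentication are not covered because the source does not describe them.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
)

// latestBaselineRequest builds a GET request for the latest baseline of a
// suite. baseURL is a placeholder for wherever the API is mounted.
func latestBaselineRequest(baseURL, suiteID string) (*http.Request, error) {
	u := baseURL + "/sentinel/baselines/" + url.PathEscape(suiteID)
	return http.NewRequest(http.MethodGet, u, nil)
}

// listBaselinesRequest builds a GET request that lists all baselines of a suite.
func listBaselinesRequest(baseURL, suiteID string) (*http.Request, error) {
	u := baseURL + "/sentinel/baselines/" + url.PathEscape(suiteID) + "/baselines"
	return http.NewRequest(http.MethodGet, u, nil)
}

func main() {
	req, err := latestBaselineRequest("http://localhost:8080", "suite_123")
	if err != nil {
		fmt.Println("build request:", err)
		return
	}
	fmt.Println(req.Method, req.URL.String())

	req, err = listBaselinesRequest("http://localhost:8080", "suite_123")
	if err != nil {
		fmt.Println("build request:", err)
		return
	}
	fmt.Println(req.Method, req.URL.String())
}
```

The requests can then be sent with `http.DefaultClient.Do(req)` or any configured `*http.Client`.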