# Getting Started
Install Sentinel and run your first evaluation in under five minutes.
## Prerequisites
- Go 1.24 or later
- A Go module (`go mod init`)
## Install

```sh
go get github.com/xraph/sentinel
```

## Step 1: Create the engine
The Sentinel engine is the central coordinator. It needs a store for persistence:
```go
package main

import (
    "context"
    "log"

    "github.com/xraph/sentinel/engine"
    "github.com/xraph/sentinel/store/memory"
)

func main() {
    ctx := context.Background()

    // Use the in-memory store for development.
    memStore := memory.New()

    // Build the Sentinel engine.
    eng, err := engine.New(
        engine.WithStore(memStore),
    )
    if err != nil {
        log.Fatal(err)
    }

    if err := eng.Start(ctx); err != nil {
        log.Fatal(err)
    }
    _ = eng // used in the steps below
}
```

For production, use PostgreSQL or SQLite:
import "github.com/xraph/sentinel/store/postgres"
pgStore := postgres.New(db) // your *bun.DB
if err := pgStore.Migrate(ctx); err != nil {
log.Fatal(err)
}Step 2: Set up a scoped context
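SQLite follows the same pattern. A sketch, assuming a sibling `store/sqlite` package that mirrors the PostgreSQL store's `New` and `Migrate` signatures; check the repository for the actual import path:

```go
import "github.com/xraph/sentinel/store/sqlite" // hypothetical path, mirroring store/postgres

sqlStore := sqlite.New(db) // your *bun.DB opened against a SQLite file

if err := sqlStore.Migrate(ctx); err != nil {
    log.Fatal(err)
}
```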
## Step 2: Set up a scoped context

Sentinel extracts the app ID from the context and stamps it onto every entity:
import "github.com/xraph/sentinel"
ctx = sentinel.WithTenant(ctx, "tenant-1")
ctx = sentinel.WithApp(ctx, "myapp")WithApp is required. All operations are automatically scoped — cross-app access is structurally impossible.
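In a long-running service you would typically derive the scoped context once per request. A sketch, assuming tenant and app IDs arrive as HTTP headers; the header names and middleware shape are illustrative, only `WithTenant` and `WithApp` come from Sentinel:

```go
import (
    "net/http"

    "github.com/xraph/sentinel"
)

// withScope stamps tenant and app onto each request's context.
// The header names here are illustrative, not a Sentinel convention.
func withScope(next http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        ctx := r.Context()
        ctx = sentinel.WithTenant(ctx, r.Header.Get("X-Tenant-ID"))
        ctx = sentinel.WithApp(ctx, r.Header.Get("X-App-ID"))
        next.ServeHTTP(w, r.WithContext(ctx))
    })
}
```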
## Step 3: Create a suite
A suite groups related test cases under a shared system prompt and model:
import "github.com/xraph/sentinel/suite"
s := &suite.Suite{
Name: "customer-support-eval",
Description: "Evaluate the customer support agent",
SystemPrompt: "You are a helpful customer support assistant.",
Model: "gpt-4o",
Temperature: 0,
}
if err := eng.CreateSuite(ctx, s); err != nil {
log.Fatal(err)
}Step 4: Add test cases
## Step 4: Add test cases

Test cases define individual evaluations with input, expected output, and scorers. `CreateSuite` populates `s.ID`, which each case references through `SuiteID`:
import "github.com/xraph/sentinel/testcase"
cases := []*testcase.Case{
{
SuiteID: s.ID,
Name: "greeting",
Input: "Hello, I need help with my order",
Expected: "professional greeting with offer to help",
ScenarioType: testcase.ScenarioStandard,
Scorers: []testcase.ScorerConfig{
{Name: "contains", Config: map[string]any{"substring": "help"}},
{Name: "llm_judge", Config: map[string]any{"criteria": "professional and helpful"}},
},
},
{
SuiteID: s.ID,
Name: "refund-policy",
Input: "What is your refund policy?",
Expected: "accurate refund policy information",
ScenarioType: testcase.ScenarioStandard,
Scorers: []testcase.ScorerConfig{
{Name: "contains", Config: map[string]any{"substring": "refund"}},
{Name: "factual", Config: map[string]any{}},
},
},
}
if err := eng.CreateCaseBatch(ctx, cases); err != nil {
log.Fatal(err)
}Step 5: Query results
## Step 5: Query results

After running an evaluation (via the HTTP API or the runner), query the results:
```go
// Error handling elided for brevity.

// List runs for a suite.
runs, _ := eng.ListRunsBySuite(ctx, s.ID)

// Get results for a specific run.
results, _ := eng.ListResults(ctx, runs[0].ID)

// Get aggregate statistics.
stats, _ := eng.GetResultStats(ctx, runs[0].ID)
```
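The blank identifiers above drop errors to keep the quickstart short. In real code, check them and guard the empty case before indexing `runs[0]`; a minimal sketch:

```go
runs, err := eng.ListRunsBySuite(ctx, s.ID)
if err != nil {
    log.Fatal(err)
}
if len(runs) == 0 {
    log.Fatal("no runs yet; trigger an evaluation first")
}

stats, err := eng.GetResultStats(ctx, runs[0].ID)
if err != nil {
    log.Fatal(err)
}
log.Printf("pass rate: %v, avg score: %v", stats.PassRate, stats.AvgScore)
```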
## Step 6: Save a baseline

Capture a known-good run as a baseline for regression detection:
import "github.com/xraph/sentinel/baseline"
b := &baseline.Baseline{
SuiteID: s.ID,
RunID: runs[0].ID,
Name: "v1-baseline",
PassRate: stats.PassRate,
AvgScore: stats.AvgScore,
IsCurrent: true,
}
if err := eng.SaveBaseline(ctx, b); err != nil {
log.Fatal(err)
}Next steps
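Later, you can detect a regression by comparing a fresh run's stats against the saved baseline. A minimal sketch using only the calls and fields shown above; `newRunID` is a placeholder for the run you want to check, and the 0.05 threshold is an arbitrary example:

```go
// Compare a later run against the saved baseline.
newStats, err := eng.GetResultStats(ctx, newRunID) // newRunID: hypothetical ID of the run under test
if err != nil {
    log.Fatal(err)
}

// Assumes PassRate is a 0-1 fraction; adjust the threshold to taste.
if newStats.PassRate < b.PassRate-0.05 {
    log.Printf("regression: pass rate fell from %v to %v", b.PassRate, newStats.PassRate)
}
```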
## Next steps

- Architecture — Understand how the packages fit together
- The Human Model — Deep dive into persona-aware evaluation dimensions
- Forge Extension — Wire Sentinel into a Forge application
- PostgreSQL Store — Production persistence setup