Sentinel

Getting Started

Install Sentinel and run your first evaluation in under five minutes.

Prerequisites

  • Go 1.24 or later
  • A Go module (go mod init)

Install

go get github.com/xraph/sentinel

Step 1: Create the engine

The Sentinel engine is the central coordinator. It needs a store for persistence:

package main

import (
    "context"
    "log"

    "github.com/xraph/sentinel/engine"
    "github.com/xraph/sentinel/store/memory"
)

func main() {
    ctx := context.Background()

    // Use the in-memory store for development.
    memStore := memory.New()

    // Build the Sentinel engine.
    eng, err := engine.New(
        engine.WithStore(memStore),
    )
    if err != nil {
        log.Fatal(err)
    }
    if err := eng.Start(ctx); err != nil {
        log.Fatal(err)
    }
    // eng is now running and ready to accept suites and test cases.
}

For production, use PostgreSQL or SQLite:

import "github.com/xraph/sentinel/store/postgres"

pgStore := postgres.New(db) // your *bun.DB
if err := pgStore.Migrate(ctx); err != nil {
    log.Fatal(err)
}

Step 2: Set up a scoped context

Sentinel reads the tenant and app IDs from the context and stamps them onto every entity it creates:

import "github.com/xraph/sentinel"

ctx = sentinel.WithTenant(ctx, "tenant-1")
ctx = sentinel.WithApp(ctx, "myapp")

WithApp is required. Every operation is automatically scoped to the app in the context, so cross-app access is structurally impossible.

Step 3: Create a suite

A suite groups related test cases together with a shared system prompt and model:

import "github.com/xraph/sentinel/suite"

s := &suite.Suite{
    Name:         "customer-support-eval",
    Description:  "Evaluate the customer support agent",
    SystemPrompt: "You are a helpful customer support assistant.",
    Model:        "gpt-4o",
    Temperature:  0,
}
if err := eng.CreateSuite(ctx, s); err != nil {
    log.Fatal(err)
}

Step 4: Add test cases

Test cases define individual evaluations with input, expected output, and scorers:

import "github.com/xraph/sentinel/testcase"

cases := []*testcase.Case{
    {
        SuiteID:      s.ID,
        Name:         "greeting",
        Input:        "Hello, I need help with my order",
        Expected:     "professional greeting with offer to help",
        ScenarioType: testcase.ScenarioStandard,
        Scorers: []testcase.ScorerConfig{
            {Name: "contains", Config: map[string]any{"substring": "help"}},
            {Name: "llm_judge", Config: map[string]any{"criteria": "professional and helpful"}},
        },
    },
    {
        SuiteID:      s.ID,
        Name:         "refund-policy",
        Input:        "What is your refund policy?",
        Expected:     "accurate refund policy information",
        ScenarioType: testcase.ScenarioStandard,
        Scorers: []testcase.ScorerConfig{
            {Name: "contains", Config: map[string]any{"substring": "refund"}},
            {Name: "factual", Config: map[string]any{}},
        },
    },
}
if err := eng.CreateCaseBatch(ctx, cases); err != nil {
    log.Fatal(err)
}

Step 5: Query results

After running an evaluation (via the HTTP API or the runner), query the results:

// List runs for the suite.
runs, err := eng.ListRunsBySuite(ctx, s.ID)
if err != nil {
    log.Fatal(err)
}
if len(runs) == 0 {
    log.Fatal("no runs yet; trigger one via the HTTP API or the runner")
}

// Get results for a specific run.
results, err := eng.ListResults(ctx, runs[0].ID)
if err != nil {
    log.Fatal(err)
}

// Get aggregate statistics for that run.
stats, err := eng.GetResultStats(ctx, runs[0].ID)
if err != nil {
    log.Fatal(err)
}

Step 6: Save a baseline

Capture a known-good run as a baseline for regression detection:

import "github.com/xraph/sentinel/baseline"

b := &baseline.Baseline{
    SuiteID:   s.ID,
    RunID:     runs[0].ID,
    Name:      "v1-baseline",
    PassRate:  stats.PassRate,
    AvgScore:  stats.AvgScore,
    IsCurrent: true,
}
if err := eng.SaveBaseline(ctx, b); err != nil {
    log.Fatal(err)
}
