Sentinel

Architecture

How Sentinel's packages fit together.

Sentinel is organized as a set of focused Go packages. The engine package is the central coordinator. All other packages define interfaces, entities, and subsystem logic that compose around it.

Package diagram

┌───────────────────────────────────────────────────────────────────────┐
│                             engine.Engine                             │
│  CreateSuite / GetSuite / ListSuites / UpdateSuite / DeleteSuite      │
│  CreateCase / CreateCaseBatch / GetCase / ListCases / ImportCases     │
│  GetRun / ListRuns / ListResults / GetResultStats                     │
│  SaveBaseline / GetBaseline / GetLatestBaseline / ListBaselines       │
│  CreatePromptVersion / ListPromptVersions / SetCurrentPromptVersion   │
├───────────────────────────────────────────────────────────────────────┤
│                          Evaluation pipeline                          │
│  1. Load suite + cases                                                │
│  2. Resolve target (LLM, Agent, or Function)                          │
│  3. Run each case: send input → capture output + trace                │
│  4. Score output with configured scorers                              │
│  5. Aggregate results, emit plugin hooks                              │
│  6. Compare against baseline → detect regressions                     │
├────────────────────────┬──────────────────────────────────────────────┤
│  plugin.Registry       │  api.API (Forge HTTP handlers)               │
│  OnEvalRunStarted      │  32+ REST endpoints:                         │
│  OnEvalRunCompleted    │  - Suites (5 routes)                         │
│  OnCaseCompleted       │  - Cases (5 routes)                          │
│  OnRegressionDetected  │  - Runs (5 routes)                           │
│  OnBaselineSaved       │  - Baselines, RedTeam, Prompts,              │
│  (16 total hooks)      │    Scenarios, Reports                        │
├────────────────────────┴──────────────────────────────────────────────┤
│                              store.Store                              │
│  (composite: suite.Store + testcase.Store + evalrun.Store +           │
│   baseline.Store + promptversion.Store + Migrate/Ping/Close)          │
├───────────────────────┬───────────────────────┬───────────────────────┤
│  store/postgres       │  store/sqlite         │  store/memory         │
│  (PostgreSQL + bun)   │  (SQLite + bun)       │  (in-memory maps)     │
└───────────────────────┴───────────────────────┴───────────────────────┘

Engine construction

engine.New accepts option functions:

eng, err := engine.New(
    engine.WithStore(pgStore),             // required: composite Store
    engine.WithConfig(sentinel.Config{     // optional: override defaults
        DefaultModel:  "gpt-4o",
        PassThreshold: 0.8,
    }),
    engine.WithExtension(metricsExt),      // optional: lifecycle hooks
    engine.WithLogger(slog.Default()),     // optional: structured logger
)
if err != nil {
    log.Fatal(err) // construction failed; eng is not usable
}

The store, targets, scorers, and extensions are all defined as interfaces, so you can swap any of them for your own implementation.

Evaluation flow

When an evaluation run is triggered:

  1. Load suite and cases — Read the suite configuration and all test cases from the store.

  2. Resolve target — Set up the evaluation target: an LLMTarget for raw LLM calls, an AgentTarget for agent invocations (with full run trace capture), or a FuncTarget for wrapping plain functions.

  3. Execute cases — For each case, send the input to the target and capture the output. Cases run concurrently up to the configured Concurrency limit.

  4. Score results — Each case's output is evaluated by its configured scorers. Persona-aware scorers also analyze the run trace for tool usage, trait consistency, and cognitive phase transitions.

  5. Aggregate and record — Results are aggregated into the run record with pass rate, average score, dimension scores, token usage, and cost.

  6. Compare against baseline — If a baseline exists, compare the new run against it and detect regressions.

Tenant isolation

sentinel.WithTenant(ctx, id) and sentinel.WithApp(ctx, id) attach tenant and app identifiers to the context, and every layer reads them back:

  • Store — all queries include WHERE app_id = ? filters
  • Engine — scope is applied before any store operation
  • API — the Forge request context provides tenant/app identifiers

Cross-tenant access is structurally impossible: even if a caller passes a suite ID from another app, the store layer returns ErrSuiteNotFound.

Plugin system

Extensions implement the plugin.Extension base interface (just Name() string) and then opt in to specific lifecycle hooks by implementing additional interfaces:

type Extension interface {
    Name() string
}

// Opt-in hooks (implement any subset):
type EvalRunStarted interface {
    OnEvalRunStarted(ctx context.Context, suiteID id.SuiteID, runID id.EvalRunID, model string) error
}
type EvalRunCompleted interface { /* ... */ }
type CaseCompleted interface { /* ... */ }
type RegressionDetected interface { /* ... */ }
// ... 16 hooks total

At registration time, plugin.Registry checks which hook interfaces each extension implements and caches the result, so each emit call iterates only over the extensions that implement that hook.

Built-in extensions:

  • observability.MetricsExtension — counters for all lifecycle events
  • audithook.Extension — bridges lifecycle events to an audit trail backend

Package index

Package         Import path                 Purpose
sentinel        github.com/xraph/sentinel   Root: Entity, Config, scope helpers, errors
id              .../id                      TypeID-based entity identifiers (7 prefixes)
engine          .../engine                  Central coordinator: all CRUD and eval orchestration
suite           .../suite                   Suite entity and store interface
testcase        .../testcase                Case entity, ScenarioType, ScorerConfig
evalrun         .../evalrun                 Run, Result, RunTrace entities and store
baseline        .../baseline                Baseline entity for regression detection
promptversion   .../promptversion           Prompt version entity for A/B testing
scorer          .../scorer                  Scorer interface, registry, 22 built-in scorers
scenario        .../scenario                6 scenario generators
target          .../target                  Target interface: LLM, Agent, Func adapters
redteam         .../redteam                 5 adversarial attack generators
comparison      .../comparison              Multi-model comparison and baseline diff
dataset         .../dataset                 Data loaders (JSON, CSV, JSONL) and generation
report          .../report                  Report generators (terminal, JSON, HTML, CI)
store           .../store                   Composite store interface
store/postgres  .../store/postgres          PostgreSQL backend (bun ORM)
store/sqlite    .../store/sqlite            SQLite backend (bun ORM)
store/memory    .../store/memory            In-memory backend for testing
plugin          .../plugin                  Extension interfaces and Registry
observability   .../observability           Metrics extension
audit_hook      .../audit_hook              Audit trail extension
api             .../api                     Forge-native HTTP handlers (32+ routes)
extension       .../extension               Forge framework extension adapter
