Architecture
How Sentinel's packages fit together.
Sentinel is organized as a set of focused Go packages. The engine package is the central coordinator. All other packages define interfaces, entities, and subsystem logic that compose around it.
Package diagram
┌──────────────────────────────────────────────────────────────────────┐
│ engine.Engine                                                        │
│ CreateSuite / GetSuite / ListSuites / UpdateSuite / DeleteSuite      │
│ CreateCase / CreateCaseBatch / GetCase / ListCases / ImportCases     │
│ GetRun / ListRuns / ListResults / GetResultStats                     │
│ SaveBaseline / GetBaseline / GetLatestBaseline / ListBaselines       │
│ CreatePromptVersion / ListPromptVersions / SetCurrentPromptVersion   │
├──────────────────────────────────────────────────────────────────────┤
│ Evaluation pipeline                                                  │
│ 1. Load suite + cases                                                │
│ 2. Resolve target (LLM, Agent, or Function)                          │
│ 3. Run each case: send input → capture output + trace                │
│ 4. Score output with configured scorers                              │
│ 5. Aggregate results, emit plugin hooks                              │
│ 6. Compare against baseline → detect regressions                     │
├──────────────────────┬───────────────────────────────────────────────┤
│ plugin.Registry      │ api.API (Forge HTTP handlers)                 │
│ OnEvalRunStarted     │ 32+ REST endpoints:                           │
│ OnEvalRunCompleted   │ - Suites (5 routes)                           │
│ OnCaseCompleted      │ - Cases (5 routes)                            │
│ OnRegressionDetected │ - Runs (5 routes)                             │
│ OnBaselineSaved      │ - Baselines, RedTeam, Prompts,                │
│ (16 total hooks)     │   Scenarios, Reports                          │
├──────────────────────┴───────────────────────────────────────────────┤
│ store.Store                                                          │
│ (composite: suite.Store + testcase.Store + evalrun.Store +           │
│  baseline.Store + promptversion.Store + Migrate/Ping/Close)          │
├──────────────────────────────────────────────────────────────────────┤
│ store/postgres       │ store/sqlite          │ store/memory          │
│ (PostgreSQL + bun)   │ (SQLite + bun)        │ (in-memory maps)      │
└──────────────────────────────────────────────────────────────────────┘
Engine construction
engine.New accepts option functions:
eng, err := engine.New(
	engine.WithStore(pgStore), // required: composite Store
	engine.WithConfig(sentinel.Config{ // optional: override defaults
		DefaultModel:  "gpt-4o",
		PassThreshold: 0.8,
	}),
	engine.WithExtension(metricsExt),  // optional: lifecycle hooks
	engine.WithLogger(slog.Default()), // optional: structured logger
)
All components are interfaces; swap any with your own implementation.
Evaluation flow
When an evaluation run is triggered:
- Load suite and cases: Read the suite configuration and all test cases from the store.
- Resolve target: Set up the evaluation target. This is an LLMTarget for raw LLM calls, an AgentTarget for agent invocations (with full run trace capture), or a FuncTarget for wrapping plain functions.
- Execute cases: For each case, send the input to the target and capture the output. Cases run concurrently up to the configured Concurrency limit.
- Score results: Each case's output is evaluated by its configured scorers. Persona-aware scorers also analyze the run trace for tool usage, trait consistency, and cognitive phase transitions.
- Aggregate and record: Results are aggregated into the run record with pass rate, average score, dimension scores, token usage, and cost.
- Compare against baseline: If a baseline exists, compare the new run against it and detect regressions.
Tenant isolation
sentinel.WithTenant(ctx, id) and sentinel.WithApp(ctx, id) inject identifiers into the context. These are extracted at every layer:
- Store: all queries include WHERE app_id = ? filters
- Engine: scope is applied before any store operation
- API: the Forge request context provides tenant/app identifiers
Cross-tenant access is structurally impossible: even if a caller passes a suite ID from another app, the store layer returns ErrSuiteNotFound.
Plugin system
Extensions implement the plugin.Extension base interface (just Name() string) and then opt in to specific lifecycle hooks by implementing additional interfaces:
type Extension interface {
	Name() string
}
// Opt-in hooks (implement any subset):
type EvalRunStarted interface {
	OnEvalRunStarted(ctx context.Context, suiteID id.SuiteID, runID id.EvalRunID, model string) error
}
type EvalRunCompleted interface { /* ... */ }
type CaseCompleted interface { /* ... */ }
type RegressionDetected interface { /* ... */ }
// ... 16 hooks total
The plugin.Registry type-caches extensions at registration time, so emit calls iterate only over extensions that implement the relevant hook.
Built-in extensions:
- observability.MetricsExtension: counters for all lifecycle events
- audithook.Extension: bridges lifecycle events to an audit trail backend
Package index
| Package | Import path | Purpose |
|---|---|---|
| sentinel | github.com/xraph/sentinel | Root: Entity, Config, scope helpers, errors |
| id | .../id | TypeID-based entity identifiers (7 prefixes) |
| engine | .../engine | Central coordinator: all CRUD and eval orchestration |
| suite | .../suite | Suite entity and store interface |
| testcase | .../testcase | Case entity, ScenarioType, ScorerConfig |
| evalrun | .../evalrun | Run, Result, RunTrace entities and store |
| baseline | .../baseline | Baseline entity for regression detection |
| promptversion | .../promptversion | Prompt version entity for A/B testing |
| scorer | .../scorer | Scorer interface, registry, 22 built-in scorers |
| scenario | .../scenario | 6 scenario generators |
| target | .../target | Target interface: LLM, Agent, Func adapters |
| redteam | .../redteam | 5 adversarial attack generators |
| comparison | .../comparison | Multi-model comparison and baseline diff |
| dataset | .../dataset | Data loaders (JSON, CSV, JSONL) and generation |
| report | .../report | Report generators (terminal, JSON, HTML, CI) |
| store | .../store | Composite store interface |
| store/postgres | .../store/postgres | PostgreSQL backend (bun ORM) |
| store/sqlite | .../store/sqlite | SQLite backend (bun ORM) |
| store/memory | .../store/memory | In-memory backend for testing |
| plugin | .../plugin | Extension interfaces and Registry |
| observability | .../observability | Metrics extension |
| audit_hook | .../audit_hook | Audit trail extension |
| api | .../api | Forge-native HTTP handlers (32+ routes) |
| extension | .../extension | Forge framework extension adapter |