Sentinel

Introduction

Composable AI evaluation and testing framework for Go.

Sentinel is a Go library for testing AI agents the way you'd evaluate a human professional. Instead of simple input/output scoring, you evaluate across multiple dimensions — skills, traits, behaviors, cognition, communication, perception, and persona coherence — the same building blocks that make people unique.

Sentinel is a library — not a service. You bring your own LLM provider, database, and HTTP server. Sentinel provides the evaluation orchestration plumbing.

The Human-Like Testing Model

Sentinel evaluates AI agents the way you would evaluate a person:

| Dimension | What it tests | Scorer |
| --- | --- | --- |
| Skill | Can the agent do the job? (tool selection, proficiency) | skill_usage |
| Trait | Who is the agent? (personality consistency) | trait_consistency |
| Behavior | How does it react? (trigger-action patterns) | behavior_trigger |
| Cognition | How does it think? (phase transitions, depth) | cognitive_phase |
| Communication | How does it talk? (tone, formality, verbosity) | communication_style |
| Perception | What does it notice? (attention focus, detail) | perception_focus |
| Persona | The whole person (end-to-end identity coherence) | persona_coherence |
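The sketch below makes the multi-dimensional idea concrete. It keeps per-dimension scores separate rather than collapsing them into a single number. The Scorer interface, the keywordScorer type, and the evaluate function are hypothetical and are not Sentinel's API. The toy scorer only checks whether a keyword appears, whereas the scorers in the table above examine things like tool selection and tone.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Scorer is a hypothetical interface: each scorer belongs to one dimension
// and returns a score in [0, 1] for an agent's output.
type Scorer interface {
	Dimension() string
	Score(output string) float64
}

// keywordScorer is a toy stand-in that returns 1 when the output contains keyword.
type keywordScorer struct {
	dimension string
	keyword   string
}

func (k keywordScorer) Dimension() string { return k.dimension }

func (k keywordScorer) Score(output string) float64 {
	if strings.Contains(output, k.keyword) {
		return 1
	}
	return 0
}

// evaluate runs every scorer and averages the scores within each dimension,
// producing a profile instead of a single aggregate number.
func evaluate(output string, scorers []Scorer) map[string]float64 {
	sums := make(map[string]float64)
	counts := make(map[string]int)
	for _, s := range scorers {
		sums[s.Dimension()] += s.Score(output)
		counts[s.Dimension()]++
	}
	profile := make(map[string]float64, len(sums))
	for dim, sum := range sums {
		profile[dim] = sum / float64(counts[dim])
	}
	return profile
}

func main() {
	scorers := []Scorer{
		keywordScorer{dimension: "skill", keyword: "search("},
		keywordScorer{dimension: "skill", keyword: "lookup("},
		keywordScorer{dimension: "communication", keyword: "Thanks"},
	}
	profile := evaluate("Calling search(q) now. Thanks for waiting.", scorers)

	// Print dimensions in a stable order.
	dims := make([]string, 0, len(profile))
	for d := range profile {
		dims = append(dims, d)
	}
	sort.Strings(dims)
	for _, d := range dims {
		fmt.Printf("%s: %.2f\n", d, profile[d])
	}
}
```

With these inputs the program prints `communication: 1.00` and `skill: 0.50`. A weak skill score stays visible instead of being averaged away by a strong communication score.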

What it does

  • Multi-dimensional scoring — Score across 7 human-like dimensions simultaneously, or use traditional input/output scorers.
  • 22 built-in scorers — From exact match and regex to LLM-as-judge, semantic similarity, and all 7 persona-aware scorers.
  • Scenario generation — Auto-generate test cases targeting specific evaluation dimensions.
  • Baseline & regression detection — Track performance over time and detect regressions.
  • Adversarial testing (red team) — Test for prompt injection, jailbreaks, data leakage, hallucination, and off-topic responses.
  • Multi-model comparison — Compare performance across different LLMs or agent configurations.
  • Prompt versioning — Track system prompt iterations with performance metrics.
  • Plugin system — 16 lifecycle hooks for metrics, audit trails, and custom processing.
  • Three storage backends — PostgreSQL, SQLite, and in-memory.
  • Forge integration — Drop-in forge.Extension with DI-injected Engine and auto-registered HTTP routes.
  • REST API — 32+ endpoints for managing suites, cases, runs, baselines, red team, prompts, scenarios, and reports.
  • go test integration — Assertion functions for CI/CD pipelines.
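At its core, baseline and regression detection compares a run's scores against a stored reference. The sketch below assumes that scores are per-dimension values in [0, 1]. It uses a hypothetical detectRegressions function with a fixed tolerance. Sentinel's actual baseline model, storage, and thresholds are not described here and may differ.

```go
package main

import (
	"fmt"
	"sort"
)

// Regression describes a dimension whose score dropped by more than the tolerance.
type Regression struct {
	Dimension string
	Baseline  float64
	Current   float64
}

// detectRegressions flags every baseline dimension whose current score fell by
// more than tolerance. A dimension missing from current counts as 0, so a
// scorer that silently stopped running is flagged too.
func detectRegressions(baseline, current map[string]float64, tolerance float64) []Regression {
	var regs []Regression
	for dim, base := range baseline {
		cur := current[dim] // zero value if the dimension is missing
		if base-cur > tolerance {
			regs = append(regs, Regression{Dimension: dim, Baseline: base, Current: cur})
		}
	}
	// Map iteration order is random; sort for deterministic reports.
	sort.Slice(regs, func(i, j int) bool { return regs[i].Dimension < regs[j].Dimension })
	return regs
}

func main() {
	baseline := map[string]float64{"skill": 0.90, "trait": 0.85, "communication": 0.80}
	current := map[string]float64{"skill": 0.88, "trait": 0.70, "communication": 0.82}
	for _, r := range detectRegressions(baseline, current, 0.05) {
		fmt.Printf("%s regressed: %.2f -> %.2f\n", r.Dimension, r.Baseline, r.Current)
	}
}
```

This prints `trait regressed: 0.85 -> 0.70`. The small dip in skill stays within the tolerance, and the improvement in communication is ignored.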

Design philosophy

Library, not service. Sentinel is a set of Go packages you import. You control main, the database connection, and the process lifecycle.

Human-like evaluation. You don't just test output — you test skills, personality, thinking patterns, and communication style. Traditional LLM evals still work as a subset.

Interfaces over implementations. Every subsystem defines a Go interface. Swap any storage backend with a single type change.

Tenant-scoped by design. sentinel.WithTenant and sentinel.WithApp attach tenant and app identifiers to the context, and that scope is enforced at every layer.

TypeID everywhere. All entities use type-prefixed, K-sortable, UUIDv7-based identifiers (suite_, tcase_, erun_, eres_, base_, etc.).
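For reference, here is a minimal sketch of how a TypeID-style identifier can be built. It follows the public TypeID format: a lowercase prefix, an underscore, and a UUIDv7 encoded as 26 characters of lowercase Crockford base32. The newUUIDv7 and encodeTypeID helpers are hypothetical, and Sentinel likely uses a TypeID library rather than code like this. The millisecond timestamp occupies the leading bits and the encoding is fixed-width, so identifiers sort lexically in creation order.

```go
package main

import (
	"crypto/rand"
	"encoding/binary"
	"fmt"
	"math/big"
	"time"
)

// crockford is the lowercase Crockford base32 alphabet used by TypeID.
const crockford = "0123456789abcdefghjkmnpqrstvwxyz"

// newUUIDv7 builds a UUIDv7: a 48-bit Unix-millisecond timestamp followed by
// random bits, with the version (7) and variant (binary 10) fields set.
func newUUIDv7(ms uint64) ([16]byte, error) {
	var u [16]byte
	if _, err := rand.Read(u[6:]); err != nil {
		return u, err
	}
	var ts [8]byte
	binary.BigEndian.PutUint64(ts[:], ms)
	copy(u[0:6], ts[2:8])       // low 48 bits of the timestamp, big-endian
	u[6] = (u[6] & 0x0f) | 0x70 // version 7
	u[8] = (u[8] & 0x3f) | 0x80 // RFC variant bits
	return u, nil
}

// encodeTypeID renders prefix + "_" + the 128-bit value as 26 base32 digits
// (130 bits, so the first digit is always 0-7).
func encodeTypeID(prefix string, u [16]byte) string {
	n := new(big.Int).SetBytes(u[:])
	mask := big.NewInt(31)
	var suffix [26]byte
	for i := 25; i >= 0; i-- {
		d := new(big.Int).And(n, mask).Int64() // lowest 5 bits
		suffix[i] = crockford[d]
		n.Rsh(n, 5)
	}
	return prefix + "_" + string(suffix[:])
}

func main() {
	now := uint64(time.Now().UnixMilli())
	a, err := newUUIDv7(now)
	if err != nil {
		panic(err)
	}
	b, err := newUUIDv7(now + 1)
	if err != nil {
		panic(err)
	}
	idA, idB := encodeTypeID("suite", a), encodeTypeID("suite", b)
	fmt.Println(idA, idB, idA < idB)
}
```

Two IDs created in the same millisecond are ordered only by their random bits. Strict monotonicity within a millisecond would need extra handling, such as a counter.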

Quick look

package main

import (
    "context"
    "log"

    "github.com/xraph/sentinel/engine"
    "github.com/xraph/sentinel/store/memory"
)

func main() {
    ctx := context.Background()

    // Create an in-memory store for development.
    memStore := memory.New()

    // Build the Sentinel engine.
    eng, err := engine.New(
        engine.WithStore(memStore),
    )
    if err != nil {
        log.Fatal(err)
    }
    _ = eng
    _ = ctx
}

