Sentinel

Cognitive Evaluation

Test how an agent thinks — phase transitions, depth, and reasoning strategy.

Cognitive evaluation tests whether an agent follows appropriate thinking strategies, transitions between cognitive phases correctly, and maintains the right depth and focus.

What it tests

  • Does the agent transition between thinking phases appropriately?
  • Does depth of analysis match expectations (shallow vs. deep)?
  • Does focus remain appropriate (broad vs. narrow)?
  • Does the agent self-reflect when needed?

Scorer: cognitive_phase

The cognitive_phase scorer evaluates the agent's reasoning strategy transitions by analyzing the run trace:

testcase.ScorerConfig{
    Name: "cognitive_phase",
    Config: map[string]any{
        "expected_phases": []string{"analytical", "reflective", "methodical"},
        "depth_min":       0.7,
        "focus_min":       0.5,
    },
}
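
How the scorer matches expected_phases against the run trace is not spelled out here. One plausible reading is that the expected phases must appear in the trace in the given order, with other phases allowed in between. The sketch below illustrates that reading only; phasesInOrder is a hypothetical helper, not part of the Sentinel API, and the real scorer may use a different rule.

```go
package main

import "fmt"

// phasesInOrder reports whether every phase in expected appears in observed
// in the same relative order. Extra phases in observed are tolerated.
// Hypothetical helper for illustration; not a Sentinel function.
func phasesInOrder(expected, observed []string) bool {
	next := 0 // index of the next expected phase still to be found
	for _, phase := range observed {
		if next < len(expected) && phase == expected[next] {
			next++
		}
	}
	return next == len(expected)
}

func main() {
	expected := []string{"analytical", "reflective", "methodical"}

	// An extra "creative" phase between expected phases still matches.
	ok := phasesInOrder(expected, []string{"analytical", "creative", "reflective", "methodical"})
	fmt.Println(ok)

	// Reflecting before analyzing breaks the expected order.
	ok = phasesInOrder(expected, []string{"reflective", "analytical", "methodical"})
	fmt.Println(ok)
}
```

Under this interpretation, the first trace satisfies the configuration above and the second does not.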

Scenario: cognitive_stress

The cognitive_stress scenario generator creates test cases that require the agent to shift thinking strategies:

Case{
    ScenarioType: testcase.ScenarioCognitiveStress,
    Input:        "This function has both a performance bug and a security issue. Fix the critical one first, then address the other.",
    Context: map[string]any{
        "expected_phases": []string{"analytical", "methodical"},
        "depth_required":  "deep",
    },
}

Cognitive strategies

Strategy        Description
analytical      Structured, step-by-step analysis
creative        Exploratory, lateral thinking
methodical      Systematic, exhaustive approach
reactive        Quick response, minimal deliberation
reflective      Self-evaluating, iterative refinement
collaborative   Seeks input, considers multiple perspectives
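
Because expected_phases is a list of plain strings, a misspelled strategy name could silently never match. A small pre-flight check against the six strategies listed above can catch that early. This is a sketch under that assumption; knownStrategies and unknownStrategies are hypothetical names, and Sentinel may already validate these values itself.

```go
package main

import "fmt"

// knownStrategies holds the six strategy names from the table above.
var knownStrategies = map[string]bool{
	"analytical":    true,
	"creative":      true,
	"methodical":    true,
	"reactive":      true,
	"reflective":    true,
	"collaborative": true,
}

// unknownStrategies returns the entries of phases that are not one of the
// documented strategy names, preserving their original order.
// Hypothetical helper for illustration; not a Sentinel function.
func unknownStrategies(phases []string) []string {
	var unknown []string
	for _, p := range phases {
		if !knownStrategies[p] {
			unknown = append(unknown, p)
		}
	}
	return unknown
}

func main() {
	phases := []string{"analytical", "reflectve", "methodical"} // note the typo
	if bad := unknownStrategies(phases); len(bad) > 0 {
		fmt.Println("unknown strategies:", bad)
	}
}
```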

Dimension score

result.DimensionScores["cognition"] // 0.0 to 1.0
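
A common use of this score is to gate a test run on a minimum value. The sketch below assumes DimensionScores is a map[string]float64; the Result type and checkCognition function are stand-ins written for this example, not Sentinel's actual types. It checks for a missing key explicitly, because a plain map lookup would otherwise return 0.0 and look like a genuine zero score.

```go
package main

import "fmt"

// Result is a stand-in for the evaluation result type. Only the
// DimensionScores field is modelled, and its concrete type is an assumption.
type Result struct {
	DimensionScores map[string]float64
}

// checkCognition returns an error when the cognition score is absent
// or falls below min. Hypothetical helper for illustration.
func checkCognition(r Result, min float64) error {
	score, ok := r.DimensionScores["cognition"]
	if !ok {
		return fmt.Errorf("no cognition score recorded")
	}
	if score < min {
		return fmt.Errorf("cognition score %.2f is below minimum %.2f", score, min)
	}
	return nil
}

func main() {
	r := Result{DimensionScores: map[string]float64{"cognition": 0.82}}
	if err := checkCognition(r, 0.7); err != nil {
		fmt.Println("FAIL:", err)
		return
	}
	fmt.Println("cognition check passed")
}
```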

Use cases

  • Verify a code reviewer starts with analysis, then reflects on findings, then methodically addresses issues
  • Test that an agent switches from broad exploration to focused investigation when it finds a lead
  • Ensure deep analysis for complex tasks, quick responses for simple ones
