Behavior Evaluation

Behavior evaluation tests whether an agent correctly executes condition-triggered action patterns: the "habits" and "reflexes" that should fire in specific contexts.

What it tests

Does the agent recognize trigger conditions?
Does it execute the correct action when triggered?
Is the trigger-action pattern reliable across variations?
Does priority ordering work when multiple behaviors could fire?
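
The priority-ordering question can be illustrated with a small sketch. The `Behavior` type, its fields, `exampleBehaviors`, and `selectBehavior` are hypothetical names chosen for illustration, not part of the framework's API. The sketch assumes triggers are regular expressions and that a higher `Priority` value wins.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
)

// Behavior is a hypothetical trigger-action pair used only for this sketch.
type Behavior struct {
	Name     string
	Trigger  *regexp.Regexp // condition checked against the input
	Action   string         // action to run when the trigger matches
	Priority int            // assumption: higher value wins
}

// exampleBehaviors returns two behaviors whose triggers can overlap.
func exampleBehaviors() []Behavior {
	return []Behavior{
		{Name: "style-review", Trigger: regexp.MustCompile(`(?i)login|form`), Action: "suggest_refactor", Priority: 1},
		{Name: "security-alert", Trigger: regexp.MustCompile(`(?i)security|vulnerability|CVE`), Action: "inject_prompt", Priority: 10},
	}
}

// selectBehavior returns the highest-priority behavior whose trigger matches
// the input. The boolean is false when no trigger matches.
func selectBehavior(behaviors []Behavior, input string) (Behavior, bool) {
	var matched []Behavior
	for _, b := range behaviors {
		if b.Trigger.MatchString(input) {
			matched = append(matched, b)
		}
	}
	if len(matched) == 0 {
		return Behavior{}, false
	}
	// A stable sort keeps declaration order among equal priorities.
	sort.SliceStable(matched, func(i, j int) bool {
		return matched[i].Priority > matched[j].Priority
	})
	return matched[0], true
}

func main() {
	input := "Possible security vulnerability in the login form"
	if b, ok := selectBehavior(exampleBehaviors(), input); ok {
		fmt.Printf("%s -> %s\n", b.Name, b.Action)
	}
}
```

Both triggers match this input, so a test of priority ordering would assert that the higher-priority `security-alert` behavior is the one selected.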

Scorer: `behavior_trigger`

The `behavior_trigger` scorer evaluates whether the expected behavior was activated:

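testcase.ScorerConfig{
    // Selects the behavior_trigger scorer.
    Name: "behavior_trigger",
    Config: map[string]any{
        // When the trigger is checked, and the pattern that activates it.
        "trigger_type":    "on_input",
        "trigger_pattern": "security|vulnerability|CVE",
        // Action the agent is expected to take once triggered.
        "expected_action": "inject_prompt",
        "action_value":    "security-focused analysis",
    },
}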
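
To make the configuration concrete, here is a minimal sketch of one decision rule such a scorer could apply. It is not the framework's implementation: `scoreTrigger` is a hypothetical function, and the sketch assumes that `trigger_pattern` is a regular expression and that the expected action should be observed if and only if the pattern matches the input.

```go
package main

import (
	"fmt"
	"regexp"
)

// scoreTrigger is a hypothetical stand-in for the scorer's decision rule.
// Assumption: the expected action should be observed if and only if the
// trigger pattern matches the input.
func scoreTrigger(pattern, input, expectedAction, observedAction string) (float64, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return 0, fmt.Errorf("invalid trigger pattern %q: %w", pattern, err)
	}
	triggered := re.MatchString(input)
	fired := observedAction == expectedAction
	if triggered == fired {
		return 1.0, nil // fired when it should, or stayed quiet when it should
	}
	return 0.0, nil // missed trigger or spurious activation
}

func main() {
	score, err := scoreTrigger(
		"security|vulnerability|CVE",
		"Is this a known vulnerability?",
		"inject_prompt",
		"inject_prompt",
	)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("score:", score)
}
```

Under this assumption, note that Go's `regexp` package matches case-sensitively by default, so a lowercase "cve" would not match unless the pattern is prefixed with `(?i)`.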

Scenario: `behavior_trigger`

The `behavior_trigger` scenario generator creates test cases with specific trigger conditions:

Case{
    ScenarioType: testcase.ScenarioBehaviorTrigger,
    Input:        "I found a potential SQL injection in the login form",
    Context: map[string]any{
        "behavior":        "security-alert",
        "trigger":         "on_input",
        "expected_action": "escalate and detail security implications",
    },
}

Dimension score

Results report this dimension under the "behavior" key:

result.DimensionScores["behavior"] // 0.0 to 1.0
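
A test suite might gate on this score with a threshold. The following is a sketch under stated assumptions: `Result` is a hypothetical stand-in for the framework's result type, `DimensionScores` is assumed to be a `map[string]float64`, and `behaviorPasses` is a helper invented for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// Result is a hypothetical stand-in for the evaluation result type.
// Only the DimensionScores name comes from the documentation; its exact
// type is an assumption.
type Result struct {
	DimensionScores map[string]float64
}

// behaviorPasses reports whether the behavior score meets the threshold.
// It returns an error when the score is missing or outside [0, 1].
func behaviorPasses(r Result, threshold float64) (bool, error) {
	score, ok := r.DimensionScores["behavior"]
	if !ok {
		return false, errors.New(`no "behavior" dimension score recorded`)
	}
	if score < 0 || score > 1 {
		return false, fmt.Errorf("behavior score %v outside [0, 1]", score)
	}
	return score >= threshold, nil
}

func main() {
	r := Result{DimensionScores: map[string]float64{"behavior": 0.75}}
	ok, err := behaviorPasses(r, 0.8)
	fmt.Println(ok, err)
}
```

Checking for a missing key separately matters here, because indexing a Go map with an absent key silently yields 0 and would otherwise look like a genuine failing score.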

Use cases

Verify a code review agent activates security analysis when security keywords appear
Test that a support agent escalates when detecting frustrated customers
Ensure a data agent switches to careful mode when handling PII
