Assertions are the building blocks of apastra evaluations. You can attach them inline to dataset cases or quick eval files — no separate evaluator file required for simple checks.
{"case_id": "case-1", "inputs": {"text": "..."}, "assert": [{"type": "contains", "value": "summary"}, {"type": "is-json"}]}
Each assertion has a type and an optional value. When you run an eval, the agent applies every assertion in the assert array to the model output for that case and records pass (1) or fail (0).
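Conceptually, the runner walks the array and records a result per assertion. Here is a minimal sketch of that loop in Python (illustrative only, not apastra's implementation; it handles just two deterministic types):

```python
import json

# Minimal sketch of applying an assert array (not apastra's code;
# only "contains" and "is-json" are handled here).
def check(assertion: dict, output: str) -> int:
    t, value = assertion["type"], assertion.get("value")
    if t == "contains":
        return 1 if value in output else 0
    if t == "is-json":
        try:
            json.loads(output)
            return 1
        except ValueError:
            return 0
    raise ValueError(f"type not handled in this sketch: {t}")

case = {"assert": [{"type": "contains", "value": "summary"}, {"type": "is-json"}]}
output = '{"summary": "The fox jumped."}'
print([check(a, output) for a in case["assert"]])  # [1, 1]
```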
Deterministic assertions
Deterministic assertions run without calling a model. They are fast, free, and should be your first line of defense.
| Type | Description | Value format | Example value |
|---|---|---|---|
| equals | Output exactly matches the value (case-sensitive) | string | "Hello, World!" |
| contains | Output contains the substring (case-sensitive) | string | "Bonjour" |
| icontains | Output contains the substring (case-insensitive) | string | "bonjour" |
| contains-any | Output contains at least one value from the list | array of strings | ["hello", "hi", "hey"] |
| contains-all | Output contains every value in the list | array of strings | ["name", "age", "email"] |
| regex | Output matches the regular expression | regex string | "\\d{3}-\\d{4}" |
| starts-with | Output begins with the value | string | "Dear " |
| is-json | Output is valid JSON | (none) | — |
| contains-json | Output contains an embedded JSON block | (none) | — |
| is-valid-json-schema | Output matches the provided JSON Schema object | JSON Schema object | {"type": "object", "required": ["category"]} |
is-json and contains-json do not require a value field. Although the dataset-case schema marks value as required on the assertion object, for these two types you can omit it or pass null.
Examples
{"case_id": "exact", "inputs": {"q": "What is 2+2?"}, "assert": [{"type": "equals", "value": "4"}]}
{"case_id": "keyword", "inputs": {"text": "The fox jumped."}, "assert": [{"type": "icontains", "value": "fox"}]}
{"case_id": "multi-kw", "inputs": {"text": "Name, age, email provided."}, "assert": [{"type": "contains-all", "value": ["name", "age", "email"]}]}
{"case_id": "json-out", "inputs": {"q": "Return JSON."}, "assert": [{"type": "is-json"}]}
{"case_id": "schema-out", "inputs": {"q": "Classify."}, "assert": [{"type": "is-valid-json-schema", "value": {"type": "object", "required": ["category"]}}]}
Model-assisted assertions
Model-assisted assertions use a judge model to evaluate output quality when deterministic checks are not sufficient (tone, coherence, factual accuracy, relevance).
| Type | Description | Value format |
|---|---|---|
| similar | Semantic similarity to a reference string. Requires a threshold (0–1). | string (reference text) |
| llm-rubric | The judge model grades the output using a rubric prompt you provide. | string (rubric text) |
| factuality | Checks that the output is factually consistent with the reference. | string (reference facts) |
| answer-relevance | Rates how relevant the output is to the input question. | (none) |
similar threshold
For similar, include a threshold field alongside value:
{"type": "similar", "value": "The fox jumped over the dog.", "threshold": 0.8}
A threshold of 0.8 means the output must be at least 80% semantically similar to the reference. Lower thresholds allow more variation.
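One way to picture the check: embed both strings and compare cosine similarity against the threshold. The sketch below assumes a hypothetical embed function (the toy embedding is for the demo only; apastra's actual similarity scoring is not specified here):

```python
import math

# Sketch of the threshold check, not apastra's scorer.
def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

def similar_passes(output: str, reference: str, threshold: float, embed) -> bool:
    return cosine(embed(output), embed(reference)) >= threshold

# Toy embedding for the demo: counts of a few keywords.
def toy_embed(s: str) -> list[float]:
    return [s.lower().count(w) for w in ("fox", "dog", "jumped")]

print(similar_passes("The fox jumped over the dog.",
                     "A fox jumped over a lazy dog.", 0.8, toy_embed))  # True
```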
Writing good rubrics for llm-rubric
When using llm-rubric, specificity matters:
{"type": "llm-rubric", "value": "Does the response mention the company name in the first sentence? Is it under 100 words? Does it use a professional tone?"}
Vague rubrics (“Is the output good?”) produce unreliable scores. Ask for binary or numeric scales, and version your rubrics — changing rubric text changes what the metric means.
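For intuition, here is a sketch of how a rubric might be wrapped into a judge prompt. The prompt shape and the PASS/FAIL scale are assumptions, not apastra's actual judge template:

```python
# Illustrative only: turns a rubric string into a grading prompt.
def build_judge_prompt(rubric: str, output: str) -> str:
    return (
        "You are grading a model response against a rubric.\n"
        f"Rubric: {rubric}\n"
        f"Response:\n{output}\n\n"
        "Answer PASS or FAIL for each rubric question, then give an overall verdict."
    )

rubric = ("Does the response mention the company name in the first sentence? "
          "Is it under 100 words? Does it use a professional tone?")
print(build_judge_prompt(rubric, "Welcome to Acme Corp! ..."))
```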
Examples
{"case_id": "semantic", "inputs": {"q": "Summarize the article."}, "assert": [{"type": "similar", "value": "The article is about climate change.", "threshold": 0.75}]}
{"case_id": "rubric", "inputs": {"q": "Write a welcome email."}, "assert": [{"type": "llm-rubric", "value": "Is the email professional, under 150 words, and does it include a greeting?"}]}
{"case_id": "facts", "inputs": {"q": "Who invented the telephone?"}, "assert": [{"type": "factuality", "value": "Alexander Graham Bell invented the telephone in 1876."}]}
{"case_id": "relevance", "inputs": {"q": "What is the capital of France?"}, "assert": [{"type": "answer-relevance"}]}
Performance assertions
Performance assertions check system-level properties rather than output content.
| Type | Description | Value (threshold) |
|---|---|---|
| latency | Response time in milliseconds must be below the threshold. | number (ms) |
| cost | Token cost in dollars must be below the threshold. | number (dollars) |
Examples
{"type": "latency", "value": 2000}
{"type": "cost", "value": 0.005}
Use latency and cost assertions in release-candidate suites to enforce SLAs before shipping a prompt to production.
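Both checks reduce to simple threshold comparisons. The sketch below uses assumed token counts and per-token prices; apastra measures latency and computes cost for you:

```python
import time

# Sketch of the two performance checks, not apastra's implementation.
start = time.monotonic()
# ... model call goes here ...
elapsed_ms = (time.monotonic() - start) * 1000

prompt_tokens, completion_tokens = 420, 180     # assumed token counts
price_in, price_out = 0.25 / 1e6, 1.25 / 1e6    # assumed dollars per token
cost = prompt_tokens * price_in + completion_tokens * price_out

print(elapsed_ms < 2000)   # latency assertion with value 2000
print(cost < 0.005)        # cost assertion with value 0.005
```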
Negation
Any assertion type can be negated by prepending not- to the type name. The result is inverted: the assertion passes when the original assertion would fail.
| Negated type | What it checks |
|---|---|
| not-equals | Output does NOT exactly match the value |
| not-contains | Output does NOT contain the substring |
| not-icontains | Output does NOT contain the substring (case-insensitive) |
| not-regex | Output does NOT match the regex |
| not-is-json | Output is NOT valid JSON |
| not-contains-json | Output does NOT contain a JSON block |
Examples
{"case_id": "no-pii", "inputs": {"q": "Generate a greeting."}, "assert": [{"type": "not-regex", "value": "[0-9a-f]{8}-[0-9a-f]{4}"}]}
{"case_id": "no-refusal", "inputs": {"q": "Translate to French."}, "assert": [{"type": "not-icontains", "value": "i cannot"}]}
{"case_id": "plain-text", "inputs": {"q": "Describe the weather."}, "assert": [{"type": "not-is-json"}]}
Assertion precedence
If a suite references evaluator files AND dataset cases contain inline assert blocks, both apply. They are additive:
- Evaluator files are per-suite: they score every case in the suite.
- Inline assert blocks are per-case: they score only that specific case.
The case’s assert_pass_rate = (assertions passed) / (total assertions in the assert array). This is reported separately from evaluator metric scores.
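A worked example of the arithmetic:

```python
# A case with three inline assertions, two of which pass.
results = [1, 1, 0]
assert_pass_rate = sum(results) / len(results)
print(assert_pass_rate)  # 0.666... (2 of 3 assertions passed)
```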
Quick eval assertions
In a quick eval file, assertions are embedded directly in each case:
```yaml
id: summarize-quick
prompt: "Summarize in {{max_length}} words: {{text}}"
cases:
  - id: short
    inputs: { text: "The fox jumps over the dog.", max_length: "10" }
    assert:
      - type: icontains
        value: "fox"
  - id: empty-input
    inputs:
      text: ""
      max_length: "10"
    assert:
      - type: regex
        value: ".*"
  - id: no-lorem
    inputs: { text: "Hello world.", max_length: "5" }
    assert:
      - type: not-contains
        value: "Lorem ipsum"
thresholds:
  pass_rate: 1.0
```
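With pass_rate: 1.0, a single failing assertion fails the run. The sketch below shows one plausible aggregation; whether apastra averages over assertions or over cases is an assumption here:

```python
# Assumed aggregation: flatten per-assertion results across cases.
results_by_case = {"short": [1], "empty-input": [1], "no-lorem": [1]}
flat = [r for rs in results_by_case.values() for r in rs]
pass_rate = sum(flat) / len(flat)
print(pass_rate >= 1.0)  # True: the quick eval passes
```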
Decision table
Use this table to choose the right assertion for what you want to check.
| I want to check… | Use this assertion | Example |
|---|---|---|
| Output contains specific keywords | contains / icontains | {"type": "icontains", "value": "summary"} |
| Output is valid JSON | is-json | {"type": "is-json"} |
| Output matches a specific JSON structure | is-valid-json-schema | {"type": "is-valid-json-schema", "value": {"type": "object", "required": ["category"]}} |
| Output doesn’t leak internal data | not-regex | {"type": "not-regex", "value": "[0-9a-f]{8}-[0-9a-f]{4}"} |
| Output is semantically similar to a reference | similar | {"type": "similar", "value": "expected answer", "threshold": 0.8} |
| Output quality requires judgment | llm-rubric | {"type": "llm-rubric", "value": "Is the response helpful, accurate, and concise?"} |
| Output mentions at least one of several options | contains-any | {"type": "contains-any", "value": ["yes", "correct", "affirmative"]} |
| Output must mention all required fields | contains-all | {"type": "contains-all", "value": ["name", "email", "phone"]} |
| Output begins with a specific prefix | starts-with | {"type": "starts-with", "value": "Dear "} |
| Output must NOT contain something | not-contains | {"type": "not-contains", "value": "error"} |
| Response is fast enough | latency | {"type": "latency", "value": 1000} |
| Response is within cost budget | cost | {"type": "cost", "value": 0.01} |
Assertion type pattern
The dataset-case schema validates assertion types against this pattern:
```
^(not-)?(equals|contains|icontains|contains-any|contains-all|regex|starts-with|is-json|contains-json|is-valid-json-schema|similar|llm-rubric|factuality|answer-relevance|latency|cost)$
```
Any type not matching this pattern will fail schema validation.
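You can sanity-check a type name against the pattern before writing a case, as in this Python snippet:

```python
import re

# The schema pattern, verbatim from above.
TYPE_PATTERN = re.compile(
    r"^(not-)?(equals|contains|icontains|contains-any|contains-all|regex"
    r"|starts-with|is-json|contains-json|is-valid-json-schema|similar"
    r"|llm-rubric|factuality|answer-relevance|latency|cost)$"
)

print(bool(TYPE_PATTERN.match("not-icontains")))   # True
print(bool(TYPE_PATTERN.match("not-not-equals")))  # False: only one not- prefix
```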