Assertions are the building blocks of apastra evaluations. You can attach them inline to dataset cases or quick eval files — no separate evaluator file required for simple checks.
{"case_id": "case-1", "inputs": {"text": "..."}, "assert": [{"type": "contains", "value": "summary"}, {"type": "is-json"}]}
Each assertion has a type and an optional value. When you run an eval, the agent applies every assertion in the assert array to the model output for that case and records pass (1) or fail (0).
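Conceptually, the runner walks the array and records a result per assertion. Here is a minimal sketch of that loop in Python (illustrative only, not apastra's implementation; it handles just two deterministic types):

```python
import json

# Minimal sketch of applying an assert array (not apastra's code;
# only "contains" and "is-json" are handled here).
def check(assertion: dict, output: str) -> int:
    t, value = assertion["type"], assertion.get("value")
    if t == "contains":
        return 1 if value in output else 0
    if t == "is-json":
        try:
            json.loads(output)
            return 1
        except ValueError:
            return 0
    raise ValueError(f"type not handled in this sketch: {t}")

case = {"assert": [{"type": "contains", "value": "summary"}, {"type": "is-json"}]}
output = '{"summary": "The fox jumped."}'
print([check(a, output) for a in case["assert"]])  # [1, 1]
```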
Deterministic assertions
Deterministic assertions run without calling a model. They are fast, free, and should be your first line of defense.
| Type | Description | Value format | Example value |
|---|---|---|---|
| equals | Output exactly matches the value (case-sensitive) | string | "Hello, World!" |
| contains | Output contains the substring (case-sensitive) | string | "Bonjour" |
| icontains | Output contains the substring (case-insensitive) | string | "bonjour" |
| contains-any | Output contains at least one value from the list | array of strings | ["hello", "hi", "hey"] |
| contains-all | Output contains every value in the list | array of strings | ["name", "age", "email"] |
| regex | Output matches the regular expression | regex string | "\\d{3}-\\d{4}" |
| starts-with | Output begins with the value | string | "Dear " |
| is-json | Output is valid JSON | (none) | — |
| contains-json | Output contains an embedded JSON block | (none) | — |
| is-valid-json-schema | Output matches the provided JSON Schema object | JSON Schema object | {"type": "object", "required": ["category"]} |
is-json and contains-json do not require a value field. Although the dataset-case schema marks value as required on the assertion object, for these two types you can omit it or pass null.
Examples
{"case_id": "exact", "inputs": {"q": "What is 2+2?"}, "assert": [{"type": "equals", "value": "4"}]}
{"case_id": "keyword", "inputs": {"text": "The fox jumped."}, "assert": [{"type": "icontains", "value": "fox"}]}
{"case_id": "multi-kw", "inputs": {"text": "Name, age, email provided."}, "assert": [{"type": "contains-all", "value": ["name", "age", "email"]}]}
{"case_id": "json-out", "inputs": {"q": "Return JSON."}, "assert": [{"type": "is-json"}]}
{"case_id": "schema-out", "inputs": {"q": "Classify."}, "assert": [{"type": "is-valid-json-schema", "value": {"type": "object", "required": ["category"]}}]}
Model-assisted assertions
Model-assisted assertions use a judge model to evaluate output quality when deterministic checks are not sufficient (tone, coherence, factual accuracy, relevance).
| Type | Description | Value format |
|---|---|---|
| similar | Semantic similarity to a reference string. Requires a threshold (0–1). | string (reference text) |
| llm-rubric | The judge model grades the output using a rubric prompt you provide. | string (rubric text) |
| factuality | Checks that the output is factually consistent with the reference. | string (reference facts) |
| answer-relevance | Rates how relevant the output is to the input question. | (none) |
similar threshold
For similar, include a threshold field alongside value:
{"type": "similar", "value": "The fox jumped over the dog.", "threshold": 0.8}
A threshold of 0.8 means the output must be at least 80% semantically similar to the reference. Lower thresholds allow more variation.
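One way to picture the check: embed both strings and compare cosine similarity against the threshold. The sketch below assumes a hypothetical embed function (the toy embedding is for the demo only; apastra's actual similarity scoring is not specified here):

```python
import math

# Sketch of the threshold check, not apastra's scorer.
def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

def similar_passes(output: str, reference: str, threshold: float, embed) -> bool:
    return cosine(embed(output), embed(reference)) >= threshold

# Toy embedding for the demo: counts of a few keywords.
def toy_embed(s: str) -> list[float]:
    return [s.lower().count(w) for w in ("fox", "dog", "jumped")]

print(similar_passes("The fox jumped over the dog.",
                     "A fox jumped over a lazy dog.", 0.8, toy_embed))  # True
```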
Writing good rubrics for llm-rubric
When using llm-rubric, specificity matters:
{"type": "llm-rubric", "value": "Does the response mention the company name in the first sentence? Is it under 100 words? Does it use a professional tone?"}
Vague rubrics (“Is the output good?”) produce unreliable scores. Ask for binary or numeric scales, and version your rubrics — changing rubric text changes what the metric means.
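For intuition, here is a sketch of how a rubric might be wrapped into a judge prompt. The prompt shape and the PASS/FAIL scale are assumptions, not apastra's actual judge template:

```python
# Illustrative only: turns a rubric string into a grading prompt.
def build_judge_prompt(rubric: str, output: str) -> str:
    return (
        "You are grading a model response against a rubric.\n"
        f"Rubric: {rubric}\n"
        f"Response:\n{output}\n\n"
        "Answer PASS or FAIL for each rubric question, then give an overall verdict."
    )

rubric = ("Does the response mention the company name in the first sentence? "
          "Is it under 100 words? Does it use a professional tone?")
print(build_judge_prompt(rubric, "Welcome to Acme Corp! ..."))
```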
Examples
{"case_id": "semantic", "inputs": {"q": "Summarize the article."}, "assert": [{"type": "similar", "value": "The article is about climate change.", "threshold": 0.75}]}
{"case_id": "rubric", "inputs": {"q": "Write a welcome email."}, "assert": [{"type": "llm-rubric", "value": "Is the email professional, under 150 words, and does it include a greeting?"}]}
{"case_id": "facts", "inputs": {"q": "Who invented the telephone?"}, "assert": [{"type": "factuality", "value": "Alexander Graham Bell invented the telephone in 1876."}]}
{"case_id": "relevance", "inputs": {"q": "What is the capital of France?"}, "assert": [{"type": "answer-relevance"}]}
Performance assertions
Performance assertions check system-level properties rather than output content.
| Type | Description | Value (threshold) |
|---|---|---|
| latency | Response time in milliseconds must be below the threshold. | number (ms) |
| cost | Token cost in dollars must be below the threshold. | number (dollars) |
Examples
{"type": "latency", "value": 2000}
{"type": "cost", "value": 0.005}
Use latency and cost assertions in release-candidate suites to enforce SLAs before shipping a prompt to production.
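Both checks reduce to simple threshold comparisons. The sketch below uses assumed token counts and per-token prices; apastra measures latency and computes cost for you:

```python
import time

# Sketch of the two performance checks, not apastra's implementation.
start = time.monotonic()
# ... model call goes here ...
elapsed_ms = (time.monotonic() - start) * 1000

prompt_tokens, completion_tokens = 420, 180     # assumed token counts
price_in, price_out = 0.25 / 1e6, 1.25 / 1e6    # assumed dollars per token
cost = prompt_tokens * price_in + completion_tokens * price_out

print(elapsed_ms < 2000)   # latency assertion with value 2000
print(cost < 0.005)        # cost assertion with value 0.005
```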
Negation
Any assertion type can be negated by prepending not- to the type name. The result is inverted: the assertion passes when the original assertion would fail.
| Negated type | What it checks |
|---|---|
| not-equals | Output does NOT exactly match the value |
| not-contains | Output does NOT contain the substring |
| not-icontains | Output does NOT contain the substring (case-insensitive) |
| not-regex | Output does NOT match the regex |
| not-is-json | Output is NOT valid JSON |
| not-contains-json | Output does NOT contain a JSON block |
Examples
{"case_id": "no-pii", "inputs": {"q": "Generate a greeting."}, "assert": [{"type": "not-regex", "value": "[0-9a-f]{8}-[0-9a-f]{4}"}]}
{"case_id": "no-refusal", "inputs": {"q": "Translate to French."}, "assert": [{"type": "not-icontains", "value": "i cannot"}]}
{"case_id": "plain-text", "inputs": {"q": "Describe the weather."}, "assert": [{"type": "not-is-json"}]}
Assertion precedence
If a suite references evaluator files AND dataset cases contain inline assert blocks, both apply. They are additive:
- Evaluator files are per-suite: they score every case in the suite.
- Inline assert blocks are per-case: they score only that specific case.
The case’s assert_pass_rate = (assertions passed) / (total assertions in the assert array). This is reported separately from evaluator metric scores.
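A worked example of the arithmetic:

```python
# A case with three inline assertions, two of which pass.
results = [1, 1, 0]
assert_pass_rate = sum(results) / len(results)
print(assert_pass_rate)  # 0.666... (2 of 3 assertions passed)
```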
Quick eval assertions
In a quick eval file, assertions are embedded directly in each case:
```yaml
id: summarize-quick
prompt: "Summarize in {{max_length}} words: {{text}}"
cases:
  - id: short
    inputs: { text: "The fox jumps over the dog.", max_length: "10" }
    assert:
      - type: icontains
        value: "fox"
  - id: empty-input
    inputs:
      text: ""
      max_length: "10"
    assert:
      - type: regex
        value: ".*"
  - id: no-lorem
    inputs: { text: "Hello world.", max_length: "5" }
    assert:
      - type: not-contains
        value: "Lorem ipsum"
thresholds:
  pass_rate: 1.0
```
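With pass_rate: 1.0, a single failing assertion fails the run. The sketch below shows one plausible aggregation; whether apastra averages over assertions or over cases is an assumption here:

```python
# Assumed aggregation: flatten per-assertion results across cases.
results_by_case = {"short": [1], "empty-input": [1], "no-lorem": [1]}
flat = [r for rs in results_by_case.values() for r in rs]
pass_rate = sum(flat) / len(flat)
print(pass_rate >= 1.0)  # True: the quick eval passes
```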
Decision table
Use this table to choose the right assertion for what you want to check.
| I want to check… | Use this assertion | Example |
|---|---|---|
| Output contains specific keywords | contains / icontains | {"type": "icontains", "value": "summary"} |
| Output is valid JSON | is-json | {"type": "is-json"} |
| Output matches a specific JSON structure | is-valid-json-schema | {"type": "is-valid-json-schema", "value": {"type": "object", "required": ["category"]}} |
| Output doesn’t leak internal data | not-regex | {"type": "not-regex", "value": "[0-9a-f]{8}-[0-9a-f]{4}"} |
| Output is semantically similar to a reference | similar | {"type": "similar", "value": "expected answer", "threshold": 0.8} |
| Output quality requires judgment | llm-rubric | {"type": "llm-rubric", "value": "Is the response helpful, accurate, and concise?"} |
| Output mentions at least one of several options | contains-any | {"type": "contains-any", "value": ["yes", "correct", "affirmative"]} |
| Output must mention all required fields | contains-all | {"type": "contains-all", "value": ["name", "email", "phone"]} |
| Output begins with a specific prefix | starts-with | {"type": "starts-with", "value": "Dear "} |
| Output must NOT contain something | not-contains | {"type": "not-contains", "value": "error"} |
| Response is fast enough | latency | {"type": "latency", "value": 1000} |
| Response is within cost budget | cost | {"type": "cost", "value": 0.01} |
Assertion type pattern
The dataset-case schema validates assertion types against this pattern:
```
^(not-)?(equals|contains|icontains|contains-any|contains-all|regex|starts-with|is-json|contains-json|is-valid-json-schema|similar|llm-rubric|factuality|answer-relevance|latency|cost)$
```
Any type not matching this pattern will fail schema validation.
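You can sanity-check a type name against the pattern before writing a case, as in this Python snippet:

```python
import re

# The schema pattern, verbatim from above.
TYPE_PATTERN = re.compile(
    r"^(not-)?(equals|contains|icontains|contains-any|contains-all|regex"
    r"|starts-with|is-json|contains-json|is-valid-json-schema|similar"
    r"|llm-rubric|factuality|answer-relevance|latency|cost)$"
)

print(bool(TYPE_PATTERN.match("not-icontains")))   # True
print(bool(TYPE_PATTERN.match("not-not-equals")))  # False: only one not- prefix
```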