

Assertions are the building blocks of apastra evaluations. You can attach them inline to dataset cases or quick eval files — no separate evaluator file required for simple checks.
{"case_id": "case-1", "inputs": {"text": "..."}, "assert": [{"type": "contains", "value": "summary"}, {"type": "is-json"}]}
Each assertion has a type and an optional value. When you run an eval, the agent applies every assertion in the assert array to the model output for that case and records pass (1) or fail (0).

Deterministic assertions

Deterministic assertions run without calling a model. They are fast, free, and should be your first line of defense.
  • equals: Output exactly matches the value (case-sensitive). Value: string, e.g. "Hello, World!"
  • contains: Output contains the substring (case-sensitive). Value: string, e.g. "Bonjour"
  • icontains: Output contains the substring (case-insensitive). Value: string, e.g. "bonjour"
  • contains-any: Output contains at least one value from the list. Value: array of strings, e.g. ["hello", "hi", "hey"]
  • contains-all: Output contains every value in the list. Value: array of strings, e.g. ["name", "age", "email"]
  • regex: Output matches the regular expression. Value: regex string, e.g. "\\d{3}-\\d{4}"
  • starts-with: Output begins with the value. Value: string, e.g. "Dear "
  • is-json: Output is valid JSON. Value: none.
  • contains-json: Output contains an embedded JSON block. Value: none.
  • is-valid-json-schema: Output matches the provided JSON Schema object. Value: JSON Schema object, e.g. {"type": "object", "required": ["category"]}
is-json and contains-json do not require a value field. The dataset-case schema marks value as required on the assertion object, but for these two types you can omit it or pass null.

Examples

{"case_id": "exact", "inputs": {"q": "What is 2+2?"}, "assert": [{"type": "equals", "value": "4"}]}
{"case_id": "keyword", "inputs": {"text": "The fox jumped."}, "assert": [{"type": "icontains", "value": "fox"}]}
{"case_id": "multi-kw", "inputs": {"text": "Name, age, email provided."}, "assert": [{"type": "contains-all", "value": ["name", "age", "email"]}]}
{"case_id": "json-out", "inputs": {"q": "Return JSON."}, "assert": [{"type": "is-json"}]}
{"case_id": "schema-out", "inputs": {"q": "Classify."}, "assert": [{"type": "is-valid-json-schema", "value": {"type": "object", "required": ["category"]}}]}
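The semantics above can be sketched in plain Python. This is a hypothetical helper for illustration, not apastra's implementation; `check_assertion` and the names in it are invented here, and only a subset of types is shown:

```python
import json
import re

def check_assertion(output: str, assertion: dict) -> bool:
    """Apply one deterministic assertion to a model output (hypothetical helper)."""
    a_type, value = assertion["type"], assertion.get("value")
    if a_type == "equals":
        return output == value                      # case-sensitive exact match
    if a_type == "contains":
        return value in output                      # case-sensitive substring
    if a_type == "icontains":
        return value.lower() in output.lower()      # case-insensitive substring
    if a_type == "contains-any":
        return any(v in output for v in value)
    if a_type == "contains-all":
        return all(v in output for v in value)
    if a_type == "regex":
        return re.search(value, output) is not None
    if a_type == "starts-with":
        return output.startswith(value)
    if a_type == "is-json":                         # no value field needed
        try:
            json.loads(output)
            return True
        except ValueError:
            return False
    raise ValueError(f"unsupported assertion type: {a_type}")

# Each assertion is recorded as pass (1) or fail (0) for the case:
output = '{"category": "news"}'
scores = [int(check_assertion(output, a))
          for a in ({"type": "icontains", "value": "NEWS"}, {"type": "is-json"})]
# scores == [1, 1]
```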

Model-assisted assertions

Model-assisted assertions use a judge model to evaluate output quality when deterministic checks are not sufficient (tone, coherence, factual accuracy, relevance).
  • similar: Semantic similarity to a reference string; requires a threshold (0–1). Value: string (reference text).
  • llm-rubric: The judge model grades the output using a rubric prompt you provide. Value: string (rubric text).
  • factuality: Checks that the output is factually consistent with the reference. Value: string (reference facts).
  • answer-relevance: Rates how relevant the output is to the input question. Value: none.

similar threshold

For similar, include a threshold field alongside value:
{"type": "similar", "value": "The fox jumped over the dog.", "threshold": 0.8}
A threshold of 0.8 means the output must be at least 80% semantically similar to the reference. Lower thresholds allow more variation.
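In a real judge the similarity score comes from an embedding model. To illustrate only the threshold mechanics, here is a sketch that substitutes a lexical ratio (difflib) for the semantic score; `similar_passes` is an invented name:

```python
from difflib import SequenceMatcher

def similar_passes(output: str, reference: str, threshold: float) -> bool:
    # Stand-in score: lexical similarity via difflib, NOT the embedding-based
    # semantic similarity a real judge model would compute.
    score = SequenceMatcher(None, output.lower(), reference.lower()).ratio()
    return score >= threshold

similar_passes("The fox jumped over the dog.",
               "The fox jumped over the dog.", 0.8)   # identical text passes
```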

Writing good rubrics for llm-rubric

When using llm-rubric, specificity matters:
{"type": "llm-rubric", "value": "Does the response mention the company name in the first sentence? Is it under 100 words? Does it use a professional tone?"}
Vague rubrics (“Is the output good?”) produce unreliable scores. Ask for binary or numeric scales, and version your rubrics — changing rubric text changes what the metric means.

Examples

{"case_id": "semantic", "inputs": {"q": "Summarize the article."}, "assert": [{"type": "similar", "value": "The article is about climate change.", "threshold": 0.75}]}
{"case_id": "rubric", "inputs": {"q": "Write a welcome email."}, "assert": [{"type": "llm-rubric", "value": "Is the email professional, under 150 words, and does it include a greeting?"}]}
{"case_id": "facts", "inputs": {"q": "Who invented the telephone?"}, "assert": [{"type": "factuality", "value": "Alexander Graham Bell invented the telephone in 1876."}]}
{"case_id": "relevance", "inputs": {"q": "What is the capital of France?"}, "assert": [{"type": "answer-relevance"}]}

Performance assertions

Performance assertions check system-level properties rather than output content.
  • latency: Response time in milliseconds must be below the threshold. Value: number (ms).
  • cost: Token cost in dollars must be below the threshold. Value: number (dollars).

Examples

{"type": "latency", "value": 2000}
{"type": "cost", "value": 0.005}
Use latency and cost assertions in release-candidate suites to enforce SLAs before shipping a prompt to production.
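Mechanically, both checks are threshold comparisons against measured run metrics. A minimal sketch, with `performance_passes` and the `measured` dict invented here for illustration:

```python
def performance_passes(assertion: dict, measured: dict) -> bool:
    """Pass when the measured metric is below the assertion's threshold."""
    metric = assertion["type"]            # "latency" (ms) or "cost" (dollars)
    return measured[metric] < assertion["value"]

run = {"latency": 1480, "cost": 0.0031}   # example measurements for one case
results = [performance_passes(a, run)
           for a in ({"type": "latency", "value": 2000},
                     {"type": "cost", "value": 0.005})]
# results == [True, True]: the run meets both budgets
```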

Negation

Any assertion type can be negated by prepending not- to the type name. The result is inverted: the assertion passes when the original assertion would fail.
  • not-equals: Output does NOT exactly match the value.
  • not-contains: Output does NOT contain the substring.
  • not-icontains: Output does NOT contain the substring (case-insensitive).
  • not-regex: Output does NOT match the regex.
  • not-is-json: Output is NOT valid JSON.
  • not-contains-json: Output does NOT contain a JSON block.

Examples

{"case_id": "no-pii", "inputs": {"q": "Generate a greeting."}, "assert": [{"type": "not-regex", "value": "[0-9a-f]{8}-[0-9a-f]{4}"}]}
{"case_id": "no-refusal", "inputs": {"q": "Translate to French."}, "assert": [{"type": "not-icontains", "value": "i cannot"}]}
{"case_id": "plain-text", "inputs": {"q": "Describe the weather."}, "assert": [{"type": "not-is-json"}]}
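Negation can be sketched as a thin wrapper that strips the not- prefix, evaluates the base assertion, and inverts the result. This is a hypothetical helper with only two base types shown, not apastra's code:

```python
import re

def check(output: str, assertion: dict) -> bool:
    a_type = assertion["type"]
    if a_type.startswith("not-"):
        # Evaluate the base assertion, then invert its result.
        base = dict(assertion, type=a_type[len("not-"):])
        return not check(output, base)
    if a_type == "icontains":
        return assertion["value"].lower() in output.lower()
    if a_type == "regex":
        return re.search(assertion["value"], output) is not None
    raise ValueError(f"unsupported assertion type: {a_type}")

check("Bonjour le monde!", {"type": "not-icontains", "value": "i cannot"})  # True: no refusal
```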

Assertion precedence

If a suite references evaluator files AND dataset cases contain inline assert blocks, both apply. They are additive:
  • Evaluator files are per-suite — they score every case in the suite.
  • Inline assert blocks are per-case — they score only that specific case.
The case’s assert_pass_rate = (assertions passed) / (total assertions in the assert array). This is reported separately from evaluator metric scores.
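The pass-rate formula is a straight ratio. A one-line sketch (the function name is invented for illustration):

```python
def assert_pass_rate(results: list[int]) -> float:
    # results holds 1 (pass) or 0 (fail) per assertion in the case's assert array
    return sum(results) / len(results)

assert_pass_rate([1, 1, 0, 1])   # 3 of 4 assertions passed -> 0.75
```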

Quick eval assertions

In a quick eval file, assertions are embedded directly in each case:
id: summarize-quick
prompt: "Summarize in {{max_length}} words: {{text}}"
cases:
  - id: short
    inputs: { text: "The fox jumps over the dog.", max_length: "10" }
    assert:
      - type: icontains
        value: "fox"
  - id: empty-input
    inputs:
      text: ""
      max_length: "10"
    assert:
      - type: regex
        value: ".*"
  - id: no-lorem
    inputs: { text: "Hello world.", max_length: "5" }
    assert:
      - type: not-contains
        value: "Lorem ipsum"
thresholds:
  pass_rate: 1.0

Decision table

Use this table to choose the right assertion for what you want to check.
  • Output contains specific keywords: contains / icontains, e.g. {"type": "icontains", "value": "summary"}
  • Output is valid JSON: is-json, e.g. {"type": "is-json"}
  • Output matches a specific JSON structure: is-valid-json-schema, e.g. {"type": "is-valid-json-schema", "value": {"type": "object", "required": ["category"]}}
  • Output doesn’t leak internal data: not-regex, e.g. {"type": "not-regex", "value": "[0-9a-f]{8}-[0-9a-f]{4}"}
  • Output is semantically similar to a reference: similar, e.g. {"type": "similar", "value": "expected answer", "threshold": 0.8}
  • Output quality requires judgment: llm-rubric, e.g. {"type": "llm-rubric", "value": "Is the response helpful, accurate, and concise?"}
  • Output mentions at least one of several options: contains-any, e.g. {"type": "contains-any", "value": ["yes", "correct", "affirmative"]}
  • Output must mention all required fields: contains-all, e.g. {"type": "contains-all", "value": ["name", "email", "phone"]}
  • Output begins with a specific prefix: starts-with, e.g. {"type": "starts-with", "value": "Dear "}
  • Output must NOT contain something: not-contains, e.g. {"type": "not-contains", "value": "error"}
  • Response is fast enough: latency, e.g. {"type": "latency", "value": 1000}
  • Response is within cost budget: cost, e.g. {"type": "cost", "value": 0.01}

Assertion type pattern

The dataset-case schema validates assertion types against this pattern:
^(not-)?(equals|contains|icontains|contains-any|contains-all|regex|starts-with|is-json|contains-json|is-valid-json-schema|similar|llm-rubric|factuality|answer-relevance|latency|cost)$
Any type not matching this pattern will fail schema validation.
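A client-side pre-check could apply the same pattern before submitting a dataset. The sketch below transcribes the pattern above into a compiled Python regex; the helper name is invented:

```python
import re

# Transcription of the dataset-case schema's assertion-type pattern
TYPE_PATTERN = re.compile(
    r"^(not-)?(equals|contains|icontains|contains-any|contains-all|regex|"
    r"starts-with|is-json|contains-json|is-valid-json-schema|similar|"
    r"llm-rubric|factuality|answer-relevance|latency|cost)$"
)

def is_valid_type(name: str) -> bool:
    return TYPE_PATTERN.fullmatch(name) is not None

is_valid_type("not-icontains")   # True
is_valid_type("equal")           # False: would fail schema validation
```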