What is a baseline?
A baseline is a snapshot of a scorecard from a known-good evaluation run. Once established, every future eval for that suite is compared against it. If a prompt change causes quality to drop beyond the allowed thresholds defined in your regression policy, the eval reports a regression.
Baselines are stored as JSON files in derived-index/baselines/. They are never deleted: when you update a baseline, the previous one is archived with a timestamp suffix.
How to invoke
Ask your agent: “Use the apastra-baseline skill to set the current results as the baseline for [suite-name]”
Establishing a baseline
Locate the scorecard
Your agent finds the most recent run for the target suite in promptops/runs/. It looks for the latest directory matching <suite-id>-* and reads its scorecard.json. If no recent run exists, your agent will prompt you to run the apastra-eval skill first.
Updating a baseline
When you’ve verified that a prompt improvement is intentional and you want to raise the bar, you can update the baseline. Your agent follows an append-friendly model:
- Renames the existing baseline to <suite-id>-<timestamp>.json (for example, summarize-smoke-2026-03-10.json) as an archive
- Writes the new baseline to derived-index/baselines/<suite-id>.json
- Reports both the old and new metric values so the change is visible
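The locate-and-update flow can be sketched in Python as below. This is a minimal illustration, not the skill's actual implementation. The function names are hypothetical, the top-level "metrics" map inside scorecard.json is an assumed shape, run directories are assumed to sort chronologically by name, and the archive suffix defaults to today's date to match the summarize-smoke-2026-03-10.json example.

```python
import json
from datetime import date
from pathlib import Path
from typing import Optional


def find_latest_scorecard(runs_dir: Path, suite_id: str) -> Optional[dict]:
    """Read scorecard.json from the newest <suite-id>-* run directory (hypothetical helper)."""
    runs = [d for d in runs_dir.glob(f"{suite_id}-*") if (d / "scorecard.json").is_file()]
    if not runs:
        return None  # no run yet: run the apastra-eval skill first
    # Assumption: run directory names sort chronologically (e.g. a date suffix).
    latest = max(runs, key=lambda d: d.name)
    return json.loads((latest / "scorecard.json").read_text())


def update_baseline(baselines_dir: Path, suite_id: str, scorecard: dict,
                    stamp: Optional[str] = None) -> Optional[dict]:
    """Archive the active baseline (if any), write the new one, return the old scorecard."""
    baselines_dir.mkdir(parents=True, exist_ok=True)
    active = baselines_dir / f"{suite_id}.json"
    old = None
    if active.exists():
        old = json.loads(active.read_text())
        # Assumption: the archive suffix defaults to today's date, e.g. 2026-03-10.
        archive = baselines_dir / f"{suite_id}-{stamp or date.today().isoformat()}.json"
        if archive.exists():
            raise FileExistsError(f"{archive.name} already exists; pass another stamp")
        active.rename(archive)  # archive the old baseline, never delete it
    active.write_text(json.dumps(scorecard, indent=2))
    return old


def report_change(old: Optional[dict], new: dict) -> None:
    """Print old and new values per metric (assumes a top-level "metrics" map)."""
    old_metrics = (old or {}).get("metrics", {})
    for name, value in new.get("metrics", {}).items():
        print(f"{name}: {old_metrics.get(name)} -> {value}")
```

A typical sequence is find_latest_scorecard(Path("promptops/runs"), "summarize-smoke"), then update_baseline(Path("derived-index/baselines"), "summarize-smoke", scorecard), then report_change(old, scorecard). Note that the <suite-id>-* glob also matches longer suite IDs sharing the same prefix (a suite named summarize would match summarize-smoke runs), so a stricter match may be needed in practice.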
Rolling back a baseline
If a regression surfaces and you need to undo a baseline update, ask your agent to restore a prior baseline: “Use the apastra-baseline skill to roll back the summarize-smoke baseline”
Your agent copies the archived baseline file (for example, summarize-smoke-2026-03-10.json) back to derived-index/baselines/summarize-smoke.json. This promotes the prior scorecard to the active baseline without deleting any records.
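The rollback copy can be sketched as follows. This is an illustrative sketch with a hypothetical function name, not the skill's code. The page only describes copying the archive over the active file; saving a copy of the baseline being replaced first is an added assumption here, made so that the "never delete a baseline" rule also covers the file being overwritten.

```python
import shutil
from datetime import datetime
from pathlib import Path
from typing import Optional


def roll_back_baseline(baselines_dir: Path, suite_id: str, archived_name: str,
                       stamp: Optional[str] = None) -> Path:
    """Copy an archived baseline back to <suite-id>.json and return the active path."""
    source = baselines_dir / archived_name
    active = baselines_dir / f"{suite_id}.json"
    if not source.is_file():
        raise FileNotFoundError(f"no archived baseline named {archived_name}")
    if active.exists():
        # Assumption: keep a copy of the baseline being replaced so no record is lost.
        stamp = stamp or datetime.now().strftime("%Y-%m-%dT%H%M%S")
        keep = baselines_dir / f"{suite_id}-{stamp}.json"
        if keep.exists():
            raise FileExistsError(f"{keep.name} already exists; pass another stamp")
        shutil.copy2(active, keep)
    shutil.copy2(source, active)  # copy, not move, so the archived file stays in place
    return active
```

Using a copy rather than a rename is what keeps the archived file available for a later rollback.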
Baseline file location
Active baselines are always at:
derived-index/baselines/<suite-id>.json
Relationship to regression policies
The baseline file contains the reference metrics. The regression policy (promptops/policies/regression.yaml) defines how much deviation is allowed before an eval is marked as a regression:
- For higher_is_better metrics: regression if candidate < (baseline − allowed_delta) or candidate < floor
- For lower_is_better metrics: regression if candidate > (baseline + allowed_delta) or candidate > floor
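The two comparison rules can be expressed as a small Python function. This is a sketch only: the RegressionRule field names (direction, allowed_delta, floor) are hypothetical stand-ins, not the actual schema of promptops/policies/regression.yaml.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RegressionRule:
    # Hypothetical field names; check regression.yaml for the real schema.
    direction: str                  # "higher_is_better" or "lower_is_better"
    allowed_delta: float = 0.0
    floor: Optional[float] = None   # absolute limit; None skips the floor check


def is_regression(rule: RegressionRule, baseline: float, candidate: float) -> bool:
    """Apply the higher_is_better / lower_is_better rules to one metric."""
    if rule.direction == "higher_is_better":
        worse_than_baseline = candidate < baseline - rule.allowed_delta
        breaks_floor = rule.floor is not None and candidate < rule.floor
    elif rule.direction == "lower_is_better":
        worse_than_baseline = candidate > baseline + rule.allowed_delta
        breaks_floor = rule.floor is not None and candidate > rule.floor
    else:
        raise ValueError(f"unknown direction: {rule.direction!r}")
    return worse_than_baseline or breaks_floor
```

For example, for a higher_is_better metric with allowed_delta 0.05 and a baseline of 0.9, a candidate of 0.88 passes while 0.80 regresses. The floor is checked independently, so a candidate can regress even when it stays within allowed_delta of the baseline.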
Rules with severity: blocker fail the eval. Rules with severity: warning are reported but do not block.
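How severities combine into a pass/fail result can be sketched like this. The function name and the (metric_name, severity, regressed) input tuples are hypothetical; in practice the regressed flag would come from applying the comparison rules to each metric.

```python
from typing import Iterable, List, Tuple


def summarize_regressions(
    results: Iterable[Tuple[str, str, bool]],
) -> Tuple[bool, List[str], List[str]]:
    """Return (passed, blocking, warned) from (metric_name, severity, regressed) tuples."""
    blocking: List[str] = []
    warned: List[str] = []
    for name, severity, regressed in results:
        if not regressed:
            continue
        if severity == "blocker":
            blocking.append(name)   # any blocker regression fails the eval
        elif severity == "warning":
            warned.append(name)     # reported, but does not block
        else:
            raise ValueError(f"unknown severity for {name}: {severity!r}")
    return not blocking, blocking, warned
```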
Rules
- Never delete a baseline: archive it with a timestamp suffix
- Only baseline passing runs: the scorecard must have passed all suite thresholds
- One active baseline per suite: the active baseline is always <suite-id>.json
- Baselines are immutable once set: updating means archiving the old file and writing a new one
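The "only baseline passing runs" rule could be enforced with a guard like the one below. This is a sketch under an assumed scorecard shape: the "thresholds" list with "name" and "passed" keys is hypothetical, not a documented scorecard.json format.

```python
from typing import List


def failing_thresholds(scorecard: dict) -> List[str]:
    """Names of suite thresholds the run did not pass.

    Assumes a hypothetical scorecard shape:
    {"thresholds": [{"name": "accuracy", "passed": true}, ...]}
    A missing "thresholds" key raises KeyError rather than silently passing.
    """
    return [t["name"] for t in scorecard["thresholds"] if not t["passed"]]


def can_baseline(scorecard: dict) -> bool:
    """True only when every suite threshold passed."""
    return not failing_thresholds(scorecard)
```

Returning the failing names, rather than a bare boolean, lets the agent tell you which thresholds blocked the baseline.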