Apastra runs entirely through your IDE agent. There is no server to start, no API key to configure, and no CI required to get going. By the end of this guide, you will have a working prompt spec, a test dataset, and a passing eval with a baseline set.
Apastra works with any IDE agent that supports SKILL.md — including Claude Code, Cursor, Amp, Codex, and 37 more.
## Install skills
Run the install command in your project root to install all Apastra skills into your IDE agent.
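As a sketch, assuming Apastra distributes its skills through an npx installer (the package name and flags here are assumptions, not the confirmed command):

```bash
# Hypothetical installer invocation -- verify against Apastra's install docs
npx apastra-skills@latest install --all
```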
This installs five skills, listed in the table below. You can also install individual skills if you only need part of the workflow; a sketch of that follows the table.
| Skill | What it does |
|---|---|
| `apastra-getting-started` | Project setup and onboarding walkthrough |
| `apastra-scaffold` | Generate prompt specs, datasets, evaluators, and suites |
| `apastra-eval` | Run evaluations and compare against baselines |
| `apastra-baseline` | Establish and manage known-good baselines |
| `apastra-validate` | Validate all files against JSON schemas |
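Assuming the same hypothetical installer as above, installing a single skill might look like:

```bash
# Hypothetical -- installs only the eval skill
npx apastra-skills@latest install apastra-eval
```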
## Scaffold your first prompt
Ask your IDE agent:
“Use the apastra-scaffold skill to create a prompt spec, dataset, evaluator, and suite for summarizing text”

Your agent creates four files: a prompt spec, a dataset, an evaluator, and a suite config.

The prompt spec (`prompts/summarize-v1.yaml`) defines the prompt template and model settings.
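A minimal sketch of what such a spec might contain, assuming a template-plus-model YAML format (the field names here are illustrative, not Apastra's confirmed schema):

```yaml
# Hypothetical prompt spec shape -- field names are assumptions
id: summarize-v1
model: gpt-4o-mini        # assumed default model setting
template: |
  Summarize the following text in two sentences:

  {{input}}
```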
The dataset (`datasets/summarize-smoke.jsonl`) has one JSON object per line, one per test case.
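For instance, the smoke dataset's lines might look like this (the keys `id`, `input`, and `expected` are assumptions):

```jsonl
{"id": "case-1", "input": "Apastra runs evals locally through your IDE agent...", "expected": "A two-sentence summary of the passage."}
{"id": "case-2", "input": "Baselines record a known-good quality level...", "expected": "A two-sentence summary of the passage."}
```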
## Run your first eval

Ask your IDE agent:
“Use the apastra-eval skill to run the summarize-smoke suite”

Your agent reads the suite spec, loads the dataset and evaluator, renders each prompt template with the test case inputs, calls the model, scores the outputs, and reports results.
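The exact report format is up to your agent; an illustrative (not verbatim) summary might read:

```text
summarize-smoke: 5/5 cases passed (mean score 0.92)
```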
The agent also saves run artifacts to `promptops/runs/<run-id>/`: a scorecard, per-case results, and a run manifest with timestamps and model metadata.
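As a sketch, that directory might be laid out like this (only the `promptops/runs/<run-id>/` path comes from the guide; the file names are assumptions):

```text
promptops/runs/<run-id>/
├── scorecard.json   # aggregate scores for the run
├── cases/           # per-case results
└── manifest.json    # timestamps and model metadata
```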
Evals run in one of two modes: suite mode and quick eval mode.

Suite mode uses the full four-file pipeline: prompt spec + dataset + evaluator + suite config. It is best for structured, reusable test suites; a sketch of a suite config follows below. To run one, ask your agent:

“Use the apastra-eval skill to run the summarize-smoke suite”

Quick eval mode is the lighter-weight alternative for ad-hoc checks.
## Set a baseline

Ask your IDE agent:
“Use the apastra-baseline skill to set the current results as the baseline”

Your agent reads the most recent run’s scorecard and writes it to `derived-index/baselines/summarize-smoke.json`.
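A sketch of what that file might contain (the keys and values are illustrative, not Apastra's confirmed schema):

```json
{
  "suite": "summarize-smoke",
  "run_id": "<run-id>",
  "mean_score": 0.92,
  "cases_passed": 5,
  "recorded_at": "2025-01-01T00:00:00Z"
}
```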
Only set a baseline from a passing run. The baseline represents your “known good” quality level: baselining a failing run means future comparisons start from a low bar.

Now every future eval automatically compares against this baseline. If you change the prompt and quality drops, the agent tells you:
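The wording is the agent's own; an illustrative regression warning might read:

```text
summarize-smoke regressed vs baseline: mean score 0.74 (baseline 0.92), 3/5 cases passed
```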
## What just happened
Here is the full file structure you now have.
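A sketch of that layout; the evaluator and suite paths are assumptions, while the other paths appear earlier in this guide:

```text
.
├── prompts/
│   └── summarize-v1.yaml
├── datasets/
│   └── summarize-smoke.jsonl
├── evaluators/
│   └── summarize.yaml          # assumed location
├── suites/
│   └── summarize-smoke.yaml    # assumed location
├── promptops/
│   └── runs/
│       └── <run-id>/
└── derived-index/
    └── baselines/
        └── summarize-smoke.json
```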
Run the `apastra-validate` skill any time to confirm all files are correctly formatted.

## Next steps
- **Core concepts**: understand each building block (prompt specs, datasets, evaluators, suites, baselines, and the resolution chain)
- **Writing evals**: learn to write test cases that catch real regressions, not just happy paths
- **Skills reference**: explore all available skills and what each one does
- **CI integration**: upgrade from local-first evaluation to automated GitHub Actions PR gating