AI teams face a problem that software engineering already solved decades ago: how do you ship changes with confidence? For code, the answer is version control, automated tests, and regression detection. For prompts, most teams are still using comments in a shared doc. Apastra brings software engineering discipline to AI prompts. It is a file-based PromptOps framework — prompts, test cases, scoring rules, and quality baselines are all files in your repo, versioned in Git, and tested automatically by your IDE agent.

The PromptOps problem

Prompts are not static strings. They are the core logic of AI features — and they break in ways that are easy to miss:
  • A wording change improves one use case while quietly degrading another
  • A model update from your provider changes output behavior without warning
  • A well-intentioned edit removes a constraint that was preventing bad outputs
  • No one knows which version of the prompt is actually running in production
Code teams solved this with tests, CI, and deployment pipelines. Prompt teams are still solving it by eyeballing outputs and hoping for the best. Apastra treats prompts as versioned software assets. Every prompt has a schema, every change is tested, every deployment is traceable.
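The "versioned software asset" idea is easiest to see in a concrete file. The sketch below shows what a prompt spec with a stable ID, a variable schema, and an output contract might look like; the field names are illustrative assumptions, not Apastra's actual schema:

```yaml
# Illustrative prompt spec (field names are assumptions, not Apastra's real schema)
id: support-reply
version: 1.2.0
description: Draft a reply to a customer support ticket
variables:                # input schema: what the template expects
  ticket_text:
    type: string
    required: true
  tone:
    type: string
    enum: [friendly, formal]
    default: friendly
template: |
  You are a support agent. Reply in a {{tone}} tone to:
  {{ticket_text}}
output_contract:          # what a valid output must satisfy
  format: text
  max_length: 1200
```

Because the spec is a plain file with a version field, "which prompt is running in production" becomes a question Git can answer.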

Key principles

  • File-based. Prompts, datasets, evaluators, suites, baselines, and regression policies are all plain YAML and JSONL files. There is no hidden database, no required SaaS control plane, and no proprietary format. Files live in your repo, move with your code, and work with every tool in your existing workflow.
  • Agent-as-harness. Your IDE agent — Claude, Cursor, Amp, Codex, and many more — is the evaluation harness. When you ask it to run an eval, it reads the protocol files and executes the workflow: renders prompts, calls the model, scores outputs, and reports results. No external runtime. No API keys to configure.
  • Local-first. You can run full evaluations, set baselines, and catch regressions entirely on your machine — no CI required. When your team is ready for PR gating and automated regression detection, the apastra-setup-ci skill upgrades you to GitHub Actions without changing any file formats.
  • Git-native. Because everything is files, you get diffing, history, blame, pull request review, and rollback for free. Prompt changes go through the same review process as code changes. Baselines and run artifacts are append-only records — nothing is mutated in place.
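Since everything is files, a repository using these principles might be laid out as follows. The directory names and file extensions here are illustrative assumptions, not a layout Apastra mandates:

```
your-repo/
  prompts/
    support-reply.prompt.yaml    # prompt spec: template + variable schema
  datasets/
    support-cases.jsonl          # one test case per line
  evaluators/
    tone-check.eval.yaml         # scoring rules
  suites/
    smoke.suite.yaml             # which prompts, datasets, evaluators to run
  baselines/
    support-reply-1.2.0.json     # known-good scores, append-only
```

Everything above diffs, merges, and reviews like any other code in the repo.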

What you get

| Capability | How it works |
| --- | --- |
| Prompt versioning | YAML specs with stable IDs, variable schemas, and output contracts |
| Automated evals | Your IDE agent runs test suites and scores outputs |
| Regression detection | New results are compared against known-good baselines |
| Schema validation | JSON schemas ensure all files are correctly formatted |
| No infrastructure | No CI, no cloud, no hosted platform — just files and your agent |
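The regression-detection row above amounts to comparing a new run's scores against a known-good baseline. A minimal sketch in Python, assuming a flat case-to-score mapping and a fixed tolerance (neither of which is Apastra's actual artifact format):

```python
# Sketch of baseline-style regression detection: flag any case whose score
# dropped more than `tolerance` below the known-good baseline.
# Data shapes and threshold are illustrative assumptions.

BASELINE = {"case-001": 0.92, "case-002": 0.88, "case-003": 0.75}
NEW_RUN = {"case-001": 0.93, "case-002": 0.71, "case-003": 0.76}

def find_regressions(baseline, new_run, tolerance=0.05):
    """Return {case_id: (baseline_score, new_score)} for regressed cases."""
    regressions = {}
    for case_id, base_score in baseline.items():
        new_score = new_run.get(case_id)
        if new_score is not None and base_score - new_score > tolerance:
            regressions[case_id] = (base_score, new_score)
    return regressions

if __name__ == "__main__":
    for case_id, (base, new) in find_regressions(BASELINE, NEW_RUN).items():
        print(f"REGRESSION {case_id}: {base:.2f} -> {new:.2f}")
```

Cases that improved or held steady pass silently; only drops beyond the tolerance are reported, which is what lets a CI check fail a PR on exactly the degraded cases.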

Who it’s for

  • Solo builders who want prompt unit tests and pinned prompt versions without adopting a platform. Run evaluations locally, catch regressions before they ship, and keep everything in your existing repo.
  • Product engineers who need PR gating and regression detection as part of their normal development workflow. Apastra integrates with GitHub pull requests and required status checks — failing evals block merges.
  • Platform teams responsible for shared prompt infrastructure across multiple apps or teams. Apastra’s file-based protocol supports reusable workflows, CODEOWNERS-based review, and standardized artifact formats that work across repos.
  • Applied AI teams with rigorous evaluation requirements. Apastra supports dataset versioning, judge-based evaluation, multi-run variance tracking, and tiered suite structures (smoke → regression → release candidate).
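The tiered suite structure (smoke → regression → release candidate) could be expressed as suite files of increasing scope. A sketch of the first two tiers, with field names that are assumptions rather than Apastra's actual suite schema:

```yaml
# smoke.suite.yaml - fast sanity check on every commit (illustrative fields)
suite: smoke
prompts: [support-reply]
dataset: datasets/support-cases-smoke.jsonl   # small subset of cases
evaluators: [tone-check]
---
# regression.suite.yaml - full baseline comparison on every PR
suite: regression
prompts: [support-reply]
dataset: datasets/support-cases.jsonl         # full case set
evaluators: [tone-check, constraint-check]
compare_to_baseline: true
```

The smoke tier trades coverage for speed; the regression tier runs the full dataset and compares against the baseline before a merge.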

How Apastra compares

Most tools in this space make a tradeoff between power and portability. Apastra takes a different position.
promptfoo is a capable, CI-centric eval runner with strong support for PR feedback loops. It was acquired by OpenAI in March 2026, so it is no longer vendor-neutral for teams using other models. It also does not define a complete system of record for prompt assets: results can be ephemeral unless you build append-only artifacts and promotion semantics around it. Apastra is designed from the start as a complete protocol with promotion lineage, baselines, and delivery semantics built in.
Platform prompt registries solve the “runtime hot swap” problem and make prompts accessible to non-engineers. The tradeoff is that the external platform becomes the source of truth — which weakens Git-based review, diff, and release lineage. Apastra keeps Git as the control plane. Platform observability tools can be integrated as optional sinks rather than replacing the workflow.
Eval frameworks as code libraries give you powerful custom metrics and programmatic control. The cost is coupling your team to a specific runtime and evaluation contract. Apastra defines a thin harness contract — any framework can be a harness adapter — so you are not locked into a single evaluation library.
Observability-first stacks excel at debugging traces and async execution. They solve “what happened” — but they do not inherently solve “pin what shipped.” Apastra handles the packaging, pinning, and promotion semantics that observability platforms leave to you. You can emit run artifacts to these platforms as an optional sink.
The core difference: Apastra is a protocol, not a platform. It defines durable state (files in Git), minimal contracts (prompt spec, dataset, evaluator, suite), and promotion semantics — without requiring you to adopt any particular runtime, framework, or hosted service.

Next steps

Quickstart

Install skills and run your first evaluation in 5 minutes

Core concepts

Understand prompt specs, datasets, evaluators, suites, and baselines