Every apastra project follows a predictable directory layout. Understanding where files live helps your agent find them and helps you know where to look when something goes wrong.

Full directory tree

promptops/
├── prompts/          # Prompt specs (YAML)
├── datasets/         # Test cases (JSONL)
├── evaluators/       # Scoring rules (YAML)
├── suites/           # Test configurations (YAML)
├── schemas/          # JSON schemas
├── validators/       # Shell scripts for schema validation
├── policies/         # Regression policies
├── harnesses/        # Harness adapter specs
├── resolver/         # Prompt resolution chain
├── runtime/          # Digest computation, resolution runtime
├── runs/             # Run artifacts, scorecard normalizer
├── manifests/        # Consumption manifests
└── delivery/         # Delivery targets
derived-index/
├── baselines/        # Known-good scorecards
└── regressions/      # Regression reports

Directory reference

promptops/prompts/

Prompt spec YAML files. Each file defines a single prompt with a stable ID, variable schema, template, and optional output contract. File naming: <prompt-id>.yaml or <prompt-id>/prompt.yaml for multi-file prompts. Example:
id: summarize-v1
variables:
  text: { type: string }
template: "Summarize: {{text}}"
Validated by: prompt-spec.schema.json

promptops/datasets/

Test case files in JSONL format — one JSON object per line. Each line has a case_id, inputs, and optionally assert (inline assertions) or expected_outputs. File naming: <dataset-id>.jsonl. Example:
{"case_id": "case-1", "inputs": {"text": "The fox jumped."}, "assert": [{"type": "contains", "value": "fox"}]}
{"case_id": "case-2", "inputs": {"text": "Hello world."}, "expected_outputs": {"should_contain": ["hello"]}}
Validated by: dataset-case.schema.json, dataset-manifest.schema.json

promptops/evaluators/

Evaluator YAML files describing how to score model outputs. Evaluator types include deterministic, schema, judge, and human. File naming: <evaluator-id>.yaml. Example:
id: keyword-check
type: deterministic
metrics: [keyword_recall]
Validated by: evaluator.schema.json

promptops/suites/

Suite YAML files that tie together datasets, evaluators, and models into a runnable test configuration. File naming: <suite-id>.yaml. Example:
id: smoke
name: Smoke Suite
datasets: [summarize-smoke]
evaluators: [keyword-check]
model_matrix: [default]
thresholds:
  keyword_recall: 0.6
Validated by: suite.schema.json

promptops/schemas/

56 JSON Schema files that validate every protocol file type in apastra. Your agent and CI both reference these schemas when validating prompts, datasets, evaluators, and run artifacts. See the full schema reference for the complete list.
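For a quick local check outside CI, you can validate a file against one of these schemas with any JSON Schema library. A minimal sketch in Python, assuming the jsonschema and PyYAML packages and the example paths from this page (CI itself uses the ajv-cli scripts under promptops/validators/):

import json
import yaml
from jsonschema import validate, ValidationError

# Load the schema that governs prompt specs (path taken from the layout above).
with open("promptops/schemas/prompt-spec.schema.json") as f:
    schema = json.load(f)

# Load the prompt spec to check.
with open("promptops/prompts/summarize-v1.yaml") as f:
    spec = yaml.safe_load(f)

try:
    validate(instance=spec, schema=schema)
    print("prompt spec is valid")
except ValidationError as err:
    print(f"prompt spec failed validation: {err.message}")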

promptops/validators/

Shell scripts that invoke ajv-cli to validate files against the schemas. Used by schema-validation.yml in CI. Key scripts:
  • validate-prompt-spec.sh — validates a prompt spec YAML
  • validate-dataset.sh — validates a dataset manifest + JSONL cases

promptops/policies/

Regression policy YAML files that define per-metric thresholds and severity levels used by the regression engine. File naming: regression.yaml (the conventional default). Example:
baseline: "prod-current"
rules:
  - metric: keyword_recall
    floor: 0.5
    allowed_delta: 0.1
    direction: higher_is_better
    severity: blocker
Validated by: regression-policy.schema.json

promptops/harnesses/

Harness adapter spec files describing how to invoke an external eval harness. These are optional; your IDE agent acts as the default harness. Validated by: harness-adapter.schema.json

promptops/resolver/

Python implementation of the four-level prompt resolution chain. See the resolver reference for a full walkthrough. Files:
  • chain.py — the ResolverChain class that orchestrates resolution
  • local.py — local path override resolver
  • workspace.py — same-repo workspace resolver
  • git_ref.py — git tag / commit SHA resolver
  • packaged.py — packaged artifact resolver (OCI, npm, PyPI, GitHub Release)
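The sketch below shows the chain idea in simplified form: try each resolver in order and return the first hit. It is illustrative only; the actual interfaces in chain.py and the individual resolver modules may differ.

from pathlib import Path
from typing import Callable, Optional

# A resolver maps a prompt ID to a resolved location, or None if it has no answer.
Resolver = Callable[[str], Optional[str]]

def workspace_resolver(prompt_id: str) -> Optional[str]:
    # Same-repo lookup under promptops/prompts/ (illustrative only).
    candidate = Path("promptops/prompts") / f"{prompt_id}.yaml"
    return str(candidate) if candidate.exists() else None

def resolve(prompt_id: str, resolvers: list[Resolver]) -> str:
    """Try each resolver in order and return the first successful result."""
    for resolver in resolvers:
        resolved = resolver(prompt_id)
        if resolved is not None:
            return resolved
    raise LookupError(f"prompt {prompt_id!r} was not resolved by any level")

# Illustrative use: local overrides would come first, then workspace, git ref, packaged.
try:
    print(resolve("summarize-v1", [workspace_resolver]))
except LookupError as err:
    print(err)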

promptops/runtime/

Digest computation utilities and the resolution runtime. Implements the content digest convention: canonicalize JSON/YAML → SHA-256 → sha256:<hex>.
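A minimal sketch of that convention in Python, assuming canonicalization means parsing the file and re-serializing it as JSON with sorted keys and compact separators (the runtime's exact canonical form may differ):

import hashlib
import json
import yaml

def content_digest(path: str) -> str:
    """Canonicalize a JSON/YAML file and return its sha256:<hex> digest."""
    with open(path) as f:
        data = yaml.safe_load(f)  # PyYAML also parses plain JSON documents
    # Assumed canonical form: sorted keys, no insignificant whitespace.
    canonical = json.dumps(data, sort_keys=True, separators=(",", ":"))
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()

print(content_digest("promptops/prompts/summarize-v1.yaml"))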

promptops/runs/

Run artifact output directory. After each eval, the agent or harness writes:
promptops/runs/<run-id>/
├── run_manifest.json   # Resolved digests, model IDs, harness, status
├── scorecard.json      # Normalized metrics + metric definitions
└── cases.jsonl         # Per-case results
Run IDs use the pattern <suite-id>-<YYYY-MM-DD-HHmmss>.
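A small sketch of how an agent or harness might build that run ID and directory, assuming the pattern and file names above (the manifest fields here are placeholders):

import json
from datetime import datetime, timezone
from pathlib import Path

def new_run_dir(suite_id: str) -> Path:
    """Create promptops/runs/<suite-id>-<YYYY-MM-DD-HHmmss>/ and return its path."""
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d-%H%M%S")
    run_dir = Path("promptops/runs") / f"{suite_id}-{stamp}"
    run_dir.mkdir(parents=True, exist_ok=False)
    return run_dir

run_dir = new_run_dir("smoke")
# Placeholder manifest; a real run_manifest.json carries resolved digests,
# model IDs, the harness, and final status.
(run_dir / "run_manifest.json").write_text(json.dumps({"status": "running"}, indent=2))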

promptops/manifests/

Consumption manifest files. The default file is consumption.yaml, which declares which prompt versions your app pins. Example:
version: "1.0"
prompts:
  summarize-v1:
    id: summarize-v1
    pin: "abc123"
Validated by: consumption-manifest.schema.json

promptops/delivery/

Delivery target spec files. Each file declares a downstream sync target (e.g., a GitHub repo to receive a PR when a new version is promoted). Validated by: delivery-target.schema.json

derived-index/baselines/

Known-good scorecards. After a passing eval run, the baseline skill saves the scorecard here. Future evals compare against this file to detect regressions. File naming: <suite-id>.json. Validated by: baseline.schema.json
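Conceptually, promoting a passing run to a baseline just puts its scorecard in place under the suite's name. A rough sketch, assuming the paths and file names shown on this page (the actual baseline skill may record more than a plain copy):

import shutil
from pathlib import Path

def save_baseline(run_id: str, suite_id: str) -> Path:
    """Copy a passing run's scorecard to derived-index/baselines/<suite-id>.json."""
    src = Path("promptops/runs") / run_id / "scorecard.json"
    dst = Path("derived-index/baselines") / f"{suite_id}.json"
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(src, dst)
    return dst

# Example: save_baseline("smoke-2025-01-01-120000", "smoke")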

derived-index/regressions/

Regression reports produced by the regression engine. Each report compares a candidate scorecard against a baseline and produces a pass or fail result with per-metric evidence. Validated by: regression-report.schema.json
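As a rough illustration of what the per-metric comparison involves, here is a simplified Python sketch built from the policy fields shown earlier (floor, allowed_delta, direction); the real regression engine and its report format may differ:

def check_metric(name, candidate, baseline, floor, allowed_delta,
                 direction="higher_is_better"):
    """Compare one candidate metric against its baseline under a policy rule."""
    delta = candidate - baseline
    if direction == "higher_is_better":
        regressed = candidate < floor or delta < -allowed_delta
    else:  # lower_is_better: floor acts as a ceiling in this simplified form
        regressed = candidate > floor or delta > allowed_delta
    return {"metric": name, "candidate": candidate, "baseline": baseline,
            "delta": round(delta, 4), "pass": not regressed}

# Example: keyword_recall slipped from 0.72 to 0.58; the drop of 0.14 exceeds
# the allowed_delta of 0.1, so this rule fails even though the 0.5 floor holds.
print(check_metric("keyword_recall", 0.58, 0.72, floor=0.5, allowed_delta=0.1))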

Quick eval files

For single-file evaluations, place files under promptops/evals/:
promptops/evals/
└── summarize-quick.yaml   # Prompt + cases + assertions in one file
The workspace resolver checks promptops/evals/ automatically. See assertion types for the full list of available assertions.

Repo topology options

Apastra supports three repo shapes without changing the conceptual model.
The simplest shape keeps prompts inside the app repo under promptops/; one PR can update code and prompts together.
.
├── promptops/
│   ├── prompts/
│   ├── datasets/
│   ├── evaluators/
│   ├── suites/
│   ├── harnesses/
│   ├── policies/
│   ├── delivery/
│   └── manifests/
│       └── consumption.yaml
├── derived-index/
│   ├── baselines/
│   └── promotions/
└── .github/workflows/
Best for: most teams starting out; single product repo.

The derived-index/ directory

derived-index/ is intentionally separate from promptops/. It stores derived artifacts — outputs computed from source files, not source files themselves:
  • Baselines — scorecards saved after a passing eval run
  • Promotions — promotion records binding approved versions to channels
  • Regressions — regression reports comparing candidate vs baseline
Keeping derived artifacts out of promptops/ prevents confusion about what is authoritative. Source files in promptops/ are the source of truth; files in derived-index/ are computed results.
Never manually edit files in derived-index/. They are always written by the agent (baselines), the regression engine (reports), or the promotion workflow (records).

The artifacts branch pattern

For CI pipelines, run artifacts (scorecards, manifests, reports, promotion records) are stored on a separate Git branch called promptops-artifacts. This keeps the main branch clean and avoids merge conflicts from concurrent CI runs.
# branch: promptops-artifacts
artifacts/
  runs/YYYY/MM/DD/<run_id>/
    run_manifest.json
    scorecard.json
    cases.jsonl
    artifact_refs.json
  reports/YYYY/MM/DD/<report_id>/
    regression_report.json
  promotions/YYYY/MM/DD/<promotion_id>/
    promotion_record.json
The regression-gate.yml and promote.yml workflows both fetch from origin/promptops-artifacts to read and write these records.
Store only small index files and JSON records on the artifacts branch. Keep large raw outputs (transcripts, traces) in an external artifact backend referenced by digest. GitHub Actions artifacts have 90-day retention by default and should not be treated as long-term storage.