
Apastra ships 56 JSON schemas that validate every file type in the protocol. Schemas ensure your prompts, datasets, evaluators, and run artifacts are machine-readable by any agent or harness — not just your own. All schemas use JSON Schema draft 2020-12.
Your agent validates files against these schemas when you run the apastra-validate skill. The schema-validation.yml CI workflow also validates changed files on every pull request.

Schema categories

| Schema file                   | What it validates                                               |
| ----------------------------- | --------------------------------------------------------------- |
| prompt-spec.schema.json       | Prompt template + variable schema + output contract              |
| dataset-manifest.schema.json  | Dataset identity, version, digest, provenance                    |
| dataset-case.schema.json      | A single JSONL test case with inputs and inline assertions       |
| evaluator.schema.json         | Scoring rules (deterministic, schema, judge, human)              |
| suite.schema.json             | Benchmark suite: datasets, evaluators, model matrix, thresholds  |
| quick-eval.schema.json        | Single-file eval combining prompt, cases, and assertions         |

Core schema field reference

prompt-spec.schema.json

Source-of-truth prompt definition. Required in every apastra project.
id (string, required)
Stable identifier for the prompt. Use a namespaced slug such as my-app/summarize-v1. Renaming breaks consumption manifest pins.

variables (object, required)
Map of variable names to JSON Schema type objects. Each key is a template placeholder; each value defines the type.

variables:
  text: { type: string }
  max_length: { type: integer }
template (string | object | array, required)
The prompt template. For completion models, use a string with {{variable}} placeholders. For chat models, use an array of message objects.

template: "Summarize: {{text}}"

output_contract (object)
JSON Schema defining the expected output structure. Used by schema evaluators to validate model responses.

tool_contract (object)
JSON Schema defining the expected tool-calling structure and available tools. Required if the prompt uses function calling.

metadata (object)
Arbitrary key-value pairs such as author, intent, and tags.
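
Putting these fields together, a minimal prompt spec might look like the sketch below (the id, output_contract shape, and metadata values are illustrative, not prescribed by the schema):

id: my-app/summarize-v1
variables:
  text: { type: string }
template: "Summarize: {{text}}"
output_contract:
  type: object
  properties:
    summary: { type: string }
  required: [summary]
metadata:
  author: docs-team      # illustrative value
  tags: [summarization]  # illustrative value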

dataset-manifest.schema.json

Declares a dataset’s identity and content digest for reproducibility.
id (string, required)
Stable identifier for the dataset.

version (string, required)
Semantic version or revision of the dataset. Treat dataset edits as new versions.

digest (string, required)
SHA-256 content digest of the .jsonl file. Format: sha256:<hex>. See the digest convention below.

schema_version (string, required)
Version of the dataset-case schema used by the JSONL file.

provenance (object)
Information about the origin of the dataset.
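
A manifest might look like the following sketch (id, version, and provenance values are illustrative; the digest shown is a placeholder, not a real file hash):

id: my-app/summarize-cases
version: 1.2.0
digest: sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
schema_version: "1.0"
provenance:
  source: hand-written   # illustrative value
  curator: docs-team     # illustrative value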

dataset-case.schema.json

A single test case — one line in a JSONL dataset file.
case_id (string, required)
Stable identifier for the test case. Never change existing case_id values; add new cases instead.

inputs (object, required)
Map of variable names to input values, matching the prompt spec’s variables schema.

assert (array)
Array of inline assertion objects. Each object has type (string) and value (any). See assertion types for the full list.

expected_outputs (object)
Expected output values for evaluator scoring (e.g., should_contain keyword lists).

metadata (object)
Arbitrary metadata for the case (e.g., tags, difficulty, domain).
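
One case occupies one line in the .jsonl file. As a sketch (the assertion type name and metadata values are assumptions for illustration; see the assertion types list for the real vocabulary):

{"case_id": "summarize-001", "inputs": {"text": "A short paragraph about schemas."}, "assert": [{"type": "contains", "value": "schema"}], "metadata": {"difficulty": "easy"}}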

evaluator.schema.json

Scoring definition for a suite.
id (string, required)
Stable identifier for the evaluator.

type (string, required)
Evaluator type. One of: deterministic, schema, judge, human.

metrics (array, required)
Array of metric names produced by this evaluator (e.g., ["keyword_recall"]). At least one is required.

config (object)
Evaluator-type-specific configuration. For judge evaluators, this includes rubric and model details. For schema evaluators, this includes the target JSON Schema.

metric_versions (object)
Mapping of metric names to their semantic versions. Increment when changing how a metric is computed to preserve historical comparability.

digest (string)
SHA-256 hash of the evaluator content. Format: sha256:<64 hex chars>.
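
A deterministic evaluator might look like this sketch (the config keys are assumptions for illustration; actual keys depend on the evaluator type):

id: keyword-recall-check
type: deterministic
metrics: [keyword_recall]
metric_versions:
  keyword_recall: "1.0"
config:
  expected_field: should_contain   # illustrative key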

suite.schema.json

Benchmark suite configuration.
id (string, required)
Stable identifier for the suite.

name (string, required)
Human-readable name.

datasets (array, required)
Array of dataset IDs (at least one). Each ID maps to a .jsonl file in promptops/datasets/.

evaluators (array, required)
Array of evaluator IDs (at least one). Each ID maps to a .yaml file in promptops/evaluators/.

model_matrix (array, required)
Array of model or provider identifiers to run the suite against. Use "default" to mean the agent’s own model.

tier (string)
Execution tier. One of: smoke, regression, full, release-candidate. Default: smoke.

trials (integer)
Number of times to run each case for variance measurement. Default: 1. Use 3+ for regression suites.

budgets (object)
Cost and time limits. Supports cost_budget (dollars) and time (seconds).

thresholds (object)
Pass/fail criteria. Keys are metric names; values are minimum acceptable scores.
thresholds:
  keyword_recall: 0.6
  pass_rate: 1.0
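
A complete suite combining these fields might look like the sketch below (the ids reuse the illustrative examples above; budget values are assumptions):

id: summarize-regression
name: Summarization regression suite
datasets: [my-app/summarize-cases]
evaluators: [keyword-recall-check]
model_matrix: ["default"]
tier: regression
trials: 3
budgets:
  cost_budget: 5     # dollars
  time: 1800         # seconds
thresholds:
  keyword_recall: 0.6
  pass_rate: 1.0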

quick-eval.schema.json

Single-file evaluation format combining prompt, cases, and assertions.
id (string, required)
Stable identifier for the quick eval.

prompt (string, required)
The prompt template with {{variable}} placeholders.

cases (array, required)
Array of test cases. Each case follows the dataset-case schema with id, inputs, and assert.

thresholds (object)
Pass/fail thresholds, typically pass_rate: 1.0.
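
A minimal quick eval, as a sketch (the assertion type name is an assumption for illustration):

id: summarize-smoke
prompt: "Summarize: {{text}}"
cases:
  - id: basic
    inputs: { text: "A short paragraph about schemas." }
    assert:
      - { type: contains, value: "schema" }
thresholds:
  pass_rate: 1.0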

run-manifest.schema.json

Durable metadata record for a completed run. Written by the harness.
input_refs (object, required)
References to input files (suite, prompt, dataset, evaluator IDs).

resolved_digests (object, required)
Content digests of all resolved inputs at run time, enabling replay.

timestamps (object, required)
Run start and end times.

harness_identifier (string, required)
Identifier of the execution environment. Common values: claude-code, antigravity, cursor, copilot, api, github-actions, jules.

harness_version (string, required)
Version of the harness. The same model in different harnesses can produce different outputs.

model_ids (array, required)
Array of model identifiers used in the run.

sampling_config (object, required)
Temperature, top-p, and other sampling parameters used.

environment (object, required)
Environment metadata for reproduction attempts.

status (string, required)
Run outcome. Typically pass or fail.

total_cost (number)
Total cost of the run in dollars (input tokens × price + output tokens × price).

provenance (object)
SLSA-style provenance metadata: builder.id, buildType, invocation, and metadata.
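
A trimmed run manifest, as a sketch (the nested key names inside timestamps, sampling_config, and environment are assumptions; the digest is a placeholder):

{
  "input_refs": { "suite": "summarize-regression", "prompt": "my-app/summarize-v1" },
  "resolved_digests": { "dataset": "sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855" },
  "timestamps": { "start": "2025-01-01T12:00:00Z", "end": "2025-01-01T12:04:30Z" },
  "harness_identifier": "claude-code",
  "harness_version": "1.0.0",
  "model_ids": ["default"],
  "sampling_config": { "temperature": 0 },
  "environment": { "os": "linux" },
  "status": "pass",
  "total_cost": 0.42
}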

scorecard.schema.json

Normalized metrics summary for a run.
normalized_metrics (object, required)
Mapping of metric names to their aggregated values (0–1 scale for most metrics).

{"keyword_recall": 0.85, "pass_rate": 1.0}

metric_definitions (object, required)
Metadata for each metric. Each entry requires version and optionally includes description and direction.

{
  "keyword_recall": {
    "version": "1.0",
    "description": "Fraction of expected keywords found in output",
    "direction": "higher_is_better"
  }
}

variance (object)
Variance data if trials > 1 was configured in the suite.

flake_rates (object)
Mapping of metric names to their observed flake rates.

baseline.schema.json

Reference to a known-good scorecard for regression comparison.
baseline_id (string, required)
Identifier for this baseline record.

run_digest (string, required)
Content digest of the reference run’s scorecard.

created_at (string, required)
ISO 8601 timestamp when the baseline was established.

description (string)
Human-readable description (e.g., “post-v2-launch baseline”).
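
As a sketch (values are illustrative; the digest is a placeholder):

baseline_id: post-v2-launch
run_digest: sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
created_at: "2025-01-01T12:00:00Z"
description: "post-v2-launch baseline"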

regression-policy.schema.json

Defines how candidate scorecards are compared against baselines.
baseline (string, required)
Baseline reference rule (e.g., "prod current", "last-rc-passing-run").

rules (array, required)
Array of per-metric rule objects. Each rule requires metric and severity.
rules:
  - metric: keyword_recall
    floor: 0.5
    allowed_delta: 0.1
    direction: higher_is_better
    severity: blocker
Rule fields:

rules[].metric (string, required)
Metric name to evaluate.

rules[].severity (string, required)
blocker fails the check and blocks merge; warning is reported but does not block.

rules[].floor (number)
Absolute minimum acceptable value for this metric.

rules[].allowed_delta (number)
Maximum allowed drop from the baseline value.

rules[].direction (string)
higher_is_better or lower_is_better. Controls which direction of change is treated as a regression.
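
Combining the top-level baseline reference with the rule list above, a complete policy might look like this sketch (the second rule is illustrative):

baseline: last-rc-passing-run
rules:
  - metric: keyword_recall
    floor: 0.5
    allowed_delta: 0.1
    direction: higher_is_better
    severity: blocker
  - metric: pass_rate
    floor: 1.0
    severity: warning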

consumption-manifest.schema.json

App-side file declaring prompt pins and resolution overrides.
version (string, required)
Version of the consumption manifest format.

prompts (object, required)
Mapping of local prompt names to resolution configurations. Each entry requires id and optionally includes pin, override, and model.
prompts:
  summarize-v1:
    id: summarize-v1
    pin: "abc123"
prompts[].id (string, required)
Stable prompt ID to resolve.

prompts[].pin (string)
Git ref, commit SHA, semver range, or packaged artifact reference. See resolver for supported pin formats.

prompts[].override (string)
Local file path overriding resolution. Used for local-linked development.

prompts[].model (string)
Override the default model for this specific prompt.

defaults (object)
Global fallbacks: model and provider.
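
A fuller manifest using overrides and defaults might look like this sketch (the version string, file path, and defaults values are assumptions):

version: "1"
prompts:
  summarize-v1:
    id: summarize-v1
    pin: "abc123"
  summarize-dev:
    id: summarize-v1
    override: ./prompts/summarize-local.yaml   # local-linked development
defaults:
  model: default
  provider: default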

promotion-record.schema.json

Append-only record binding an approved version to a delivery channel.
version (string, required)
The approved version being promoted.

channel (string, required)
Target channel (e.g., prod, staging, release).

digest (string, required)
Content digest of the promoted version.

evidence (object)
Links to supporting evidence (e.g., run_id of the release-candidate run).

timestamp (string)
ISO 8601 timestamp of the promotion event.
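
As a sketch (values are illustrative; the digest is a placeholder):

version: "2.1.0"
channel: prod
digest: sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
evidence:
  run_id: rc-run-0142   # illustrative value
timestamp: "2025-01-01T12:00:00Z"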

delivery-target.schema.json

Declarative configuration for a downstream sync target.
type (string, required)
Target type (e.g., github_pr, oci_registry).

repo (string, required)
Target repository (for github_pr type).
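
As a sketch (the repository name is illustrative):

type: github_pr
repo: my-org/my-app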

Digest convention

All digest fields in apastra schemas use SHA-256 computed over canonicalized content.

Canonicalization rules

JSON files:

  1. Parse the JSON.
  2. Sort all keys alphabetically (recursively).
  3. Remove all insignificant whitespace.
  4. This is equivalent to: jq -cSM . <file>

YAML files:

  1. Parse the YAML into a JSON object.
  2. Apply the same canonicalization as JSON files.

JSONL files:

  1. Parse each line as JSON.
  2. Canonicalize each line independently.
  3. Rejoin lines with a single \n between each.
  4. Hash the resulting string.

Digest format

sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
All digest fields that use the pattern validator require exactly sha256: followed by 64 lowercase hex characters.
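
As a sketch, a digest for a single JSON file can be produced with the jq canonicalization above (this assumes the hash covers the canonical form without a trailing newline; the file path is illustrative):

jq -cSM . path/to/file.json | tr -d '\n' | sha256sum | awk '{ print "sha256:" $1 }'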

Using schemas for validation

With the validate skill

Ask your agent:
Use the apastra-validate skill to validate all promptops files

With the CLI

Because the schemas use JSON Schema draft 2020-12 (ajv-cli defaults to an older draft), pass --spec=draft2020 and load ajv-formats:

npm install -g ajv-formats ajv-cli
ajv validate --spec=draft2020 -c ajv-formats -s promptops/schemas/prompt-spec.schema.json -d promptops/prompts/summarize-v1.yaml

In CI

The schema-validation.yml workflow validates changed prompt and dataset files on every pull request automatically. See GitHub workflows reference for details.