

Installation

npx skills add BintzGavin/apastra/skills/scaffold

How to invoke

Ask your agent to create any combination of files:
“Use the apastra-scaffold skill to create a prompt spec, dataset, evaluator, and suite for summarizing text”
For a quick start without four separate files:
“Use the apastra-scaffold skill to create a quick eval for email classification”

What gets created

A full scaffold creates four files:
promptops/
├── prompts/summarize-v1.yaml        # Prompt template + variables
├── datasets/summarize-smoke.jsonl   # Test cases (5 examples)
├── evaluators/contains-keywords.yaml # Scoring rule
└── suites/summarize-smoke.yaml      # Test configuration
You can also ask for any individual piece: just a prompt spec, just a dataset, just an evaluator, or just a suite.

Prompt spec template

Your agent creates promptops/prompts/<id>.yaml:
id: <kebab-case-id>
variables:
  <var_name>:
    type: string
template: |
  <The actual prompt text with {{var_name}} placeholders>
output_contract:
  type: object
  properties:
    <output_field>:
      type: string
metadata:
  author: <user or team name>
  intent: <what this prompt does>
  tags:
    - <relevant-tags>
Rules for prompt specs:
  • id is required and must be unique — use kebab-case with a version suffix (for example, classify-email-v1)
  • variables is required — defines the input schema as a map of variable names to JSON Schema type objects
  • template is required — the prompt text with {{variable}} placeholders
  • output_contract is optional but recommended — defines expected output structure
  • Never rename an id; create a new version instead
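As a concrete illustration, here is how the summarize-v1 spec from the scaffold tree above might be filled in. The prompt wording and metadata values are illustrative, not generated output:
id: summarize-v1
variables:
  text:
    type: string
template: |
  Summarize the following text in two sentences or fewer.

  Text: {{text}}
output_contract:
  type: object
  properties:
    summary:
      type: string
metadata:
  author: docs-team            # illustrative value
  intent: Produce a short summary of arbitrary input text
  tags:
    - summarization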

Dataset template

Your agent creates promptops/datasets/<id>.jsonl — one JSON object per line:
{"case_id": "<unique-case-id>", "inputs": {"<var>": "<value>"}, "expected_outputs": {"<field>": "<expected>"}, "metadata": {"tags": ["<tag>"]}}
Rules for datasets:
  • Use .jsonl format (one JSON object per line, not a JSON array)
  • case_id is required and must be unique within the dataset
  • inputs is required — keys must match the prompt spec’s variables
  • expected_outputs is optional — used by evaluators for checking
  • Aim for 5–10 cases in a smoke dataset and 50+ in a regression dataset
  • Include edge cases: empty inputs, very long inputs, adversarial inputs
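For instance, two cases for the summarize-v1 spec sketched above might look like this, including an empty-input edge case. All values are illustrative:
{"case_id": "short-greeting", "inputs": {"text": "Hello there, how are you today?"}, "expected_outputs": {"summary": "A brief greeting."}, "metadata": {"tags": ["short-input"]}}
{"case_id": "empty-input", "inputs": {"text": ""}, "metadata": {"tags": ["edge-case"]}}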

Evaluator templates

Your agent creates promptops/evaluators/<id>.yaml. Three evaluator types are available:
Rule-based checks — fastest to run, no model calls required:
id: keyword-check
type: deterministic
metrics:
  - keyword_recall
description: Checks if output contains expected keywords.
config:
  match_field: should_contain
  case_sensitive: false
Rules for evaluators:
  • id is required and must be unique
  • type is required — must be one of deterministic, schema, or judge
  • metrics is required — array of metric names this evaluator produces (minimum 1)
  • For judge evaluators: treat the rubric text as a versioned artifact — changing it changes what the metric means
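The example above covers the deterministic type. Hedged sketches of the other two types, following the same top-level shape; the config keys and metric names here are assumptions for illustration, not confirmed fields:
id: output-schema-check
type: schema
metrics:
  - schema_validity            # assumed metric name
description: Validates output structure against the prompt spec's output_contract.

id: helpfulness-judge
type: judge
metrics:
  - helpfulness                # assumed metric name
description: Model-graded scoring against a rubric.
config:
  rubric: |                    # assumed key; remember the rubric text is a versioned artifact
    Score 1 if the output directly addresses the input; otherwise score 0.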

Suite template

Your agent creates promptops/suites/<id>.yaml:
id: <suite-id>
name: <Human Readable Name>
description: <what this suite tests>
datasets:
  - <dataset-id>
evaluators:
  - <evaluator-id>
model_matrix:
  - default
trials: 1
thresholds:
  <metric>: <minimum-score>
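Filled in for the smoke scaffold above, the suite ties the dataset and evaluator together by id. The 0.8 threshold is an illustrative value, not a recommended default:
id: summarize-smoke
name: Summarize Smoke Suite
description: Smoke checks for the summarize-v1 prompt
datasets:
  - summarize-smoke
evaluators:
  - contains-keywords
model_matrix:
  - default
trials: 1
thresholds:
  keyword_recall: 0.8          # illustrative minimum score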
Suite tiers — recommended usage:
Tier         When to run            Cases   Trials
Smoke        Every prompt edit      5–10    1
Regression   Before merging         20–50   3
Full         Nightly or on-demand   50+     5
Release      Before shipping        100+    5

Quick eval template

For rapid iteration, your agent can scaffold a single file instead of four. It creates promptops/evals/<id>.yaml:
id: classify-email-quick
prompt: |
  Classify the following email into one of these categories: spam, support, sales, personal.
  Respond with JSON: {"category": "<category>", "confidence": <0-1>}

  Email: {{email}}
cases:
  - id: obvious-spam
    inputs:
      email: "CONGRATULATIONS! You've won $1,000,000! Click here NOW!"
    assert:
      - type: is-json
      - type: contains
        value: "spam"
  - id: support-request
    inputs:
      email: "Hi, I'm having trouble logging in. My password reset isn't working."
    assert:
      - type: is-json
      - type: contains-any
        value: ["support", "help"]
  - id: personal-email
    inputs:
      email: "Hey! Want to grab lunch on Friday?"
    assert:
      - type: is-json
      - type: contains
        value: "personal"
thresholds:
  pass_rate: 1.0
When to use quick eval vs. full suite:
Quick eval                         Full suite
1–5 test cases                     10+ cases
Simple inline assertions           Reusable evaluator files
Rapid iteration on a new prompt    Baseline tracking and regression detection
No evaluator file needed           Multiple evaluator types

Dataset with inline assertions

When you want per-case checks without a separate evaluator file, ask your agent to add assert arrays directly in the JSONL:
{"case_id": "case-1", "inputs": {"text": "Hello"}, "assert": [{"type": "contains", "value": "Bonjour"}, {"type": "not-contains", "value": "error"}]}
{"case_id": "case-2", "inputs": {"text": ""}, "assert": [{"type": "regex", "value": ".*"}]}
Inline assertions and evaluator files complement each other. Use inline assertions for per-case checks and evaluator files for suite-wide scoring rules.

Available assertion types

Deterministic: equals, contains, icontains, contains-any, contains-all, regex, starts-with, is-json, contains-json, is-valid-json-schema
Model-assisted: similar, llm-rubric, factuality, answer-relevance
Performance: latency, cost
Negate any type with the not- prefix (for example, not-contains, not-is-json).
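For instance, a case might combine a negated deterministic check with a model-assisted one. Whether llm-rubric takes its rubric text in value is an assumption here, not confirmed API:
assert:
  - type: not-contains
    value: "error"
  - type: llm-rubric           # rubric-in-value shape is assumed
    value: "The output answers the question concisely."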
After scaffolding, run the apastra-validate skill to catch any typos or formatting issues before your first eval.