Promptfoo Evaluation Pipeline
How a YAML config drives test cases through multiple LLM providers and assertion grading
[Diagram: prompts and test cases flow from the config into a call to each provider; each call returns an output.]
# promptfooconfig.yaml
prompts:
  - "Translate: {{input}}"
providers:
  - openai:gpt-4o
  - anthropic:claude
  - ollama:llama3
tests:
  - vars:
      input: "Hello world"
    assert:
      - type: contains
        value: "Hola"
      - type: llm-rubric
        value: "Is accurate"
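A minimal sketch (not promptfoo's own code) of what a config like this expands into: each prompt template is rendered with each test's variables, once per provider, yielding one inference call per prompt x provider x test. The `render` helper is a hypothetical stand-in for the `{{var}}` template substitution.

```python
# Illustrative sketch (not promptfoo's own code): how the config above
# expands into one inference call per prompt x provider x test.

def render(template, variables):
    # Substitute {{name}} placeholders, mimicking the template syntax above.
    for name, value in variables.items():
        template = template.replace("{{" + name + "}}", value)
    return template

config = {
    "prompts": ["Translate: {{input}}"],
    "providers": ["openai:gpt-4o", "anthropic:claude", "ollama:llama3"],
    "tests": [{"vars": {"input": "Hello world"}}],
}

calls = [
    (provider, render(prompt, test["vars"]))
    for prompt in config["prompts"]
    for provider in config["providers"]
    for test in config["tests"]
]
print(len(calls))  # 1 prompt x 3 providers x 1 test -> 3
print(calls[0])    # ('openai:gpt-4o', 'Translate: Hello world')
```

With one prompt, three providers, and one test, the config produces three calls, which is exactly the fan-out the diagram shows.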
Providers: GPT-4o (openai provider), Claude (anthropic provider), Llama 3 (ollama provider). Each provider's LLM response is cached locally.
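The local cache can be sketched as a simple keyed store: repeated identical (provider, prompt) calls are served from the cache instead of hitting the API again. This is an assumption-level illustration of the caching idea, not promptfoo's implementation; `call_llm` is a hypothetical stub.

```python
# Illustrative sketch of local response caching: identical (provider, prompt)
# pairs are served from a local store instead of a second network call.
import hashlib
import json

cache = {}
network_calls = []

def call_llm(provider, prompt):
    # Stand-in for a real API call; records that the network was hit.
    network_calls.append((provider, prompt))
    return "Hola mundo"

def cached_call(provider, prompt):
    key = hashlib.sha256(json.dumps([provider, prompt]).encode()).hexdigest()
    if key not in cache:
        cache[key] = call_llm(provider, prompt)  # only on a cache miss
    return cache[key]

cached_call("openai:gpt-4o", "Translate: Hello world")
cached_call("openai:gpt-4o", "Translate: Hello world")  # served from cache
print(len(network_calls))  # -> 1
```

Hashing the serialized key keeps cache entries stable across runs, which is what makes re-running an eval cheap.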
Grading: each provider's output is graded against the config's assertions.
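Grading for the two assertion types in the config can be sketched as follows. A real `llm-rubric` check sends the output and rubric to a grader LLM; here that call is stubbed out, and the `grade` function is a hypothetical illustration rather than promptfoo's grader.

```python
# Illustrative sketch of assertion grading for the two assertion types used in
# the config above. A real llm-rubric check asks a grader LLM; here it's a stub.

def grade(output, assertions, rubric_grader):
    results = []
    for a in assertions:
        if a["type"] == "contains":
            passed = a["value"] in output
        elif a["type"] == "llm-rubric":
            passed = rubric_grader(output, a["value"])  # LLM call in real runs
        else:
            raise ValueError("unknown assertion type: " + a["type"])
        results.append((a["type"], passed))
    return results

assertions = [
    {"type": "contains", "value": "Hola"},
    {"type": "llm-rubric", "value": "Is accurate"},
]
always_pass = lambda output, rubric: True  # stub grader
print(grade("Hola mundo", assertions, always_pass))
# [('contains', True), ('llm-rubric', True)]
print(grade("Bonjour le monde", assertions, always_pass)[0])
# ('contains', False)
```

The deterministic `contains` check and the model-graded rubric run side by side, which is why a single test case can mix cheap string checks with subjective quality checks.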
Results: viewable in the Web UI or the CLI.
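The results view boils down to per-provider pass/fail tallies. A sketch of that aggregation, using made-up graded results (the tuples below are hypothetical, not real eval output):

```python
# Illustrative sketch: tallying per-provider pass/fail counts into the kind of
# summary a results view shows. The graded tuples here are made up.
from collections import defaultdict

graded = [
    ("openai:gpt-4o", True),
    ("anthropic:claude", True),
    ("ollama:llama3", False),
]

tally = defaultdict(lambda: [0, 0])  # provider -> [passed, total]
for provider, passed in graded:
    tally[provider][0] += passed
    tally[provider][1] += 1

for provider, (p, t) in tally.items():
    print(f"{provider}: {p}/{t} passed")
# openai:gpt-4o: 1/1 passed
# anthropic:claude: 1/1 passed
# ollama:llama3: 0/1 passed
```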
Pipeline stages: CONFIG → PROVIDERS → INFERENCE → GRADING → OUTPUT.