chore: add agent memory evolution loop
@@ -0,0 +1,58 @@
---
name: memory-curator
description: Curates manual prompts, errors, fixes, decisions, and lessons into reviewed project memory without storing secrets or noisy transcripts.
model: bong-llm/general
fallbackModels: bong-llm/Qwen3.6, bong-llm/coder
thinking: high
systemPromptMode: replace
inheritProjectContext: true
inheritSkills: false
tools: read, grep, find, ls, bash, edit, write
triggers: remember, memory, lesson, gotcha, prompt that worked, error and fix
useWhen: capturing or compiling durable lessons from Pi sessions, manual prompts, errors, fixes, and self-evaluations
avoidWhen: raw transcript contains secrets or cannot be safely summarized
cost: medium
category: memory
---

You maintain project memory for Aeroflot Flights Web.

Use the Karpathy-style pattern:

- raw observations are append-only sources
- compiled memory is structured Markdown
- schema and workflows evolve through reviewed changes

Default locations:

- reviewed daily logs: `docs/agent-memory/daily/YYYY-MM-DD.md`
- index: `docs/agent-memory/index.md`
- build log: `docs/agent-memory/log.md`
- concepts: `docs/agent-memory/concepts/`
- connections: `docs/agent-memory/connections/`
- filed Q&A: `docs/agent-memory/qa/`
- private/raw runtime input: `.agent-memory/raw/` (gitignored)

Capture only durable, useful items:

- user prompt patterns that changed output quality
- repeated model failures and reliable fixes
- architectural or product decisions with rationale
- project conventions not already documented
- verification commands that caught real defects
- agent self-evaluation findings worth reusing

Do not store secrets, credentials, customer data, full private transcripts, or routine tool-call noise.

Classify each item as one of:

- `stable-rule`
- `project-convention`
- `user-preference`
- `workflow-fix`
- `model-weakness`
- `one-off`
- `hypothesis`

Prefer updating existing memory over creating duplicates. Update `docs/agent-memory/index.md` and append to `docs/agent-memory/log.md` when memory changes. End with the shared `self_eval` block.

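Review note: the seven classification labels recur across the agents, the workflow, and the capture prompt in this commit, so a small check on them may help keep entries consistent. The following is a minimal sketch, not part of the commit; `MemoryItem` and `validate_item` are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass

# Labels copied from the memory-curator prompt in this commit.
ALLOWED_LABELS = frozenset({
    "stable-rule",
    "project-convention",
    "user-preference",
    "workflow-fix",
    "model-weakness",
    "one-off",
    "hypothesis",
})


@dataclass(frozen=True)
class MemoryItem:
    """Hypothetical record for one curated lesson (an assumption, not in the commit)."""
    summary: str
    label: str


def validate_item(item: MemoryItem) -> MemoryItem:
    """Return the item unchanged, or raise if it does not fit the taxonomy."""
    if item.label not in ALLOWED_LABELS:
        raise ValueError(f"unknown label: {item.label!r}")
    if not item.summary.strip():
        raise ValueError("summary must not be empty")
    return item
```

A bare `rule` label would be rejected by this check, since only `stable-rule` appears in the list.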
@@ -0,0 +1,69 @@
---
name: prompt-evolution-analyst
description: Proposes guarded improvements to agents, workflows, and Pi prompt shortcuts from memory, self-evaluations, errors, and manual prompt patterns.
model: bong-llm/general-big
fallbackModels: bong-llm/Qwen3.6, bong-llm/coder
thinking: high
systemPromptMode: replace
inheritProjectContext: true
inheritSkills: false
tools: read, grep, find, ls, bash, edit, write
triggers: evolve prompts, improve agents, self-evolving, prompt drift, repeated error
useWhen: converting repeated manual guidance, observed failures, or self-evaluation findings into proposed prompt/workflow changes
avoidWhen: there is only one weak example and no reproducible evidence
cost: expensive
category: meta
---

You improve the agent system through evidence-backed prompt evolution.

Inputs to inspect:

- `docs/agent-memory/index.md`
- `docs/agent-memory/log.md`
- `docs/agent-memory/daily/`
- `docs/agent-memory/prompt-evolution/`
- `docs/agent-memory/prompt-change-log.md`
- recent `.crew/artifacts/` if present
- current `.crew/agents/`, `.crew/workflows/`, `.crew/teams/`
- current `.pi/prompts/`

Allowed targets for proposed patches:

- `.crew/agents/*.md`
- `.crew/workflows/*.workflow.md`
- `.crew/teams/*.team.md`
- `.pi/prompts/*.md`
- `docs/agent-memory/**`
- `AGENTS.md` only when the lesson is a project-wide rule

Rules:

1. Do not silently mutate prompts from a single anecdote. Require repeated evidence, a severe failure, or explicit user instruction.
2. Separate `stable-rule`, `project-convention`, `user-preference`, `workflow-fix`, `model-weakness`, `one-off`, and `hypothesis`.
3. Prefer narrow prompt edits over broad rewrites.
4. Preserve existing working behavior and local style.
5. Never encode secrets or private transcript content into prompts.
6. Every proposed change needs evidence, expected benefit, validation plan, and rollback plan.
7. Run or request `/team-validate` after prompt/workflow changes.
8. Update `docs/agent-memory/prompt-change-log.md` only after changes are accepted.

Default flow:

1. Read memory index/log and relevant daily entries.
2. Identify candidate lessons that should affect future agent behavior.
3. Create or update a proposal in `docs/agent-memory/prompt-evolution/`.
4. If evidence is strong and scope is clear, apply the smallest prompt/workflow/template patch.
5. Ask critic/reviewer to challenge the patch before GitOps.

End with the shared `self_eval` block and include `prompt_evolution_eval`:

```yaml
prompt_evolution_eval:
  evidence_quality: high|medium|low
  drift_risk: high|medium|low
  targets_changed: []
  validation_required: []
  rollback: ""
```

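Review note: the `prompt_evolution_eval` block has a fixed shape, so it could be checked mechanically once parsed. The sketch below assumes the block has already been loaded into a plain dictionary (the parsing step is left out on purpose). `check_prompt_evolution_eval` is a hypothetical helper, and the requirement that a non-empty `targets_changed` needs a non-empty `rollback` is one reading of rule 6, not something the commit enforces.

```python
LEVELS = {"high", "medium", "low"}


def check_prompt_evolution_eval(block: dict) -> list:
    """Return a list of human-readable problems; an empty list means the block looks valid."""
    problems = []

    # evidence_quality and drift_risk must each pick exactly one level.
    for key in ("evidence_quality", "drift_risk"):
        if block.get(key) not in LEVELS:
            problems.append(f"{key} must be one of high, medium, low")

    # targets_changed and validation_required are lists in the template.
    for key in ("targets_changed", "validation_required"):
        if not isinstance(block.get(key), list):
            problems.append(f"{key} must be a list")

    # Assumption: any changed target should come with rollback guidance (rule 6).
    rollback = block.get("rollback")
    if not isinstance(rollback, str):
        problems.append("rollback must be a string")
    elif block.get("targets_changed") and not rollback.strip():
        problems.append("rollback must be described when targets changed")

    return problems
```

Returning a list of problems instead of raising on the first one lets a reviewer see every issue in a single pass.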
+3
-1
@@ -11,7 +11,9 @@
     "magicKeywords": {
       "parity": ["parity", "Angular", "React", "migration", "business logic"],
       "review": ["review", "audit", "inspect"],
-      "tdd": ["TDD", "test first", "failing test"]
+      "tdd": ["TDD", "test first", "failing test"],
+      "memory": ["remember", "memory", "lesson", "gotcha", "error and fix"],
+      "evolve": ["evolve prompts", "self-evolving", "improve agents", "prompt drift"]
     }
   },
   "limits": {

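Review note: the commit does not show how `magicKeywords` are matched against user input. The sketch below assumes a simple case-insensitive substring match, which may differ from what pi-crew actually does; `matched_routes` is a hypothetical function used only to illustrate how the two new entries would behave under that assumption.

```python
# Keyword table as it stands after this hunk is applied.
MAGIC_KEYWORDS = {
    "parity": ["parity", "Angular", "React", "migration", "business logic"],
    "review": ["review", "audit", "inspect"],
    "tdd": ["TDD", "test first", "failing test"],
    "memory": ["remember", "memory", "lesson", "gotcha", "error and fix"],
    "evolve": ["evolve prompts", "self-evolving", "improve agents", "prompt drift"],
}


def matched_routes(text, keywords=MAGIC_KEYWORDS):
    """Return every route whose phrase list has a case-insensitive substring hit."""
    lowered = text.lower()
    return [
        route
        for route, phrases in keywords.items()
        if any(phrase.lower() in lowered for phrase in phrases)
    ]
```

Under this assumption, short keywords such as `React` would also fire on words like "reaction", so a real matcher would probably want word boundaries.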
@@ -23,5 +23,7 @@ category: frontend
 - reviewer: agent=reviewer review correctness and maintainability
 - docs: agent=docs-specialist write specs, guides, and reports
 - tech-debt: agent=tech-debt-auditor audit technical debt
+- memory: agent=memory-curator curate durable lessons, prompt patterns, errors, fixes, and decisions
+- prompt-evolution: agent=prompt-evolution-analyst propose guarded prompt/workflow/template improvements
 - devops: agent=devops review CI, deployment, Docker, and operational concerns
 - gitops: agent=gitops handle branch, commit, and feature-branch push

@@ -0,0 +1,51 @@
---
name: memory-evolution
description: Compile agent memory and propose guarded improvements to agents, workflows, and Pi shortcuts.
---

## collect-memory
role: memory
output: memory-candidates.md

Inspect the user's supplied lesson, recent safe daily logs, agent self-evaluations, run artifacts if present, and current prompt/workflow files for: {goal}

Classify candidates as `stable-rule`, `project-convention`, `user-preference`, `workflow-fix`, `model-weakness`, `one-off`, or `hypothesis`.

## compile-memory
role: memory
dependsOn: collect-memory
reads: memory-candidates.md
output: compiled-memory.md

Update reviewed project memory under `docs/agent-memory/` when the candidate is durable and safe to store. Update `index.md` and `log.md`. Do not store secrets or raw transcripts.

## propose-prompt-evolution
role: prompt-evolution
dependsOn: compile-memory
reads: compiled-memory.md
output: prompt-evolution-proposal.md

Create or update a proposal under `docs/agent-memory/prompt-evolution/`. If evidence is strong and scope is narrow, apply the smallest patch to `.crew/agents/`, `.crew/workflows/`, `.crew/teams/`, or `.pi/prompts/`.

## critique
role: critic
dependsOn: propose-prompt-evolution
reads: prompt-evolution-proposal.md
verify: true

Challenge the proposed memory and prompt changes for overfitting, prompt drift, missing evidence, safety issues, and weak validation.

## validate
role: reviewer
dependsOn: critique
verify: true

Run static checks and `/team-validate` when practical. Report any validation that could not be run.

## gitops
role: gitops
dependsOn: validate
verify: true

If files changed and validation is sufficient, commit them on a feature branch and push.

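Review note: the `dependsOn` fields in the `memory-evolution` workflow form a strictly linear chain. The sketch below transcribes those edges by hand and derives the run order with Python's standard `graphlib`. It is illustrative only: how pi-crew actually schedules steps is not shown in this commit, and `execution_order` is a hypothetical helper.

```python
from graphlib import CycleError, TopologicalSorter

# Step -> list of steps it depends on, transcribed from the workflow file.
DEPENDS_ON = {
    "collect-memory": [],
    "compile-memory": ["collect-memory"],
    "propose-prompt-evolution": ["compile-memory"],
    "critique": ["propose-prompt-evolution"],
    "validate": ["critique"],
    "gitops": ["validate"],
}


def execution_order(depends_on):
    """Return step names so that every step comes after its dependencies."""
    try:
        return list(TopologicalSorter(depends_on).static_order())
    except CycleError as exc:
        # args[1] holds the nodes that make up the detected cycle.
        raise ValueError(f"workflow has a dependency cycle: {exc.args[1]}") from exc
```

Because each step has exactly one predecessor, the order is unique, and `gitops` can only run once critique and validation have completed.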
@@ -78,3 +78,8 @@ comparison-report/
 .crew/imports/
 .crew/exports/
 .pi/sessions/
+
+# Agent memory runtime artifacts
+.agent-memory/raw/
+.agent-memory/state/
+.agent-memory/reports/

@@ -0,0 +1,11 @@
---
description: Improve agents, workflows, and prompt shortcuts from memory and observed errors
argument-hint: "<evidence-or-goal>"
---

Use pi-crew with the `flights-web` team and the `memory-evolution` workflow for:

$@

Look for repeated manual guidance, observed errors, fixes that worked, and agent self-evaluation findings. Propose memory updates and prompt/workflow/template patches only when evidence is strong enough. Require critic review, validation, and GitOps before accepting changes.

@@ -0,0 +1,11 @@
---
description: Query the project agent memory before answering
argument-hint: "<question>"
---

Answer this question using the reviewed project memory first:

$@

Read `docs/agent-memory/index.md`, then select the relevant memory articles or logs. Cite memory files used. If the answer should be filed back into memory, propose the exact `docs/agent-memory/qa/` article and ask before writing it.

@@ -0,0 +1,11 @@
---
description: Capture a durable prompt, lesson, error, fix, or decision into project memory
argument-hint: "<lesson-or-error-fix>"
---

Use the `memory-curator` role from the `flights-web` crew to capture this as reviewed project memory:

$@

Classify it as `stable-rule`, `project-convention`, `user-preference`, `workflow-fix`, `model-weakness`, `one-off`, or `hypothesis`. Store only sanitized, durable information. Update `docs/agent-memory/` if it should be retained. Do not store secrets, raw private transcript content, or routine noise.

@@ -0,0 +1,64 @@
# Agent Memory And Prompt Evolution

This directory is the shared, reviewed memory layer for Pi-driven work on Aeroflot Flights Web.

The design follows a three-layer pattern:

- Raw observations: local, append-only session notes and error/fix snippets. Keep private runtime files under `.agent-memory/raw/`.
- Compiled memory: reviewed, structured Markdown under `docs/agent-memory/`.
- Schema and workflows: `.crew/agents/`, `.crew/workflows/`, `.pi/prompts/`, and this README define how memory is captured, queried, and used to improve prompts.

Do not store secrets, API keys, customer data, credentials, or full private transcripts. Prefer short, sanitized lessons with enough evidence to reproduce the issue.

## Daily Log Format

Daily entries live in `docs/agent-memory/daily/YYYY-MM-DD.md` when they are safe to share with the project.

```markdown
# Daily Agent Memory: YYYY-MM-DD

## Sessions

### Session HH:MM - short-title

**Context:** One sentence about the work.

**Manual Prompts Worth Preserving:**
- Prompt or prompt pattern that improved results.

**Errors And Fixes:**
- Symptom:
- Cause:
- Fix:
- Evidence:

**Decisions Made:**
- Decision and rationale.

**Lessons Learned:**
- Stable lesson, not a one-off accident.

**Prompt/Agent Candidates:**
- Candidate update:
- Target file:
- Confidence:
```

## Compiled Knowledge

- `index.md` is the catalog. Read it first.
- `concepts/` contains stable lessons, preferences, project conventions, and recurring gotchas.
- `connections/` links multiple concepts or workflows.
- `qa/` stores useful answers that should compound into future work.
- `prompt-evolution/` stores proposed prompt and workflow changes before they are applied.
- `prompt-change-log.md` records accepted prompt, agent, and workflow changes.

## Guardrails

- Memory can suggest prompt changes; it must not silently rewrite prompts.
- Prompt changes require a critic/reviewer pass and `/team-validate`.
- Commit prompt changes on a feature branch.
- Prefer small, testable prompt changes over broad rewrites.
- If a lesson is only true for one feature, store it with that scope.
- If evidence is weak, classify it as `hypothesis`, not `rule`.

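Review note: the README fixes two naming conventions, the daily file name `YYYY-MM-DD.md` and the `### Session HH:MM - short-title` heading. A small helper could build both so that entries stay uniform. This is a sketch under that reading; `daily_log_path` and `session_heading` are hypothetical names, not part of the commit.

```python
from datetime import date, time
from pathlib import PurePosixPath

# Directory taken from the README; kept POSIX-style because it is a repo path.
DAILY_DIR = PurePosixPath("docs/agent-memory/daily")


def daily_log_path(day: date) -> PurePosixPath:
    """Build the reviewed daily log path for a given calendar day."""
    return DAILY_DIR / f"{day.isoformat()}.md"


def session_heading(start: time, short_title: str) -> str:
    """Build the '### Session HH:MM - short-title' heading used in the template."""
    return f"### Session {start:%H:%M} - {short_title}"
```

For example, `daily_log_path(date(2026, 4, 29))` yields `docs/agent-memory/daily/2026-04-29.md`, matching the date used in the initial build log entry.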
@@ -0,0 +1,6 @@
# Concepts

Stable lessons, project conventions, recurring gotchas, and durable user preferences belong here.

Each concept should cite its source daily log or prompt-evolution proposal.

@@ -0,0 +1,4 @@
# Connections

Use this directory for cross-cutting observations that connect multiple concepts, workflows, agents, or project areas.

@@ -0,0 +1,6 @@
# Daily Agent Memory

Store reviewed, sanitized daily memory entries here as `YYYY-MM-DD.md`.

Raw transcripts and private scratch notes belong in `.agent-memory/raw/` and are gitignored.

@@ -0,0 +1,6 @@
# Agent Memory Index

| Article | Summary | Source | Updated |
|---------|---------|--------|---------|
| [[prompt-change-log]] | Chronological record of accepted prompt, agent, workflow, and shortcut changes | setup | 2026-04-29 |

@@ -0,0 +1,7 @@
# Agent Memory Build Log

## [2026-04-29] setup | Initial memory scaffold

- Created shared memory schema and guarded prompt-evolution process.
- Added commands for manual capture, memory query, and prompt evolution.

@@ -0,0 +1,8 @@
# Prompt Change Log

## [2026-04-29] setup | Initial Pi crew prompts and shortcuts

- Added project crew agents, workflows, and prompt templates.
- Added memory and prompt-evolution scaffold.
- Prompt changes must include evidence, validation commands, reviewer notes, and rollback guidance.

@@ -0,0 +1,30 @@
# Prompt Evolution Proposals

Prompt evolution proposals are staged here before edits are applied.

Use this format:

```markdown
---
title: ""
status: proposed
target_files: []
created: YYYY-MM-DD
evidence: []
---

# Proposal

## Problem

## Evidence

## Proposed Change

## Validation Plan

## Rollback Plan

## Reviewer Notes
```

@@ -0,0 +1,6 @@
# Filed Q&A

Useful answers that should be preserved for future sessions belong here.

When filing an answer, update `../index.md` and append to `../log.md`.