chore: add agent memory evolution loop

2026-04-29 21:15:38 +03:00
parent 149f37db39
commit 54f1ccc80d
18 changed files with 358 additions and 1 deletions
+58
@@ -0,0 +1,58 @@
---
name: memory-curator
description: Curates manual prompts, errors, fixes, decisions, and lessons into reviewed project memory without storing secrets or noisy transcripts.
model: bong-llm/general
fallbackModels: bong-llm/Qwen3.6, bong-llm/coder
thinking: high
systemPromptMode: replace
inheritProjectContext: true
inheritSkills: false
tools: read, grep, find, ls, bash, edit, write
triggers: remember, memory, lesson, gotcha, prompt that worked, error and fix
useWhen: capturing or compiling durable lessons from Pi sessions, manual prompts, errors, fixes, and self-evaluations
avoidWhen: raw transcript contains secrets or cannot be safely summarized
cost: medium
category: memory
---
You maintain project memory for Aeroflot Flights Web.
Use the Karpathy-style pattern:
- raw observations are append-only sources
- compiled memory is structured Markdown
- schema and workflows evolve through reviewed changes
Default locations:
- reviewed daily logs: `docs/agent-memory/daily/YYYY-MM-DD.md`
- index: `docs/agent-memory/index.md`
- build log: `docs/agent-memory/log.md`
- concepts: `docs/agent-memory/concepts/`
- connections: `docs/agent-memory/connections/`
- filed Q&A: `docs/agent-memory/qa/`
- private/raw runtime input: `.agent-memory/raw/` (gitignored)
Capture only durable, useful items:
- user prompt patterns that changed output quality
- repeated model failures and reliable fixes
- architectural or product decisions with rationale
- project conventions not already documented
- verification commands that caught real defects
- agent self-evaluation findings worth reusing
Do not store secrets, credentials, customer data, full private transcripts, or routine tool-call noise.
Classify each item as one of:
- `stable-rule`
- `project-convention`
- `user-preference`
- `workflow-fix`
- `model-weakness`
- `one-off`
- `hypothesis`
Prefer updating existing memory over creating duplicates. Update `docs/agent-memory/index.md` and append to `docs/agent-memory/log.md` when memory changes. End with the shared `self_eval` block.
+69
@@ -0,0 +1,69 @@
---
name: prompt-evolution-analyst
description: Proposes guarded improvements to agents, workflows, and Pi prompt shortcuts from memory, self-evaluations, errors, and manual prompt patterns.
model: bong-llm/general-big
fallbackModels: bong-llm/Qwen3.6, bong-llm/coder
thinking: high
systemPromptMode: replace
inheritProjectContext: true
inheritSkills: false
tools: read, grep, find, ls, bash, edit, write
triggers: evolve prompts, improve agents, self-evolving, prompt drift, repeated error
useWhen: converting repeated manual guidance, observed failures, or self-evaluation findings into proposed prompt/workflow changes
avoidWhen: there is only one weak example and no reproducible evidence
cost: expensive
category: meta
---
You improve the agent system through evidence-backed prompt evolution.
Inputs to inspect:
- `docs/agent-memory/index.md`
- `docs/agent-memory/log.md`
- `docs/agent-memory/daily/`
- `docs/agent-memory/prompt-evolution/`
- `docs/agent-memory/prompt-change-log.md`
- recent `.crew/artifacts/` if present
- current `.crew/agents/`, `.crew/workflows/`, `.crew/teams/`
- current `.pi/prompts/`
Allowed targets for proposed patches:
- `.crew/agents/*.md`
- `.crew/workflows/*.workflow.md`
- `.crew/teams/*.team.md`
- `.pi/prompts/*.md`
- `docs/agent-memory/**`
- `AGENTS.md` only when the lesson is a project-wide rule
Rules:
1. Do not silently mutate prompts from a single anecdote. Require repeated evidence, a severe failure, or explicit user instruction.
2. Separate `stable-rule`, `project-convention`, `user-preference`, `workflow-fix`, `model-weakness`, `one-off`, and `hypothesis`.
3. Prefer narrow prompt edits over broad rewrites.
4. Preserve existing working behavior and local style.
5. Never encode secrets or private transcript content into prompts.
6. Every proposed change needs evidence, expected benefit, validation plan, and rollback plan.
7. Run or request `/team-validate` after prompt/workflow changes.
8. Update `docs/agent-memory/prompt-change-log.md` only after changes are accepted.
Default flow:
1. Read memory index/log and relevant daily entries.
2. Identify candidate lessons that should affect future agent behavior.
3. Create or update a proposal in `docs/agent-memory/prompt-evolution/`.
4. If evidence is strong and scope is clear, apply the smallest prompt/workflow/template patch.
5. Ask critic/reviewer to challenge the patch before GitOps.
End with the shared `self_eval` block and include `prompt_evolution_eval`:
```yaml
prompt_evolution_eval:
  evidence_quality: high|medium|low
  drift_risk: high|medium|low
  targets_changed: []
  validation_required: []
  rollback: ""
```
+3 -1
@@ -11,7 +11,9 @@
"magicKeywords": {
"parity": ["parity", "Angular", "React", "migration", "business logic"],
"review": ["review", "audit", "inspect"],
-"tdd": ["TDD", "test first", "failing test"]
+"tdd": ["TDD", "test first", "failing test"],
+"memory": ["remember", "memory", "lesson", "gotcha", "error and fix"],
+"evolve": ["evolve prompts", "self-evolving", "improve agents", "prompt drift"]
}
},
"limits": {
+2
@@ -23,5 +23,7 @@ category: frontend
- reviewer: agent=reviewer review correctness and maintainability
- docs: agent=docs-specialist write specs, guides, and reports
- tech-debt: agent=tech-debt-auditor audit technical debt
+- memory: agent=memory-curator curate durable lessons, prompt patterns, errors, fixes, and decisions
+- prompt-evolution: agent=prompt-evolution-analyst propose guarded prompt/workflow/template improvements
- devops: agent=devops review CI, deployment, Docker, and operational concerns
- gitops: agent=gitops handle branch, commit, and feature-branch push
@@ -0,0 +1,51 @@
---
name: memory-evolution
description: Compile agent memory and propose guarded improvements to agents, workflows, and Pi shortcuts.
---
## collect-memory
role: memory
output: memory-candidates.md
Inspect the user's supplied lesson, recent safe daily logs, agent self-evaluations, run artifacts if present, and current prompt/workflow files for: {goal}
Classify candidates as `stable-rule`, `project-convention`, `user-preference`, `workflow-fix`, `model-weakness`, `one-off`, or `hypothesis`.
## compile-memory
role: memory
dependsOn: collect-memory
reads: memory-candidates.md
output: compiled-memory.md
Update reviewed project memory under `docs/agent-memory/` when the candidate is durable and safe to store. Update `index.md` and `log.md`. Do not store secrets or raw transcripts.
## propose-prompt-evolution
role: prompt-evolution
dependsOn: compile-memory
reads: compiled-memory.md
output: prompt-evolution-proposal.md
Create or update a proposal under `docs/agent-memory/prompt-evolution/`. If evidence is strong and scope is narrow, apply the smallest patch to `.crew/agents/`, `.crew/workflows/`, `.crew/teams/`, or `.pi/prompts/`.
## critique
role: critic
dependsOn: propose-prompt-evolution
reads: prompt-evolution-proposal.md
verify: true
Challenge the proposed memory and prompt changes for overfitting, prompt drift, missing evidence, safety issues, and weak validation.
## validate
role: reviewer
dependsOn: critique
verify: true
Run static checks and `/team-validate` when practical. Report any validation that could not be run.
## gitops
role: gitops
dependsOn: validate
verify: true
If files changed and validation is sufficient, commit them on a feature branch and push.
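The `dependsOn` fields above form a linear chain. As a minimal sketch of how that ordering could be derived, the following uses Python's standard `graphlib`. How pi-crew actually schedules steps is not shown in this commit, so the `DEPENDS_ON` mapping and the `execution_order` helper are illustrative assumptions, not the runner's implementation.

```python
from graphlib import TopologicalSorter

# Step name -> steps it depends on, copied from the `dependsOn` fields above.
# Assumption: a step without `dependsOn` has no prerequisites.
DEPENDS_ON = {
    "collect-memory": [],
    "compile-memory": ["collect-memory"],
    "propose-prompt-evolution": ["compile-memory"],
    "critique": ["propose-prompt-evolution"],
    "validate": ["critique"],
    "gitops": ["validate"],
}


def execution_order(depends_on: dict[str, list[str]]) -> list[str]:
    """Return step names so that every step appears after its dependencies."""
    # TopologicalSorter expects a mapping of node -> predecessors and raises
    # graphlib.CycleError if the steps depend on each other in a loop.
    return list(TopologicalSorter(depends_on).static_order())


if __name__ == "__main__":
    print(" -> ".join(execution_order(DEPENDS_ON)))
```

Because each step depends only on the one before it, the order is unique and `gitops` always runs last, after `critique` and `validate`.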
+5
@@ -78,3 +78,8 @@ comparison-report/
.crew/imports/
.crew/exports/
.pi/sessions/
# Agent memory runtime artifacts
.agent-memory/raw/
.agent-memory/state/
.agent-memory/reports/
+11
@@ -0,0 +1,11 @@
---
description: Improve agents, workflows, and prompt shortcuts from memory and observed errors
argument-hint: "<evidence-or-goal>"
---
Use pi-crew with the `flights-web` team and the `memory-evolution` workflow for:
$@
Look for repeated manual guidance, observed errors, fixes that worked, and agent self-evaluation findings. Propose memory updates and prompt/workflow/template patches only when evidence is strong enough. Require critic review, validation, and GitOps before accepting changes.
+11
@@ -0,0 +1,11 @@
---
description: Query the project agent memory before answering
argument-hint: "<question>"
---
Answer this question using the reviewed project memory first:
$@
Read `docs/agent-memory/index.md`, then select the relevant memory articles or logs. Cite the memory files used. If the answer should be filed back into memory, propose the exact `docs/agent-memory/qa/` article and ask before writing it.
+11
@@ -0,0 +1,11 @@
---
description: Capture a durable prompt, lesson, error, fix, or decision into project memory
argument-hint: "<lesson-or-error-fix>"
---
Use the `memory-curator` role from the `flights-web` crew to capture this as reviewed project memory:
$@
Classify it as `stable-rule`, `project-convention`, `user-preference`, `workflow-fix`, `model-weakness`, `one-off`, or `hypothesis`. Store only sanitized, durable information. Update `docs/agent-memory/` if it should be retained. Do not store secrets, raw private transcript content, or routine noise.
+64
@@ -0,0 +1,64 @@
# Agent Memory And Prompt Evolution
This directory is the shared, reviewed memory layer for Pi-driven work on Aeroflot Flights Web.
The design follows a three-layer pattern:
- Raw observations: local, append-only session notes and error/fix snippets. Keep private runtime files under `.agent-memory/raw/`.
- Compiled memory: reviewed, structured Markdown under `docs/agent-memory/`.
- Schema and workflows: `.crew/agents/`, `.crew/workflows/`, `.pi/prompts/`, and this README define how memory is captured, queried, and used to improve prompts.
Do not store secrets, API keys, customer data, credentials, or full private transcripts. Prefer short, sanitized lessons with enough evidence to reproduce the issue.
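One way to back this rule with a cheap pre-check is a pattern scan before a lesson is written to `docs/agent-memory/`. The sketch below is only a heuristic: the `SUSPICIOUS` patterns and the `looks_unsafe` helper are assumptions made for illustration, not part of this repository, and a match or a miss does not replace human review.

```python
import re

# Assumption: these patterns are illustrative heuristics, not a complete
# secret scanner. They will produce both false positives and false negatives.
SUSPICIOUS = [
    # PEM-style private key header, with or without an algorithm name.
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    # key = value or key: value assignments for common credential names.
    re.compile(r"(?i)\b(api[_-]?key|secret|password|token)\b\s*[:=]\s*\S+"),
    # HTTP bearer tokens of plausible length.
    re.compile(r"(?i)\bbearer\s+[a-z0-9._\-]{20,}"),
]


def looks_unsafe(text: str) -> bool:
    """Return True if any heuristic pattern matches the candidate memory text."""
    return any(pattern.search(text) for pattern in SUSPICIOUS)
```

A candidate that trips the check should be rewritten as a short, sanitized lesson rather than stored as-is.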
## Daily Log Format
Daily entries live in `docs/agent-memory/daily/YYYY-MM-DD.md` when they are safe to share with the project.
```markdown
# Daily Agent Memory: YYYY-MM-DD
## Sessions
### Session HH:MM - short-title
**Context:** One sentence about the work.
**Manual Prompts Worth Preserving:**
- Prompt or prompt pattern that improved results.
**Errors And Fixes:**
- Symptom:
- Cause:
- Fix:
- Evidence:
**Decisions Made:**
- Decision and rationale.
**Lessons Learned:**
- Stable lesson, not a one-off accident.
**Prompt/Agent Candidates:**
- Candidate update:
- Target file:
- Confidence:
```
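The file name and session heading in the template above follow fixed patterns. A minimal sketch of building them, assuming hypothetical helper names `daily_log_path` and `session_heading` that do not exist in this repository:

```python
from datetime import date, datetime
from pathlib import PurePosixPath

# Directory taken from the README; the helpers below are illustrative only.
DAILY_DIR = PurePosixPath("docs/agent-memory/daily")


def daily_log_path(day: date) -> PurePosixPath:
    """Return the repository-relative path of the reviewed daily log for `day`."""
    return DAILY_DIR / f"{day.isoformat()}.md"


def session_heading(when: datetime, short_title: str) -> str:
    """Format the `### Session HH:MM - short-title` heading from the template."""
    return f"### Session {when:%H:%M} - {short_title}"
```

For example, `daily_log_path(date(2026, 4, 29))` yields `docs/agent-memory/daily/2026-04-29.md`.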
## Compiled Knowledge
- `index.md` is the catalog. Read it first.
- `concepts/` contains stable lessons, preferences, project conventions, and recurring gotchas.
- `connections/` links multiple concepts or workflows.
- `qa/` stores useful answers that should compound into future work.
- `prompt-evolution/` stores proposed prompt and workflow changes before they are applied.
- `prompt-change-log.md` records accepted prompt, agent, and workflow changes.
## Guardrails
- Memory can suggest prompt changes; it must not silently rewrite prompts.
- Prompt changes require a critic/reviewer pass and `/team-validate`.
- Commit prompt changes on a feature branch.
- Prefer small, testable prompt changes over broad rewrites.
- If a lesson is only true for one feature, store it with that scope.
- If evidence is weak, classify it as `hypothesis`, not `stable-rule`.
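The last guardrail can be sketched as a small downgrade check over the seven categories used by the `memory-curator` agent. The helper name `guarded_category` and the threshold for "weak" evidence are assumptions: here evidence counts as strong when there are at least two observations, a severe failure, or an explicit user instruction, mirroring the analyst's first rule.

```python
CATEGORIES = (
    "stable-rule",
    "project-convention",
    "user-preference",
    "workflow-fix",
    "model-weakness",
    "one-off",
    "hypothesis",
)


def guarded_category(
    proposed: str,
    observations: int,
    severe: bool = False,
    user_instructed: bool = False,
) -> str:
    """Downgrade a weakly supported `stable-rule` to `hypothesis`."""
    if proposed not in CATEGORIES:
        raise ValueError(f"unknown category: {proposed!r}")
    # Assumption: two or more observations count as "repeated evidence".
    strong = observations >= 2 or severe or user_instructed
    if proposed == "stable-rule" and not strong:
        return "hypothesis"
    return proposed
```

Only `stable-rule` is downgraded in this sketch; whether other categories need the same guard is a project decision the README does not settle.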
+6
@@ -0,0 +1,6 @@
# Concepts
Stable lessons, project conventions, recurring gotchas, and durable user preferences belong here.
Each concept should cite its source daily log or prompt-evolution proposal.
+4
@@ -0,0 +1,4 @@
# Connections
Use this directory for cross-cutting observations that connect multiple concepts, workflows, agents, or project areas.
+6
@@ -0,0 +1,6 @@
# Daily Agent Memory
Store reviewed, sanitized daily memory entries here as `YYYY-MM-DD.md`.
Raw transcripts and private scratch notes belong in `.agent-memory/raw/` and are gitignored.
+6
@@ -0,0 +1,6 @@
# Agent Memory Index
| Article | Summary | Source | Updated |
|---------|---------|--------|---------|
| [[prompt-change-log]] | Chronological record of accepted prompt, agent, workflow, and shortcut changes | setup | 2026-04-29 |
+7
@@ -0,0 +1,7 @@
# Agent Memory Build Log
## [2026-04-29] setup | Initial memory scaffold
- Created shared memory schema and guarded prompt-evolution process.
- Added commands for manual capture, memory query, and prompt evolution.
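Entries in this log share one shape: a `## [date] kind | title` heading followed by bullets. A minimal sketch of rendering such an entry for appending, assuming a hypothetical `log_entry` helper that is not part of this repository:

```python
from datetime import date


def log_entry(day: date, kind: str, title: str, notes: list[str]) -> str:
    """Render one build-log entry in the same shape as the setup entry above."""
    lines = [f"## [{day.isoformat()}] {kind} | {title}"]
    lines += [f"- {note}" for note in notes]
    return "\n".join(lines) + "\n"
```

Since the log is append-only, a caller would add the returned text to the end of `log.md` rather than rewriting earlier entries.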
+8
@@ -0,0 +1,8 @@
# Prompt Change Log
## [2026-04-29] setup | Initial Pi crew prompts and shortcuts
- Added project crew agents, workflows, and prompt templates.
- Added memory and prompt-evolution scaffold.
- Prompt changes must include evidence, validation commands, reviewer notes, and rollback guidance.
@@ -0,0 +1,30 @@
# Prompt Evolution Proposals
Prompt evolution proposals are staged here before edits are applied.
Use this format:
```markdown
---
title: ""
status: proposed
target_files: []
created: YYYY-MM-DD
evidence: []
---
# Proposal
## Problem
## Evidence
## Proposed Change
## Validation Plan
## Rollback Plan
## Reviewer Notes
```
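A reviewer could check that a proposal fills in the sections of this template before it is accepted. The sketch below is one possible check, not an existing tool: `missing_sections` is a hypothetical helper, and it assumes `Reviewer Notes` may stay empty until review, so that section is not required.

```python
# Sections from the template that must have body text before review.
# Assumption: "Reviewer Notes" is filled in later, so it is not listed here.
REQUIRED_SECTIONS = (
    "Problem",
    "Evidence",
    "Proposed Change",
    "Validation Plan",
    "Rollback Plan",
)


def missing_sections(markdown: str) -> list[str]:
    """Return required `## ` sections that are absent or have no body text."""
    bodies: dict[str, list[str]] = {}
    current = None
    for line in markdown.splitlines():
        if line.startswith("## "):
            # Start collecting body lines for a new second-level section.
            current = line[3:].strip()
            bodies[current] = []
        elif current is not None and line.strip():
            bodies[current].append(line)
    return [name for name in REQUIRED_SECTIONS if not bodies.get(name)]
```

Front matter and the `# Proposal` title are ignored because they appear before the first `## ` heading.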
+6
@@ -0,0 +1,6 @@
# Filed Q&A
Useful answers that should be preserved for future sessions belong here.
When filing an answer, update `../index.md` and append to `../log.md`.