chore: add agent memory evolution loop

2026-04-29 21:15:38 +03:00
parent 149f37db39
commit 54f1ccc80d
18 changed files with 358 additions and 1 deletion
@@ -0,0 +1,58 @@
+---
+name: memory-curator
+description: Curates manual prompts, errors, fixes, decisions, and lessons into reviewed project memory without storing secrets or noisy transcripts.
+model: bong-llm/general
+fallbackModels: bong-llm/Qwen3.6, bong-llm/coder
+thinking: high
+systemPromptMode: replace
+inheritProjectContext: true
+inheritSkills: false
+tools: read, grep, find, ls, bash, edit, write
+triggers: remember, memory, lesson, gotcha, prompt that worked, error and fix
+useWhen: capturing or compiling durable lessons from Pi sessions, manual prompts, errors, fixes, and self-evaluations
+avoidWhen: raw transcript contains secrets or cannot be safely summarized
+cost: medium
+category: memory
+---
+
+You maintain project memory for Aeroflot Flights Web.
+
+Use the Karpathy-style three-layer pattern:
+
+- raw observations are append-only sources
+- compiled memory is structured Markdown
+- schema and workflows evolve through reviewed changes
+
+Default locations:
+
+- reviewed daily logs: `docs/agent-memory/daily/YYYY-MM-DD.md`
+- index: `docs/agent-memory/index.md`
+- build log: `docs/agent-memory/log.md`
+- concepts: `docs/agent-memory/concepts/`
+- connections: `docs/agent-memory/connections/`
+- filed Q&A: `docs/agent-memory/qa/`
+- private/raw runtime input: `.agent-memory/raw/` (gitignored)
+
+Capture only durable, useful items:
+
+- user prompt patterns that changed output quality
+- repeated model failures and reliable fixes
+- architectural or product decisions with rationale
+- project conventions not already documented
+- verification commands that caught real defects
+- agent self-evaluation findings worth reusing
+
+Do not store secrets, credentials, customer data, full private transcripts, or routine tool-call noise.
+
+Classify each item as one of:
+
+- `stable-rule`
+- `project-convention`
+- `user-preference`
+- `workflow-fix`
+- `model-weakness`
+- `one-off`
+- `hypothesis`
+
+Prefer updating existing memory over creating duplicates. Update `docs/agent-memory/index.md` and append to `docs/agent-memory/log.md` when memory changes. End with the shared `self_eval` block.
+
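The curator prompt treats the seven classification labels as a closed set. The minimal Python sketch below shows one way a hypothetical capture tool could enforce that set before writing memory. `MemoryClass` and `parse_label` are illustrative names, not part of pi-crew.

```python
from enum import Enum


class MemoryClass(str, Enum):
    """Closed set of labels copied from the memory-curator prompt."""

    STABLE_RULE = "stable-rule"
    PROJECT_CONVENTION = "project-convention"
    USER_PREFERENCE = "user-preference"
    WORKFLOW_FIX = "workflow-fix"
    MODEL_WEAKNESS = "model-weakness"
    ONE_OFF = "one-off"
    HYPOTHESIS = "hypothesis"


def parse_label(raw: str) -> MemoryClass:
    """Normalize a label (trim, drop backticks, lowercase) and reject unknown values."""
    cleaned = raw.strip().strip("`").lower()
    try:
        return MemoryClass(cleaned)
    except ValueError:
        allowed = ", ".join(m.value for m in MemoryClass)
        raise ValueError(f"unknown memory class {raw!r}; expected one of: {allowed}") from None
```

Rejecting a bare `rule` this way catches the common slip of using a shortened label.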
@@ -0,0 +1,69 @@
+---
+name: prompt-evolution-analyst
+description: Proposes guarded improvements to agents, workflows, and Pi prompt shortcuts from memory, self-evaluations, errors, and manual prompt patterns.
+model: bong-llm/general-big
+fallbackModels: bong-llm/Qwen3.6, bong-llm/coder
+thinking: high
+systemPromptMode: replace
+inheritProjectContext: true
+inheritSkills: false
+tools: read, grep, find, ls, bash, edit, write
+triggers: evolve prompts, improve agents, self-evolving, prompt drift, repeated error
+useWhen: converting repeated manual guidance, observed failures, or self-evaluation findings into proposed prompt/workflow changes
+avoidWhen: there is only one weak example and no reproducible evidence
+cost: expensive
+category: meta
+---
+
+You improve the agent system through evidence-backed prompt evolution.
+
+Inputs to inspect:
+
+- `docs/agent-memory/index.md`
+- `docs/agent-memory/log.md`
+- `docs/agent-memory/daily/`
+- `docs/agent-memory/prompt-evolution/`
+- `docs/agent-memory/prompt-change-log.md`
+- recent `.crew/artifacts/` if present
+- current `.crew/agents/`, `.crew/workflows/`, `.crew/teams/`
+- current `.pi/prompts/`
+
+Allowed targets for proposed patches:
+
+- `.crew/agents/*.md`
+- `.crew/workflows/*.workflow.md`
+- `.crew/teams/*.team.md`
+- `.pi/prompts/*.md`
+- `docs/agent-memory/**`
+- `AGENTS.md` only when the lesson is a project-wide rule
+
+Rules:
+
+1. Do not silently mutate prompts from a single anecdote. Require repeated evidence, a severe failure, or explicit user instruction.
+2. Classify each candidate lesson as exactly one of `stable-rule`, `project-convention`, `user-preference`, `workflow-fix`, `model-weakness`, `one-off`, or `hypothesis`.
+3. Prefer narrow prompt edits over broad rewrites.
+4. Preserve existing working behavior and local style.
+5. Never encode secrets or private transcript content into prompts.
+6. Every proposed change needs evidence, expected benefit, validation plan, and rollback plan.
+7. Run or request `/team-validate` after prompt/workflow changes.
+8. Update `docs/agent-memory/prompt-change-log.md` only after changes are accepted.
+
+Default flow:
+
+1. Read memory index/log and relevant daily entries.
+2. Identify candidate lessons that should affect future agent behavior.
+3. Create or update a proposal in `docs/agent-memory/prompt-evolution/`.
+4. If evidence is strong and scope is clear, apply the smallest prompt/workflow/template patch.
+5. Ask the critic or reviewer to challenge the patch before the GitOps step.
+
+End with the shared `self_eval` block and include `prompt_evolution_eval`:
+
+```yaml
+prompt_evolution_eval:
+  evidence_quality: high|medium|low
+  drift_risk: high|medium|low
+  targets_changed: []
+  validation_required: []
+  rollback: ""
+```
+
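The analyst's allowed-target list, rule 1's evidence gate, and the `prompt_evolution_eval` block can all be checked mechanically. The sketch below assumes that "repeated evidence" means at least two occurrences, and that the eval block has already been parsed into a dict. All function names are hypothetical and not part of pi-crew.

```python
from pathlib import PurePosixPath

LEVELS = {"high", "medium", "low"}

# Directories that allow exactly one file level below them, with the required suffix.
SINGLE_LEVEL_TARGETS = {
    (".crew", "agents"): ".md",
    (".crew", "workflows"): ".workflow.md",
    (".crew", "teams"): ".team.md",
    (".pi", "prompts"): ".md",
}


def is_allowed_target(path: str, project_wide_rule: bool = False) -> bool:
    """Mirror the analyst's "Allowed targets" list for a repo-relative POSIX path."""
    parts = PurePosixPath(path).parts
    if ".." in parts:
        return False  # refuse traversal out of an allowed directory
    if parts == ("AGENTS.md",):
        return project_wide_rule  # AGENTS.md only for project-wide rules
    if len(parts) == 3 and parts[:2] in SINGLE_LEVEL_TARGETS:
        return parts[2].endswith(SINGLE_LEVEL_TARGETS[parts[:2]])
    # docs/agent-memory/** accepts any depth below the directory.
    return len(parts) >= 3 and parts[:2] == ("docs", "agent-memory")


def may_apply_patch(evidence_count: int, severe_failure: bool, user_instructed: bool) -> bool:
    """Rule 1: repeated evidence (assumed >= 2), a severe failure, or explicit user instruction."""
    return evidence_count >= 2 or severe_failure or user_instructed


def check_eval(block: dict) -> list[str]:
    """Return problems found in an already-parsed prompt_evolution_eval mapping."""
    problems = []
    for key in ("evidence_quality", "drift_risk"):
        if block.get(key) not in LEVELS:
            problems.append(f"{key} must be one of high|medium|low")
    if block.get("targets_changed") and not (block.get("rollback") or "").strip():
        problems.append("rollback is required when targets_changed is non-empty")
    return problems
```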
@@ -11,7 +11,9 @@
    "magicKeywords": {
      "parity": ["parity", "Angular", "React", "migration", "business logic"],
      "review": ["review", "audit", "inspect"],
-      "tdd": ["TDD", "test first", "failing test"]
+      "tdd": ["TDD", "test first", "failing test"],
+      "memory": ["remember", "memory", "lesson", "gotcha", "error and fix"],
+      "evolve": ["evolve prompts", "self-evolving", "improve agents", "prompt drift"]
    }
  },
  "limits": {
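The `magicKeywords` map appears to route a free-form request to a mode. This commit does not show how pi-crew actually matches keywords. The sketch below assumes a case-insensitive substring match and copies only the entries visible in the hunk above.

```python
MAGIC_KEYWORDS = {
    "parity": ["parity", "Angular", "React", "migration", "business logic"],
    "review": ["review", "audit", "inspect"],
    "tdd": ["TDD", "test first", "failing test"],
    "memory": ["remember", "memory", "lesson", "gotcha", "error and fix"],
    "evolve": ["evolve prompts", "self-evolving", "improve agents", "prompt drift"],
}


def match_modes(prompt: str, keywords: dict[str, list[str]] = MAGIC_KEYWORDS) -> list[str]:
    """Return every mode with at least one keyword in the prompt, in map order."""
    text = prompt.lower()
    return [mode for mode, words in keywords.items() if any(w.lower() in text for w in words)]
```

Substring matching is deliberately loose. For example, "preview" would trigger `review`, so a word-boundary regex is the stricter alternative if false positives become a problem.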
@@ -23,5 +23,7 @@ category: frontend
 - reviewer: agent=reviewer review correctness and maintainability
 - docs: agent=docs-specialist write specs, guides, and reports
 - tech-debt: agent=tech-debt-auditor audit technical debt
+- memory: agent=memory-curator curate durable lessons, prompt patterns, errors, fixes, and decisions
+- prompt-evolution: agent=prompt-evolution-analyst propose guarded prompt/workflow/template improvements
 - devops: agent=devops review CI, deployment, Docker, and operational concerns
 - gitops: agent=gitops handle branch, commit, and feature-branch push
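Workflow steps refer to team roles such as `role: memory`, and the team file maps each role to an agent. The line shape used below (`- <role>: agent=<agent> <description>`) is inferred from the visible entries. The helper names are hypothetical.

```python
import re

# Assumed shape, inferred from the visible team entries.
ROLE_LINE = re.compile(r"^- (?P<role>[\w-]+): agent=(?P<agent>[\w-]+)\s+(?P<desc>.+)$")


def parse_roles(team_text: str) -> dict[str, tuple[str, str]]:
    """Map role name -> (agent name, description) for every role line found."""
    roles = {}
    for line in team_text.splitlines():
        m = ROLE_LINE.match(line.strip())
        if m:
            roles[m["role"]] = (m["agent"], m["desc"])
    return roles


def unknown_roles(workflow_roles: set[str], roles: dict[str, tuple[str, str]]) -> set[str]:
    """Workflow `role:` values that the parsed team file does not define."""
    return workflow_roles - set(roles)
```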
@@ -0,0 +1,51 @@
+---
+name: memory-evolution
+description: Compile agent memory and propose guarded improvements to agents, workflows, and Pi shortcuts.
+---
+
+## collect-memory
+role: memory
+output: memory-candidates.md
+
+Inspect the user's supplied lesson, recent sanitized daily logs, agent self-evaluations, run artifacts if present, and current prompt/workflow files for: {goal}
+
+Classify candidates as `stable-rule`, `project-convention`, `user-preference`, `workflow-fix`, `model-weakness`, `one-off`, or `hypothesis`.
+
+## compile-memory
+role: memory
+dependsOn: collect-memory
+reads: memory-candidates.md
+output: compiled-memory.md
+
+Update reviewed project memory under `docs/agent-memory/` when the candidate is durable and safe to store. Update `index.md` and `log.md`. Do not store secrets or raw transcripts.
+
+## propose-prompt-evolution
+role: prompt-evolution
+dependsOn: compile-memory
+reads: compiled-memory.md
+output: prompt-evolution-proposal.md
+
+Create or update a proposal under `docs/agent-memory/prompt-evolution/`. If evidence is strong and scope is narrow, apply the smallest patch to `.crew/agents/`, `.crew/workflows/`, `.crew/teams/`, or `.pi/prompts/`.
+
+## critique
+role: critic
+dependsOn: propose-prompt-evolution
+reads: prompt-evolution-proposal.md
+verify: true
+
+Challenge the proposed memory and prompt changes for overfitting, prompt drift, missing evidence, safety issues, and weak validation.
+
+## validate
+role: reviewer
+dependsOn: critique
+verify: true
+
+Run static checks and `/team-validate` when practical. Report any validation that could not be run.
+
+## gitops
+role: gitops
+dependsOn: validate
+verify: true
+
+If files changed and validation is sufficient, commit them on a feature branch and push.
+
@@ -78,3 +78,8 @@ comparison-report/
 .crew/imports/
 .crew/exports/
 .pi/sessions/
+
+# Agent memory runtime artifacts
+.agent-memory/raw/
+.agent-memory/state/
+.agent-memory/reports/
@@ -0,0 +1,11 @@
+---
+description: Improve agents, workflows, and prompt shortcuts from memory and observed errors
+argument-hint: "<evidence-or-goal>"
+---
+
+Use pi-crew with the `flights-web` team and the `memory-evolution` workflow for:
+
+$@
+
+Look for repeated manual guidance, observed errors, fixes that worked, and agent self-evaluation findings. Propose memory updates and prompt/workflow/template patches only when evidence is strong enough. Require critic review, validation, and GitOps before accepting changes.
+
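These prompt templates pass the user's arguments through `$@`. Pi's real expansion rules are not part of this commit, so the following is only a rough mental model: it strips the YAML frontmatter and replaces `$@` with the space-joined arguments. `expand_template` is a hypothetical name.

```python
def expand_template(template: str, args: list[str]) -> str:
    """Drop a leading `---` frontmatter block and substitute `$@` with all arguments."""
    body = template
    if body.startswith("---\n"):
        end = body.find("\n---\n", 4)
        if end != -1:
            body = body[end + len("\n---\n"):]
    return body.replace("$@", " ".join(args)).strip()
```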
@@ -0,0 +1,11 @@
+---
+description: Query the project agent memory before answering
+argument-hint: "<question>"
+---
+
+Answer this question using the reviewed project memory first:
+
+$@
+
+Read `docs/agent-memory/index.md`, then select the relevant memory articles or logs. Cite the memory files you used. If the answer should be filed back into memory, propose the exact `docs/agent-memory/qa/` article and ask before writing it.
+
@@ -0,0 +1,11 @@
+---
+description: Capture a durable prompt, lesson, error, fix, or decision into project memory
+argument-hint: "<lesson-or-error-fix>"
+---
+
+Use the `memory-curator` agent (the `memory` role of the `flights-web` team) to capture this as reviewed project memory:
+
+$@
+
+Classify it as `stable-rule`, `project-convention`, `user-preference`, `workflow-fix`, `model-weakness`, `one-off`, or `hypothesis`. Store only sanitized, durable information. Update `docs/agent-memory/` if it should be retained. Do not store secrets, raw private transcript content, or routine noise.
+
@@ -0,0 +1,64 @@
+# Agent Memory And Prompt Evolution
+
+This directory is the shared, reviewed memory layer for Pi-driven work on Aeroflot Flights Web.
+
+The design follows a three-layer pattern:
+
+- Raw observations: local, append-only session notes and error/fix snippets. Keep private runtime files under `.agent-memory/raw/`.
+- Compiled memory: reviewed, structured Markdown under `docs/agent-memory/`.
+- Schema and workflows: `.crew/agents/`, `.crew/workflows/`, `.pi/prompts/`, and this README define how memory is captured, queried, and used to improve prompts.
+
+Do not store secrets, API keys, customer data, credentials, or full private transcripts. Prefer short, sanitized lessons with enough evidence to reproduce the issue.
+
+## Daily Log Format
+
+Daily entries live in `docs/agent-memory/daily/YYYY-MM-DD.md` when they are safe to share with the project.
+
+```markdown
+# Daily Agent Memory: YYYY-MM-DD
+
+## Sessions
+
+### Session HH:MM - short-title
+
+**Context:** One sentence about the work.
+
+**Manual Prompts Worth Preserving:**
+- Prompt or prompt pattern that improved results.
+
+**Errors And Fixes:**
+- Symptom:
+- Cause:
+- Fix:
+- Evidence:
+
+**Decisions Made:**
+- Decision and rationale.
+
+**Lessons Learned:**
+- Stable lesson, not a one-off accident.
+
+**Prompt/Agent Candidates:**
+- Candidate update:
+- Target file:
+- Confidence:
+```
+
+## Compiled Knowledge
+
+- `index.md` is the catalog. Read it first.
+- `concepts/` contains stable lessons, preferences, project conventions, and recurring gotchas.
+- `connections/` links multiple concepts or workflows.
+- `qa/` stores useful answers that should compound into future work.
+- `prompt-evolution/` stores proposed prompt and workflow changes before they are applied.
+- `prompt-change-log.md` records accepted prompt, agent, and workflow changes.
+
+## Guardrails
+
+- Memory can suggest prompt changes; it must not silently rewrite prompts.
+- Prompt changes require a critic/reviewer pass and `/team-validate`.
+- Commit prompt changes on a feature branch.
+- Prefer small, testable prompt changes over broad rewrites.
+- If a lesson is only true for one feature, store it with that scope.
+- If evidence is weak, classify it as `hypothesis`, not `stable-rule`.
+
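The README's daily log format asks for each error entry to carry Symptom, Cause, Fix, and Evidence. The sketch below renders the file path and the error-and-fix part of a session block, and refuses to write an entry with an empty field, such as missing evidence. The helper names are hypothetical, and the other template sections are omitted for brevity.

```python
from dataclasses import dataclass
from datetime import date


def daily_log_path(day: date) -> str:
    """Path of the reviewed daily log for a given day, per the README convention."""
    return f"docs/agent-memory/daily/{day.isoformat()}.md"


@dataclass
class ErrorFix:
    symptom: str
    cause: str
    fix: str
    evidence: str

    def to_markdown(self) -> str:
        # Every field is required so the entry stays reproducible.
        for name in ("symptom", "cause", "fix", "evidence"):
            if not getattr(self, name).strip():
                raise ValueError(f"{name} is required")
        return "\n".join([
            f"- Symptom: {self.symptom}",
            f"- Cause: {self.cause}",
            f"- Fix: {self.fix}",
            f"- Evidence: {self.evidence}",
        ])


def session_section(hhmm: str, title: str, context: str, fixes: list[ErrorFix]) -> str:
    """Render a `### Session` block with only Context and Errors And Fixes filled in."""
    lines = [f"### Session {hhmm} - {title}", "", f"**Context:** {context}", "", "**Errors And Fixes:**"]
    lines += [fix.to_markdown() for fix in fixes]
    return "\n".join(lines)
```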
@@ -0,0 +1,6 @@
+# Concepts
+
+Stable lessons, project conventions, recurring gotchas, and durable user preferences belong here.
+
+Each concept should cite its source daily log or prompt-evolution proposal.
+
@@ -0,0 +1,4 @@
+# Connections
+
+Use this directory for cross-cutting observations that connect multiple concepts, workflows, agents, or project areas.
+
@@ -0,0 +1,6 @@
+# Daily Agent Memory
+
+Store reviewed, sanitized daily memory entries here as `YYYY-MM-DD.md`.
+
+Raw transcripts and private scratch notes belong in `.agent-memory/raw/` and are gitignored.
+
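Only sanitized entries belong under `daily/`. A cheap pre-write check can catch obvious leaks before a reviewer sees them. The patterns below are illustrative and far from exhaustive: they will produce false positives (for example, `password: see vault`) and miss many real secrets. A dedicated secret scanner is the safer choice for anything beyond a first-pass filter.

```python
import re

# Illustrative patterns only; not a complete secret-detection rule set.
SECRET_PATTERNS = {
    "aws-access-key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private-key": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "assignment": re.compile(r"(?i)\b(api[_-]?key|secret|password|token)\s*[:=]\s*\S+"),
    "bearer": re.compile(r"(?i)\bbearer\s+[a-z0-9._~+/-]{20,}"),
}


def find_secrets(text: str) -> list[str]:
    """Return the names of matching patterns; an empty list means nothing obvious was found."""
    return [name for name, pattern in SECRET_PATTERNS.items() if pattern.search(text)]
```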
@@ -0,0 +1,6 @@
+# Agent Memory Index
+
+| Article | Summary | Source | Updated |
+|---------|---------|--------|---------|
+| [[prompt-change-log]] | Chronological record of accepted prompt, agent, workflow, and shortcut changes | setup | 2026-04-29 |
+
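The `/memory-query` flow starts by reading this index table. The sketch below parses the table into dicts keyed by its header row, does a naive term lookup, and formats a new row that keeps the `[[wiki-link]]` style. The lookup heuristic (terms longer than three characters) and the helper names are assumptions for illustration.

```python
def parse_index(markdown: str) -> list[dict[str, str]]:
    """Parse the index table into dicts keyed by the header row."""
    rows = [line.strip() for line in markdown.splitlines() if line.strip().startswith("|")]
    if len(rows) < 3:
        return []
    header = [cell.strip() for cell in rows[0].strip("|").split("|")]
    return [
        dict(zip(header, (cell.strip() for cell in row.strip("|").split("|"))))
        for row in rows[2:]  # rows[1] is the |---| separator
    ]


def find_articles(entries: list[dict[str, str]], query: str) -> list[str]:
    """Naive lookup: articles whose name or summary contains a query term longer than 3 chars."""
    terms = [t for t in query.lower().split() if len(t) > 3]
    return [
        e["Article"] for e in entries
        if any(t in f"{e['Article']} {e['Summary']}".lower() for t in terms)
    ]


def format_row(article: str, summary: str, source: str, updated: str) -> str:
    """Format a new index row; pipes inside a cell would break the table."""
    for value in (article, summary, source, updated):
        if "|" in value:
            raise ValueError("cell values must not contain '|'")
    return f"| [[{article}]] | {summary} | {source} | {updated} |"
```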
@@ -0,0 +1,7 @@
+# Agent Memory Build Log
+
+## [2026-04-29] setup | Initial memory scaffold
+
+- Created shared memory schema and guarded prompt-evolution process.
+- Added commands for manual capture, memory query, and prompt evolution.
+
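Both `log.md` and `prompt-change-log.md` use the heading style `## [YYYY-MM-DD] kind | title`. The sketch below renders an append-only entry in that style and parses existing headings back out. The `ingest` kind in the usage is a hypothetical value; only `setup` appears in the source.

```python
import re
from datetime import date

HEADING = re.compile(r"^## \[(\d{4}-\d{2}-\d{2})\] (\S+) \| (.+)$")


def log_entry(day: date, kind: str, title: str, bullets: list[str]) -> str:
    """Render one log entry in the `## [date] kind | title` style, followed by bullets."""
    if not kind or any(ch.isspace() for ch in kind) or "|" in kind:
        raise ValueError("kind must be a single word such as 'setup'")
    heading = f"## [{day.isoformat()}] {kind} | {title}"
    return "\n".join([heading, "", *(f"- {b}" for b in bullets), ""])


def parse_headings(markdown: str) -> list[tuple[str, ...]]:
    """Return (date, kind, title) for every entry heading in a log file."""
    return [m.groups() for line in markdown.splitlines() if (m := HEADING.match(line))]
```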
@@ -0,0 +1,8 @@
+# Prompt Change Log
+
+## [2026-04-29] setup | Initial Pi crew prompts and shortcuts
+
+- Added project crew agents, workflows, and prompt templates.
+- Added memory and prompt-evolution scaffold.
+- Prompt changes must include evidence, validation commands, reviewer notes, and rollback guidance.
+
@@ -0,0 +1,30 @@
+# Prompt Evolution Proposals
+
+Prompt evolution proposals are staged here before edits are applied.
+
+Use this format:
+
+```markdown
+---
+title: ""
+status: proposed
+target_files: []
+created: YYYY-MM-DD
+evidence: []
+---
+
+# Proposal
+
+## Problem
+
+## Evidence
+
+## Proposed Change
+
+## Validation Plan
+
+## Rollback Plan
+
+## Reviewer Notes
+```
+
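A staged proposal can be linted before review against the template above. The sketch below reads the frontmatter as raw `key: value` strings and checks that every section heading is present. It also requires that the sections backing the analyst's "evidence, validation plan, rollback plan" rule are non-empty. Leaving `Reviewer Notes` blank until review is an assumption, as are the helper names.

```python
REQUIRED = ["Problem", "Evidence", "Proposed Change", "Validation Plan", "Rollback Plan", "Reviewer Notes"]
MUST_BE_FILLED = {"Problem", "Evidence", "Proposed Change", "Validation Plan", "Rollback Plan"}


def split_frontmatter(text: str) -> tuple[dict[str, str], str]:
    """Split `---` frontmatter into raw string values and the Markdown body."""
    if not text.startswith("---\n"):
        raise ValueError("proposal must start with --- frontmatter")
    end = text.index("\n---\n", 4)  # raises ValueError if the block is not closed
    meta = {}
    for line in text[4:end].splitlines():
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    return meta, text[end + len("\n---\n"):]


def section_bodies(body: str) -> dict[str, str]:
    """Map each `## Heading` to its stripped body text."""
    sections: dict[str, list[str]] = {}
    current = None
    for line in body.splitlines():
        if line.startswith("## "):
            current = line[3:].strip()
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return {name: "\n".join(lines).strip() for name, lines in sections.items()}


def check_proposal(text: str) -> list[str]:
    """Return a list of problems; an empty list means the proposal is ready for review."""
    meta, body = split_frontmatter(text)
    sections = section_bodies(body)
    problems = []
    if meta.get("target_files", "") in ("", "[]"):
        problems.append("target_files is empty")
    for name in REQUIRED:
        if name not in sections:
            problems.append(f"missing section: {name}")
        elif name in MUST_BE_FILLED and not sections[name]:
            problems.append(f"{name} section is empty")
    return problems
```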
@@ -0,0 +1,6 @@
+# Filed Q&A
+
+Useful answers that should be preserved for future sessions belong here.
+
+When filing an answer, update `../index.md` and append to `../log.md`.
+