From 54f1ccc80dc2a86e765a1b27983d9a96c0969dd9 Mon Sep 17 00:00:00 2001
From: gnezim
Date: Wed, 29 Apr 2026 21:15:38 +0300
Subject: [PATCH] chore: add agent memory evolution loop

---
 .crew/agents/memory-curator.md               | 58 ++++++++++++++++
 .crew/agents/prompt-evolution-analyst.md     | 69 ++++++++++++++++++++
 .crew/config.json                            |  4 +-
 .crew/teams/flights-web.team.md              |  2 +
 .crew/workflows/memory-evolution.workflow.md | 51 +++++++++++++++
 .gitignore                                   |  5 ++
 .pi/prompts/pi-evolve.md                     | 11 ++++
 .pi/prompts/pi-memory.md                     | 11 ++++
 .pi/prompts/pi-remember.md                   | 11 ++++
 docs/agent-memory/README.md                  | 64 ++++++++++++++++++
 docs/agent-memory/concepts/README.md         |  6 ++
 docs/agent-memory/connections/README.md      |  4 ++
 docs/agent-memory/daily/README.md            |  6 ++
 docs/agent-memory/index.md                   |  6 ++
 docs/agent-memory/log.md                     |  7 ++
 docs/agent-memory/prompt-change-log.md       |  8 +++
 docs/agent-memory/prompt-evolution/README.md | 30 +++++++++
 docs/agent-memory/qa/README.md               |  6 ++
 18 files changed, 358 insertions(+), 1 deletion(-)
 create mode 100644 .crew/agents/memory-curator.md
 create mode 100644 .crew/agents/prompt-evolution-analyst.md
 create mode 100644 .crew/workflows/memory-evolution.workflow.md
 create mode 100644 .pi/prompts/pi-evolve.md
 create mode 100644 .pi/prompts/pi-memory.md
 create mode 100644 .pi/prompts/pi-remember.md
 create mode 100644 docs/agent-memory/README.md
 create mode 100644 docs/agent-memory/concepts/README.md
 create mode 100644 docs/agent-memory/connections/README.md
 create mode 100644 docs/agent-memory/daily/README.md
 create mode 100644 docs/agent-memory/index.md
 create mode 100644 docs/agent-memory/log.md
 create mode 100644 docs/agent-memory/prompt-change-log.md
 create mode 100644 docs/agent-memory/prompt-evolution/README.md
 create mode 100644 docs/agent-memory/qa/README.md

diff --git a/.crew/agents/memory-curator.md b/.crew/agents/memory-curator.md
new file mode 100644
index 00000000..2dae49d5
--- /dev/null
+++ b/.crew/agents/memory-curator.md
@@ -0,0 +1,58 @@
+---
+name: memory-curator
+description: Curates manual prompts, errors, fixes, decisions, and lessons into reviewed project memory without storing secrets or noisy transcripts.
+model: bong-llm/general
+fallbackModels: bong-llm/Qwen3.6, bong-llm/coder
+thinking: high
+systemPromptMode: replace
+inheritProjectContext: true
+inheritSkills: false
+tools: read, grep, find, ls, bash, edit, write
+triggers: remember, memory, lesson, gotcha, prompt that worked, error and fix
+useWhen: capturing or compiling durable lessons from Pi sessions, manual prompts, errors, fixes, and self-evaluations
+avoidWhen: raw transcript contains secrets or cannot be safely summarized
+cost: medium
+category: memory
+---
+
+You maintain project memory for Aeroflot Flights Web.
+
+Use the Karpathy-style pattern:
+
+- raw observations are append-only sources
+- compiled memory is structured Markdown
+- schema and workflows evolve through reviewed changes
+
+Default locations:
+
+- reviewed daily logs: `docs/agent-memory/daily/YYYY-MM-DD.md`
+- index: `docs/agent-memory/index.md`
+- build log: `docs/agent-memory/log.md`
+- concepts: `docs/agent-memory/concepts/`
+- connections: `docs/agent-memory/connections/`
+- filed Q&A: `docs/agent-memory/qa/`
+- private/raw runtime input: `.agent-memory/raw/` (gitignored)
+
+Capture only durable, useful items:
+
+- user prompt patterns that changed output quality
+- repeated model failures and reliable fixes
+- architectural or product decisions with rationale
+- project conventions not already documented
+- verification commands that caught real defects
+- agent self-evaluation findings worth reusing
+
+Do not store secrets, credentials, customer data, full private transcripts, or routine tool-call noise.
+
+Classify each item as one of:
+
+- `stable-rule`
+- `project-convention`
+- `user-preference`
+- `workflow-fix`
+- `model-weakness`
+- `one-off`
+- `hypothesis`
+
+Prefer updating existing memory over creating duplicates. Update `docs/agent-memory/index.md` and append to `docs/agent-memory/log.md` when memory changes. End with the shared `self_eval` block.
+
diff --git a/.crew/agents/prompt-evolution-analyst.md b/.crew/agents/prompt-evolution-analyst.md
new file mode 100644
index 00000000..ebe1f17b
--- /dev/null
+++ b/.crew/agents/prompt-evolution-analyst.md
@@ -0,0 +1,69 @@
+---
+name: prompt-evolution-analyst
+description: Proposes guarded improvements to agents, workflows, and Pi prompt shortcuts from memory, self-evaluations, errors, and manual prompt patterns.
+model: bong-llm/general-big
+fallbackModels: bong-llm/Qwen3.6, bong-llm/coder
+thinking: high
+systemPromptMode: replace
+inheritProjectContext: true
+inheritSkills: false
+tools: read, grep, find, ls, bash, edit, write
+triggers: evolve prompts, improve agents, self-evolving, prompt drift, repeated error
+useWhen: converting repeated manual guidance, observed failures, or self-evaluation findings into proposed prompt/workflow changes
+avoidWhen: there is only one weak example and no reproducible evidence
+cost: expensive
+category: meta
+---
+
+You improve the agent system through evidence-backed prompt evolution.
+
+Inputs to inspect:
+
+- `docs/agent-memory/index.md`
+- `docs/agent-memory/log.md`
+- `docs/agent-memory/daily/`
+- `docs/agent-memory/prompt-evolution/`
+- `docs/agent-memory/prompt-change-log.md`
+- recent `.crew/artifacts/` if present
+- current `.crew/agents/`, `.crew/workflows/`, `.crew/teams/`
+- current `.pi/prompts/`
+
+Allowed targets for proposed patches:
+
+- `.crew/agents/*.md`
+- `.crew/workflows/*.workflow.md`
+- `.crew/teams/*.team.md`
+- `.pi/prompts/*.md`
+- `docs/agent-memory/**`
+- `AGENTS.md` only when the lesson is a project-wide rule
+
+Rules:
+
+1. Do not silently mutate prompts from a single anecdote. Require repeated evidence, a severe failure, or explicit user instruction.
+2. Separate `stable-rule`, `project-convention`, `user-preference`, `workflow-fix`, `model-weakness`, `one-off`, and `hypothesis`.
+3. Prefer narrow prompt edits over broad rewrites.
+4. Preserve existing working behavior and local style.
+5. Never encode secrets or private transcript content into prompts.
+6. Every proposed change needs evidence, expected benefit, validation plan, and rollback plan.
+7. Run or request `/team-validate` after prompt/workflow changes.
+8. Update `docs/agent-memory/prompt-change-log.md` only after changes are accepted.
+
+Default flow:
+
+1. Read memory index/log and relevant daily entries.
+2. Identify candidate lessons that should affect future agent behavior.
+3. Create or update a proposal in `docs/agent-memory/prompt-evolution/`.
+4. If evidence is strong and scope is clear, apply the smallest prompt/workflow/template patch.
+5. Ask critic/reviewer to challenge the patch before GitOps.
+
+End with the shared `self_eval` block and include `prompt_evolution_eval`:
+
+```yaml
+prompt_evolution_eval:
+  evidence_quality: high|medium|low
+  drift_risk: high|medium|low
+  targets_changed: []
+  validation_required: []
+  rollback: ""
+```
+
diff --git a/.crew/config.json b/.crew/config.json
index e73133be..0cae7e50 100644
--- a/.crew/config.json
+++ b/.crew/config.json
@@ -11,7 +11,9 @@
     "magicKeywords": {
       "parity": ["parity", "Angular", "React", "migration", "business logic"],
       "review": ["review", "audit", "inspect"],
-      "tdd": ["TDD", "test first", "failing test"]
+      "tdd": ["TDD", "test first", "failing test"],
+      "memory": ["remember", "memory", "lesson", "gotcha", "error and fix"],
+      "evolve": ["evolve prompts", "self-evolving", "improve agents", "prompt drift"]
     }
   },
   "limits": {
diff --git a/.crew/teams/flights-web.team.md b/.crew/teams/flights-web.team.md
index 8e67a918..2e33efc8 100644
--- a/.crew/teams/flights-web.team.md
+++ b/.crew/teams/flights-web.team.md
@@ -23,5 +23,7 @@ category: frontend
 - reviewer: agent=reviewer review correctness and maintainability
 - docs: agent=docs-specialist write specs, guides, and reports
 - tech-debt: agent=tech-debt-auditor audit technical debt
+- memory: agent=memory-curator curate durable lessons, prompt patterns, errors, fixes, and decisions
+- prompt-evolution: agent=prompt-evolution-analyst propose guarded prompt/workflow/template improvements
 - devops: agent=devops review CI, deployment, Docker, and operational concerns
 - gitops: agent=gitops handle branch, commit, and feature-branch push
diff --git a/.crew/workflows/memory-evolution.workflow.md b/.crew/workflows/memory-evolution.workflow.md
new file mode 100644
index 00000000..5b1955ba
--- /dev/null
+++ b/.crew/workflows/memory-evolution.workflow.md
@@ -0,0 +1,51 @@
+---
+name: memory-evolution
+description: Compile agent memory and propose guarded improvements to agents, workflows, and Pi shortcuts.
+---
+
+## collect-memory
+role: memory
+output: memory-candidates.md
+
+Inspect the user's supplied lesson, recent safe daily logs, agent self-evaluations, run artifacts if present, and current prompt/workflow files for: {goal}
+
+Classify candidates as `stable-rule`, `project-convention`, `user-preference`, `workflow-fix`, `model-weakness`, `one-off`, or `hypothesis`.
+
+## compile-memory
+role: memory
+dependsOn: collect-memory
+reads: memory-candidates.md
+output: compiled-memory.md
+
+Update reviewed project memory under `docs/agent-memory/` when the candidate is durable and safe to store. Update `index.md` and `log.md`. Do not store secrets or raw transcripts.
+
+## propose-prompt-evolution
+role: prompt-evolution
+dependsOn: compile-memory
+reads: compiled-memory.md
+output: prompt-evolution-proposal.md
+
+Create or update a proposal under `docs/agent-memory/prompt-evolution/`. If evidence is strong and scope is narrow, apply the smallest patch to `.crew/agents/`, `.crew/workflows/`, `.crew/teams/`, or `.pi/prompts/`.
+
+## critique
+role: critic
+dependsOn: propose-prompt-evolution
+reads: prompt-evolution-proposal.md
+verify: true
+
+Challenge the proposed memory and prompt changes for overfitting, prompt drift, missing evidence, safety issues, and weak validation.
+
+## validate
+role: reviewer
+dependsOn: critique
+verify: true
+
+Run static checks and `/team-validate` when practical. Report any validation that could not be run.
+
+## gitops
+role: gitops
+dependsOn: validate
+verify: true
+
+If files changed and validation is sufficient, commit them on a feature branch and push.
+
diff --git a/.gitignore b/.gitignore
index d559ad00..e793cfd7 100644
--- a/.gitignore
+++ b/.gitignore
@@ -78,3 +78,8 @@ comparison-report/
 .crew/imports/
 .crew/exports/
 .pi/sessions/
+
+# Agent memory runtime artifacts
+.agent-memory/raw/
+.agent-memory/state/
+.agent-memory/reports/
diff --git a/.pi/prompts/pi-evolve.md b/.pi/prompts/pi-evolve.md
new file mode 100644
index 00000000..c90c0cee
--- /dev/null
+++ b/.pi/prompts/pi-evolve.md
@@ -0,0 +1,11 @@
+---
+description: Improve agents, workflows, and prompt shortcuts from memory and observed errors
+argument-hint: ""
+---
+
+Use pi-crew with the `flights-web` team and the `memory-evolution` workflow for:
+
+$@
+
+Look for repeated manual guidance, observed errors, fixes that worked, and agent self-evaluation findings. Propose memory updates and prompt/workflow/template patches only when evidence is strong enough. Require critic review, validation, and GitOps before accepting changes.
+
diff --git a/.pi/prompts/pi-memory.md b/.pi/prompts/pi-memory.md
new file mode 100644
index 00000000..17cbadbe
--- /dev/null
+++ b/.pi/prompts/pi-memory.md
@@ -0,0 +1,11 @@
+---
+description: Query the project agent memory before answering
+argument-hint: ""
+---
+
+Answer this question using the reviewed project memory first:
+
+$@
+
+Read `docs/agent-memory/index.md`, then select the relevant memory articles or logs. Cite memory files used. If the answer should be filed back into memory, propose the exact `docs/agent-memory/qa/` article and ask before writing it.
+
diff --git a/.pi/prompts/pi-remember.md b/.pi/prompts/pi-remember.md
new file mode 100644
index 00000000..38004ca4
--- /dev/null
+++ b/.pi/prompts/pi-remember.md
@@ -0,0 +1,11 @@
+---
+description: Capture a durable prompt, lesson, error, fix, or decision into project memory
+argument-hint: ""
+---
+
+Use the `memory-curator` role from the `flights-web` crew to capture this as reviewed project memory:
+
+$@
+
+Classify it as `stable-rule`, `project-convention`, `user-preference`, `workflow-fix`, `model-weakness`, `one-off`, or `hypothesis`. Store only sanitized, durable information. Update `docs/agent-memory/` if it should be retained. Do not store secrets, raw private transcript content, or routine noise.
+
diff --git a/docs/agent-memory/README.md b/docs/agent-memory/README.md
new file mode 100644
index 00000000..a2fc7a2c
--- /dev/null
+++ b/docs/agent-memory/README.md
@@ -0,0 +1,64 @@
+# Agent Memory And Prompt Evolution
+
+This directory is the shared, reviewed memory layer for Pi-driven work on Aeroflot Flights Web.
+
+The design follows a three-layer pattern:
+
+- Raw observations: local, append-only session notes and error/fix snippets. Keep private runtime files under `.agent-memory/raw/`.
+- Compiled memory: reviewed, structured Markdown under `docs/agent-memory/`.
+- Schema and workflows: `.crew/agents/`, `.crew/workflows/`, `.pi/prompts/`, and this README define how memory is captured, queried, and used to improve prompts.
+
+Do not store secrets, API keys, customer data, credentials, or full private transcripts. Prefer short, sanitized lessons with enough evidence to reproduce the issue.
+
+## Daily Log Format
+
+Daily entries live in `docs/agent-memory/daily/YYYY-MM-DD.md` when they are safe to share with the project.
+
+```markdown
+# Daily Agent Memory: YYYY-MM-DD
+
+## Sessions
+
+### Session HH:MM - short-title
+
+**Context:** One sentence about the work.
+
+**Manual Prompts Worth Preserving:**
+- Prompt or prompt pattern that improved results.
+
+**Errors And Fixes:**
+- Symptom:
+- Cause:
+- Fix:
+- Evidence:
+
+**Decisions Made:**
+- Decision and rationale.
+
+**Lessons Learned:**
+- Stable lesson, not a one-off accident.
+
+**Prompt/Agent Candidates:**
+- Candidate update:
+- Target file:
+- Confidence:
+```
+
+## Compiled Knowledge
+
+- `index.md` is the catalog. Read it first.
+- `concepts/` contains stable lessons, preferences, project conventions, and recurring gotchas.
+- `connections/` links multiple concepts or workflows.
+- `qa/` stores useful answers that should compound into future work.
+- `prompt-evolution/` stores proposed prompt and workflow changes before they are applied.
+- `prompt-change-log.md` records accepted prompt, agent, and workflow changes.
+
+## Guardrails
+
+- Memory can suggest prompt changes; it must not silently rewrite prompts.
+- Prompt changes require a critic/reviewer pass and `/team-validate`.
+- Commit prompt changes on a feature branch.
+- Prefer small, testable prompt changes over broad rewrites.
+- If a lesson is only true for one feature, store it with that scope.
+- If evidence is weak, classify it as `hypothesis`, not `rule`.
+
diff --git a/docs/agent-memory/concepts/README.md b/docs/agent-memory/concepts/README.md
new file mode 100644
index 00000000..3fa2da15
--- /dev/null
+++ b/docs/agent-memory/concepts/README.md
@@ -0,0 +1,6 @@
+# Concepts
+
+Stable lessons, project conventions, recurring gotchas, and durable user preferences belong here.
+
+Each concept should cite its source daily log or prompt-evolution proposal.
+
diff --git a/docs/agent-memory/connections/README.md b/docs/agent-memory/connections/README.md
new file mode 100644
index 00000000..fe730d61
--- /dev/null
+++ b/docs/agent-memory/connections/README.md
@@ -0,0 +1,4 @@
+# Connections
+
+Use this directory for cross-cutting observations that connect multiple concepts, workflows, agents, or project areas.
+
diff --git a/docs/agent-memory/daily/README.md b/docs/agent-memory/daily/README.md
new file mode 100644
index 00000000..e32a812d
--- /dev/null
+++ b/docs/agent-memory/daily/README.md
@@ -0,0 +1,6 @@
+# Daily Agent Memory
+
+Store reviewed, sanitized daily memory entries here as `YYYY-MM-DD.md`.
+
+Raw transcripts and private scratch notes belong in `.agent-memory/raw/` and are gitignored.
+
diff --git a/docs/agent-memory/index.md b/docs/agent-memory/index.md
new file mode 100644
index 00000000..30346e6a
--- /dev/null
+++ b/docs/agent-memory/index.md
@@ -0,0 +1,6 @@
+# Agent Memory Index
+
+| Article | Summary | Source | Updated |
+|---------|---------|--------|---------|
+| [[prompt-change-log]] | Chronological record of accepted prompt, agent, workflow, and shortcut changes | setup | 2026-04-29 |
+
diff --git a/docs/agent-memory/log.md b/docs/agent-memory/log.md
new file mode 100644
index 00000000..1565200b
--- /dev/null
+++ b/docs/agent-memory/log.md
@@ -0,0 +1,7 @@
+# Agent Memory Build Log
+
+## [2026-04-29] setup | Initial memory scaffold
+
+- Created shared memory schema and guarded prompt-evolution process.
+- Added commands for manual capture, memory query, and prompt evolution.
+
diff --git a/docs/agent-memory/prompt-change-log.md b/docs/agent-memory/prompt-change-log.md
new file mode 100644
index 00000000..4e28a5a6
--- /dev/null
+++ b/docs/agent-memory/prompt-change-log.md
@@ -0,0 +1,8 @@
+# Prompt Change Log
+
+## [2026-04-29] setup | Initial Pi crew prompts and shortcuts
+
+- Added project crew agents, workflows, and prompt templates.
+- Added memory and prompt-evolution scaffold.
+- Prompt changes must include evidence, validation commands, reviewer notes, and rollback guidance.
+
diff --git a/docs/agent-memory/prompt-evolution/README.md b/docs/agent-memory/prompt-evolution/README.md
new file mode 100644
index 00000000..fbc856a5
--- /dev/null
+++ b/docs/agent-memory/prompt-evolution/README.md
@@ -0,0 +1,30 @@
+# Prompt Evolution Proposals
+
+Prompt evolution proposals are staged here before edits are applied.
+
+Use this format:
+
+```markdown
+---
+title: ""
+status: proposed
+target_files: []
+created: YYYY-MM-DD
+evidence: []
+---
+
+# Proposal
+
+## Problem
+
+## Evidence
+
+## Proposed Change
+
+## Validation Plan
+
+## Rollback Plan
+
+## Reviewer Notes
+```
+
diff --git a/docs/agent-memory/qa/README.md b/docs/agent-memory/qa/README.md
new file mode 100644
index 00000000..94ebb992
--- /dev/null
+++ b/docs/agent-memory/qa/README.md
@@ -0,0 +1,6 @@
+# Filed Q&A
+
+Useful answers that should be preserved for future sessions belong here.
+
+When filing an answer, update `../index.md` and append to `../log.md`.
+
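
The proposal template added in `docs/agent-memory/prompt-evolution/README.md` could be checked mechanically before the critic pass. What follows is a minimal sketch under stated assumptions, not part of this patch: the function name `check_proposal`, the two constants, and the sample title are hypothetical, and the front matter is scanned line by line rather than parsed as YAML.

```python
import re

# Front matter keys and section headings taken from the proposal template
# in docs/agent-memory/prompt-evolution/README.md.
REQUIRED_KEYS = ("title", "status", "target_files", "created", "evidence")
REQUIRED_SECTIONS = (
    "Problem",
    "Evidence",
    "Proposed Change",
    "Validation Plan",
    "Rollback Plan",
    "Reviewer Notes",
)


def check_proposal(text: str) -> list[str]:
    """Return a list of problems; an empty list means the template is satisfied.

    Hypothetical helper: only checks that keys and headings are present,
    not that their values are meaningful.
    """
    match = re.match(r"\A---\n(.*?)\n---\n", text, re.DOTALL)
    if match is None:
        return ["missing front matter block"]

    problems = []
    # Collect the key names that appear before the first colon on each line.
    keys = {
        line.split(":", 1)[0].strip()
        for line in match.group(1).splitlines()
        if ":" in line
    }
    for key in REQUIRED_KEYS:
        if key not in keys:
            problems.append(f"missing front matter key: {key}")

    # Collect second-level headings such as "## Rollback Plan".
    headings = set(re.findall(r"^## (.+?)\s*$", text, re.MULTILINE))
    for section in REQUIRED_SECTIONS:
        if section not in headings:
            problems.append(f"missing section: {section}")
    return problems


# Illustrative proposal that follows the template; the title is made up.
SAMPLE = "\n".join([
    "---",
    'title: "Tighten reviewer prompt"',
    "status: proposed",
    "target_files: []",
    "created: 2026-04-29",
    "evidence: []",
    "---",
    "",
    "# Proposal",
    "",
    "## Problem",
    "## Evidence",
    "## Proposed Change",
    "## Validation Plan",
    "## Rollback Plan",
    "## Reviewer Notes",
    "",
])

if __name__ == "__main__":
    print(check_proposal(SAMPLE) or "proposal matches the template")
```

A check like this only confirms that the required keys and headings exist. Whether the evidence is strong enough remains a judgment for the critic and reviewer steps.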