# Agent Memory And Prompt Evolution

This directory is the shared, reviewed memory layer for Pi-driven work on Aeroflot Flights Web.

The design follows a three-layer pattern:

- Raw observations: local, append-only session notes and error/fix snippets. Keep private runtime files under `.agent-memory/raw/`.
- Compiled memory: reviewed, structured Markdown under `docs/agent-memory/`.
- Schema and workflows: `.pi/teams/agents/`, `.pi/teams/workflows/`, `.pi/teams/teams/`, `.pi/prompts/`, and this README define how memory is captured, queried, and used to improve prompts.

Do not store secrets, API keys, customer data, credentials, or full private transcripts. Prefer short, sanitized lessons with enough evidence to reproduce the issue.
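
As a rough illustration of the sanitizing rule above, a note could be passed through a small redaction step before it is written anywhere. This is a minimal sketch, not part of the extension: `sanitizeLesson`, the pattern list, and the 500-character limit are all hypothetical, and the patterns are far from an exhaustive secret scanner.

```typescript
// Hypothetical helper: illustrative patterns only, not an exhaustive secret scanner.
const SECRET_PATTERNS: RegExp[] = [
  /\b(?:api[_-]?key|token|password|secret)\s*[:=]\s*\S+/gi, // key=value style credentials
  /\bBearer\s+[A-Za-z0-9._~+\/-]+=*/g, // HTTP bearer tokens
  /\b[\w.+-]+@[\w-]+\.[\w.-]+\b/g, // email addresses
];

function sanitizeLesson(text: string, maxLength = 500): string {
  let out = text;
  for (const pattern of SECRET_PATTERNS) {
    out = out.replace(pattern, "[REDACTED]");
  }
  // Keep lessons short: truncate rather than store a full transcript.
  return out.length > maxLength ? out.slice(0, maxLength) + "..." : out;
}
```

A redaction pass like this only reduces accidental leaks. Reviewing each note before it is shared is still required.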

## Daily Log Format

Daily entries live in `docs/agent-memory/daily/YYYY-MM-DD.md` when they are safe to share with the project.

```markdown
# Daily Agent Memory: YYYY-MM-DD

## Sessions

### Session HH:MM - short-title

**Context:** One sentence about the work.

**Manual Prompts Worth Preserving:**
- Prompt or prompt pattern that improved results.

**Errors And Fixes:**
- Symptom:
- Cause:
- Fix:
- Evidence:

**Decisions Made:**
- Decision and rationale.

**Lessons Learned:**
- Stable lesson, not a one-off accident.

**Prompt/Agent Candidates:**
- Candidate update:
- Target file:
- Confidence:
```
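
The path and heading conventions above could be generated rather than typed by hand. The sketch below is hypothetical: `dailyLogPath` and `sessionHeading` are illustrative names, and using UTC for the date and time is an assumption, since this README does not say which time zone the log uses.

```typescript
// Hypothetical helpers; only the path and heading patterns come from this README.
function dailyLogPath(date: Date): string {
  // toISOString() is always UTC, e.g. "2026-04-29T12:00:00.000Z".
  const day = date.toISOString().slice(0, 10);
  return `docs/agent-memory/daily/${day}.md`;
}

function sessionHeading(date: Date, shortTitle: string): string {
  // Characters 11..15 of the ISO string are the UTC "HH:MM" part.
  const hhmm = date.toISOString().slice(11, 16);
  return `### Session ${hhmm} - ${shortTitle}`;
}
```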

## Compiled Knowledge

- `index.md` is the catalog. Read it first.
- `concepts/` contains stable lessons, preferences, project conventions, and recurring gotchas.
- `connections/` links multiple concepts or workflows.
- `qa/` stores useful answers that should compound into future work.
- `prompt-evolution/` stores proposed prompt and workflow changes before they are applied.
- `prompt-change-log.md` records accepted prompt, agent, and workflow changes.

## Automation

The project-local Pi extension lives at `.pi/extensions/agent-memory.ts`.

It does these things automatically:

- injects the reviewed memory index and prompt change log into each agent turn through `before_agent_start`
- records private raw metrics and session snapshots under `.agent-memory/raw/`
- creates a reviewable pending memory candidate under `.agent-memory/review/pending/` after each completed agent turn
- records provider header latency, agent duration, turn duration, tool counts, prompt submission gaps, and compaction/shutdown events
- detects likely agent questions, starts a blocked-on-user wait timer, and stops it on your next interactive prompt
- exposes memory and timing commands

Raw metrics and review candidates are intentionally gitignored. They are evidence for later compilation, not reviewed memory.

The extension cannot infer true keystroke-level active typing time from Pi's current public extension events. It records explicit active work blocks plus automatic gap metrics:

- `activeWorkMs`: time between `/prompt-start` and `/prompt-stop`, excluding `/prompt-pause` to `/prompt-resume`
- `pauseInclusiveGapMs`: time between previous agent completion and next prompt submission
- `idleExcludedMs`: the portion of a gap above the 5-minute idle cap
- `blockedOnUserMs`: waiting time between a detected agent question and your next prompt; this is not counted as active user effort
- `activeAnswerMs`: explicit time between `/answer-start` and `/answer-stop` while you are composing an answer to an agent question
- `agentDurationMs`: prompt submission to `agent_end`
- `turnDurationMs`: one LLM/tool turn duration
- `headerLatencyMs`: provider request to HTTP response headers

Use `npx @ccusage/pi@latest session` and LiteLLM metrics for token/cost and provider-side inference timing.
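
To make the timing metrics listed above concrete, here is a minimal sketch of how two of them might be derived from timestamps. It is not the extension's code. The function and interface names are hypothetical, the sketch assumes the 5-minute idle cap is applied to the gap between agent completion and the next prompt, and it assumes pause intervals do not overlap.

```typescript
const IDLE_CAP_MS = 5 * 60 * 1000; // the 5-minute idle cap mentioned in the metric list

interface GapMetrics {
  pauseInclusiveGapMs: number;
  idleExcludedMs: number;
}

// Assumption: the idle cap applies to the gap between agent completion and the next prompt.
function gapMetrics(agentEndMs: number, nextPromptMs: number): GapMetrics {
  const pauseInclusiveGapMs = Math.max(0, nextPromptMs - agentEndMs);
  const idleExcludedMs = Math.max(0, pauseInclusiveGapMs - IDLE_CAP_MS);
  return { pauseInclusiveGapMs, idleExcludedMs };
}

interface PauseInterval {
  pausedAtMs: number; // when /prompt-pause was issued
  resumedAtMs: number; // when /prompt-resume was issued
}

// activeWorkMs: /prompt-start to /prompt-stop, minus paused intervals.
function activeWorkMs(startMs: number, stopMs: number, pauses: PauseInterval[]): number {
  let paused = 0;
  for (const p of pauses) {
    // Clamp each pause to the work block so stray timestamps cannot inflate the total.
    const from = Math.max(startMs, p.pausedAtMs);
    const to = Math.min(stopMs, p.resumedAtMs);
    if (to > from) paused += to - from;
  }
  return Math.max(0, stopMs - startMs - paused);
}
```

For example, a 7-minute gap yields `pauseInclusiveGapMs` of 420000 and `idleExcludedMs` of 120000 under these assumptions.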

## Commands

Inside Pi:

```text
/memory-status
/memory-capture [note]
/memory-compile [goal]
/memory-review
/memory-show [filename-fragment]
/memory-approve [filename-fragment]
/memory-discard [filename-fragment]
/memory-clear
/prompt-start [label]
/prompt-pause
/prompt-resume
/prompt-stop
/answer-start [label]
/answer-stop
/blocked-status
/time-report
/pi-remember <lesson-or-error-fix>
/pi-memory <question>
/pi-evolve <evidence-or-goal>
```

`/memory-compile` captures a private snapshot and then asks Pi to run `/pi-evolve` against the latest evidence.

After each agent turn, inspect pending candidates:

```text
/memory-review
/memory-show
```

Approve the latest or matching candidate:

```text
/memory-approve
/memory-approve 20260429T120000Z
```
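
The "latest or matching" behaviour shown above could work roughly as follows. This is a sketch of one plausible selection rule, not the extension's actual logic: `pickCandidate` is a hypothetical name, and it assumes candidate filenames start with a fixed-width UTC timestamp such as `20260429T120000Z`, so that sorting by name also sorts by time.

```typescript
// Hypothetical selection logic; the real extension may differ.
// Assumes filenames start with a sortable timestamp like 20260429T120000Z.
function pickCandidate(filenames: string[], fragment?: string): string | undefined {
  const matches = fragment
    ? filenames.filter((name) => name.includes(fragment))
    : filenames;
  // Lexicographic order equals chronological order for fixed-width timestamps.
  // Copy before sorting so the caller's array is not mutated.
  return [...matches].sort().pop();
}
```

With no fragment the newest candidate is returned. With a fragment, the newest candidate whose name contains it is returned, or `undefined` when nothing matches.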

Approval moves the candidate to `.agent-memory/review/approved/` and launches `/pi-evolve` so reviewed memory and prompt changes can be proposed. Discard candidates with `/memory-discard` or clear all pending candidates with `/memory-clear`.

Use `/prompt-start` when you begin an active prompting/planning block and `/prompt-stop` when you finish. Use `/prompt-pause` before leaving for an extended wait and `/prompt-resume` when you return.

When the agent asks a question and you need time to compose the answer, use `/answer-start` before you start thinking/writing and `/answer-stop` after you send the answer. The extension also starts a blocked-on-user wait timer automatically when the final assistant message looks like a direct question. That wait time stops on your next interactive prompt and is reported separately from active user work time.

`/blocked-status` shows whether the automatic blocked-on-user timer is currently running. `/time-report` writes a private report under `.agent-memory/reports/`.
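
The automatic blocked-on-user timer described above can be pictured with a small sketch. The detection rule here (the last non-empty line ends with `?`) is an illustrative guess, since this README does not document the extension's actual heuristic, and `looksLikeDirectQuestion` and `BlockedOnUserTimer` are hypothetical names.

```typescript
// Illustrative heuristic only; the extension's real detection rules are not documented here.
function looksLikeDirectQuestion(finalAssistantMessage: string): boolean {
  const lines = finalAssistantMessage.trim().split("\n");
  const lastLine = (lines[lines.length - 1] ?? "").trim();
  return lastLine.endsWith("?");
}

class BlockedOnUserTimer {
  private startedAtMs: number | null = null;
  private totalMs = 0;

  // Call when an agent turn ends; starts the timer only for likely questions.
  onAgentEnd(finalMessage: string, nowMs: number): void {
    if (looksLikeDirectQuestion(finalMessage)) this.startedAtMs = nowMs;
  }

  // Call on the next interactive prompt; stops the timer and accumulates the wait.
  onUserPrompt(nowMs: number): void {
    if (this.startedAtMs !== null) {
      this.totalMs += nowMs - this.startedAtMs;
      this.startedAtMs = null;
    }
  }

  // Roughly what /blocked-status would report.
  get running(): boolean {
    return this.startedAtMs !== null;
  }

  get blockedOnUserMs(): number {
    return this.totalMs;
  }
}
```

Keeping this total separate from `activeWorkMs` matches the intent stated above: time spent waiting on the user is reported, but not counted as active user effort.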

## Guardrails

- Memory can suggest prompt changes; it must not silently rewrite prompts.
- Prompt changes require a critic/reviewer pass and `/team-validate`.
- Commit prompt changes on a feature branch.
- Prefer small, testable prompt changes over broad rewrites.
- If a lesson is only true for one feature, store it with that scope.
- If evidence is weak, classify it as `hypothesis`, not `rule`.