Published 2026-04-19

Our Memory Setup vs The Field — Honest Comparison

*Written 2026-04-19 in response to your question after listening to NLW's Agent Madness episode where he flagged "the memory gap holding the whole field back."*

What we built — two complementary layers

Layer 1: Git-Based Markdown Memory (the deliberate one)

Layer 2: Real-time Vector Memory (TP3 Neural Stack)


How this stacks up vs the field

NLW called it right on Agent Madness: memory is the gap holding the agent field back. Most production agents reset every session. The leading frameworks trying to fix this:

| Framework | Strength | Where it lags vs ours |
| --- | --- | --- |
| Mem0 (managed SaaS) | Auto fact-extraction from conversations, 3-tier scoping, SOC 2/HIPAA, polished API | Vendor lock-in, ongoing cost, your data leaves your machine |
| Zep + Graphiti | Temporal knowledge graph — knows when facts were true; scores 63.8% on LongMemEval vs Mem0's 49% | Heavier infra, requires graph thinking, not human-readable |
| Letta (MemGPT) | Agent self-manages an OS-style memory hierarchy (working vs long-term) | Black-box decisions; you can't easily audit "what does it remember" |
| Cognee | Open-source knowledge graph layer, precision retrieval | Less mature, requires you to model entities |

Where ours LEADS

1. Truly cross-AGENT — Claude Code, Cursor, Antigravity, Jules, and future tools all see the SAME memory. Most "memory frameworks" assume a single agent. We did the dirty work (writing directly into the SQLite store behind Cursor's user rules) to make it work cross-IDE. Almost nobody else has this.

2. Truly cross-MACHINE — G16 + Apex + mobile stay in sync within seconds. Most frameworks assume a single deployment.

3. Git history as audit trail — full diff and blame on every memory change. None of the frameworks above offers this.

4. You can read it yourself — plain markdown. You can open any .md file and audit "what does Claude know about me." Mem0/Zep store memories as opaque records.

5. Zero recurring cost, zero data leaves your control.
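The cross-IDE trick in item 1 can be sketched roughly as below. This is an illustrative sketch, not our actual script: the `state.vscdb` path, the `ItemTable` key/value schema, and the `aicontext.personalContext` key are assumptions about Cursor's settings store and must be verified against a real install.

```python
import sqlite3

def inject_user_rules(db_path: str, memory_text: str,
                      key: str = "aicontext.personalContext") -> None:
    """Upsert shared memory text into an IDE's key-value settings store.

    Sketch only: the real database path, table name, and key are
    assumptions about Cursor's SQLite-backed settings.
    """
    con = sqlite3.connect(db_path)
    try:
        # VS-Code-style stores keep settings as (key, value) rows.
        con.execute(
            "CREATE TABLE IF NOT EXISTS ItemTable "
            "(key TEXT PRIMARY KEY, value TEXT)"
        )
        # INSERT OR REPLACE makes repeated syncs idempotent.
        con.execute(
            "INSERT OR REPLACE INTO ItemTable (key, value) VALUES (?, ?)",
            (key, memory_text),
        )
        con.commit()
    finally:
        con.close()
```

A sync job on each machine can call this after every git pull of the memory repo, so every IDE reads the same canonical text.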


Where ours LAGS (and what we're closing)

1. No auto fact-extraction — I have to manually write memory files when I learn something new; Mem0 watches conversations and proposes entries automatically. → Building tonight: a daily fact-extractor agent. Like your morning report, but in agent terms: what did Mark say today, what new preferences or decisions emerged, what should become memory.

2. No temporal reasoning — Zep can answer "what did Mark say about Legacy Soil last month vs this month." We only have git timestamps. → Building tonight: temporal frontmatter (valid_from and superseded_by fields).

3. No semantic search across the memory files themselves — TP3 does semantic search on OMI transcripts, but the 100+ memory .md files are only retrievable by filename + index lines. → Building tonight: vector-index the memory .md files into TP3 alongside OMI rows.

4. No benchmarking — LongMemEval exists; we don't measure whether memory is helping or hurting. → Building tonight: a baseline run to measure how accurate I am about you. This is the one you flagged most — "how much you're learning about me."
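The temporal frontmatter planned in gap 2 could be resolved with something like the following sketch. The field names valid_from and superseded_by come from the plan above; the minimal parser (stdlib only, no YAML library) and the is_current helper are hypothetical.

```python
from datetime import date

def parse_frontmatter(md_text: str) -> dict:
    """Parse a minimal `key: value` frontmatter block delimited by --- lines.

    Sketch only -- real memory files may need a proper YAML parser.
    """
    lines = md_text.splitlines()
    meta = {}
    if lines and lines[0].strip() == "---":
        for line in lines[1:]:
            if line.strip() == "---":
                break  # end of frontmatter block
            if ":" in line:
                k, _, v = line.partition(":")
                meta[k.strip()] = v.strip()
    return meta

def is_current(meta: dict, today: date) -> bool:
    """An entry is current if valid_from has passed and nothing supersedes it."""
    start = meta.get("valid_from")
    if start and date.fromisoformat(start) > today:
        return False
    return not meta.get("superseded_by")

entry = """---
valid_from: 2026-03-01
superseded_by:
---
Mark is focused on the Legacy Soil project.
"""
print(is_current(parse_frontmatter(entry), date(2026, 4, 19)))  # True
```

With superseded_by pointing at the replacing file's name, "what was true last month" becomes a filter over these fields plus git history, rather than a guess.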


Honest verdict

For solo-user multi-agent continuity (your specific need): we're ahead of what's available off-the-shelf.

For production AI assistant memory at scale (Mem0/Zep's target): we'd lag — but you don't have those needs.

After tonight's 4 gap-closers: we'll have what NO open framework currently has — cross-agent + cross-machine + audit-readable memory + auto-extraction + temporal awareness + semantic retrieval + measurable accuracy.

That's a real moat for your specific use case (Digital Twin Architect, single-user, multi-agent stack).


Sources