#1 on LoCoMo10 — outperforms every published AI memory system

~2,900 tokens/query vs 33,490 with OpenClaw's built-in memory · upgrade it with MemClaw
LLM-as-a-Judge evaluation on LoCoMo10 (long-conversation memory QA). All results use the same judge methodology.
* OpenViking tested on 1,540 questions (10 samples). Cortex Memory tested on 152 questions (conv-26 sample). The same LLM-as-a-Judge methodology was used throughout. Cortex Memory's Intent OFF score is estimated from the measured delta on the same dataset.
LoCoMo10 tests four distinct memory capabilities. Cortex Memory (Intent ON) leads in every category.
Cortex Memory's Intent Analysis classifies each query before retrieval, routing to the right memory scope. The improvement is most pronounced on multi-hop and temporal questions.
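To make the routing idea concrete, here is a minimal Rust sketch of classify-then-route retrieval. All names (`Intent`, `MemoryScope`, the keyword heuristics) are illustrative assumptions, not Cortex Memory's actual types — the real system uses an intent classifier rather than keyword matching.

```rust
// Hypothetical intent types and memory scopes; illustrative only.
#[derive(Debug, PartialEq)]
enum Intent {
    SingleHop,
    MultiHop,
    Temporal,
    OpenDomain,
}

#[derive(Debug, PartialEq)]
enum MemoryScope {
    Session,   // e.g. L0: recent turns
    Summaries, // e.g. L1: session summaries
    LongTerm,  // e.g. L2: full archive
}

// Stand-in classifier: simple keyword heuristics instead of the
// real LLM-based intent analysis.
fn classify(query: &str) -> Intent {
    let q = query.to_lowercase();
    if q.starts_with("when") || q.contains("before") || q.contains("after") {
        Intent::Temporal
    } else if q.matches('?').count() > 1 || q.contains("both") {
        Intent::MultiHop
    } else if q.contains('?') {
        Intent::SingleHop
    } else {
        Intent::OpenDomain
    }
}

// Route each intent to the narrowest scope that can answer it, so
// retrieval only loads the context the question actually needs.
fn route(intent: &Intent) -> MemoryScope {
    match intent {
        Intent::SingleHop => MemoryScope::Session,
        Intent::MultiHop | Intent::Temporal => MemoryScope::LongTerm,
        Intent::OpenDomain => MemoryScope::Summaries,
    }
}

fn main() {
    let q = "When did Alice adopt her dog?";
    let intent = classify(q);
    println!("{:?} -> {:?}", intent, route(&intent));
}
```

The point of the pattern: temporal and multi-hop questions get the widest scope (where the headline gains show up), while simple single-hop lookups stay on the cheap recent-context path.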
A side-by-side look at what each system supports.
| Capability | Cortex Memory + OpenClaw | OpenViking + OpenClaw | OpenClaw built-in |
|---|---|---|---|
| LoCoMo10 Best Score | 68.42% 🏆 | 52.08% | 35.65% |
| Avg Tokens / Question | ~2,900 | ~2,769 | ~15,982 |
| Hierarchical Memory (L0 / L1 / L2) | ✓ 3-layer context loading | ✓ 3-layer context loading | — |
| Intent-Driven Retrieval | ✓ multiple intent types | ✓ multiple intent types | — |
| Open Source | 🦀 Rust (Efficient & Secure) | 🐢 Python (Slow) | 🗑️ JavaScript (Bloated) |
Cortex Memory's hierarchical L0/L1/L2 architecture means you only pay for the context you actually need — precision retrieval without the bloat.
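One way to read "only pay for the context you actually need" is budget-aware loading that walks the layers in order and stops once a token budget is spent. The sketch below is an assumption about how such loading could work — the `Layer` struct, entry contents, and budget value are all hypothetical, not Cortex Memory's API.

```rust
// Hypothetical memory layer: a name (L0/L1/L2) plus entries with
// pre-computed token counts. Illustrative only.
struct Layer {
    name: &'static str,
    entries: Vec<(&'static str, usize)>, // (text, token_count)
}

// Walk layers in order L0 -> L1 -> L2, adding entries until the token
// budget would be exceeded, so cheap recent context is preferred over
// the full archive.
fn load_context(layers: &[Layer], budget: usize) -> Vec<String> {
    let mut used = 0;
    let mut ctx = Vec::new();
    for layer in layers {
        for (text, tokens) in &layer.entries {
            if used + tokens > budget {
                return ctx; // budget spent: stop loading deeper layers
            }
            used += tokens;
            ctx.push(format!("[{}] {}", layer.name, text));
        }
    }
    ctx
}

fn main() {
    let layers = vec![
        Layer { name: "L0", entries: vec![("recent turn: user asked about dogs", 120)] },
        Layer { name: "L1", entries: vec![("session summary: pet adoption discussion", 300)] },
        Layer { name: "L2", entries: vec![("full archive chunk", 5000)] },
    ];
    let ctx = load_context(&layers, 1000);
    println!("{} entries loaded", ctx.len());
}
```

Under this model the per-query token cost is capped by the budget regardless of archive size, which is the mechanism behind the ~2,900 tokens/query figure versus loading the whole history.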
The architecture choices that drive superior benchmark performance.
All raw outputs, judge reports, and methodology details are publicly available in the repository.