Backboard.io announced state-of-the-art performance across the two leading AI memory benchmarks, LoCoMo and LongMemEval, reinforcing its position as a foundational AI stack for production-grade and agentic systems.
An independent evaluation conducted by NewMathData, a Texas-based engineering firm and AWS Rising Star Partner of the Year, measured Backboard’s performance on the LongMemEval benchmark using the benchmark’s original academic specification. Backboard achieved 93.4% overall accuracy, the highest publicly reported result under consistent methodology and a material margin ahead of other reported systems.
During post-evaluation review, Backboard and the independent evaluator identified multiple instances where Backboard’s responses were marked incorrect despite being more precise and semantically accurate than the benchmark’s expected answer. In these cases, Backboard answered the question as written, incorporating factual context already present in the interaction, while the benchmark’s “gold” answer reflected a narrower or alternate interpretation of the prompt. As a result, the reported LongMemEval score should be considered a conservative lower bound on performance rather than an upper limit.
These results build on Backboard’s previously published 90.1% accuracy on the LoCoMo benchmark, with results publicly available and reproducible via GitHub. Achieving state-of-the-art performance on both benchmarks is uncommon, as most systems optimize for either short-horizon precision or long-horizon persistence, but not both.
Importantly, Backboard did not set out to optimize for benchmarks. The LongMemEval evaluation was initiated and run independently, and the LoCoMo benchmark was explored simply to understand where Backboard fit relative to academic research. The results reflect system-level behavior, not benchmark-specific tuning.
We didn’t build Backboard to chase benchmarks; we built it to solve real problems that show up when AI systems run for a long time, across multiple agents, under real constraints. The benchmarks just happened to confirm what we were already seeing in practice.
Rob Imbeault, Co-Founder and CEO of Backboard.io
A Complete AI Stack, Not a Bolt-On Component
Backboard is not a router, a wrapper, or a memory plugin. It is a unified AI infrastructure stack designed to serve as the starting point for modern AI systems.
From a single API, Backboard provides:
- Persistent long-term memory
- Native embeddings and vectorization
- Retrieval-augmented generation (RAG)
- Shared memory across agents
- Access to more than 17,000 large language models, including a ‘Bring your own API Key’ option
By integrating memory, embeddings, retrieval, and model access into one system, Backboard eliminates the need for enterprises to stitch together fragile chains of open-source components. Memory is treated as first-class infrastructure, not application logic.
This architecture allows systems to evolve without breaking:
- Models can be swapped without losing continuity
- Agents can coordinate while sharing state
- Retrieval strategies can change without rewrites
- Systems remain coherent as complexity grows
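To make the separation of memory from models concrete, here is a minimal sketch in Python. It is not Backboard's API: the `SharedMemory` and `Agent` classes, the two toy model functions, and the keyword-overlap recall are all hypothetical names and simplifications introduced for illustration. The sketch only shows the general idea that when memory lives outside any one model, two agents can share state and a model can be swapped without losing what was remembered.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical type for illustration only; not Backboard's API.
# A "model" here is any function of (prompt, remembered facts) -> reply.
Model = Callable[[str, List[str]], str]


@dataclass
class SharedMemory:
    """Toy in-process store that outlives any single model or agent."""
    facts: List[str] = field(default_factory=list)

    def remember(self, fact: str) -> None:
        if fact not in self.facts:
            self.facts.append(fact)

    def recall(self, query: str) -> List[str]:
        # Naive keyword overlap stands in for real retrieval.
        words = set(query.lower().split())
        return [f for f in self.facts if words & set(f.lower().split())]


@dataclass
class Agent:
    name: str
    model: Model
    memory: SharedMemory

    def ask(self, prompt: str) -> str:
        # The agent passes recalled facts to whichever model it currently uses.
        return self.model(prompt, self.memory.recall(prompt))


def model_a(prompt: str, facts: List[str]) -> str:
    return f"[A] {'; '.join(facts) or 'no memory'}"


def model_b(prompt: str, facts: List[str]) -> str:
    return f"[B] {'; '.join(facts) or 'no memory'}"


memory = SharedMemory()
planner = Agent("planner", model_a, memory)
support = Agent("support", model_b, memory)

memory.remember("customer prefers email")
reply_before = support.ask("how does the customer want contact")

support.model = model_a  # swap the model; the shared memory is untouched
reply_after = support.ask("how does the customer want contact")
```

In this toy version, both agents read from the same store, and `reply_after` still reflects the remembered fact even though the underlying model changed.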
Independent Validation of What “Memory” Really Means
In a recent article published by the Ottawa Business Journal, Adyasha Maharana, creator of the LoCoMo benchmark and research scientist at Databricks, commented on Backboard.io’s performance and clarified an important distinction often lost in AI evaluations.
The dataset is designed to examine not just an LLM but any LLM-based system’s capabilities and blindspots in a fine-grained manner. Raw human performance is somewhere around 88 percent. Breaking the 90-percent threshold requires superhuman consistency in recall and reasoning. Most high-performing frontier models currently score around 80 percent on LoCoMo. The system built by Backboard.io is a far better attempt at simulating memory as it manifests in humans. It is practical, cheaper, scalable and doesn’t rely solely on brute-force LLM processing for answers.
Adyasha Maharana, creator of the LoCoMo benchmark and research scientist at Databricks
This distinction underscores why Backboard’s results reflect more than model capacity. They demonstrate a system-level approach to memory that persists, evolves, and remains reliable over time.
Making Agentic AI Practical
As interest in agentic AI accelerates, many systems fail to move beyond isolated demos because memory is treated as an afterthought. Without reliable, shared memory, agents fragment, hallucinate, and reset. Backboard addresses this constraint directly by enabling persistent, shared memory across countless agents, even when those agents operate on different underlying models. When memory is solved, agentic behavior emerges naturally rather than being scripted.
“Agentic AI doesn’t become meaningful because you call something an agent,” said Imbeault. “It becomes meaningful when agents can remember, coordinate, and operate over time. Solving memory is the prerequisite.”
Backboard’s architecture is built around Active Temporal Resonance, a memory framework designed to preserve meaning and continuity as interactions unfold. By maintaining temporal coherence rather than reconstructing state through static graphs or repeated retrieval, Backboard enables systems that remain consistent, auditable, and trustworthy at scale.
Built by a Founder Enterprises Already Trust
Imbeault previously founded Assent, a platform trusted by Fortune 100 companies to manage complex supply chain and regulatory compliance workflows. That experience informed Backboard’s focus on durability, correctness, and trust from day one.
“Enterprise systems don’t get to reset,” said Imbeault. “If they lose context or trust, they fail. That mindset shaped how we built Backboard.”
Important Note and What Comes Next
All other memory providers are stateless, which limits their ability to run the LongMemEval benchmark in a manner that replicates real-world AI exchanges. We have replicated the benchmark using the original academic specification and achieved 93.4%. Because we offer more than just memory, including fully stateful functionality, we are running the same benchmarks in a way that properly emulates how these exchanges would take place in the real world.
With foundational memory validated across independent and academic benchmarks, Backboard is turning its attention to how teams evaluate and reason about complex AI systems in practice.
The company will soon introduce Switchboard, a new capability designed to help developers and enterprises better understand how different AI system configurations behave under real-world constraints. Additional details will be shared in the coming weeks.
The future of AI isn’t about clever tricks or bolt-ons; it’s about building systems that can be trusted over time. Memory is the foundation, and that’s where enterprises should start.
Rob Imbeault