Breakthrough CXL Memory Solution Targets AI Inference Workloads

As organizations rapidly adopt large language models (LLMs), generative AI, and real-time inference applications, memory capacity, bandwidth, and latency have become critical bottlenecks. XConn Technologies and MemVerge announced a joint demonstration of a Compute Express Link® (CXL®) memory pool designed to overcome the AI memory wall. The live demonstration will be held at Supercomputing 2025 (SC25) in St. Louis, November 16–21, 2025, at booth #817, stations 2 and 8.

Academic and industry analysts agree that memory bandwidth growth has lagged far behind compute performance. While server FLOPS have surged, DRAM and interconnect bandwidth have scaled much more slowly, making memory the dominant bottleneck for many AI inference workloads. Experts warn that AI growth is already hitting this memory wall, forcing memory and interconnect architectures to evolve rapidly. The memory-intensive nature of retrieval-augmented generation, vector search, agentic AI, and large language model inference is pushing traditional DDR- and HBM-based server architectures to their limits, creating both performance and TCO challenges.

“As AI workloads and model sizes explode, the limiting factor is no longer just GPU count; it’s how much memory can be shared, how fast it can be accessed, and how cost-efficiently it can scale,” said Gerry Fan, CEO of XConn Technologies. “Our collaboration with MemVerge demonstrates that CXL memory pooling at 100 TiB and beyond is production-ready, not theoretical. This is the architecture that makes large-scale AI inference truly feasible.”

To address these challenges, XConn and MemVerge are demonstrating a rack-scale CXL memory pooling solution built around XConn’s Apollo hybrid CXL/PCIe switch and MemVerge’s GISMO technology, optimized for NVIDIA’s Dynamo architecture and NIXL software stack. The demo showcases how AI inference workloads can offload and share massive KV cache resources dynamically across GPUs and CPUs, achieving greater than 5× performance improvements compared with SSD-based caching or RDMA-based KV cache offloading, while reducing total cost of ownership. In particular, the demo shows a scalable memory architecture for AI inference workloads that disaggregate the prefill and decode stages.
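The prefill/decode disaggregation pattern is easiest to see in code. The sketch below is a simplified, hypothetical illustration of the idea, not the GISMO or NIXL API: a prefill worker writes a request’s KV cache into a shared pool, and a separate decode worker later retrieves it by request ID. In the demonstrated system, the pool would be CXL-attached memory managed by MemVerge GISMO rather than the Python dictionary used here, and the tensor dimensions are deliberately tiny.

```python
# Hypothetical sketch of prefill/decode disaggregation over a shared KV-cache pool.
# The dict-backed "pool" stands in for CXL-attached memory managed by pooling
# software such as MemVerge GISMO; all names and shapes are illustrative only.
from dataclasses import dataclass
import numpy as np

@dataclass
class KVCache:
    keys: np.ndarray    # [layers, tokens, heads, head_dim]
    values: np.ndarray  # same shape as keys

class SharedKVPool:
    """Stand-in for a CXL memory pool shared by prefill and decode workers."""
    def __init__(self):
        self._store: dict[str, KVCache] = {}

    def put(self, request_id: str, cache: KVCache) -> None:
        # Real system: zero-copy placement into pooled CXL memory.
        self._store[request_id] = cache

    def get(self, request_id: str) -> KVCache:
        # Real system: direct load from the pool, avoiding an SSD or RDMA hop.
        return self._store[request_id]

def prefill(pool: SharedKVPool, request_id: str, prompt_tokens: int) -> None:
    """Prefill stage: build the KV cache for the prompt and publish it to the pool."""
    layers, heads, head_dim = 4, 8, 64  # tiny illustrative dimensions
    shape = (layers, prompt_tokens, heads, head_dim)
    pool.put(request_id, KVCache(keys=np.zeros(shape, dtype=np.float16),
                                 values=np.zeros(shape, dtype=np.float16)))

def decode(pool: SharedKVPool, request_id: str) -> int:
    """Decode stage: fetch the shared KV cache and continue token generation."""
    cache = pool.get(request_id)
    return cache.keys.shape[1]  # number of cached tokens available to the decoder

pool = SharedKVPool()
prefill(pool, "req-1", prompt_tokens=256)
print(decode(pool, "req-1"))  # 256
```

Because prefill and decode only exchange a key into the pool, the two stages can run on different GPUs or servers without copying the cache through local storage or the network.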

“Memory has become the new frontier of AI infrastructure innovation,” said Charles Fan, CEO and co-founder of MemVerge. “By using MemVerge GISMO with XConn’s Apollo switch, we’re showcasing software-defined, elastic CXL memory that delivers the performance and flexibility needed to power the next wave of agentic AI and hyperscale inference. Together, we’re redefining how memory is provisioned and utilized in AI data centers.”

As AI becomes increasingly data-centric and memory-bound rather than compute-bound, traditional server architectures can no longer keep up. CXL memory pooling addresses these limitations by enabling dynamic, low-latency memory sharing across CPUs, GPUs, and accelerators. It scales to hundreds of terabytes of shared memory, reduces TCO through better utilization and less over-provisioning, and improves throughput for inference-first workloads, generative AI, real-time analytics, and in-memory databases.
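In practice, Linux typically exposes CXL-attached memory as a CPU-less NUMA node, so existing applications can consume pooled capacity without code changes by binding their allocations to that node. The snippet below is a minimal sketch under that assumption; the node ID and the `inference_server.py` workload are hypothetical examples, and the standard `numactl` utility does the binding.

```python
# Minimal sketch: run a workload with its memory bound to a CXL-backed NUMA node.
# Assumes the CXL expander is already online as a CPU-less NUMA node; the node id
# is system-specific ("2" below is only an example).
import subprocess

CXL_NODE = "2"  # assumption: check `numactl -H` or /sys/devices/system/node/ for the real id

def run_on_cxl_memory(cmd: list[str]) -> int:
    """Launch cmd with all of its memory allocations bound to the CXL NUMA node."""
    bound_cmd = ["numactl", f"--membind={CXL_NODE}"] + cmd
    return subprocess.run(bound_cmd).returncode

if __name__ == "__main__":
    # Hypothetical example: start an inference server whose weights and KV cache
    # land in the pooled CXL memory rather than local DRAM.
    run_on_cxl_memory(["python", "inference_server.py"])
```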

SC25 attendees can experience the joint demo featuring a CXL memory pool dynamically shared across CPUs and GPUs, with inference benchmarks illustrating significant performance and efficiency gains for KV cache offload and AI model execution.

Visit XConn Technologies and MemVerge at SC25 to see how their CXL memory pool is transforming AI inference workloads.

Related News:

CXL Memory Pool Powers KV Cache in XConn and MemVerge AI Demo

ScaleFlux MC500 Powers XConn Partnership to Boost CXL 3.1 Interoperability

