Breakthrough CXL Memory Solution Targets AI Inference Workloads

As organizations rapidly adopt large language models (LLMs), generative AI, and real-time inference applications, memory capacity, bandwidth, and latency have become critical bottlenecks. XConn Technologies and MemVerge announced a joint demonstration of a Compute Express Link® (CXL®) memory pool designed to overcome the AI memory wall. The live demonstration will be held at Supercomputing 2025 (SC25) in St. Louis, November 16–21, 2025, at booth #817, stations 2 and 8.

Academic and industry analysts agree that memory bandwidth growth has lagged far behind compute performance. While server FLOPS have surged, DRAM and interconnect bandwidth have scaled much more slowly, making memory the dominant bottleneck for many AI inference workloads. Experts warn that AI growth is already hitting this memory wall, forcing memory and interconnect architectures to evolve rapidly. The memory-intensive nature of retrieval-augmented generation, vector search, agentic AI, and large language model inference is pushing traditional DDR- and HBM-based server architectures to their limits, creating both performance and TCO challenges.

“As AI workloads and model sizes explode, the limiting factor is no longer just GPU count; it’s how much memory can be shared, how fast it can be accessed, and how cost-efficiently it can scale,” said Gerry Fan, CEO of XConn Technologies. “Our collaboration with MemVerge demonstrates that CXL memory pooling at 100 TiB and beyond is production-ready, not theoretical. This is the architecture that makes large-scale AI inference truly feasible.”

To address these challenges, XConn and MemVerge are demonstrating a rack-scale CXL memory pooling solution built around XConn’s Apollo hybrid CXL/PCIe switch and MemVerge’s GISMO technology, optimized for NVIDIA’s Dynamo architecture and NIXL software stack. The demo showcases how AI inference workloads can offload and share massive KV cache resources dynamically across GPUs and CPUs, achieving greater than 5× performance improvements compared with SSD-based caching or RDMA-based KV cache offloading, while reducing total cost of ownership. In particular, the demo shows a scalable memory architecture for AI inference workloads that disaggregate the prefill and decode stages.
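The prefill/decode disaggregation pattern is easiest to see in code. The sketch below is a simplified, hypothetical illustration of the idea, not the GISMO or NIXL API: a prefill worker writes a request’s KV cache into a shared pool, and a separate decode worker later retrieves it by request ID. In the demonstrated system, the pool would be CXL-attached memory managed by MemVerge GISMO rather than the Python dictionary used here, and the tensor dimensions are deliberately tiny.

```python
# Hypothetical sketch of prefill/decode disaggregation over a shared KV-cache pool.
# The dict-backed "pool" stands in for CXL-attached memory managed by pooling
# software such as MemVerge GISMO; all names and shapes are illustrative only.
from dataclasses import dataclass
import numpy as np

@dataclass
class KVCache:
    keys: np.ndarray    # [layers, tokens, heads, head_dim]
    values: np.ndarray  # same shape as keys

class SharedKVPool:
    """Stand-in for a CXL memory pool shared by prefill and decode workers."""
    def __init__(self):
        self._store: dict[str, KVCache] = {}

    def put(self, request_id: str, cache: KVCache) -> None:
        # Real system: zero-copy placement into pooled CXL memory.
        self._store[request_id] = cache

    def get(self, request_id: str) -> KVCache:
        # Real system: direct load from the pool, avoiding an SSD or RDMA hop.
        return self._store[request_id]

def prefill(pool: SharedKVPool, request_id: str, prompt_tokens: int) -> None:
    """Prefill stage: build the KV cache for the prompt and publish it to the pool."""
    layers, heads, head_dim = 4, 8, 64  # tiny illustrative dimensions
    shape = (layers, prompt_tokens, heads, head_dim)
    pool.put(request_id, KVCache(keys=np.zeros(shape, dtype=np.float16),
                                 values=np.zeros(shape, dtype=np.float16)))

def decode(pool: SharedKVPool, request_id: str) -> int:
    """Decode stage: fetch the shared KV cache and continue token generation."""
    cache = pool.get(request_id)
    return cache.keys.shape[1]  # number of cached tokens available to the decoder

pool = SharedKVPool()
prefill(pool, "req-1", prompt_tokens=256)
print(decode(pool, "req-1"))  # 256
```

Because prefill and decode only exchange a key into the pool, the two stages can run on different GPUs or servers without copying the cache through local storage or the network.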

“Memory has become the new frontier of AI infrastructure innovation,” said Charles Fan, CEO and co-founder of MemVerge. “By using MemVerge GISMO with XConn’s Apollo switch, we’re showcasing software-defined, elastic CXL memory that delivers the performance and flexibility needed to power the next wave of agentic AI and hyperscale inference. Together, we’re redefining how memory is provisioned and utilized in AI data centers.”

As AI becomes increasingly data-centric and memory-bound rather than compute-bound, traditional server architectures can no longer keep up. CXL memory pooling addresses these limitations by enabling dynamic, low-latency memory sharing across CPUs, GPUs, and accelerators. It scales to hundreds of terabytes of shared memory, reduces TCO through better utilization and less over-provisioning, and improves throughput for inference-first workloads, generative AI, real-time analytics, and in-memory databases.
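In practice, Linux typically exposes CXL-attached memory as a CPU-less NUMA node, so existing applications can consume pooled capacity without code changes by binding their allocations to that node. The snippet below is a minimal sketch under that assumption; the node ID and the `inference_server.py` workload are hypothetical examples, and the standard `numactl` utility does the binding.

```python
# Minimal sketch: run a workload with its memory bound to a CXL-backed NUMA node.
# Assumes the CXL expander is already online as a CPU-less NUMA node; the node id
# is system-specific ("2" below is only an example).
import subprocess

CXL_NODE = "2"  # assumption: check `numactl -H` or /sys/devices/system/node/ for the real id

def run_on_cxl_memory(cmd: list[str]) -> int:
    """Launch cmd with all of its memory allocations bound to the CXL NUMA node."""
    bound_cmd = ["numactl", f"--membind={CXL_NODE}"] + cmd
    return subprocess.run(bound_cmd).returncode

if __name__ == "__main__":
    # Hypothetical example: start an inference server whose weights and KV cache
    # land in the pooled CXL memory rather than local DRAM.
    run_on_cxl_memory(["python", "inference_server.py"])
```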

SC25 attendees can experience the joint demo featuring a CXL memory pool dynamically shared across CPUs and GPUs, with inference benchmarks illustrating significant performance and efficiency gains for KV cache offload and AI model execution.

Visit XConn Technologies and MemVerge at SC25 to see how their CXL memory pool is transforming AI inference workloads.

Related News:

CXL Memory Pool Powers KV Cache in XConn and MemVerge AI Demo

ScaleFlux MC500 Powers XConn Partnership to Boost CXL 3.1 Interoperability

