TL;DR
- New Chip: NVIDIA announced the Vera CPU, an 88-core Arm processor purpose-built for orchestrating agentic AI workloads in data centers.
- Key Specs: Vera delivers 2.4x the memory bandwidth of its predecessor Grace, with 1.5 TB of LPDDR5X memory and 176 hardware threads via Spatial Multithreading.
- Platform Integration: The chip anchors a six-component Vera Rubin platform powering the NVL72 rack, which NVIDIA rates at 14.4 exaFLOPS of FP4 performance.
- Market Context: Bank of America forecasts the data center CPU market will more than double, from $27 billion to $60 billion, by 2030, with AMD and Intel offering competing architectures.
- Availability: Vera-based systems from Dell, HPE, Lenovo, and Supermicro ship in the second half of 2026, with Meta, Oracle, and Alibaba among early adopters.
NVIDIA is shipping an 88-core CPU designed not as a general-purpose processor but as a dedicated orchestration engine for agentic AI workloads.
Announced at GTC 2026 on March 16, the Vera CPU is now in full production. According to Bank of America, the data center CPU market will more than double from $27 billion to $60 billion by 2030, driven by orchestration demands that traditional processors cannot meet.
Vera CPU Architecture
At the heart of the Vera CPU are 88 custom Olympus cores built on Arm v9.2. Each core uses a wide, deep microarchitecture with improved branch prediction and prefetching, and introduces Spatial Multithreading, a technique that runs two hardware threads per core by physically partitioning resources rather than time-slicing. Across all 88 cores, Vera provides 176 threads.
In practice, Spatial Multithreading also introduces a run-time tradeoff between performance and efficiency, letting operators tune CPU behavior dynamically for multi-tenant AI factory environments.
Rather than adopting a chiplet design, NVIDIA built Vera on a single monolithic compute die. Its second-generation Scalable Coherency Fabric connects all 88 cores to a shared L3 cache and memory subsystem, sustaining over 90% of peak memory bandwidth under load. By avoiding chiplet boundaries, Vera delivers consistent latency across its entire die, a design choice that contrasts with multi-chiplet competitors from AMD and Intel.
As a result, architectural gains over NVIDIA’s own prior silicon are substantial. Compared to its predecessor Grace, Vera delivers 2.4x higher memory bandwidth and 3x greater memory capacity.
Furthermore, memory support reaches 1.5 TB of LPDDR5X at 1.2 TB/s of bandwidth using SOCAMM modules. According to NVIDIA, Vera delivers twice the efficiency of, and 50% faster single-threaded performance than, traditional rack-scale CPUs, along with 3x more memory bandwidth per core and 2x the energy efficiency of modern x86 processors.
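As a rough consistency check, the generational multipliers above can be cross-checked against Vera's headline memory numbers. A minimal sketch, assuming the 2.4x bandwidth and 3x capacity figures apply directly to Grace's per-CPU totals:

```python
# Back-of-envelope check of the memory figures stated above. Assumption:
# the 2.4x / 3x multipliers apply directly to Grace's headline per-CPU specs.

VERA_MEM_TB = 1.5      # LPDDR5X capacity per Vera CPU
VERA_BW_TBPS = 1.2     # LPDDR5X bandwidth per Vera CPU
BW_MULTIPLIER = 2.4    # Vera vs. Grace memory bandwidth
CAP_MULTIPLIER = 3.0   # Vera vs. Grace memory capacity

implied_grace_mem_tb = VERA_MEM_TB / CAP_MULTIPLIER    # 0.5 TB (~480 GB)
implied_grace_bw_tbps = VERA_BW_TBPS / BW_MULTIPLIER   # 0.5 TB/s

per_core_bw_gbps = VERA_BW_TBPS * 1000 / 88            # ~13.6 GB/s per core
print(implied_grace_mem_tb, implied_grace_bw_tbps, round(per_core_bw_gbps, 1))
```

The implied Grace figures of roughly 0.5 TB and 0.5 TB/s line up with Grace's published LPDDR5X configuration of about 480 GB at around 500 GB/s per CPU, so the stated multipliers are internally consistent.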
Taken together, Vera’s combination of monolithic die, Arm compatibility, and AI-specific optimizations positions it as a fundamentally different product from the high-core-count x86 chips offered by AMD and Intel. Where competitors pursue general-purpose versatility, NVIDIA is betting that the next wave of CPU demand will come from workloads that prize memory bandwidth and orchestration throughput above all else.
Vera Rubin Platform and NVL72
The Vera CPU sits at the center of a broader six-chip Vera Rubin platform, which also includes Rubin GPU, NVLink 6 switch, ConnectX-9 SuperNIC, BlueField-4 DPU, and Spectrum-6 Ethernet switch. All six chips were co-designed to function as a single system rather than optimized in isolation, reflecting NVIDIA’s philosophy that the data center itself has become the unit of compute.
At the top of the lineup is the NVL72 rack-scale system, housing 72 Rubin GPUs connected through 36 NVLink 6 switches in a full all-to-all topology. NVIDIA rates the NVL72 at 14.4 exaFLOPS of FP4 performance per rack.
Each Rubin GPU delivers 50 PFLOPS of NVFP4 inference performance and supports up to 288 GB of HBM4 memory, nearly tripling memory bandwidth compared to the Blackwell Ultra AI factory platform.
Moreover, NVLink 6 switches provide 260 TB/s of aggregate bandwidth per rack, with in-network SHARP compute reducing all-reduce communication traffic by up to 50%.
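The "up to 50%" figure is consistent with moving the reduction from the endpoints into the switch. A back-of-envelope sketch, assuming a classic ring all-reduce as the baseline (the baseline is my assumption, not stated in NVIDIA's materials):

```python
# Hedged sketch of why in-network reduction (SHARP) can roughly halve
# all-reduce traffic. Baseline assumption: a ring all-reduce, where each
# GPU injects its data twice (reduce-scatter phase + all-gather phase).

def ring_allreduce_bytes_injected(size_bytes: int, n_gpus: int) -> float:
    """Bytes each GPU injects into the network in a ring all-reduce."""
    return 2 * size_bytes * (n_gpus - 1) / n_gpus

def sharp_allreduce_bytes_injected(size_bytes: int) -> float:
    """With in-switch reduction, each GPU injects its contribution once;
    the reduced result flows back down the switch tree."""
    return float(size_bytes)

GRADIENT_BYTES = 1 << 30  # 1 GiB of gradients, illustrative only
N = 72                    # GPUs per NVL72 rack

ring = ring_allreduce_bytes_injected(GRADIENT_BYTES, N)
sharp = sharp_allreduce_bytes_injected(GRADIENT_BYTES)
print(f"traffic reduction: {1 - sharp / ring:.1%}")  # -> 49.3%
```

At 72 GPUs the saving works out to just under 50%, matching the "up to 50%" claim in the limit of large GPU counts.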
Binding these components together, the second-generation NVLink-C2C interconnect provides 1.8 TB/s of coherent bandwidth between Vera CPUs and Rubin GPUs, seven times faster than PCIe Gen 6. By enabling a unified address space across CPU and GPU memory, applications can treat LPDDR5X and HBM4 as a single coherent memory pool, a capability that is key for KV-cache offload in inference workloads.
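To make the KV-cache offload idea concrete, here is a toy placement policy over the two memory tiers. Only the tier sizes come from the figures above; the class, method names, and spill-over policy are illustrative assumptions, not NVIDIA's API:

```python
# Toy sketch of KV-cache offload over a unified CPU/GPU address space.
# Tier sizes mirror the article's figures; the placement policy itself
# is a hypothetical illustration, not NVIDIA's implementation.

class KVCacheTiering:
    def __init__(self, hbm_gb: float = 288, lpddr_gb: float = 1536):
        self.hbm_free = hbm_gb      # fast GPU-attached HBM4 (per Rubin GPU)
        self.lpddr_free = lpddr_gb  # larger CPU-attached LPDDR5X (per Vera CPU)
        self.placement: dict[str, str] = {}

    def allocate(self, seq_id: str, kv_gb: float) -> str:
        # Hot KV blocks go to HBM4; overflow spills coherently to LPDDR5X,
        # which the unified address space makes transparent to the kernel.
        if kv_gb <= self.hbm_free:
            self.hbm_free -= kv_gb
            self.placement[seq_id] = "hbm4"
        elif kv_gb <= self.lpddr_free:
            self.lpddr_free -= kv_gb
            self.placement[seq_id] = "lpddr5x"
        else:
            raise MemoryError("KV cache exceeds both memory tiers")
        return self.placement[seq_id]

tiers = KVCacheTiering()
print(tiers.allocate("seq-0", 200))  # hbm4
print(tiers.allocate("seq-1", 200))  # lpddr5x (only 88 GB of HBM4 left)
```

The point of the coherent pool is that the second allocation still works without any explicit copy in application code; the fabric services LPDDR5X-resident cache lines directly.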
Building on this integration, cable-free, fanless compute trays enable up to 18x faster assembly than previous designs, paired with direct liquid cooling throughout. NVIDIA built the platform on its third-generation MGX rack design, supported by more than 80 manufacturing partners worldwide.
A larger Vera Rubin NVL144 CPX rack configuration, building on the Rubin CPX for long-context inference, is also planned for the end of 2026.
For data center operators, however, the six-chip co-design approach raises the switching cost: once a facility standardizes on Vera Rubin racks, replacing any single component independently becomes impractical. NVIDIA is using tight silicon integration to lock in the full compute stack, from processor to network switch.
Why the CPU Matters for Agentic AI
Specifications gain context when set against a shift in how data centers allocate compute. Agentic AI workloads are reshaping data center economics as models scale from hundreds of billions to trillions of parameters and token generation grows exponentially. CPUs must orchestrate data movement across thousands of GPUs in real time.
Upcoming AI frontiers will require delivering tokens 15 times faster from models 10 times larger, according to figures NVIDIA shared at the keynote.
Jensen Huang noted during the presentation that NVIDIA has achieved 40 million times more compute in the past decade.
“CPUs are becoming the bottleneck in terms of growing out this AI and agentic workflow,” NVIDIA’s Dion Harris, head of AI infrastructure, told CNBC.
Creative Strategies analyst Ben Bajarin framed the opportunity in infrastructure terms:
“This is new infrastructure: Greenfield expansion of racks of CPUs whose only job is to run agentic AI. Your software is going to sit elsewhere, your accelerators are just going to run tokens, but something has to sit in the middle and orchestrate that.”
Ben Bajarin, Analyst at Creative Strategies (via CNBC)
In effect, Bajarin’s framing positions the CPU not as a passive host but as an active orchestration layer, a role NVIDIA designed Vera to fill from the ground up. A single 256-CPU Vera rack supports more than 22,500 concurrent CPU environments containing up to 400 TB of LPDDR5X memory, according to specifications shared at GTC.
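Those rack-level numbers can be sanity-checked against the per-CPU memory spec. A quick sketch, assuming every one of the 256 CPUs carries its full 1.5 TB LPDDR5X complement:

```python
# Sanity check on the rack-scale figures above. Assumption: each of the
# 256 Vera CPUs in the rack is fully populated with 1.5 TB of LPDDR5X.

CPUS_PER_RACK = 256
TB_PER_CPU = 1.5
ENVIRONMENTS = 22_500  # concurrent CPU environments per rack (per GTC specs)

rack_memory_tb = CPUS_PER_RACK * TB_PER_CPU        # 384 TB, vs. "up to 400 TB" stated
gb_per_env = rack_memory_tb * 1024 / ENVIRONMENTS  # ~17.5 GB per environment
print(rack_memory_tb, round(gb_per_env, 1))
```

The computed 384 TB sits just under the "up to 400 TB" headline figure, and the roughly 17.5 GB memory budget per environment suggests what each orchestration sandbox has to work with.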
In addition, NVIDIA claims a Vera CPU rack delivers 2x greater performance across scripting, text conversion, code compilation, and data analytics compared to a Grace CPU rack. Each Vera Rubin NVL72 rack can generate approximately $45 per million tokens, representing 10x more revenue from trillion-parameter models compared to Blackwell NVL72, according to figures shared at press briefings.
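The $45-per-million-tokens figure only translates into a revenue rate once a throughput is fixed. A quick arithmetic sketch; the sustained throughput here is a placeholder assumption, not an NVIDIA spec:

```python
# Quick arithmetic on the rack-economics figure above. The throughput
# value is a hypothetical placeholder chosen for round numbers.

REVENUE_PER_MTOK = 45.0             # $ per million tokens (from press briefings)
ASSUMED_TOKENS_PER_SEC = 1_000_000  # hypothetical sustained rack throughput

seconds_per_day = 24 * 3600
mtok_per_day = ASSUMED_TOKENS_PER_SEC * seconds_per_day / 1e6
revenue_per_day = mtok_per_day * REVENUE_PER_MTOK
print(f"${revenue_per_day:,.0f}/day at 1M tok/s")  # -> $3,888,000/day
```

Under that assumed load, a single rack would gross just under $4 million per day, which illustrates why per-token economics, not raw FLOPS, anchored NVIDIA's briefing figures.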
A Highly Competitive Market
Not all competitors agree that specialization is the right strategy. AMD sees the trade-off differently. NVIDIA has “optimized their chips very well, I think, for feeding their GPUs,” AMD data center head Forrest Norrod told CNBC, adding that the chips are “not well optimized for general-purpose applications.”
In contrast, Mercury Research data shows Intel still holds 60% of the server CPU market, with AMD at 24.3% and NVIDIA at 6.2% as of Q4 2025. Flagship AMD EPYC and Intel Xeon processors offer 128 or more cores, compared to Grace’s 72; Vera’s 88 cores narrow that gap while focusing on AI-specific throughput rather than general-purpose compute.
Meanwhile, supply constraints beyond chip design may also slow adoption. Bajarin warned that CPU wafer capacity is constrained industry-wide, with delivery lead times stretching to six months for some customers.
Nevertheless, NVIDIA first announced the Vera Rubin roadmap a year ago at GTC 2025, part of the yearly AI chip release cadence the company committed to in 2024, and it supports both Arm and x86 ecosystems through NVLink partnerships with Intel, Qualcomm, Fujitsu, and others. Vera enters production as AMD’s AI performance push intensifies and as hyperscalers build their own Arm-based CPUs (Google’s Axion, Amazon’s Graviton, and Microsoft’s Cobalt), adding competitive pressure from customers who are also potential rivals.
Despite these headwinds, early adoption signals are strong. Hyperscalers including Meta, Oracle Cloud Infrastructure, and Alibaba are among Vera’s first adopters, with Dell, HPE, Lenovo, and Supermicro building Vera-based systems.
On the software side, Cursor CEO Michael Truell said the company will use Vera CPUs to improve throughput and efficiency for its coding agent infrastructure, according to NVIDIA, while Redpanda reported 5.5x lower latency on Kafka workloads running on Vera. National laboratories including TACC and Los Alamos are deploying Vera for scientific computing, and cloud partners Cloudflare, Crusoe, Together.AI, and Vultr are also adopting the platform.
Looking ahead, NVIDIA struck a multiyear deal with Meta for its first large-scale standalone CPU deployment using Grace, and Vera is positioned to expand that footprint. Vera-based systems ship in the second half of 2026.