By Tim Symons, Storage Architect, Microchip
Gen-Z is a modern, memory-semantic fabric that enables communication with remote systems and pools of memory, accelerators, and storage. Memory is the most valuable resource within a server system, and the right amount of memory and bandwidth to provision varies with the workload and applications being executed. As new technologies introduce higher performance or greater capacity, upgrading can be expensive because systems must be physically shut down and components replaced to accommodate the new demand. So, how can you mitigate these expenses in a resourceful and seamless way?
Pooled and Disaggregated Memory
Pooled memory consists of addressable memory devices that are connected to a memory fabric and can be allocated by the fabric manager to any host device. The flexibility of Gen-Z fabrics allows devices to be removed, replaced, and even upgraded on the fabric, then reallocated by the fabric manager to where they are most needed. Rather than incurring upgrade costs, pooling resources on a memory fabric allows dynamic reallocation of memory and other resources without disrupting system operations.
Figure A – Gen-Z fabric with pooled resources
Figure A (above) shows servers (top of the diagram) connected to a Gen-Z fabric, which is also connected to pooled resources – including DRAM memory, persistent (non-volatile) memory, accelerators, and SSD storage. The Gen-Z protocol initially identifies all devices available on the fabric, and then a fabric manager allocates resources to be interconnected based on processing requirements, performance, and security. Device telemetry gives the fabric manager details about the attached devices, such as performance, latency, bandwidth, capacity, and type of memory (e.g., persistent memory or DRAM). If an application requires more memory, the fabric manager can allocate suitable devices from the resource pool without resetting the system or rebooting the processors. Security, domain management, error reporting, and error containment are all critical aspects that ensure systems operate seamlessly at memory-class latencies.
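To make the allocation flow concrete, here is a minimal sketch in Python of a fabric manager matching a host's request against pool telemetry. All class and field names are hypothetical; a real Gen-Z fabric manager is management software operating on the fabric itself, not an in-process object model.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class PooledDevice:
    """Telemetry a fabric-attached device might report (fields are illustrative)."""
    device_id: str
    kind: str                 # "dram", "pmem", "accelerator", "ssd"
    capacity_gb: int
    latency_ns: int
    allocated_to: Optional[str] = None

class FabricManager:
    """Toy allocator: match a host's request against pool telemetry."""
    def __init__(self, pool: List[PooledDevice]):
        self.pool = pool

    def allocate(self, host: str, kind: str,
                 min_capacity_gb: int, max_latency_ns: int) -> Optional[PooledDevice]:
        for dev in self.pool:
            if (dev.allocated_to is None and dev.kind == kind
                    and dev.capacity_gb >= min_capacity_gb
                    and dev.latency_ns <= max_latency_ns):
                dev.allocated_to = host   # no system reset or reboot needed
                return dev
        return None                       # nothing suitable left in the pool

    def release(self, dev: PooledDevice) -> None:
        dev.allocated_to = None           # device returns to the pool

pool = [PooledDevice("zmm0", "dram", 256, 400),
        PooledDevice("pmem0", "pmem", 1024, 900)]
fm = FabricManager(pool)
zmm = fm.allocate("server-a", "dram", 128, 500)
```

The `release` method models the other half of the lifecycle: when a workload shrinks, the device goes back into the pool for reallocation rather than sitting stranded behind one host.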
Figure B – A pooled Gen-Z DRAM memory module (ZMM) allocated to an ARM SoC.
It is also possible to share bandwidth by allocating portions of the same pooled memory device; this is likely when a high-capacity memory module (e.g., a DIMM) is in the pool.
Figure C – ZMM pooled device being shared with two ARM SoCs resulting in shared bandwidth for the memory Gen-Z interface
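The arrangement in Figure C can be sketched as a simple partitioning of one device's address space: each SoC receives a unique range, while the device's Gen-Z link bandwidth is shared between them. This is a toy model for illustration, not the actual partitioning mechanism defined by the Gen-Z specification.

```python
from typing import Dict, List, Tuple

def partition(total_bytes: int, hosts: List[str]) -> Dict[str, Tuple[int, int]]:
    """Split one pooled device's address space into equal per-host ranges."""
    chunk = total_bytes // len(hosts)
    ranges = {}
    base = 0
    for host in hosts:
        ranges[host] = (base, base + chunk - 1)  # inclusive byte range
        base += chunk
    return ranges

# Two SoCs each get half of a 256 GiB ZMM; the module's link bandwidth is shared.
ranges = partition(256 * 2**30, ["soc0", "soc1"])
```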
When a server’s application load changes, unused resources may be de-allocated from the processor and returned to the pool. This helps eliminate under-utilized or stranded resources, and it also means hardware can be updated to newer, higher-performance devices by adding or replacing them in the pool as required.
Another benefit of pooling memory is that it reduces the cost of memory for larger systems. For example, a server that requires 4TB of DRAM can be deployed with 32 128GB DIMMs directly attached to the processor, at a memory cost of approximately $32,000. Alternatively, using pooled memory, the server could have 16 32GB DIMMs directly attached while the pool holds 112 32GB DIMMs, for a memory cost of about $16,000.
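The arithmetic behind those figures works out as follows. The per-DIMM prices below are assumptions chosen to reproduce the totals above; the underlying point is that high-density 128GB DIMMs carry a steep per-GB premium over commodity 32GB parts, while both configurations deliver the same 4TB.

```python
# Assumed street prices (illustrative only, chosen to match the totals above)
PRICE_128GB = 1000   # $ per 128GB DIMM
PRICE_32GB = 125     # $ per 32GB DIMM

# Direct-attached only: 32 x 128GB DIMMs
direct_cost = 32 * PRICE_128GB          # $32,000
direct_capacity_gb = 32 * 128           # 4096 GB = 4TB

# Pooled: 16 x 32GB local + 112 x 32GB in the pool = 128 x 32GB DIMMs
pooled_cost = (16 + 112) * PRICE_32GB   # $16,000
pooled_capacity_gb = (16 + 112) * 32    # 4096 GB = 4TB

print(direct_cost, pooled_cost)         # 32000 16000
```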
Along with pooled memory, Gen-Z also takes advantage of shared memory: an addressable memory device connected to a memory fabric that can be allocated by the fabric manager to one or more host devices simultaneously. A pooled device may be partitioned into address ranges that are unique to each host, whereas a shared device allows two or more hosts to access the same overlapping memory space.
It is easiest to envision one piece of data that needs to be referenced by multiple operations but not changed. One example is a shared database in which different processors filter the data to find different information. Another is a self-driving car with separate processing functions analyzing the same video input: one processor detects road-surface conditions such as ice, potholes, or dry pavement; another detects obstacles such as pedestrians or parked cars; and another analyzes speed, road-junction locations, and so on.
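Gen-Z shared memory is a hardware capability, but the read-by-many pattern can be illustrated with an operating-system analogy using Python's standard `multiprocessing.shared_memory` module: one writer publishes data into a named region, and other consumers attach to the same region by name and read it in place, with no copies made between them.

```python
from multiprocessing import shared_memory

# One "host" publishes data into a named shared region.
writer = shared_memory.SharedMemory(create=True, size=16)
writer.buf[:4] = b"lane"   # e.g., one frame of sensor data

# A second "host" attaches to the same region by name and reads in place.
reader = shared_memory.SharedMemory(name=writer.name)
frame = bytes(reader.buf[:4])   # same bytes, no copy between consumers

reader.close()
writer.close()
writer.unlink()
```

In a Gen-Z system the equivalent sharing happens at fabric level between whole hosts, at memory-class latency, rather than between processes on one machine.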
Gen-Z provides equal access to memory for all compute functions, which allows data to remain in one location rather than wasting time and power moving it between compute engines. The Gen-Z protocol also enables peer-to-peer access, meaning that any device may be given access to communicate with any other device. Systems can be configured to share resources such as smart Network Interface Cards (NICs), and data may be shared with multiple destinations by creating shared resources.
Gen-Z: Always Adapting
The ability to leverage pooled and shared memory with Gen-Z fabrics demonstrates how cost-effective and responsive this technology is, particularly as application demands constantly change to take advantage of new resources. By using pooled and shared memory, scaling and upgrading can be seamless for compliant devices, and new technologies can be added independently. As a result, end users can pay as they go and grow as they need, with confidence in their investment in Gen-Z fabrics.
Learn more about the capabilities offered by Gen-Z fabrics.