AMD patents a chiplet GPU design quite different from Nvidia's and Intel's

Something to look forward to: AMD has published its first patent on chiplet GPU designs. In typical AMD fashion, the company is trying not to rock the boat. Chiplet GPUs are only just beginning to emerge. Intel has been straightforward about its development process and confirmed the use of chiplets in its first generation of discrete GPUs. Nvidia, although modest about the details, has published several research papers on the subject. AMD was the last holdout – which only adds to the intrigue.

Chiplets, as the name suggests, are smaller and less complex chips designed to work together to form more powerful processors. They are arguably the inevitable future for all high-performance components and, in some cases, have already proven their worth: AMD's chiplet-based CPU designs were a resounding success.

In the new patent, dated December 31, AMD describes a chiplet design meant to mimic a monolithic design as closely as possible. Its hypothetical model uses two chiplets connected by a passive high-speed interposer called a crosslink.

The crosslink sits between the L2 cache and the L3 cache in the memory hierarchy. Everything below it, such as the cores and the L1 and L2 caches, is aware of its separation from the other chiplet. Everything above it, including the L3 cache and the GDDR memory, is shared between the chiplets.
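As a rough illustration of that split (this sketch is not from the patent; the names and the routing rule are assumptions for clarity), the levels below the crosslink are private to each chiplet, while the L3 and the GDDR above it behave as one shared pool:

```python
# Illustrative sketch of the hierarchy split described above (assumed names,
# not patent text): private levels never cross the crosslink, shared levels
# do whenever the data is homed on the other chiplet.

PRIVATE_LEVELS = {"core", "L1", "L2"}   # aware of the chiplet boundary
SHARED_LEVELS = {"L3", "GDDR"}          # appear unified across chiplets

def crosses_crosslink(level: str, requester_chiplet: int, home_chiplet: int) -> bool:
    """Return True if satisfying a request at `level` means traversing the
    crosslink, i.e. the data lives at or above L3 on a different chiplet."""
    if level in PRIVATE_LEVELS:
        return False  # private levels are always local to the requester
    return requester_chiplet != home_chiplet

print(crosses_crosslink("L2", requester_chiplet=0, home_chiplet=0))  # False
print(crosses_crosslink("L3", requester_chiplet=0, home_chiplet=1))  # True
```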

This design is beneficial because it is conventional. AMD claims that compute units can access the low-level cache on the other chiplet almost as quickly as they can access their local low-level cache. If that holds true, software does not need to be rewritten to account for the split.

The same cannot be said of Intel's and Nvidia's designs. Intel intends to use two new technologies, EMIB (embedded multi-die interconnect bridge) and Foveros. The latter is an active interposer that uses through-silicon vias, something AMD explicitly states it will not do. Intel's design lets the GPU host a system-accessible cache that underpins a new memory structure.

Nvidia has not revealed as much, but has indicated some directions it might follow. A 2017 research paper describes a four-chiplet design with a locality-aware architecture and NUMA (non-uniform memory access). It also experiments with a new L1.5 cache, which exclusively stores remote data accesses and is bypassed during accesses to local memory.
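A minimal sketch of that remote-only caching idea, as summarized above (the class, fields, and lookup logic are illustrative assumptions, not details from Nvidia's paper):

```python
# Hypothetical model of an "L1.5" cache that only holds remote data:
# local accesses bypass it entirely, remote accesses are cached to
# avoid repeated trips across the inter-chiplet link.

class L15Cache:
    def __init__(self):
        self.lines = {}  # address -> data, remote lines only

    def read(self, address: int, home_chiplet: int, my_chiplet: int, memory: dict):
        if home_chiplet == my_chiplet:
            return memory[address]      # local data: bypass the L1.5
        if address in self.lines:
            return self.lines[address]  # remote data already cached
        data = memory[address]          # fetch over the inter-chiplet link
        self.lines[address] = data      # cache it only because it is remote
        return data

memory = {0x10: "local value", 0x20: "remote value"}
cache = L15Cache()
cache.read(0x10, home_chiplet=0, my_chiplet=0, memory=memory)  # not cached
cache.read(0x20, home_chiplet=1, my_chiplet=0, memory=memory)  # cached
print(0x10 in cache.lines, 0x20 in cache.lines)                # False True
```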

AMD's approach may sound less imaginative, but it also seems the most practical. And if history proves anything, it is that ease of development is a big advantage.

Below are additional diagrams from the patent.

Figure 2 is a cross-sectional view descending from the two chiplets down to the circuit board. The two chiplets (106-1 and 106-2) are stacked vertically on the passive crosslink (118) and use dedicated conductive structures to access the crosslink's traces (206) and thereby communicate with each other. Conductive structures not attached to the crosslink (204) connect to the circuit board for power and other signaling.

Figure 3 describes the cache hierarchy. WGPs (workgroup processors) (302), which are collections of shader cores, and GFX units (fixed-function units) (304), which are dedicated single-purpose processors, connect directly to a channel's L1 cache (306). Each chiplet contains several L2 cache banks (308) that are individually addressable and coherent within a single chiplet. Each chiplet also contains several L3 cache banks (310) that are coherent across the entire GPU.

The GDF (graphics data fabric) (314) connects the L1 cache banks to the L2 cache banks. The SDF (scalable data fabric) (316) unifies the L2 cache banks and connects them to the crosslink (118). The crosslink connects to the SDFs on all chiplets, as well as to the L3 cache banks on all chiplets. The GDDR memory banks (labeled PHY Memory) (312) connect to the L3 cache banks.

For example, if a WGP on one chiplet requires data stored in a GDDR bank on another chiplet, that data is sent through an L3 cache bank, then across the crosslink to an SDF, then to an L2 bank, and finally through a GDF to an L1 bank.
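That remote-read path can be written out hop by hop. The sketch below follows the figure labels quoted above, but the function itself and the exact ordering of hops are an illustrative simplification, not text from the patent:

```python
# Assumed trace of the remote-read path described above: a WGP on one
# chiplet reading data homed in a GDDR bank attached to the other chiplet.

def remote_read_path(requester: str, home: str) -> list[str]:
    return [
        f"GDDR bank / PHY ({home})",      # data starts in remote memory
        f"L3 cache bank ({home})",        # shared level on the home chiplet
        "crosslink (118)",                # passive interposer between chiplets
        f"SDF ({requester})",             # scalable data fabric on the requester
        f"L2 cache bank ({requester})",
        f"GDF ({requester})",             # graphics data fabric
        f"L1 cache bank ({requester})",
        f"WGP ({requester})",             # shader cores receive the data
    ]

for hop in remote_read_path("chiplet 0", "chiplet 1"):
    print(hop)
```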

Figure 4 is a bird's-eye view of one chiplet. It shows more precisely the locations and potential scale of the various components. The HBX controller (404) manages the crosslink, to which the chiplet connects through the HBX PHY conductors (406). The small square in the lower-left corner (408) is a potential additional connector that would let the crosslink link in more chiplets.

Source