A close look at the Arm Immortalis-G720 and its 5th Gen graphics

In addition to Arm’s 2023 CPU cores, we’re taking a deep dive into what Arm has built into its recently announced 5th Gen mobile graphics architecture that will inevitably power future high-end mobile games. Before getting into the fine details, Arm’s 2023 GPU architecture comes in three product varieties — the Immortalis-G720, Mali-G720, and Mali-G620.

Like last year’s Immortalis-G715, the Immortalis-G720 is the flagship product, designed with ray tracing capabilities built in. The Mali-G720 and G620 sport the same architectural capabilities, just with fewer cores and no mandatory ray tracing, making them a fit for more affordable product lines. As in previous Arm GPUs, the graphics core count remains key to scaling performance. So expect to see the Immortalis-G720 in flagship chipsets, the Mali-G720 in the upper-mid-range, and the G620 in more budget-oriented products. The table below highlights the key differences.

| Arm 5th-Gen GPUs | Immortalis-G720 | Mali-G720 | Mali-G620 |
| --- | --- | --- | --- |
| Shader core count | 10-16 cores | 7-9 cores | 1-6 cores |
| Deferred Vertex Shading? | Yes | Yes | Yes |
| Hardware Ray Tracing? | Yes | No (optional) | No (optional) |
| Variable Rate Shading? | Yes | Yes | Yes |
| L2 cache slices (up to 1,024KB) | 2 or 4 | 2 or 4 | 1, 2, or 4 |

Key talking points with Arm’s 5th Gen architecture include a 15% performance per watt gain over the previous generation, 40% less memory bandwidth usage to save on power consumption, and twice the HDR rendering capabilities with 64-bit-per-pixel texturing. All this fits into a GPU core that’s just 2% larger than last-gen.

Arm 5th Gen GPU Improvements

The key to these eye-catching numbers is, in part, down to the adoption of Deferred Vertex Shading (DVS) in the GPU core, making it the heart of Arm’s latest architecture across all three products. Let’s get into how it works.

Deferred Vertex Shading explained

The long and short of DVS is that it reduces memory bandwidth usage, thereby saving on that all-important DRAM power consumption. This also frees up shared system memory to accommodate more complex geometry and leaves a bigger power budget for potentially more GPU cores. The examples Arm shared with us include 26% less bandwidth used in Fortnite and 33% less in Genshin Impact compared to its last-gen GPU. The implication is that this is a valuable change for real-world games and not just benchmarks.

To accomplish this, Arm extended its long-running use of deferred rendering to delay vertex as well as fragment shading. Arm bamboozled us all with the following graphic to demonstrate how it all works, but we’ll walk you through it.

Arm Deferred Vertex Shading Graphic

First, let’s quickly recap the basics of a graphics rendering pipeline. Vertex rendering comes first, which involves morphing geometry and triangles (think creating water ripples). Next comes rasterization, essentially calculating which triangles can be seen and which “pixel” grid they fall into. Then fragment processing applies color (textures, lighting, depth, etc.) to finalize the frame. The deferred part of a rendering pipeline comes by waiting to do the fragment shading until you’ve culled all the out-of-view triangles. This avoids re-shading triangles multiple times compared to forward shading, which might run multiple lighting calculations on the same geometry.
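To make the benefit of deferring fragment shading concrete, here is a toy Python sketch that counts shader invocations for the same fragments under both approaches. The fragment list, the per-pixel depth model, and the function names are our own simplifications for illustration and do not describe Arm's hardware.

```python
def forward_shade_count(fragments):
    """Shade each fragment as it arrives, provided it passes the depth test."""
    depth = {}
    shaded = 0
    for pixel, z in fragments:
        if z < depth.get(pixel, float("inf")):
            depth[pixel] = z
            shaded += 1  # shaded now, but may be overwritten by a closer fragment
    return shaded


def deferred_shade_count(fragments):
    """Resolve visibility for every pixel first, then shade each pixel once."""
    depth = {}
    for pixel, z in fragments:
        if z < depth.get(pixel, float("inf")):
            depth[pixel] = z
    return len(depth)  # one shading pass per covered pixel


# Worst case for forward shading: fragments arrive back to front.
fragments = [((0, 0), 0.9), ((0, 0), 0.5), ((0, 0), 0.1), ((1, 0), 0.3)]

print("forward: ", forward_shade_count(fragments))
print("deferred:", deferred_shade_count(fragments))
```

In this contrived case the forward path shades four fragments while the deferred path shades two, one per visible pixel. Real scenes vary with draw order and overlap.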

So performance can increase, but so does the memory requirement to store the deferred data. It can’t all be held in cache, as it can with forward shading, so it is put into an external vertex buffer. That can be costly in terms of power. It’s equally important to appreciate that Arm, like most other mobile GPU designers, uses tile-based rendering, splitting the render frame into much smaller tiles. This saves on local memory and increases performance as fewer pixels are rendered at a given time. However, deferred information must still be stored and returned from memory when it’s time for fragment shading, which consumes power and bandwidth.
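The tiling step can be sketched in a few lines of Python. This is a simplified illustration: the 16-pixel tile size, the conservative bounding-box test, and the function names are our assumptions, not details of Arm's tiler.

```python
TILE_SIZE = 16  # assumed tile side in pixels, for illustration only


def tiles_touched(triangle, tile_size=TILE_SIZE):
    """Tiles overlapped by the triangle's bounding box (a conservative test).

    `triangle` is three (x, y) screen-space points with non-negative coordinates.
    """
    xs = [x for x, _ in triangle]
    ys = [y for _, y in triangle]
    tx0, tx1 = int(min(xs)) // tile_size, int(max(xs)) // tile_size
    ty0, ty1 = int(min(ys)) // tile_size, int(max(ys)) // tile_size
    return {(tx, ty) for tx in range(tx0, tx1 + 1) for ty in range(ty0, ty1 + 1)}


def bin_triangles(triangles, tile_size=TILE_SIZE):
    """Build a per-tile list of the triangle indices that may cover it."""
    bins = {}
    for index, triangle in enumerate(triangles):
        for tile in tiles_touched(triangle, tile_size):
            bins.setdefault(tile, []).append(index)
    return bins


small = [(2, 2), (10, 3), (5, 12)]   # sits inside a single tile
large = [(0, 0), (40, 0), (0, 40)]   # bounding box spans a 3x3 block of tiles
bins = bin_triangles([small, large])

print(len(tiles_touched(small)), len(tiles_touched(large)))
print(bins[(0, 0)])
```

Each tile can then be rendered from its own short list, which is why the working set fits in local memory.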

However, if a triangle fits entirely into a small number of tiles, there’s scope to defer part of the vertex shading process until much closer to fragment shading. In this instance, vertex data is kept in a local cache and processed closer in time to fragment shading. The result is far fewer memory reads and writes, and therefore a notable saving in power consumption. The smart thing about Arm’s implementation is that positional information is gathered as part of the tiling process, making it possible to cull triangles early and defer rendering if they fit in the tile. For larger triangles, forward vertex rendering is used and the data is stored in an external buffer. After all the triangles are processed, they are recalled from memory for rasterization and fragment shading.
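The per-triangle choice described above can be sketched roughly as follows. Arm has not published the exact criteria, so the tile size, the `MAX_TILES_FOR_DEFERRAL` threshold, and every function name here are hypothetical stand-ins for logic that lives in hardware.

```python
TILE_SIZE = 16              # assumed tile side in pixels
MAX_TILES_FOR_DEFERRAL = 2  # hypothetical threshold; Arm has not published one


def tile_count(triangle, tile_size=TILE_SIZE):
    """Number of tiles covered by the triangle's bounding box."""
    xs = [x for x, _ in triangle]
    ys = [y for _, y in triangle]
    cols = int(max(xs)) // tile_size - int(min(xs)) // tile_size + 1
    rows = int(max(ys)) // tile_size - int(min(ys)) // tile_size + 1
    return cols * rows


def choose_vertex_path(triangle):
    """Small triangles defer vertex shading; large ones are shaded up front."""
    if tile_count(triangle) <= MAX_TILES_FOR_DEFERRAL:
        return "deferred"  # vertex data stays in local cache
    return "forward"       # vertex data is written to the external buffer


def plan_frame(triangles):
    """Group triangle indices by the vertex shading path they would take."""
    plan = {"deferred": [], "forward": []}
    for index, triangle in enumerate(triangles):
        plan[choose_vertex_path(triangle)].append(index)
    return plan


triangles = [
    [(2, 2), (10, 3), (5, 12)],    # one tile
    [(14, 4), (20, 6), (17, 10)],  # straddles two tiles
    [(0, 0), (200, 0), (0, 200)],  # spans many tiles
]
plan = plan_frame(triangles)
print(plan)
```

Only the triangles on the "forward" list would cost a round trip to external memory in this model, which is where the bandwidth saving comes from.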

Importantly, this feature is handled completely in hardware, saving memory bandwidth in certain scenarios (particularly models with very high geometry detail or many small distant triangles) without any input from software developers.

That’s a lot to take in (it’s taken me many tries). The key to understanding it is basically that, where possible, Arm’s 5th-Gen architecture holds off on vertex shading in addition to traditional fragment shading to cut down on costly reads and writes to memory, which saves power.

There’s even more to Arm’s 5th Gen graphics architecture

Gaming Phones test playing Call of Duty Mobile lobby screen

Robert Triggs / Android Authority

DVS is just part of Arm’s latest GPU architecture. Ray tracing support returns, of course, which is mandatory in the Immortalis-branded G720. But there’s also now support for 2x Multi-Sampling Anti-Aliasing (MSAA), in addition to previously supported 4x, 8x, and 16x options. 4x MSAA has little overhead with tile-based pipelines, but Arm has seen that developers want to drive even higher frame rates in their games to improve fidelity. Hence its latest architecture supports 2x MSAA as well.

The latest GPUs also improve performance in 4×2 and 4×4 fragment shading rates used in VRS. A niche use case, to be sure, but one that will give the graphics core extra futureproofing for upcoming games.
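For a sense of what those coarse shading rates save, the short Python sketch below counts fragment shader invocations when one shaded result covers a block of pixels. It assumes a single rate applied to a whole 1080p frame with no overdraw, which is our simplification; real games typically mix rates across the screen.

```python
def fragment_invocations(width, height, rate_w, rate_h):
    """Invocations needed when one shaded result covers a rate_w x rate_h block."""
    blocks_x = -(-width // rate_w)   # ceiling division
    blocks_y = -(-height // rate_h)
    return blocks_x * blocks_y


WIDTH, HEIGHT = 1920, 1080  # illustrative resolution

full_rate = fragment_invocations(WIDTH, HEIGHT, 1, 1)
for rate_w, rate_h in [(1, 1), (4, 2), (4, 4)]:
    count = fragment_invocations(WIDTH, HEIGHT, rate_w, rate_h)
    print(f"{rate_w}x{rate_h}: {count} invocations ({count / full_rate:.4f} of full rate)")
```

At 4×2 the shader runs for one eighth of the pixels, and at 4×4 for one sixteenth, so these modes are only suitable for regions where the loss of detail goes unnoticed.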

At a deeper level, Arm supports implementing two power rails for higher core counts (six and above), enabling higher clock frequencies for the same voltage as before. Speaking of power, the G720 duo and G620 have additional clock, voltage, and power domain configuration options for fine-grain energy control.


So what does this all mean for next-generation smartphone graphics chips? Well, improved power consumption is the big gain, thanks to memory savings and other power improvements. That’s not just significant for battery life; it also means that Arm’s partners could increase their core count for additional performance while remaining within existing power budgets. Even if core counts don’t grow, that 15% typical energy saving can be put towards additional performance itself, which will translate to better frame rates in the latest high-end mobile games.
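As a back-of-the-envelope check on what a 15% performance-per-watt gain can buy, the Python sketch below works through the two extremes: spending it all on speed or all on power. It assumes performance scales linearly with power, which real silicon does not strictly do, so treat the numbers as rough bounds rather than predictions.

```python
PERF_PER_WATT_GAIN = 0.15  # the generational figure quoted by Arm


def performance_at_same_power(gain):
    """Relative performance if the power budget stays unchanged."""
    return 1.0 + gain


def power_at_same_performance(gain):
    """Relative power draw if performance stays unchanged."""
    return 1.0 / (1.0 + gain)


faster = performance_at_same_power(PERF_PER_WATT_GAIN)
saving = 1.0 - power_at_same_performance(PERF_PER_WATT_GAIN)

print(f"Same power: {faster:.2f}x performance")
print(f"Same performance: {saving:.1%} less power")
```

Under that linear assumption, the same gain works out to roughly 13% less power at unchanged performance, so the two readings are not interchangeable.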
