Arm Announces The Mali-G78 GPU: Evolution to 24 Cores
by Andrei Frumusanu on May 26, 2020 9:00 AM ESTToday as part of Arm’s 2020 TechDay announcements, alongside the release of the brand-new Cortex-A78 and Cortex-X1 CPUs, Arm is also revealing its brand-new Mali-G78 and Mali-G68 GPU IPs.
Last year, Arm had unveiled the new Mali-G77 which was the company’s newest GPU design based on a brand-new compute architecture called Valhall. The design promised major improvements for the company’s GPU IP, shedding some of the disadvantages of past iterations and adapting the architectures to more modern workloads. It was a big change in the design, with implementations seen in chips such as the Samsung Exynos 990 or the MediaTek Dimensity 1000.
The new Mali-G78 in comparison is more of an iterative update to the microarchitecture, making some key improvements in the matter of scalability of the configuration as well as balance of the design for workloads, up to some more radical changes such as a complete redesign of its FMA units.
On the scalability side, the new Mali-G78 now goes up to 24 cores in an implementation, which is a 50% increase in core count compared to the maximum MP16 configuration of the Mali-G77. To date, the biggest configuration we’ve seen in the wild of the G77 was the M11 setup of the Exynos 990, with MediaTek employing an MP9 setup.
In a projected end-device solution comparison between 2020 and 2021 devices, Arm is projecting the new Mali-G78 to achieve 25% better performance, which includes both microarchitectural as well as process node improvements. That’s generally the reasonable target that vendors are able to achieve on newer generation IPs, but it’s also going to be strongly depending on the exact process node improvements that are projected here – as GPUs generally scale better with improves process density rather than just frequency and power improvements of the silicon.
At an ISO-process node under similar implementation area conditions, the Mali-G78 is claimed to improve performance density by 15%. This is referring to the either performing 15% better at the same area, or shaving off 15% area for the same performance, given that this can be done linearly by just adjusting the amount of GPU cores implemented.
Power efficiency sees a more meagre 10% improvement, which honestly isn’t too fantastic and not that big of a leap to the Mali-G77. ML performance is also said to be improved by 15% thanks to some new microarchitectural tweaks.
Seemingly, the Mali-G78 doesn’t look like too much of an upgrade compared to the vast new redesign we saw last year with the G77 – and in a sense, that does seem somewhat reasonable. Still, the G78 does some interesting changes to its microarchitecture, let’s dwell a bit deeper into what’s changed…
36 Comments
View All Comments
tkSteveFOX - Wednesday, May 27, 2020 - link
Apart from MTK and Huawei most will drop using Mali Cores as the architecture doesn't scale well at all.Anything over 7-8 cores and you start to lose performance and get the consumption up.
When Samsung finally unveil their RDNA powered GPU, even Apple's cores might lose their crown.
I doubt it will be very power efficient though, just like Apple's.
lightningz71 - Wednesday, May 27, 2020 - link
Haven't the various mobile RISC cores gotten very close to hitting the wall with respect to memory bandwidth? Feeding the G78 in a full-house config with enough data to allow it to reach it's full throughput potential would require fairly massive amounts of RAM bandwidth. All that bandwidth will require some very wide channels and a lot of memory ICs on the phone motherboards, or, it'll require some quite power hungry HBM stacks. At best, we get a couple of channels of low power DRAM that spends as much time as possible in low power mode. I just don't see it being very useful on a mobile device. At the very best, if it's used in an ARM Windows laptop, and if it gets a solid memory subsystem attached to it, it MAY be competitive with other iGPU solutions available in the market. However, once you go down that road, you have to ask yourself, is it worth putting that many resources into the CPU and its memory subsystem when there are available low power dGPU solutions out there that will still run rings around it in performance and not cost any more per unit to integrate into your solution? Even if it costs a bit more power to do so, in a laptop, you have a much larger form factor and much larger power budgets to play with.ballsystemlord - Thursday, May 28, 2020 - link
Spelling error:"The core's cache shave also had they cache maintenance algorithms improved with better dependency tracking,..."
"the" not "they":
"The core's cache shave also had the cache maintenance algorithms improved with better dependency tracking,..."
Lobstermobster - Saturday, June 6, 2020 - link
How can we compare this new mobile GPU to others made by Qualcomm, Nvidia and Imagination? How many teraflops do these mobile GPUs have? I know the Switch uses a Tegra chip that can go up to 1 teraflops in dock modeiphonebestgamephone - Sunday, June 7, 2020 - link
Whats the use of knowing the flops anyway.IUU - Friday, October 2, 2020 - link
"Whats the use of knowing the flops anyway." I believe it is one of the most important metrics to know. Because a chip will always perform a certain percentage of its theoretical performance, often about 60 to 70% of theoretical. So , if a chip's theoretical performance is say X5 compared to another chip, no-one can fool you with the usual nonsense, "yes but it is real world performance that matters" . Because a x5 theoretical performance wins hands down in real world scenarios, no matter what marketing gimicks would want you to believe.That said , just consider , the modern fashion of hiding details about architecture , of a lot of companies, lately even by Intel, and you will see , there is an effort to go by marketing only to hide potential weaknesses.