Arm Delivers New Mobile Cores

2022-06-30 07:27:22 By : Mr. Maping DENG

Arm's VP of Client Products Paul Williamson and legendary game developer John Romero

As Arm prepares for an IPO next year, the company is positioning itself as a leader in mobile, which includes not just smartphones and tablets (its traditional markets) but increasingly Chromebooks and PCs as well. To that end, the company just released a new series of mobile cores for the class of 2022.

As a reminder, Arm doesn’t make its own chips, at least not yet. Licensees such as Apple, MediaTek, Qualcomm, and Samsung build chips using Arm’s intellectual property. Some companies, such as MediaTek, use the CPU and GPU cores designed by Arm in their chips. MediaTek has been Arm’s lead partner for mobile designs, and its Dimensity 9000 is based on Arm’s signature Armv9 platform. Other companies, such as Apple, license only the Arm CPU instruction set and design their own differentiated cores.

While there were no announced design wins for the new cores, we expect to see them in silicon within a year and new products early next year. It should be noted that MediaTek sent a special video congratulating Arm on the announcement, so it’s fair to say MediaTek has bought into Arm’s strategy.

Arm’s focus for mobile is increasingly on gaming, a category of content that stresses every aspect of the SoC design: CPU, GPU, caches, and main memory, as well as system power management. It is also a category where up to 52% of gamers play on mobile platforms.

Arm also believes its architecture will be the foundation of metaverse client development. As such, it opened its event with a virtual meeting between legendary game developer John Romero and Paul Williamson, SVP & GM of Arm’s Client Line of Business.

A big part of the new announcement focused on GPU features and new branding. The new edition of the Arm Valhall GPU architecture delivers up to 15% more energy efficiency and 15% better performance, and adds capabilities such as double the machine learning (ML) performance.

Arm's new GPU IP cores for 2022

While this is the fourth generation of the Valhall graphics architecture, Arm added new instructions for matrix multiplication (also called GEMM in some Arm literature), which can accelerate ML workloads running on the GPU. Arm has also added Variable Rate Shading (VRS), which reduces energy consumption and boosts performance by focusing detailed rendering on the parts of a scene where the action is occurring. Parts of the scene that are less critical, such as background scenery, are rendered with coarser pixel granularity, reducing rendering time and energy consumed.
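
For readers unfamiliar with the term, GEMM (general matrix multiply) is conventionally defined as C = alpha * A * B + beta * C. The following is a minimal pure-Python sketch of that operation, not Arm's implementation; the function name `gemm` and its parameters are illustrative assumptions. The inner multiply-accumulate loop is the kind of work that dedicated matrix instructions are meant to speed up.

```python
def gemm(a, b, c, alpha=1.0, beta=1.0):
    """Return alpha * (a @ b) + beta * c for matrices given as lists of rows.

    Illustrative reference only: real GPU code would run this in parallel
    using hardware matrix instructions rather than nested Python loops.
    """
    rows, inner, cols = len(a), len(b), len(b[0])
    if any(len(row) != inner for row in a):
        raise ValueError("inner dimensions of a and b do not match")
    if len(c) != rows or any(len(row) != cols for row in c):
        raise ValueError("c must have shape (rows of a) x (columns of b)")

    out = []
    for i in range(rows):
        out_row = []
        for j in range(cols):
            # Multiply-accumulate over the shared dimension.
            acc = 0.0
            for k in range(inner):
                acc += a[i][k] * b[k][j]
            out_row.append(alpha * acc + beta * c[i][j])
        out.append(out_row)
    return out
```

A neural network layer typically reduces to many such products, which is why speeding up this one operation can lift ML performance broadly.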

The new brand to accompany the ray-tracing feature is “Immortalis.” The Immortalis G715 architecture adds hardware ray tracing acceleration and a higher core count (10 to 16). The ray tracing hardware acceleration adds only about 4% extra area but yields a 300% improvement in ray tracing performance. This will be the first Arm GPU to support ray tracing, which is supported in the Vulkan graphics API.

The merely mortal Mali G715 has no ray-tracing accelerators and 7 to 9 cores. While the Immortalis and the premium Mali GPU cores have the same G715 alphanumerical designation, they have very different capabilities. This is a devious attempt to force everyone to use the new brand to differentiate the two offerings. A lower-end Mali G615 with 1 to 6 cores is also available.

Arm continues to push CPU performance while balancing it against power efficiency. Arm offers three types of cores that can be mixed and matched: ultimate performance, balanced performance, and efficient. Each can be thought of as a different gear, delivering the right performance and power for the right workload.

Arm’s new CPUs offer different power vs. performance curves

The new ultimate performance core is the Cortex-X3. The new balanced performance core is the Cortex-A715. The new efficient core is actually an improved version of last year’s Cortex-A510 efficient core.

Arm uses the three levels of performance to allow each core to be optimized for different workloads and/or power levels. The ultimate performance X3 core provides responsiveness and delivers peak single-thread performance for specific workloads. An example of such a workload is web browsing, which needs intensive, peak single-threaded performance for responsive behavior.

The balanced performance core, the A715, does a lot of the heavy lifting on performance workloads like gaming where sustained performance, thermal dissipation, and power efficiency all must be balanced.

The A510 efficiency core is good for running lightweight background tasks that don’t require a lot of performance. The operating system can offload background threads to the A510. For example, media playback/consumption focuses on energy efficiency and uses dedicated accelerators to handle the heavy lifting of the video decode while the A510 cores manage the overhead.

The Cortex-X3 has the most improvements and reaches new performance heights. Arm beefed up branch prediction for greater accuracy and lower latency. The front-end pipeline was shortened and improved to reduce instruction execution stalls (bubbles). The core instruction decoder width and the number of ALUs were increased. Data prefetching coverage and accuracy improvements allow more efficient execution. Overall, Arm achieved about an 11% improvement in instructions executed per clock cycle (IPC), but the cores also attain higher frequency operation compared with the X2 on the same manufacturing process node. The Cortex-A715 received similar improvements, but with a much greater focus on die area and power efficiency.
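
The IPC and frequency gains compound, because single-thread performance can be modeled, to a first approximation, as IPC multiplied by clock frequency. The sketch below illustrates that arithmetic. The 11% IPC figure is the one quoted above; the 5% frequency uplift is purely a hypothetical placeholder, since no frequency figure is given here, and the function name `combined_gain` is an assumption for illustration.

```python
def combined_gain(ipc_gain, freq_gain):
    """Fractional performance gain when IPC and clock frequency both rise.

    Models performance as IPC * frequency, so the two gains multiply
    rather than simply add.
    """
    return (1.0 + ipc_gain) * (1.0 + freq_gain) - 1.0


IPC_GAIN = 0.11                 # figure quoted in the article
HYPOTHETICAL_FREQ_GAIN = 0.05   # assumption for illustration only

total = combined_gain(IPC_GAIN, HYPOTHETICAL_FREQ_GAIN)
print(f"Combined single-thread gain: {total:.3f}")
```

Under that assumed 5% clock increase, the combined gain works out to about 16.5%, slightly more than the 16% a simple sum would suggest.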

One optimization Arm has made to the new cores is removing the Arm 32-bit mode, called AArch32. While the removals don’t save a lot of silicon, they do streamline the execution pipeline and permit higher clock speeds. There’s little need for the 32-bit mode in modern mobile applications. The A510, though, can be configured with AArch32 because it’s often used in embedded and legacy designs that require this mode.

The DSU-110 CPU interconnect now supports up to 12 cores, which can be any mix of X3, A715, and A510 CPU cores. A typical design might use one X3 core, three A715 cores, and four A510 cores, but there’s no required number of each type of core. Arm even hinted at a twelve-core design that mixes just ultimate and performance cores for desktop-level performance.
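
The mix-and-match rule described above can be expressed as a simple check. This is a minimal sketch that encodes only the constraints stated in this article (at most 12 cores, any mix of the three core types); the names `validate_cluster`, `MAX_CLUSTER_CORES`, and `CORE_TYPES` are hypothetical, and real DSU-110 configuration involves many more parameters than are modeled here.

```python
MAX_CLUSTER_CORES = 12                 # limit quoted in the article
CORE_TYPES = ("X3", "A715", "A510")    # core types named in the article


def validate_cluster(counts):
    """Check a core mix against the article's stated limits.

    counts maps a core type name to how many of that core the cluster uses.
    Returns the total core count, or raises ValueError if the mix is invalid.
    """
    for name, number in counts.items():
        if name not in CORE_TYPES:
            raise ValueError(f"unknown core type: {name}")
        if number < 0:
            raise ValueError(f"negative core count for {name}")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("a cluster needs at least one core")
    if total > MAX_CLUSTER_CORES:
        raise ValueError(f"{total} cores exceeds the limit of {MAX_CLUSTER_CORES}")
    return total


# The "typical" 1 + 3 + 4 design mentioned above.
typical = {"X3": 1, "A715": 3, "A510": 4}
print(validate_cluster(typical))
```

A twelve-core design without efficiency cores, such as the one Arm hinted at, would pass the same check with a mix like `{"X3": 8, "A715": 4}`; that particular split is an invented example, not a configuration Arm has announced.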

With up to 12 CPU cores, Arm’s new platform scales to new performance heights

Arm also has an optimized platform strategy called Total Compute Solutions (the 2022 version is TCS22) that optimizes the CPU, GPU, and interconnect, reduces DRAM access bandwidth requirements, and includes compute libraries to build a more efficient Arm platform. The goal is to build a better holistic solution.

Arm’s plan to keep pace with the x86 vendors is to offer a new set of cores every year. Arm also plans to add a set of new features to the entire Arm architecture every year and revealed its roadmap to 2024.

Arm Platform TCS Roadmap through 2024

For 2022, the company added matrix multiplication to GPUs for improved AI performance, Arm Compute Library (ACL) optimizations, the ability to update ArmNN (neural net libraries) and ACL via Google Play Services, and a new Cortex-M85 real-time controller with the Helium machine learning extensions. Arm will also bolster its security story with a new Privileged Access Never (PAN) mode.

As I hinted at the beginning of the article, Arm could be preparing to build its own silicon. When asked during a recent question and answer session whether Arm would build chiplets in the future, CEO Rene Haas said it’s too early to talk about that, but he didn’t rule it out. You could imagine Arm building chiplets with CPU complexes that would allow faster time to market and quick customization.

The new Arm cores are designed to keep Arm at the forefront of mobile processing. While Arm often talks about competing with AMD and Intel x86 processors, the biggest competition in this market segment is actually other Arm licensees such as Apple and Qualcomm. Apple designs its own Arm cores and GPUs, which it has now scaled up beyond the iPhone and iPad to replace Intel x86 processors in the Mac lineup. Qualcomm also has its own GPU and bought the Arm processor design house Nuvia to build its own CPU cores to compete directly with Apple.

Is Arm moving fast enough to keep pace with its own customers? This question can only be answered when the new Arm IP appears in the final silicon, probably sometime later this year.

Tirias Research tracks and consults for companies throughout the electronics ecosystem from semiconductors to systems and sensors to the cloud. Members of the Tirias Research team have consulted for Arm, Intel, MediaTek, Nvidia, Qualcomm, Synopsys, and other companies throughout the mobile and IP ecosystems.