Chapter 2 Architecture

This chapter covers the introductory knowledge you need to get started with CPU/GPU architecture. I recommend skipping or skimming this chapter at first, but you will need to come back for the details in order to fully understand everything.

CPU Architecture

What is a CPU? Where is it?

The CPU (Central Processing Unit) is the brain of a modern computer. It is usually a chip on the motherboard. If you have ever assembled a desktop workstation, you know what I mean.

Figure 2.1 A photo of a CPU (I found it on Google; I don't know what model it is).

Everything that can be considered a computer has a CPU: consoles, desktops, laptops, mobile phones... because the CPU is tasked with interpreting and executing the instructions in a program.

What are the missions?

Let's look at this in the game programming context. A CPU is usually responsible for the general-purpose tasks in a game, such as:

Game logic: physics, script execution...
Scheduling: coordinating with the GPU, memory, and I/O such as keyboards, network, disks...
System services: the operating system runs on the CPU, and it also determines the resources the game can use.

Anyway, just as its name says, it is central, and it processes.

How are the missions completed?

In short, in four steps:

Fetch: fetch instructions from memory.
Decode: understand the instruction by decoding.
Execute: run the instruction.
Write-back: store the results.

Now you may be asking: so what is an instruction? When did I ever send one?

That's right: you are the programmer, and you write code. That code is eventually translated into machine code, a sequence of binary bits. When you write code in a compiled language such as C or C++, it is compiled into assembly and then assembled into machine code, which is loaded into memory when the program runs. (Languages such as C#, Java, and Python reach machine instructions through a runtime or an interpreter, but the CPU still only ever executes machine code.) For example, a few x86 instructions may look like:

mov eax, 3
mov ebx, 4
add eax, ebx

These instructions could be translated from the following C code:

int a = 3;
int b = 4;
int c = a + b;

The corresponding machine code may look something like:

B8 03 00 00 00
BB 04 00 00 00
01 D8

So the CPU executes it just as I described above:

Fetch: the CPU fetches the instruction (for example, B8 03 00 00 00 on the first line).
Decode: the CPU decodes B8 and understands that it means mov eax, imm32.
Execute: the CPU takes the immediate 03 00 00 00 (little-endian for 3) as the value to move.
Write-back: the CPU stores the result in the register eax, so eax = 3.

And then it proceeds to the next instruction.

These steps repeat for every instruction the CPU is given.

Architecture

Now we come to the most important (and possibly the most complicated) part: the hardware architecture of a CPU. I don't want to go too deep here, since our only goal is to make things faster. To understand where speed differences come from, you need to know how the data flows.

Figure 2.2 A simplified diagram showing CPU architecture.

The above diagram shows a typical 4-core CPU architecture. Modern CPUs usually have multiple cores. Each core may have an L1 data cache (L1D), an L1 instruction cache (L1I), and an L2 cache, and all cores share a common L3 cache. As we can see from the diagram, the L3 cache is usually the largest and L1 the smallest; on the other hand, L1 is usually the fastest and L3 the slowest.

A commonly asked question is: why do we want three levels of cache?

There are multiple reasons.

First, caches are made of SRAM. A larger array means longer word lines and bit lines, so a single access has to drive more capacitance. That raises the energy cost and also lengthens the RC (resistance-capacitance) delay. In essence, the larger the cache is, the slower it will be.

Second, a high-performance core usually needs to issue multiple loads/stores within a single clock cycle. For example, a CPU with a frequency of 3 GHz has 3 billion clock cycles per second, so each cycle lasts only about 0.33 ns. The cache nearest the core has to be very fast; otherwise the data will not be ready in time for the following cycles.

Therefore, each core has two small but fast L1 caches (L1D and L1I) and a slightly slower L2 cache, backed by a slower but larger shared L3 cache. The shared cache can also be used to exchange data among different cores.

What is special about modern CPUs?

As tech geeks, we spend a lot of money picking a high-performance CPU when buying a new computer. That tells you CPU performance really does differ between models.

Stronger Parallelism

A modern CPU typically has multiple cores that can work on different tasks simultaneously. More powerful CPUs have more cores for computing.

Out-of-Order Instruction Execution

Earlier CPUs could handle only 3-4 micro-ops (μops) at once, and their scheduling had to respect program order. Powerful modern CPUs can handle more μops per cycle and may execute them in a different order from the one written in the program, as long as the result is the same. For example, the three simple statements

a = b + c;
d = e + f;
g = h + i;

will be executed one by one on early CPUs, while on modern CPUs they might be scheduled to run simultaneously on different execution units inside the same core, because none of them depends on another's result.

Larger Cache - Lower Latency

From what we saw above, as long as the speed can be maintained, a cache should be as large as possible. Modern CPUs have larger caches at every level. On the other hand, when a cache miss goes all the way to DRAM (misses happen more often with smaller caches), fetching the data costs roughly a few hundred clock cycles.

Cleverer Branch Prediction

When a CPU reaches branching code (for example, if/else or a for loop), it usually predicts the outcome and keeps executing along the predicted path. If the prediction turns out to be wrong, that work is thrown away, which wastes tens of clock cycles. Modern, powerful CPUs have better predictors.

Well, there is definitely more to know about CPUs, and you may be interested in the architecture of some really famous modern CPUs, such as the Apple M4 Max, the Intel Core i9-14900KF, etc. You should definitely look into the technical details of why they are so good. Otherwise, this knowledge is good enough for now; we may dive into more details when we look at CPU bottlenecks or mobile device optimization.

GPU Architecture

CPU vs. GPU: What is different?

​陶令恒
