CPU Optimization

This chapter covers strategies and principles that we usually find helpful once we are sure that the bottleneck is on the CPU.

Spotting CPU Bottlenecks
What is happening in a frame?

A frame appears on screen as the result of a collaboration between the CPU and the GPU. This collaboration can take different forms, but in large engines like Unity and Unreal the CPU and GPU usually work simultaneously (on different frames, as we will see).

In Unity, a good abstraction of the CPU-GPU collaboration looks like the diagram below.

Figure 3.1 Frame Anatomy

Specifically, Unity has two threads of interest on the CPU: the Main Thread (in charge of Engine and Script in the diagram above) and the Render Thread (in charge of Render in the diagram above). While the CPU is already preparing the content of frame N+1, the GPU is still processing frame N. This is expected, since the CPU has to prepare the data before the GPU can render it.

What can be the bottleneck on CPU?

With the diagram above, it is easy to see that if anything on the CPU stalls the entire game, it can only come from one of three parts: engine, script, or render. Since we can do little about the engine code, we need to check whether the bottleneck comes from the scripts or from the render thread.

Being main thread bound means that the main thread takes so long to process its work that it cannot prepare the data for rendering in time. Usually, this indicates that some of the scripts we added to the GameObjects, or physics, garbage collection, UI, or animations, need to be checked.

Being render thread bound means that there is too much for the CPU to pack up for the GPU. The CPU sends draw calls to the GPU, asking it to draw what was sent. In this case we typically have far too many draw calls, and everything that can increase the draw call count needs to be inspected: vertex count, batch count, mesh count, textures...

What indicates a CPU bottleneck?

Unity provides a powerful frame analyzer - the Unity Profiler.

Figure 3.2 Unity Profiler. You should be able to find it through Window > Analysis > Profiler.

Unity 6 in particular adds useful features such as the CPU/GPU Highlights at the top, which make it even easier to decide whether you are CPU bound or GPU bound.

Understand Your Budgets
Setting up budgets

A budget means that we respect the capabilities of our target platforms and set a reasonable goal. But how do we set that goal? Typically we want two targets: frame rate (Frames Per Second, or FPS) and frame delta time (the time between two frames).

You may ask: why bother having two? Don't we simply want to keep the game as smooth as possible?

Take an unreasonably idealized scenario as an example. 60 FPS implies an average frame delta time of 1000/60 ≈ 16.67 ms, so if every frame takes less than 16.67 ms, we can guarantee 60 FPS. But what if 59 frames take 0 ms to finish and the 60th frame takes 1000 ms? The frame rate is still 60 FPS, but the player suffers an extremely noticeable hitch every second.

Therefore, we need not only a target FPS for the game but also a target budget for the frame delta time.
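The 59-fast-frames example can be checked with a few lines of plain C#. This is only a sketch: the class and method names (FrameBudgetDemo, AverageFps, FramesOverBudget) are made up for illustration and are not part of Unity.

```csharp
using System;
using System.Linq;

// Hypothetical helper for illustration; not a Unity API.
public static class FrameBudgetDemo
{
    // Average frames per second over a window of frame times (milliseconds).
    public static double AverageFps(double[] frameTimesMs)
    {
        double totalSeconds = frameTimesMs.Sum() / 1000.0;
        return frameTimesMs.Length / totalSeconds;
    }

    // Number of frames in the window that took longer than the budget.
    public static int FramesOverBudget(double[] frameTimesMs, double budgetMs)
    {
        return frameTimesMs.Count(t => t > budgetMs);
    }

    public static void Main()
    {
        // 59 frames that take 0 ms, followed by one frame that takes 1000 ms.
        double[] frames = Enumerable.Repeat(0.0, 59).Append(1000.0).ToArray();
        double budgetMs = 1000.0 / 60.0; // about 16.67 ms for a 60 FPS target

        Console.WriteLine("Average FPS: " + AverageFps(frames));
        Console.WriteLine("Worst frame (ms): " + frames.Max());
        Console.WriteLine("Frames over budget: " + FramesOverBudget(frames, budgetMs));
    }
}
```

The average still comes out at 60 FPS, yet one frame exceeds the 16.67 ms budget by a wide margin, which is why both targets are tracked.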

VSync and Screen Tearing

What happens if your game is well optimized and runs much faster than your target FPS and frame delta time?

By default Unity does not set a fixed frame rate, but you can set one through Application.targetFrameRate. For example, suppose you target 30 FPS (33.33 ms average delta time) and a single frame takes 20 ms to finish. In this case, Unity waits until the frame reaches 33.33 ms. You can see a WaitForTargetFPS marker in the Unity Profiler. This marker indicates that the frame is not CPU bound: everything is within budget.

A similar case appears when your monitor runs at a refresh rate different from your game's frame rate. The mechanism that synchronizes the application's frame rate with the monitor's refresh rate is called VSync. For example, if you have a high-end monitor that runs at 120 Hz and your game finishes each frame within the 8.33 ms budget, the engine makes each frame wait until the full 8.33 ms has elapsed. VSync is an important technique for avoiding an artifact called screen tearing.

Figure 3.3 Screen tearing caused by VSync Off.

To understand why screen tearing can appear when VSync is off, we need to understand how the monitor refreshes and how the GPU submits frames.

Monitor Refreshing

Traditionally, a monitor refreshes by scanning out: starting from the top-left corner and ending at the bottom-right corner, it displays the pixels line by line. Each refresh is called a Vertical Refresh. The refresh period is the reciprocal of the screen's refresh rate (for example, at 60 Hz a frame is scanned out every 16.67 ms).

GPU Frame Buffer Submission

On the other hand, rendering results are stored in frame buffers. The monitor reads the result from the buffer and displays the colors on the screen. If the GPU replaces the frame buffer with a new frame before the monitor has displayed the complete frame, the upper part of the screen comes from the previous frame while the lower part comes from the new one. This is called screen tearing.
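To tie the frame rate target and VSync together, here is a minimal sketch of how the two settings might be applied from a script. The component name FrameRateSettings and its fields are hypothetical; Application.targetFrameRate and QualitySettings.vSyncCount are the Unity settings involved. Platform behaviour varies, and some platforms ignore one of the two settings, so treat this as a starting point rather than a definitive setup.

```csharp
using UnityEngine;

// Hypothetical component for illustration; attach it to any GameObject in the first scene.
public class FrameRateSettings : MonoBehaviour
{
    [SerializeField] private int targetFps = 30;   // assumed target, about 33.33 ms per frame
    [SerializeField] private bool useVSync = false;

    private void Awake()
    {
        if (useVSync)
        {
            // Wait for one vertical refresh per frame, so the monitor sets the pace.
            QualitySettings.vSyncCount = 1;
        }
        else
        {
            // With VSync off, Unity paces frames toward the requested frame rate.
            QualitySettings.vSyncCount = 0;
            Application.targetFrameRate = targetFps;
        }
    }
}
```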

Main Thread Optimization
Identifying Main Thread Bound

Main thread bound, as described above, means that the CPU does not have enough time to prepare the work for the render thread.

The easiest way to identify a main thread bound is to look for the Gfx.WaitForCommandsFromMainThread marker in the profiler. It means that the render thread is ready but is waiting on the main thread, which may be the bottleneck.

Let's use an extremely expensive dummy script to demonstrate this issue. We can place an empty GameObject in the scene so that there is nothing to draw (in fact the camera still needs to draw something, but this is already a minimal setup). A script that does a large amount of pointless work in Update(), attached to that object, illustrates what expensive CPU overhead does to a frame.
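The original listing is not reproduced here, so what follows is only a stand-in sketch of such a script. The class name ExpensiveScript matches the name that shows up in the profiler later in this section; the loop body and the iteration count are assumptions chosen simply to burn main thread time.

```csharp
using UnityEngine;

// Stand-in sketch: deliberately wastes CPU time on the main thread every frame.
public class ExpensiveScript : MonoBehaviour
{
    // Assumed value; raise or lower it until the frame time is clearly over budget.
    [SerializeField] private int iterations = 5000000;

    // Exposed so the computed value is actually used somewhere.
    public float LastResult { get; private set; }

    private void Update()
    {
        float acc = 0f;
        for (int i = 0; i < iterations; i++)
        {
            acc += Mathf.Sqrt(i); // pointless work, repeated millions of times
        }
        LastResult = acc;
    }
}
```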

With only this script attached to an empty object in the scene, let's look at what is going on in the profiler.

We can see a Gfx.WaitForCommandsFromMainThread marker in the Render Thread that takes almost as long as the total CPU time. Also notice that in the Highlights at the top, the CPU bar shows a noticeable red color.

From the Hierarchy view, we confirm that 99% of the time was spent in ExpensiveScript.Update().
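When a single Update() does several things, the Hierarchy view only shows the method as a whole. One way to narrow it down, sketched here under the assumption that you can edit the script, is to wrap suspicious sections in a ProfilerMarker from the Unity.Profiling namespace. The class name MarkedScript, the marker label, and the RunPathfinding method are hypothetical.

```csharp
using Unity.Profiling;
using UnityEngine;

// Hypothetical script showing how a custom marker can label part of Update().
public class MarkedScript : MonoBehaviour
{
    // The label is what appears in the Profiler; the name is an assumption.
    private static readonly ProfilerMarker PathfindingMarker =
        new ProfilerMarker("MarkedScript.Pathfinding");

    private void Update()
    {
        // Everything inside the using block is attributed to the marker.
        using (PathfindingMarker.Auto())
        {
            RunPathfinding();
        }
    }

    private void RunPathfinding()
    {
        // Placeholder for the work you want to measure.
    }
}
```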

Tips to optimize your code

Scripts are the most likely reason for a main thread bound. Before anything else, you need to be fully familiar with the MonoBehaviour life cycle. Once you understand the execution order of Unity's frame loop, you can follow the checklist below.

  1. Minimize the work done in Update(), FixedUpdate() and LateUpdate().

  2. Avoid empty Update() and Start() (and any other MonoBehaviour functions).

  3. Avoid heavy logic in Start() and Awake(). 

  4. If you do not need to update every frame, consider using a counter and updating only every several frames. You can do this with the help of Time.frameCount.

  5. Remove all the Debug.Log() statements. A better approach is to wrap logging in a method marked with a Conditional attribute and enable it by adding an ENABLE_LOG scripting define symbol to the project.

  6. Do not compare strings using ==.

  7. Understand how strings work. Building, comparing or editing strings at runtime can all be expensive. You may want to take a look at the StringBuilder class.

  8. Avoid AddComponent<>() at runtime.

  9. Use member variables to cache objects and components. Avoid fetching them into local variables with GetComponent<>() at runtime, especially in Update().

  10. Use object pools.

  11. Store and pass values with ScriptableObject, instead of MonoBehaviour classes.
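A few of the checklist items (conditional logging, updating every several frames, caching components, avoiding string comparison on tags) can be sketched together in one small script. The names GameLog and ThrottledBehaviour, the interval of 5 frames, and the use of a Rigidbody are assumptions made for illustration only. ENABLE_LOG is assumed to be added under the project's scripting define symbols.

```csharp
using UnityEngine;

// Hypothetical logging wrapper: calls are stripped at compile time
// unless the ENABLE_LOG scripting define symbol is set.
public static class GameLog
{
    [System.Diagnostics.Conditional("ENABLE_LOG")]
    public static void Info(string message)
    {
        Debug.Log(message);
    }
}

// Hypothetical behaviour that only does its work every few frames.
public class ThrottledBehaviour : MonoBehaviour
{
    [SerializeField] private int interval = 5; // assumed: run once every 5 frames

    private Rigidbody cachedBody; // cached once instead of calling GetComponent in Update

    private void Awake()
    {
        cachedBody = GetComponent<Rigidbody>();
    }

    private void Update()
    {
        // Skip most frames; guard against a zero or negative interval.
        int step = Mathf.Max(1, interval);
        if (Time.frameCount % step != 0)
        {
            return;
        }

        // CompareTag avoids building a tag string just to compare it.
        if (cachedBody != null && CompareTag("Player"))
        {
            GameLog.Info("Player body sleeping: " + cachedBody.IsSleeping());
        }
    }
}
```

Because the Conditional attribute removes the call site entirely when the symbol is missing, the string concatenation passed to GameLog.Info is not evaluated in builds without ENABLE_LOG.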

Besides Scripts

Besides scripts, physics, animation, UI, and garbage collection can all be potential reasons for a main thread bound. That covers far too many topics to treat fully here. I will list some suggestions for each topic, but in real-life projects you will definitely meet many other situations.

Physics

  1. Simplify your colliders. Prefer sphere/box colliders over mesh colliders.

  2. In the player settings, check Prebake Collision Meshes whenever possible.

  3. Reduce your simulation frequency. Consider 30Hz on low-end platforms such as mobile phones.

  4. Decrease the maximum allowed timestep. 

  5. Reuse collision callbacks (i.e. OnCollisionXXX, OnTriggerXXX function calls). These callbacks take a collision/collider as an input parameter, which is allocated on the managed heap and eventually results in garbage collection. To reduce the amount of garbage generated, enable Physics.reuseCollisionCallbacks.

  6. ...​
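Several of the physics suggestions above map to settings that can also be applied from code. The sketch below assumes a 30 Hz simulation and a 0.1 s maximum timestep purely as example values; the component name PhysicsBudgetSettings and the LastContactPoint property are hypothetical.

```csharp
using UnityEngine;

// Hypothetical component that applies physics budget settings at startup.
public class PhysicsBudgetSettings : MonoBehaviour
{
    // Exposed so the value read in the callback is visibly used.
    public Vector3 LastContactPoint { get; private set; }

    private void Awake()
    {
        Time.fixedDeltaTime = 1f / 30f;         // simulate physics at 30 Hz (assumed target)
        Time.maximumDeltaTime = 0.1f;           // assumed cap on the maximum allowed timestep
        Physics.reuseCollisionCallbacks = true; // recycle the Collision instance between callbacks
    }

    private void OnCollisionEnter(Collision collision)
    {
        // With reuse enabled the Collision object is recycled, so copy the
        // values you need here instead of storing a reference to it.
        if (collision.contactCount > 0)
        {
            LastContactPoint = collision.GetContact(0).point;
        }
    }
}
```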

Animation

  1. Avoid using an Animator to drive UI elements or any other simple values. Animators are primarily intended for humanoid characters.

  2. Update only when visible.

  3. Use generic rigs rather than humanoid rigs whenever possible.

  4. Use hashes instead of strings to query the animator.

  5. Ensure that animating hierarchies do not share a common parent, unless that parent is the root of the scene.

  6. Avoid using component-based constraints on deep hierarchies.

  7. Avoid scale curves; they are more expensive than translation and rotation curves.

  8. ...
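Two of the animation items, querying the Animator with hashes and updating only when visible, might look like the sketch below. The parameter name "Speed" and the class name AnimatorHashExample are assumptions; the culling mode shown is one possible choice, not the only one.

```csharp
using UnityEngine;

// Hypothetical example: hashed parameter lookup plus culling when off screen.
[RequireComponent(typeof(Animator))]
public class AnimatorHashExample : MonoBehaviour
{
    // Hash the parameter name once; "Speed" is an assumed parameter.
    private static readonly int SpeedHash = Animator.StringToHash("Speed");

    private Animator animator;

    private void Awake()
    {
        animator = GetComponent<Animator>();

        // Stop animating entirely while no renderer of this object is visible.
        animator.cullingMode = AnimatorCullingMode.CullCompletely;
    }

    public void SetSpeed(float speed)
    {
        // The integer overload avoids a string lookup on every call.
        animator.SetFloat(SpeedHash, speed);
    }
}
```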

UI

  1. Hide invisible UI elements.

  2. Disable Raycast Target whenever possible.

  3. Avoid layout groups where you can, and especially avoid them for runtime construction. You can create a view using a Layout Group, but use your code to disable the component after it has set up the UI.

  4. Lower the Application.targetFrameRate during a fullscreen UI or any static UI.

  5. Do not put everything in one huge canvas; in that case, if you update anything inside it, the entire canvas is forced to update. Try to divide the UI into several canvases.

  6. Remove the default GraphicRaycaster from the top Canvas in the hierarchy. 

  7. ... 
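The layout group and Raycast Target suggestions could be applied from a script along these lines. This is a sketch under assumptions: the component name StaticLayoutBaker and its serialized fields are hypothetical, and it assumes the layout does not need to change after the first frame.

```csharp
using UnityEngine;
using UnityEngine.UI;

// Hypothetical helper: lets a layout group position its children once, then turns it off.
public class StaticLayoutBaker : MonoBehaviour
{
    [SerializeField] private LayoutGroup layoutGroup;                     // assigned in the Inspector
    [SerializeField] private Graphic[] decorativeGraphics = new Graphic[0]; // images/text that never need clicks

    private void Start()
    {
        if (layoutGroup != null)
        {
            // Run the layout pass now, then disable the component so it
            // no longer takes part in later layout rebuilds.
            LayoutRebuilder.ForceRebuildLayoutImmediate((RectTransform)layoutGroup.transform);
            layoutGroup.enabled = false;
        }

        foreach (Graphic graphic in decorativeGraphics)
        {
            if (graphic != null)
            {
                graphic.raycastTarget = false; // skip this element during UI raycasts
            }
        }
    }
}
```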

Memory

We will dive into more details in Chapter 9. For now, a summarized list of ideas includes:

  1. Use Destroy() to remove unused objects.

  2. Set references to null when they are no longer needed.

  3. Apply object pooling.

  4. Whenever you are sure that a garbage collection freeze does not affect the game experience, manually trigger System.GC.Collect().

  5. Use the incremental garbage collector to split the GC workload. 

  6. Use Resources.UnloadUnusedAssets() to free up memory occupied by unused assets.

  7. Defer the loading of resources until they are actually needed.

  8. Be aware of string behaviours.

  9. Be aware of boxing behaviours! This can be a huge topic, and boxing can cause problems that are very hard to detect. Try to provide concrete overloads for the value types you want to pass in to avoid undesired boxing.

  10. Reuse coroutines.

  11. ...
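Object pooling appears in both the script checklist and the memory list, so here is a minimal, engine-independent sketch of the idea in plain C#. SimplePool and PoolDemo are hypothetical names; recent Unity versions also ship a built-in pool in the UnityEngine.Pool namespace, which may be preferable in a real project.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical minimal pool: hands out recycled instances instead of allocating new ones.
public sealed class SimplePool<T> where T : class
{
    private readonly Stack<T> free = new Stack<T>();
    private readonly Func<T> create;

    public SimplePool(Func<T> create)
    {
        this.create = create ?? throw new ArgumentNullException(nameof(create));
    }

    // Number of instances currently waiting to be reused.
    public int FreeCount => free.Count;

    // Reuse a released instance if one exists, otherwise create a new one.
    public T Get()
    {
        return free.Count > 0 ? free.Pop() : create();
    }

    // Return an instance to the pool; the caller should reset its state first.
    public void Release(T item)
    {
        if (item == null) throw new ArgumentNullException(nameof(item));
        free.Push(item);
    }
}

public static class PoolDemo
{
    public static void Main()
    {
        var pool = new SimplePool<List<int>>(() => new List<int>());

        List<int> first = pool.Get();   // pool is empty, so this allocates
        first.Add(1);
        first.Clear();                  // reset state before returning it
        pool.Release(first);

        List<int> second = pool.Get();  // comes from the pool, no new allocation
        Console.WriteLine(ReferenceEquals(first, second)); // prints True
    }
}
```

The same pattern applies to GameObjects: deactivate on release and reactivate on get, rather than calling Instantiate() and Destroy() repeatedly.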

Render Thread Optimization

​陶令恒

© 2025 by Lingheng Tony Tao. 
