T.TAO

Metal #21 [Appendix] Compute Shaders

#GameEngine,#ComputerGraphics,#TechnicalArt

This note covers the parts of Metal related to compute shaders.

Using the GPU for Computation

Parallel Computation

The strength of GPUs is parallel computation (parallelism). Loops whose iterations are independent of one another can be moved from the CPU to the GPU.

Take the example from Metal's official documentation:

void add_arrays(const float *inA, const float *inB,
                float *result, int length)
{
	for (int i = 0; i < length; i++)
	{
		result[i] = inA[i] + inB[i];
	}
}

In this example, each element in the array is read independently, so the order of execution does not affect the calculation result. This type of loop is well-suited for parallel computation—each iteration of the loop can be assigned to a separate compute unit for processing.

Converting the above code to GPU code using Metal would look like this:

kernel void add_arrays(device const float *inA,
                       device const float *inB,
                       device float *result,
                       uint i [[thread_position_in_grid]])
{
	result[i] = inA[i] + inB[i];
}

Compared with the CPU version, the key changes are:

  • The loop structure is gone. Now add_arrays is responsible for only one iteration, and the specific iteration tasks are distributed across different compute units.
  • The kernel keyword. kernel marks a public GPU function. Only public GPU functions can be seen by our application, but even public functions cannot be called by other shader functions. kernel also marks the function as a compute function—one that can be executed in parallel by independent threads.
  • The device keyword. device indicates that these pointers reside in device address space.
  • The [[thread_position_in_grid]] attribute, written with C++ attribute syntax. It declares that Metal should compute a unique index i for each thread and pass it to the function through this argument.

Finding the GPU Device

We have already introduced this concept in the Metal#1 initialization section. We need an abstraction to represent our graphics device; otherwise device would be meaningless. In Metal, the GPU abstraction is MTLDevice. To review, the way to obtain the default GPU is:

id<MTLDevice> device = MTLCreateSystemDefaultDevice();

Although macOS supports using multiple GPUs on a Mac, Metal will select one of them as the default device returned by this function.

Metal also provides abstractions for other related concepts such as shaders, buffers, and textures, returning them as objects. We only need to call the relevant MTLDevice methods.

Obtaining Metal Function References

The initializer first loads the function and prepares it to run on the GPU. When we compile our App, Xcode adds this function (such as the add_arrays above) to Metal's default function library and embeds it in the App. We can then use MTLLibrary and MTLFunction objects to obtain references to the Metal function library and the functions it contains. (We also mentioned this concept in Metal#1 Initialization.)

To obtain information about a specific function (such as add_arrays), we have MTLDevice create an MTLLibrary object, then ask that object to return an MTLFunction object representing the shader function.

Me: Device, can you give me add_arrays?
Device: I'll ask the Library. (turning around) Library, give me add_arrays.
Library: Hello, here's add_arrays, packed for you in an MTLFunction wrapper.
Device: Ok. (turning around) Here, this is add_arrays.
Me: okk, thanks.

This is easy to understand. Converting the above into code gives us the method to obtain add_arrays:

// Me: Device, can you give me add_arrays?
- (instancetype) initWithDevice: (id<MTLDevice>) device
{
	self = [super init];
	if (self)
	{
		_mDevice = device;
		NSError *error = nil;
		// Device: I'll ask the Library.
		id<MTLLibrary> defaultLibrary = [_mDevice newDefaultLibrary];
		// (turning around) Library, give me add_arrays.
		if (defaultLibrary == nil) {
			NSLog(@"Failed to find the default library.");
			return nil;
		}

		// Library: Hello, here's add_arrays, packed for you in an MTLFunction wrapper.
		id<MTLFunction> addFunction = [defaultLibrary newFunctionWithName: @"add_arrays"];
		if (addFunction == nil) {
			NSLog(@"Failed to find the adder function.");
			return nil;
		}
		// Device: Ok. (turning around) Here, this is add_arrays.
		// Me: okk, thanks.
	}
	return self;
}

Rendering Pipeline Modifications

In Metal#2 Rendering Pipeline we have already introduced concepts related to the Metal rendering pipeline. Here we review the relationships between the following concepts: pipeline, pipeline state, command queue, command buffer, command encoding, and pass.
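For compute work, the relationships between these concepts can be condensed into the dispatch flow from Apple's sample. The sketch below assumes instance variables such as _mAddFunctionPSO, _mCommandQueue, _mBufferA/B/Result, and arrayLength, which are placeholders of this note, not required names:

```
// 1. Pipeline state: compile the MTLFunction into a runnable form.
_mAddFunctionPSO = [_mDevice newComputePipelineStateWithFunction:addFunction error:&error];

// 2. Command queue: created once, feeds work to the GPU.
_mCommandQueue = [_mDevice newCommandQueue];

// 3. Per dispatch: command buffer -> compute command encoder -> pass.
id<MTLCommandBuffer> commandBuffer = [_mCommandQueue commandBuffer];
id<MTLComputeCommandEncoder> computeEncoder = [commandBuffer computeCommandEncoder];

[computeEncoder setComputePipelineState:_mAddFunctionPSO];
[computeEncoder setBuffer:_mBufferA offset:0 atIndex:0];
[computeEncoder setBuffer:_mBufferB offset:0 atIndex:1];
[computeEncoder setBuffer:_mBufferResult offset:0 atIndex:2];

// One thread per array element; Metal maps the threads onto the grid.
MTLSize gridSize = MTLSizeMake(arrayLength, 1, 1);
NSUInteger threadGroupSize = _mAddFunctionPSO.maxTotalThreadsPerThreadgroup;
if (threadGroupSize > arrayLength) threadGroupSize = arrayLength;
MTLSize threadgroupSize = MTLSizeMake(threadGroupSize, 1, 1);
[computeEncoder dispatchThreads:gridSize threadsPerThreadgroup:threadgroupSize];

[computeEncoder endEncoding];
[commandBuffer commit];
[commandBuffer waitUntilCompleted];
```

Encoding ends, the buffer is committed to the queue, and the CPU can either wait or continue while the GPU runs the pass.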

Reference:

  1. https://developer.apple.com/documentation/metal/performing_calculations_on_a_gpu