This note covers mathematical concepts that commonly appear in rendering, computer graphics, and shader programming. It does not exhaustively cover basic mathematics, and it contains almost no formulas; the focus is on conceptual understanding and on possible applications in rendering.
Linear Algebra
Vectors
The basic concept of vectors will not be elaborated here. The following concepts are commonly used:
- Dot product: The dot product equals the product of the two magnitudes and the cosine of the angle between them (a · b = |a||b|cosθ), so if both vectors are unit vectors, the result is exactly cosθ. When working with unit vectors, we can therefore treat the dot product as a proxy for the angle.
- Cross product: An important property of the cross product is that the result is perpendicular to both operands. In a right-handed coordinate system, curl your fingers from vector a to vector b, and your thumb points in the direction of a × b. The cross product can be used to find normals and to determine triangle face orientation.
Vectors and points are both written as ordered tuples of numbers. For 3D vectors/points, the notation alone cannot tell us whether (1, 2, 3) is a point or a vector. We will later use homogeneous coordinates to resolve this ambiguity.
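The two products above can be sketched in a few lines of plain Python (a minimal illustration; the helper names are made up here):

```python
def dot(a, b):
    """Dot product: |a||b|cos(theta); for unit vectors this is cos(theta)."""
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    """Cross product: the result is perpendicular to both a and b
    (right-hand rule: curl fingers from a to b, thumb gives a x b)."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

x_axis = (1.0, 0.0, 0.0)
y_axis = (0.0, 1.0, 0.0)
print(dot(x_axis, y_axis))    # 0.0 -> the axes are perpendicular (cos 90 = 0)
print(cross(x_axis, y_axis))  # (0.0, 0.0, 1.0) -> the z axis, normal to both
```

The cross product of two triangle edges gives the face normal in exactly this way.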
Matrices
Again, conceptual details are omitted. The computation of matrix multiplication is also omitted. The following concepts are commonly used:
- Inverse matrix: Whether a matrix is invertible is determined by its determinant. If det M ≠ 0, then matrix M is invertible. In shader programming there are plenty of built-in and third-party routines to compute determinants and inverses, so the calculation is not detailed here. The inverse matrix is used to apply the reverse of a transformation.
- Order of computation: For a transformation CBAv, we first apply A to vector v, then B, then C. Here, v must be a column vector. Note that in most cases composite transformations do not commute, so the order above cannot be rearranged.
- In the vast majority of cases, we scale first, then rotate, then translate.
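The order rules above can be demonstrated with small 2×2 matrices (a minimal pure-Python sketch; the matrices and helper names are invented for illustration):

```python
def mat_mul(A, B):
    """Multiply two 2x2 matrices (row-major lists of rows)."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def mat_vec(A, v):
    """Left-multiply a column vector by a matrix: A * v."""
    return [sum(A[i][k] * v[k] for k in range(2)) for i in range(2)]

S = [[2.0, 0.0], [0.0, 1.0]]   # non-uniform scale: x by 2
R = [[0.0, -1.0], [1.0, 0.0]]  # rotate 90 degrees counterclockwise

v = [1.0, 0.0]
# R * S * v: the rightmost matrix applies first, so this scales then rotates.
print(mat_vec(mat_mul(R, S), v))  # [0.0, 2.0]
# S * R * v: rotate first, then scale -- a different result.
print(mat_vec(mat_mul(S, R), v))  # [0.0, 1.0]
```

The two results differ, which is exactly why the scale-rotate-translate order cannot be rearranged freely.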
Homogeneous Coordinates
Homogeneous coordinates add a fourth component w after the three spatial coordinates, turning affine transformations into linear transformations on the 4D coordinates. They were introduced to handle transformations such as translation, which is affine but not linear.
- Linear transformation: A transformation f satisfying both of these conditions is called linear: f(x) + f(y) = f(x+y) and kf(x) = f(kx)
- Affine transformation: Any transformation of the form f(x) = Ax + b is called an affine transformation.
Note: For points, the extra component is w = 1; for vectors, w = 0. Thus, translating a point yields a (moved) point, while translating a vector yields the same vector unchanged, as expected for a pure direction. Points and vectors are therefore formally distinct in homogeneous coordinate space.
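The point-versus-vector behavior follows directly from the last column of a translation matrix, as this small sketch shows (helper names invented for illustration):

```python
def transform(M, v4):
    """Left-multiply a 4-component homogeneous coordinate by a 4x4 matrix."""
    return [sum(M[i][k] * v4[k] for k in range(4)) for i in range(4)]

def translation(tx, ty, tz):
    """4x4 translation matrix; the offset only takes effect when w == 1."""
    return [[1.0, 0.0, 0.0, tx],
            [0.0, 1.0, 0.0, ty],
            [0.0, 0.0, 1.0, tz],
            [0.0, 0.0, 0.0, 1.0]]

T = translation(5.0, 0.0, 0.0)
point  = [1.0, 2.0, 3.0, 1.0]   # w = 1: a point
vector = [1.0, 2.0, 3.0, 0.0]   # w = 0: a direction

print(transform(T, point))   # [6.0, 2.0, 3.0, 1.0] -> the point moved
print(transform(T, vector))  # [1.0, 2.0, 3.0, 0.0] -> the vector is unchanged
```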
Coordinate Space Transformations
In Shader, coordinate space transformations are perhaps the most important. Suppose we want to convert between coordinate spaces A and B.
To transform point x_A from coordinate system A to B, we use the transformation matrix MATRIX_A2B:
x_B = MATRIX_A2B * x_A
To transform point y_B from coordinate system B to A, we use the transformation matrix MATRIX_B2A:
y_A = MATRIX_B2A * y_B
This must be understood clearly: a transformation matrix acts by left multiplication, so it is placed on the left of the (column) vector, and it is named from the source coordinate system to the target coordinate system. Note also that MATRIX_B2A is simply the inverse of MATRIX_A2B, which is why the two formulas above mirror each other.
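As a concrete sketch, suppose space B's origin sits at (3, 0, 0) in space A's coordinates (a made-up example; the matrix names follow the notation above). Going A→B subtracts that offset, and B→A adds it back, so the two matrices are inverses and a round trip recovers the original point:

```python
def transform(M, v):
    """Left-multiply a homogeneous column vector by a 4x4 matrix."""
    return [sum(M[i][k] * v[k] for k in range(4)) for i in range(4)]

MATRIX_A2B = [[1.0, 0.0, 0.0, -3.0],
              [0.0, 1.0, 0.0,  0.0],
              [0.0, 0.0, 1.0,  0.0],
              [0.0, 0.0, 0.0,  1.0]]
MATRIX_B2A = [[1.0, 0.0, 0.0,  3.0],   # the inverse of MATRIX_A2B
              [0.0, 1.0, 0.0,  0.0],
              [0.0, 0.0, 1.0,  0.0],
              [0.0, 0.0, 0.0,  1.0]]

x_A = [5.0, 1.0, 0.0, 1.0]
x_B = transform(MATRIX_A2B, x_A)
print(x_B)                         # [2.0, 1.0, 0.0, 1.0]
print(transform(MATRIX_B2A, x_B))  # [5.0, 1.0, 0.0, 1.0] -> x_A recovered
```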
In general, the transformation matrices for various coordinate systems we encounter in Shader are already built-in. We mainly need to know how many different coordinate systems we may encounter when writing Shaders.
Vertex Transformation Pipeline
In Unity and other computer graphics rendering systems, a vertex on a model must go through multiple transformations before it appears on screen and is rendered into pixels.
In this process, it starts in model space, is then transformed to world space, then to view space, then to clip space, and finally to screen space.
Model Space
Model Space (also called Local Space) is the space in which the artist models. A certain point is chosen as the origin, and the positive directions of the x, y, z axes are defined. In Unity, when you select an object, the three axes and pivot that appear represent the model space axes and origin. Every point on the object has its own coordinates relative to this origin and these axes.
World Space and Model-to-World Transform
In general, world space is the "outermost" space in Unity. If a GameObject has no parent, its Transform Position is its world space coordinate.
Model-to-World Transform / Model Transform
The Model Transform (M) converts vertices from model space to world space. In Unity, the predefined variable _Object2World in UnityShaderVariables.cginc (renamed unity_ObjectToWorld in Unity 5.4 and later), when left-multiplied with model space coordinates, converts them to world space. We can abbreviate this matrix as the M matrix.
View Space and World-to-View Transform
View Space (also called Camera Space) uses the camera as the origin, with +x to the camera's right, +y upward, and +z pointing behind the camera. In other words, the camera's forward direction is the -z direction in view space, which means view space uses a different handedness than the previous two spaces. The first two are left-handed, while view space is right-handed.
World-to-View Transform / View Transform
The View Transform (V) converts vertices from world space to view space. In Unity, the predefined variable UNITY_MATRIX_V in UnityShaderVariables.cginc, when left-multiplied with world space coordinates, converts them to view space. We can abbreviate this matrix as the V matrix.
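The handedness flip described above can be seen in a toy view matrix. In this sketch the camera sits at the world origin with no rotation (a made-up setup, not Unity's actual UNITY_MATRIX_V), so the view matrix reduces to a single z negation:

```python
def transform(M, v):
    """Left-multiply a homogeneous column vector by a 4x4 matrix."""
    return [sum(M[i][k] * v[k] for k in range(4)) for i in range(4)]

# World space is left-handed (camera forward is +z there), while view space
# is right-handed (forward is -z), so this toy view matrix flips z.
V = [[1.0, 0.0, 0.0, 0.0],
     [0.0, 1.0, 0.0, 0.0],
     [0.0, 0.0, -1.0, 0.0],
     [0.0, 0.0, 0.0, 1.0]]

p_world = [0.0, 0.0, 5.0, 1.0]   # 5 units in front of the camera
print(transform(V, p_world))     # [0.0, 0.0, -5.0, 1.0] -> negative z in view space
```

For a camera that is moved and rotated, the real view matrix composes this flip with the inverse of the camera's own model transform.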
Clip Space and Projection Transform
Clip Space (P), also called homogeneous clip space, is used to cull primitives outside the view frustum and to clip primitives that intersect the frustum boundaries.
The View Frustum is the camera's visible range in Unity, bounded by 6 planes. The plane closest to the camera is the Near Clip plane, and the farthest is the Far Clip plane.
Unity has two rendering modes: in Orthographic mode, the near and far planes have the same size; in Perspective mode, the far plane is larger than the near plane.
As mentioned earlier, for a vertex, the homogeneous w component is 1. In Unity, you can left-multiply view space coordinates by UNITY_MATRIX_P to transform them from view space to clip space. Orthographic and perspective projection use different projection matrices. The key differences are:
- In perspective projection, the projection matrix sets the w component to -z (the negated view-space depth), setting up the later perspective divide, while applying different scaling to x, y, and z.
- In orthographic projection, the w component remains 1; the matrix only applies linear scaling and translation to x, y, and z.
The perspective matrix changes the handedness of space from right-handed to left-handed, so the farther from the camera, the larger z becomes.
The perspective matrix depends on the camera's near and far plane positions, aspect ratio, and field of view.
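The dependence on field of view, aspect ratio, and the near/far planes can be made concrete with an OpenGL-convention perspective matrix (a sketch under that assumption; Unity's actual UNITY_MATRIX_P varies per graphics API):

```python
import math

def perspective(fov_y_deg, aspect, near, far):
    """OpenGL-style perspective projection (camera looks down -z in view space).

    After the homogeneous divide, view-space z in [-near, -far] maps to
    NDC z in [-1, 1]; the output w component equals -z_view.
    """
    t = 1.0 / math.tan(math.radians(fov_y_deg) / 2.0)
    return [[t / aspect, 0.0, 0.0, 0.0],
            [0.0, t, 0.0, 0.0],
            [0.0, 0.0, -(far + near) / (far - near),
             -2.0 * far * near / (far - near)],
            [0.0, 0.0, -1.0, 0.0]]

def transform(M, v):
    return [sum(M[i][k] * v[k] for k in range(4)) for i in range(4)]

P = perspective(60.0, 16.0 / 9.0, 0.3, 1000.0)
# A point on the near plane, straight ahead of the camera (view space, w = 1):
clip = transform(P, [0.0, 0.0, -0.3, 1.0])
print(clip[3])            # 0.3 -> clip-space w is -z_view, as described above
print(clip[2] / clip[3])  # close to -1.0 -> the near plane maps to NDC z = -1
```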
Clip Space to Screen Space
The final step is to transform vertices to 2D screen space. This is done in two steps:
Step 1: Homogeneous Division
Divide x, y, and z by the w component to get (x/w, y/w, z/w). The resulting coordinates are called Normalized Device Coordinates (NDC). After this division, the view frustum becomes a cube (the canonical view volume).
Step 2: Screen Mapping
In Unity, after the first step, the x and y coordinates fall within [-1, 1]. We only need to map (x, y) to screen space where x ∈ [0, width] and y ∈ [0, height].
At this stage, the z component may be used for the depth buffer.
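The two steps above fit in one small function (a sketch; the vertex values and function name are made up, and here y = -1 is taken as the bottom of the screen):

```python
def clip_to_screen(clip, width, height):
    """Homogeneous division to NDC, then map x, y from [-1, 1] to pixels."""
    x, y, z, w = clip
    ndc = (x / w, y / w, z / w)           # step 1: homogeneous division
    sx = (ndc[0] * 0.5 + 0.5) * width     # step 2: [-1, 1] -> [0, width]
    sy = (ndc[1] * 0.5 + 0.5) * height    #         [-1, 1] -> [0, height]
    return sx, sy, ndc[2]                 # the NDC z feeds the depth buffer

# A clip-space vertex with w = 2 (hypothetical values):
print(clip_to_screen([1.0, -2.0, 1.0, 2.0], 1920, 1080))
# (1440.0, 0.0, 0.5): NDC x = 0.5 lands at 3/4 of the width; NDC y = -1
# lands at the bottom edge.
```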
