This paper share covers Pixar's "RenderMan: An Advanced Path Tracing Architecture for Movie Rendering".
Introduction
RenderMan is a rendering engine developed by Pixar for film visual effects. The earliest RenderMan was based on the Reyes scanline rendering algorithm, and later gained new features including ray tracing, subsurface scattering, and a radiosity cache. Modern RenderMan has been rewritten as a path tracer, supporting bidirectional path tracing and interactive rendering. Path tracing was introduced around the same time as the Reyes algorithm (mid-1980s), when ray tracing was considered impractical for rendering movies, let alone real-time use—the algorithm was elegant, but the performance cost was too high. Over the following decades, however, many of these problems were overcome, eventually making even real-time ray tracing possible.
Early RenderMan
Reyes Algorithm
The core steps of the early Reyes algorithm:
- Tiled-based rendering: The Reyes algorithm divides the image into small blocks (called "buckets"), then renders these blocks one by one. This tiled rendering approach allows loading only the geometry and texture data required for the current view when rendering each block, greatly reducing memory requirements.
- Micropolygons generation: The Reyes algorithm subdivides visible surfaces into smaller fragments, and eventually splits each fragment grid into micropolygons, which are typically pixel-sized. This not only provides high geometric detail but also facilitates the application of displacement maps.
- Shading and anti-aliasing: Each micropolygon vertex computes color and opacity through surface shaders, while anti-aliasing is achieved through distributed sampling of these micropolygons. Since this operates in 2D, the data has strong locality and anti-aliasing is relatively smooth.
- Motion blur and depth of field: The Reyes algorithm has advantages in efficiency and low noise when implementing motion blur and depth of field. It eliminates noise by increasing pixel sample count without additional shading computation cost.
Although the Reyes algorithm has excellent memory management, it is not good at handling reflections and shadows. Like traditional rasterization approaches, it relies on shadow maps and reflection maps to create these effects.
Ray Tracing for Reflections
For better and more realistic reflections, RenderMan introduced ray tracing [Whitted 1980]. Ray tracing can produce high-quality shadows and ambient occlusion without maintaining shadow maps or worrying about shadow map resolution.
Ray tracing's memory access patterns are more complex than Reyes' algorithm: geometry can be very complex, especially when using displacement maps or high-resolution detail. Direct high-resolution ray tracing against all geometry would not only cause huge memory demands but also slow down rendering. Although primary camera rays, reflection rays from planar surfaces, and shadow rays toward small light sources have some access coherence, overall, reflection or shadow rays can be emitted in any direction at any time. Thus, access to scene geometry and textures becomes random. As a result, RenderMan made some optimizations for ray tracing geometry processing:
- Multi-resolution cache: Caching surfaces at different resolutions in memory and selecting appropriate resolution based on ray differentials.
- Parallel testing of rays against four triangles (two quads) for intersection.
Although ray tracing excels at fine effects, it has higher noise and computation cost when handling motion blur and depth of field; compared to Reyes, rendering costs for these scenarios remain high. Therefore, fully switching to ray tracing was not yet the best choice at that time.
Distributed Ray Tracing
RenderMan used it, but sparingly: the cost was too high, so it was only used for computing global illumination.
"Distribution" ray tracing [Cook et al. 1984] refers to stochastically distributing rays over dimensions such as the hemisphere, area lights, time, and the lens—here used to gather indirect diffuse illumination by shooting many secondary rays from each shading point. However, directly shading all of those secondary hit points is far too expensive, so more efficient computation methods were needed. DreamWorks and RenderMan each made improvements, but even with them it was still too slow.
Point-based Methods
To address these issues with distributed ray tracing, RenderMan introduced point cloud file-based methods: precomputing and storing lighting information before rendering to improve efficiency and reduce noise. Specifically:
- Before formal rendering, first compute and store direct lighting in the scene, saving it as Point Cloud files.
- By iterating through the precomputed point cloud, compute each point's contribution, then weight and accumulate these contributions according to scattering models (such as diffusion models) to generate final subsurface scattering, AO, or GI effects.
This method is indeed efficient with little noise, but many users were still dissatisfied with having to maintain point cloud files. Moreover, point cloud resolution and distribution affect rendering quality—a point cloud that is too sparse can cause artifacts or aliasing.
Reintroducing Distributed Ray Tracing
New optimization: Radiosity Cache [Christensen et al. 2012]. Split shaders into view-independent parts (e.g., diffuse) and view-dependent parts (e.g., specular). Store view-independent results in the radiosity cache for reuse, while view-dependent parts are re-evaluated at each ray intersection. In the radiosity cache, expensive lighting computations and texture lookups only need to be done once and can be reused across multiple renders of the entire scene. This greatly reduces shading cost and improves rendering speed.
Although this method significantly improves efficiency, it still has some shortcomings:
- Lack of progressiveness: Distributed ray tracing results do not have progressive properties; quality cannot be gradually improved on initial render images.
- Cache consistency: Cache management may cause lighting result differences between frames, potentially causing flickering in animation sequences.
Volume Rendering
The Reyes algorithm did not originally include volume rendering; it was added later, using techniques similar to Houdini's renderer: dividing volumes into voxels, computing each voxel's color, opacity, etc., and rendering each micro-volume.
Limitations of Early Reyes Algorithm
Four main limitations mentioned in the paper:
- Inefficient handling of high-density geometry.
- No support for instancing.
- Low volume rendering efficiency.
- Difficult to scale beyond 16 threads on multi-core systems.
Overall, the second part of the paper traces RenderMan's evolution from Reyes to path tracing, a clear illustration of the trade-offs at the heart of CG: each stage's technical choices and optimizations balance efficiency, performance, and image quality, ultimately leading to a more unified and flexible modern path tracing architecture that lets RenderMan meet the complex rendering demands of today's film production, constantly trading time against space along the way. Many previously impossible techniques only became practical once graphics algorithms and hardware advanced far enough.
Modern RenderMan
RenderMan's modernization focuses on path tracing because it provides a more unified, flexible rendering pipeline. Path tracing has the following advantages:
- Progressive and interactive rendering: Path tracing suits progressive and interactive rendering, quickly generating low-quality images for reference during editing and gradually improving quality.
- Handling geometric complexity: Path tracing can effectively manage geometric complexity through techniques like object instancing.
- Efficient denoising and sampling optimization: With improvements in denoising and sampling techniques, path tracing's noise and slow convergence have been greatly mitigated.
- Unified lighting model: Path tracing requires no additional preprocessing steps and can handle direct and indirect lighting in a single-pass rendering pipeline.
Additionally, its architecture is highly extensible—a product inspired by pbrt. Some key components:
- Material interface (bxdfs): RenderMan allows users to define custom materials called bxdfs, including various surface and volume scattering models. Users can access methods like EvaluateSample() and GenerateSample() through the API to implement custom lighting generation and evaluation.
- Light transport interface (integrators): RenderMan allows users to define their own light transport algorithms through the integrators interface. Integrators can manage ray generation and sampling in path tracing, such as bidirectional path tracing and volume lighting.
- Other plugin types: RenderMan also supports plugins for surface displacement, light sources, light filters, camera projection, and sample filtering; users can control various aspects of rendering through these interfaces.
This part is almost identical in implementation philosophy to pbrt, which demonstrates the value of the pbrt book—a dual textbook for both industry and academia. Every graphics programmer should read it cover to cover.
Material Interface bxdf
Two main interfaces:
- EvaluateSample(): Input: wi, wo. Output: the RGB bxdf value plus two probability density values, for the forward and reverse scattering directions.
- GenerateSample(): Input: wi. Output: a stochastically sampled wo, drawn according to the bxdf's own importance distribution.
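As an illustration, here is a minimal Python sketch of such a bxdf interface using a Lambertian lobe. The method names echo the paper's EvaluateSample()/GenerateSample(), but the class shape and the cosine-weighted sampler are assumptions, not RenderMan's actual API.

```python
import math, random

class LambertianBxdf:
    """Minimal diffuse bxdf sketch in a z-up shading frame.
    Method names follow the paper; everything else is hypothetical."""

    def __init__(self, albedo):
        self.albedo = albedo  # rgb tuple

    def evaluate_sample(self, wi, wo):
        # Lambertian value is albedo/pi; for this symmetric lobe both
        # the forward and reverse pdfs are cosine-weighted.
        cos_o = max(wo[2], 0.0)
        cos_i = max(wi[2], 0.0)
        value = tuple(a / math.pi for a in self.albedo)
        pdf_fwd = cos_o / math.pi  # pdf of sampling wo given wi
        pdf_rev = cos_i / math.pi  # pdf of the reverse direction
        return value, pdf_fwd, pdf_rev

    def generate_sample(self, wi, rng):
        # Cosine-weighted hemisphere sampling; a real bxdf would
        # generally also use wi to shape the lobe.
        u1, u2 = rng.random(), rng.random()
        r = math.sqrt(u1)
        phi = 2.0 * math.pi * u2
        wo = (r * math.cos(phi), r * math.sin(phi),
              math.sqrt(max(0.0, 1.0 - u1)))
        value, pdf_fwd, _ = self.evaluate_sample(wi, wo)
        return wo, value, pdf_fwd
```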
The paper mentions that writing materials today is much harder than it used to be: one needs C++, graphics, probability theory, calculus, signal processing, sampling, optics... We can only say we feel the same.
RenderMan itself ships with a set of built-in bxdfs (essentially preset materials): e.g., Lambertian diffuse, ideal mirrors, and subsurface-scattering skin and hair. Advanced users can also define their own bxdfs.
Integrators
The integrator's main role is to (numerically) compute radiance. Two main interfaces:
- GetNearestHits(): Output: (version 1) returns HitInfo or (version 2) returns the shading group hit by the ray.
- GetTransmission(): Output: transmittance between two points (for volume rendering computation).
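A toy sketch of how an integrator might use a GetTransmission-style call for a shadow ray. The homogeneous Beer-Lambert medium and the function signatures are stand-ins for illustration, not RenderMan's real interface.

```python
import math

def get_transmission(p0, p1, sigma_t=0.0):
    """Stand-in for the renderer's GetTransmission(): transmittance
    between two points through a homogeneous medium (Beer-Lambert).
    A real integrator would call into the renderer instead."""
    d = math.dist(p0, p1)
    return math.exp(-sigma_t * d)

def direct_light(shading_point, light_pos, light_intensity, sigma_t=0.0):
    # Point-light direct lighting with inverse-square falloff,
    # attenuated by the medium's transmittance along the shadow ray.
    d2 = sum((a - b) ** 2 for a, b in zip(shading_point, light_pos))
    tr = get_transmission(shading_point, light_pos, sigma_t)
    return light_intensity * tr / d2
```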
Scene Processing
As film imagery becomes more complex and scene complexity increases, the amount of data to process becomes enormous—some frames require dozens of GB. RenderMan adopts the following measures to handle scenes.
Handling Geometry
- RenderMan splits large meshes containing thousands of polygons into smaller parts to create BVHs with smaller leaf nodes, optimizing acceleration structures.
- RenderMan automatically selects appropriate geometry subdivision levels based on path differentials in the scene. High-resolution subdivision is used on objects near the camera, while lower resolution is used for objects far from the view or occupying small screen area.
- RenderMan uses instancing to efficiently manage large numbers of identical objects, like the sand grains in the short film "Piper," avoiding duplicate storage of the same geometry—only storing different configurations of transforms and materials.
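The instancing idea can be sketched in a few lines of Python (the class layout is hypothetical; the point is that the geometry is stored once and shared, while each instance carries only its transform and material):

```python
class Instance:
    """One instance = shared geometry + its own transform and material id.
    A sketch of the idea, not RenderMan's actual data layout."""
    def __init__(self, geometry, translation, material_id):
        self.geometry = geometry        # shared list of vertices, stored once
        self.translation = translation  # per-instance transform (translation only here)
        self.material_id = material_id

    def world_vertices(self):
        tx, ty, tz = self.translation
        return [(x + tx, y + ty, z + tz) for (x, y, z) in self.geometry]

# One grain of sand, instanced twice: geometry memory is shared.
grain = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
a = Instance(grain, (0.0, 0.0, 0.0), material_id=1)
b = Instance(grain, (5.0, 0.0, 0.0), material_id=2)
```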
Handling Textures
- RenderMan continues Peachey [1990]'s texture caching approach, improving multi-threaded lookup and MIP mapping support based on ray differentials, enabling efficient loading of required textures in complex scenes.
- RenderMan supports Ptex format, which allows independent MIP mapping per mesh face (though this format may cause cache issues).
- RenderMan also provides 3D brick maps [Christensen and Batali 2004], suitable for scenes requiring texture storage in volumes, such as clouds or smoke volume effects.
Handling Lighting
"Coco" has eight million (??????) point lights in one scene.
RenderMan optimizes lighting computation through subset sampling, avoiding sampling all light sources.
Additionally, RenderMan uses Multiple Importance Sampling (MIS) [Veach and Guibas 1995] to combine material sampling and light-source sampling, reducing noise. For non-diffuse materials, it also applies joint importance sampling over two distributions to select light directions more accurately—recalling pbrt's MIS chapter shows the elegance of the idea, another of those genius mathematical discoveries in computer graphics.
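The MIS weighting is typically computed with Veach's power heuristic; a minimal sketch (beta = 2, the common choice):

```python
def power_heuristic(nf, f_pdf, ng, g_pdf):
    """Veach's power heuristic (beta = 2) for combining two sampling
    strategies, e.g. light sampling and bxdf sampling. nf/ng are the
    sample counts taken from each strategy."""
    f = nf * f_pdf
    g = ng * g_pdf
    denom = f * f + g * g
    return (f * f) / denom if denom > 0.0 else 0.0
```

With equal pdfs each strategy gets weight 0.5; as one pdf dominates, its weight approaches 1, which is what suppresses the fireflies that single-strategy sampling produces.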
Finally, RenderMan supports user-defined light filter plugins such as barn doors and gradients; these filters can dynamically adjust light effects in the scene.
Handling Parallelism
RenderMan's modern architecture is optimized to run efficiently on modern processors with dozens of cores. It uses Intel's TBB (Threading Building Blocks) library to manage multi-threaded tasks, enabling better utilization of multi-core CPUs in high-complexity scenes. Additionally, the RenderMan team restructured computation tasks and eliminated bottlenecks, maintaining efficiency even on systems with up to 72 cores.
Data Processing
Surface Subdivision
RenderMan selects appropriate subdivision levels based on surface screen size, typically setting each micropolygon's target size to one pixel. It supports two subdivision methods:
- Full subdivision: Subdivides the entire surface when a ray first hits an object, ensuring adjacent surface fragments have matching vertices.
- On-demand subdivision: Subdivides only when rays need to access specific surface regions; this may cause subdivision mismatch at edges, potentially resulting in small visual defects (such as "pinhole" light leakage).
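A rough sketch of screen-size-driven subdivision level selection, assuming each subdivision level halves the micropolygon edge length; the exact heuristic RenderMan uses is not spelled out here, so this is illustrative only:

```python
import math

def subdivision_level(edge_len_pixels, target_pixels=1.0, max_level=10):
    """Pick a subdivision level so each micropolygon edge ends up
    around one pixel: each level halves the edge, so we need
    log2(edge / target) halvings, clamped to a maximum."""
    if edge_len_pixels <= target_pixels:
        return 0
    level = math.ceil(math.log2(edge_len_pixels / target_pixels))
    return min(level, max_level)
```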
BVH
Large scenes certainly use BVH. In the BVH tree, each node contains a bounding box, and leaf nodes contain actual geometry. When traversing the BVH tree with a ray, only nodes relevant to the current ray need to be processed. To support asynchronous loading and dynamic geometry processing, RenderMan automatically rebuilds parts of the tree as needed when rays traverse the BVH.
Intersection Testing
RenderMan packs up to 64 rays with coherent direction into a ray bundle for parallel traversal in the BVH. At each leaf node, the ray bundle is tested for intersection with 4 to 8 triangular or quadrilateral surfaces; this batched operation fully utilizes vectorization for improved efficiency.
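A scalar Python sketch of the slab test a BVH traversal performs against each node's bounding box; production code runs this for a whole ray bundle in SIMD lanes, but the per-ray logic is the same (this version assumes precomputed, finite inverse direction components):

```python
def slab_hits(origins, inv_dirs, box_min, box_max):
    """Test a bundle of rays against one bounding box with the slab
    method. Returns a list of booleans, one per ray."""
    hits = []
    for o, inv in zip(origins, inv_dirs):
        t_near, t_far = 0.0, float("inf")
        for axis in range(3):
            t0 = (box_min[axis] - o[axis]) * inv[axis]
            t1 = (box_max[axis] - o[axis]) * inv[axis]
            if t0 > t1:
                t0, t1 = t1, t0           # order the slab interval
            t_near = max(t_near, t0)
            t_far = min(t_far, t1)
        hits.append(t_near <= t_far)      # overlap => box is hit
    return hits
```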
Additionally, RenderMan uses instancing to represent repeated objects, storing geometry data only once and managing different instances through transform matrices and material IDs.
When rendering fine geometry like hair and grass blades, RenderMan uses flat "ribbon" or cylindrical curve representations. Curves can be split into multiple segments with adaptive bounding boxes for acceleration. Rays are tested for intersection along curve segments to ensure rendering efficiency for complex structures like hair.
Numerical Precision
During ray tracing, RenderMan needs to adjust ray origin positions to avoid self-intersection errors caused by floating-point precision. RenderMan uses two offset methods:
- Geometry subdivision offset: When adjacent surface fragment subdivision is inconsistent, use offset to avoid light leakage at boundaries.
- Floating-point precision offset: For geometry far from the coordinate origin, ray origin offset must be larger than floating-point precision to avoid errors from numerical error.
Recall that in pbrt, ray definition includes tmin and tmax, used to determine intersection with the scene.
This is also a measure to prevent self-intersection and to avoid rounding error causing rays to intersect the surface they originated from.
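One common way to implement the scale-aware offset (an assumed formula, not RenderMan's exact rule) is to grow the epsilon with the hit point's magnitude, so the nudge stays above the local floating-point spacing far from the origin:

```python
def offset_origin(hit_point, normal, base_eps=1e-4):
    """Nudge a secondary ray's origin along the surface normal,
    scaling the offset with the hit point's coordinate magnitude so
    it remains meaningful far from the world origin."""
    scale = max(abs(c) for c in hit_point)
    eps = base_eps * max(1.0, scale)
    return tuple(p + eps * n for p, n in zip(hit_point, normal))
```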
The Perennial Motion Blur and Depth of Field
RenderMan implements motion blur by assigning a random time to each ray, while using stratified sampling of time within the same pixel to reduce noise. For depth of field, RenderMan uses 4D stratified sampling on pixel position and lens position to ensure clear, smooth depth of field effects.
Ray Differentials
RenderMan uses a simplified ray differential method, using only two floats to represent the ray's initial radius and the rate of change of propagation distance. This simplified ray differential computation is sufficient for scenes with multiple samples per pixel and reduces computation cost.
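The two-float differential is simple enough to write down directly; the MIP-selection helper below is an assumed example of how such a footprint might drive texture filtering, not RenderMan's actual rule:

```python
import math

def ray_radius(r0, spread, t):
    """Ray footprint radius after travelling distance t, using the
    compact two-float differential: initial radius plus a constant
    per-unit-distance spread."""
    return r0 + spread * t

def mip_level(footprint, texel_size):
    """Pick a MIP level whose texel size roughly matches the ray
    footprint (a hypothetical heuristic for illustration)."""
    return max(0, math.floor(math.log2(max(footprint / texel_size, 1.0))))
```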
Progressive Rendering
Sample Sequences
RenderMan needs high-quality sample sequences to ensure progressive rendering efficiency and image quality. Common sampling methods include:
- Finite sample sets: Traditional jittered sampling and multi-jittered sampling belong to this category. These sample sets work well for final image rendering but in progressive rendering, initial sample quality may not be high enough, affecting real-time feedback for interactive rendering.
- Progressive sample sequences: This method generates an infinite-length sample sequence, with each level carefully distributed to ensure good distribution at every stage from few to many samples, suitable for progressive rendering. RenderMan uses progressive multi-jittered (pmj02) sample sequences with local shuffling for high-dimensional sampling, ensuring high-quality sampling in 3D, 4D, and even higher dimensions.
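For contrast, plain jittered sampling (the finite-set baseline from the first bullet) is easy to sketch; pmj02 sequences additionally keep every prefix of the sequence well distributed, which this simple version does not attempt:

```python
import random

def jittered_samples(n, rng):
    """Generate n*n jittered 2D samples in [0,1)^2: the unit square is
    split into an n-by-n grid and one uniform random point is placed
    in each stratum."""
    samples = []
    for i in range(n):
        for j in range(n):
            samples.append(((i + rng.random()) / n,
                            (j + rng.random()) / n))
    return samples
```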
Adaptive Pixel Sampling
RenderMan automatically adjusts sampling density based on complexity in different image regions. Flat regions or empty backgrounds converge quickly, while high-frequency detail regions, penumbra, and highlights require higher sampling density. Key steps for adaptive sampling include:
- Contrast error measurement: RenderMan uses contrast to determine whether a region has converged; the contrast method is unaffected by lighting exposure, maintaining consistent noise levels under various exposure conditions.
- Adjacent pixel reset: When a pixel fails the convergence test, that pixel and its adjacent region are resampled to more accurately capture visual detail.
- Threshold adjustment: During production, users can set stricter convergence criteria for specific objects and control sampling limits in dark areas to further optimize sampling efficiency when needed.
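The exposure-independent idea behind contrast-based convergence can be sketched by comparing two halves of a pixel's sample history relative to their mean; this is a toy criterion for illustration, not RenderMan's actual error estimator:

```python
def converged(samples, threshold=0.01):
    """Split the pixel's samples into two halves and compare their
    means relative to the overall mean. Because the measure is a
    ratio, scaling all samples (exposure) does not change it."""
    if len(samples) < 4:
        return False
    half = len(samples) // 2
    a = sum(samples[:half]) / half
    b = sum(samples[half:]) / (len(samples) - half)
    mean = (a + b) / 2.0
    if mean == 0.0:
        return True  # empty/black region converges immediately
    return abs(a - b) / mean <= threshold
```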
"Checkpoints"
Usually when we talk about progressive rendering, we mean a method that quickly generates low-quality images and gradually improves to high quality, especially suitable for interactive rendering needs. RenderMan's progressive rendering also supports "checkpoint" functionality—periodically generating viewable EXR format intermediate images during rendering.
Main advantages of checkpoints include:
- Checkpoint recovery: If rendering is unexpectedly interrupted, rendering progress can be resumed from a checkpoint, avoiding re-rendering the entire image.
- Time limits: Users can set maximum rendering time limits; RenderMan generates checkpoint images and safely exits after reaching the limit.
- Quick preview: Low-quality images generated at checkpoints provide artists with quick reference, helping timely adjustment of render settings.
Such rendering options are now supported by most rendering engines (Blender, V-Ray, etc.). RenderMan was ahead of the industry at that time.
Real-time Rendering
Path Selection
In path tracing, light path selection determines lighting computation efficiency and image quality. Choosing appropriate paths ensures rays preferentially pass through key regions, reducing unnecessary computation and improving rendering efficiency. RenderMan controls path generation and path sampling weight distribution through path selection and manipulation, enabling rays to more effectively capture important lighting information in the scene.
pbrt also has chapters introducing light paths, which can be expressed using Heckbert's regular-expression notation. Although not critical here, RenderMan in some cases guides rays to preferentially explore regions with strong indirect lighting. For example, in complex scenes with heavy indirect and global illumination, guided paths can help rays find effective light sources, improving sampling efficiency (directing samples toward strong indirect lighting is essentially path guiding; explicitly sampling the lights at each bounce is Next Event Estimation, NEE).
Interactive Rendering
RenderMan introduced a lightweight scene graph to meet the rapid scene management needs of interactive rendering. This scene graph is designed to be streamlined for efficient handling of geometry, materials, and light adjustments in complex scenes without adding burden to the rendering pipeline. Lightweight scene graph features include:
- Fast updates and incremental changes: The lightweight scene graph allows local modifications in the scene without rebuilding the entire scene data. This incremental update is particularly suitable for interactive rendering, immediately reflecting lighting or material adjustments.
- Optimized memory management: The lightweight scene graph stores only necessary scene data, making memory use more efficient. This is crucial for interactive rendering, as complex scenes may contain large amounts of geometry and texture data; reducing unnecessary data storage improves response speed.
- Easy integration with other systems: The lightweight scene graph can integrate with scene data from DCC tools, making data conversion smoother and reducing conversion burden between different software.
It also provides the Riley low-level rendering interface, designed for interactive rendering with direct access to and control of the renderer. Riley allows developers to manage key elements of the rendering process through a programming interface, including geometry, materials, lights, and camera settings. Riley features include:
- Efficient scene operations: Riley supports direct, low-latency scene operations such as adding/removing geometry, modifying lights and materials. This enables artists to quickly get feedback when editing in real-time, improving workflow efficiency.
- Flexible rendering control: Riley provides rich API interfaces allowing developers to flexibly control rendering parameters and effects. For example, developers can use Riley to control sampling density for lighting and reflections, dynamically adjusting render quality to meet different interactive needs.
- Collaboration with lightweight scene graph: Riley works closely with the lightweight scene graph; data from the scene graph can be directly passed to Riley for rendering, achieving efficient data flow and faster render response.
These technologies have precedents in two earlier Pixar research projects, Lpics and Lumiere. Lpics was an interactive relighting engine for previewing lighting changes on a fixed scene, while Lumiere focused on lighting management and setup.
Simplified Integrators
Using integrators that may not even need ray tracing can make image output extremely fast; this rendering approach can be used to quickly view albedo, depth, normals, etc. In particular, the Ambient Occlusion Integrator simulates the darkening effect in shadowed areas. The AO integrator generates simple shadow effects by detecting occlusion near object surfaces; this method has low computation cost but provides appropriate depth perception, often used for quick concept or test images.
Path Tracing
Integrated Techniques
Besides common optimization techniques (Russian roulette, NEE, etc.), RenderMan also has unique techniques such as setting per-object visibility (visible to camera, rays, shadow rays, etc.), disabling caustics, and setting initial bounce higher than 1.
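Russian roulette, one of the common techniques mentioned above, in a minimal unbiased form (the p_min clamp is an assumed guard against excessive variance, not a value from the paper):

```python
import random

def rr_continue(throughput, rng, p_min=0.05):
    """Russian roulette: probabilistically terminate low-contribution
    paths, boosting survivors by 1/p so the estimator stays unbiased."""
    p = max(p_min, min(1.0, throughput))
    if rng.random() >= p:
        return False, 0.0          # path terminated
    return True, throughput / p    # survivor reweighted
```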
Subsurface Scattering
Integrating dipole diffusion achieves somewhat stylized skin effects; Pixar later developed a brute-force path-traced random-walk Monte Carlo method. Together they provide very realistic subsurface scattering.
Volumes
Volumes are handled using delta tracking. Medium descriptions (such as extinction, the scattering phase function, etc.) are mostly read from OpenVDB or Field3D files.
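Delta (Woodcock) tracking can be sketched as rejection sampling against a majorant extinction; this toy version just estimates transmittance through a 1D medium (the function shapes are illustrative, not RenderMan's interface):

```python
import math, random

def delta_track(sigma_t_fn, sigma_max, d, rng):
    """Sample a free-flight distance through a heterogeneous medium by
    stepping at the majorant rate sigma_max and rejecting 'null'
    collisions. Returns True if the ray escapes past distance d."""
    t = 0.0
    while True:
        t -= math.log(1.0 - rng.random()) / sigma_max  # exponential step
        if t >= d:
            return True                                # escaped the medium
        if rng.random() < sigma_t_fn(t) / sigma_max:
            return False                               # real collision at t

def transmittance_estimate(sigma_t_fn, sigma_max, d, rng, n=50000):
    # Monte Carlo transmittance: fraction of tracked samples that escape.
    hits = sum(delta_track(sigma_t_fn, sigma_max, d, rng) for _ in range(n))
    return hits / n
```

For a homogeneous medium the estimate should approach the Beer-Lambert transmittance exp(-sigma_t * d), which makes it easy to sanity-check.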
Advanced Support
- Bidirectional path tracing: Advantages in indirect lighting and strong caustic scenes.
- Unified path sampling: A combination of bidirectional path tracing and progressive photon mapping. (The paper mentions that the RenderMan team believes photon mapping will remain part of state-of-the-art algorithms in the future, hence the support. Recall from pbrt how tricky photon mapping is to implement: one must know how to sample rays from the light sources while also handling importance, the adjoint of radiance.)
- Metropolis: Can find light paths that are otherwise hard to discover, but isn't very practical yet; hopefully future papers will explore it further. When that day comes, RenderMan will support Metropolis.
References:
- Paper: https://graphics.pixar.com/library/RendermanTog2018/paper.pdf
