A tiny game built from scratch in the style of Minecraft
CPU-based Path Tracer with many features, including:
Volumetric Rendering; Multiple Importance Sampling; BVH Acceleration; Multi-Threading;
Generated metaballs in real time using the marching cubes algorithm
A city built using a shape grammar, which changes with every build
An L-System built in Maya that operates on a variety of grammar rules.
A showcase of agent behaviours (group and solo)
C++, OpenGL
Monte Carlo Path tracing is a rendering technique that aims to represent global illumination as accurately as possible with respect to reality. When combined with physically accurate models of surfaces, accurate models of light sources, and optically-correct cameras, path tracing can produce still images that are indistinguishable from photographs.
C++, Maya API, MEL
Group Project
A real-time simulation of particle-based rigid body dynamics based on the paper "Unified Particle Physics for Real-Time Applications" by Macklin, Muller, Chentanez, and Kim.
C/C++, CUDA
GPU Flocking Simulation written in CUDA
Tested on: Windows 10, i7-7700HQ @ 2.8GHz, 32GB RAM, GTX 1070 (laptop GPU) 8074MB
C++, OpenGL, GLSL
Group Project
A tiny game built from scratch in the style of Minecraft.
Features I Implemented: Efficient Terrain Generation (interleaved VBOs and mesh hull drawing), a general material property mapping scheme, weather, clouds, a day and night cycle, animated textures, and more.
Javascript, WebGL
A procedural multi-layer dungeon generator that generates levels from a dynamically generated Voronoi-like graph that has been heavily modified by various filters.
Realistic fog, crumbling pathways, and terrain are added over the basic level layout to give it a unique mysterious foreign world aesthetic.
Javascript, WebGL
Simulation of 3D Metaballs, which are organic-looking n-dimensional objects
C++, OpenGL
Monte Carlo Path tracing is a rendering technique that aims to represent global illumination as accurately as possible with respect to reality. The algorithm does this by approximating the integral over all the illuminance arriving at a point in the scene. A portion of this illuminance will go towards the viewpoint camera, and this portion is determined by a surface reflectance function (BRDF). This integration procedure is repeated for every pixel in the output image. When combined with physically accurate models of surfaces, accurate models of light sources, and optically-correct cameras, path tracing can produce still images that are indistinguishable from photographs.
Path tracing naturally simulates many effects that have to be added as special cases for other methods (such as ray tracing or scanline rendering); these effects include soft shadows, depth of field, motion blur, caustics, ambient occlusion, and indirect lighting.
Due to its accuracy and unbiased nature, path tracing is used to generate reference images when testing the quality of other rendering algorithms. In order to get high quality images from path tracing, a large number of rays must be traced to avoid visible noisy artifacts.
Mario; 5k triangles;
With BVH at 1 sample/pixel: 4594 milliseconds;
Without BVH at 1 sample/pixel: 454041 milliseconds
Render time sped up 98.83× (a 9,883% speed increase)
BVH Construction Time For Wahoo.obj: 156 milliseconds
If an implicit surface can be defined by an equation equating to zero, it can define a Signed Distance Field (SDF); I used SDFs during intersection tests to determine if we're inside, outside, or actually intersecting the object.
Thin convex camera lenses can be mimicked by having rays shot from the camera pass through some point sampled on the lens, then using this point to alter the original ray based on the properties of the lens. The technique is just an approximation of the actual optics and produces better renders with more samples/pixel.
Numerous material models, including Transmissive and Microfacet models, were implemented. Each material is defined by properties that give the path tracer a great deal of versatility and variety for composing scenes.
The differences among the 3 major integrators are readily apparent when comparing the renders:
Naive Integrator: Renders scenes based on the Light Transport Equation, but implemented in a very naive sense, without any multiple importance sampling, Russian roulette termination, etc.
Direct Lighting Integrator: Renders the scene without any global illumination by simply ignoring secondary bounces of the ray once it hits something in the scene.
Full Lighting Integrator: Renders the scene based upon the Light Transport Equation, but takes into account Multiple Importance Sampling, Russian Roulette termination, and conditionals for various types of light sources, and is implemented in an iterative rather than recursive manner. These features let it avoid the blown-out images that can occur with the naive integrator, keep global illumination, terminate rays early with very little impact on render quality, etc.
A volumetric integrator also exists; it is essentially the full lighting integrator, with the addition of ray marching through the scene to account for the influence of a participating medium.
Cornell Box, 2 lights; WRONG!; No MIS;
Cornell Box, 2 lights; Correct!; With MIS;
The color at every point in the scene is correctly scaled via Multiple Importance Sampling, which weights the contribution of each incoming ray at that point with respect to the actual energy the light rays bring in. The weighting changes with the scene because it depends on the sampling biases we introduce to make the image converge faster. These biases include light importance sampling, BSDF importance sampling, cosine-weighted sampling, etc. Instead of relying on any one of these biased forms of sampling, we get the best result across all kinds of scenes by taking a weighted average of them.
Balance Heuristic
Power Heuristic
They are used to weight the energy contributions of rays to assist with Multiple Importance Sampling. The Power and Balance Heuristics are just different rules of thumb for the weighting.
It is a technique employed to cut short the ray bounce depth once the energy being returned from the ray falls below a randomised threshold. The loss in energy is compensated by scaling up the existing energy value by an appropriate amount. When averaged over a bunch of samples, the loss in accuracy of the render is barely noticeable if at all.
The Path Tracer handles various primitives such as implicit surfaces, triangles for meshes, and the basic ones like cubes, spheres, discs, and squareplanes.
The sampler written for the path tracer is responsible for creating samples on the various primitives listed above. The samples generated for the shapes start from a uniform stratified distribution on a square plane, which is then warped to resemble whatever shape we desire. This also allows biases such as cosine-weighted sampling to be introduced in a controllable manner, from a common starting point.
Efficient Terrain Rendering:
Rendering only the hull of the terrain;
Using interleaved VBOs, i.e. one VBO that holds all the data (position, color, normals, textures, etc.);
Dividing the terrain into Chunks and generating the VBO for every chunk only once;
Generalized Scheme for Texture Mapping, Normal Mapping, Specular Mapping, and Animated Textures
Weather (Snow and Rain) and Cloud:
Snow and Rain weather effects were achieved via billboards that have a texture with transparency;
Volumetric Clouds, rendered in a way similar to the ground;
Day and Night Cycle:
Changing Sky Color;
Changing Light Color;
Changing Light Direction;
Changing Light Intensity;
Tiny Dynamic Compass
Javascript, WebGL
Metaballs are organic-looking n-dimensional objects that are really popular for creating smooth and dynamic shapes. Each metaball is defined as a function in n dimensions (in this case, 3). A threshold, also known as the iso-level, is chosen to define the iso-surface formed from the combined influence of all the metaballs, which defines the resulting shape.
Since the resulting iso-surface is defined by a density or influence function, the naive approach would be to evaluate that function for every point in space, where the resolution of the space defines the smoothness of the surface. This, as one might expect, is ridiculously expensive computationally, but there are better techniques we can leverage:
To ease the computation involved, we voxelize the space up to a desired resolution, and then for every voxel we query the density function and use the resulting values to draw a 3D polygonal approximation of the surface. There are 256 (2^8) configurations for a voxel sampled at its 8 corners; however, only 15 distinct configurations are needed to represent all 256, thanks to symmetry and rotations, and each of these configurations can be represented with up to 5 triangles per voxel.
The polygonal configurations above can be made a more accurate representation of the isosurface even at low resolutions by using the density values at the corners to find new points on the bounding box of the voxel to define the lines that make the triangles of the isosurface.
A 2D example showing how varying densities can affect the edge (isoline) that is formed.
The eight corners of the voxel can be represented by an 8-bit number, where a set bit means the isovalue at that corner is above the isolevel (or below it, depending on the chosen convention).
The twelve edges of the voxel can be represented as a 12-bit number, where 1 means that the isosurface intersects with this edge.
EDGE_TABLE: This table returns a 12-bit number that represents the edges intersected by the isosurface. For each intersected edge, we can compute the linearly interpolated vertex position on the edge according to the isovalue at each end corner of the edge.
TRI_TABLE: This table stores the triangle indices corresponding to the vertex positions on the edges identified above. Every 16 elements in the table represents a possible polygonizing configuration. Within each configuration, every three consecutive elements represents the indices of a triangle that should be created from the edges above.
Having implemented all of the above, our metaballs would still look quite low-poly at lower resolutions. An easy fix is to calculate the normal at every point that makes up the polygonal configuration of the voxel. This gives us per-vertex normals, which, when passed through a fragment shader, result in a much smoother surface than before (it is the equivalent of switching from per-face normals to per-vertex normals on a mesh).
Calculating the normal at an arbitrary point involves finding the gradient at that point. This can be expensive, so a good approximation is to sample neighboring points along the n dimensions and use their differences to approximate the slope.
A 1D representation would be:
For 3D density functions the normal would be represented as:
$$ \vec{n} = \begin{bmatrix} f(x+dx,y,z) - f(x-dx,y,z)\\ f(x,y+dy,z) - f(x,y-dy,z)\\ f(x,y,z+dz) - f(x,y,z-dz) \end{bmatrix} $$
Defined as the sum of the influences of all the metaballs within its vicinity, which have an inverse-square fall-off in terms of intensity
$$Isovalue = \sum_{i=1}^{NumMetaballs} (metaballs[i].radius)^2/dist^2$$
The Marching Cubes algorithm can achieve an effect similar to ray marching for rendering implicit surfaces, but in addition to the rendered image, you also retain geometry.
Common Use Cases:
In MRI scanning, to generate geometry for the scans.
To generate complex terrain for games. The resulting geometric information can readily support collision and other physics calculations in game engines. For example, bounding boxes can be computed from it to construct an acceleration data structure for collisions.
The 'Interesting Level Generator' is a procedural multi-layer dungeon generator that generates levels from a dynamically generated Voronoi-like graph that has been heavily modified by various filters.
Realistic fog, crumbling pathways, and terrain are added over the basic level layout to give it a unique mysterious foreign world aesthetic.
Procedural 3D game level ( multiple stacks of 2D maps ).
Implicit surfaces to create terrain on the resulting map floors
Custom fog shader
The specifications below will be implemented if I have enough time in the final week
Collision handling so player can’t walk through geometry.
2D maps:
Using sampling and rejection to layout the slabs in a given space.
Then use a fake Voronoi generation technique to create a graph. The fake Voronoi technique consists of first starting with 3 connected nodes; then, for every new node added to the graph, find its 2 closest neighbours in the existing graph and form edges to them.
We can improve this technique a bit by sorting the positions of the slabs along one axis. This makes the connections look more like a Voronoi pattern.
This graph is highly connected so we randomly remove connections.
The graph can end up with intersecting edges in a few edge cases, so we carry out 2D line intersection tests and remove any intersections if they exist.
Walkways between slabs:
Because of the graph we created above, we have edge and node data; in other words, we know which points to connect. So between any 2 points we could lay out planes to connect them, but this is boring.
Instead, we can use instancing to place many tiny cubes along the line segment and then randomly remove cubes to make it look like a crumbling walkway.
We also need to give the walkway some width, so we take the cross product of the line segment's direction with the y axis (up) to get a horizontal orthonormal direction for the width. We then add instanced cubes not just on the line segment but across that width.
3D Level:
We can create multiple 2D maps at different heights.
Interlevel Connecting Paths:
This is a similar problem to the one we solved in the “Walkways between slabs” section, but now it's in 3D.
For every layer we pick a random node/slab as the starting point of our path between layers.
For the end point of that line segment we search through the nodes in the layer above for rooms that are beyond a certain distance from the randomly picked starting node (so paths don’t go straight up and there is more complexity in connections), and form a list of these “toNodes”.
Pick a random “toNode”, and using the random starting node we have a 3D line segment.
Create a similar instancing cubes setup for these paths as we did with the walkways.
Remove random lines, and also carry out 3D intersection tests and remove any intersecting paths, if they exist.
Path shifting for Paths:
We want the paths connecting layers to seem more organic, and to avoid janky-looking paths that start at the center of one cell and end at the center of another (a problem, since a player could never reach the higher layer; they would be stuck underneath the “toNode”).
So we shift paths to the edges of the cells they are connecting, simply offsetting by the width and length in the correct direction. Shifting by both the width and the length gives the most organic-looking paths.
Fog:
Created in the shader, with global control of density, color, and an on/off switch.
A good approximation of fog fall-off is: $$e^{-(fogDensity^2 \times distance^2)}$$
Fog also appears to have different densities at the periphery of our vision, so we need to account for rimColor.
Resource: http://in2gpu.com/2014/07/22/create-fog-shader/
Terrain:
Terrain was created in the shader.
Create an elevation map using smoothed 2D noise.
Use the elevation map to deform vertices in the shader.
Create a moisture map the same way, with a different seed value (or a different noise function).
Use the float values from the elevation and moisture maps as UV values to look up colors from gradients.
Grid based Acceleration:
Takes too much memory, and so was never used for anything.
kD tree acceleration Structure for Collision Handling
Character with basic character controls
Collision handling for the character
Trampolines to fill up empty and negative space
Terrain: http://www.redblobgames.com/maps/terrain-from-noise/
Fog: http://in2gpu.com/2014/07/22/create-fog-shader/
"Art Of Collisions" is a real-time simulation of particle-based rigid body dynamics based on the paper "Unified Particle Physics for Real-Time Applications" by Macklin, Muller, Chentanez, and Kim.
Real-time simulation of three-dimensional objects has been difficult in recent years within the computer graphics industry for two reasons: 1. having to create individual solvers for each object type, and 2. simulating physical phenomena by computing forces. Such force-based methods focus mainly on accuracy of result rather than robustness and efficiency. We address some common problems with traditional particle-based methods and describe a parallel constraint solver based on position-based dynamics, a method that is efficient enough for real-time applications, and hence popular in computer games and interactive applications for its simplicity and stability.
A unified particle solver was too broad in scope to implement well in the month and a half we had. Thus, we implemented a good chunk of the framework in a way that is easily extendable, but it currently works only for rigid bodies.
In recent years, simulating objects accurately in real time has usually meant simulating them in isolation, with individual solvers for each object type. Particle-based solvers can easily be unified, as they operate on a common base object type.
Efficient real-time simulation of particle-based rigid body dynamics.
Built using a parallel constraint solver extended from position-based dynamics.
Represent rigid bodies and maintain particle configurations using shape matching constraints.
Unconditionally stable simulation based on semi-implicit methods to solve constraints.
Maya Interface has been made such that any arbitrary forces can be appended to the simulation along with multiple meshes, with a high degree of user control.
A framework exists for adding more constraints and extending the system to a unified particle solver.
Intersection tests via ray marching were used to create particle filled representations of meshes.
Sphere instancing used to represent resultant particles.
Parallel constraint solver extended from position-based dynamics:
Shape Matching -- Represent rigid bodies and maintain particle configurations
Distance Constraint -- maintain Particle Separation/closeness
Semi-implicit Euler Integration for solving applied forces.
Callback functionality piggybacks on Maya's animation controller.
Redefining flags as output variables allowed for dynamic re-evaluation of the compute function, and hence updating of the simulation.
“Unified Particle Physics for Real-Time Applications”
By: Miles Macklin, Matthias Muller, Nuttapong Chentanez, Tae-Yong Kim
This project served as an introduction to CUDA kernels, how to use them, and how to analyze their performance. This was done by implementing the Reynolds Boids algorithm in parallel on the GPU.
Boids is a crowd simulation algorithm developed by Craig Reynolds in 1986, modeled after the flocking behaviors exhibited by birds and fish.
The most basic version of the algorithm operates on three rules:
Cohesion: Boids will try to move towards the perceived center of mass of other boids around them.
Alignment: Boids will tend to steer toward the perceived average velocity of other boids around them.
Separation: Boids will try to keep some amount of distance between them and other boids.
The naive approach for computing the new positions and velocities for the boids is to check each boid against every other boid and apply the three rules described above. This is extremely slow with a time complexity of O(N^2).
A simple way to implement this in CUDA is to have one thread for each boid. Each thread would loop over the entire position and velocity buffers (skipping itself) and compute a new velocity for that boid. Then, another CUDA kernel would apply the position update using the new velocity data.
It is quite obvious that a spatial data structure can greatly improve performance for this algorithm. If we reduce the number of boids each boid needs to check against, we decrease the amount of work each thread needs to do. Because the three rules only apply within a certain radius, organizing the boids on a uniformly spaced grid allows us to perform a very efficient neighbor search, checking only a very limited number of boids. We choose our uniform grid cell width to be twice the maximum search radius, so that, at most, a boid may be influenced by other boids in the 8 cells directly around it.
To create the "grid" we create an additional CUDA kernel which fills a buffer with respective grid cell indices. A parallel sorting algorithm, thrust::sort_by_key, is then used to sort boid indices by their corresponding grid cell indices. Furthermore, we store the "start" and "end" indices for each grid cell in two new buffers by checking for transitions in the sorted grid cell buffer.
Now, instead of checking against all other boids, each thread in our velocity update can determine its boid's grid index from the boid's position; this lets us determine the 8 neighboring cells in 3D space and only apply the rules for boids whose indices fall between the "start" and "end" of a neighboring cell.
We can make the uniform grid spatial structure better by making memory accesses more coherent. The uniform scattered grid implementation does unnecessary hopping and "pointer"-chasing. Instead of using an additional sorted buffer to find the index into our particle data, we can shuffle the particle data so that it is also sorted by grid cell index. This allows the GPU to load and cache our position and velocity data for the boids, resulting in fewer global memory calls. This small change yields significant performance improvements.
As would be expected, all 3 approaches suffer slowdowns as the number of boids increases. However, the line graph above clearly shows drastic differences between them: the Uniform Scattered Grid greatly improves performance, and the Uniform Coherent Grid surpasses even that. Intuitively, this makes sense. The Uniform Scattered Grid greatly reduces the number of boids that have to be checked against each boid, and making the boid data memory-coherent lets us access memory faster through better caching and fewer calls to global memory, which makes the Uniform Coherent Grid perform that much better.
Looking at the bar graph, we can see that the block size really only affects one implementation, the Naive one, beginning at a block size of 32. This might be because at 32 or fewer threads per block, there is only one warp (a group of 32 threads) per block. These smaller blocks mean that we need a larger number of blocks; with every warp in its own block, we lose the performance benefits of shared memory within a block and instead need to allocate memory for each of our very many blocks.
It is curious that the block size doesn't affect the Uniform Scattered and Uniform Coherent Grid implementations much. At a block size of 8, however, the data table shows a drop in framerate for both of them, although it isn't as drastic as for the Naive implementation. My guess is that even smaller block sizes would yield worse and worse framerates for all the implementations, for the reasons described above.
As can be seen in the table above, increasing the number of neighboring grid cells checked, while changing the cell width accordingly, results in a lower framerate. This is because we have essentially just increased the overhead of maintaining the grid without a significant reduction in the number of boids we check. If we manipulate the ratio of cell width to the number of neighboring grid cells that have to be checked, we could possibly find cases where checking 27 neighboring grid cells results in a better framerate.
If we zoom in to the first few data points, we notice a sudden jump that almost doubles our framerate as the number of boids increases. At first glance this seems kind of absurd, but it is more likely that in some situations the number of boids does not map well onto the memory of the underlying architecture, which leads to a frustrating reduction in framerate at neighboring counts.
While trying to code this project, I ran into a ridiculous bug that made my Uniform Scattered and Uniform Coherent Grids perform significantly worse than my Naive implementation. I may have inadvertently coded up the worst-case scenario for both of these methods. The issue arose because I had divided my boid's position along all three axes by the grid resolution, and then used that position to calculate an index into the grid cells. Because of that division, the index I obtained for every boid was the same, i.e. they were all in the same grid cell. This meant I was basically running my Naive implementation inside one grid cell, along with the overhead of creating and maintaining extra buffers for the grid structure and doing a bunch of computation to determine which grid cells to look at. A silly mistake, but one with a mildly interesting insight.