Assignment 3: optimizing the rendering

Unity Development 2

The need for optimization

Introduction

In Assignment II, we built the scene and the growth animation of our trees. For this assignment, we chose to improve the speed and quality of tree generation. On the quality side, the previous assignment lacked proper UV mapping, which is now generated together with the procedural mesh. On the performance side, tree mesh generation was sub-par: it was intended to run in real time, but it sometimes fell below 1 fps (frame per second), while we aimed to stay above 30 fps.

After some profiling and analysis, we found that rendering the final number of triangles (~1.6 million) was not the problem; generating them in C# was. The main idea was therefore to reduce the triangle count without impacting the overall mesh quality, split the generation work among several threads, and avoid generating everything at the same time. While doing so, we ran into some constraints imposed by the design of the Unity engine:

  • There is no direct access to mesh data (vertices, triangles, and UVs). Every time we update the mesh, we therefore need to copy its data from our C# script to the Unity mesh object.
  • Worker threads cannot call any Unity method. This forces us to allocate working memory per object instead of per thread, because everything has to be calculated first and only then are the Unity meshes set up, all together, on the main thread.
  • The Unity manual states that meshes with intensive updates should be marked as dynamic. While the meshes were being updated we saw no gain from doing so, and after the growth animation had ended it caused a big loss (about half the fps).

With all these points in mind, we reworked the original script: we merged the generation of all trees into a single script instead of running one per tree, implemented the Level of Detail feature to reduce the triangle count, split the work among several threads, and made other minor tweaks to the C# code.

Generating the UV Mapping procedurally

UV Mapping

Before generating the UV mapping, we chose a tileable texture to repeat over the branches. As the mesh is made of circles of varying diameter placed along the "root" lines, the mapping follows the same structure: the circle is mapped to the texture width, while the height is mapped according to the ratio between the circle diameter and the length of the segment.
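The mapping described above can be sketched as follows. This is a language-agnostic illustration in Python, not the C# script used in the project. The function name `ring_uvs`, the `sides` parameter, and in particular the rule "advance v by segment length divided by circumference" are assumptions made for the sketch; the report only states that the height depends on the ratio between diameter and segment length.

```python
import math

def ring_uvs(diameters, lengths, sides=8):
    """Return one list of (u, v) pairs per ring of a branch.

    diameters: ring diameter at each control point (n values)
    lengths:   segment length between consecutive rings (n - 1 values)
    sides:     number of quads around the branch (hypothetical parameter)
    """
    uvs = []
    v = 0.0
    for i, d in enumerate(diameters):
        if i > 0:
            # Assumption: advance v by length / circumference, using the
            # average diameter of the two rings, so the tile is not stretched.
            circumference = math.pi * 0.5 * (diameters[i - 1] + d)
            if circumference > 0.0:
                v += lengths[i - 1] / circumference
        # sides + 1 vertices per ring: the seam vertex is duplicated so that
        # u can run from 0.0 to 1.0 across the full texture width.
        uvs.append([(k / sides, v) for k in range(sides + 1)])
    return uvs
```

Because the texture is tileable, v is allowed to grow past 1.0 along the branch; thinner rings advance v faster, which keeps the texture scale roughly constant relative to the branch width.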

The figures below show the "before" and "after" results, with the texture combined with the detail map and normal map in the shader.

Cutting down the triangle count

Level of Detail

In our implementation of tree generation, the span length of each child branch decreases exponentially, and so does the length of its sub-nodes. Beyond a certain branching level, the detail of a sub-node may therefore not be perceptible from a given distance. With this idea in mind, the Level of Detail feature defines a minimum length below which a detail is considered imperceptible; anything shorter is merged into a bigger sub-node. The same idea can be applied to the angle between two sub-nodes. The only constraint arises when two sub-nodes have a child branch starting from their junction: if you merge them, you have to reposition the child branch accordingly. In simple words, we take a tree with n control points and re-sample it with m < n points, respecting the requested minimum length of its segments.
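A minimal sketch of this re-sampling for a single branch, written in Python for illustration only. The greedy strategy, the function name `resample`, and the "nearest kept point" rule for repositioning child branches are assumptions; the report does not specify how the merge or the repositioning was actually implemented, and the angle criterion is left out.

```python
import math

def resample(points, min_length):
    """Greedy re-sampling of one branch.

    points:     list of (x, y, z) control points (n of them)
    min_length: shortest segment that is still considered perceptible
    Returns (kept, remap): `kept` holds the indices of the m <= n points that
    survive, and remap[i] is the position in `kept` that original point i
    was merged into (used to reposition child branches attached at i).
    """
    kept = [0]                                   # always keep the branch root
    for i in range(1, len(points) - 1):
        # Keep a point only if it is far enough from the last kept point.
        if math.dist(points[kept[-1]], points[i]) >= min_length:
            kept.append(i)
    if len(points) > 1:
        kept.append(len(points) - 1)             # always keep the branch tip
    # Assumption: a child branch moves to the nearest surviving point.
    remap = [
        min(range(len(kept)), key=lambda k: math.dist(p, points[kept[k]]))
        for p in points
    ]
    return kept, remap
```

Note that in this sketch the last segment may still be shorter than `min_length`, since the tip is always kept to preserve the overall length of the branch.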

The three figures below show how the detail can be controlled. The left image has all possible detail and uses 1,584,000 triangles; the center image has a large decrease in detail and uses 375,480 triangles; finally, the right image has the minimum possible detail and uses 359,856 triangles. In our tests we use a setting between the first and second images (644,211 triangles).

                Maximum Detail   Our Testing Detail   Very Low Detail   Lowest Detail
Triangles       1,584,000        644,211              375,480           359,856
% of maximum    100.00%          40.67%               23.70%            22.72%

Splitting the calculation work

Multi-Threading

As pointed out in the introduction, the Unity engine imposes some constraints on using threads and handling mesh data, and the C# code was adapted to work within them. The first consequence was doubling the memory and copying the data every time the mesh was updated. This is far from optimal: with direct access to the mesh data, one could get good performance gains when working with procedural meshes and animations. Nevertheless, it was "good" enough to make the generation real-time. A probably better way to generate the mesh data would be a custom tessellation shader, to which we would pass the skeletons of the trees as parameters and let the shader generate the triangles. This approach would avoid the memory copy, but the underlying problem remains much the same: how to get low-level access in Unity. A first look at the Unity documentation reveals that the most common custom tessellation shaders use built-in tessellation functions, which are not suitable for our purpose.

The idea behind the multi-threading is straightforward. Specific implementation details aside, we allocate working memory for each tree and let the threads process the data in their respective space. We chose per-object threading as the level of granularity, because a single tree alone is not enough to be a bottleneck, and a finer granularity (i.e., one thread per tree branch) would not bring further gains.
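The data flow described above (private working memory per tree, computation on worker threads, engine calls only on the main thread) can be sketched as follows. This is an illustration in Python, not the project's C# code: `TreeJob`, `upload_to_engine`, and `update_all` are hypothetical names, the "mesh generation" is a dummy stand-in, and CPython threads would not give the CPU-bound speedup that real threads give in C#. Only the structure is the point.

```python
from concurrent.futures import ThreadPoolExecutor

class TreeJob:
    """Working memory for one tree; nothing in here touches the engine."""

    def __init__(self, tree_id, segments):
        self.tree_id = tree_id
        self.segments = segments
        self.vertices = []          # private buffer, written by one worker only

    def generate(self):
        # Stand-in for the real mesh generation: one dummy vertex per segment.
        self.vertices = [(self.tree_id, s, 0.0) for s in range(self.segments)]
        return self

def upload_to_engine(job, engine_meshes):
    # Stand-in for copying the buffers into the engine's mesh object.
    # In this pattern it is only ever called from the main thread.
    engine_meshes[job.tree_id] = list(job.vertices)

def update_all(jobs, engine_meshes, workers=4):
    # One task per tree (per-object granularity).
    with ThreadPoolExecutor(max_workers=workers) as pool:
        finished = list(pool.map(TreeJob.generate, jobs))  # waits for all trees
    # Back on the main thread: set up all meshes together.
    for job in finished:
        upload_to_engine(job, engine_meshes)
```

Because each job owns its buffers, the workers need no locks; the cost is the extra copy in `upload_to_engine`, which mirrors the doubled memory mentioned earlier.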

                        Peak Time ST   Peak Time MT   Peak Time ST + LOD   Peak Time MT + LOD
AVG Peak Time (ms)      975            375            476                  168
Unity Overhead (ms)     215            169            100                  74
% of Peak ST            100.00%        38.46%         48.82%               17.23%

ST vs MT speedup        2.60x
no LOD vs LOD speedup   2.05x
Combined speedup        5.80x

The test was run on a quad-core processor, and the measurements were taken with the Unity profiler. ST stands for Single-Thread, MT for Multi-Thread, LOD for Level of Detail, and Unity Overhead for the Unity editor and rendering overhead. The rendered scene had 24 different trees, all running their growth animation at the same time.
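For reference, the speedup and percentage rows can be recomputed from the average peak times. The short Python snippet below only restates the arithmetic; the dictionary keys and function names are chosen for this illustration.

```python
# Average peak times in milliseconds, copied from the table.
peak_ms = {"ST": 975, "MT": 375, "ST+LOD": 476, "MT+LOD": 168}

def speedup(baseline, variant):
    """How many times faster `variant` is than `baseline`."""
    return peak_ms[baseline] / peak_ms[variant]

def percent_of(baseline, variant):
    """Peak time of `variant` as a percentage of `baseline`."""
    return 100.0 * peak_ms[variant] / peak_ms[baseline]

print(f"ST vs MT:          {speedup('ST', 'MT'):.2f}x")
print(f"no LOD vs LOD:     {speedup('ST', 'ST+LOD'):.2f}x")
print(f"combined:          {speedup('ST', 'MT+LOD'):.2f}x")
print(f"MT+LOD % of ST:    {percent_of('ST', 'MT+LOD'):.2f}%")
```

The combined figure is measured directly (975 / 168) rather than obtained by multiplying the two individual speedups, which is why it is not equal to 2.60 x 2.05.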

Last tricks and some thoughts about the results

Conclusion

One last trick, since not everything is solved with raw power, was to spread the tree growth animations over time so that the trees do not all reach their peak calculations at the same moment. With this, the peak time went down to 30 ms, which is a 32.5x speedup compared to the single-threaded implementation without LOD. With everything together, we are able to achieve real-time performance without noticeable harm to the quality of the scene.
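One simple way to spread the growth over time is to give each tree its own start offset. The Python sketch below assumes evenly spaced offsets and a linear growth curve; both choices, as well as the function names and the durations used in the example, are hypothetical, since the report does not say how the animations were actually scheduled.

```python
def staggered_starts(tree_count, spread_seconds):
    """Evenly spaced start times in [0, spread_seconds)."""
    step = spread_seconds / tree_count
    return [i * step for i in range(tree_count)]

def growth(t, start, duration):
    """Normalised growth in [0, 1] for a tree that starts growing at `start`."""
    return min(1.0, max(0.0, (t - start) / duration))

def active_trees(t, starts, duration):
    """Number of trees whose mesh still has to be regenerated at time t."""
    return sum(1 for s in starts if s <= t < s + duration)

# 24 trees, as in the test scene; the 12 s spread and 2 s duration are made up.
starts = staggered_starts(24, 12.0)
print(active_trees(5.0, starts, 2.0), "trees growing at t = 5 s (staggered)")
print(active_trees(1.0, [0.0] * 24, 2.0), "trees growing at t = 1 s (all at once)")
```

With offsets like these, only a few trees are regenerating their meshes in any given frame, which lowers the per-frame peak without reducing the total amount of work.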

Our general conclusion about the Unity engine is that it is an interesting platform that tries to make it easy to develop and glue things together without having to code in a low-level fashion. As nothing comes for free, the current implementation hides almost everything from the developer, which in some cases, like ours, prevents "custom ideas" from being implemented efficiently. Even so, we cannot say it is bad at all, as everything we did ran in the range of good enough and up.