Optimizing a scene to increase its frame rate can be a difficult process to get right, due to the number of aspects that contribute to the final rendered image.
It can even be hard just to find the right settings to change for the result you want, as some of the newer render pipeline settings can seem hidden. This article explores ways to find bottlenecks in your Unity scene, the categories they fall into, and some common approaches to remove or reduce them and improve your scene's performance and frame rate.
The aspects considered in this article are:
- General Optimization
- Fill Rate Optimization
- Vertex Throughput Optimization
- Batch / Set Pass Call Optimization
When profiling a scene, it is best to first establish your baseline metrics, recording the current statistics for the costliest aspects of performance: frame time in milliseconds, the number of batches, and the number of SetPass calls. Keeping a camera in a stationary position throughout the optimization process will give you consistent measurements, allowing you to compare how your fixes have impacted performance.
Once you have established the worst aspects of the scene contributing to the performance degradation, it is recommended to fix these first before moving on to the smaller contributors. When fixed, you can iterate the process and find the next biggest contributors, continuing until you reach your performance target.
When establishing where the bottlenecks are, the main categories that impact scene rendering are:
- Fill Rate - the number of pixels being processed by the scene.
- Vertex Throughput - the number of vertices being processed by the scene.
- Batches / SetPass Calls - groups of data, i.e. meshes and materials, that get sent to the GPU for rendering.
Simple Fill Rate Profiling
Fragment shader operations are tied to fill rate as they contribute to the final pixel of an object. A relatively quick way to discover if your scene is fill rate limited is to reduce the display resolution. If the scene renders faster at a reduced resolution, this indicates that it may be limited by the fill rate on the GPU. This can be done in the Editor as well as on mobile platforms. For desktop testing in the Editor, a list of premade resolutions appears in the dropdown under ‘Free Aspect’ in the Game window. A custom resolution can also be added to match the native resolution of mobile devices.
For mobile devices, there are a couple of areas in the project settings that change the resolution of the project for a build. The first couple of settings are located under Edit->Project Settings->Player->Resolution Scaling.
You can set a fixed resolution for mobile devices by setting Resolution Scaling Mode to Fixed DPI and entering a custom DPI in the Target DPI field. The DPI of your mobile device can be set there, but you can also set it to a value lower than the device’s native DPI to render the scene at a lower resolution.
Another setting relating to Fixed DPI is located under Edit->Project Settings->Player->Quality-> Resolution Scaling Fixed DPI Factor. This setting is a multiplier for the previously mentioned Target DPI field.
A value of 1 renders at the Target DPI, while a value of 0.5 halves it. In the example, the Target DPI is set to 400 and the Resolution Scaling Fixed DPI Factor to 0.5, so the resulting resolution would be 200 DPI.
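The arithmetic behind Fixed DPI scaling can be sketched as follows. This is a conceptual model, not Unity code; the function name and the example device numbers are illustrative assumptions.

```python
# Hypothetical sketch of Fixed DPI resolution scaling: the rendered resolution
# is scaled by (Target DPI * Fixed DPI Factor) / native DPI, clamped to native.

def scaled_resolution(native_width, native_height, native_dpi,
                      target_dpi, fixed_dpi_factor=1.0):
    """Return the render resolution after Fixed DPI scaling (never upscaled)."""
    effective_dpi = target_dpi * fixed_dpi_factor
    scale = min(1.0, effective_dpi / native_dpi)
    return round(native_width * scale), round(native_height * scale)

# The article's example: Target DPI 400 with a 0.5 factor gives an effective 200 DPI.
# On a hypothetical 1080x2400 phone with a native 400 DPI, that halves each axis:
print(scaled_resolution(1080, 2400, native_dpi=400, target_dpi=400, fixed_dpi_factor=0.5))
# (540, 1200) -> one quarter of the original pixel count to fill
```

Note that halving the DPI quarters the pixel count, which is why this is such an effective test for fill rate limits.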
Simple Profiling of Vertex Throughput
When reducing the display resolution to test for fill rate, if the frame rate doesn't increase significantly, you may be vertex or SetPass call bound: drawing a scene at a reduced resolution still processes the same vertex count and issues the same number of draw calls.
Profiling Batches / SetPass Calls
Batching refers to the number of groups of objects that the CPU sends for processing / rendering to the GPU. A SetPass call is a change of state between the batches being sent to the GPU. When a new batch with different data required for rendering than the previous batch is sent to the GPU, the GPU needs to receive the new information on how to render it. For example, if you had a blue cube followed by a red cube in the scene, the GPU would need to change its current instructions on how to render a red cube to the instructions for rendering the blue cube. The more batches and SetPass calls required, the more work the CPU and GPU must do.
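The red cube / blue cube example above can be modeled as counting state changes between consecutive draws. This is a conceptual sketch, not Unity's renderer; it simply shows why sorting draws by material reduces SetPass calls.

```python
# Conceptual model: a SetPass call happens whenever the next draw needs a
# different render state (material) than the previous one.

def count_setpass_calls(draw_calls):
    """draw_calls: list of material names in submission order."""
    changes = 0
    current = None
    for material in draw_calls:
        if material != current:  # new state must be sent to the GPU
            changes += 1
            current = material
    return changes

unsorted_draws = ["red", "blue", "red", "blue"]
sorted_draws = sorted(unsorted_draws)       # ["blue", "blue", "red", "red"]

print(count_setpass_calls(unsorted_draws))  # 4 state changes
print(count_setpass_calls(sorted_draws))    # 2 state changes
```

The same four cubes cost half the SetPass calls once draws with identical materials are submitted together, which is the intuition behind the batching techniques covered later.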
Batches and SetPass calls can be profiled by adjusting the field of view as well as turning game objects on and off whilst playing in editor.
For more detailed statistics of which areas of your scene are costly to performance, Unity provides the profiler and the frame debugger.
Using Unity’s Profiler
This can be located under Window -> Analysis -> Profiler. The Profiler is a tool for capturing all the events executed in the game over several frames.
To the left of the profiler is a color-coded toggle of features in the scene that take up milliseconds. You can turn them on and off to see how it affects the overall performance.
At the bottom of the window is the Hierarchy view giving an overview of the total milliseconds used when executing each function in the different loops that Unity uses.
The Hierarchy view can be swapped out to display a Timeline view, showing the main functions executed with their millisecond timings, in order, within a single frame.
Using Unity’s Frame Debugger
This can be located under Window -> Analysis -> Frame Debugger.
The Frame Debugger shows the individual draw calls used by Unity to build up the final frame. These can be stepped through to show how certain images are drawn together and which components require more draw calls than others. The fewer draw calls the scene contains, the better.
Certain Post Processing features require more draw calls, so checking this section on the Frame Debugger early on may give a hint to where many of the resources are being allocated.
Using the Free Asset Resource Checker
The Resource Checker is a helpful tool that provides a summary of the resources (Textures, Materials, Meshes) contained within the scene along with their memory footprint. The Resource Checker tool can be found here: https://assetstore.unity.com/packages/tools/utilities/resource-checker-3224
Generally, the smaller a resource's memory footprint, the less time it takes to load. This means you can improve how quickly certain art assets are loaded for rendering by:
- Using smaller texture sizes or greater compression (e.g. a 4096x4096 reduced to 2048x2048)
- Lowering the polycount for meshes
- Stripping out data not used on the mesh (UV channels, lightmaps, vertex normals, vertex colours, flat shading)
- Optimizing objects to share the same material.
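The first point above is easy to quantify with back-of-the-envelope arithmetic. The sketch below assumes an uncompressed RGBA32 texture at 4 bytes per pixel; real compressed formats use considerably less, but the ratio holds.

```python
# Rough memory model for an uncompressed texture (RGBA32 = 4 bytes per pixel).

def texture_bytes(width, height, bytes_per_pixel=4):
    return width * height * bytes_per_pixel

full = texture_bytes(4096, 4096)
reduced = texture_bytes(2048, 2048)
print(full // (1024 * 1024), "MiB vs", reduced // (1024 * 1024), "MiB")
# 64 MiB vs 16 MiB: halving each dimension quarters the footprint
```

Halving each texture dimension quarters its memory footprint, which is why the 4096 to 2048 reduction mentioned above is such a cheap first win.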
The following sections will describe possible optimizations to implement relating to the rendering categories.
Graphics APIs (General)
It is worthwhile testing if the scene runs faster with different graphics APIs. The order in which these are set will affect the fallback options if the device doesn’t support a particular API. These settings can be found in Edit->Project Settings->Player->Other Settings.
To force the device to use a particular API, disable Auto Graphics API. Adding a new API may require a recompilation of the entire project to add it to the Graphics APIs list, so be prepared for this to occur. In this example, Vulkan has been added as well as OpenGLES3, with Vulkan taking priority as it is at the top of the list.
Fill Rate Optimization
As mentioned in the profiling section, lowering the resolution of a fill rate limited scene can help gain performance. This may prompt design decisions like rendering the game at half of the mobile’s native resolution or setting the resolution to a fixed size.
Custom render textures within a scene should also be considered. Reducing their size in a fill rate limited scene means fewer pixels for the renderer to read from and write to.
Reduce Buffer Sizes
Disabling the capture of a depth buffer in the render texture’s settings, along with choosing an appropriate color format (i.e. using a single channel color format like R8_UNorm for a greyscale render texture), can help reduce the computations involved with using a render target.
Particle systems with many transparent effects, such as fog and raindrops, can quickly add to the overdraw of each frame. Transparent overdraw builds up when many transparent pixels are rendered on top of one another, such as multiple layers of a fog material or looking through many panes of glass. For effects like fog sheets, reducing the number of particles spawned while increasing their opacity can achieve a similar effect with fewer pixels drawn more than once.
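The fog sheet trade-off can be sketched numerically. Assuming standard alpha blending, stacking n layers of opacity a gives a combined coverage of 1 - (1 - a)^n, so a couple of stronger layers can approximate many faint ones; the specific opacities below are illustrative.

```python
# Why fewer, more opaque layers approximate many faint ones under alpha
# blending: combined coverage of n layers at opacity a is 1 - (1 - a)^n.

def combined_coverage(alpha, layers):
    return 1.0 - (1.0 - alpha) ** layers

many_faint = combined_coverage(0.10, 8)  # 8 fog sheets at 10% opacity
few_strong = combined_coverage(0.31, 2)  # 2 fog sheets at 31% opacity
print(round(many_faint, 3), round(few_strong, 3))  # 0.57 0.524
```

Two sheets give roughly the same visual density as eight, but with a quarter of the overdrawn pixels.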
Reduce Post Processing
Disable Unused Buffers
Disabling the creation of Depth and Opaque textures rendered from the camera can reduce the time taken to render a frame. However, certain effects require these textures, such as post process effects and custom shaders that make use of _CameraDepthTexture and _CameraOpaqueTexture. If you are certain your scene doesn't use these textures, it may be worth disabling them and measuring the performance gained. They can be disabled in the pipeline asset, under the General heading.
The project’s color mode can be changed under Edit->Project Settings->Player->Other Settings->Rendering. The Color Space option will allow you to choose Linear or Gamma. Like the Graphics API section, there is also an option to support multiple color gamuts, located in the Color Gamut section.
Turning off HDR in the pipeline asset settings can also reduce the time taken to render each frame, as HDR increases the VRAM usage and requires a tone mapping process on top of the rendered image.
Full screen effects
Some post processing effects require expensive calculations at runtime, resulting in slower performance. If your project is not using HDR (as per the pipeline asset), the grading mode in the pipeline asset under Post-Processing can also be switched to Low Dynamic Range.
A quick way to discover the heavier computation effects is by looking at the Stats dropdown window with a stationary camera and turning off certain post process effects. Pay attention to the number of batches and set pass calls, as some effects will contribute more to the total calls. As previously mentioned, the cost of these effects can be also measured in draw calls through the frame debugger, located under Window->Analysis->Frame Debugger.
This displays the process used to build up each frame and will show the draw calls for some post process effects. Removing these effects or implementing workarounds to achieve a similar result with less computations will provide an increase to performance, such as applying a bloom effect to a HDRI sky image (in an image editing software) before importing it into the project.
Reducing Fragment Shader Complexity
If your scene uses a directional light with mixed / realtime lighting, lowering the quality of the real time shadows created can help reduce the time required to render each frame. This can be altered in the pipeline asset for your project.
Reducing the Max Distance value will shrink the area of effect for shadows being drawn and help reduce the vertex throughput and draw calls.
Lowering the Cascade Count will reduce the staged reductions in shadow map size, resulting in more pixelated shadows depending on the resolution of the shadow map, but will reduce the fill rate and SetPass calls.
The shadow map resolution can be changed with the parameter Shadow Resolution under the Lighting heading in the pipeline asset.
Cascade Count: 4
Cascade Count: 1
Enabling Soft Shadows will help alleviate the pixelation, however this adds to the time taken to render the frame.
Assuming your scene is using forward rendering on a mobile device, reducing the number of real time lights calculated per pixel can benefit performance by reducing vertex throughput and draw calls. Settings for these changes can be found in the pipeline asset under the heading Lighting.
Additional lights can be either Per Vertex or Per Pixel.
- Per Vertex will result in less computations but will have lower quality lights due to the data being interpolated.
- Per Pixel will increase the computation time but will result in higher quality lights. For additional lights viewed at a far distance, the cheaper Per Vertex option should be considered.
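The Per Vertex versus Per Pixel trade-off above comes down to how many times the lighting math runs. The sketch below is a rough cost model for illustration, not Unity's actual implementation; the vertex and pixel counts are made up.

```python
# Rough cost model: per-vertex lighting runs once per vertex, per-pixel
# lighting once per covered pixel, so low-coverage distant objects favor
# the cheaper vertex path.

def lighting_evaluations(vertex_count, pixels_covered, per_pixel):
    return pixels_covered if per_pixel else vertex_count

# A hypothetical 500-vertex prop filling 200x200 pixels up close...
print(lighting_evaluations(500, 200 * 200, per_pixel=True))  # 40000
# ...but only 30x30 pixels in the distance:
print(lighting_evaluations(500, 30 * 30, per_pixel=True))    # 900
print(lighting_evaluations(500, 30 * 30, per_pixel=False))   # 500
```

Up close, per-pixel lighting evaluates the light 80x more often than per-vertex; in the distance the two paths converge, which is why switching distant lights to Per Vertex costs little visually.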
Opting for a baked lighting approach will be more performant and can allow more lights as Unity does not include these in any further lighting calculations at runtime.
Anti-aliasing is used to reduce the jaggy edges of objects in the scene. There are many ways this can be implemented, however each come with their own drawbacks and added computations. Experimenting with each technique on the mobile device is recommended to see how it affects the look of the scene.
Unity recommends FXAA, which can be set on the camera component. Also compare how the scene looks without any anti-aliasing at all to see if you wish to include it in your project. Because anti-aliasing is a mostly fill rate limited technique, it is best to avoid the more computationally heavy types where possible.
The MSAA type of anti-aliasing works at the hardware level by rendering the borders of polygon edges multiple times at a subpixel level. This effect has several levels (2x, 4x, 8x), each with an increased rendering cost.
MSAA can be turned off in the pipeline asset, under the Quality heading. Test how the performance benefit compares with the visual quality to determine whether it is needed in your project.
Mobile Friendly Shaders
Reducing the instruction count of your shaders means fewer computations for the GPU to work through to render an effect. Changing the shader's precision mode to half for effects that don't require precise calculations, and deleting unused surface structure modules, will help lower the instruction count.
Moving certain calculations from the fragment program to the vertex program where applicable can help reduce the number of times the calculation is executed.
Lower precision in the Graph Settings.
Delete unused surface structure inputs.
Filter modes on a texture will impact performance depending on which option you choose. Unity provides 3 filtering options. Point filtering is the cheapest to calculate, followed by Bilinear then Trilinear:
- Point. Texture pixels become blocky up close.
- Bilinear. Texture samples are averaged.
- Trilinear. Texture samples are averaged and blended between mipmap levels.
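The relative cost of the three filter modes can be made concrete by counting texture fetches per lookup. These are the standard hardware sample counts (real cost also depends on cache behavior); the frame-level numbers below are illustrative.

```python
# Texture samples fetched per lookup for each filter mode.
SAMPLES_PER_LOOKUP = {
    "point": 1,      # nearest texel only
    "bilinear": 4,   # 2x2 texel average
    "trilinear": 8,  # 2x2 average on two mip levels, then blended
}

def fetches_for_frame(filter_mode, lookups):
    return SAMPLES_PER_LOOKUP[filter_mode] * lookups

# One texture lookup per pixel of a 1080p full-screen pass:
lookups = 1920 * 1080
print(fetches_for_frame("point", lookups))      # 2073600
print(fetches_for_frame("trilinear", lookups))  # 16588800
```

Trilinear filtering fetches 8x the texels of point filtering for the same pass, which is why filter mode choices matter most on fill rate limited mobile GPUs.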
The Aniso Level refers to the Anisotropic filtering quality of the texture. This is used for improving the look of textures at shallow angles, but also contributes to more work required for the GPU.
Enabling mip maps will increase the file size but stops the scene sampling full-size textures at distances where their detail isn't noticeable. This helps the GPU texture cache, as less data needs to be loaded in. Unity also provides a couple of mipmap filtering options to control their look when viewed from afar.
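The file size increase from mip maps is bounded and worth quantifying: each mip level is a quarter of the previous one, so the full chain adds roughly a third of extra memory. A sketch of that sum, assuming a square power-of-two texture at 4 bytes per pixel:

```python
# Memory cost of a full mip chain: each level is a quarter of the previous,
# so the geometric series converges to ~4/3 of the base level.

def mip_chain_bytes(size, bytes_per_pixel=4):
    total = 0
    while size >= 1:
        total += size * size * bytes_per_pixel
        size //= 2
    return total

base = 1024 * 1024 * 4       # base level of a 1024x1024 RGBA32 texture
with_mips = mip_chain_bytes(1024)
print(with_mips / base)      # ~1.333 -> about 33% extra memory
```

A third more memory in exchange for far fewer cache misses on distant surfaces is usually a good trade on mobile.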
Reducing the dimensions of the texture will decrease its quality but require less memory to be used. The dimensions of the texture can be reduced in the texture asset’s settings:
To reduce the number of textures in a project, certain maps can be packed into a single texture using their RGBA channels. For instance, a PBR material may have a roughness, metallic, ambient occlusion and height map required for its shading. Since these textures are greyscale images, they can each be packed into a single channel of 1 image to save a potential of 4 separate files. There are many ways to channel pack, such as using online tools, Photoshop, and Substance Designer.
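Channel packing itself is simple per-pixel bookkeeping. The sketch below models it with plain nested lists rather than real image files; tools such as Photoshop or Substance Designer perform the same operation per pixel.

```python
# Sketch of channel packing: four greyscale maps stored in the R, G, B and A
# channels of a single RGBA image.

def pack_rgba(roughness, metallic, occlusion, height):
    """Each input is a 2D list of 0-255 greyscale values of identical size."""
    return [
        [(r, m, o, h) for r, m, o, h in zip(*rows)]
        for rows in zip(roughness, metallic, occlusion, height)
    ]

# A tiny 1x2 "texture" for each map:
rough = [[255, 128]]
metal = [[0, 64]]
ao = [[200, 200]]
hgt = [[10, 20]]
print(pack_rgba(rough, metal, ao, hgt))
# [[(255, 0, 200, 10), (128, 64, 200, 20)]] -> one file instead of four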
Trim Sheets / Atlases
Trim sheets / atlases store many different textures in one texture for use by multiple objects. This helps reduce the project size as 4 assets each with their own texture (a total of 4 textures) can be reduced to using a single texture. With the PBR workflow this may result in trim sheets / atlases for the different channels, such as a base color trim sheet, the accompanying normal map trim sheet, and a mask map trim sheet for each of the required PBR channels. This technique is most beneficial at the design stage of props, as knowing in advance which materials and details are part of a trim sheet will influence the design of the created props.
This technique is beneficial if many assets in the scene share the same texture but may be considered a waste of memory if only a few assets end up utilizing a small section of the trim sheet, where the whole texture still gets loaded, but results in a lot of unused space.
Compression & Precision
Textures can be reduced in file size and memory footprint through changing their formats and compression settings.
Further options for customized compression formats can be found by clicking on the PC, Mac & Linux Standalone Settings tab. Changing the format to one that has a lower number of bits will reduce the file size but result in lower quality color data. Other formats include support for only one or a couple of the RGBA channels, such as R8 only storing a single Red channel with 8 bits – which may be suited for greyscale images.
Be careful when choosing between compression algorithms: some allocate bits unequally across color channels. DXT1, for example, stores 5 bits for the red channel, 6 bits for the green channel, and 5 bits for the blue channel.
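The effect of that 5:6:5 split can be sketched by quantizing a color and expanding it back to 8 bits. This models only the endpoint quantization step of DXT1 (the block-compression machinery is omitted):

```python
# Sketch of 5:6:5 endpoint quantization: red and blue keep 5 bits, green 6,
# so green survives compression with slightly more fidelity.

def quantize_565(r, g, b):
    """Quantize 8-bit RGB to 5:6:5 and expand back to 8-bit."""
    r5 = r >> 3  # keep top 5 bits
    g6 = g >> 2  # keep top 6 bits
    b5 = b >> 3
    # expand back by replicating the high bits into the low bits
    return (r5 << 3 | r5 >> 2, g6 << 2 | g6 >> 4, b5 << 3 | b5 >> 2)

print(quantize_565(130, 130, 130))  # (132, 130, 132): green is exact here
```

The extra green bit reflects the eye's higher sensitivity to green, but it also means a uniform grey can come out with a slight magenta or green cast after compression.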
Vertex Throughput Optimizations
Reduce Vertex Shader Complexity
Like the method used with reducing the fragment shader complexity, removing surface structure inputs that are not used in the vertex program can reduce the number of shader instructions.
Level of Detail (LOD)
Level of Detail (LOD) should only be applied to your gameobjects in specific cases. This optimization works by swapping out a higher polygon mesh with a lower polygon version of itself when viewed at a specified screen size percentage, meaning there are fewer polygons in total to draw when rendering the scene.
If your scene is not vertex bound (i.e. the number of polygons in the scene isn’t causing the main bottleneck), then this method is not effective for increasing performance. Because it swaps out a mesh with a lower quality version of itself, this adds to the SetPass calls.
Individual meshes can’t easily be batched together if they all have differing LOD levels, because there would need to be batches accounting for all the different combinations of the LOD, at each screen size percentage.
Unity's LOD Group component can, however, be useful for setting up culling distances to hide objects depending on screen size. Culling the mesh reduces the polygon count, the number of batches, and SetPass calls.
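LOD Group selection boils down to comparing an object's on-screen size against a descending list of thresholds. This is a conceptual sketch of that logic, not the Unity API; the threshold values are illustrative.

```python
# Conceptual LOD selection: pick the first LOD whose screen-size cutoff the
# object still exceeds; below the last cutoff the object is culled entirely.

def select_lod(screen_fraction, thresholds):
    """thresholds: descending screen-size cutoffs, e.g. [0.6, 0.3, 0.1].
    Returns the LOD index, or None when the object is culled."""
    for lod_index, cutoff in enumerate(thresholds):
        if screen_fraction >= cutoff:
            return lod_index
    return None  # smaller than every cutoff: culled

thresholds = [0.6, 0.3, 0.1]
print(select_lod(0.8, thresholds))   # 0 -> full-detail mesh
print(select_lod(0.4, thresholds))   # 1 -> reduced mesh
print(select_lod(0.05, thresholds))  # None -> culled, saving the draw entirely
```

The culled case is the one the paragraph above highlights: even when mesh swapping isn't worthwhile, the final cutoff removes the draw call altogether.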
Occlusion culling is a method used to hide meshes that are obstructed from view by another mesh. This process works well in small levels where areas are often occluded by large meshes, such as interiors with corridors. However, using this method on large scale environment scenes can cause more overhead than actual performance gains in some cases, where calculating which objects are to be occluded takes more time than the actual rendering of all the items in the scene.
Reducing the complexity of data that a mesh contains can help reduce the instructions required to render it.
Data associated with a mesh includes vertex positions, vertex normals, UV channels and vertex colors. If the meshes used in a game do not require some of this extra data, like vertex colors and extra UV channels, it is best to remove them since they may still get processed and interpolated in the shader.
A mesh that is smooth shaded will require less memory than one that is flat shaded. For a face to be shaded flat, all its vertices must share that face's normal vector, which means a corner position joined by several flat shaded faces must be duplicated into multiple vertices, one per normal. The same applies to UV data, where separate UV shells require multiple UV positions for the same vertex position.
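The classic example is a cube, which can be counted with a simple model. This is an illustration of the vertex duplication described above, not mesh-exporter code:

```python
# Why flat shading costs more memory: a flat-shaded face needs its own copy
# of each corner vertex (unique normals), while smooth shading shares one
# vertex between adjoining faces.

def gpu_vertex_count(faces, shared_vertices, corners_per_face=4, flat=False):
    if flat:
        return faces * corners_per_face  # every face duplicates its corners
    return shared_vertices               # one vertex reused by all faces

# A cube: 6 quad faces, 8 unique corner positions.
print(gpu_vertex_count(6, 8, flat=False))  # 8 vertices when smooth shaded
print(gpu_vertex_count(6, 8, flat=True))   # 24 vertices when flat shaded
```

A flat shaded cube triples its vertex data, and every split UV seam or hard edge on a production mesh causes the same kind of duplication.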
Batches / SetPass Call Optimizations
Types of Batching
Batching is a way to reduce the amount of unique data sent from the CPU to the GPU for rendering. It is more efficient for the CPU to send a single mesh comprised of smaller individual parts to the GPU than it is to send many different individual parts as separate meshes. Below are the types of batching Unity supports.
Static batching requires each mesh to share the same material and shader, and the gameobject must be stationary. This can be turned on in the Inspector view of the selected gameobject:
Dynamic batching is handled by Unity and requires each mesh to share the same material and shader. However, the combined mesh data must contain fewer than 900 vertex attributes, so this method is mainly used for very small objects such as quads in a particle system.
Instanced batching requires each game object to share the same mesh, material, and shader.
This can be turned on through enabling ‘Enable GPU Instancing' on the material with a compatible shader.
Instanced Indirect Batching
Instanced Indirect batching requires manual setup on the user’s end and a custom shader is required to make use of its functionality.
This method also requires the same mesh, material, and shader to be used on the objects in the scene.
Methods to Aid the Batching Process
Having assets that already conform to some of the prerequisites of these batching techniques can help make the process easier to set up.
Consider allowing multiple assets to share the same material where possible. This may mean developing assets that make use of atlases / trim sheets so one material can be shared across many unique meshes.
Using shaders that do not require unique mesh data and can be placed on many meshes will also help reduce the number of materials used.
Combining meshes into one single mesh (that share a single material) will also reduce the batches required for rendering.
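At its core, combining meshes that share one material is concatenating vertex lists while offsetting each mesh's triangle indices. The sketch below models that with tiny triangle meshes; Unity's Mesh.CombineMeshes does the equivalent on real mesh data.

```python
# Sketch of mesh combining: append each mesh's vertices and shift its
# triangle indices by the running vertex count, yielding one draw.

def combine_meshes(meshes):
    """meshes: list of (vertices, triangles) pairs sharing one material."""
    vertices, triangles = [], []
    for verts, tris in meshes:
        offset = len(vertices)
        vertices.extend(verts)
        triangles.extend(i + offset for i in tris)
    return vertices, triangles

tri_a = ([(0, 0), (1, 0), (1, 1)], [0, 1, 2])
tri_b = ([(2, 0), (3, 0), (3, 1)], [0, 1, 2])
verts, tris = combine_meshes([tri_a, tri_b])
print(len(verts), tris)  # 6 [0, 1, 2, 3, 4, 5] -> one batch instead of two
```

The trade-off is that the combined mesh is culled as one unit, so combine objects that are usually visible together.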
- Determine what aspect of your scene is causing the biggest cost to performance early on in your optimization cycle.
- Discover whether your scene is fill rate limited, vertex throughput limited, or batching / SetPass call limited. This will guide you to where your optimization resolutions should focus first.
- Working your way through the heaviest computational areas first will give the biggest boost to performance early on and prevent time and effort being spent in the wrong areas.
Hopefully these suggestions and possible resolutions will help your thinking process as you optimize your scene.