GPU Streaming Part 3

While experimenting with some LOD ideas, I accidentally came up with a potentially viable GPU Streaming solution. Conceptually, it is based on a mixture of the old Merged Mesh approach (without streaming) and the Structured Buffer approach (see Part 1 and Part 2 for more information). I refer to it as the MMSB Hybrid for simplicity.

How it works

The idea is to reserve space for, and pre-generate, a relatively small number of vertices in VRAM, then dynamically populate the Structured Buffer with the visible points’ data. Finally, the requested number of sprites is rendered from the pre-generated pool, with the relevant portions of the Buffer applied to them. Think of it as dynamic, 3D Displacement Mapping :)
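
To make the data flow concrete, here is a minimal CPU-side sketch of the idea in Python. It is an illustration only, not the plugin's actual code: in practice the lookup would run in a vertex shader, and the names `PointData`, `build_vertex_pool` and `render` are hypothetical. It also skips camera-facing billboarding and simply expands each sprite in the XY plane.

```python
from dataclasses import dataclass

# Corner offsets of a unit quad: one sprite is made of 4 vertices.
CORNERS = [(-0.5, -0.5), (0.5, -0.5), (0.5, 0.5), (-0.5, 0.5)]


@dataclass
class PointData:
    """One entry of the (emulated) Structured Buffer. Layout is an assumption."""
    x: float
    y: float
    z: float
    size: float


def build_vertex_pool(max_sprites):
    """Pre-generate the static pool once.

    Each vertex carries no position of its own, only
    (sprite index, corner index).
    """
    return [(s, c) for s in range(max_sprites) for c in range(4)]


def render(pool, buffer, count):
    """Emulate one draw call for the first `count` sprites of the pool."""
    out = []
    for sprite, corner in pool[: count * 4]:
        p = buffer[sprite]  # the "Structured Buffer" fetch
        dx, dy = CORNERS[corner]
        # Displace the generic vertex to its final location.
        out.append((p.x + dx * p.size, p.y + dy * p.size, p.z))
    return out


# The pool is allocated once and never re-uploaded.
pool = build_vertex_pool(max_sprites=4)

# Only this small list changes from frame to frame.
visible = [PointData(1.0, 2.0, 3.0, 2.0), PointData(5.0, 5.0, 5.0, 1.0)]

verts = render(pool, visible, count=len(visible))
```

The pool never changes after creation, so the only per-frame upload is a single copy of the visible points' data, and the draw call is limited to however many sprites are currently needed.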

This allowed me to avoid the bottleneck of Hardware Instancing while retaining the ability to transfer only a single copy of the relevant data: a win-win.

Comparison of the average performance cost of rendering 1,000,000 points using different techniques. In most situations, the number of visible points should stay within 1.3 - 1.7 million.

Results obtained with i7-5960X@4.4GHz and NVIDIA GeForce GTX 980Ti

As you can see from the graph, the hybrid solution provides a significant performance increase over the Hardware Instancing + Structured Buffer combo, and is just 44% slower than the no-streaming solution, which is not bad considering the amount of flexibility gained in the process.

Drawbacks

As with most things, unfortunately, this solution comes with a trade-off: because it has to pre-allocate a portion of its data, the VRAM usage is going to be a little higher.

The amount required is approximately 80 MB per 1,000,000 visible points, so it should be well within the acceptable range for pretty much all modern graphics cards. Furthermore, considering that in most cases the cloud will display under 2,000,000 points, the total VRAM consumption is expected to be less than 185 MB.
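
As a rough back-of-the-envelope check, the pre-allocated pool can be estimated directly from the 80 MB per million figure. The helper name `pool_vram_mb` is hypothetical, and the sketch covers the pool only. Any gap between its result and the 185 MB total presumably comes from other allocations such as the Structured Buffer itself, which is an assumption on my part.

```python
# Approximate cost quoted in this post, in MB per 1,000,000 visible points.
MB_PER_MILLION_POINTS = 80


def pool_vram_mb(visible_points):
    """Estimate the VRAM taken by the pre-generated pool, in MB."""
    return visible_points * MB_PER_MILLION_POINTS / 1_000_000


typical_high = pool_vram_mb(1_700_000)  # upper end of the typical range
worst_case = pool_vram_mb(2_000_000)    # the "under 2,000,000 points" case
```

For 2,000,000 points this gives 160 MB for the pool alone, which fits under the 185 MB total quoted above.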

Conclusion

As the Hybrid provides perfectly acceptable performance, I plan to postpone investigating other forms of streaming, at least for the time being.

As soon as all stability tests are completed, the solution will be uploaded to the v0.6 early access branch.