Buffered

Convenience

A lot has been made of VK_EXT_descriptor_buffer, also known as “sane descriptor handling”. It’s an extension that revolutionizes how descriptors can be managed, not only in brevity of code but also in performance.

That’s why ZINK_DESCRIPTORS=db is now the default wherever it’s supported.

Gains

But what does this gain zink (and other users), other than being completely undebuggable if anything were to break*?
* It won’t, trust me.

One nicety of descriptor buffers is performance. Swapping out descriptor templates for buffers removes a layer of indirection from the descriptor update path, which reduces CPU overhead by a small amount. And because different descriptor sets no longer need to be bound, GPU synchronization can be reduced as well.
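As a rough illustration (the function and variable names here are mine, not zink’s): with descriptor buffers, updating a descriptor is just asking the driver to write a few bytes into a host-mapped buffer at an offset you compute yourself, with no set objects or pools anywhere in the update path.

```c
#include <vulkan/vulkan.h>

/* A minimal sketch of the descriptor-buffer update path, assuming the
 * VK_EXT_descriptor_buffer entrypoints have already been loaded via
 * vkGetDeviceProcAddr(). Updating a UBO descriptor is one
 * vkGetDescriptorEXT() call that writes raw descriptor bytes into a
 * host-mapped VkBuffer; no VkDescriptorSet objects are involved. */
static void
update_ubo_descriptor(VkDevice dev,
                      const VkPhysicalDeviceDescriptorBufferPropertiesEXT *props,
                      uint8_t *mapped_db,          /* persistently mapped descriptor buffer */
                      VkDeviceSize binding_offset, /* from vkGetDescriptorSetLayoutBindingOffsetEXT() */
                      VkDeviceAddress ubo_addr,    /* from vkGetBufferDeviceAddress() */
                      VkDeviceSize ubo_size)
{
   VkDescriptorAddressInfoEXT addr = {
      .sType = VK_STRUCTURE_TYPE_DESCRIPTOR_ADDRESS_INFO_EXT,
      .address = ubo_addr,
      .range = ubo_size,
   };
   VkDescriptorGetInfoEXT info = {
      .sType = VK_STRUCTURE_TYPE_DESCRIPTOR_GET_INFO_EXT,
      .type = VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER,
      .data.pUniformBuffer = &addr,
   };
   /* one small write straight into the buffer; no set allocation,
    * no template, no pool */
   vkGetDescriptorEXT(dev, &info, props->uniformBufferDescriptorSize,
                      mapped_db + binding_offset);
}
```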

In zink terms, you’ll likely notice a small FPS increase in cases that were extremely CPU-bound (e.g., Tomb Raider).

Memory Too?

Yes, GPU memory utilization is also affected.

Historically, as we all know by now, zink uses six (four when constrained) descriptor sets:

  • Uniforms
  • UBOs
  • Samplers
  • SSBOs
  • Storage Images
  • Bindless

This optimizes access patterns for each descriptor type as well as update frequency. In terms of descriptor sets, it means a separate descriptor pool per descriptor layout so that sets can be bucket-allocated to further reduce overhead.
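For reference, a rough sketch of what that pool-per-layout bucketing might look like; the helper names and the bucket size are hypothetical, not zink’s actual code:

```c
#include <vulkan/vulkan.h>

#define BUCKET_SIZE 100 /* hypothetical sets-per-pool count */

/* One pool per descriptor set layout: every set in the pool shares the
 * same layout, so the pool's sizes can match the layout exactly and
 * allocation never fragments. */
static VkDescriptorPool
create_pool_for_layout(VkDevice dev,
                       const VkDescriptorPoolSize *sizes, /* mirrors the layout's bindings */
                       uint32_t num_sizes)
{
   VkDescriptorPoolCreateInfo ci = {
      .sType = VK_STRUCTURE_TYPE_DESCRIPTOR_POOL_CREATE_INFO,
      .maxSets = BUCKET_SIZE,
      .poolSizeCount = num_sizes,
      .pPoolSizes = sizes,
   };
   VkDescriptorPool pool;
   vkCreateDescriptorPool(dev, &ci, NULL, &pool);
   return pool;
}

/* Bucket allocation: grab a whole batch of identical sets in one call,
 * then hand them out one at a time afterwards. */
static void
alloc_bucket(VkDevice dev, VkDescriptorPool pool,
             VkDescriptorSetLayout layout,
             VkDescriptorSet out_sets[BUCKET_SIZE])
{
   VkDescriptorSetLayout layouts[BUCKET_SIZE];
   for (unsigned i = 0; i < BUCKET_SIZE; i++)
      layouts[i] = layout;
   VkDescriptorSetAllocateInfo ai = {
      .sType = VK_STRUCTURE_TYPE_DESCRIPTOR_SET_ALLOCATE_INFO,
      .descriptorPool = pool,
      .descriptorSetCount = BUCKET_SIZE,
      .pSetLayouts = layouts,
   };
   vkAllocateDescriptorSets(dev, &ai, out_sets);
}
```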

Initially, when I implemented descriptor buffer, I kept this same setup. Each descriptor type had its own descriptor buffer, and the buffers were hardcoded to fit N descriptors of the given type, where N was the maximum number of descriptors I could update per cmdbuf using drawoverhead. Each cmdbuf was associated with a set of descriptor buffers, and it worked great.
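Something along these lines, with hypothetical names and a made-up N (the real cap was measured with drawoverhead):

```c
#include <vulkan/vulkan.h>

#define N 65536 /* hypothetical; stand-in for the drawoverhead-derived cap */

/* Sketch of the initial scheme (names are mine): one descriptor buffer
 * per descriptor type, per cmdbuf, each sized for the worst case up
 * front. Descriptor sizes are implementation-dependent and come from
 * VkPhysicalDeviceDescriptorBufferPropertiesEXT. */
struct cmdbuf_descriptor_buffers {
   VkBuffer bufs[4]; /* UBO, sampler, SSBO, storage image */
   void    *maps[4]; /* persistently mapped write pointers */
};

static void
worst_case_sizes(const VkPhysicalDeviceDescriptorBufferPropertiesEXT *p,
                 VkDeviceSize out[4])
{
   out[0] = (VkDeviceSize)N * p->uniformBufferDescriptorSize;
   out[1] = (VkDeviceSize)N * p->combinedImageSamplerDescriptorSize;
   out[2] = (VkDeviceSize)N * p->storageBufferDescriptorSize;
   out[3] = (VkDeviceSize)N * p->storageImageDescriptorSize;
   /* multiply by the number of in-flight cmdbufs and this is where
    * all the VRAM went */
}
```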

But it also used a lot of VRAM, comparable even to the then-default template mode.

During my latest round of descriptor buffer refactors, which added handling for bindless textures, I had a realization: I was doing descriptor buffers all wrong.

Instead of having all these different buffers per cmdbuf, why wouldn’t I just have a single buffer? Plus a static one for bindless, of course.

Imagine pointlessly allocating five times the memory required.

So now I had two descriptor buffers:

  • normal descriptors (per cmdbuf)
  • bindless descriptors (per GL context)
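Binding-wise, that looks roughly like this (a sketch with my own names; the buffer addresses come from vkGetBufferDeviceAddress()):

```c
#include <vulkan/vulkan.h>

/* Sketch: bind exactly two descriptor buffers for the whole cmdbuf.
 * All non-bindless descriptor sets suballocate offsets out of buffer 0;
 * buffer 1 is the static per-context bindless buffer. Assumes the EXT
 * entrypoints were loaded with vkGetDeviceProcAddr(). */
static void
bind_descriptor_buffers(VkCommandBuffer cmdbuf,
                        VkDeviceAddress normal_db_addr,
                        VkDeviceAddress bindless_db_addr)
{
   const VkBufferUsageFlags usage =
      VK_BUFFER_USAGE_RESOURCE_DESCRIPTOR_BUFFER_BIT_EXT |
      VK_BUFFER_USAGE_SAMPLER_DESCRIPTOR_BUFFER_BIT_EXT;
   VkDescriptorBufferBindingInfoEXT infos[2] = {
      {
         .sType = VK_STRUCTURE_TYPE_DESCRIPTOR_BUFFER_BINDING_INFO_EXT,
         .address = normal_db_addr,
         .usage = usage,
      },
      {
         .sType = VK_STRUCTURE_TYPE_DESCRIPTOR_BUFFER_BINDING_INFO_EXT,
         .address = bindless_db_addr,
         .usage = usage,
      },
   };
   vkCmdBindDescriptorBuffersEXT(cmdbuf, 2, infos);
   /* each set then points into buffer 0 or 1 at draw time via
    * vkCmdSetDescriptorBufferOffsetsEXT() */
}
```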

In Tomb Raider, this ended up being about a 6% savings in peak VRAM utilization (1445MiB -> 1362MiB). Not bad, and the code is now simpler too.

But what if I stopped hardcoding the descriptor buffer size and instead used a sliding scale so that only the (rough) amount of memory needed was allocated? Some minor hacking here and there, and peak VRAM utilization was cut even more (1362MiB -> 1297MiB).
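The sliding scale amounts to demand-driven sizing; a toy version (entirely hypothetical, not the actual zink logic) might look like:

```c
#include <stdbool.h>
#include <vulkan/vulkan.h>

/* Toy sliding-scale allocator: the descriptor buffer starts small and
 * doubles whenever a descriptor write would overflow it, so a cmdbuf
 * only ever holds roughly as much descriptor memory as it has used. */
struct db_state {
   VkDeviceSize size;   /* current buffer allocation */
   VkDeviceSize offset; /* bump cursor, reset when the cmdbuf resets */
};

/* Returns true if 'bytes' fits in the current buffer; otherwise bumps
 * 'size' to the new target and returns false so the caller can
 * reallocate the VkBuffer (and rewrite any live descriptors). */
static bool
db_reserve(struct db_state *db, VkDeviceSize bytes)
{
   if (db->offset + bytes <= db->size)
      return true;
   VkDeviceSize new_size = db->size ? db->size * 2 : 4096;
   while (db->offset + bytes > new_size)
      new_size *= 2;
   db->size = new_size;
   return false;
}
```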

Now it’s a little over a 10% reduction in peak VRAM utilization.

There’s still a ways to go with VRAM utilization in zink, considering RadeonSI peaks at 1221MiB for the same benchmark, but a 6% gap is much more reasonable than a 16% one.

Future

Blog posts about Vulkan descriptor models aren’t going away.

I wish they were, but they just aren’t.

Stay tuned for more of these posts as well as other exciting developments on the road to Mesa 23.1.

Written on February 23, 2023