Sparse

Buffering

The great thing about tomorrow is that it never comes.

Let’s talk about sparse buffers.

What is a sparse buffer? A sparse buffer is a buffer that is not required to be contiguously or fully backed by memory. This means that a buffer larger than the GPU’s available memory can be created, with only some parts of it resident at any given time. Because the backing memory may be non-resident, a sparse buffer can never be mapped directly; any host read/write must instead go through a staging buffer.

In a gallium-based driver, provided that an effective implementation for staging buffers exists, sparse buffer implementation goes almost exclusively through the pipe_context::resource_commit hook. This hook manages residency of a sparse resource’s backing memory: it is passed the range whose residency should change and a boolean that switches that range on or off.
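To make the "range plus on/off switch" idea concrete, here is a minimal toy model of residency tracking. It is only a sketch: the `toy_*` names, the 64 KiB page size, and the 64-page limit are all assumptions made for illustration, and nothing here is gallium or Vulkan API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TOY_PAGE_SIZE 65536u /* assumed commit granularity for this toy */
#define TOY_MAX_PAGES 64u    /* assumed upper bound to keep the model tiny */

/* Hypothetical stand-in for a sparse buffer: it only tracks which pages are backed. */
struct toy_sparse_buffer {
   uint64_t size;                /* virtual size in bytes */
   bool resident[TOY_MAX_PAGES]; /* true if the page currently has backing memory */
};

static void
toy_sparse_init(struct toy_sparse_buffer *buf, uint64_t size)
{
   assert(size <= (uint64_t)TOY_PAGE_SIZE * TOY_MAX_PAGES);
   buf->size = size;
   /* a freshly created sparse buffer starts with nothing resident */
   memset(buf->resident, 0, sizeof(buf->resident));
}

/* Same shape as a commit hook: a byte range plus an on/off switch. */
static bool
toy_sparse_commit(struct toy_sparse_buffer *buf, uint64_t offset, uint64_t size, bool commit)
{
   /* reject empty or unaligned ranges */
   if (size == 0 || offset % TOY_PAGE_SIZE || size % TOY_PAGE_SIZE)
      return false;
   /* reject ranges that fall outside the buffer */
   if (offset > buf->size || size > buf->size - offset)
      return false;

   uint64_t first = offset / TOY_PAGE_SIZE;
   uint64_t end = (offset + size) / TOY_PAGE_SIZE;
   for (uint64_t page = first; page < end; page++)
      buf->resident[page] = commit;
   return true;
}

/* Count how many pages currently have backing memory. */
static unsigned
toy_sparse_resident_pages(const struct toy_sparse_buffer *buf)
{
   unsigned count = 0;
   for (unsigned i = 0; i < TOY_MAX_PAGES; i++)
      count += buf->resident[i] ? 1 : 0;
   return count;
}
```

Committing three pages and then uncommitting one of them leaves two resident pages, while the buffer’s virtual size never changes. A real driver replaces the boolean array with actual memory binds.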

In zink(-wip), the hook looks like this:

static bool
zink_resource_commit(struct pipe_context *pctx, struct pipe_resource *pres, unsigned level, struct pipe_box *box, bool commit)
{
   struct zink_context *ctx = zink_context(pctx);
   struct zink_resource *res = zink_resource(pres);
   struct zink_screen *screen = zink_screen(pctx->screen);

   /* if any current usage exists, flush the queue */
   if (zink_batch_usage_matches(&res->obj->reads, ctx->curr_batch) ||
       zink_batch_usage_matches(&res->obj->writes, ctx->curr_batch))
      zink_flush_queue(ctx);

   /* top-level bind info: a single buffer bind and no semaphores */
   VkBindSparseInfo sparse;
   sparse.sType = VK_STRUCTURE_TYPE_BIND_SPARSE_INFO;
   sparse.pNext = NULL;
   sparse.waitSemaphoreCount = 0;
   sparse.bufferBindCount = 1;
   sparse.imageOpaqueBindCount = 0;
   sparse.imageBindCount = 0;
   sparse.signalSemaphoreCount = 0;

   /* the buffer whose backing memory is being changed */
   VkSparseBufferMemoryBindInfo sparse_bind;
   sparse_bind.buffer = res->obj->buffer;
   sparse_bind.bindCount = 1;
   sparse.pBufferBinds = &sparse_bind;

   /* the range itself: bind memory to commit, or VK_NULL_HANDLE to uncommit */
   VkSparseMemoryBind mem_bind;
   mem_bind.resourceOffset = box->x;
   mem_bind.size = box->width;
   mem_bind.memory = commit ? res->obj->mem : VK_NULL_HANDLE;
   mem_bind.memoryOffset = box->x;
   mem_bind.flags = 0;
   sparse_bind.pBinds = &mem_bind;
   /* use the thread queue when the flush queue has been initialized */
   VkQueue queue = util_queue_is_initialized(&ctx->batch.flush_queue) ? ctx->batch.thread_queue : ctx->batch.queue;

   VkResult ret = vkQueueBindSparse(queue, 1, &sparse, VK_NULL_HANDLE);
   if (!zink_screen_handle_vkresult(screen, ret)) {
      check_device_lost(ctx);
      return false;
   }
   return true;
}

Naturally there’s a need to enjoy the verbosity of Vulkan structs here, but there are two key takeaways.

The first is that this implementation is likely suboptimal; it should be making better use of semaphores to avoid having to flush the queue if the resource has current-batch usage. That’s complex to implement, however, so I took the same shortcut that RadeonSI does here.

The second is that this is just copying the pipe_box struct to the VkSparseMemoryBind struct. This works because the backing resource is allocated with a 1:1 range mapping, so the box’s offset and width can be used directly as both the resource range and the memory range.
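The 1:1 mapping can be shown in isolation with a small sketch. The structs below are simplified stand-ins invented for this example (`toy_box` for pipe_box, `toy_memory_bind` for VkSparseMemoryBind), not the real definitions, and a memory handle of 0 plays the role of VK_NULL_HANDLE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for the buffer-relevant part of pipe_box (hypothetical). */
struct toy_box {
   int32_t x;     /* byte offset into the buffer */
   int32_t width; /* byte length of the range */
};

/* Simplified stand-in for VkSparseMemoryBind (hypothetical). */
struct toy_memory_bind {
   uint64_t resource_offset; /* where the range starts in the buffer */
   uint64_t size;            /* length of the range */
   uint64_t memory;          /* backing memory handle, 0 means "unbind" */
   uint64_t memory_offset;   /* where the range starts in the backing memory */
};

/* Translate a box into a bind, assuming backing memory mirrors the buffer 1:1. */
static struct toy_memory_bind
toy_box_to_bind(const struct toy_box *box, uint64_t backing_memory, bool commit)
{
   assert(box->x >= 0 && box->width > 0);

   struct toy_memory_bind bind;
   bind.resource_offset = (uint64_t)box->x;
   bind.size = (uint64_t)box->width;
   /* committing binds the backing memory, uncommitting binds nothing */
   bind.memory = commit ? backing_memory : 0;
   /* 1:1 mapping: the offset in memory equals the offset in the buffer */
   bind.memory_offset = bind.resource_offset;
   return bind;
}
```

If the backing allocation were smaller than the buffer or packed differently, `memory_offset` would need its own bookkeeping instead of reusing the buffer offset.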

Other than that, the only changes required for this implementation were to add a bunch of checks for the sparse flag on resources during map/unmap to force staging buffers and to use device-local memory instead of host-visible.
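Those two decisions can be sketched roughly as follows. This is a hypothetical illustration, not zink’s actual map code: the `toy_*` names, the enums, and the fallback rules for non-sparse resources are assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_map_path { TOY_MAP_DIRECT, TOY_MAP_STAGING };
enum toy_heap { TOY_HEAP_HOST_VISIBLE, TOY_HEAP_DEVICE_LOCAL };

/* Hypothetical resource description; only the flags needed for the decision. */
struct toy_resource {
   bool sparse;       /* created with the sparse flag */
   bool host_visible; /* backing memory can be mapped by the CPU */
};

/* Pick a memory heap at allocation time: sparse resources use device-local memory. */
static enum toy_heap
toy_choose_heap(bool sparse, bool wants_cpu_access)
{
   if (sparse)
      return TOY_HEAP_DEVICE_LOCAL;
   return wants_cpu_access ? TOY_HEAP_HOST_VISIBLE : TOY_HEAP_DEVICE_LOCAL;
}

/* Pick how a map request is serviced: sparse resources always go through staging. */
static enum toy_map_path
toy_choose_map_path(const struct toy_resource *res)
{
   assert(res);
   if (res->sparse)
      return TOY_MAP_STAGING;
   return res->host_visible ? TOY_MAP_DIRECT : TOY_MAP_STAGING;
}
```

The point of the sketch is that the sparse check comes first in both functions, so no other property of the resource can route a sparse buffer to a direct map.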

Sometimes zink can be simple!

Written on April 7, 2021