Nv

All In A Day’s Work

I posted briefly yesterday about zink on the nvidia blob, but what was actually necessary to get this working?

In short, there were three problem areas that needed to be worked on:

  • GLX
  • image creation
  • memory allocation

Let’s go over some of these in a bit more depth.

GLX

GLX is the layer which connects GL to X. In the context of this post and zink, this can be thought of as the method by which zink negotiates how it’s going to draw to the screen.

The way this works, at a very high level and with only the barest concern for accuracy and depth of subject matter, is that Mesa grabs a bunch of properties and features from the driver (zink) and then generates a series of configurations that can be used for X window outputs. GLX then compares these configurations against the one requested by the driver. If a match is found, the driver gets to draw. Otherwise, an error like this one I found on StackOverflow is printed:

X Error of failed request:  BadMatch (invalid parameter attributes)
Major opcode of failed request:  128 (GLX)
Minor opcode of failed request:  34 ()
Serial number of failed request:  39
Current serial number in output stream:  40
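
Conceptually, the matching step amounts to something like this; a made-up sketch for illustration, not Mesa's real data structures or code:

#include <stdbool.h>

/* purely illustrative; the attribute names are invented for the example */
struct fb_config {
   int red_bits, green_bits, blue_bits, alpha_bits;
   int depth_bits, stencil_bits;
   bool double_buffer;
   /* ...many more attributes... */
};

static bool
configs_match(const struct fb_config *from_driver,
              const struct fb_config *requested)
{
   /* if none of the driver-generated configs lines up with the requested
    * one, the result is the BadMatch error above */
   return from_driver->red_bits == requested->red_bits &&
          from_driver->depth_bits == requested->depth_bits &&
          from_driver->double_buffer == requested->double_buffer;
          /* ...and so on for the rest of the attributes */
}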

I didn’t do much digging into this. Here’s a visual representation of the process by which, with the assistance of GLX Hall of Famer Adam Jackson, I resolved the issue:

[giphy.gif]

In short, he found this problem some time ago and disabled some of the less relevant checks for configuration matching. All I had to do was apply the patch.

#teamwork

Image Creation

A core philosophy of Vulkan is that it’s an explicit API. This means that in the case of images, the exact format, tiling mode, target, usage, etc. are specified at creation time. This also means that drivers have things they support and things they don’t support.

Historically, zink has just assumed that everything is supported and then either crashed horribly or succeeded because ANV is a pretty cool driver that makes a lot of stuff work when it shouldn’t.

To improve this situation, I’ve added two tiers of validation for image resource creation:

  • first, check that all VkImageUsageFlags needed for an image are supported by the format being used
  • second, run the exact image creation params by the Vulkan driver before creation to see if it’ll work (and abort() if it doesn’t)

Roughly, these correspond to vkGetPhysicalDeviceFormatProperties (which previously saw some, but not comprehensive, use for this type of validation) and vkGetPhysicalDeviceImageFormatProperties (which had never been used in zink before).
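
Very roughly, the combination of the two tiers looks something like this; a sketch only, with made-up helper names rather than zink’s actual code:

#include <vulkan/vulkan.h>
#include <stdbool.h>

/* illustrative sketch, not zink's real code */
static bool
image_params_supported(VkPhysicalDevice pdev, const VkImageCreateInfo *ici)
{
   /* tier 1: does the format support the needed usage for this tiling? */
   VkFormatProperties props;
   vkGetPhysicalDeviceFormatProperties(pdev, ici->format, &props);
   VkFormatFeatureFlags feats = ici->tiling == VK_IMAGE_TILING_LINEAR ?
                                props.linearTilingFeatures :
                                props.optimalTilingFeatures;
   if ((ici->usage & VK_IMAGE_USAGE_SAMPLED_BIT) &&
       !(feats & VK_FORMAT_FEATURE_SAMPLED_IMAGE_BIT))
      return false;
   if ((ici->usage & VK_IMAGE_USAGE_COLOR_ATTACHMENT_BIT) &&
       !(feats & VK_FORMAT_FEATURE_COLOR_ATTACHMENT_BIT))
      return false;
   /* ...and so on for the other usage bits... */

   /* tier 2: run the exact creation params by the driver */
   VkImageFormatProperties unused;
   return vkGetPhysicalDeviceImageFormatProperties(pdev, ici->format,
                                                   ici->imageType, ici->tiling,
                                                   ici->usage, ici->flags,
                                                   &unused) == VK_SUCCESS;
}

If either check fails, zink can bail out before handing the driver something it can’t create; the second tier is where the abort() mentioned above comes in.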

A major issue I found along the way is that there are many cases where zink needs linear image tiling, as this is the only tiling which allows reading the image back into a buffer. Zink had been assuming that if the format has any (e.g., optimal/non-linear) support for the image’s usage, linear is also fine. This is not the case, however, so now there are more checks which enforce a series of hoops to be jumped through when it’s necessary to do readback from images whose formats have no support at all for linear tiling.
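
The decision there boils down to something like this fragment (illustrative only, with made-up variable names; the real checks cover more feature bits):

/* does this format support the needed features with linear tiling? */
VkFormatProperties props;
vkGetPhysicalDeviceFormatProperties(pdev, fmt, &props);
if (props.linearTilingFeatures & needed_feature_bits) {
   /* a linear image works; readback can go through it directly */
} else {
   /* no linear support at all: take the slower, multi-step path instead */
}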

Memory Allocation

This is more or less the same as the issue that existed with image creation: zink tried to allocate memory from the wrong bucket (usually HOST_VISIBLE), and the Vulkan driver couldn’t handle that, so everything blew up.

Now this is handled by using device memory for almost all allocations, and everything works well enough.
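
Conceptually the fix looks something like the sketch below; the helper name is invented, but the mechanism (pick a memory type that the resource is actually allowed to use, defaulting to device-local rather than host-visible) is the same:

#include <vulkan/vulkan.h>
#include <stdint.h>

/* sketch only: find a memory type index that fits the resource and has
 * the properties we want (now usually DEVICE_LOCAL instead of HOST_VISIBLE) */
static uint32_t
find_memory_type(VkPhysicalDevice pdev, uint32_t type_bits,
                 VkMemoryPropertyFlags needed)
{
   VkPhysicalDeviceMemoryProperties mem;
   vkGetPhysicalDeviceMemoryProperties(pdev, &mem);
   for (uint32_t i = 0; i < mem.memoryTypeCount; i++) {
      /* only consider types the resource is allowed to use */
      if ((type_bits & (1u << i)) &&
          (mem.memoryTypes[i].propertyFlags & needed) == needed)
         return i;
   }
   return UINT32_MAX; /* no match; caller has to fall back or fail */
}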

Closing Thoughts

Nvidia GPUs should work pretty well from today on in my branch; I’m at a ~95% pass rate in piglit tests as of my last run, putting it solidly in second place on the Zink Preferred GPU List behind ANV, where I’m getting upwards of 97% of tests passing.

This didn’t make it into yesterday’s post, but everyone’s favorite benchmark also runs on zink+nvidia now:

[nvidia-heaven.png]

Here’s the caveat for all of the above: at present, zink on NV is unusably slow. The primary reason for this is that every frame that gets displayed has to be copied multiple times:

  • first, a GPU copy to a staging image that has HOST_VISIBLE (CPU-readable) memory
  • second, a CPU copy to another image which will then be used for displaying the frame

The first step in the process not only introduces more GPU work, it also forces an explicit fence, meaning this is effectively right back where zink in master/release is: forcing a complete stop of all work before each frame can be finished.

The second step is also pretty bad, as it delays queuing any further work by the driver until the entire frame has once again been copied.
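
In very rough pseudocode (variable names made up, submission details omitted), the current present path looks like this:

/* 1) GPU copy into a staging image backed by HOST_VISIBLE memory,
 *    then wait on a fence, which stalls everything else */
vkCmdCopyImage(cmdbuf, frame_image, VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL,
               staging_image, VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL,
               1, &region);
/* ...submit the command buffer... */
vkWaitForFences(dev, 1, &fence, VK_TRUE, UINT64_MAX);

/* 2) CPU copy from the mapped staging memory into the image that
 *    actually gets handed over for display */
void *src;
vkMapMemory(dev, staging_mem, 0, VK_WHOLE_SIZE, 0, &src);
memcpy(display_pixels, src, frame_size);
vkUnmapMemory(dev, staging_mem);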

There’s not really any way to improve the current situation, but that’s only temporary. In the “soon” future, we’ll be landing WSI support, which will greatly improve things for all the drivers that zink supports, but probably mostly this one.

Written on February 4, 2021