Capped

I Hate Pipe Caps

Those of you who saw me at XDC will recall that I talked about my hatred for Gallium’s pipe caps.

I still hate them.

But mostly I hate a specific type of pipe cap: the pipe cap that gates performance.

My Driver’s Perf: Why Is It Bad?

In a nutshell:

pipe-cap-meme.png

It happens again and again across drivers. Some game/app/test is inexplicably running with a framerate so low it could win a world limbo championship. Initial debugging (perf) initially reveals nothing. GPU profiling reveals nothing. There’s nothing utilizing a massive amount of resources. What’s going on?

The most recent iteration of this Yet Another Missing Pipe Cap bug was seen with DOOM2016. Even as early as the title screen, framerate would struggle to hit 30fps for a while before rocketing up to whatever maximum it could reach. In-game was the same, soaring to the 200fps maximum after enough time standing stationary without moving the camera. As soon as the camera shifted however, back to 20fps we went.

Manually adding timing instrumentation to the driver revealed baffling results: a buffer readback was repeatedly triggering a staging copy, which then fenced for batch completion and stalled rendering.

Backtrace:

#0  zink_buffer_map (pctx=0x7f8fb4018800, pres=0x7f8fe50a4600, level=0, usage=1610612737, box=0x5f0f590, transfer=0x7f8fe50a45e8) at ../src/gallium/drivers/zink/zink_resource.c:1906
#1  0x00007f8fe9d626e5 in tc_buffer_map (_pipe=0x7f8fb415cef0, resource=0x7f8fe50a4600, level=0, usage=1610612737, box=0x5f0f590, transfer=0x7f8fe50a45e8) at ../src/gallium/auxiliary/util/u_threaded_context.c:2623
#2  0x00007f8fe93d7f59 in pipe_buffer_map_range (pipe=0x7f8fb415cef0, buffer=0x7f8fe50a4600, offset=0, length=31457280, access=1, transfer=0x7f8fe50a45e8) at ../src/gallium/auxiliary/util/u_inlines.h:400
#3  0x00007f8fe93d9495 in _mesa_bufferobj_map_range (ctx=0x7f8fb4191710, offset=0, length=31457280, access=1, obj=0x7f8fe50a4520, index=MAP_INTERNAL) at ../src/mesa/main/bufferobj.c:499
#4  0x00007f8fe963e40f in _mesa_validate_pbo_compressed_teximage (ctx=0x7f8fb4191710, dimensions=3, imageSize=16384, pixels=0xa04000, packing=0x7f8fb41c2820, funcName=0x7f8fea64c3e2 "glCompressedTexSubImage") at ../src/mesa/main/pbo.c:456
#5  0x00007f8fe96c2c8c in _mesa_store_compressed_texsubimage (ctx=0x7f8fb4191710, dims=3, texImage=0x7f8fe50bda20, xoffset=16000, yoffset=256, zoffset=1, width=128, height=128, depth=1, format=35919, imageSize=16384, data=0xa04000) at ../src/mesa/main/texstore.c:1357
#6  0x00007f8fe972862a in st_CompressedTexSubImage (ctx=0x7f8fb4191710, dims=3, texImage=0x7f8fe50bda20, x=16000, y=256, z=1, w=128, h=128, d=1, format=35919, imageSize=16384, data=0xa04000) at ../src/mesa/state_tracker/st_cb_texture.c:2390
#7  0x00007f8fe96aa0a5 in compressed_texture_sub_image (ctx=0x7f8fb4191710, dims=3, texObj=0x7f8fe50bd5b0, texImage=0x7f8fe50bda20, target=35866, level=0, xoffset=16000, yoffset=256, zoffset=1, width=128, height=128, depth=1, format=35919, imageSize=16384, data=0xa04000) at ../src/mesa/main/teximage.c:5862
#8  0x00007f8fe96aa553 in compressed_tex_sub_image (dim=3, target=35866, textureOrIndex=0, level=0, xoffset=16000, yoffset=256, zoffset=1, width=128, height=128, depth=1, format=35919, imageSize=16384, data=0xa04000, mode=TEX_MODE_CURRENT_ERROR, caller=0x7f8fea648740 "glCompressedTexSubImage3D") at ../src/mesa/main/teximage.c:6000
#9  0x00007f8fe96aab53 in _mesa_CompressedTexSubImage3D (target=35866, level=0, xoffset=16000, yoffset=256, zoffset=1, width=128, height=128, depth=1, format=35919, imageSize=16384, data=0xa04000) at ../src/mesa/main/teximage.c:6195
#10 0x00007f8fe93435b7 in _mesa_unmarshal_CompressedTexSubImage3D (ctx=0x7f8fb4191710, cmd=0x7f8fb4197990) at src/mapi/glapi/gen/marshal_generated1.c:5199
#11 0x00007f8fe960eb20 in glthread_unmarshal_batch (job=0x7f8fb41978c8, gdata=0x0, thread_index=0) at ../src/mesa/main/glthread.c:65
#12 0x00007f8fe960f47f in _mesa_glthread_finish (ctx=0x7f8fb4191710) at ../src/mesa/main/glthread.c:312
#13 0x00007f8fe960f4c7 in _mesa_glthread_finish_before (ctx=0x7f8fb4191710, func=0x7f8fea5fcc00 "GetQueryObjectui64v") at ../src/mesa/main/glthread.c:328
#14 0x00007f8fe935c214 in _mesa_marshal_GetQueryObjectui64v (id=4097, pname=34918, params=0x5f0fca8) at src/mapi/glapi/gen/marshal_generated3.c:1349
#15 0x000000007a89a63f in ?? ()
#16 0x0000000000000000 in ?? ()

This is the software fallback path for CompressedTexSubImage3D. Zink is (in this instance) a hardware driver, so why…

full.png

Hold on, I think I see something

zoom.png

A little closer

zoom2.png

Enhance

zoom3.png

Perf: Solved

With a trivial MR, DOOM2016 gets a nice 10x perf boost. Probably some other games do too.

But why can’t there be some sort of tag for all the hundreds of pipe caps to indicate that they’re perf-related?

Why do I have to do this dance again and again?

Written on February 9, 2023