What

Blogging: That Thing I Forgot About

Yeah, my b, I forgot this was a thing.

Fuck it though, I’m a professional, so I’m gonna pretend I didn’t just skip a month of blogs and get right back into it.

Gallivm

Gallivm is the nir/tgsi-to-llvm translation layer in Gallium that LLVMpipe (and thus Lavapipe) uses to generate the JIT functions which make triangles. It’s very old code in that it predates me knowing how triangles work, but that doesn’t mean it doesn’t have bugs.

And Gallivm bugs are the worst bugs.

For a long time, I’ve had SIGILL crashes on exactly one machine locally for the CTS glob dEQP-GLES31.functional.program_uniform.by*sampler2D_samplerCube*. These tests pass on everyone else’s machines including CI.

Like I said, Gallivm bugs are the worst bugs.

Debugging

How does one debug JIT code? GDB can’t be used, valgrind doesn’t work, and, despite what LLVM developers would tell you, building an assert-enabled LLVM doesn’t help at all in most cases here since that will only catch invalid behavior, not questionably valid behavior that very obviously produces invalid results.

So we enter the world of lp_build_print debugging. Much like standard printf debugging, the strategy here is to just lp_build_print_value or lp_build_printf("I hate this part of the shader too") our way to figuring out where in the shader the crash occurs.

Here’s an example shader from dEQP-GLES31.functional.program_uniform.by_pointer.render.basic_struct.sampler2D_samplerCube_vertex that crashes:

#version 310 es
in highp vec4 a_position;
out mediump float v_vtxOut;

struct structType
{
	mediump sampler2D m0;
	mediump samplerCube m1;
};
uniform structType u_var;

mediump float compare_float    (mediump float a, mediump float b)  { return abs(a - b) < 0.05 ? 1.0 : 0.0; }
mediump float compare_vec4     (mediump vec4 a, mediump vec4 b)    { return compare_float(a.x, b.x)*compare_float(a.y, b.y)*compare_float(a.z, b.z)*compare_float(a.w, b.w); }

void main (void)
{
	gl_Position = a_position;
	v_vtxOut = 1.0;
	v_vtxOut *= compare_vec4(texture(u_var.m0, vec2(0.0)), vec4(0.15, 0.52, 0.26, 0.35));
	v_vtxOut *= compare_vec4(texture(u_var.m1, vec3(0.0)), vec4(0.88, 0.09, 0.30, 0.61));
}

Which, in llvmpipe NIR, is:

shader: MESA_SHADER_VERTEX
source_sha1: {0xcb00c93e, 0x64db3b0f, 0xf4764ad3, 0x12b69222, 0x7fb42437}
inputs: 1
outputs: 2
uniforms: 0
shared: 0
ray queries: 0
decl_var uniform INTERP_MODE_NONE sampler2D lower@u_var.m0 (0, 0, 0)
decl_var uniform INTERP_MODE_NONE samplerCube lower@u_var.m1 (0, 0, 1)
decl_function main (0 params)

impl main {
	block block_0:
	/* preds: */
	vec1 32 ssa_0 = deref_var &a_position (shader_in vec4) 
	vec4 32 ssa_1 = intrinsic load_deref (ssa_0) (access=0)
	vec1 16 ssa_2 = load_const (0xb0cd = -0.150024)
	vec1 16 ssa_3 = load_const (0x2a66 = 0.049988)
	vec1 16 ssa_4 = load_const (0xb829 = -0.520020)
	vec1 16 ssa_5 = load_const (0xb429 = -0.260010)
	vec1 16 ssa_6 = load_const (0xb59a = -0.350098)
	vec1 16 ssa_7 = load_const (0xbb0a = -0.879883)
	vec1 16 ssa_8 = load_const (0xadc3 = -0.090027)
	vec1 16 ssa_9 = load_const (0xb4cd = -0.300049)
	vec1 16 ssa_10 = load_const (0xb8e1 = -0.609863)
	vec2 32 ssa_13 = load_const (0x00000000, 0x00000000) = (0.000000, 0.000000)
	vec1 32 ssa_49 = load_const (0x00000000 = 0.000000)
	vec4 16 ssa_14 = (float16)txl ssa_13 (coord), ssa_49 (lod), 0 (texture), 0 (sampler)
	vec1 16 ssa_15 = fadd ssa_14.x, ssa_2
	vec1 16 ssa_16 = fabs ssa_15
	vec1 16 ssa_17 = fadd ssa_14.y, ssa_4
	vec1 16 ssa_18 = fabs ssa_17
	vec1 16 ssa_19 = fadd ssa_14.z, ssa_5
	vec1 16 ssa_20 = fabs ssa_19
	vec1 16 ssa_21 = fadd ssa_14.w, ssa_6
	vec1 16 ssa_22 = fabs ssa_21
	vec1 16 ssa_23 = fmax ssa_16, ssa_18
	vec1 16 ssa_24 = fmax ssa_23, ssa_20
	vec1 16 ssa_25 = fmax ssa_24, ssa_22
	vec3 32 ssa_27 = load_const (0x00000000, 0x00000000, 0x00000000) = (0.000000, 0.000000, 0.000000)
	vec1 32 ssa_50 = load_const (0x00000000 = 0.000000)
	vec4 16 ssa_28 = (float16)txl ssa_27 (coord), ssa_50 (lod), 1 (texture), 1 (sampler)
	vec1 16 ssa_29 = fadd ssa_28.x, ssa_7
	vec1 16 ssa_30 = fabs ssa_29
	vec1 16 ssa_31 = fadd ssa_28.y, ssa_8
	vec1 16 ssa_32 = fabs ssa_31
	vec1 16 ssa_33 = fadd ssa_28.z, ssa_9
	vec1 16 ssa_34 = fabs ssa_33
	vec1 16 ssa_35 = fadd ssa_28.w, ssa_10
	vec1 16 ssa_36 = fabs ssa_35
	vec1 16 ssa_37 = fmax ssa_30, ssa_32
	vec1 16 ssa_38 = fmax ssa_37, ssa_34
	vec1 16 ssa_39 = fmax ssa_38, ssa_36
	vec1 16 ssa_40 = fmax ssa_25, ssa_39
	vec1 32 ssa_41 = flt32 ssa_40, ssa_3
	vec1 32 ssa_42 = b2f32 ssa_41
	vec1 32 ssa_43 = deref_var &gl_Position (shader_out vec4) 
	intrinsic store_deref (ssa_43, ssa_1) (wrmask=xyzw /*15*/, access=0)
	vec1 32 ssa_44 = deref_var &v_vtxOut (shader_out float) 
	intrinsic store_deref (ssa_44, ssa_42) (wrmask=x /*1*/, access=0)
	/* succs: block_1 */
	block block_1:
}

There are two sample ops (txl), and since these tests only do simple texture() calls, it seems reasonable to assume that one of them is causing the crash. Sticking an lp_build_print_value on the texel values fetched by the sample operations will reveal whether the crash occurs before or after them.

What output does this yield?

Test case 'dEQP-GLES31.functional.program_uniform.by_pointer.render.basic_struct.sampler2D_samplerCube_vertex'..
texel 1.43279037e-322 6.95333598e-310 0 1.43279037e-322 1.08694442e-322 1.43279037e-322 1.08694442e-322 0
texel 1.43279037e-322 6.95333598e-310 0 1.43279037e-322 1.08694442e-322 1.43279037e-322 1.08694442e-322 0
texel 1.43279037e-322 6.95333598e-310 0 1.43279037e-322 1.08694442e-322 1.43279037e-322 1.08694442e-322 0
texel 1.43279037e-322 6.95333598e-310 0 1.43279037e-322 1.08694442e-322 1.43279037e-322 1.08694442e-322 0
[1]    3500332 illegal hardware instruction (core dumped)

Each txl op fetches four values, which means the output above is the result from the first instruction, and the second one never completes before the crash. Unsurprisingly, that second one is the cube sampling instruction, which makes sense given that all the crashes of this type I get are from cube sampling tests.

Now that it’s been determined the second txl is causing the crash, it’s reasonable to assume that the code Gallivm constructs for that sampling op is the cause rather than anything leading up to it, which can be checked by sticking some simple lp_build_printf("What am I doing with my life") calls in just before that op. Indeed, as the printfs confirm, I’m still questioning the life choices that led me to this point, so it’s now proven that the txl instruction itself is the problem.

Cube sampling has a lot of complex math involved for face selection, and I’ve spent a lot of time in there recently. My first guess was that the cube coordinates were bogus. Printing them yielded results:

Test case 'dEQP-GLES31.functional.program_uniform.by_pointer.render.basic_struct.sampler2D_samplerCube_vertex'..
texel 6.9008994e-310 0 0.349019617 0.349019617 0.349019617 0.349019617 0.349019617 0.349019617
texel 6.9008994e-310 0 0.349019617 0.349019617 0.349019617 0.349019617 0.349019617 0.349019617
texel 6.9008994e-310 0 0.349019617 0.349019617 0.349019617 0.349019617 0.349019617 0.349019617
texel 6.9008994e-310 0 0.349019617 0.349019617 0.349019617 0.349019617 0.349019617 0.349019617
cubecoords nan nan nan nan nan nan nan nan
cubecoords nan nan nan nan nan nan nan nan

These cube coords have more NaNs than a 1960s Batman TV series, so it looks like I was right in my hunch. Printing the cube S-face value next yields more NaNs. My printf search continued a couple more iterations until I wound up at this function:

static LLVMValueRef
lp_build_cube_imapos(struct lp_build_context *coord_bld, LLVMValueRef coord)
{
   /* ima = +0.5 / abs(coord); */
   LLVMValueRef posHalf = lp_build_const_vec(coord_bld->gallivm, coord_bld->type, 0.5);
   LLVMValueRef absCoord = lp_build_abs(coord_bld, coord);
   LLVMValueRef ima = lp_build_div(coord_bld, posHalf, absCoord);
   return ima;
}

Immediately, all of us multiverse-brain engineers spot something suspicious: this has a division operation with a user-provided divisor. Printing absCoord here yielded all zeroes, which was about where my remaining energy was at this Friday morning, so I mangled the code slightly:

static LLVMValueRef
lp_build_cube_imapos(struct lp_build_context *coord_bld, LLVMValueRef coord)
{
   /* ima = +0.5 / abs(coord); */
   LLVMValueRef posHalf = lp_build_const_vec(coord_bld->gallivm, coord_bld->type, 0.5);
   LLVMValueRef absCoord = lp_build_abs(coord_bld, coord);
   /* avoid div by zero */
   LLVMValueRef sel = lp_build_cmp(coord_bld, PIPE_FUNC_GREATER, absCoord, coord_bld->zero);
   LLVMValueRef div = lp_build_div(coord_bld, posHalf, absCoord);
   LLVMValueRef ima = lp_build_select(coord_bld, sel, div, coord_bld->zero);
   return ima;
}

And blammo, with Gallivm no longer able to divide by zero, the test was passing. And so were a lot of others.
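
For anyone wondering how a zero divisor turns into a wall of NaNs: under IEEE 754, 0.5 / 0 is +inf, and +inf multiplied by a zero coordinate is NaN. Below is a rough scalar C model of the two versions of the helper. It is only a sketch: the real code works on LLVM vectors, and face_coord is a hypothetical stand-in for the later face-coordinate math, which I'm assuming looks something like a scale by ima plus a 0.5 bias.

```c
#include <assert.h>
#include <math.h>

/* Scalar model of the original helper: ima = +0.5 / abs(coord). */
static float cube_imapos_unguarded(float coord)
{
   return 0.5f / fabsf(coord);
}

/* Scalar model of the patched helper: the division is still computed,
 * but its result is only selected when abs(coord) > 0. */
static float cube_imapos_guarded(float coord)
{
   float abs_coord = fabsf(coord);
   float div = 0.5f / abs_coord;
   float ima = abs_coord > 0.0f ? div : 0.0f;
   assert(!isnan(ima)); /* the select never lets a NaN through */
   return ima;
}

/* Hypothetical stand-in for the face-coordinate math (an assumption,
 * not copied from Mesa): scale the coordinate by ima and bias by 0.5. */
static float face_coord(float coord, float ima)
{
   return coord * ima + 0.5f;
}
```

With a zero coordinate, the unguarded version gives face_coord(0, +inf), which is NaN, while the guarded version gives a boring but valid 0.5. The test samples the cube at vec3(0.0), which lines up with the all-zero absCoord printed above.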

Progress

There’s been some speculation about how close Zink really is to being “useful”, where “useful” is determined by the majesty of passing GL4.6 CTS.

So how close is it? The answer might shock you.

Remaining Lavapipe Fails: 17

  • KHR-GL46.gpu_shader_fp64.builtin.mod_dvec2,Fail
  • KHR-GL46.gpu_shader_fp64.builtin.mod_dvec3,Fail
  • KHR-GL46.gpu_shader_fp64.builtin.mod_dvec4,Fail
  • KHR-GL46.pipeline_statistics_query_tests_ARB.functional_primitives_vertices_submitted_and_clipping_input_output_primitives,Fail
  • KHR-GL46.tessellation_shader.single.isolines_tessellation,Fail
  • KHR-GL46.tessellation_shader.tessellation_control_to_tessellation_evaluation.data_pass_through,Fail
  • KHR-GL46.tessellation_shader.tessellation_invariance.invariance_rule3,Fail
  • KHR-GL46.tessellation_shader.tessellation_shader_point_mode.points_verification,Fail
  • KHR-GL46.tessellation_shader.tessellation_shader_quads_tessellation.degenerate_case,Fail
  • KHR-GL46.tessellation_shader.tessellation_shader_tessellation.gl_InvocationID_PatchVerticesIn_PrimitiveID,Fail
  • KHR-GL46.tessellation_shader.vertex.vertex_spacing,Fail
  • KHR-GL46.texture_barrier.disjoint-texels,Fail
  • KHR-GL46.texture_barrier.overlapping-texels,Fail
  • KHR-GL46.texture_barrier_ARB.disjoint-texels,Fail
  • KHR-GL46.texture_barrier_ARB.overlapping-texels,Fail
  • KHR-GL46.texture_swizzle.functional,Fail
  • KHR-GL46.tessellation_shader.tessellation_shader_quads_tessellation.inner_tessellation_level_rounding,Crash

Remaining ANV Fails (Icelake): 9

  • KHR-GL46.pipeline_statistics_query_tests_ARB.functional_primitives_vertices_submitted_and_clipping_input_output_primitives,Fail
  • KHR-GL46.tessellation_shader.single.isolines_tessellation,Fail
  • KHR-GL46.tessellation_shader.tessellation_control_to_tessellation_evaluation.data_pass_through,Fail
  • KHR-GL46.tessellation_shader.tessellation_invariance.invariance_rule3,Fail
  • KHR-GL46.tessellation_shader.tessellation_shader_point_mode.points_verification,Fail
  • KHR-GL46.tessellation_shader.tessellation_shader_quads_tessellation.degenerate_case,Fail
  • KHR-GL46.tessellation_shader.tessellation_shader_quads_tessellation.inner_tessellation_level_rounding,Fail
  • KHR-GL46.tessellation_shader.tessellation_shader_tessellation.gl_InvocationID_PatchVerticesIn_PrimitiveID,Fail
  • KHR-GL46.tessellation_shader.vertex.vertex_spacing,Fail

Big Triangle better keep a careful eye on us now.

Written on March 4, 2022