Shader 编程优化技巧

发表于 2022/09/09 更新于 2023/10/18

作者 JunQiang

7 分钟阅读

Shader 编程优化技巧

if else 优化

例1

  
if (x == 0) {
    y += 5;
}

  
vec4 when_eq(vec4 x, vec4 y) {
    return 1.0 - abs(sign(x - y));
}

y += 5 * when_eq(x, 0);

如果使用HLSL，则应用 step 函数取代GLSL中的 sign 函数

例2

  
if (x == 0) {
    b = a1;
} else {
    b = a2;
}

  
vec4 when_neq(vec4 x, vec4 y) {
    return abs(sign(x - y));
}

b = mix(a1, a2, when_neq(x, 0));

如果使用HLSL，则应用 lerp 函数取代GLSL中的 mix 函数

常用关系运算符优化

  
//relation operator
vec4 when_eq(vec4 x, vec4 y) {
    return 1.0 - abs(sign(x - y));
}

vec4 when_neq(vec4 x, vec4 y) {
    return abs(sign(x - y));
}

vec4 when_gt(vec4 x, vec4 y) {
    return max(sign(x - y), 0.0);
}

vec4 when_lt(vec4 x, vec4 y) {
    return max(sign(y - x), 0.0);
}

vec4 when_ge(vec4 x, vec4 y) {
    return 1.0 - when_lt(x, y);
}

vec4 when_le(vec4 x, vec4 y) {
    return 1.0 - when_gt(x, y);
}

常用逻辑运算符优化

  
//logical operator;
vec4 and(vec4 a, vec4 b) {
    return a * b;
}

vec4 or(vec4 a, vec4 b) {
    return min(a + b, 1.0);
    // or
    // return max(a, b);
}

vec4 xor(vec4 a, vec4 b) {
    return (a + b) % 2.0;
}

vec4 not(vec4 a) {
    return 1.0 - a;
}

给浮点数求余数，HLSL应使用 fmod 函数；GLSL应使用 modf 函数，且只有openGL 3.x/openGL ES 3.x 中才能使用

指令优化

使用MAD

MAD 是英文 multiply, then add 的缩写。指的意思是GPU中，编译器可以把一个乘法和一个加法合并成一条指令(类似AMD的FMA指令)。比如

  
result = 0.5 * (1.0 + variable); // NG
result = 0.5 + 0.5 * variable;

再举个稍微复杂的例子

  
// Without MAD
myOutputColor.xyz = myColor.xyz;
myOutputColor.w = 1.0;
gl_FragColor = myOutputColor;

// With MAD
const vec2 constantList = vec2(1.0, 0.0);
gl_FragColor = mycolor.xyzw * constantList.xxxy + constantList.yyyx;

使用Swizzle

  
in vec4 in_pos;
// The following two lines:
gl_Position.x = in_pos.x;
gl_Position.y = in_pos.y;
// can be simplified to:
gl_Position.xy = in_pos.xy;

多使用shader内置函数

有很多shader内置函数比如 mix(lerp) 和 dot ,都是优化过的，GPU可以在一个时钟周期内完成(“single-cycle”)

Linear Interpolation

  
vec3 colorRGB_0, colorRGB_1;
float alpha;
resultRGB = colorRGB_0 * (1.0 - alpha) + colorRGB_1 * alpha;

// The above can be converted to the following for MAD purposes:
resultRGB = colorRGB_0  + alpha * (colorRGB_1 - colorRGB_0);

// GLSL provides the mix function. This function should be used where possible:
resultRGB = mix(colorRGB_0, colorRGB_1, alpha);

Dot products

  
vec3 fvalue1;
result1 = fvalue1.x + fvalue1.y + fvalue1.z;
vec4 fvalue2;
result2 = fvalue2.x + fvalue2.y + fvalue2.z + fvalue2.w;

// This is essentially a lot of additions. 
// Using a simple constant and the dot-product operator, we can have this:
const vec4 AllOnes = vec4(1.0);
vec3 fvalue1;
result1 = dot(fvalue1, AllOnes.xyz);
vec4 fvalue2;
result2 = dot(fvalue2, AllOnes);

多数shader内置函数都很快，但也存在特例，比如 discard , floor

Index Buffer

在绘制图元时，永远使用Index Buffer而不是单纯使用三角化后的Vertex Buffer。在openGL中就是使用 glDrawElements 而不要用 glDrawArrays。

首先，使用索引缓冲可以极大减少CPU向 vertex shader 传送数据的大小，减少缓冲压力。其次，对于多次引用到的顶点，其顶点计算结果可以被缓存。

If you are using indexed rendering, then it gets complicated. It’s more-or-less 1:1, each vertex having its own VS invocation. However, thanks to post-T&L caching, it is possible for a vertex shader to be executed less than once per input vertex.(原文)

使用优化的数学函数

Sine

  
sin(x) ≈ x - x^3 / 3! + x^5 / 5! - x^7 / 7! + ...

Cosine

  
cos(x) ≈ 1 - x^2 / 2! + x^4 / 4! - x^6 / 6! + ...

sqrt 平方根

  
sqrt(x) ≈ 1 + (1/2) * (x - 1) - (1/8) * (x - 1)^2 + (1/16) * (x - 1)^3 - (5/128) * (x - 1)^4 + ...

exp 指数函数

  
sqrt(x) ≈ 1 + (1/2) * (x - 1) - (1/8) * (x - 1)^2 + (1/16) * (x - 1)^3 - (5/128) * (x - 1)^4 + ...

log 对数函数

  
log(x) ≈ x - (x - 1) - (1/2) * (x - 1)^2 + (1/3) * (x - 1)^3 - (1/4) * (x - 1)^4 + ...

除了使用多项式近似的方法之外，还可以采用其他的方法来优化这些函数的计算。
例如：
对于 sqrt 函数，可以使用牛顿迭代法来求解。对于 exp 和 log 函数，可以使用查找表的方法来快速查找结果。

选用合适的数据类型、精度

  
// 数据类型
vec4 color = texture(tex, texCoord).rgba;

// 数据精度
float color = 0.5;

// 不使用向量
float x1 = 1.0;
float y1 = 2.0;
float x2 = 3.0;
float y2 = 4.0;
float dist = sqrt((x1 - x2) * (x1 - x2) + (y1 - y2) * (y1 - y2));

改写示例

  
// 数据类型
uvec4 color = texture(tex, texCoord).rgba;

// 数据精度
half color = 0.5;

// 使用向量
vec2 p1 = vec2(1.0, 2.0);
vec2 p2 = vec2(3.0, 4.0);
float dist = length(p1 - p2);

缓存技术

常量缓存

  
uniform float value;

void main() 
{
	float result = value * 0.5;
	gl_FragColor = vec4(result, result, result, 1.0);
}

着色器缓存

  
glUseProgram(shaderProgram);

其他一些7788的写法

善用 ()
const
mod

一些常见的优化工具

AMD GPU ShaderAnalyzer
NVIDIA Nsight Shader Debugger

图形编程, Shader

Shader

本文由作者按照 CC BY 4.0 进行授权

Shader 编程优化技巧

if else 优化

例1

例2

常用关系运算符优化

常用逻辑运算符优化

指令优化

使用MAD

使用Swizzle

多使用shader内置函数

Linear Interpolation

Dot products

Index Buffer

使用优化的数学函数

Sine

Cosine

sqrt 平方根

exp 指数函数

log 对数函数

选用合适的数据类型、精度

缓存技术

常量缓存

着色器缓存

其他一些7788的写法

一些常见的优化工具

热门标签