Slides from the 2022 WebGL & WebGPU Meetup

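1 Use the label attribute wherever you can

Every WebGPU object has a label attribute, which you can set either through the descriptor's label field at creation time or by assigning to the object's label property afterwards. The label works much like an id: it makes objects easier to debug and inspect. Setting it costs next to nothing, and it pays off enormously when debugging.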

const projectionMatrixBuffer = gpuDevice.createBuffer({
  label: 'Projection Matrix Buffer',
  size: 12 * Float32Array.BYTES_PER_ELEMENT, // deliberately 12; a 4x4 matrix actually needs 16 floats
  usage: GPUBufferUsage.UNIFORM | GPUBufferUsage.COPY_DST, // a projection matrix is normally bound as a uniform
})
const projectionMatrixArray = new Float32Array(16)

gpuDevice.queue.writeBuffer(projectionMatrixBuffer, 0, projectionMatrixArray)

The code above deliberately gives the matrix's GPUBuffer the wrong size. When validation fails, the error message carries the label:

// Console output
Write range (bufferOffset: 0, size: 64) does not fit in [Buffer "Projection Matrix Buffer"] size (48).
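
One way to avoid this kind of mismatch is to derive the buffer size from the data itself. The following is only a sketch: labeledBufferDescriptor is a hypothetical helper name, not part of WebGPU, and the usage flags are left to the caller. It rounds the size up to a multiple of 4 bytes, since writeBuffer() writes data in multiples of 4 bytes.

```javascript
// Hypothetical helper (not a WebGPU API): build a createBuffer() descriptor
// whose label is set and whose size is taken from the data to upload.
function labeledBufferDescriptor(label, data, usage) {
  // Round up to a multiple of 4 bytes.
  const size = Math.ceil(data.byteLength / 4) * 4;
  return { label, size, usage };
}

// Usage (in a browser with WebGPU), where `usage` holds the GPUBufferUsage flags you need:
// const projectionMatrixArray = new Float32Array(16);
// const projectionMatrixBuffer = gpuDevice.createBuffer(
//   labeledBufferDescriptor('Projection Matrix Buffer', projectionMatrixArray, usage));
```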

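2 Use debug groups

Command encoders (GPUCommandEncoder) let you push and pop debug groups. A debug group is really just a string that indicates which part of the code is running. When validation fails, the error message includes the debug group stack: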

// --- First debug group: marks the current frame ---
commandEncoder.pushDebugGroup(`Frame ${frameIndex}`);
  // --- First nested group: marks the light update ---
  commandEncoder.pushDebugGroup('Clustered Light Compute Pass');
    // For example, update the light sources here
    updateClusteredLights(commandEncoder);
  commandEncoder.popDebugGroup();
  // --- End of first nested group ---
  // --- Second nested group: marks the start of the render pass ---
  commandEncoder.pushDebugGroup('Main Render Pass');
    // Issue the draw calls
    renderScene(commandEncoder);
  commandEncoder.popDebugGroup();
  // --- End of second nested group ---
commandEncoder.popDebugGroup();
// --- End of first debug group ---

With this in place, an error message looks like this:

// Console output
Binding sizes are too small for bind group [BindGroup] at index 0

Debug group stack:
> "Main Render Pass"
> "Frame 234"
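
Every push needs a matching pop, which is easy to get wrong once the code between them can return early or throw. A minimal sketch of one way to keep the stack balanced, assuming a hypothetical wrapper named withDebugGroup (not part of WebGPU):

```javascript
// Hypothetical helper: run `record` inside a named debug group.
// `encoder` is anything exposing pushDebugGroup()/popDebugGroup(),
// such as a GPUCommandEncoder or a pass encoder.
function withDebugGroup(encoder, label, record) {
  encoder.pushDebugGroup(label);
  try {
    return record(encoder);
  } finally {
    // Always pop, so the group stack stays balanced even if `record` throws.
    encoder.popDebugGroup();
  }
}

// Usage (in a browser with WebGPU):
// withDebugGroup(commandEncoder, `Frame ${frameIndex}`, (encoder) => {
//   withDebugGroup(encoder, 'Main Render Pass', renderScene);
// });
```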

3 Load texture images from a Blob

ImageBitmaps created from a Blob give the best decoding performance for JPG/PNG textures.

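/**
 * Asynchronously creates a texture from an image URL and copies the image data into it
 * @param {GPUDevice} gpuDevice The device object
 * @param {string} url Path of the texture image
 * @returns {Promise<GPUTexture>} The created texture
 */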
async function createTextureFromImageUrl(gpuDevice, url) {
  const blob = await fetch(url).then((r) => r.blob())
  const source = await createImageBitmap(blob)
  
  const textureDescriptor = {
    label: `Image Texture ${url}`,
    size: {
      width: source.width,
      height: source.height,
    },
    format: 'rgba8unorm',
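    // copyExternalImageToTexture() requires both COPY_DST and RENDER_ATTACHMENT
    usage: GPUTextureUsage.TEXTURE_BINDING | GPUTextureUsage.COPY_DST | GPUTextureUsage.RENDER_ATTACHMENT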
  }
  const texture = gpuDevice.createTexture(textureDescriptor)
  gpuDevice.queue.copyExternalImageToTexture(
    { source },
    { texture },
    textureDescriptor.size,
  )
  
  return texture
}

Prefer compressed texture assets

Use them whenever you can.

WebGPU supports at least three kinds of compressed textures:

texture-compression-bc
texture-compression-etc2
texture-compression-astc

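Which of these are available depends on the hardware. According to the official discussion (GitHub issue 2083), every platform should support either the BC formats (also known as DXT or S3TC) or the ETC2 and ASTC formats, so that texture compression is always available to you.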
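
Since each of these is an optional feature, it has to be requested when the device is created. A minimal sketch of the check, assuming a hypothetical helper named supportedTextureCompression; it relies only on adapter.features being set-like:

```javascript
// The three optional feature names listed above.
const TEXTURE_COMPRESSION_FEATURES = [
  'texture-compression-bc',
  'texture-compression-etc2',
  'texture-compression-astc',
];

// Hypothetical helper: return the compression features this adapter supports.
// `adapter.features` is set-like, so has() is all we need.
function supportedTextureCompression(adapter) {
  return TEXTURE_COMPRESSION_FEATURES.filter((name) => adapter.features.has(name));
}

// Usage (in a browser with WebGPU):
// const adapter = await navigator.gpu.requestAdapter();
// const requiredFeatures = supportedTextureCompression(adapter);
// const gpuDevice = await adapter.requestDevice({ requiredFeatures });
```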

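Supercompressed texture formats such as Basis Universal are strongly recommended. Their advantage is device independence: they can be transcoded to whatever format the device supports, so you do not have to ship the same texture in two formats.

The original author wrote a library for loading compressed textures in WebGL and WebGPU; see toji/web-texture-tool on GitHub.

WebGL's support for compressed textures was not great. WebGPU supports them natively, so use them whenever possible!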

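4 Use the glTF processing library gltf-transform

It is an open-source library that you can find on GitHub, and it ships a command-line tool.

For example, you can use it to compress the textures in a glb: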

> gltf-transform etc1s paddle.glb paddle2.glb
paddle.glb (11.92 MB) → paddle2.glb (1.73 MB)

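The result is visually lossless, yet the model, which was exported from Blender, becomes much smaller. The original model's textures are five 2048 x 2048 PNG images.

Besides compressing textures, the library can also resize textures, resample animations, apply Google Draco compression to geometry data, and more. After all of these optimizations, the glb is less than 5% of its original size.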

> gltf-transform resize paddle.glb paddle2.glb --width 1024 --height 1024
> gltf-transform etc1s paddle2.glb paddle2.glb
> gltf-transform resample paddle2.glb paddle2.glb
> gltf-transform dedup paddle2.glb paddle2.glb
> gltf-transform draco paddle2.glb paddle2.glb

  paddle.glb (11.92 MB) → paddle2.glb (596.46 KB)

5 Uploading buffer data

WebGPU offers many ways to get data into a buffer, and writeBuffer() is by no means the wrong one. When you call WebGPU from wasm, prefer writeBuffer(), since it avoids an extra buffer copy.

const projectionMatrixBuffer = gpuDevice.createBuffer({
  label: 'Projection Matrix Buffer',
  size: 16 * Float32Array.BYTES_PER_ELEMENT,
  usage: GPUBufferUsage.UNIFORM | GPUBufferUsage.COPY_DST, // a projection matrix is normally bound as a uniform
});

// Whenever the projection matrix changes (for example, when the window is resized)
function updateProjectionMatrixBuffer(projectionMatrix) {
  const projectionMatrixArray = projectionMatrix.getAsFloat32Array();
  gpuDevice.queue.writeBuffer(projectionMatrixBuffer, 0, projectionMatrixArray);
}

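The original author notes that setting mappedAtCreation when creating a buffer is not mandatory; creating the buffer unmapped is sometimes fine too, for example when loading the buffers referenced by a glTF file.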
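
For comparison, here is a minimal sketch of the mappedAtCreation path for data that is written once and never changes. createBufferWithData is a hypothetical helper name, and the usage flags are an assumption left to the caller; only createBuffer(), getMappedRange() and unmap() are WebGPU calls.

```javascript
// Hypothetical helper: create a buffer that is mapped at creation and
// fill it once. `data` is a typed array; `usage` is supplied by the caller.
function createBufferWithData(gpuDevice, label, data, usage) {
  const buffer = gpuDevice.createBuffer({
    label,
    // Buffers mapped at creation need a size that is a multiple of 4 bytes.
    size: Math.ceil(data.byteLength / 4) * 4,
    usage,
    mappedAtCreation: true,
  });
  // Copy the bytes into the mapped range, then hand the buffer over to the GPU.
  const bytes = new Uint8Array(data.buffer, data.byteOffset, data.byteLength);
  new Uint8Array(buffer.getMappedRange()).set(bytes);
  buffer.unmap();
  return buffer;
}

// Usage (in a browser with WebGPU):
// const vertexBuffer = createBufferWithData(
//   gpuDevice, 'Static Vertices', new Float32Array(vertices), GPUBufferUsage.VERTEX);
```

With this path no COPY_DST usage is needed, because the data is written through the mapping rather than through a queue copy.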

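6 Prefer creating pipelines asynchronously

If you do not need a render or compute pipeline right away, use createRenderPipelineAsync and createComputePipelineAsync instead of their synchronous counterparts.

Creating a pipeline synchronously may trigger compilation of the pipeline's resources under the hood, which can stall GPU-related work.

With asynchronous creation, the Promise does not resolve until the pipeline is ready, so whatever the GPU is currently doing can finish first, before the pipeline you asked for is dealt with.

Compare the two versions, starting with the synchronous one:

// Create the compute pipeline synchronously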
const computePipeline = gpuDevice.createComputePipeline({/* ... */})

computePass.setPipeline(computePipeline)
computePass.dispatchWorkgroups(32, 32) // the dispatch is issued here; the shader may still be compiling, causing a stall

Now the asynchronous version:

// Create the compute pipeline asynchronously
const asyncComputePipeline = await gpuDevice.createComputePipelineAsync({/* ... */})

computePass.setPipeline(asyncComputePipeline)
computePass.dispatchWorkgroups(32, 32) // the shader is already compiled by now, so there is no stall
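
When several pipelines are needed, awaiting them one at a time serializes the waits. A minimal sketch of starting all of the compilations first and waiting for them together, assuming a hypothetical helper named createComputePipelines and a plain object that maps names to pipeline descriptors:

```javascript
// Hypothetical helper: start every pipeline compilation up front and
// wait for all of them together, e.g. behind a loading screen.
// `descriptors` maps a name to a compute pipeline descriptor.
async function createComputePipelines(gpuDevice, descriptors) {
  const names = Object.keys(descriptors);
  const pipelines = await Promise.all(
    names.map((name) => gpuDevice.createComputePipelineAsync(descriptors[name])),
  );
  // Re-associate each resolved pipeline with its name.
  return Object.fromEntries(names.map((name, i) => [name, pipelines[i]]));
}

// Usage (in a browser with WebGPU):
// const pipelines = await createComputePipelines(gpuDevice, {
//   lights: { /* ... */ },
//   particles: { /* ... */ },
// });
// computePass.setPipeline(pipelines.lights);
```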

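7 Be careful with implicit pipeline layouts

Implicit pipeline layouts, especially for standalone compute pipelines, may feel convenient when writing the JavaScript, but they bring two potential problems:

They break sharing of bind groups between pipelines
Odd things can happen when you update the shader, since the layout is inferred from the shader code

If your case is really simple, an implicit pipeline layout is fine, but create the layout explicitly whenever you can.

The code below shows an implicit pipeline layout: the pipeline is created first, and then the pipeline's getBindGroupLayout() API is called to obtain the bind group layout inferred from the shader code.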

const computePipeline = await gpuDevice.createComputePipelineAsync({
  // No layout object is passed; 'auto' asks WebGPU to infer the layout from the shader
  layout: 'auto',
  compute: {
    module: computeModule,
    entryPoint: 'computeMain'
  }
})

const computeBindGroup = gpuDevice.createBindGroup({
  // Fetch the implicitly created bind group layout
  layout: computePipeline.getBindGroupLayout(0),
  entries: [{
    binding: 0,
    resource: { buffer: storageBuffer },
  }]
})
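
For contrast, the explicit route could look like the sketch below. It assumes the shader declares binding 0 of group 0 as a read-write storage buffer; createExplicitComputeLayout is a hypothetical helper name, and GPUShaderStage is passed in as a parameter only so that the function has no hidden dependency on browser globals.

```javascript
// Sketch of the explicit alternative. `shaderStage` is the GPUShaderStage
// constants object, passed in by the caller.
function createExplicitComputeLayout(gpuDevice, shaderStage) {
  const bindGroupLayout = gpuDevice.createBindGroupLayout({
    label: 'Compute BindGroupLayout',
    entries: [{
      binding: 0,
      visibility: shaderStage.COMPUTE,
      // Assumption: the shader uses a read-write storage buffer here.
      buffer: { type: 'storage' },
    }],
  });
  const pipelineLayout = gpuDevice.createPipelineLayout({
    label: 'Compute PipelineLayout',
    bindGroupLayouts: [bindGroupLayout],
  });
  return { bindGroupLayout, pipelineLayout };
}

// Usage (in a browser with WebGPU):
// const { bindGroupLayout, pipelineLayout } =
//   createExplicitComputeLayout(gpuDevice, GPUShaderStage);
// const computePipeline = await gpuDevice.createComputePipelineAsync({
//   layout: pipelineLayout,
//   compute: { module: computeModule, entryPoint: 'computeMain' },
// });
// const computeBindGroup = gpuDevice.createBindGroup({
//   layout: bindGroupLayout,
//   entries: [{ binding: 0, resource: { buffer: storageBuffer } }],
// });
```

Because the bind group is created from bindGroupLayout rather than from one particular pipeline, it can be reused with any other pipeline built on the same layout.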

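8 Share bind groups and bind group layouts

If some values stay unchanged during rendering or compute but are used frequently, you can create a simple bind group layout for them and use it with any pipeline that uses the same bind group index.

First, create the bind group and its layout:

// Create a bind group layout for the camera uniforms, plus the bind group itself
const cameraBindGroupLayout = gpuDevice.createBindGroupLayout({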
  label: `Camera uniforms BindGroupLayout`,
  entries: [{
    binding: 0,
    visibility: GPUShaderStage.VERTEX | GPUShaderStage.FRAGMENT,
    buffer: {},
  }]
})

const cameraBindGroup = gpuDevice.createBindGroup({
  label: `Camera uniforms BindGroup`,
  layout: cameraBindGroupLayout,
  entries: [{
    binding: 0,
    resource: { buffer: cameraUniformsBuffer, },
  }],
})

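Next, create two render pipelines. Note that both pipelines use two bind groups: they differ only in their material bind group layout and share the camera bind group layout: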

const renderPipelineA = gpuDevice.createRenderPipeline({
  label: `Render Pipeline A`,
  layout: gpuDevice.createPipelineLayout({ bindGroupLayouts: [cameraBindGroupLayout, materialBindGroupLayoutA] }),
  /* Etc... */
});

const renderPipelineB = gpuDevice.createRenderPipeline({
  label: `Render Pipeline B`,
  layout: gpuDevice.createPipelineLayout({ bindGroupLayouts: [cameraBindGroupLayout, materialBindGroupLayoutB] }),
  /* Etc... */
});

Finally, in each frame of the render loop you only need to set the camera bind group once, which reduces CPU-to-GPU traffic:

const renderPass = commandEncoder.beginRenderPass({/* ... */});

// Set the camera bind group only once
renderPass.setBindGroup(0, cameraBindGroup);

for (const pipeline of activePipelines) {
  renderPass.setPipeline(pipeline.gpuRenderPipeline)
  for (const material of pipeline.materials) {
    // The material bind groups, by contrast, are set per material
    renderPass.setBindGroup(1, material.gpuBindGroup)
    
    // Set the vertex buffers and issue the draw calls (details omitted)
    for (const mesh of material.meshes) {
      renderPass.setVertexBuffer(0, mesh.gpuVertexBuffer)
      renderPass.draw(mesh.drawCount)
    }
  }
}

renderPass.end()

Information from the original

Author: Brandon Jones, Twitter @Tojiro
Original slides: https://docs.google.com/prese...
Further reading: https://toji.github.io/webgpu...
A great raw WebGPU tutorial (in English): https://alain.xyz/blog/raw-we...
Details on the texture comparison: https://toji.github.io/webgpu...
Details on buffer uploads: https://toji.github.io/webgpu...

A few WebGPU best practices