An Introduction to the Per-Frame Rendering Flow in the Rust Graphics Library gfx-hal

Posted by 熊皮皮 on 2019-03-04

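For the full list of articles, see: Summary of the Series on Developing a Cross-Platform Complex Graphics Rendering Project for Mobile in Rust (Table of Contents)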

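The gfx-hal API mirrors Vulkan almost 1:1, so the explanation below uses Vulkan terminology. Because the Vulkan API is so fine-grained, it is several times harder to learn than OpenGL / ES. In my experience, for mobile graphics developers, explaining Vulkan by analogy with the OpenGL ES API lowers the learning curve. Starting with the per-frame rendering part, and skipping the initialization of the data structures involved, makes it easier to grasp Vulkan's core flow.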

OpenGL / ES per-frame rendering flow example

// Prepare the render target
glBindFramebuffer();
glFramebufferTexture2D(); glCheckFramebufferStatus(); // when rendering to a texture
glViewport(x, y, width, height);
// Prepare the data the shader will read
glUseProgram(x);
glBindBuffer(i);
loop i in 0..VertexVarCount {
    glEnableVertexAttribArray(i);
    glVertexAttribPointer(i, ...);
}
loop i in 0..UniformVarCount {
    switch UniformType {
        case NoTexture: glUniformX(i, data); break;
        case Texture: {
            glActiveTexture(j);
            glBindTexture(type, texture_name);
            glUniform1i(location, j);
            break;
        }
        default: ERROR();
    }
}
// Configure other fragment operations, e.g. glBlend*, glStencil*
glDrawArrays/Elements/ArraysInstanced...
// The draw call is complete at this point.
// To avoid interfering with later draws, restore the fragment operations set above to their defaults.
// Then, if needed, call the EGL (non-GL) function that swaps the front and back buffers;
// there is no such step when rendering to a texture.
eglSwapBuffers()/[EAGLContext presentRenderbuffer];

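As you can see, the OpenGL / ES API hides the vast majority of the details, so the overall amount of code looks small. It is still not easy to understand as a beginner, but after enough use it becomes routine and you start to feel this is simply how things should be. The result is that on first contact with Vulkan you discover many details you never knew about, which is a bit bewildering.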

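The downside of OpenGL / ES becoming routine is that when something breaks, it is hard to tell right away whether the problem lies in the project code (say, the state machine was not set up correctly) or in the driver. On iOS this is tolerable; on Android you are flying blind. I guess you will say there are the vendors' profiling tools and Google's gapid, but none of them are any damn good. Qualcomm's technical support suggested we use a rooted Android 8.0 device. But the devices that actually have problems are usually on Android 4.x, so that advice is frankly a joke.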

Rendering to a view

The core call flow for per-frame rendering to a view in gfx-hal (Vulkan) is as follows:

EventSource ->[CommandPool -> CommandBuffer
                -> Submit -> Submission
                -> QueueGroup -> CommandQueue]
-> GraphicsHardware

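Notes:

  • EventSource: the signal source, e.g. the camera delivering a frame via callback, the screen's vsync signal, or user input.
  • CommandQueue: a queue that executes different kinds of work, e.g. rendering or compute.
  • QueueGroup: a collection of CommandQueues.
  • GraphicsHardware: the graphics hardware.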
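The flow above can be modeled with a few toy types to make the hand-offs visible. This is a minimal sketch using only the standard library; `ToyCommandPool`, `ToyCommandBuffer`, `ToySubmit`, `ToyQueue` and `Command` are hypothetical names for illustration and are not the gfx-hal API.

```rust
// Toy model of the flow above. Every type here is hypothetical;
// none of this is the gfx-hal API.

/// A recorded command; real command buffers hold backend-specific data.
#[derive(Debug, Clone, PartialEq)]
pub enum Command {
    BindPipeline(u32),
    Draw { vertices: u32 },
}

/// Hands out command buffers, standing in for a CommandPool.
#[derive(Default)]
pub struct ToyCommandPool {
    allocated: usize,
}

impl ToyCommandPool {
    pub fn acquire_command_buffer(&mut self) -> ToyCommandBuffer {
        self.allocated += 1;
        ToyCommandBuffer { commands: Vec::new() }
    }

    /// Number of buffers handed out so far.
    pub fn allocated(&self) -> usize {
        self.allocated
    }
}

/// A buffer that is still being recorded.
pub struct ToyCommandBuffer {
    commands: Vec<Command>,
}

impl ToyCommandBuffer {
    pub fn record(&mut self, command: Command) {
        self.commands.push(command);
    }

    /// Consumes the buffer, so nothing can be recorded after finishing.
    pub fn finish(self) -> ToySubmit {
        ToySubmit(self.commands)
    }
}

/// A finished buffer that is ready to be queued.
pub struct ToySubmit(Vec<Command>);

/// Executes submitted work in order, standing in for a CommandQueue.
#[derive(Default)]
pub struct ToyQueue {
    executed: Vec<Command>,
}

impl ToyQueue {
    pub fn submit(&mut self, submits: Vec<ToySubmit>) {
        for ToySubmit(commands) in submits {
            self.executed.extend(commands);
        }
    }

    pub fn executed(&self) -> &[Command] {
        &self.executed
    }
}

/// One "frame": pool -> buffer -> submit -> queue.
pub fn render_one_frame(pool: &mut ToyCommandPool, queue: &mut ToyQueue) {
    let mut cmd_buffer = pool.acquire_command_buffer();
    cmd_buffer.record(Command::BindPipeline(1));
    cmd_buffer.record(Command::Draw { vertices: 6 });
    let submit = cmd_buffer.finish();
    queue.submit(vec![submit]);
}
```

Calling `render_one_frame` once per event from the signal source reproduces the chain in the diagram. Because `finish` takes the buffer by value, the compiler rejects any attempt to keep recording into a finished buffer.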

The concrete steps, with code:

  • Reset the Fence so it can be used later when the Submission is submitted to the queue.
    device.reset_fence(&frame_fence);
  • Reset the CommandPool, which resets every CommandBuffer created from it. If a CommandBuffer is still in use, the developer must synchronize access to the resources.
    command_pool.reset();
  • Acquire an image index from the SwapChain.
    let frame = swap_chain.acquire_image(!0, FrameSync::Semaphore(&mut frame_semaphore));
  • Create and configure a CommandBuffer from the CommandPool; once recording finishes you get a valid Submit object.
    let mut cmd_buffer = command_pool.acquire_command_buffer(false);
    // A series of settings similar to OpenGL / ES fragment operations and binding data to a Program
    // Two Pipeline operations worth noting
    cmd_buffer.bind_graphics_pipeline(&pipeline);
    cmd_buffer.bind_graphics_descriptor_sets(&pipeline_layout, 0, Some(&desc_set), &[]);
    // Operations tied to the RenderPass
    let mut encoder = cmd_buffer.begin_render_pass_inline(&render_pass,...);
    let submit = cmd_buffer.finish();
  • Create a Submission from the Submit.
    let submission = Submission::new()
        .wait_on(&[(&frame_semaphore, PipelineStage::BOTTOM_OF_PIPE)])
        .submit(Some(submit));
  • Submit the Submission to the queue.
    queue.submit(submission, Some(&mut frame_fence));
  • Wait on the Fence until the submitted work has finished executing.
    device.wait_for_fence(&frame_fence, !0);
  • Swap the front and back buffers, the equivalent of eglSwapBuffers.
    swap_chain.present(&mut queue_group.queues[0], frame, &[])
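The reset, submit, wait ordering of the fence in the steps above can be imitated with a condition variable and a worker thread that plays the role of the GPU queue. This is only a sketch of the synchronization idea under that assumption; `ToyFence`, `submit_with_fence` and `run_frame` are hypothetical names, and real gfx-hal fences are backend objects rather than Rust mutexes.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;
use std::time::Duration;

/// Toy fence: a flag the "GPU" sets and the CPU waits on.
/// Hypothetical type; gfx-hal fences are backend objects.
#[derive(Clone, Default)]
pub struct ToyFence {
    inner: Arc<(Mutex<bool>, Condvar)>,
}

impl ToyFence {
    /// Return the fence to the unsignaled state before reuse.
    pub fn reset(&self) {
        *self.inner.0.lock().unwrap() = false;
    }

    /// Called by the worker when the submitted work has finished.
    pub fn signal(&self) {
        *self.inner.0.lock().unwrap() = true;
        self.inner.1.notify_all();
    }

    /// Block the calling thread until the fence is signaled.
    pub fn wait(&self) {
        let (lock, condvar) = &*self.inner;
        let mut signaled = lock.lock().unwrap();
        while !*signaled {
            signaled = condvar.wait(signaled).unwrap();
        }
    }

    pub fn is_signaled(&self) -> bool {
        *self.inner.0.lock().unwrap()
    }
}

/// Submit `work` to a worker thread standing in for the GPU queue;
/// the fence is signaled once the work has run.
pub fn submit_with_fence<F>(work: F, fence: &ToyFence) -> thread::JoinHandle<()>
where
    F: FnOnce() + Send + 'static,
{
    let fence = fence.clone();
    thread::spawn(move || {
        work();
        fence.signal();
    })
}

/// One frame: reset the fence, submit, wait, and only then read the result.
pub fn run_frame(fence: &ToyFence) -> u32 {
    fence.reset();
    let result = Arc::new(Mutex::new(0u32));
    let sink = Arc::clone(&result);
    let worker = submit_with_fence(
        move || {
            thread::sleep(Duration::from_millis(5)); // pretend to render
            *sink.lock().unwrap() = 6; // e.g. number of vertices drawn
        },
        fence,
    );
    fence.wait(); // the work is guaranteed to be finished past this point
    let drawn = *result.lock().unwrap();
    worker.join().unwrap();
    drawn
}
```

Forgetting `reset` in this toy would let `wait` return immediately on the second frame, because the flag is still set from the previous one. That is the same reason the real flow resets the fence at the start of each frame.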

More on configuring the CommandBuffer

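OpenGL / ES 2/3.x has no CommandPool or CommandBuffer data structures. Only the latest minor version of OpenGL added SPIR-V and Command, and OpenGL ES has not caught up yet. Metal's CommandBuffer interface is defined differently from Vulkan's. In Metal you create an MTLCommandBuffer, create an Encoder from the buffer together with a RenderPassDescriptor, bundle the resources for this render through it, and finally commit the buffer to the queue for the GPU to execute. Vulkan essentially moves Metal's Encoder operations onto the CommandBuffer and leaves only a very thin Encoder.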

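Overall flow:

  • Allocate an available Command Buffer from the Command Pool
  • Configure the viewport and related settings
  • Configure the vertex data buffers
  • Configure the mapping between Uniforms and Buffers
  • Set the output target RenderPass
  • Choose the draw call: draw/draw_indexed/draw_indirect and so on
  • Finish the configuration

A code example follows: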

let submit = {
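    // Take an instance from the pool that is actually a RawCommandBuffer and
    // wrap it with a thread-safety object to form a thread-safe CommandBuffer instance.
    // This is a recurring HAL programming pattern; many data structures follow it.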
    let mut cmd_buffer = command_pool.acquire_command_buffer(false);

    cmd_buffer.set_viewports(0, &[viewport]);
    cmd_buffer.set_scissors(0, &[viewport.rect]);
    cmd_buffer.bind_graphics_pipeline(&pipeline);
    cmd_buffer.bind_vertex_buffers(0, pso::VertexBufferSet(vec![(&vertex_buffer, 0)]));
    cmd_buffer.bind_graphics_descriptor_sets(&pipeline_layout, 0, Some(&desc_set)); //TODO

    {
        let mut encoder = cmd_buffer.begin_render_pass_inline(
            &render_pass,
            &framebuffers[frame.id()],
            viewport.rect,
            &[command::ClearValue::Color(command::ClearColor::Float([0.8, 0.8, 0.8, 1.0]))],
        );
        encoder.draw(0..6, 0..1);
    }

    cmd_buffer.finish()
};

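This code shows two key CommandBuffer operations: bind_graphics_pipeline(GraphicsPipeline) and bind_graphics_descriptor_sets(PipelineLayout, DescriptorSet). GraphicsPipeline corresponds to the OpenGL / ES Program, while PipelineLayout and DescriptorSet describe how the shader's uniform variables read data from buffers. Initializing these data structures is extremely complicated (it made me want to swear the first time I read it), so it gets its own article.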
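The inner block around the encoder in the example above matters in Rust: the encoder holds a mutable borrow of the command buffer, so it has to go out of scope before `finish()` can be called. The sketch below models that with a toy encoder that closes the pass in `Drop`. All names are hypothetical, and whether the real gfx-hal encoder ends the pass on drop in exactly this way is an assumption to check against the gfx-hal source.

```rust
use std::ops::Range;

/// Commands a toy buffer can hold. Hypothetical; not gfx-hal types.
#[derive(Debug, Clone, PartialEq)]
pub enum Command {
    SetViewport { width: u32, height: u32 },
    BindPipeline(u32),
    BeginRenderPass,
    Draw { vertices: Range<u32>, instances: Range<u32> },
    EndRenderPass,
}

#[derive(Default)]
pub struct ToyCommandBuffer {
    commands: Vec<Command>,
}

/// Borrows the buffer mutably for the duration of one render pass.
pub struct ToyRenderPassEncoder<'a> {
    buffer: &'a mut ToyCommandBuffer,
}

impl ToyCommandBuffer {
    pub fn set_viewport(&mut self, width: u32, height: u32) {
        self.commands.push(Command::SetViewport { width, height });
    }

    pub fn bind_pipeline(&mut self, id: u32) {
        self.commands.push(Command::BindPipeline(id));
    }

    /// Starts a pass; the buffer is unusable until the encoder is dropped.
    pub fn begin_render_pass(&mut self) -> ToyRenderPassEncoder<'_> {
        self.commands.push(Command::BeginRenderPass);
        ToyRenderPassEncoder { buffer: self }
    }

    /// Consumes the buffer and returns what was recorded.
    pub fn finish(self) -> Vec<Command> {
        self.commands
    }
}

impl ToyRenderPassEncoder<'_> {
    pub fn draw(&mut self, vertices: Range<u32>, instances: Range<u32>) {
        self.buffer.commands.push(Command::Draw { vertices, instances });
    }
}

impl Drop for ToyRenderPassEncoder<'_> {
    /// Leaving the scope closes the pass automatically.
    fn drop(&mut self) {
        self.buffer.commands.push(Command::EndRenderPass);
    }
}

/// Mirrors the shape of the example above: state, pipeline, scoped pass, finish.
pub fn record_frame() -> Vec<Command> {
    let mut cmd_buffer = ToyCommandBuffer::default();
    cmd_buffer.set_viewport(640, 480);
    cmd_buffer.bind_pipeline(1);
    {
        let mut encoder = cmd_buffer.begin_render_pass();
        encoder.draw(0..6, 0..1);
    } // encoder dropped here: the pass ends and the borrow is released
    cmd_buffer.finish()
}
```

Removing the inner braces in `record_frame` makes the toy fail to compile, since `finish` would try to move the buffer while the encoder still borrows it.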

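Rendering to a texture

The render-to-texture (Render to Texture, RtT) scenario has no SwapChain. In that case, either set RenderPass.Attachment.format to the texture's format or hard-code it. Then submit to the Queue and the flow is done; there is no need, and no way, to call swap_chain.present(). To get the GPU completion event or the elapsed time for that CommandBuffer, add the corresponding callback to the CommandBuffer. Enough talk; the code tells the truth.

Configuring RenderPass.Attachment.format

Core flow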

let render_pass = {
    // The attachment is the key to Render to Texture
    let attachment = pass::Attachment { /* ... */ };
    let subpass = pass::SubpassDesc { /* ... */ };
    let dependency = pass::SubpassDependency { /* ... */ };
    // ...
};

In detail:

let render_pass = {
    let attachment = pass::Attachment {
        format: Some(format),
        samples: 1,
        ops: pass::AttachmentOps::new(
            pass::AttachmentLoadOp::Clear,
            pass::AttachmentStoreOp::Store,
        ),
        stencil_ops: pass::AttachmentOps::DONT_CARE,
        layouts: image::Layout::Undefined..image::Layout::Present,
    };

    let subpass = pass::SubpassDesc {
        colors: &[(0, image::Layout::ColorAttachmentOptimal)],
        depth_stencil: None,
        inputs: &[],
        resolves: &[],
        preserves: &[],
    };

    let dependency = pass::SubpassDependency {
        passes: pass::SubpassRef::External..pass::SubpassRef::Pass(0),
        stages: PipelineStage::COLOR_ATTACHMENT_OUTPUT..PipelineStage::COLOR_ATTACHMENT_OUTPUT,
        accesses: image::Access::empty()
            ..(image::Access::COLOR_ATTACHMENT_READ | image::Access::COLOR_ATTACHMENT_WRITE),
    };

    device
        .create_render_pass(&[attachment], &[subpass], &[dependency])
        .expect("Can't create render pass")
}; // End: RenderPass init
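The `layouts` field above is a Rust range from the initial layout to the final one, and the example ends in `Present`, which suits a swap chain image. For a texture that a later pass samples, Vulkan usage typically ends in a shader-read layout instead. The sketch below illustrates that choice with toy enums; treating the render-to-texture target as a sampled texture is my assumption, and `Layout`, `Target` and `attachment_layouts` are hypothetical names rather than gfx-hal items.

```rust
use std::ops::Range;

/// Toy image layouts; the names follow Vulkan's, but this enum is hypothetical.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Layout {
    Undefined,
    ShaderReadOnlyOptimal,
    Present,
}

/// Where the rendered image goes after the pass.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Target {
    SwapChainImage,
    SampledTexture,
}

/// Initial..final layout for the color attachment, expressed as a range
/// in the same shape as the `layouts` field.
pub fn attachment_layouts(target: Target) -> Range<Layout> {
    let final_layout = match target {
        // Shown on screen afterwards.
        Target::SwapChainImage => Layout::Present,
        // Read by a shader in a later pass (assumed RtT use case).
        Target::SampledTexture => Layout::ShaderReadOnlyOptimal,
    };
    // Starting from Undefined means the previous contents are not preserved,
    // which is fine because the attachment is cleared on load.
    Layout::Undefined..final_layout
}
```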

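Submitting to the CommandQueue

Submit exactly as when rendering to a view, minus the swap_chain.present() step. How can you verify that stopping here is enough? Reading the source is one option; on Metal, Xcode's Capture GPU Frame is another. How do you run Xcode Capture GPU Frame on a Cargo project? See my other article: "Xcode External Build System: a failed Capture GPU Frame attempt, the fix, and a post-mortem; lessons learned the hard way".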

// ... lots of previous stuff
queue.submit(submission, Some(&mut frame_fence)); // done
device.wait_for_fence(&frame_fence, !0);

Definitions of related HAL data structures

FrameSync definition

/// Synchronization primitives which will be signalled once a frame got retrieved.
///
/// The semaphore or fence _must_ be unsignalled.
pub enum FrameSync<'a, B: Backend> {
    /// Semaphore used for synchronization.
    ///
    /// Will be signaled once the frame backbuffer is available.
    Semaphore(&'a B::Semaphore),

    /// Fence used for synchronization.
    ///
    /// Will be signaled once the frame backbuffer is available.
    Fence(&'a B::Fence),
}

CommandBuffer (key data structure)

/// A strongly-typed command buffer that will only implement methods that are valid for the operations
/// it supports.
pub struct CommandBuffer<'a, B: Backend, C, S: Shot = OneShot, L: Level = Primary> {
    pub(crate) raw: &'a mut B::CommandBuffer,
    pub(crate) _marker: PhantomData<(C, S, L)>
}

Submit

/// Thread-safe finished command buffer for submission.
pub struct Submit<B: Backend, C, S, L>(pub(crate) B::CommandBuffer, pub(crate) PhantomData<(C, S, L)>);
impl<B: Backend, C, S, L> Submit<B, C, S, L> {
    fn new(buffer: B::CommandBuffer) -> Self {
        Submit(buffer, PhantomData)
    }
}
unsafe impl<B: Backend, C, S, L> Send for Submit<B, C, S, L> {}
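Both CommandBuffer and Submit carry a PhantomData field whose type parameters exist only at compile time. The sketch below shows, with toy types, how such a marker can restrict which methods are callable. `Graphics`, `Compute`, `SupportsGraphics`, `SupportsCompute` and the `Toy*` types are hypothetical stand-ins for illustration, not gfx-hal's actual capability hierarchy.

```rust
use std::marker::PhantomData;

/// Capability markers. Hypothetical stand-ins for the `C` type parameter.
pub struct Graphics;
pub struct Compute;

/// Marker trait for capabilities that may record draw calls.
pub trait SupportsGraphics {}
impl SupportsGraphics for Graphics {}

/// Marker trait for capabilities that may record dispatches.
pub trait SupportsCompute {}
impl SupportsCompute for Compute {}

/// A toy buffer whose capability lives only in the type.
pub struct ToyCommandBuffer<C> {
    commands: Vec<&'static str>,
    _marker: PhantomData<C>,
}

impl<C> ToyCommandBuffer<C> {
    pub fn new() -> Self {
        ToyCommandBuffer { commands: Vec::new(), _marker: PhantomData }
    }

    /// The finished buffer keeps the same capability marker.
    pub fn finish(self) -> ToySubmit<C> {
        ToySubmit(self.commands, PhantomData)
    }
}

impl<C: SupportsGraphics> ToyCommandBuffer<C> {
    /// Only available when `C` supports graphics.
    pub fn draw(&mut self) {
        self.commands.push("draw");
    }
}

impl<C: SupportsCompute> ToyCommandBuffer<C> {
    /// Only available when `C` supports compute.
    pub fn dispatch(&mut self) {
        self.commands.push("dispatch");
    }
}

/// A finished buffer that remembers its capability in the type.
pub struct ToySubmit<C>(Vec<&'static str>, PhantomData<C>);

impl<C> ToySubmit<C> {
    pub fn commands(&self) -> &[&'static str] {
        &self.0
    }
}

pub fn record_graphics() -> ToySubmit<Graphics> {
    let mut buffer = ToyCommandBuffer::<Graphics>::new();
    buffer.draw();
    // buffer.dispatch(); // would not compile: Graphics lacks SupportsCompute
    buffer.finish()
}
```

The marker takes no space at runtime, so the check is free: a misuse such as dispatching on a graphics-only buffer is rejected at compile time rather than reported by a driver.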

Submission

/// Submission information for a command queue, generic over a particular
/// backend and a particular queue type.
pub struct Submission<'a, B: Backend, C> {
    cmd_buffers: SmallVec<[Cow<'a, B::CommandBuffer>; 16]>,
    wait_semaphores: SmallVec<[(&'a B::Semaphore, pso::PipelineStage); 16]>,
    signal_semaphores: SmallVec<[&'a B::Semaphore; 16]>,
    marker: PhantomData<C>,
}

/////////////////////////////// submit interface /////////////////////////////////
/// Append a new list of finished command buffers to this submission.
///
/// All submits for this call must be of the same type.
/// Submission will be automatically promoted to the minimum required capability
/// to hold all passed submits.
pub fn submit<I, K>(mut self, submits: I) -> Submission<'a, B, <(C, K) as Upper>::Result>
where
    I: IntoIterator,
    I::Item: Submittable<'a, B, K, Primary>,
    (C, K): Upper
{
    self.cmd_buffers.extend(submits.into_iter().map(
        |s| { unsafe { s.into_buffer() } }
    ));
    Submission {
        cmd_buffers: self.cmd_buffers,
        wait_semaphores: self.wait_semaphores,
        signal_semaphores: self.signal_semaphores,
        marker: PhantomData,
    }
}
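The return type `<(C, K) as Upper>::Result` in `submit` is a type-level computation: appending a submit of capability `K` to a submission of capability `C` yields a submission of whatever capability covers both. The sketch below reproduces that promotion with two toy capabilities. The `Capability` trait, the rule that Graphics covers Transfer, and starting a new submission at Transfer are all assumptions made for this illustration.

```rust
use std::marker::PhantomData;

/// Capability markers; hypothetical stand-ins for gfx-hal's capability types.
pub struct Transfer;
pub struct Graphics;

/// Lets the toy report its capability at runtime for demonstration.
pub trait Capability {
    const NAME: &'static str;
}
impl Capability for Transfer {
    const NAME: &'static str = "Transfer";
}
impl Capability for Graphics {
    const NAME: &'static str = "Graphics";
}

/// Type-level function: the smallest capability covering both tuple members.
/// This toy treats Graphics as a superset of Transfer.
pub trait Upper {
    type Result: Capability;
}
impl Upper for (Transfer, Transfer) {
    type Result = Transfer;
}
impl Upper for (Transfer, Graphics) {
    type Result = Graphics;
}
impl Upper for (Graphics, Transfer) {
    type Result = Graphics;
}
impl Upper for (Graphics, Graphics) {
    type Result = Graphics;
}

/// A finished toy buffer tagged with capability `K`.
pub struct ToySubmit<K>(pub &'static str, pub PhantomData<K>);

pub struct ToySubmission<C> {
    cmd_buffers: Vec<&'static str>,
    _marker: PhantomData<C>,
}

impl ToySubmission<Transfer> {
    /// A new submission starts at the weakest capability.
    pub fn new() -> Self {
        ToySubmission { cmd_buffers: Vec::new(), _marker: PhantomData }
    }
}

impl<C> ToySubmission<C> {
    /// Appending a submit of capability `K` promotes the submission type.
    pub fn submit<K>(mut self, submit: ToySubmit<K>) -> ToySubmission<<(C, K) as Upper>::Result>
    where
        (C, K): Upper,
    {
        self.cmd_buffers.push(submit.0);
        ToySubmission { cmd_buffers: self.cmd_buffers, _marker: PhantomData }
    }
}

impl<C: Capability> ToySubmission<C> {
    pub fn capability(&self) -> &'static str {
        C::NAME
    }

    pub fn len(&self) -> usize {
        self.cmd_buffers.len()
    }
}
```

A queue that only accepts `ToySubmission<Transfer>` would then reject, at compile time, any submission that had a graphics buffer appended to it.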

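Discussion of CommandBuffer reuse, performance and related issues in Vulkan and Metal

To be updated as practical experience accumulates.

CommandBuffer reuse

Once a Metal CommandBuffer has been committed to the Queue, it cannot be used again. A Vulkan CommandBuffer can be submitted multiple times, as long as it was not recorded for one-time submission.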

After a command buffer has been committed for execution, the only valid operations on the command buffer are to wait for it to be scheduled or completed (using synchronous calls or handler blocks) and to check the status of the command buffer execution. When used, scheduled and completed handlers are blocks that are invoked in execution order. These handlers should perform quickly; if expensive or blocking work needs to be scheduled, defer that work to another thread.

In a multithreaded app, it’s advisable to break your overall task into subtasks that can be encoded separately. Create a command buffer for each chunk of work, then call the enqueue() method on these command buffer objects to establish the order of execution. Fill each buffer object (using multiple threads) and commit them. The command queue automatically schedules and executes these command buffers as they become available.

developer.apple.com/documentati…

Difference in the name of the submit-to-queue function

Metal and Vulkan use different words for submitting a CommandBuffer to the Queue: Metal uses commit(), Vulkan uses submit().
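Rust's ownership rules give a compact way to picture the reuse difference between the two APIs: a commit that takes the buffer by value can only happen once, while a submit that takes it by reference can be repeated. The toy below is purely illustrative. `MetalStyleBuffer`, `VulkanStyleBuffer` and `ToyQueue` are hypothetical, and the model ignores the conditions Vulkan places on resubmission.

```rust
/// Toy queue that just logs what reached it.
#[derive(Default)]
pub struct ToyQueue {
    log: Vec<String>,
}

/// Metal-style buffer: committing consumes it.
pub struct MetalStyleBuffer {
    label: &'static str,
}

/// Vulkan-style buffer: submitting only borrows it.
pub struct VulkanStyleBuffer {
    label: &'static str,
}

impl MetalStyleBuffer {
    pub fn new(label: &'static str) -> Self {
        MetalStyleBuffer { label }
    }

    /// Takes `self` by value, so the buffer cannot be used afterwards.
    pub fn commit(self, queue: &mut ToyQueue) {
        queue.log.push(format!("commit {}", self.label));
    }
}

impl VulkanStyleBuffer {
    pub fn new(label: &'static str) -> Self {
        VulkanStyleBuffer { label }
    }

    /// Takes `&self`, so the same recording can be submitted again.
    pub fn submit(&self, queue: &mut ToyQueue) {
        queue.log.push(format!("submit {}", self.label));
    }
}

pub fn demo() -> Vec<String> {
    let mut queue = ToyQueue::default();

    let metal = MetalStyleBuffer::new("frame");
    metal.commit(&mut queue);
    // metal.commit(&mut queue); // would not compile: `metal` was moved

    let vulkan = VulkanStyleBuffer::new("frame");
    vulkan.submit(&mut queue);
    vulkan.submit(&mut queue); // resubmitting the same recording is allowed

    queue.log
}
```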
