Dissecting the Unreal Rendering System (03): Rendering Mechanism

Posted by 0嚮往0 on 2021-03-28

 

 

3.1 Overview and Fundamentals

3.1.1 Overview of the Rendering Mechanism

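This article explains how UE organizes the objects in a scene into individual Draw Calls, which optimizations and processing steps happen along the way, and how the scene renderer renders the whole scene. The main topics are: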

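  • The mesh drawing pipeline.
  • Dynamic and static rendering paths.
  • The scene renderer.
  • The underlying concepts and optimization techniques.
  • Code analysis of the core classes and interfaces.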

Later sections cover each of these in detail.

3.1.2 Rendering Mechanism Fundamentals

As usual, to ease into the topic, this section first introduces or reviews the basic concepts and types used throughout the article.

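Type: Description
UPrimitiveComponent: The primitive component, the base class of every object that can be rendered or that takes part in physics simulation. It is the smallest unit of granularity for CPU-side culling.
FPrimitiveSceneProxy: The primitive scene proxy, the renderer-side representative of a UPrimitiveComponent. It mirrors the state of the UPrimitiveComponent on the rendering thread.
FPrimitiveSceneInfo: Internal renderer state (specific to the FRendererModule implementation) that corresponds to a UPrimitiveComponent and its FPrimitiveSceneProxy. It exists only in the renderer module, so the engine module is unaware of it.
FScene: The renderer-module representative of UWorld. Only objects added to FScene are visible to the renderer. The rendering thread owns all FScene state (the game thread must not modify it directly).
FSceneView: Describes a single view into an FScene. One FScene may have several views; in other words, a scene can be drawn through multiple views, or multiple views can be drawn at the same time. New view instances are created every frame.
FViewInfo: The renderer's internal representation of a view. It exists only in the renderer module and is invisible to the engine module.
FSceneRenderer: Created every frame to encapsulate per-frame temporary data. Its subclasses FDeferredShadingSceneRenderer (deferred shading scene renderer) and FMobileSceneRenderer (mobile scene renderer) are the default renderers for PC and mobile respectively.
FMeshBatchElement: Data for a single mesh, holding part of what mesh rendering needs: vertex and index data, UniformBuffer, assorted flags, and so on.
FMeshBatch: Holds a group of FMeshBatchElement entries that share the same material and vertex buffer.
FMeshDrawCommand: A complete description of all state and data for one Draw Call of a pass: shader bindings, vertex data, index data, cached PSO, and so on.
FMeshPassProcessor: The mesh pass processor. It processes the meshes a pass is interested in and converts each FMeshBatch into one or more FMeshDrawCommand objects.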

Note that, of the concepts above, only UPrimitiveComponent is a game-thread object; all the others belong to the rendering thread.
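The split between a game-thread object and its render-thread mirror can be illustrated with a small sketch. Everything below is hypothetical (Component, SceneProxy, RenderScene and RenderCommandQueue are illustrative names, not UE types), and the "render thread" is simulated by whoever calls Flush(); it only shows the idea that the game side never touches render-side state directly and instead queues commands that mirror its changes.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical stand-in for a render command queue.
// In this sketch "the render thread" is simply whoever calls Flush().
class RenderCommandQueue {
public:
    void Enqueue(std::function<void()> Command) { Commands.push(std::move(Command)); }
    void Flush() {
        while (!Commands.empty()) {
            Commands.front()();
            Commands.pop();
        }
    }
private:
    std::queue<std::function<void()>> Commands;
};

// Render-side mirror of a component: a snapshot of the state needed for drawing.
struct SceneProxy {
    float X = 0.0f;
};

// Render-side scene: only proxies added here are known to the renderer.
struct RenderScene {
    std::vector<std::shared_ptr<SceneProxy>> Proxies;
};

// Game-side object. Every change to render-side state travels through the queue.
class Component {
public:
    void Register(RenderScene& Scene, RenderCommandQueue& Queue) {
        assert(!Proxy && "component is already registered");
        Proxy = std::make_shared<SceneProxy>();
        Proxy->X = X;  // initial snapshot, taken before the proxy is handed over
        std::shared_ptr<SceneProxy> NewProxy = Proxy;
        Queue.Enqueue([&Scene, NewProxy] { Scene.Proxies.push_back(NewProxy); });
    }
    void SetX(float NewX, RenderCommandQueue& Queue) {
        X = NewX;  // game-side state changes immediately
        if (Proxy) {
            std::shared_ptr<SceneProxy> Target = Proxy;
            Queue.Enqueue([Target, NewX] { Target->X = NewX; });  // mirrored on Flush()
        }
    }
    float GetX() const { return X; }
private:
    float X = 0.0f;
    std::shared_ptr<SceneProxy> Proxy;
};
```

In this sketch the proxy only sees a new value once the queue is flushed, which is the property the table above describes: the render side works on its own mirrored copy of the state.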

 

3.2 Mesh Drawing Pipeline

3.2.1 Overview of the Mesh Drawing Pipeline

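Anyone who has learned a graphics API such as OpenGL or DirectX has seen code like the following (an OpenGL "Hello Triangle" style example; it draws two indexed triangles):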

void DrawTriangle()
{
    // Build the vertex and index data on the CPU.
    float vertices[] = {
         0.5f,  0.5f, 0.0f,  // top right
         0.5f, -0.5f, 0.0f,  // bottom right
        -0.5f, -0.5f, 0.0f,  // bottom left
        -0.5f,  0.5f, 0.0f   // top left 
    };
    unsigned int indices[] = {
        0, 1, 3,  // first Triangle
        1, 2, 3   // second Triangle
    };
    
    // Create the GPU-side resources and bind them.
    unsigned int VBO, VAO, EBO;
    glGenVertexArrays(1, &VAO);
    glGenBuffers(1, &VBO);
    glGenBuffers(1, &EBO);
    glBindVertexArray(VAO);

    glBindBuffer(GL_ARRAY_BUFFER, VBO);
    glBufferData(GL_ARRAY_BUFFER, sizeof(vertices), vertices, GL_STATIC_DRAW);

    glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, EBO);
    glBufferData(GL_ELEMENT_ARRAY_BUFFER, sizeof(indices), indices, GL_STATIC_DRAW);

    glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, 3 * sizeof(float), (void*)0);
    glEnableVertexAttribArray(0);

    glBindBuffer(GL_ARRAY_BUFFER, 0); 
    glBindVertexArray(0); 

    // Clear the background.
    glClearColor(0.2f, 0.3f, 0.3f, 1.0f);
    glClear(GL_COLOR_BUFFER_BIT);
    
    // Draw the triangles (shaderProgram is assumed to be compiled and linked elsewhere).
    glUseProgram(shaderProgram);
    glBindVertexArray(VAO);
    glDrawElements(GL_TRIANGLES, 6, GL_UNSIGNED_INT, 0);
}

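The Hello Triangle above goes through a few stages: building the CPU-side data, creating and binding the GPU-side resources, and calling the draw API. For a simple application, or for learning graphics, calling the graphics API directly keeps things short and to the point. A commercial game engine, however, has to render complex scenes (hundreds or thousands of Draw Calls, hundreds of thousands or even millions of triangles) at tens of frames per second, so it cannot simply issue raw graphics API calls.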

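Before a commercial engine actually calls the graphics API, it performs many operations and optimizations, such as occlusion culling, dynamic and static batching, dynamic instancing, caching state and commands, and generating intermediate commands that are later translated into graphics API commands.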

To achieve this, UE 4.21 and earlier used the Mesh Draw Pipeline shown below:

Figure: the mesh drawing pipeline in UE 4.21 and earlier.

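Roughly, at render time the renderer iterates over every FPrimitiveSceneProxy in the scene that passed the visibility tests and uses its interface to collect FMeshBatch objects. Each rendering pass then iterates over these FMeshBatch objects and uses the pass's DrawingPolicy to convert them into an RHI command list. Only at the end are the corresponding graphics API commands generated and submitted to the GPU for execution.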

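Building on this, UE 4.22 refactored the mesh drawing pipeline substantially to open up more rendering optimizations. It dropped the inefficient DrawingPolicy in favor of FMeshPassProcessor and added a new concept, FMeshDrawCommand, between FMeshBatch and the RHI commands, so that draw commands can be sorted, cached, and merged more aggressively and with finer control: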

Figure: the new mesh drawing pipeline after the UE 4.22 refactor, which adds FMeshDrawCommand, FMeshPassProcessor, and related concepts and operations.
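To make "sort, cache, and merge" more concrete, here is a minimal sketch of the sort-and-merge part. It assumes each command can be reduced to a single precomputed StateKey; the DrawCommand struct and SortAndMerge function are hypothetical simplifications and not the actual FMeshDrawCommand layout or the engine's merging code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, heavily simplified draw command. A real command would carry
// shader bindings, vertex streams, pipeline state and more.
struct DrawCommand {
    std::uint32_t StateKey;      // assumed to identify pipeline state + mesh
    std::uint32_t NumInstances;  // how many instances this command draws
};

// Sort commands so that identical state becomes adjacent, then merge each run
// of equal keys into one instanced command.
inline std::vector<DrawCommand> SortAndMerge(std::vector<DrawCommand> Commands) {
    std::stable_sort(Commands.begin(), Commands.end(),
        [](const DrawCommand& A, const DrawCommand& B) { return A.StateKey < B.StateKey; });

    std::vector<DrawCommand> Merged;
    for (const DrawCommand& Command : Commands) {
        assert(Command.NumInstances > 0);
        if (!Merged.empty() && Merged.back().StateKey == Command.StateKey) {
            Merged.back().NumInstances += Command.NumInstances;  // same state: one draw
        } else {
            Merged.push_back(Command);  // state change: start a new draw
        }
    }
    return Merged;
}
```

Because the command is a plain data object that exists before any RHI call is made, this kind of reordering and merging can happen as a separate step, which is much harder when drawing is driven directly from per-pass policy objects.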

The refactor had two main goals:

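  • Support RTX real-time ray tracing. Ray tracing has to traverse every object in the scene, so the shader resources of the whole scene must be retained.
  • Support a GPU-driven rendering pipeline. Culling then happens on the GPU, so the CPU does not know each frame's visibility, yet it cannot rebuild the draw commands for the whole scene every frame without losing real-time performance.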

To meet these goals, the refactored pipeline caches much more aggressively:

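  • Static primitives build their draw commands when they are added to the scene, and the commands are then cached.
  • The RHI layer is allowed to do as much preprocessing as possible.
    • Shader Binding Table entries.
    • Graphics Pipeline State.
  • Static meshes no longer rebuild their draw commands every frame.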
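The caching idea in the list above can be sketched as follows. This is an illustrative model only: SceneCommandCache, CachedCommand and the integer ids are hypothetical, and the real engine stores far richer commands. The point is that the build cost is paid once when a static primitive enters the scene, while each frame only looks up what is visible.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical cached command: just enough data to show the idea.
struct CachedCommand {
    int PrimitiveId;
    int PipelineStateId;
};

class SceneCommandCache {
public:
    // Called once, when a static primitive is added to the scene.
    void AddStaticPrimitive(int PrimitiveId, int PipelineStateId) {
        assert(Cached.count(PrimitiveId) == 0);
        Cached.emplace(PrimitiveId, CachedCommand{PrimitiveId, PipelineStateId});
        ++NumBuilds;  // the expensive "build a draw command" step
    }

    void RemoveStaticPrimitive(int PrimitiveId) { Cached.erase(PrimitiveId); }

    // Called every frame: visible static primitives reuse their cached command,
    // nothing is rebuilt here.
    std::vector<CachedCommand> GatherVisible(const std::vector<int>& VisibleIds) const {
        std::vector<CachedCommand> Out;
        for (int Id : VisibleIds) {
            auto It = Cached.find(Id);
            if (It != Cached.end()) {
                Out.push_back(It->second);
            }
        }
        return Out;
    }

    std::size_t GetNumBuilds() const { return NumBuilds; }

private:
    std::unordered_map<int, CachedCommand> Cached;
    std::size_t NumBuilds = 0;
};
```

With three static primitives, gathering visible commands for any number of frames leaves GetNumBuilds() at 3, which is the property the refactor is after.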

After the refactor, in most scenes the DepthPass and BasePass issue several times fewer Draw Calls, and a very large number of commands are cached:

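Figure: rendering statistics for a Fortnite test scene under the old and new mesh drawing pipelines. With the new pipeline the Draw Call count drops sharply, and the number of cached commands is very large.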

The rest of this section analyzes the refactored mesh drawing pipeline.

3.2.2 From FPrimitiveSceneProxy to FMeshBatch

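The previous article explained that FPrimitiveSceneProxy is the rendering-thread mirror of the game-thread UPrimitiveComponent. FMeshBatch is new in this section: it contains all the information a pass needs to draw, and it decouples the mesh passes from FPrimitiveSceneProxy, so an FPrimitiveSceneProxy does not know which passes will draw it.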

The main declarations of FMeshBatch and FMeshBatchElement are as follows:

// Engine\Source\Runtime\Engine\Public\MeshBatch.h

// A mesh batch element: the data one mesh inside an FMeshBatch needs.
struct FMeshBatchElement
{
    // The mesh's UniformBuffer. Must be null when GPU Scene is in use.
    FRHIUniformBuffer* PrimitiveUniformBuffer;
    // CPU-side data of the mesh's UniformBuffer.
    const TUniformBuffer<FPrimitiveUniformShaderParameters>* PrimitiveUniformBufferResource;
    // Index buffer.
    const FIndexBuffer* IndexBuffer;

    union 
    {
        uint32* InstanceRuns;
        class FSplineMeshSceneProxy* SplineMeshSceneProxy;
    };
    // User data.
    const void* UserData;
    void* VertexFactoryUserData;

    FRHIVertexBuffer* IndirectArgsBuffer;
    uint32 IndirectArgsOffset;

    // Primitive ID mode: PrimID_FromPrimitiveSceneInfo (GPU Scene mode) or PrimID_DynamicPrimitiveShaderData (each mesh has its own UniformBuffer).
    // May only be modified by the renderer.
    EPrimitiveIdMode PrimitiveIdMode : PrimID_NumBits + 1;
    uint32 DynamicPrimitiveShaderDataIndex : 24;

    uint32 FirstIndex;
    /** When 0, IndirectArgsBuffer will be used. */
    uint32 NumPrimitives;

    // Number of instances.
    uint32 NumInstances;
    uint32 BaseVertexIndex;
    uint32 MinVertexIndex;
    uint32 MaxVertexIndex;
    int32 UserIndex;
    float MinScreenSize;
    float MaxScreenSize;

    uint32 InstancedLODIndex : 4;
    uint32 InstancedLODRange : 4;
    uint32 bUserDataIsColorVertexBuffer : 1;
    uint32 bIsSplineProxy : 1;
    uint32 bIsInstanceRuns : 1;

    // Returns the number of primitives.
    int32 GetNumPrimitives() const
    {
        if (bIsInstanceRuns && InstanceRuns)
        {
            int32 Count = 0;
            for (uint32 Run = 0; Run < NumInstances; Run++)
            {
                Count += NumPrimitives * (InstanceRuns[Run * 2 + 1] - InstanceRuns[Run * 2] + 1);
            }
            return Count;
        }
        else
        {
            return NumPrimitives * NumInstances;
        }
    }
};


// A mesh batch.
struct FMeshBatch
{
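    // The FMeshBatchElement entries in this batch share the same material and vertex buffer.
    // TInlineAllocator<1> reserves inline storage for one element, so the common single-element case avoids a heap allocation.
    TArray<FMeshBatchElement,TInlineAllocator<1> > Elements; 
    const FVertexFactory* VertexFactory; // Vertex factory.
    const FMaterialRenderProxy* MaterialRenderProxy; // Material used for rendering.

    uint16 MeshIdInPrimitive; // Id of this mesh within its primitive, used for stable sorting of draws from the same primitive.
    int8 LODIndex; // LOD index of the mesh, used for smooth LOD transitions.
    uint8 SegmentIndex; // Sub-mesh (segment) index.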
    
    // Culling flags.
    uint32 ReverseCulling : 1;
    uint32 bDisableBackfaceCulling : 1;

    // Flags tied to specific rendering passes.
    uint32 CastShadow        : 1; // Whether to render in the shadow pass.
    uint32 bUseForMaterial    : 1; // Whether to render in passes that need a material.
    uint32 bUseForDepthPass : 1; // Whether to render in the depth pass.
    uint32 bUseAsOccluder    : 1; // Whether the mesh is an occluder.
    uint32 bWireframe        : 1; // Whether to render as wireframe.

    uint32 Type : PT_NumBits; // Primitive type, e.g. PT_TriangleList (default), PT_LineList, ...
    uint32 DepthPriorityGroup : SDPG_NumBits; // Depth priority group, e.g. SDPG_World (default), SDPG_Foreground

    // Other flags and data.
    const FLightCacheInterface* LCI;
    FHitProxyId BatchHitProxyId;
    float TessellationDisablingShadowMapMeshSize;
    
    uint32 bCanApplyViewModeOverrides : 1;
    uint32 bUseWireframeSelectionColoring : 1;
    uint32 bUseSelectionOutline : 1;
    uint32 bSelectable : 1;
    uint32 bRequiresPerElementVisibility : 1;
    uint32 bDitheredLODTransition : 1;
    uint32 bRenderToVirtualTexture : 1;
    uint32 RuntimeVirtualTextureMaterialType : RuntimeVirtualTexture::MaterialType_NumBits;
    
    (......)
    
    // Utility interface.
    bool IsTranslucent(ERHIFeatureLevel::Type InFeatureLevel) const;
    bool IsDecal(ERHIFeatureLevel::Type InFeatureLevel) const;
    bool IsDualBlend(ERHIFeatureLevel::Type InFeatureLevel) const;
    bool UseForHairStrands(ERHIFeatureLevel::Type InFeatureLevel) const;
    bool IsMasked(ERHIFeatureLevel::Type InFeatureLevel) const;
    int32 GetNumPrimitives() const;
    bool HasAnyDrawCalls() const;
};

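As shown, FMeshBatch records a group of FMeshBatchElement entries that share the same material and vertex factory (see the figure below), along with pass-specific flags and other required data, all of which later stages of the mesh drawing pipeline consume and process further.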

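Figure: an FMeshBatch owns a group of FMeshBatchElement entries, one vertex factory, and one material instance. All FMeshBatchElement entries of the same FMeshBatch share the same material and vertex buffer (which can be regarded as the Vertex Factory). In most cases, however, an FMeshBatch has only one FMeshBatchElement.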
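The primitive count computed by FMeshBatchElement::GetNumPrimitives() is easier to follow with numbers. The sketch below re-implements that logic on simplified, hypothetical structs (BatchElement and Batch are stand-ins, not the engine types), assuming InstanceRuns stores inclusive [first, last] instance index pairs and that NumInstances holds the number of runs when runs are in use, as the loop in the listing implies. It also assumes that the count for a batch is the sum over its elements.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Simplified stand-in for FMeshBatchElement; only the fields used below.
struct BatchElement {
    std::uint32_t NumPrimitives = 0;              // primitives per instance
    std::uint32_t NumInstances = 1;               // instance count, or run count when runs are used
    const std::uint32_t* InstanceRuns = nullptr;  // pairs of [first, last] instance indices, inclusive
    bool bIsInstanceRuns = false;
};

// Same arithmetic as the GetNumPrimitives() listing above.
inline std::uint32_t CountPrimitives(const BatchElement& Element) {
    if (Element.bIsInstanceRuns && Element.InstanceRuns != nullptr) {
        std::uint32_t Count = 0;
        for (std::uint32_t Run = 0; Run < Element.NumInstances; ++Run) {
            const std::uint32_t First = Element.InstanceRuns[Run * 2];
            const std::uint32_t Last = Element.InstanceRuns[Run * 2 + 1];
            assert(Last >= First);
            Count += Element.NumPrimitives * (Last - First + 1);
        }
        return Count;
    }
    return Element.NumPrimitives * Element.NumInstances;
}

// Simplified stand-in for FMeshBatch: a group of elements.
struct Batch {
    std::vector<BatchElement> Elements;
};

// Assumption: the batch total is the sum over its elements.
inline std::uint32_t CountPrimitives(const Batch& MeshBatch) {
    std::uint32_t Count = 0;
    for (const BatchElement& Element : MeshBatch.Elements) {
        Count += CountPrimitives(Element);
    }
    return Count;
}
```

For example, an element with 12 primitives and the two runs [0, 3] and [10, 11] covers 4 + 2 = 6 instances, so it contributes 12 * 6 = 72 primitives; a plain element with 100 primitives and 3 instances contributes 300.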

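At the start of rendering, the scene renderer FSceneRenderer runs visibility tests and culling to discard occluded and hidden objects. At the end of that stage it calls GatherDynamicMeshElements to collect the dynamic mesh elements of the visible FPrimitiveSceneProxy objects in the scene. The call flow, as schematic code: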

// Schematic call nesting only, not compilable code.
void FSceneRenderer::Render(FRHICommandListImmediate& RHICmdList)
{
    bool FDeferredShadingSceneRenderer::InitViews(FRHICommandListImmediate& RHICmdList,  ...)
    {
        void FSceneRenderer::ComputeViewVisibility(FRHICommandListImmediate& RHICmdList, ...)
        {
            FSceneRenderer::GatherDynamicMeshElements(Views, Scene, ViewFamily, DynamicIndexBuffer, DynamicVertexBuffer, DynamicReadBuffer, HasDynamicMeshElementsMasks, HasDynamicEditorMeshElementsMasks, HasViewCustomDataMasks, MeshCollector);
        }
    }
}

Next, look at what FSceneRenderer::GatherDynamicMeshElements does:

// Engine\Source\Runtime\Renderer\Private\SceneVisibility.cpp

void FSceneRenderer::GatherDynamicMeshElements(
    TArray<FViewInfo>& InViews, 
    const FScene* InScene, 
    const FSceneViewFamily& InViewFamily, 
    FGlobalDynamicIndexBuffer& DynamicIndexBuffer,
    FGlobalDynamicVertexBuffer& DynamicVertexBuffer,
    FGlobalDynamicReadBuffer& DynamicReadBuffer,
    const FPrimitiveViewMasks& HasDynamicMeshElementsMasks, 
    const FPrimitiveViewMasks& HasDynamicEditorMeshElementsMasks, 
    const FPrimitiveViewMasks& HasViewCustomDataMasks,
    FMeshElementCollector& Collector)
{
    (......)
    
    int32 NumPrimitives = InScene->Primitives.Num();

    int32 ViewCount = InViews.Num();
    {
        // Set up the FMeshElementCollector.
        Collector.ClearViewMeshArrays();
        for (int32 ViewIndex = 0; ViewIndex < ViewCount; ViewIndex++)
        {
            Collector.AddViewMeshArrays(
                &InViews[ViewIndex], 
                &InViews[ViewIndex].DynamicMeshElements,
                &InViews[ViewIndex].SimpleElementCollector,
                &InViews[ViewIndex].DynamicPrimitiveShaderData, 
                InViewFamily.GetFeatureLevel(),
                &DynamicIndexBuffer,
                &DynamicVertexBuffer,
                &DynamicReadBuffer);
        }

        const bool bIsInstancedStereo = (ViewCount > 0) ? (InViews[0].IsInstancedStereoPass() || InViews[0].bIsMobileMultiViewEnabled) : false;
        const EShadingPath ShadingPath = Scene->GetShadingPath();
        
        // Iterate over every primitive in the scene.
        for (int32 PrimitiveIndex = 0; PrimitiveIndex < NumPrimitives; ++PrimitiveIndex)
        {
            const uint8 ViewMask = HasDynamicMeshElementsMasks[PrimitiveIndex];

            if (ViewMask != 0) // Only process primitives that are not occluded or hidden in at least one view.
            {
                // Don't cull a single eye when drawing a stereo pair
                const uint8 ViewMaskFinal = (bIsInstancedStereo) ? ViewMask | 0x3 : ViewMask;

                FPrimitiveSceneInfo* PrimitiveSceneInfo = InScene->Primitives[PrimitiveIndex];
                const FPrimitiveBounds& Bounds = InScene->PrimitiveBounds[PrimitiveIndex];
                // Hand the FPrimitiveSceneProxy information to the collector.
                Collector.SetPrimitive(PrimitiveSceneInfo->Proxy, PrimitiveSceneInfo->DefaultDynamicHitProxyId);
                // Set the per-view custom data for dynamic mesh elements.
                SetDynamicMeshElementViewCustomData(InViews, HasViewCustomDataMasks, PrimitiveSceneInfo);

                // Close the previous primitive's range in DynamicMeshEndIndices (which is also where this primitive's range starts).
                if (PrimitiveIndex > 0)
                {
                    for (int32 ViewIndex = 0; ViewIndex < ViewCount; ViewIndex++)
                    {
                        InViews[ViewIndex].DynamicMeshEndIndices[PrimitiveIndex - 1] = Collector.GetMeshBatchCount(ViewIndex);
                    }
                }
                
                // Collect the dynamic mesh elements.
                PrimitiveSceneInfo->Proxy->GetDynamicMeshElements(InViewFamily.Views, InViewFamily, ViewMaskFinal, Collector);

                // Record the end of this primitive's range in DynamicMeshEndIndices.
                for (int32 ViewIndex = 0; ViewIndex < ViewCount; ViewIndex++)
                {
                    InViews[ViewIndex].DynamicMeshEndIndices[PrimitiveIndex] = Collector.GetMeshBatchCount(ViewIndex);
                }
                
                // Process MeshPass-related data and flags.
                for (int32 ViewIndex = 0; ViewIndex < ViewCount; ViewIndex++)
                {
                    if (ViewMaskFinal & (1 << ViewIndex))
                    {
                        FViewInfo& View = InViews[ViewIndex];
                        const bool bAddLightmapDensityCommands = View.Family->EngineShowFlags.LightMapDensity && AllowDebugViewmodes();
                        const FPrimitiveViewRelevance& ViewRelevance = View.PrimitiveViewRelevanceMap[PrimitiveIndex];

                        const int32 LastNumDynamicMeshElements = View.DynamicMeshElementsPassRelevance.Num();
                        View.DynamicMeshElementsPassRelevance.SetNum(View.DynamicMeshElements.Num());

                        for (int32 ElementIndex = LastNumDynamicMeshElements; ElementIndex < View.DynamicMeshElements.Num(); ++ElementIndex)
                        {
                            const FMeshBatchAndRelevance& MeshBatch = View.DynamicMeshElements[ElementIndex];
                            FMeshPassMask& PassRelevance = View.DynamicMeshElementsPassRelevance[ElementIndex];
                            // Compute which MeshPasses reference this MeshBatch and update the view's per-pass counters accordingly.
                            ComputeDynamicMeshRelevance(ShadingPath, bAddLightmapDensityCommands, ViewRelevance, MeshBatch, View, PassRelevance, PrimitiveSceneInfo, Bounds);
                        }
                    }
                }
            }
        }
    }

    (......)
    
    // Run the tasks queued on the collector.
    MeshCollector.ProcessTasks();
}

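As the code above shows, gathering dynamic primitive data registers each View's output arrays with the FMeshElementCollector so that it can collect the mesh data of every visible FPrimitiveSceneProxy in the scene. The key line in the middle, PrimitiveSceneInfo->Proxy->GetDynamicMeshElements(), gives each primitive the chance to add its visible mesh elements to the renderer (through the collector). Let us expand this function. The base-class FPrimitiveSceneProxy implementation has an empty body and does nothing, so the actual collection is implemented by concrete subclasses; the FSkeletalMeshSceneProxy implementation serves as the example here: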

// Engine\Source\Runtime\Engine\Private\SkeletalMesh.cpp

void FSkeletalMeshSceneProxy::GetDynamicMeshElements(const TArray<const FSceneView*>& Views, const FSceneViewFamily& ViewFamily, uint32 VisibilityMap, FMeshElementCollector& Collector) const
{
    GetMeshElementsConditionallySelectable(Views, ViewFamily, true, VisibilityMap, Collector);
}

void FSkeletalMeshSceneProxy::GetMeshElementsConditionallySelectable(const TArray<const FSceneView*>& Views, const FSceneViewFamily& ViewFamily, bool bInSelectable, uint32 VisibilityMap, FMeshElementCollector& Collector) const
{
    (......)

    const int32 LODIndex = MeshObject->GetLOD();
    const FSkeletalMeshLODRenderData& LODData = SkeletalMeshRenderData->LODRenderData[LODIndex];

    if( LODSections.Num() > 0 && LODIndex >= SkeletalMeshRenderData->CurrentFirstLODIdx )
    {
        const FLODSectionElements& LODSection = LODSections[LODIndex];
        
        // Iterate over all sections (sub-meshes) of this LOD and add them to the collector.
        for (FSkeletalMeshSectionIter Iter(LODIndex, *MeshObject, LODData, LODSection); Iter; ++Iter)
        {
            const FSkelMeshRenderSection& Section = Iter.GetSection();
            const int32 SectionIndex = Iter.GetSectionElementIndex();
            const FSectionElementInfo& SectionElementInfo = Iter.GetSectionElementInfo();

            bool bSectionSelected = false;
            if (MeshObject->IsMaterialHidden(LODIndex, SectionElementInfo.UseMaterialIndex) || Section.bDisabled)
            {
                continue;
            }
            // Add the section identified by LODIndex and SectionIndex to the Collector.
            GetDynamicElementsSection(Views, ViewFamily, VisibilityMap, LODData, LODIndex, SectionIndex, bSectionSelected, SectionElementInfo, bInSelectable, Collector);
        }
    }
    
    (......)
}

void FSkeletalMeshSceneProxy::GetDynamicElementsSection(const TArray<const FSceneView*>& Views, const FSceneViewFamily& ViewFamily, uint32 VisibilityMap, const FSkeletalMeshLODRenderData& LODData, const int32 LODIndex, const int32 SectionIndex, bool bSectionSelected, const FSectionElementInfo& SectionElementInfo, bool bInSelectable, FMeshElementCollector& Collector ) const
{
    const FSkelMeshRenderSection& Section = LODData.RenderSections[SectionIndex];
    const bool bIsSelected = false;
    const bool bIsWireframe = ViewFamily.EngineShowFlags.Wireframe;

    for (int32 ViewIndex = 0; ViewIndex < Views.Num(); ViewIndex++)
    {
        if (VisibilityMap & (1 << ViewIndex))
        {
            const FSceneView* View = Views[ViewIndex];
            
            // Allocate an FMeshBatch from the Collector.
            FMeshBatch& Mesh = Collector.AllocateMesh();
            
            // Fill in the base mesh batch (including its FMeshBatchElement).
            CreateBaseMeshBatch(View, LODData, LODIndex, SectionIndex, SectionElementInfo, Mesh);
            
            if(!Mesh.VertexFactory)
            {
                // hide this part
                continue;
            }

            Mesh.bWireframe |= bForceWireframe;
            Mesh.Type = PT_TriangleList;
            Mesh.bSelectable = bInSelectable;
            
            // Configure the first FMeshBatchElement.
            FMeshBatchElement& BatchElement = Mesh.Elements[0];
            const bool bRequiresAdjacencyInformation = RequiresAdjacencyInformation( SectionElementInfo.Material, Mesh.VertexFactory->GetType(), ViewFamily.GetFeatureLevel() );
            if ( bRequiresAdjacencyInformation )
            {
                check(LODData.AdjacencyMultiSizeIndexContainer.IsIndexBufferValid() );
                BatchElement.IndexBuffer = LODData.AdjacencyMultiSizeIndexContainer.GetIndexBuffer();
                Mesh.Type = PT_12_ControlPointPatchList;
                BatchElement.FirstIndex *= 4;
            }

            BatchElement.MinVertexIndex = Section.BaseVertexIndex;
            Mesh.ReverseCulling = IsLocalToWorldDeterminantNegative();
            Mesh.CastShadow = SectionElementInfo.bEnableShadowCasting;
            Mesh.bCanApplyViewModeOverrides = true;
            Mesh.bUseWireframeSelectionColoring = bIsSelected;
            
            (......)

            if ( ensureMsgf(Mesh.MaterialRenderProxy, TEXT("GetDynamicElementsSection with invalid MaterialRenderProxy. Owner:%s LODIndex:%d UseMaterialIndex:%d"), *GetOwnerName().ToString(), LODIndex, SectionElementInfo.UseMaterialIndex) &&
                 ensureMsgf(Mesh.MaterialRenderProxy->GetMaterial(FeatureLevel), TEXT("GetDynamicElementsSection with invalid FMaterial. Owner:%s LODIndex:%d UseMaterialIndex:%d"), *GetOwnerName().ToString(), LODIndex, SectionElementInfo.UseMaterialIndex) )
            {
                // Add the FMeshBatch to the collector.
                Collector.AddMesh(ViewIndex, Mesh);
            }
            
            (......)
        }
    }
}
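
Both GatherDynamicMeshElements and GetDynamicElementsSection rely on a per-primitive bitmask in which bit i means "relevant to view i", tested with expressions such as VisibilityMap & (1 << ViewIndex). The following sketch isolates that pattern, including the ViewMask | 0x3 adjustment that keeps both eyes of an instanced stereo pair together. VisibleViews is a hypothetical helper written for illustration, not an engine function.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Returns the indices of the views whose bit is set in the mask.
// Bit i of ViewMask set means the primitive is relevant to view i.
inline std::vector<int> VisibleViews(std::uint8_t ViewMask, int ViewCount, bool bInstancedStereo) {
    assert(ViewCount >= 0 && ViewCount <= 8);  // a uint8 mask can describe at most 8 views

    // For an instanced stereo pair, never cull a single eye: force bits 0 and 1 on.
    const std::uint8_t FinalMask =
        bInstancedStereo ? static_cast<std::uint8_t>(ViewMask | 0x3) : ViewMask;

    std::vector<int> Result;
    for (int ViewIndex = 0; ViewIndex < ViewCount; ++ViewIndex) {
        if (FinalMask & (1u << ViewIndex)) {
            Result.push_back(ViewIndex);
        }
    }
    return Result;
}
```

With a mask of 0x5 and three views the helper returns views 0 and 2; with a mask of 0x2 and instanced stereo enabled it returns both view 0 and view 1.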

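As the FSkeletalMeshSceneProxy code shows, the proxy adds one FMeshBatch per section of the current LOD index, and each FMeshBatch holds exactly one FMeshBatchElement. FSceneRenderer::GatherDynamicMeshElements also contains another key call, ComputeDynamicMeshRelevance, which computes which MeshPasses reference the current MeshBatch and adds its element count to the view's counters for those passes: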

// Engine\Source\Runtime\Renderer\Private\SceneVisibility.cpp

void ComputeDynamicMeshRelevance(EShadingPath ShadingPath, bool bAddLightmapDensityCommands, const FPrimitiveViewRelevance& ViewRelevance, const FMeshBatchAndRelevance& MeshBatch, FViewInfo& View, FMeshPassMask& PassMask, FPrimitiveSceneInfo* PrimitiveSceneInfo, const FPrimitiveBounds& Bounds)
{
    const int32 NumElements = MeshBatch.Mesh->Elements.Num();

    // Depth pass / base pass counters.
    if (ViewRelevance.bDrawRelevance && (ViewRelevance.bRenderInMainPass || ViewRelevance.bRenderCustomDepth || ViewRelevance.bRenderInDepthPass))
    {
        PassMask.Set(EMeshPass::DepthPass);
        View.NumVisibleDynamicMeshElements[EMeshPass::DepthPass] += NumElements;

        if (ViewRelevance.bRenderInMainPass || ViewRelevance.bRenderCustomDepth)
        {
            PassMask.Set(EMeshPass::BasePass);
            View.NumVisibleDynamicMeshElements[EMeshPass::BasePass] += NumElements;

            if (ShadingPath == EShadingPath::Mobile)
            {
                PassMask.Set(EMeshPass::MobileBasePassCSM);
                View.NumVisibleDynamicMeshElements[EMeshPass::MobileBasePassCSM] += NumElements;
            }

            if (ViewRelevance.bRenderCustomDepth)
            {
                PassMask.Set(EMeshPass::CustomDepth);
                View.NumVisibleDynamicMeshElements[EMeshPass::CustomDepth] += NumElements;
            }

            if (bAddLightmapDensityCommands)
            {
                PassMask.Set(EMeshPass::LightmapDensity);
                View.NumVisibleDynamicMeshElements[EMeshPass::LightmapDensity] += NumElements;
            }


            if (ViewRelevance.bVelocityRelevance)
            {
                PassMask.Set(EMeshPass::Velocity);
                View.NumVisibleDynamicMeshElements[EMeshPass::Velocity] += NumElements;
            }

            if (ViewRelevance.bOutputsTranslucentVelocity)
            {
                PassMask.Set(EMeshPass::TranslucentVelocity);
                View.NumVisibleDynamicMeshElements[EMeshPass::TranslucentVelocity] += NumElements;
            }

            if (ViewRelevance.bUsesSingleLayerWaterMaterial)
            {
                PassMask.Set(EMeshPass::SingleLayerWaterPass);
                View.NumVisibleDynamicMeshElements[EMeshPass::SingleLayerWaterPass] += NumElements;
            }
        }
    }
    
    // Translucency and other pass counters.
    if (ViewRelevance.HasTranslucency()
        && !ViewRelevance.bEditorPrimitiveRelevance
        && ViewRelevance.bRenderInMainPass)
    {
        if (View.Family->AllowTranslucencyAfterDOF())
        {
            if (ViewRelevance.bNormalTranslucency)
            {
                PassMask.Set(EMeshPass::TranslucencyStandard);
                View.NumVisibleDynamicMeshElements[EMeshPass::TranslucencyStandard] += NumElements;
            }

            if (ViewRelevance.bSeparateTranslucency)
            {
                PassMask.Set(EMeshPass::TranslucencyAfterDOF);
                View.NumVisibleDynamicMeshElements[EMeshPass::TranslucencyAfterDOF] += NumElements;
            }

            if (ViewRelevance.bSeparateTranslucencyModulate)
            {
                PassMask.Set(EMeshPass::TranslucencyAfterDOFModulate);
                View.NumVisibleDynamicMeshElements[EMeshPass::TranslucencyAfterDOFModulate] += NumElements;
            }
        }
        else
        {
            PassMask.Set(EMeshPass::TranslucencyAll);
            View.NumVisibleDynamicMeshElements[EMeshPass::TranslucencyAll] += NumElements;
        }

        if (ViewRelevance.bDistortion)
        {
            PassMask.Set(EMeshPass::Distortion);
            View.NumVisibleDynamicMeshElements[EMeshPass::Distortion] += NumElements;
        }

        if (ShadingPath == EShadingPath::Mobile && View.bIsSceneCapture)
        {
            PassMask.Set(EMeshPass::MobileInverseOpacity);
            View.NumVisibleDynamicMeshElements[EMeshPass::MobileInverseOpacity] += NumElements;
        }
    }

    (......)
}
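
The pattern in ComputeDynamicMeshRelevance (set a bit in the batch's pass mask, then add the element count to the view's per-pass counter) can be reduced to a small self-contained sketch. The types below are hypothetical simplifications with only four passes; the nesting of the conditions follows the opaque branch of the listing above, but nothing here is the engine's actual FMeshPassMask or FViewInfo.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Reduced, hypothetical pass list for illustration.
enum class MeshPass : std::uint8_t { DepthPass, BasePass, CustomDepth, Velocity, Num };
constexpr std::size_t NumMeshPasses = static_cast<std::size_t>(MeshPass::Num);

// Subset of the relevance flags used by the opaque branch.
struct Relevance {
    bool bDrawRelevance = false;
    bool bRenderInMainPass = false;
    bool bRenderInDepthPass = false;
    bool bRenderCustomDepth = false;
    bool bVelocityRelevance = false;
};

// One bit per pass: which passes reference a given mesh batch.
struct PassMask {
    std::uint32_t Bits = 0;
    void Set(MeshPass Pass) { Bits |= 1u << static_cast<unsigned>(Pass); }
    bool Get(MeshPass Pass) const { return ((Bits >> static_cast<unsigned>(Pass)) & 1u) != 0; }
};

// Per-view counters: how many dynamic mesh elements each pass will have to process.
struct ViewCounters {
    std::array<int, NumMeshPasses> NumVisibleElements{};
};

inline void MarkPass(MeshPass Pass, int NumElements, PassMask& Mask, ViewCounters& View) {
    Mask.Set(Pass);
    View.NumVisibleElements[static_cast<std::size_t>(Pass)] += NumElements;
}

inline PassMask ComputeRelevance(const Relevance& R, int NumElements, ViewCounters& View) {
    assert(NumElements >= 0);
    PassMask Mask;
    if (R.bDrawRelevance && (R.bRenderInMainPass || R.bRenderCustomDepth || R.bRenderInDepthPass)) {
        MarkPass(MeshPass::DepthPass, NumElements, Mask, View);
        if (R.bRenderInMainPass || R.bRenderCustomDepth) {
            MarkPass(MeshPass::BasePass, NumElements, Mask, View);
            if (R.bRenderCustomDepth) {
                MarkPass(MeshPass::CustomDepth, NumElements, Mask, View);
            }
            if (R.bVelocityRelevance) {
                MarkPass(MeshPass::Velocity, NumElements, Mask, View);
            }
        }
    }
    return Mask;
}
```

The counters matter because they let a later stage size its per-pass work up front: each pass knows how many dynamic elements to expect before any draw command is built.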

The code in this section also relies on the collector FMeshElementCollector, which gathers all visible MeshBatch data for the given views. Its declaration is:

// Engine\Source\Runtime\Engine\Public\SceneManagement.h

class FMeshElementCollector
{
public:
    // Interface for drawing points, lines, surfaces, and sprites.
    FPrimitiveDrawInterface* GetPDI(int32 ViewIndex)
    {
        return SimpleElementCollectors[ViewIndex];
    }
    // Allocate an FMeshBatch object.
    FMeshBatch& AllocateMesh()
    {
        const int32 Index = MeshBatchStorage.Add(1);
        return MeshBatchStorage[Index];
    }
    
    // Add a MeshBatch to the collector. The call initializes and sets up the related data, then appends the batch to the MeshBatches list.
    void AddMesh(int32 ViewIndex, FMeshBatch& MeshBatch);
    
    // Data accessors.
    FGlobalDynamicIndexBuffer& GetDynamicIndexBuffer();
    FGlobalDynamicVertexBuffer& GetDynamicVertexBuffer();
    FGlobalDynamicReadBuffer& GetDynamicReadBuffer();
    uint32 GetMeshBatchCount(uint32 ViewIndex) const;
    uint32 GetMeshElementCount(uint32 ViewIndex) const;
    ERHIFeatureLevel::Type GetFeatureLevel() const;

    void RegisterOneFrameMaterialProxy(FMaterialRenderProxy* Proxy);
    template<typename T, typename... ARGS>
    T& AllocateOneFrameResource(ARGS&&... Args);
    bool ShouldUseTasks() const;
    
    // Task interface.
    void AddTask(TFunction<void()>&& Task)
    {
        ParallelTasks.Add(new (FMemStack::Get()) TFunction<void()>(MoveTemp(Task)));
    }
    void AddTask(const TFunction<void()>& Task)
    {
        ParallelTasks.Add(new (FMemStack::Get()) TFunction<void()>(Task));
    }
    void ProcessTasks();
    
protected:
    FMeshElementCollector(ERHIFeatureLevel::Type InFeatureLevel);

    // Set the FPrimitiveSceneProxy currently being collected.
    void SetPrimitive(const FPrimitiveSceneProxy* InPrimitiveSceneProxy, FHitProxyId DefaultHitProxyId)
    {
        check(InPrimitiveSceneProxy);
        PrimitiveSceneProxy = InPrimitiveSceneProxy;

        for (int32 ViewIndex = 0; ViewIndex < SimpleElementCollectors.Num(); ViewIndex++)
        {
            SimpleElementCollectors[ViewIndex]->HitProxyId = DefaultHitProxyId;
            SimpleElementCollectors[ViewIndex]->PrimitiveMeshId = 0;
        }

        for (int32 ViewIndex = 0; ViewIndex < MeshIdInPrimitivePerView.Num(); ++ViewIndex)
        {
            MeshIdInPrimitivePerView[ViewIndex] = 0;
        }
    }

    void ClearViewMeshArrays();

    // Register a view together with its output mesh arrays.
    void AddViewMeshArrays(
        FSceneView* InView, 
        TArray<FMeshBatchAndRelevance,SceneRenderingAllocator>* ViewMeshes,
        FSimpleElementCollector* ViewSimpleElementCollector, 
        TArray<FPrimitiveUniformShaderParameters>* InDynamicPrimitiveShaderData,
        ERHIFeatureLevel::Type InFeatureLevel,
        FGlobalDynamicIndexBuffer* InDynamicIndexBuffer,
        FGlobalDynamicVertexBuffer* InDynamicVertexBuffer,
        FGlobalDynamicReadBuffer* InDynamicReadBuffer);

    TChunkedArray<FMeshBatch> MeshBatchStorage; // Storage for every allocated FMeshBatch.
    TArray<TArray<FMeshBatchAndRelevance, SceneRenderingAllocator>*, TInlineAllocator<2> > MeshBatches; // FMeshBatch entries to be rendered, per view.
    TArray<int32, TInlineAllocator<2> > NumMeshBatchElementsPerView; // Number of MeshBatchElements collected per view.
    TArray<FSimpleElementCollector*, TInlineAllocator<2> > SimpleElementCollectors; // Collectors for simple elements such as points, lines, surfaces, and sprites.

    TArray<FSceneView*, TInlineAllocator<2> > Views; // The FSceneView instances this collector gathers for.
    TArray<uint16, TInlineAllocator<2> > MeshIdInPrimitivePerView; // Current Mesh Id In Primitive per view
    TArray<TArray<FPrimitiveUniformShaderParameters>*, TInlineAllocator<2> > DynamicPrimitiveShaderDataPerView; // Per-view dynamic primitive data, used to update the GPU Scene.
    
    TArray<FMaterialRenderProxy*, SceneRenderingAllocator> TemporaryProxies;
    TArray<FOneFrameResource*, SceneRenderingAllocator> OneFrameResources;

    const FPrimitiveSceneProxy* PrimitiveSceneProxy; // The PrimitiveSceneProxy currently being collected.

    // Global dynamic buffers.
    FGlobalDynamicIndexBuffer* DynamicIndexBuffer;
    FGlobalDynamicVertexBuffer* DynamicVertexBuffer;
    FGlobalDynamicReadBuffer* DynamicReadBuffer;

    ERHIFeatureLevel::Type FeatureLevel;

    const bool bUseAsyncTasks; // Whether to use async tasks.
    TArray<TFunction<void()>*, SceneRenderingAllocator> ParallelTasks; // Tasks to run after the dynamic mesh data has been collected.
};

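Judging from the code above, a single FMeshElementCollector serves all the views being rendered: at the start of rendering each View registers its own output arrays with it (see AddViewMeshArrays and the per-view array members). Once collection finishes, each view holds the list of FMeshBatch entries it needs to render, together with their bookkeeping data and state, ready for the later stages of the pipeline.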

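In addition, FMeshElementCollector can hold a list of tasks that must be processed after the mesh data has been collected, which lets that work run in parallel across threads and be synchronized at a single point.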
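The AddTask / ProcessTasks pair can be sketched as a deferred task list. The class below is a hypothetical, sequential simplification (TaskCollector is not an engine type): proxies queue work during collection and the owner later runs everything at one synchronization point. A parallel variant would dispatch the queued tasks to worker threads and wait for them inside ProcessTasks; that part is omitted here.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical, sequential stand-in for the collector's task interface.
class TaskCollector {
public:
    // Queue work that must finish before the collected data is consumed.
    void AddTask(std::function<void()> Task) {
        assert(static_cast<bool>(Task));
        Tasks.push_back(std::move(Task));
    }

    // Single synchronization point: after this returns, every queued task has run.
    void ProcessTasks() {
        for (std::function<void()>& Task : Tasks) {
            Task();
        }
        Tasks.clear();
    }

    std::size_t NumPendingTasks() const { return Tasks.size(); }

private:
    std::vector<std::function<void()>> Tasks;
};
```

The useful property is that queuing is cheap and side-effect free; nothing observable happens until ProcessTasks is called, which mirrors the MeshCollector.ProcessTasks() call at the end of GatherDynamicMeshElements.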

3.2.3 From FMeshBatch to FMeshDrawCommand

The previous section ended with the collection of dynamic MeshElements. Immediately afterwards, SetupMeshPass is called to create the FMeshPassProcessor objects:

// Schematic call nesting only, not compilable code.
void FSceneRenderer::Render(FRHICommandListImmediate& RHICmdList)
{
    bool FDeferredShadingSceneRenderer::InitViews(FRHICommandListImmediate& RHICmdList,  ...)
    {
        void FSceneRenderer::ComputeViewVisibility(FRHICommandListImmediate& RHICmdList, ...)
        {
            // Collect dynamic MeshElements.
            FSceneRenderer::GatherDynamicMeshElements(Views, Scene, ViewFamily, DynamicIndexBuffer, DynamicVertexBuffer, DynamicReadBuffer, HasDynamicMeshElementsMasks, HasDynamicEditorMeshElementsMasks, HasViewCustomDataMasks, MeshCollector);
            
            // Set up the FMeshPassProcessor objects for every view.
            for (int32 ViewIndex = 0; ViewIndex < Views.Num(); ViewIndex++)
            {
                FViewInfo& View = Views[ViewIndex];
                if (!View.ShouldRenderView())
                {
                    continue;
                }
                
                // Set up the FMeshPassProcessor objects for this view.
                FViewCommands& ViewCommands = ViewCommandsPerView[ViewIndex];
                SetupMeshPass(View, BasePassDepthStencilAccess, ViewCommands);
            }
        }
    }
}

The logic of FSceneRenderer::SetupMeshPass, with comments, is as follows:

void FSceneRenderer::SetupMeshPass(FViewInfo& View, FExclusiveDepthStencil::Type BasePassDepthStencilAccess, FViewCommands& ViewCommands)
{
    const EShadingPath ShadingPath = Scene->GetShadingPath();
    
    // Iterate over every pass defined in EMeshPass.
    for (int32 PassIndex = 0; PassIndex < EMeshPass::Num; PassIndex++)
    {
        const EMeshPass::Type PassType = (EMeshPass::Type)PassIndex;
        
        if ((FPassProcessorManager::GetPassFlags(ShadingPath, PassType) & EMeshPassFlags::MainView) != EMeshPassFlags::None)
        {
            (......)

            // Create the FMeshPassProcessor.
            PassProcessorCreateFunction CreateFunction = FPassProcessorManager::GetCreateFunction(ShadingPath, PassType);
            FMeshPassProcessor* MeshPassProcessor = CreateFunction(Scene, &View, nullptr);

            // Get the FParallelMeshDrawCommandPass object for this pass.
            FParallelMeshDrawCommandPass& Pass = View.ParallelMeshDrawCommandPasses[PassIndex];

            if (ShouldDumpMeshDrawCommandInstancingStats())
            {
                Pass.SetDumpInstancingStats(GetMeshPassName(PassType));
            }

            // Dispatch the setup task for this pass, which builds all of its draw commands in parallel.
            Pass.DispatchPassSetup(
                Scene,
                View,
                PassType,
                BasePassDepthStencilAccess,
                MeshPassProcessor,
                View.DynamicMeshElements,
                &View.DynamicMeshElementsPassRelevance,
                View.NumVisibleDynamicMeshElements[PassType],
                ViewCommands.DynamicMeshCommandBuildRequests[PassType],
                ViewCommands.NumDynamicMeshCommandBuildRequestElements[PassType],
                ViewCommands.MeshCommands[PassIndex]);
        }
    }
}
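
SetupMeshPass looks up a creation function per (shading path, pass) pair and only instantiates processors for passes flagged as main-view passes. A minimal sketch of such a lookup table follows. All names here (PassProcessorRegistry, PassProcessor, the reduced enums, the factory functions) are hypothetical and chosen for illustration; the sketch does not reproduce the interface of FPassProcessorManager.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Reduced, hypothetical enums for illustration.
enum class ShadingPath { Deferred, Mobile, Num };
enum class MeshPass { DepthPass, BasePass, EditorSelection, Num };

struct PassProcessor {
    virtual ~PassProcessor() = default;
    virtual std::string GetName() const = 0;
};
struct DepthPassProcessor : PassProcessor {
    std::string GetName() const override { return "DepthPass"; }
};
struct BasePassProcessor : PassProcessor {
    std::string GetName() const override { return "BasePass"; }
};

inline std::unique_ptr<PassProcessor> CreateDepthPassProcessor() {
    return std::unique_ptr<PassProcessor>(new DepthPassProcessor());
}
inline std::unique_ptr<PassProcessor> CreateBasePassProcessor() {
    return std::unique_ptr<PassProcessor>(new BasePassProcessor());
}

using CreateFunction = std::unique_ptr<PassProcessor> (*)();

// Table indexed by [shading path][pass], holding a factory and a main-view flag.
class PassProcessorRegistry {
public:
    void Register(ShadingPath Path, MeshPass Pass, CreateFunction Function, bool bMainView) {
        assert(Function != nullptr);
        Entry& Slot = Table[ToIndex(Path)][ToIndex(Pass)];
        Slot.Function = Function;
        Slot.bMainView = bMainView;
    }
    bool IsMainViewPass(ShadingPath Path, MeshPass Pass) const {
        return Table[ToIndex(Path)][ToIndex(Pass)].bMainView;
    }
    std::unique_ptr<PassProcessor> Create(ShadingPath Path, MeshPass Pass) const {
        const Entry& Slot = Table[ToIndex(Path)][ToIndex(Pass)];
        assert(Slot.Function != nullptr);
        return Slot.Function();
    }

private:
    struct Entry {
        CreateFunction Function = nullptr;
        bool bMainView = false;
    };
    static std::size_t ToIndex(ShadingPath Path) { return static_cast<std::size_t>(Path); }
    static std::size_t ToIndex(MeshPass Pass) { return static_cast<std::size_t>(Pass); }

    std::array<std::array<Entry, static_cast<std::size_t>(MeshPass::Num)>,
               static_cast<std::size_t>(ShadingPath::Num)> Table{};
};

// Mirrors the shape of the SetupMeshPass loop: visit every pass, skip the ones
// that are not main-view passes, create a processor for the rest.
inline std::vector<std::string> SetupMeshPasses(const PassProcessorRegistry& Registry, ShadingPath Path) {
    std::vector<std::string> Created;
    for (std::size_t PassIndex = 0; PassIndex < static_cast<std::size_t>(MeshPass::Num); ++PassIndex) {
        const MeshPass Pass = static_cast<MeshPass>(PassIndex);
        if (!Registry.IsMainViewPass(Path, Pass)) {
            continue;
        }
        std::unique_ptr<PassProcessor> Processor = Registry.Create(Path, Pass);
        Created.push_back(Processor->GetName());
    }
    return Created;
}
```

A table like this keeps the loop itself free of per-pass knowledge: adding a pass means registering one more factory, not editing the loop.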

The EMeshPass enum used by SetupMeshPass is defined as follows:

// Engine\Source\Runtime\Renderer\Public\MeshPassProcessor.h

namespace EMeshPass
{
    enum Type
    {
        DepthPass,            // Depth
        BasePass,            // Base (geometry)
        SkyPass,             // Sky
        SingleLayerWaterPass, // Single-layer water
        CSMShadowDepth,     // Cascaded shadow map depth
        Distortion,         // Distortion
        Velocity,             // Velocity
        
        // Translucency-related passes
        TranslucentVelocity,
        TranslucencyStandard,
        TranslucencyAfterDOF, 
        TranslucencyAfterDOFModulate,
        TranslucencyAll, 
        
        LightmapDensity,     // Lightmap density
        DebugViewMode,        // Debug view mode
        CustomDepth,        // Custom depth
        MobileBasePassCSM,
        MobileInverseOpacity, 
        VirtualTexture,        // Virtual texture

        // Special passes for editor builds
#if WITH_EDITOR
        HitProxy,
        HitProxyOpaqueOnly,
        EditorSelection,
#endif

        Num,
        NumBits = 5,
    };
}

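As the EMeshPass enum shows, UE enumerates up front every pass that might need drawing, and during SetupMeshPass it generates the DrawCommands for the passes actually in use in parallel. The main logic of FParallelMeshDrawCommandPass::DispatchPassSetup, with comments, is as follows: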

// Engine\Source\Runtime\Renderer\Private\MeshDrawCommands.cpp

void FParallelMeshDrawCommandPass::DispatchPassSetup(
    FScene* Scene,
    const FViewInfo& View,
    EMeshPass::Type PassType,
    FExclusiveDepthStencil::Type BasePassDepthStencilAccess,
    FMeshPassProcessor* MeshPassProcessor,
    const TArray<FMeshBatchAndRelevance, SceneRenderingAllocator>& DynamicMeshElements,
    const TArray<FMeshPassMask, SceneRenderingAllocator>* DynamicMeshElementsPassRelevance,
    int32 NumDynamicMeshElements,
    TArray<const FStaticMeshBatch*, SceneRenderingAllocator>& InOutDynamicMeshCommandBuildRequests,
    int32 NumDynamicMeshCommandBuildRequestElements,
    FMeshCommandOneFrameArray& InOutMeshDrawCommands,
    FMeshPassProcessor* MobileBasePassCSMMeshPassProcessor,
    FMeshCommandOneFrameArray* InOutMobileBasePassCSMMeshDrawCommands
)
{
    MaxNumDraws = InOutMeshDrawCommands.Num() + NumDynamicMeshElements + NumDynamicMeshCommandBuildRequestElements;
    
    // Fill in TaskContext with the data needed to generate the mesh draw commands.
    TaskContext.MeshPassProcessor = MeshPassProcessor;
    TaskContext.MobileBasePassCSMMeshPassProcessor = MobileBasePassCSMMeshPassProcessor;
    TaskContext.DynamicMeshElements = &DynamicMeshElements;
    TaskContext.DynamicMeshElementsPassRelevance = DynamicMeshElementsPassRelevance;

    TaskContext.View = &View;
    TaskContext.ShadingPath = Scene->GetShadingPath();
    TaskContext.ShaderPlatform = Scene->GetShaderPlatform();
    TaskContext.PassType = PassType;
    TaskContext.bUseGPUScene = UseGPUScene(GMaxRHIShaderPlatform, View.GetFeatureLevel());
    TaskContext.bDynamicInstancing = IsDynamicInstancingEnabled(View.GetFeatureLevel());
    TaskContext.bReverseCulling = View.bReverseCulling;
    TaskContext.bRenderSceneTwoSided = View.bRenderSceneTwoSided;
    TaskContext.BasePassDepthStencilAccess = BasePassDepthStencilAccess;
    TaskContext.DefaultBasePassDepthStencilAccess = Scene->DefaultBasePassDepthStencilAccess;
    TaskContext.NumDynamicMeshElements = NumDynamicMeshElements;
    TaskContext.NumDynamicMeshCommandBuildRequestElements = NumDynamicMeshCommandBuildRequestElements;

    // Only apply instancing for ISR to main view passes
    const bool bIsMainViewPass = PassType != EMeshPass::Num && (FPassProcessorManager::GetPassFlags(TaskContext.ShadingPath, TaskContext.PassType) & EMeshPassFlags::MainView) != EMeshPassFlags::None;
    TaskContext.InstanceFactor = (bIsMainViewPass && View.IsInstancedStereoPass()) ? 2 : 1;

    // Set up the view-dependent inputs for the translucency sort keys.
    TaskContext.TranslucencyPass = ETranslucencyPass::TPT_MAX;
    TaskContext.TranslucentSortPolicy = View.TranslucentSortPolicy;
    TaskContext.TranslucentSortAxis = View.TranslucentSortAxis;
    TaskContext.ViewOrigin = View.ViewMatrices.GetViewOrigin();
    TaskContext.ViewMatrix = View.ViewMatrices.GetViewMatrix();
    TaskContext.PrimitiveBounds = &Scene->PrimitiveBounds;

    switch (PassType)
    {
        case EMeshPass::TranslucencyStandard: TaskContext.TranslucencyPass = ETranslucencyPass::TPT_StandardTranslucency; break;
        case EMeshPass::TranslucencyAfterDOF: TaskContext.TranslucencyPass = ETranslucencyPass::TPT_TranslucencyAfterDOF; break;
        case EMeshPass::TranslucencyAfterDOFModulate: TaskContext.TranslucencyPass = ETranslucencyPass::TPT_TranslucencyAfterDOFModulate; break;
        case EMeshPass::TranslucencyAll: TaskContext.TranslucencyPass = ETranslucencyPass::TPT_AllTranslucency; break;
        case EMeshPass::MobileInverseOpacity: TaskContext.TranslucencyPass = ETranslucencyPass::TPT_StandardTranslucency; break;
    }
    
    // Swap the command lists into the task context.
    FMemory::Memswap(&TaskContext.MeshDrawCommands, &InOutMeshDrawCommands, sizeof(InOutMeshDrawCommands));
    FMemory::Memswap(&TaskContext.DynamicMeshCommandBuildRequests, &InOutDynamicMeshCommandBuildRequests, sizeof(InOutDynamicMeshCommandBuildRequests));

    if (TaskContext.ShadingPath == EShadingPath::Mobile && TaskContext.PassType == EMeshPass::BasePass)
    {
        FMemory::Memswap(&TaskContext.MobileBasePassCSMMeshDrawCommands, InOutMobileBasePassCSMMeshDrawCommands, sizeof(*InOutMobileBasePassCSMMeshDrawCommands));
    }
    else
    {
        check(MobileBasePassCSMMeshPassProcessor == nullptr && InOutMobileBasePassCSMMeshDrawCommands == nullptr);
    }

    if (MaxNumDraws > 0)
    {
        // Preallocate resources on the rendering thread, sized by the maximum draw count (MaxNumDraws).
        bPrimitiveIdBufferDataOwnedByRHIThread = false;
        TaskContext.PrimitiveIdBufferDataSize = TaskContext.InstanceFactor * MaxNumDraws * sizeof(int32);
        TaskContext.PrimitiveIdBufferData = FMemory::Malloc(TaskContext.PrimitiveIdBufferDataSize);
        PrimitiveIdVertexBufferPoolEntry = GPrimitiveIdVertexBufferPool.Allocate(TaskContext.PrimitiveIdBufferDataSize);
        TaskContext.MeshDrawCommands.Reserve(MaxNumDraws);
        TaskContext.TempVisibleMeshDrawCommands.Reserve(MaxNumDraws);

        const bool bExecuteInParallel = FApp::ShouldUseThreadingForPerformance()
            && CVarMeshDrawCommandsParallelPassSetup.GetValueOnRenderThread() > 0
            && GRenderingThread; // Rendering thread is required to safely use rendering resources in parallel.
        
        // When running in parallel, create the task instances and hand them to the TaskGraph system.
        if (bExecuteInParallel) 
        {
            FGraphEventArray DependentGraphEvents;
            DependentGraphEvents.Add(TGraphTask<FMeshDrawCommandPassSetupTask>::CreateTask(nullptr, ENamedThreads::GetRenderThread()).ConstructAndDispatchWhenReady(TaskContext));
            TaskEventRef = TGraphTask<FMeshDrawCommandInitResourcesTask>::CreateTask(&DependentGraphEvents, ENamedThreads::GetRenderThread()).ConstructAndDispatchWhenReady(TaskContext);
        }
        else
        {
            QUICK_SCOPE_CYCLE_COUNTER(STAT_MeshPassSetupImmediate);
            FMeshDrawCommandPassSetupTask Task(TaskContext);
            Task.AnyThreadTask();
            FMeshDrawCommandInitResourcesTask DependentTask(TaskContext);
            DependentTask.AnyThreadTask();
        }
    }
}
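
The tail of DispatchPassSetup follows a common pattern: run a setup step and a dependent step either asynchronously or immediately on the calling thread. The sketch below shows that pattern with the C++ standard library only. `SetupContext`, `RunSetup`, `RunInitResources` and `DispatchSetup` are hypothetical names, and `std::async` is swapped in for UE's TaskGraph, so this is an illustration of the control flow rather than the engine's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <vector>

// Hypothetical stand-in for the shared task context.
struct SetupContext
{
    std::vector<int> Commands;     // filled by the setup step
    bool bResourcesReady = false;  // set by the dependent step
};

// Step 1: build the command list (stand-in for the pass setup task).
inline void RunSetup(SetupContext& Ctx, int MaxNumDraws)
{
    Ctx.Commands.reserve(static_cast<std::size_t>(MaxNumDraws));
    for (int i = 0; i < MaxNumDraws; ++i)
    {
        Ctx.Commands.push_back(i);
    }
}

// Step 2: must run after step 1 (stand-in for the init-resources task).
inline void RunInitResources(SetupContext& Ctx)
{
    Ctx.bResourcesReady = !Ctx.Commands.empty();
}

// Returns a future that the caller must wait on before reading Ctx.
// Ctx is captured by reference, so it has to outlive the returned future.
inline std::future<void> DispatchSetup(SetupContext& Ctx, int MaxNumDraws, bool bParallel)
{
    if (MaxNumDraws > 0 && bParallel)
    {
        // Parallel path: both steps run in order on another thread.
        return std::async(std::launch::async, [&Ctx, MaxNumDraws]
        {
            RunSetup(Ctx, MaxNumDraws);
            RunInitResources(Ctx);
        });
    }
    if (MaxNumDraws > 0)
    {
        // Immediate path: run both steps on the calling thread.
        RunSetup(Ctx, MaxNumDraws);
        RunInitResources(Ctx);
    }
    // Hand back an already-completed future so callers treat both paths alike.
    std::promise<void> Done;
    Done.set_value();
    return Done.get_future();
}
```

Returning a future from both paths keeps the caller's code identical whether or not threading is enabled, which is roughly the role TaskEventRef plays in the listing above.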

The code above involves several key concepts: FMeshPassProcessor, FMeshDrawCommandPassSetupTaskContext, FMeshDrawCommandPassSetupTask, and FMeshDrawCommandInitResourcesTask. The definitions of the last three, with annotations, are as follows:

// Engine\Source\Runtime\Renderer\Private\MeshDrawCommands.h

// Context required by the parallel mesh draw command pass setup task (FMeshDrawCommandPassSetupTask).
class FMeshDrawCommandPassSetupTaskContext
{
public:
    // View-related data.
    const FViewInfo* View;
    EShadingPath ShadingPath;
    EShaderPlatform ShaderPlatform;
    EMeshPass::Type PassType;
    bool bUseGPUScene;
    bool bDynamicInstancing;
    bool bReverseCulling;
    bool bRenderSceneTwoSided;
    FExclusiveDepthStencil::Type BasePassDepthStencilAccess;
    FExclusiveDepthStencil::Type DefaultBasePassDepthStencilAccess;

    // Mesh pass processors.
    FMeshPassProcessor* MeshPassProcessor;
    FMeshPassProcessor* MobileBasePassCSMMeshPassProcessor;
    const TArray<FMeshBatchAndRelevance, SceneRenderingAllocator>* DynamicMeshElements;
    const TArray<FMeshPassMask, SceneRenderingAllocator>* DynamicMeshElementsPassRelevance;

    // Command-related data.
    int32 InstanceFactor;
    int32 NumDynamicMeshElements;
    int32 NumDynamicMeshCommandBuildRequestElements;
    FMeshCommandOneFrameArray MeshDrawCommands;
    FMeshCommandOneFrameArray MobileBasePassCSMMeshDrawCommands;
    TArray<const FStaticMeshBatch*, SceneRenderingAllocator> DynamicMeshCommandBuildRequests;
    TArray<const FStaticMeshBatch*, SceneRenderingAllocator> MobileBasePassCSMDynamicMeshCommandBuildRequests;
    FDynamicMeshDrawCommandStorage MeshDrawCommandStorage;
    FGraphicsMinimalPipelineStateSet MinimalPipelineStatePassSet;
    bool NeedsShaderInitialisation;

    // Resources that must be preallocated on the rendering thread.
    void* PrimitiveIdBufferData;
    int32 PrimitiveIdBufferDataSize;
    FMeshCommandOneFrameArray TempVisibleMeshDrawCommands;

    // Needed for sorting translucent objects.
    ETranslucencyPass::Type TranslucencyPass;
    ETranslucentSortPolicy::Type TranslucentSortPolicy;
    FVector TranslucentSortAxis;
    FVector ViewOrigin;
    FMatrix ViewMatrix;
    const TArray<struct FPrimitiveBounds>* PrimitiveBounds;

    // For logging instancing stats.
    int32 VisibleMeshDrawCommandsNum;
    int32 NewPassVisibleMeshDrawCommandsNum;
    int32 MaxInstances;
};


// Engine\Source\Runtime\Renderer\Private\MeshDrawCommands.cpp

// Converts each FMeshBatch of the given EMeshPass into a set of FMeshDrawCommands. Used by FMeshDrawCommandPassSetupTask.
void GenerateDynamicMeshDrawCommands(
    const FViewInfo& View,
    EShadingPath ShadingPath,
    EMeshPass::Type PassType,
    FMeshPassProcessor* PassMeshProcessor,
    const TArray<FMeshBatchAndRelevance, SceneRenderingAllocator>& DynamicMeshElements,
    const TArray<FMeshPassMask, SceneRenderingAllocator>* DynamicMeshElementsPassRelevance,
    int32 MaxNumDynamicMeshElements,
    const TArray<const FStaticMeshBatch*, SceneRenderingAllocator>& DynamicMeshCommandBuildRequests,
    int32 MaxNumBuildRequestElements,
    FMeshCommandOneFrameArray& VisibleCommands,
    FDynamicMeshDrawCommandStorage& MeshDrawCommandStorage,
    FGraphicsMinimalPipelineStateSet& MinimalPipelineStatePassSet,
    bool& NeedsShaderInitialisation
)
{
    (......)

    // Build an FDynamicPassMeshDrawListContext instance, which receives the draw commands generated by PassMeshProcessor.
    FDynamicPassMeshDrawListContext DynamicPassMeshDrawListContext(
        MeshDrawCommandStorage,
        VisibleCommands,
        MinimalPipelineStatePassSet,
        NeedsShaderInitialisation
    );
    PassMeshProcessor->SetDrawListContext(&DynamicPassMeshDrawListContext);

    // Process the dynamic mesh batches.
    {
        const int32 NumCommandsBefore = VisibleCommands.Num();
        const int32 NumDynamicMeshBatches = DynamicMeshElements.Num();
        
        // Iterate over all dynamic mesh batches.
        for (int32 MeshIndex = 0; MeshIndex < NumDynamicMeshBatches; MeshIndex++)
        {
            if (!DynamicMeshElementsPassRelevance || (*DynamicMeshElementsPassRelevance)[MeshIndex].Get(PassType))
            {
                const FMeshBatchAndRelevance& MeshAndRelevance = DynamicMeshElements[MeshIndex];
                check(!MeshAndRelevance.Mesh->bRequiresPerElementVisibility);
                const uint64 BatchElementMask = ~0ull;
                
                // Hand the FMeshBatch to PassMeshProcessor for processing.
                PassMeshProcessor->AddMeshBatch(*MeshAndRelevance.Mesh, BatchElementMask, MeshAndRelevance.PrimitiveSceneProxy);
            }
        }

        (......)
    }
    
    // Process the static mesh batches (those listed in DynamicMeshCommandBuildRequests).
    {
        const int32 NumCommandsBefore = VisibleCommands.Num();
        const int32 NumStaticMeshBatches = DynamicMeshCommandBuildRequests.Num();

        for (int32 MeshIndex = 0; MeshIndex < NumStaticMeshBatches; MeshIndex++)
        {
            const FStaticMeshBatch* StaticMeshBatch = DynamicMeshCommandBuildRequests[MeshIndex];
            const uint64 BatchElementMask = StaticMeshBatch->bRequiresPerElementVisibility ? View.StaticMeshBatchVisibility[StaticMeshBatch->BatchVisibilityId] : ~0ull;
            
            // Hand the FMeshBatch to PassMeshProcessor for processing.
            PassMeshProcessor->AddMeshBatch(*StaticMeshBatch, BatchElementMask, StaticMeshBatch->PrimitiveSceneInfo->Proxy, StaticMeshBatch->Id);
        }

        (......)
    }
}

// Task that sets up mesh draw commands in parallel. It covers generating, sorting, and merging the dynamic mesh draw commands.
class FMeshDrawCommandPassSetupTask
{
public:
    FMeshDrawCommandPassSetupTask(FMeshDrawCommandPassSetupTaskContext& InContext)
        : Context(InContext)
    {
    }
    
    (......)

    void AnyThreadTask()
    {
        const bool bMobileShadingBasePass = Context.ShadingPath == EShadingPath::Mobile && Context.PassType == EMeshPass::BasePass;
        const bool bMobileVulkanSM5BasePass = IsVulkanMobileSM5Platform(Context.ShaderPlatform) && Context.PassType == EMeshPass::BasePass;

        if (bMobileShadingBasePass)
        {
            (......)
        }
        else
        {
            // Generate draw commands for dynamic and static meshes (MeshPassProcessor converts each FMeshBatch into MeshDrawCommands).
            GenerateDynamicMeshDrawCommands(
                *Context.View,
                Context.ShadingPath,
                Context.PassType,
                Context.MeshPassProcessor,
                *Context.DynamicMeshElements,
                Context.DynamicMeshElementsPassRelevance,
                Context.NumDynamicMeshElements,
                Context.DynamicMeshCommandBuildRequests,
                Context.NumDynamicMeshCommandBuildRequestElements,
                Context.MeshDrawCommands,
                Context.MeshDrawCommandStorage,
                Context.MinimalPipelineStatePassSet,
                Context.NeedsShaderInitialisation
            );
        }

        if (Context.MeshDrawCommands.Num() > 0)
        {
            if (Context.PassType != EMeshPass::Num)
            {
                // Apply the view's overrides to the existing MeshDrawCommands, e.g. the reverse culling mode used when rendering planar reflections.
                ApplyViewOverridesToMeshDrawCommands(
                    Context.ShadingPath,
                    Context.PassType,
                    Context.bReverseCulling,
                    Context.bRenderSceneTwoSided,
                    Context.BasePassDepthStencilAccess,
                    Context.DefaultBasePassDepthStencilAccess,
                    Context.MeshDrawCommands,
                    Context.MeshDrawCommandStorage,
                    Context.MinimalPipelineStatePassSet,
                    Context.NeedsShaderInitialisation,
                    Context.TempVisibleMeshDrawCommands
                );
            }

            // Update the sort keys.
            if (bMobileShadingBasePass || bMobileVulkanSM5BasePass)
            {
                (......)
            }
            else if (Context.TranslucencyPass != ETranslucencyPass::TPT_MAX)
            {
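                // Update the mesh sort keys with view-dependent data. The key type is FMeshDrawCommandSortKey, which holds both BasePass and translucency keys; translucent objects are sorted by their distance to the camera.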
                UpdateTranslucentMeshSortKeys(
                    Context.TranslucentSortPolicy,
                    Context.TranslucentSortAxis,
                    Context.ViewOrigin,
                    Context.ViewMatrix,
                    *Context.PrimitiveBounds,
                    Context.TranslucencyPass,
                    Context.MeshDrawCommands
                );
            }

            {
                QUICK_SCOPE_CYCLE_COUNTER(STAT_SortVisibleMeshDrawCommands);
                // Sort the MeshDrawCommands. FCompareFMeshDrawCommands compares FMeshDrawCommandSortKey first, then StateBucketId.
                Context.MeshDrawCommands.Sort(FCompareFMeshDrawCommands());
            }

            if (Context.bUseGPUScene)
            {
                // Build the GPU-scene data for this pass (mainly the primitive data of the draws in the scene).
                BuildMeshDrawCommandPrimitiveIdBuffer(
                    Context.bDynamicInstancing,
                    Context.MeshDrawCommands,
                    Context.MeshDrawCommandStorage,
                    Context.PrimitiveIdBufferData,
                    Context.PrimitiveIdBufferDataSize,
                    Context.TempVisibleMeshDrawCommands,
                    Context.MaxInstances,
                    Context.VisibleMeshDrawCommandsNum,
                    Context.NewPassVisibleMeshDrawCommandsNum,
                    Context.ShaderPlatform,
                    Context.InstanceFactor
                );
            }
        }
    }

    void DoTask(ENamedThreads::Type CurrentThread, const FGraphEventRef& MyCompletionGraphEvent)
    {
        AnyThreadTask();
    }

private:
    FMeshDrawCommandPassSetupTaskContext& Context; // Task context.
};


// Task that initializes the resources required by the MeshDrawCommands.
class FMeshDrawCommandInitResourcesTask
{
public:

    (......)

    void AnyThreadTask()
    {
        TRACE_CPUPROFILER_EVENT_SCOPE(MeshDrawCommandInitResourcesTask);
        if (Context.NeedsShaderInitialisation)
        {
            // Initialize all bound shader resources.
            for (const FGraphicsMinimalPipelineStateInitializer& Initializer : Context.MinimalPipelineStatePassSet)
            {
                Initializer.BoundShaderState.LazilyInitShaders();
            }
        }
    }

    void DoTask(ENamedThreads::Type CurrentThread, const FGraphEventRef& MyCompletionGraphEvent)
    {
        AnyThreadTask();
    }

private:
    FMeshDrawCommandPassSetupTaskContext& Context;
};
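
The flow of the setup task (filter batches by pass relevance, let a pass processor emit commands, then sort by key) can be sketched in a few lines. Everything below is a simplified illustration: `MeshBatch`, `DrawCommand`, `PassProcessor`, `SimpleProcessor` and `BuildPassCommands` are hypothetical types that only mimic the roles of FMeshBatch, FMeshDrawCommand and FMeshPassProcessor.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, heavily reduced stand-ins for the engine types.
struct MeshBatch   { int MeshId; std::uint32_t PassRelevance; std::uint64_t SortKey; };
struct DrawCommand { int MeshId; std::uint64_t SortKey; };

// A pass processor turns one batch into zero or more draw commands.
class PassProcessor
{
public:
    explicit PassProcessor(std::vector<DrawCommand>& InOut) : Out(InOut) {}
    virtual ~PassProcessor() = default;
    virtual void AddMeshBatch(const MeshBatch& Batch) = 0;

protected:
    std::vector<DrawCommand>& Out; // destination list, owned by the caller
};

// Simplest possible processor: one command per batch.
class SimpleProcessor final : public PassProcessor
{
public:
    using PassProcessor::PassProcessor;
    void AddMeshBatch(const MeshBatch& Batch) override
    {
        Out.push_back({Batch.MeshId, Batch.SortKey});
    }
};

// Generate the commands of one pass, then sort them by key.
inline std::vector<DrawCommand> BuildPassCommands(const std::vector<MeshBatch>& Batches, int PassIndex)
{
    std::vector<DrawCommand> Commands;
    SimpleProcessor Processor(Commands);
    for (const MeshBatch& Batch : Batches)
    {
        // Skip batches that are not relevant to this pass.
        if (Batch.PassRelevance & (1u << PassIndex))
        {
            Processor.AddMeshBatch(Batch);
        }
    }
    std::stable_sort(Commands.begin(), Commands.end(),
        [](const DrawCommand& A, const DrawCommand& B) { return A.SortKey < B.SortKey; });
    return Commands;
}
```

The virtual AddMeshBatch is the extension point: each pass supplies its own processor, while the surrounding filter and sort logic stays the same.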

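As shown above, FMeshDrawCommandPassSetupTask plays a very important role in the mesh rendering pipeline: it generates the draw commands for dynamic and static meshes, then sorts and merges them. The key used in the sorting stage is FMeshDrawCommandSortKey, defined as follows: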

// Engine\Source\Runtime\Renderer\Public\MeshPassProcessor.h

// Sort key of an FVisibleMeshDrawCommand.
class RENDERER_API FMeshDrawCommandSortKey
{
public:
    union 
    {
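        uint64 PackedData;    // Packed 64-bit key

        // Sort key for the base (geometry) pass
        struct
        {
            uint64 VertexShaderHash        : 16; // Low bits: hash of the VS pointer.
            uint64 PixelShaderHash        : 32; // Middle bits: hash of the PS pointer.
            uint64 Masked                : 16; // High bits: whether the material is masked.
        } BasePass;

        // Sort key for translucent passes
        struct
        {
            uint64 MeshIdInPrimitive    : 16; // Low bits: stable id of the mesh among those sharing one primitive.
            uint64 Distance                : 32; // Middle bits: distance to the camera.
            uint64 Priority                : 16; // High bits: priority (specified by the material).
        } Translucent;
    
        // Generic sort key
        struct 
        {
            uint64 VertexShaderHash     : 32; // Low bits: hash of the VS pointer.
            uint64 PixelShaderHash         : 32; // High bits: hash of the PS pointer.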
        } Generic;
    };
    
    // Inequality operator.
    FORCEINLINE bool operator!=(FMeshDrawCommandSortKey B) const
    {
        return PackedData != B.PackedData;
    }

    // Less-than operator, used for sorting.
    FORCEINLINE bool operator<(FMeshDrawCommandSortKey B) const
    {
        return PackedData < B.PackedData;
    }

    static const FMeshDrawCommandSortKey Default;
};
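
To make the effect of comparing PackedData concrete, here is a small sketch that packs the three translucent fields into one 64-bit value and checks that a single integer comparison agrees with a field-by-field comparison. It uses explicit shifts instead of bit-fields because bit-field layout is implementation-defined in C++. `PackTranslucentKey`, `TranslucentFields` and `LessFieldByField` are hypothetical helpers written for this illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical unpacked form of the translucent key.
struct TranslucentFields
{
    std::uint16_t Priority;  // most significant
    std::uint32_t Distance;
    std::uint16_t MeshId;    // least significant
};

// Pack the fields so that the most significant field occupies the highest bits.
constexpr std::uint64_t PackTranslucentKey(const TranslucentFields& F)
{
    return (static_cast<std::uint64_t>(F.Priority) << 48)
         | (static_cast<std::uint64_t>(F.Distance) << 16)
         |  static_cast<std::uint64_t>(F.MeshId);
}

// Reference comparison: the chain of branches that packing avoids.
inline bool LessFieldByField(const TranslucentFields& A, const TranslucentFields& B)
{
    if (A.Priority != B.Priority) return A.Priority < B.Priority;
    if (A.Distance != B.Distance) return A.Distance < B.Distance;
    return A.MeshId < B.MeshId;
}

// Packed comparison: one 64-bit integer compare.
inline bool LessPacked(const TranslucentFields& A, const TranslucentFields& B)
{
    return PackTranslucentKey(A) < PackTranslucentKey(B);
}
```

Both comparisons order any two keys identically, and the packed form does it with one integer comparison.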

A few additional notes on FMeshDrawCommandSortKey:

  • Although FMeshDrawCommandSortKey can hold three kinds of key (BasePass, translucent pass, and generic pass), only one of them is in effect at any time.

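  • The key computation is spread across different files and stages. The BasePass key, for instance, may be computed in BasePassRendering, DepthRendering, or MeshPassProcessor. The computation of each field is summarized in the table below: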

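    Key | Computation | Notes
    VertexShaderHash | PointerHash(VertexShader) | Pointer hash of the VS used by the material.
    PixelShaderHash | PointerHash(PixelShader) | Pointer hash of the PS used by the material.
    Masked | BlendMode == EBlendMode::BLEND_Masked ? 0 : 1 | Whether the material's blend mode is Masked.
    MeshIdInPrimitive | MeshIdInPrimitivePerView[ViewIndex] | Per-view stable id of the mesh among those sharing one primitive.
    Distance | (uint32)~BitInvertIfNegativeFloat(*((uint32*)&Distance)) | Distance is computed according to ETranslucentSortPolicy; its float bits are then remapped (flipped if negative) and bitwise inverted.
    Priority | - | Taken directly from the translucency sort priority specified by the material.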
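  • operator< compares PackedData directly, so the higher bits carry more weight. Concretely, the BasePass sorts first by whether the material is masked, then by the PS hash, then by the VS hash. Likewise, the translucent pass sorts by, in order: the material-specified priority, the mesh-to-camera distance, and the mesh id.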

    Generally speaking, when sorting meshes, the factor with the greatest performance impact is given the highest priority.

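    In the BasePass, for example, masked materials can severely hurt parallel efficiency and throughput on some GPUs, so Masked takes the highest bits. Pixel shaders usually exceed vertex shaders in instruction count and computational complexity, so ranking PS ahead of VS is also reasonable.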

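    The translucent pass is a special case, however. To composite translucent objects correctly they must be drawn from far to near, otherwise their front-to-back relationship breaks. This is why distance sits directly below the material-specified priority in the key and ahead of the mesh id: within one priority level, the far-to-near order decides.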

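  • PackedData packs several fields into a single uint64, so one comparison suffices, which speeds up sorting. A conventional field-by-field comparison with a chain of if-else statements would cost more CPU instructions and slow the sort down.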

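  • Changing the key layout and the associated sorting logic lets you customize sort priorities and algorithms, for example by adding further sort dimensions such as textures, vertex data, or render states.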
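
The Distance field stores a float inside an unsigned integer key, which only works if the integer order matches the intended draw order. The sketch below shows one common way to do this: remap the float's bits so that unsigned comparison agrees with numeric comparison, then invert the result so that larger distances produce smaller keys and are drawn first. `FloatToOrderedBits` and `BackToFrontDistanceKey` are hypothetical helpers reconstructed from that idea, not the engine's BitInvertIfNegativeFloat, and NaN inputs are not handled.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Map a float to a uint32 whose unsigned order matches the float's numeric order.
// Negative values have all bits flipped; non-negative values only the sign bit.
inline std::uint32_t FloatToOrderedBits(float Value)
{
    std::uint32_t Bits;
    std::memcpy(&Bits, &Value, sizeof(Bits)); // well-defined way to read the bit pattern
    const std::uint32_t Mask = (Bits & 0x80000000u) ? 0xFFFFFFFFu : 0x80000000u;
    return Bits ^ Mask;
}

// Invert the ordered bits so that farther objects get smaller keys.
// Sorting keys in ascending order then yields a far-to-near draw order.
inline std::uint32_t BackToFrontDistanceKey(float Distance)
{
    return ~FloatToOrderedBits(Distance);
}
```

With this mapping, an ascending sort of the packed key draws distant translucent objects before near ones without any floating-point comparison in the sort itself.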

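Next come two important concepts that appear repeatedly in the code above: FMeshPassProcessor and FMeshDrawCommand. FMeshPassProcessor is responsible for converting FMeshBatch into FMeshDrawCommands. The definitions of these and related types, with annotations, are as follows: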

// Engine\Source\Runtime\Renderer\Public\MeshPassProcessor.h


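// Render pipeline state that excludes the render targets (RT). Useful for a run of draw commands that does not change RTs. Its size affects the traversal performance of mesh draw commands.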
class FGraphicsMinimalPipelineStateInitializer
{
public:
    // RT-related data: pixel formats and flags.
    using TRenderTargetFormats = TStaticArray<uint8/*EPixelFormat*/, MaxSimultaneousRenderTargets>;
    using TRenderTargetFlags = TStaticArray<uint32, MaxSimultaneousRenderTargets>;

    (......)

    // Copies this state into a full FGraphicsPipelineStateInitializer and returns it.
    FGraphicsPipelineStateInitializer AsGraphicsPipelineStateInitializer() const
    {    
        return FGraphicsPipelineStateInitializer
        (    BoundShaderState.AsBoundShaderState()
            , BlendState
            , RasterizerState
            , DepthStencilState
            , ImmutableSamplerState
            , PrimitiveType
            , 0
            , FGraphicsPipelineStateInitializer::TRenderTargetFormats(PF_Unknown)
            , FGraphicsPipelineStateInitializer::TRenderTargetFlags(0)
            , PF_Unknown
            , 0
            , ERenderTargetLoadAction::ENoAction
            , ERenderTargetStoreAction::ENoAction
            , ERenderTargetLoadAction::ENoAction
            , ERenderTargetStoreAction::ENoAction
            , FExclusiveDepthStencil::DepthNop
            , 0
            , ESubpassHint::None
            , 0
            , 0
            , bDepthBounds
            , bMultiView
            , bHasFragmentDensityAttachment
        );
    }
    
    (......)

    // Computes the hash of an FGraphicsMinimalPipelineStateInitializer.
    inline friend uint32 GetTypeHash(const FGraphicsMinimalPipelineStateInitializer& Initializer)
    {
        //add and initialize any leftover padding within the struct to avoid unstable key
        struct FHashKey
        {
            uint32 VertexDeclaration;
            uint32 VertexShader;
            uint32 PixelShader;
            uint32 RasterizerState;
        } HashKey;
        HashKey.VertexDeclaration = PointerHash(Initializer.BoundShaderState.VertexDeclarationRHI);
        HashKey.VertexShader = GetTypeHash(Initializer.BoundShaderState.VertexShaderIndex);
        HashKey.PixelShader = GetTypeHash(Initializer.BoundShaderState.PixelShaderIndex);
        HashKey.RasterizerState = PointerHash(Initializer.RasterizerState);

        return uint32(CityHash64((const char*)&HashKey, sizeof(FHashKey)));
    }

    // Comparison operators.
    bool operator==(const FGraphicsMinimalPipelineStateInitializer& rhs) const;
    bool operator!=(const FGraphicsMinimalPipelineStateInitializer& rhs) const;
    bool operator<(const FGraphicsMinimalPipelineStateInitializer& rhs) const;
    bool operator>(const FGraphicsMinimalPipelineStateInitializer& rhs) const;

    // Render pipeline state
    FMinimalBoundShaderStateInput    BoundShaderState;     // Bound shader state.
    FRHIBlendState*                    BlendState;            // Blend state.
    FRHIRasterizerState*            RasterizerState;     // Rasterizer state.
    FRHIDepthStencilState*            DepthStencilState;    // Depth-stencil state.
    FImmutableSamplerState            ImmutableSamplerState;    // Immutable sampler state.

    // Other state.
    bool                bDepthBounds = false;
    bool                bMultiView = false;
    bool                bHasFragmentDensityAttachment = false;
    uint8                Padding[1] = {}; // Padding added for memory alignment.

    EPrimitiveType        PrimitiveType;
};


// An id that uniquely identifies one FGraphicsMinimalPipelineStateInitializer instance; used for fast sorting.
class FGraphicsMinimalPipelineStateId
{
public:
    uint32 GetId() const
    {
        return PackedId;
    }
    
    // Validity check and comparison operators.
    inline bool IsValid() const;
    inline bool operator==(const FGraphicsMinimalPipelineStateId& rhs) const;
    inline bool operator!=(const FGraphicsMinimalPipelineStateId& rhs) const;
    
    // Returns the associated FGraphicsMinimalPipelineStateInitializer.
    inline const FGraphicsMinimalPipelineStateInitializer& GetPipelineState(const FGraphicsMinimalPipelineStateSet& InPipelineSet) const
    {
        if (bComesFromLocalPipelineStateSet)
        {
            return InPipelineSet.GetByElementId(SetElementIndex);
        }

        {
            FScopeLock Lock(&PersistentIdTableLock);
            return PersistentIdTable.GetByElementId(SetElementIndex).Key;
        }
    }

    static void InitializePersistentIds();
    // Returns the persistent pipeline state id for the given FGraphicsMinimalPipelineStateInitializer.
    static FGraphicsMinimalPipelineStateId GetPersistentId(const FGraphicsMinimalPipelineStateInitializer& InPipelineState);
    static void RemovePersistentId(FGraphicsMinimalPipelineStateId Id);

    // Looks up the pipeline state id in this order: the global persistent id table, then the PassSet argument. If neither contains it, a new blank entry is created and added to PassSet.
    RENDERER_API static FGraphicsMinimalPipelineStateId GetPipelineStateId(const FGraphicsMinimalPipelineStateInitializer& InPipelineState, FGraphicsMinimalPipelineStateSet& InOutPassSet, bool& NeedsShaderInitialisation);


private:
    // Packed key.
    union
    {
        uint32 PackedId = 0;

        struct
        {
            uint32 SetElementIndex                   : 30;
            uint32 bComesFromLocalPipelineStateSet : 1;
            uint32 bValid                           : 1;
        };
    };

    struct FRefCountedGraphicsMinimalPipelineState
    {
        FRefCountedGraphicsMinimalPipelineState() : RefNum(0)
        {
        }
        uint32 RefNum;
    };

    static FCriticalSection PersistentIdTableLock;
    using PersistentTableType = Experimental::TRobinHoodHashMap<FGraphicsMinimalPipelineStateInitializer, FRefCountedGraphicsMinimalPipelineState>;
    // Persistent id table.
    static PersistentTableType PersistentIdTable;

    static int32 LocalPipelineIdTableSize;
    static int32 CurrentLocalPipelineIdTableSize;
    static bool NeedsShaderInitialisation;
};
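
// The lookup order described for GetPipelineStateId (persistent table first, then the per-pass set, otherwise add a new entry to the per-pass set) can be sketched as follows. This is a simplified illustration under stated assumptions: `StateKey`, `StateId`, `MakeId` and `GetStateId` are hypothetical names, a string stands in for the pipeline state, and the per-pass set is searched linearly rather than hashed.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

using StateKey = std::string; // hypothetical stand-in for a pipeline state description

// Packed id: a 30-bit index plus two flag bits, mirroring the layout in the listing.
struct StateId
{
    std::uint32_t Index  : 30;
    std::uint32_t bLocal : 1;  // 1 if the index refers to the per-pass set
    std::uint32_t bValid : 1;
};

inline StateId MakeId(std::uint32_t Index, bool bLocal)
{
    StateId Id;
    Id.Index = Index;
    Id.bLocal = bLocal ? 1u : 0u;
    Id.bValid = 1u;
    return Id;
}

// Resolve a key to an id: persistent table first, then the local set,
// and finally append to the local set if the key is unknown.
inline StateId GetStateId(
    const StateKey& Key,
    const std::unordered_map<StateKey, std::uint32_t>& PersistentTable,
    std::vector<StateKey>& LocalSet)
{
    const auto It = PersistentTable.find(Key);
    if (It != PersistentTable.end())
    {
        return MakeId(It->second, false);
    }
    for (std::uint32_t i = 0; i < LocalSet.size(); ++i)
    {
        if (LocalSet[i] == Key)
        {
            return MakeId(i, true);
        }
    }
    LocalSet.push_back(Key);
    return MakeId(static_cast<std::uint32_t>(LocalSet.size() - 1), true);
}
```

// The flag bit tells a later lookup which container the index belongs to, which is the role bComesFromLocalPipelineStateSet plays in GetPipelineState above.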

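// A mesh draw command records all resources and data needed to draw a single mesh and should hold nothing beyond that. Data that only InitViews needs belongs in FVisibleMeshDrawCommand.
// Every resource referenced by an FMeshDrawCommand must be kept alive by its owner, because FMeshDrawCommand does not manage resource lifetimes.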
class FMeshDrawCommand
{
public:
    // Resource bindings.
    FMeshDrawShaderBindings ShaderBindings;
    FVertexInputStreamArray VertexStreams;
    FRHIIndexBuffer* IndexBuffer;

    // Cached render pipeline state (PSO).
    FGraphicsMinimalPipelineStateId CachedPipelineId;

    // Draw command parameters.
    uint32 FirstIndex;
    uint32 NumPrimitives;
    uint32 NumInstances;

    // Vertex parameters, for either direct or indirect draws.
    union
    {
        struct 
        {
            uint32 BaseVertexIndex;
            uint32 NumVertices;
        } VertexParams;
        
        struct  
        {
            FRHIVertexBuffer* Buffer;
            uint32 Offset;
        } IndirectArgs;
    };

    int8 PrimitiveIdStreamIndex;

    // Parameters that are not part of the pipeline state.
    uint8 StencilRef;

    // Returns whether this command matches the given FMeshDrawCommand. Matching commands can be merged into a single instanced draw.
    bool MatchesForDynamicInstancing(const FMeshDrawCommand& Rhs) const
    {
        return CachedPipelineId == Rhs.CachedPipelineId
            && StencilRef == Rhs.StencilRef
            && ShaderBindings.MatchesForDynamicInstancing(Rhs.ShaderBindings)
            && VertexStreams == Rhs.VertexStreams
            && PrimitiveIdStreamIndex == Rhs.PrimitiveIdStreamIndex
            && IndexBuffer == Rhs.IndexBuffer
            && FirstIndex == Rhs.FirstIndex
            && NumPrimitives == Rhs.NumPrimitives
            && NumInstances == Rhs.NumInstances
            && ((NumPrimitives > 0 && VertexParams.BaseVertexIndex == Rhs.VertexParams.BaseVertexIndex && VertexParams.NumVertices == Rhs.VertexParams.NumVertices)
                || (NumPrimitives == 0 && IndirectArgs.Buffer == Rhs.IndirectArgs.Buffer && IndirectArgs.Offset == Rhs.IndirectArgs.Offset));
    }

    // Returns the hash used for dynamic instancing.
    uint32 GetDynamicInstancingHash() const
    {
        //add and initialize any leftover padding within the struct to avoid unstable keys
        struct FHashKey
        {
            uint32 IndexBuffer;
            uint32 VertexBuffers = 0;
            uint32 VertexStreams = 0;
            uint32 PipelineId;
            uint32 DynamicInstancingHash;
            uint32 FirstIndex;
            uint32 NumPrimitives;
            uint32 NumInstances;
            uint32 IndirectArgsBufferOrBaseVertexIndex;
            uint32 NumVertices;
            uint32 StencilRefAndPrimitiveIdStreamIndex;

            // Pointer address hash.
            static inline uint32 PointerHash(const void* Key)
            {
#if PLATFORM_64BITS
                // Ignoring the lower 4 bits since they are likely zero anyway.
                // Higher bits are more significant in 64 bit builds.
                return reinterpret_cast<UPTRINT>(Key) >> 4;
#else
                return reinterpret_cast<UPTRINT>(Key);
#endif
            };

            // Hash combine.
            static inline uint32 HashCombine(uint32 A, uint32 B)
            {
                return A ^ (B + 0x9e3779b9 + (A << 6) + (A >> 2));
            }
        } HashKey;

        // Fill FHashKey with the values of the FMeshDrawCommand members.
        HashKey.PipelineId = CachedPipelineId.GetId();
        HashKey.StencilRefAndPrimitiveIdStreamIndex = StencilRef | (PrimitiveIdStreamIndex << 8);
        HashKey.DynamicInstancingHash = ShaderBindings.GetDynamicInstancingHash();

        for (int index = 0; index < VertexStreams.Num(); index++)
        {
            const FVertexInputStream& VertexInputStream = VertexStreams[index];
            const uint32 StreamIndex = VertexInputStream.StreamIndex;
            const uint32 Offset = VertexInputStream.Offset;

            uint32 Packed = (StreamIndex << 28) | Offset;
            HashKey.VertexStreams = FHashKey::HashCombine(HashKey.VertexStreams, Packed);
            HashKey.VertexBuffers = FHashKey::HashCombine(HashKey.VertexBuffers, FHashKey::PointerHash(VertexInputStream.VertexBuffer));
        }

        HashKey.IndexBuffer = FHashKey::PointerHash(IndexBuffer);
        HashKey.FirstIndex = FirstIndex;
        HashKey.NumPrimitives = NumPrimitives;
        HashKey.NumInstances = NumInstances;

        if (NumPrimitives > 0)
        {
            HashKey.IndirectArgsBufferOrBaseVertexIndex = VertexParams.BaseVertexIndex;
            HashKey.NumVertices = VertexParams.NumVertices;
        }
        else
        {
            HashKey.IndirectArgsBufferOrBaseVertexIndex = FHashKey::PointerHash(IndirectArgs.Buffer);
            HashKey.NumVertices = IndirectArgs.Offset;
        }        

        // Hash the filled HashKey. Identical HashKey data always produces the same hash, which makes it cheap to find commands that can be batched together.
        return uint32(CityHash64((char*)&HashKey, sizeof(FHashKey)));
    }

    (......)
    
    // Processes the relevant FMeshBatch data and stores it in this FMeshDrawCommand.
    void SetDrawParametersAndFinalize(
        const FMeshBatch& MeshBatch, 
        int32 BatchElementIndex,
        FGraphicsMinimalPipelineStateId PipelineId,
        const FMeshProcessorShaders* ShadersForDebugging)
    {
        const FMeshBatchElement& BatchElement = MeshBatch.Elements[BatchElementIndex];

        IndexBuffer = BatchElement.IndexBuffer ? BatchElement.IndexBuffer->IndexBufferRHI.GetReference() : nullptr;
        FirstIndex = BatchElement.FirstIndex;
        NumPrimitives = BatchElement.NumPrimitives;
        NumInstances = BatchElement.NumInstances;

        if (NumPrimitives > 0)
        {
            VertexParams.BaseVertexIndex = BatchElement.BaseVertexIndex;
            VertexParams.NumVertices = BatchElement.MaxVertexIndex - BatchElement.MinVertexIndex + 1;
        }
        else
        {
            IndirectArgs.Buffer = BatchElement.IndirectArgsBuffer;
            IndirectArgs.Offset = BatchElement.IndirectArgsOffset;
        }

        Finalize(PipelineId, ShadersForDebugging);
    }

    // Stores the PipelineId and the shader debugging information.
    void Finalize(FGraphicsMinimalPipelineStateId PipelineId, const FMeshProcessorShaders* ShadersForDebugging)
    {
        CachedPipelineId = PipelineId;
        ShaderBindings.Finalize(ShadersForDebugging);    
    }

    /** Submits commands to the RHI Commandlist to draw the MeshDrawCommand. */
    static void SubmitDraw(
        const FMeshDrawCommand& RESTRICT MeshDrawCommand, 
        const FGraphicsMinimalPipelineStateSet& GraphicsMinimalPipelineStateSet,
        FRHIVertexBuffer* ScenePrimitiveIdsBuffer,
        int32 PrimitiveIdOffset,
        uint32 InstanceFactor,
        FRHICommandList& CommandList, 
        class FMeshDrawCommandStateCache& RESTRICT StateCache);

    (......)
};
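
// MatchesForDynamicInstancing exists so that commands with identical state can be collapsed into one instanced draw. A minimal sketch of that merging step is shown below, assuming the commands are already sorted so that matching ones are adjacent. `DrawCommand`, `InstancedDraw` and `MergeAdjacent` are hypothetical and compare only a handful of fields; the engine compares considerably more state.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, reduced draw command: only a few of the fields that must match.
struct DrawCommand
{
    std::uint32_t PipelineId;
    std::uint32_t IndexBufferId;
    std::uint32_t FirstIndex;
    std::uint32_t NumPrimitives;

    bool Matches(const DrawCommand& Rhs) const
    {
        return PipelineId == Rhs.PipelineId
            && IndexBufferId == Rhs.IndexBufferId
            && FirstIndex == Rhs.FirstIndex
            && NumPrimitives == Rhs.NumPrimitives;
    }
};

// One submitted draw, covering NumInstances identical commands.
struct InstancedDraw
{
    DrawCommand Command;
    std::uint32_t NumInstances;
};

// Collapse runs of adjacent matching commands into single instanced draws.
inline std::vector<InstancedDraw> MergeAdjacent(const std::vector<DrawCommand>& Sorted)
{
    std::vector<InstancedDraw> Out;
    for (const DrawCommand& Cmd : Sorted)
    {
        if (!Out.empty() && Out.back().Command.Matches(Cmd))
        {
            ++Out.back().NumInstances; // extend the current run
        }
        else
        {
            Out.push_back({Cmd, 1u});  // start a new run
        }
    }
    return Out;
}
```

// Because only adjacent commands are merged, the quality of the preceding sort directly determines how many draws can be saved.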


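// A visible mesh draw command. Stores the information needed for a mesh draw command that has been determined to be visible, so that further visibility processing can be done.
// Unlike FMeshDrawCommand, FVisibleMeshDrawCommand should only store data needed by InitViews operations (visibility / sorting), not data related to draw submission.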
class FVisibleMeshDrawCommand
{
public:
    (......)
    
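    // The associated FMeshDrawCommand instance.
    const FMeshDrawCommand* MeshDrawCommand;
    // Sort key for non-state-based sorting (e.g. depth-sorted translucent draw commands).
    FMeshDrawCommandSortKey SortKey;
    // Draw primitive id, used to fetch primitive data from the PrimitiveSceneData SRV. A valid DrawPrimitiveId can be traced back to its FPrimitiveSceneInfo instance.
    int32 DrawPrimitiveId;
    // Id of the scene primitive that generated this FVisibleMeshDrawCommand; -1 means there is no FPrimitiveSceneInfo. Otherwise it can be traced back to the FPrimitiveSceneInfo instance.
    int32 ScenePrimitiveId;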
    // Offset into the buffer of PrimitiveIds built for this pass, in int32's.
    int32 PrimitiveIdBufferOffset;

    // Dynamic instancing state bucket ID.
    // All draw commands with the same StateBucketId can be merged into one instanced draw.
    // -1 means the command is sorted by factors other than StateBucketId.
    int32 StateBucketId;

    // Needed for view overrides
    ERasterizerFillMode MeshFillMode : ERasterizerFillMode_NumBits + 1;
    ERasterizerCullMode MeshCullMode : ERasterizerCullMode_NumBits + 1;
};


// Mesh pass processor.
class FMeshPassProcessor
{
public:
    
    // The scene, view, and draw-list context below are passed in through the constructor.
    const FScene* RESTRICT Scene;
    ERHIFeatureLevel::Type FeatureLevel;
    const FSceneView* ViewIfDynamicMeshCommand;
    FMeshPassDrawListContext* DrawListContext;

    (......)

    // Adds an FMeshBatch; implemented by each concrete pass subclass.
    virtual void AddMeshBatch(const FMeshBatch& RESTRICT MeshBatch, uint64 BatchElementMask, const FPrimitiveSceneProxy* RESTRICT PrimitiveSceneProxy, int32 StaticMeshId = -1) = 0;
    
    // Mesh drawing policy override settings.
    struct FMeshDrawingPolicyOverrideSettings
    {
        EDrawingPolicyOverrideFlags    MeshOverrideFlags = EDrawingPolicyOverrideFlags::None;
        EPrimitiveType                MeshPrimitiveType = PT_TriangleList;
    };
    
    (......)

    // Converts one FMeshBatch into one or more MeshDrawCommands.
    template<typename PassShadersType, typename ShaderElementDataType>
    void BuildMeshDrawCommands(
        const FMeshBatch& RESTRICT MeshBatch,
        uint64 BatchElementMask,
        const FPrimitiveSceneProxy* RESTRICT PrimitiveSceneProxy,
        const FMaterialRenderProxy& RESTRICT MaterialRenderProxy,
        const FMaterial& RESTRICT MaterialResource,
        const FMeshPassProcessorRenderState& RESTRICT DrawRenderState,
        PassShadersType PassShaders,
        ERasterizerFillMode MeshFillMode,
        ERasterizerCullMode MeshCullMode,
        FMeshDrawCommandSortKey SortKey,
        EMeshPassFeatures MeshPassFeatures,
        const ShaderElementDataType& ShaderElementData)
    {
        const FVertexFactory* RESTRICT VertexFactory = MeshBatch.VertexFactory;
        const FPrimitiveSceneInfo* RESTRICT PrimitiveSceneInfo = PrimitiveSceneProxy ? PrimitiveSceneProxy->GetPrimitiveSceneInfo() : nullptr;

        // FMeshDrawCommand instance used to gather the various render resources and data.
        FMeshDrawCommand SharedMeshDrawCommand;
        
        // Set the stencil reference of the FMeshDrawCommand.
        SharedMeshDrawCommand.SetStencilRef(DrawRenderState.GetStencilRef());

        // Render state instance.
        FGraphicsMinimalPipelineStateInitializer PipelineState;
        PipelineState.PrimitiveType = (EPrimitiveType)MeshBatch.Type;
        PipelineState.ImmutableSamplerState = MaterialRenderProxy.ImmutableSamplerState;
        
        // Set up the FMeshDrawCommand's vertex data, shaders and render state.
        EVertexInputStreamType InputStreamType = EVertexInputStreamType::Default;
        if ((MeshPassFeatures & EMeshPassFeatures::PositionOnly) != EMeshPassFeatures::Default)                InputStreamType = EVertexInputStreamType::PositionOnly;
        if ((MeshPassFeatures & EMeshPassFeatures::PositionAndNormalOnly) != EMeshPassFeatures::Default)    InputStreamType = EVertexInputStreamType::PositionAndNormalOnly;

        FRHIVertexDeclaration* VertexDeclaration = VertexFactory->GetDeclaration(InputStreamType);
        SharedMeshDrawCommand.SetShaders(VertexDeclaration, PassShaders.GetUntypedShaders(), PipelineState);

        PipelineState.RasterizerState = GetStaticRasterizerState<true>(MeshFillMode, MeshCullMode);
        PipelineState.BlendState = DrawRenderState.GetBlendState();
        PipelineState.DepthStencilState = DrawRenderState.GetDepthStencilState();

        VertexFactory->GetStreams(FeatureLevel, InputStreamType, SharedMeshDrawCommand.VertexStreams);

        SharedMeshDrawCommand.PrimitiveIdStreamIndex = VertexFactory->GetPrimitiveIdStreamIndex(InputStreamType);

        // Gather the binding data for the VS/PS/GS and other shaders.
        int32 DataOffset = 0;
        if (PassShaders.VertexShader.IsValid())
        {
            FMeshDrawSingleShaderBindings ShaderBindings = SharedMeshDrawCommand.ShaderBindings.GetSingleShaderBindings(SF_Vertex, DataOffset);
            PassShaders.VertexShader->GetShaderBindings(Scene, FeatureLevel, PrimitiveSceneProxy, MaterialRenderProxy, MaterialResource, DrawRenderState, ShaderElementData, ShaderBindings);
        }

        if (PassShaders.PixelShader.IsValid())
        {
            FMeshDrawSingleShaderBindings ShaderBindings = SharedMeshDrawCommand.ShaderBindings.GetSingleShaderBindings(SF_Pixel, DataOffset);
            PassShaders.PixelShader->GetShaderBindings(Scene, FeatureLevel, PrimitiveSceneProxy, MaterialRenderProxy, MaterialResource, DrawRenderState, ShaderElementData, ShaderBindings);
        }

        (......)

        const int32 NumElements = MeshBatch.Elements.Num();

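        // Iterate over all FMeshBatchElements of this FMeshBatch (skipping those masked out by BatchElementMask) and gather from the material the binding data of every shader type associated with each element.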
        for (int32 BatchElementIndex = 0; BatchElementIndex < NumElements; BatchElementIndex++)
        {
            if ((1ull << BatchElementIndex) & BatchElementMask)
            {
                const FMeshBatchElement& BatchElement = MeshBatch.Elements[BatchElementIndex];
                FMeshDrawCommand& MeshDrawCommand = DrawListContext->AddCommand(SharedMeshDrawCommand, NumElements);

                DataOffset = 0;
                if (PassShaders.VertexShader.IsValid())
                {
                    FMeshDrawSingleShaderBindings VertexShaderBindings = MeshDrawCommand.ShaderBindings.GetSingleShaderBindings(SF_Vertex, DataOffset);
                    FMeshMaterialShader::GetElementShaderBindings(PassShaders.VertexShader, Scene, ViewIfDynamicMeshCommand, VertexFactory, InputStreamType, FeatureLevel, PrimitiveSceneProxy, MeshBatch, BatchElement, ShaderElementData, VertexShaderBindings, MeshDrawCommand.VertexStreams);
                }

                if (PassShaders.PixelShader.IsValid())
                {
                    FMeshDrawSingleShaderBindings PixelShaderBindings = MeshDrawCommand.ShaderBindings.GetSingleShaderBindings(SF_Pixel, DataOffset);
                    FMeshMaterialShader::GetElementShaderBindings(PassShaders.PixelShader, Scene, ViewIfDynamicMeshCommand, VertexFactory, EVertexInputStreamType::Default, FeatureLevel, PrimitiveSceneProxy, MeshBatch, BatchElement, ShaderElementData, PixelShaderBindings, MeshDrawCommand.VertexStreams);
                }
                
                (......)

                // Compute and obtain the primitive ids.
                int32 DrawPrimitiveId;
                int32 ScenePrimitiveId;
                GetDrawCommandPrimitiveId(PrimitiveSceneInfo, BatchElement, DrawPrimitiveId, ScenePrimitiveId);

                // Finalize the MeshDrawCommand.
                FMeshProcessorShaders ShadersForDebugging = PassShaders.GetUntypedShaders();
                DrawListContext->FinalizeCommand(MeshBatch, BatchElementIndex, DrawPrimitiveId, ScenePrimitiveId, MeshFillMode, MeshCullMode, SortKey, PipelineState, &ShadersForDebugging, MeshDrawCommand);
            }
        }
    }

protected:
    RENDERER_API void GetDrawCommandPrimitiveId(
        const FPrimitiveSceneInfo* RESTRICT PrimitiveSceneInfo,
        const FMeshBatchElement& BatchElement,
        int32& DrawPrimitiveId,
        int32& ScenePrimitiveId) const;
};

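The key computation above uses CityHash64 several times. CityHash64 is a fast, non-cryptographic hash function that computes a 64-bit hash of a string (byte sequence) of arbitrary length. Its implementation lives in Engine\Source\Runtime\Core\Private\Hash\CityHash.cpp, for readers who want to study it on their own.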

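Other hash algorithms in common use include HalfMD5, MD5, SipHash64, SipHash128, IntHash32, IntHash64, SHA1, SHA224, SHA256, and so on (note that MD5 and the SHA family are cryptographic hashes, unlike CityHash64).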
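
To make the idea concrete, here is a minimal sketch of how a fast non-cryptographic 64-bit hash can turn a block of render state into a lookup key. It deliberately uses FNV-1a rather than CityHash64 (a much simpler algorithm swapped in for brevity), and `ToyPipelineState`, `Fnv1a64` and `HashState` are hypothetical names for illustration, not engine types or functions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// FNV-1a, 64-bit: a simple non-cryptographic hash, used here as a stand-in for CityHash64.
inline std::uint64_t Fnv1a64(const void* Data, std::size_t Size)
{
    const unsigned char* Bytes = static_cast<const unsigned char*>(Data);
    std::uint64_t Hash = 14695981039346656037ull; // FNV offset basis
    for (std::size_t i = 0; i < Size; ++i)
    {
        Hash ^= Bytes[i];
        Hash *= 1099511628211ull; // FNV prime
    }
    return Hash;
}

// Hypothetical pipeline state. All fields are 32-bit integers, so the struct
// has no padding bytes and can be hashed as a raw byte range.
struct ToyPipelineState
{
    std::uint32_t BlendMode;
    std::uint32_t DepthMode;
    std::uint32_t CullMode;
    std::uint32_t FillMode;
};

// Produces a 64-bit key: equal states always map to equal keys.
inline std::uint64_t HashState(const ToyPipelineState& State)
{
    return Fnv1a64(&State, sizeof(State));
}
```

Two states with identical fields produce the same key, which is what allows a hash table to recognise commands that share state. Hashing raw bytes only works when the struct has no uninitialised padding, which is why the sketch sticks to same-width fields.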

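FMeshDrawCommand stores all the information the RHI needs to draw a mesh. This information is independent of both the platform and the graphics API (stateless) and follows a data-driven design, so its device context can be shared.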

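FMeshPassProcessor::AddMeshBatch is implemented by subclasses, and each subclass usually corresponds to one pass of the EMeshPass enum. Its common subclasses include:

  • FDepthPassMeshProcessor: depth pass mesh processor, corresponding to EMeshPass::DepthPass

  • FBasePassMeshProcessor: base (geometry) pass mesh processor, corresponding to EMeshPass::BasePass

  • FCustomDepthPassMeshProcessor: custom depth pass mesh processor, corresponding to EMeshPass::CustomDepth

  • FShadowDepthPassMeshProcessor: shadow pass mesh processor, corresponding to EMeshPass::CSMShadowDepth

  • FTranslucencyDepthPassMeshProcessor: translucency depth pass mesh processor, which has no corresponding EMeshPass

  • FLightmapDensityMeshProcessor: lightmap density mesh processor, corresponding to EMeshPass::LightmapDensity

  • ......

Different passes handle an FMeshBatch differently. Take the most common one, FBasePassMeshProcessor, as an example: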

// Engine\Source\Runtime\Renderer\Private\BasePassRendering.cpp

void FBasePassMeshProcessor::AddMeshBatch(const FMeshBatch& RESTRICT MeshBatch, uint64 BatchElementMask, const FPrimitiveSceneProxy* RESTRICT PrimitiveSceneProxy, int32 StaticMeshId)
{
    if (MeshBatch.bUseForMaterial)
    {
        (......)

        if (bShouldDraw
            && (!PrimitiveSceneProxy || PrimitiveSceneProxy->ShouldRenderInMainPass())
            && ShouldIncludeDomainInMeshPass(Material.GetMaterialDomain())
            && ShouldIncludeMaterialInDefaultOpaquePass(Material))
        {
            (......)

            // Handle simple forward shading.
            if (IsSimpleForwardShadingEnabled(GetFeatureLevelShaderPlatform(FeatureLevel)))
            {
                AddMeshBatchForSimpleForwardShading(
                    MeshBatch,
                    BatchElementMask,
                    StaticMeshId,
                    PrimitiveSceneProxy,
                    MaterialRenderProxy,
                    Material,
                    LightMapInteraction,
                    bIsLitMaterial,
                    bAllowStaticLighting,
                    bUseVolumetricLightmap,
                    bAllowIndirectLightingCache,
                    MeshFillMode,
                    MeshCullMode);
            }
            // Render objects that cast volumetric translucent self-shadows.
            else if (bIsLitMaterial
                && bIsTranslucent
                && PrimitiveSceneProxy
                && PrimitiveSceneProxy->CastsVolumetricTranslucentShadow())
            {
                (......)

                if (bIsLitMaterial
                    && bAllowStaticLighting
                    && bUseVolumetricLightmap
                    && PrimitiveSceneProxy)
                {
                    Process< FSelfShadowedVolumetricLightmapPolicy >(
                        MeshBatch,
                        BatchElementMask,
                        StaticMeshId,
                        PrimitiveSceneProxy,
                        MaterialRenderProxy,
                        Material,
                        BlendMode,
                        ShadingModels,
                        FSelfShadowedVolumetricLightmapPolicy(),
                        ElementData,
                        MeshFillMode,
                        MeshCullMode);
                }
                
                (......)
            }
            // Call Process according to the lightmap options and quality level.
            else
            {
                static const auto CVarSupportLowQualityLightmap = IConsoleManager::Get().FindTConsoleVariableDataInt(TEXT("r.SupportLowQualityLightmaps"));
                const bool bAllowLowQualityLightMaps = (!CVarSupportLowQualityLightmap) || (CVarSupportLowQualityLightmap->GetValueOnAnyThread() != 0);

                switch (LightMapInteraction.GetType())
                {
                case LMIT_Texture:
                    if (bAllowHighQualityLightMaps)
                    {
                        const FShadowMapInteraction ShadowMapInteraction = (bAllowStaticLighting && MeshBatch.LCI && bIsLitMaterial)
                            ? MeshBatch.LCI->GetShadowMapInteraction(FeatureLevel)
                            : FShadowMapInteraction();

                        if (ShadowMapInteraction.GetType() == SMIT_Texture)
                        {
                            Process< FUniformLightMapPolicy >(
                                MeshBatch,
                                BatchElementMask,
                                StaticMeshId,
                                PrimitiveSceneProxy,
                                MaterialRenderProxy,
                                Material,
                                BlendMode,
                                ShadingModels,
                                FUniformLightMapPolicy(LMP_DISTANCE_FIELD_SHADOWS_AND_HQ_LIGHTMAP),
                                MeshBatch.LCI,
                                MeshFillMode,
                                MeshCullMode);
                        }
                            
                        (......)
                    }
                        
                    (......)
                        
                    break;
                default:
                    if (bIsLitMaterial
                        && bAllowStaticLighting
                        && Scene
                        && Scene->VolumetricLightmapSceneData.HasData()
                        && PrimitiveSceneProxy
                        && (PrimitiveSceneProxy->IsMovable()
                            || PrimitiveSceneProxy->NeedsUnbuiltPreviewLighting()
                            || PrimitiveSceneProxy->GetLightmapType() == ELightmapType::ForceVolumetric))
                    {
                        Process< FUniformLightMapPolicy >(
                            MeshBatch,
                            BatchElementMask,
                            StaticMeshId,
                            PrimitiveSceneProxy,
                            MaterialRenderProxy,
                            Material,
                            BlendMode,
                            ShadingModels,
                            FUniformLightMapPolicy(LMP_PRECOMPUTED_IRRADIANCE_VOLUME_INDIRECT_LIGHTING),
                            MeshBatch.LCI,
                            MeshFillMode,
                            MeshCullMode);
                    }
                        
                    (......)
                        
                    break;
                };
            }
        }
    }
}

// FBasePassMeshProcessor handles each lightmap type (shader bindings, render state, sort key, vertex data and so on), then calls BuildMeshDrawCommands to convert the FMeshBatch into FMeshDrawCommands.
template<typename LightMapPolicyType>
void FBasePassMeshProcessor::Process(
    const FMeshBatch& RESTRICT MeshBatch,
    uint64 BatchElementMask,
    int32 StaticMeshId,
    const FPrimitiveSceneProxy* RESTRICT PrimitiveSceneProxy,
    const FMaterialRenderProxy& RESTRICT MaterialRenderProxy,
    const FMaterial& RESTRICT MaterialResource,
    EBlendMode BlendMode,
    FMaterialShadingModelField ShadingModels,
    const LightMapPolicyType& RESTRICT LightMapPolicy,
    const typename LightMapPolicyType::ElementDataType& RESTRICT LightMapElementData,
    ERasterizerFillMode MeshFillMode,
    ERasterizerCullMode MeshCullMode)
{
    const FVertexFactory* VertexFactory = MeshBatch.VertexFactory;

    const bool bRenderSkylight = Scene && Scene->ShouldRenderSkylightInBasePass(BlendMode) && ShadingModels.IsLit();
    const bool bRenderAtmosphericFog = IsTranslucentBlendMode(BlendMode) && (Scene && Scene->HasAtmosphericFog() && Scene->ReadOnlyCVARCache.bEnableAtmosphericFog);

    TMeshProcessorShaders<
        TBasePassVertexShaderPolicyParamType<LightMapPolicyType>,
        FBaseHS,
        FBaseDS,
        TBasePassPixelShaderPolicyParamType<LightMapPolicyType>> BasePassShaders;

    // Get the shaders for the given lightmap policy type.
    GetBasePassShaders<LightMapPolicyType>(
        MaterialResource,
        VertexFactory->GetType(),
        LightMapPolicy,
        FeatureLevel,
        bRenderAtmosphericFog,
        bRenderSkylight,
        Get128BitRequirement(),
        BasePassShaders.HullShader,
        BasePassShaders.DomainShader,
        BasePassShaders.VertexShader,
        BasePassShaders.PixelShader
        );

    // Set up the render state.
    FMeshPassProcessorRenderState DrawRenderState(PassDrawRenderState);

    SetDepthStencilStateForBasePass(
        ViewIfDynamicMeshCommand,
        DrawRenderState,
        FeatureLevel,
        MeshBatch,
        StaticMeshId,
        PrimitiveSceneProxy,
        bEnableReceiveDecalOutput);

    if (bTranslucentBasePass)
    {
        SetTranslucentRenderState(DrawRenderState, MaterialResource, GShaderPlatformForFeatureLevel[FeatureLevel], TranslucencyPassType);
    }

    // Initialize the shader's material data.
    TBasePassShaderElementData<LightMapPolicyType> ShaderElementData(LightMapElementData);
    ShaderElementData.InitializeMeshMaterialData(ViewIfDynamicMeshCommand, PrimitiveSceneProxy, MeshBatch, StaticMeshId, true);

    // Compute the sort key.
    FMeshDrawCommandSortKey SortKey = FMeshDrawCommandSortKey::Default;

    if (bTranslucentBasePass)
    {
        SortKey = CalculateTranslucentMeshStaticSortKey(PrimitiveSceneProxy, MeshBatch.MeshIdInPrimitive);
    }
    else
    {
        SortKey = CalculateBasePassMeshStaticSortKey(EarlyZPassMode, BlendMode, BasePassShaders.VertexShader.GetShader(), BasePassShaders.PixelShader.GetShader());
    }

    // Convert the elements of the FMeshBatch into FMeshDrawCommands.
    BuildMeshDrawCommands(
        MeshBatch,
        BatchElementMask,
        PrimitiveSceneProxy,
        MaterialRenderProxy,
        MaterialResource,
        DrawRenderState,
        BasePassShaders,
        MeshFillMode,
        MeshCullMode,
        SortKey,
        EMeshPassFeatures::Default,
        ShaderElementData);
}

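As the code shows, the main jobs of FMeshPassProcessor are:

  • Pass filtering. MeshBatches that are irrelevant to the pass are filtered out; for example, the depth pass filters out translucent objects.

  • Selecting the shaders and render state (depth, stencil, blend state, rasterizer state and so on) that the draw command needs.

  • Collecting the shader resource bindings involved in the draw command.

    • The pass's uniform buffers, such as ViewUniformBuffer and DepthPassUniformBuffer.
    • Vertex factory bindings (vertex data and indices).
    • Material bindings.
    • Pass-specific bindings related to the draw command.
  • Collecting the parameters of the draw call.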
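
The responsibilities listed above can be sketched with a toy processor. This is only an illustration under simplifying assumptions: `ToyMeshBatch`, `ToyDrawCommand`, `ToyPassProcessor` and `ToyDepthPassProcessor` are hypothetical types, and shaders, render state and bindings are reduced to plain integers and flags.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, heavily simplified stand-in for FMeshBatch.
struct ToyMeshBatch
{
    bool bTranslucent = false;
    int MaterialId = 0;
    std::vector<int> ElementIndexCounts; // one entry per batch element
};

// Hypothetical, heavily simplified stand-in for FMeshDrawCommand.
struct ToyDrawCommand
{
    int ShaderId;     // shader chosen by the pass
    bool bDepthWrite; // render state chosen by the pass
    int MaterialId;   // material binding
    int NumIndices;   // draw-call parameter
};

// Base class: each pass decides whether and how a batch becomes draw commands.
class ToyPassProcessor
{
public:
    virtual ~ToyPassProcessor() = default;
    virtual void AddMeshBatch(const ToyMeshBatch& Batch, std::uint64_t ElementMask) = 0;

    std::vector<ToyDrawCommand> Commands;

protected:
    // Emits one command per batch element whose bit is set in ElementMask.
    void BuildCommands(const ToyMeshBatch& Batch, std::uint64_t ElementMask, int ShaderId, bool bDepthWrite)
    {
        for (std::size_t i = 0; i < Batch.ElementIndexCounts.size() && i < 64; ++i)
        {
            if ((1ull << i) & ElementMask)
            {
                Commands.push_back({ShaderId, bDepthWrite, Batch.MaterialId, Batch.ElementIndexCounts[i]});
            }
        }
    }
};

// A depth-only pass: filters out translucent batches, then picks its own shader and state.
class ToyDepthPassProcessor final : public ToyPassProcessor
{
public:
    void AddMeshBatch(const ToyMeshBatch& Batch, std::uint64_t ElementMask) override
    {
        if (Batch.bTranslucent)
        {
            return; // pass filtering
        }
        BuildCommands(Batch, ElementMask, /*ShaderId=*/1, /*bDepthWrite=*/true);
    }
};
```

Feeding an opaque batch with three elements and a mask of `0x5` to the depth processor yields two commands (elements 0 and 2), while a translucent batch yields none.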

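In its final stage, FMeshPassProcessor::BuildMeshDrawCommands calls FMeshPassDrawListContext::FinalizeCommand. FMeshPassDrawListContext is an abstract class that exposes two basic interfaces. Its derived classes are FDynamicPassMeshDrawListContext and FCachedPassMeshDrawListContext, which represent the contexts for dynamic mesh draw commands and cached mesh draw commands respectively. Their interfaces, with annotations, are as follows: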

// Engine\Source\Runtime\Renderer\Public\MeshPassProcessor.h

// Mesh pass draw list context.
class FMeshPassDrawListContext
{
public:
    virtual FMeshDrawCommand& AddCommand(FMeshDrawCommand& Initializer, uint32 NumElements) = 0;
    virtual void FinalizeCommand(
        const FMeshBatch& MeshBatch, 
        int32 BatchElementIndex,
        int32 DrawPrimitiveId,
        int32 ScenePrimitiveId,
        ERasterizerFillMode MeshFillMode,
        ERasterizerCullMode MeshCullMode,
        FMeshDrawCommandSortKey SortKey,
        const FGraphicsMinimalPipelineStateInitializer& PipelineState,
        const FMeshProcessorShaders* ShadersForDebugging,
        FMeshDrawCommand& MeshDrawCommand) = 0;
};

// [Dynamic] mesh pass draw list context.
class FDynamicPassMeshDrawListContext : public FMeshPassDrawListContext
{
public:
    (......)

    virtual FMeshDrawCommand& AddCommand(FMeshDrawCommand& Initializer, uint32 NumElements) override final
    {
        // Add the FMeshDrawCommand to the list; AddElement returns its index in the array.
        const int32 Index = DrawListStorage.MeshDrawCommands.AddElement(Initializer);
        FMeshDrawCommand& NewCommand = DrawListStorage.MeshDrawCommands[Index];
        return NewCommand;
    }

    virtual void FinalizeCommand(
        const FMeshBatch& MeshBatch, 
        int32 BatchElementIndex,
        int32 DrawPrimitiveId,
        int32 ScenePrimitiveId,
        ERasterizerFillMode MeshFillMode,
        ERasterizerCullMode MeshCullMode,
        FMeshDrawCommandSortKey SortKey,
        const FGraphicsMinimalPipelineStateInitializer& PipelineState,
        const FMeshProcessorShaders* ShadersForDebugging,
        FMeshDrawCommand& MeshDrawCommand) override final
    {
        // Get the render pipeline id.
        FGraphicsMinimalPipelineStateId PipelineId = FGraphicsMinimalPipelineStateId::GetPipelineStateId(PipelineState, GraphicsMinimalPipelineStateSet, NeedsShaderInitialisation);

        // Process the FMeshBatch and related data, and store the result in the MeshDrawCommand.
        MeshDrawCommand.SetDrawParametersAndFinalize(MeshBatch, BatchElementIndex, PipelineId, ShadersForDebugging);

        // Create an FVisibleMeshDrawCommand and fill it with the FMeshDrawCommand and related data.
        FVisibleMeshDrawCommand NewVisibleMeshDrawCommand;
        NewVisibleMeshDrawCommand.Setup(&MeshDrawCommand, DrawPrimitiveId, ScenePrimitiveId, -1, MeshFillMode, MeshCullMode, SortKey);
        // Added straight to the TArray, which shows that the dynamic path neither merges nor instances MeshDrawCommands.
        DrawList.Add(NewVisibleMeshDrawCommand);
    }

private:
    // Storage for the FMeshDrawCommands; the underlying data structure is a TChunkedArray.
    FDynamicMeshDrawCommandStorage& DrawListStorage;
    // List of FVisibleMeshDrawCommands, stored in a TArray. Each entry holds an FMeshDrawCommand pointer whose target lives in DrawListStorage.
    FMeshCommandOneFrameArray& DrawList;
    // PSO set.
    FGraphicsMinimalPipelineStateSet& GraphicsMinimalPipelineStateSet;
    
    bool& NeedsShaderInitialisation;
};


// [Cached] mesh pass draw list context.
class FCachedPassMeshDrawListContext : public FMeshPassDrawListContext
{
public:
    FCachedPassMeshDrawListContext(FCachedMeshDrawCommandInfo& InCommandInfo, FCriticalSection& InCachedMeshDrawCommandLock, FCachedPassMeshDrawList& InCachedDrawLists, FStateBucketMap& InCachedMeshDrawCommandStateBuckets, const FScene& InScene);

    virtual FMeshDrawCommand& AddCommand(FMeshDrawCommand& Initializer, uint32 NumElements) override final
    {
        if (NumElements == 1)
        {
            return Initializer;
        }
        else
        {
            MeshDrawCommandForStateBucketing = Initializer;
            return MeshDrawCommandForStateBucketing;
        }
    }

    virtual void FinalizeCommand(
        const FMeshBatch& MeshBatch, 
        int32 BatchElementIndex,
        int32 DrawPrimitiveId,
        int32 ScenePrimitiveId,
        ERasterizerFillMode MeshFillMode,
        ERasterizerCullMode MeshCullMode,
        FMeshDrawCommandSortKey SortKey,
        const FGraphicsMinimalPipelineStateInitializer& PipelineState,
        const FMeshProcessorShaders* ShadersForDebugging,
        FMeshDrawCommand& MeshDrawCommand) override final
    {
        FGraphicsMinimalPipelineStateId PipelineId = FGraphicsMinimalPipelineStateId::GetPersistentId(PipelineState);

        MeshDrawCommand.SetDrawParametersAndFinalize(MeshBatch, BatchElementIndex, PipelineId, ShadersForDebugging);

        if (UseGPUScene(GMaxRHIShaderPlatform, GMaxRHIFeatureLevel))
        {
            Experimental::FHashElementId SetId;
            auto hash = CachedMeshDrawCommandStateBuckets.ComputeHash(MeshDrawCommand);
            {
                FScopeLock Lock(&CachedMeshDrawCommandLock);

                (......)
                
                // Look up the id for this hash in the cached hash table, adding a new entry if none exists. This is how FMeshDrawCommands get merged.
                SetId = CachedMeshDrawCommandStateBuckets.FindOrAddIdByHash(hash, MeshDrawCommand, FMeshDrawCommandCount());
                // Increment the count.
                CachedMeshDrawCommandStateBuckets.GetByElementId(SetId).Value.Num++;

                (......)
            }

            CommandInfo.StateBucketId = SetId.GetIndex();
        }
        else
        {
            FScopeLock Lock(&CachedMeshDrawCommandLock);
            // Only one FMeshDrawCommand supported per FStaticMesh in a pass
            // Allocate at lowest free index so that 'r.DoLazyStaticMeshUpdate' can shrink the TSparseArray more effectively
            CommandInfo.CommandIndex = CachedDrawLists.MeshDrawCommands.EmplaceAtLowestFreeIndex(CachedDrawLists.LowestFreeIndexSearchStart, MeshDrawCommand);
        }

        // Store the remaining data.
        CommandInfo.SortKey = SortKey;
        CommandInfo.MeshFillMode = MeshFillMode;
        CommandInfo.MeshCullMode = MeshCullMode;
    }

private:
    FMeshDrawCommand MeshDrawCommandForStateBucketing;
    FCachedMeshDrawCommandInfo& CommandInfo;
    FCriticalSection& CachedMeshDrawCommandLock;
    FCachedPassMeshDrawList& CachedDrawLists;
    FStateBucketMap& CachedMeshDrawCommandStateBuckets; // Robin Hood hash table that automatically merges and counts FMeshDrawCommands with the same hash value.
    const FScene& Scene;
};

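As the code shows, the renderer does a great deal of work in the stage from FMeshBatch to FMeshDrawCommand, all in order to convert each FMeshBatch into FMeshDrawCommands and store them in the FMeshPassDrawListContext member of the FMeshPassProcessor. Along the way it collects or processes, from the various objects involved, all the data a mesh draw command needs, so that the command can enter the later stages of rendering. The figure below illustrates these key steps: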

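One more note on merging FMeshDrawCommands. In the dynamic draw path, FDynamicPassMeshDrawListContext keeps its commands in flat array structures (the FMeshDrawCommands in a TChunkedArray and the FVisibleMeshDrawCommands that point to them in a TArray). It does not merge FMeshDrawCommands and does not dynamically instance meshes, but it does make state-based sorting more robust.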
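
The storage arrangement of the dynamic path can be sketched as follows. This is a rough analogy rather than the engine's code: `std::deque` stands in for a chunked array because appending to it never invalidates references to existing elements, and `ToyCommand`, `ToyVisibleCommand` and `ToyDynamicContext` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <deque>
#include <vector>

struct ToyCommand { int PipelineId; int NumPrimitives; };

// Lightweight record that only points at the real command and carries a sort key.
struct ToyVisibleCommand { const ToyCommand* Command; int SortKey; };

class ToyDynamicContext
{
public:
    // Copies the initializer into chunked storage and returns a stable reference.
    ToyCommand& AddCommand(const ToyCommand& Initializer)
    {
        Storage.push_back(Initializer); // existing elements keep their addresses
        return Storage.back();
    }

    // Records one visible command per draw; nothing is merged or instanced here.
    void FinalizeCommand(const ToyCommand& Command, int SortKey)
    {
        DrawList.push_back({&Command, SortKey});
    }

    // Sorting reorders only the small pointer records, never the commands themselves.
    void SortDrawList()
    {
        std::stable_sort(DrawList.begin(), DrawList.end(),
            [](const ToyVisibleCommand& A, const ToyVisibleCommand& B) { return A.SortKey < B.SortKey; });
    }

    std::deque<ToyCommand> Storage;          // stand-in for chunked command storage
    std::vector<ToyVisibleCommand> DrawList; // flat list of visible commands
};
```

Keeping the heavy command data in place and sorting a flat list of small records is the design point: the pointers stay valid however many commands are added, and the sort touches only compact entries.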

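In the cached (static) draw path, FCachedPassMeshDrawListContext relies on FStateBucketMap to merge and count commands, so that they can be drawn with instancing at draw submission time.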
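
The merge-and-count idea can be sketched with an ordinary hash map. Note the substitution: the engine's container is described as a Robin Hood hash table, whereas this sketch uses `std::unordered_map` for simplicity. `ToyCommandState`, `ToyCommandStateHash` and `ToyStateBucketMap` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <vector>

// Hypothetical subset of the state that must match for two draws to be merged.
struct ToyCommandState
{
    std::uint32_t PipelineId;
    std::uint32_t VertexBufferId;
    std::uint32_t MaterialId;

    bool operator==(const ToyCommandState& Other) const
    {
        return PipelineId == Other.PipelineId
            && VertexBufferId == Other.VertexBufferId
            && MaterialId == Other.MaterialId;
    }
};

struct ToyCommandStateHash
{
    std::size_t operator()(const ToyCommandState& S) const
    {
        std::size_t H = std::hash<std::uint32_t>()(S.PipelineId);
        H = H * 31u + std::hash<std::uint32_t>()(S.VertexBufferId);
        H = H * 31u + std::hash<std::uint32_t>()(S.MaterialId);
        return H;
    }
};

class ToyStateBucketMap
{
public:
    // Returns the bucket id for this state, creating a bucket on first use,
    // and increments the number of commands that share the bucket.
    int FindOrAdd(const ToyCommandState& State)
    {
        auto It = Lookup.find(State);
        if (It == Lookup.end())
        {
            It = Lookup.emplace(State, static_cast<int>(Counts.size())).first;
            Counts.push_back(0);
        }
        ++Counts[static_cast<std::size_t>(It->second)];
        return It->second;
    }

    int GetCount(int BucketId) const { return Counts[static_cast<std::size_t>(BucketId)]; }
    int NumBuckets() const { return static_cast<int>(Counts.size()); }

private:
    std::unordered_map<ToyCommandState, int, ToyCommandStateHash> Lookup;
    std::vector<int> Counts; // Counts[BucketId] = commands sharing that state
};
```

Commands that land in the same bucket share every piece of state in the key, so a submission step could in principle issue them as one instanced draw whose instance count is the bucket's count.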

As a side note, UE has no dynamic batching feature like Unity's; meshes can only be merged manually in the editor (see the figure below).

How to open the Actor merge tool built into the UE editor, with a preview of its interface.

3.2.4 From FMeshDrawCommand to RHICommandList

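The previous section explained in detail how an FMeshBatch is converted into FMeshDrawCommands. This section covers the next step: how an FMeshDrawCommand is turned into RHICommandList commands, and what processing and optimization happen along the way.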

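Once FMeshBatches have been converted into FMeshDrawCommands, each pass has its own FMeshPassProcessor, and each FMeshPassProcessor holds all the FMeshDrawCommands that the pass needs to draw, so that the renderer can trigger and render them at the right time. Take the simplest pass, the PrePass (depth pass), as an example: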

void FDeferredShadingSceneRenderer::Render(FRHICommandListImmediate& RHICmdList)
{
    (......)
    
    // The FMeshBatch-to-FMeshDrawCommand conversion is done in InitViews.
    InitViews(RHICmdList, BasePassDepthStencilAccess, ILCTaskData, UpdateViewCustomDataEvents);
    
    (......)
    
    // Render the PrePass (depth pass).
    RenderPrePass(FRHICommandListImmediate& RHICmdList, TFunctionRef<void()> AfterTasksAreStarted)
    {
        bool bParallel = GRHICommandList.UseParallelAlgorithms() && CVarParallelPrePass.GetValueOnRenderThread();
        
        (......)
        
        if(EarlyZPassMode != DDM_None)
        {
            const bool bWaitForTasks = bParallel && (CVarRHICmdFlushRenderThreadTasksPrePass.GetValueOnRenderThread() > 0 || CVarRHICmdFlushRenderThreadTasks.GetValueOnRenderThread() > 0);

            // Iterate over all views; each view renders the depth pass once.
            for(int32 ViewIndex = 0;ViewIndex < Views.Num();ViewIndex++)
            {
                const FViewInfo& View = Views[ViewIndex];

                // Set up the render resources and state for the depth pass.
                TUniformBufferRef<FSceneTexturesUniformParameters> PassUniformBuffer;
                CreateDepthPassUniformBuffer(RHICmdList, View, PassUniformBuffer);

                FMeshPassProcessorRenderState DrawRenderState(View, PassUniformBuffer);

                SetupDepthPassState(DrawRenderState);

                if (View.ShouldRenderView())
                {
                    Scene->UniformBuffers.UpdateViewUniformBuffer(View);

                    if (bParallel)
                    {
                        // Render the depth pass in parallel.
                        bDepthWasCleared = RenderPrePassViewParallel(View, RHICmdList, DrawRenderState, AfterTasksAreStarted, !bDidPrePre) || bDepthWasCleared;
                        bDidPrePre = true;
                    }
                    (......)
                }

                (......)
            }
        }
        
        (......)
    }
}

// Engine\Source\Runtime\Renderer\Private\DepthRendering.cpp

// Entry point for rendering the depth pass in parallel.
bool FDeferredShadingSceneRenderer::RenderPrePassViewParallel(const FViewInfo& View, FRHICommandListImmediate& ParentCmdList, const FMeshPassProcessorRenderState& DrawRenderState, TFunctionRef<void()> AfterTasksAreStarted, bool bDoPrePre)
{
    bool bDepthWasCleared = false;

    {
        // Construct the container that holds the parallel command lists.
        FPrePassParallelCommandListSet ParallelCommandListSet(View, this, ParentCmdList,
            CVarRHICmdPrePassDeferredContexts.GetValueOnRenderThread() > 0, 
            CVarRHICmdFlushRenderThreadTasksPrePass.GetValueOnRenderThread() == 0 && CVarRHICmdFlushRenderThreadTasks.GetValueOnRenderThread() == 0,
            DrawRenderState);

        // Kick off parallel drawing.
        View.ParallelMeshDrawCommandPasses[EMeshPass::DepthPass].DispatchDraw(&ParallelCommandListSet, ParentCmdList);

        (......)
    }

    (......)

    return bDepthWasCleared;
}


// Engine\Source\Runtime\Renderer\Private\MeshDrawCommands.cpp

void FParallelMeshDrawCommandPass::DispatchDraw(FParallelCommandListSet* ParallelCommandListSet, FRHICommandList& RHICmdList) const
{
    (......)
    
    FRHIVertexBuffer* PrimitiveIdsBuffer = PrimitiveIdVertexBufferPoolEntry.BufferRHI;
    const int32 BasePrimitiveIdsOffset = 0;

    if (ParallelCommandListSet)
    {
        (......)
        
        const ENamedThreads::Type RenderThread = ENamedThreads::GetRenderThread();

        // Gather the prerequisite tasks.
        FGraphEventArray Prereqs;
        if (ParallelCommandListSet->GetPrereqs())
        {
            Prereqs.Append(*ParallelCommandListSet->GetPrereqs());
        }
        if (TaskEventRef.IsValid())
        {
            Prereqs.Add(TaskEventRef);
        }

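        // Compute the number of parallel draw tasks: at most the number of worker threads, and fewer when there are too few draws to fill them.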
        const int32 NumThreads = FMath::Min<int32>(FTaskGraphInterface::Get().GetNumWorkerThreads(), ParallelCommandListSet->Width);
        const int32 NumTasks = FMath::Min<int32>(NumThreads, FMath::DivideAndRoundUp(MaxNumDraws, ParallelCommandListSet->MinDrawsPerCommandList));
        const int32 NumDrawsPerTask = FMath::DivideAndRoundUp(MaxNumDraws, NumTasks);

        // Loop NumTasks times, constructing NumTasks draw task (FDrawVisibleMeshCommandsAnyThreadTask) instances.
        for (int32 TaskIndex = 0; TaskIndex < NumTasks; TaskIndex++)
        {
            const int32 StartIndex = TaskIndex * NumDrawsPerTask;
            const int32 NumDraws = FMath::Min(NumDrawsPerTask, MaxNumDraws - StartIndex);
            checkSlow(NumDraws > 0);

            FRHICommandList* CmdList = ParallelCommandListSet->NewParallelCommandList();

            // Construct an FDrawVisibleMeshCommandsAnyThreadTask instance and add it to the TaskGraph. TaskContext.MeshDrawCommands is the list generated by FMeshPassProcessor, as described in the previous section.
            FGraphEventRef AnyThreadCompletionEvent = TGraphTask<FDrawVisibleMeshCommandsAnyThreadTask>::CreateTask(&Prereqs, RenderThread).ConstructAndDispatchWhenReady(*CmdList, TaskContext.MeshDrawCommands, TaskContext.MinimalPipelineStatePassSet, PrimitiveIdsBuffer, BasePrimitiveIdsOffset, TaskContext.bDynamicInstancing, TaskContext.InstanceFactor, TaskIndex, NumTasks);
            // Add the event to ParallelCommandListSet so that completion of the depth pass's parallel drawing can be tracked.
            ParallelCommandListSet->AddParallelCommandList(CmdList, AnyThreadCompletionEvent, NumDraws);
        }
    }
    
    (......)
}


// Engine\Source\Runtime\Renderer\Private\MeshDrawCommands.cpp

void FDrawVisibleMeshCommandsAnyThreadTask::DoTask(ENamedThreads::Type CurrentThread, const FGraphEventRef& MyCompletionGraphEvent)
{
    // Compute the range of draws handled by this task.
    const int32 DrawNum = VisibleMeshDrawCommands.Num();
    const int32 NumDrawsPerTask = TaskIndex < DrawNum ? FMath::DivideAndRoundUp(DrawNum, TaskNum) : 0;
    const int32 StartIndex = TaskIndex * NumDrawsPerTask;
    const int32 NumDraws = FMath::Min(NumDrawsPerTask, DrawNum - StartIndex);

    // Pass the data needed for drawing to the submission function.
    SubmitMeshDrawCommandsRange(VisibleMeshDrawCommands, GraphicsMinimalPipelineStateSet, PrimitiveIdsBuffer, BasePrimitiveIdsOffset, bDynamicInstancing, StartIndex, NumDraws, InstanceFactor, RHICmdList);

    RHICmdList.EndRenderPass();
    RHICmdList.HandleRTThreadTaskCompletion(MyCompletionGraphEvent);
}

// Submit a given range of mesh draw commands.
void SubmitMeshDrawCommandsRange(
    const FMeshCommandOneFrameArray& VisibleMeshDrawCommands,
    const FGraphicsMinimalPipelineStateSet& GraphicsMinimalPipelineStateSet,
    FRHIVertexBuffer* PrimitiveIdsBuffer,
    int32 BasePrimitiveIdsOffset,
    bool bDynamicInstancing,
    int32 StartIndex,
    int32 NumMeshDrawCommands,
    uint32 InstanceFactor,
    FRHICommandList& RHICmdList)
{
    FMeshDrawCommandStateCache StateCache;

    // Iterate over the given range of draw commands, submitting them one by one.
    for (int32 DrawCommandIndex = StartIndex; DrawCommandIndex < StartIndex + NumMeshDrawCommands; DrawCommandIndex++)
    {
        const FVisibleMeshDrawCommand& VisibleMeshDrawCommand = VisibleMeshDrawCommands[DrawCommandIndex];
        const int32 PrimitiveIdBufferOffset = BasePrimitiveIdsOffset + (bDynamicInstancing ? VisibleMeshDrawCommand.PrimitiveIdBufferOffset : DrawCommandIndex) * sizeof(int32);
        // Submit a single MeshDrawCommand.
        FMeshDrawCommand::SubmitDraw(*VisibleMeshDrawCommand.MeshDrawCommand, GraphicsMinimalPipelineStateSet, PrimitiveIdsBuffer, PrimitiveIdBufferOffset, InstanceFactor, RHICmdList, StateCache);
    }
}

// Submit a single MeshDrawCommand to the RHICommandList.
void FMeshDrawCommand::SubmitDraw(
    const FMeshDrawCommand& RESTRICT MeshDrawCommand, 
    const FGraphicsMinimalPipelineStateSet& GraphicsMinimalPipelineStateSet,
    FRHIVertexBuffer* ScenePrimitiveIdsBuffer,
    int32 PrimitiveIdOffset,
    uint32 InstanceFactor,
    FRHICommandList& RHICmdList,
    FMeshDrawCommandStateCache& RESTRICT StateCache)
{
    (......)
    
    const FGraphicsMinimalPipelineStateInitializer& MeshPipelineState = MeshDrawCommand.CachedPipelineId.GetPipelineState(GraphicsMinimalPipelineStateSet);

    // Set and cache the PSO; skipped when the pipeline id matches the cached one.
    if (MeshDrawCommand.CachedPipelineId.GetId() != StateCache.PipelineId)
    {
        FGraphicsPipelineStateInitializer GraphicsPSOInit = MeshPipelineState.AsGraphicsPipelineStateInitializer();
        RHICmdList.ApplyCachedRenderTargets(GraphicsPSOInit);
        SetGraphicsPipelineState(RHICmdList, GraphicsPSOInit);
        StateCache.SetPipelineState(MeshDrawCommand.CachedPipelineId.GetId());
    }

    // Set and cache the stencil reference value.
    if (MeshDrawCommand.StencilRef != StateCache.StencilRef)
    {
        RHICmdList.SetStencilRef(MeshDrawCommand.StencilRef);
        StateCache.StencilRef = MeshDrawCommand.StencilRef;
    }

    // Set the vertex streams.
    for (int32 VertexBindingIndex = 0; VertexBindingIndex < MeshDrawCommand.VertexStreams.Num(); VertexBindingIndex++)
    {
        const FVertexInputStream& Stream = MeshDrawCommand.VertexStreams[VertexBindingIndex];

        if (MeshDrawCommand.PrimitiveIdStreamIndex != -1 && Stream.StreamIndex == MeshDrawCommand.PrimitiveIdStreamIndex)
        {
            RHICmdList.SetStreamSource(Stream.StreamIndex, ScenePrimitiveIdsBuffer, PrimitiveIdOffset);
            StateCache.VertexStreams[Stream.StreamIndex] = Stream;
        }
        else if (StateCache.VertexStreams[Stream.StreamIndex] != Stream)
        {
            RHICmdList.SetStreamSource(Stream.StreamIndex, Stream.VertexBuffer, Stream.Offset);
            StateCache.VertexStreams[Stream.StreamIndex] = Stream;
        }
    }

    // Set the resources bound to the shaders.
    MeshDrawCommand.ShaderBindings.SetOnCommandList(RHICmdList, MeshPipelineState.BoundShaderState.AsBoundShaderState(), StateCache.ShaderBindings);

    // Issue the matching kind of draw call to the RHICommandList, depending on the data.
    if (MeshDrawCommand.IndexBuffer)
    {
        if (MeshDrawCommand.NumPrimitives > 0)
        {
            RHICmdList.DrawIndexedPrimitive(
                MeshDrawCommand.IndexBuffer,
                MeshDrawCommand.VertexParams.BaseVertexIndex,
                0,
                MeshDrawCommand.VertexParams.NumVertices,
                MeshDrawCommand.FirstIndex,
                MeshDrawCommand.NumPrimitives,
                MeshDrawCommand.NumInstances * InstanceFactor
            );
        }
        else
        {
            RHICmdList.DrawIndexedPrimitiveIndirect(
                MeshDrawCommand.IndexBuffer, 
                MeshDrawCommand.IndirectArgs.Buffer, 
                MeshDrawCommand.IndirectArgs.Offset
                );
        }
    }
    else
    {
        if (MeshDrawCommand.NumPrimitives > 0)
        {
            RHICmdList.DrawPrimitive(
                MeshDrawCommand.VertexParams.BaseVertexIndex + MeshDrawCommand.FirstIndex,
                MeshDrawCommand.NumPrimitives,
                MeshDrawCommand.NumInstances * InstanceFactor);
        }
        else
        {
            RHICmdList.DrawPrimitiveIndirect(
                MeshDrawCommand.IndirectArgs.Buffer,
                MeshDrawCommand.IndirectArgs.Offset);
        }
    }
}
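
To make the state caching in SubmitDraw concrete, here is a minimal standalone sketch. It is not engine code: DrawCommand, StateCache, SubmitStats, and SubmitAll are hypothetical stand-ins, and only the pipeline id and the stencil reference are tracked. It counts how many state changes would actually be issued for a given command order, which also hints at why sorting commands by state pays off.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-ins; not the engine's real types.
struct DrawCommand {
    std::uint32_t pipelineId;
    std::uint32_t stencilRef;
};

// Remembers the last state that was sent to the command list.
// UINT32_MAX is used as "nothing set yet" (an assumption of this sketch).
struct StateCache {
    std::uint32_t pipelineId = UINT32_MAX;
    std::uint32_t stencilRef = UINT32_MAX;
};

struct SubmitStats {
    int pipelineSets = 0;
    int stencilSets = 0;
    int draws = 0;
};

// Walks the commands in order and only "sets" state that differs from the cache.
inline SubmitStats SubmitAll(const std::vector<DrawCommand>& commands) {
    StateCache cache;
    SubmitStats stats;
    for (const DrawCommand& cmd : commands) {
        if (cmd.pipelineId != cache.pipelineId) {
            ++stats.pipelineSets;              // would call SetGraphicsPipelineState
            cache.pipelineId = cmd.pipelineId;
        }
        if (cmd.stencilRef != cache.stencilRef) {
            ++stats.stencilSets;               // would call SetStencilRef
            cache.stencilRef = cmd.stencilRef;
        }
        ++stats.draws;                         // the draw itself is always issued
    }
    return stats;
}
```

With four commands ordered by pipeline id as 1, 1, 2, 2, the sketch sets the pipeline twice; with the order 1, 2, 1, 2 it sets it four times for the same four draws.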

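The engine code above walks through the drawing process of the PrePass (depth pass) in detail. A few additional notes on the path from FMeshDrawCommand to RHICommandList:

  • Every pass runs a process similar to the one above, so it is executed many times per frame. Not every pass is enabled, however; passes can be switched on and off dynamically through the view's PassMask.

  • DispatchDraw and SubmitMeshDrawCommandsRange deliberately operate on flat arrays, with the following considerations in mind:

    • The FVisibleMeshDrawCommand array can be partitioned quickly using only the visibility set, so that FMeshDrawCommand draw commands can be handed to the multithreaded TaskGraph system as flat ranges.
    • Sorting the FMeshDrawCommand list and adding a StateCache reduces the number of commands submitted to the RHICommandList, and with it the cost of translating and executing the RHICommandList. With this step in place, Fortnite cut its RHI execution time by 20%.
    • Cache-coherent traversal. FMeshDrawCommand is tightly packed, and the data that SubmitDraw needs is stored in memory in a lightweight, flat, and contiguous form, which improves cache and prefetch hit rates.
      • TChunkedArray<FMeshDrawCommand> MeshDrawCommands;
      • typedef TArray<FVisibleMeshDrawCommand, SceneRenderingAllocator> FMeshCommandOneFrameArray;
      • TArray<FMeshDrawShaderBindingsLayout, TInlineAllocator<2>> ShaderLayouts;
      • typedef TArray<FVertexInputStream, TInlineAllocator<4>> FVertexInputStreamArray;
      • const int32 NumInlineShaderBindings = 10;
  • Translating MeshDrawCommandPasses into RHICommandList commands supports a parallel mode. The partitioning strategy is simple: the array is split evenly into as many ranges as there are worker threads, and each worker executes the draw commands in its range. The upside is that this is simple, fast, easy to understand, and friendly to the CPU cache. The downside is that execution time can differ considerably between groups, so the total time is determined by the slowest group, which lowers parallel efficiency. The author came up with a few strategies for this problem: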

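    • A heuristic strategy. Record the execution time of every MeshDrawCommand in the previous frame; in the next frame, accumulate the times of adjacent MeshDrawCommands and close a group once the sum approaches the per-group average.
    • Examine one or several properties of each MeshDrawCommand, for example triangle count or material count, and group the commands so that the sum of that property is roughly equal across groups.

    These strategies add logic complexity and may lower the CPU cache hit rate, so the actual benefit has to be measured in the target environment.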

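  • FMeshDrawCommand::SubmitDraw caches the PSO and the stencil reference value (and, as the code shows, the vertex streams and shader bindings) so that duplicate data and commands are not submitted to the RHICommandList, which reduces CPU-GPU I/O.

    I/O between the CPU and GPU and render state switches have long been a problem in real-time rendering, especially on heterogeneous CPU/GPU architectures. Reducing CPU-GPU data traffic is therefore a major rendering optimization. In extreme cases, caching state such as the PSO can improve performance several times over.

  • FMeshDrawCommand::SubmitDraw supports four drawing modes along two dimensions: whether an index buffer is used, and whether the draw is indirect.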

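    A brief introduction to Indirect Draw

    Before Indirect Draw, an application that wanted to draw several objects in one draw call had to use GPU instancing, which comes with many restrictions: vertices, indices, render state, and material data must be identical, and only the transform may differ. Even if textures are packed into an atlas and material properties and mesh data into a StructuredBuffer, one fatal restriction remains: every draw must use the same vertex count. Implementing a GPU Driven Rendering Pipeline therefore requires breaking meshes into clusters of equal vertex count.

    With Indirect Draw, a GPU-driven rendering pipeline becomes simpler and more efficient. The core idea is to place the resource references that one mesh needs into an argument buffer.

    The argument buffers of different meshes can in turn be concatenated into one longer buffer.

    Because the data of each mesh can be handled by a different GPU thread, the draws of multiple meshes can execute in parallel, which is noticeably more efficient than traditional serial drawing.

    However, Indirect Draw is only supported by modern graphics APIs such as DirectX 11, DirectX 12, Vulkan, and Metal.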
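
As a rough illustration of what such an argument buffer holds, the sketch below packs per-mesh draw arguments for meshes that share one large vertex buffer and one large index buffer. The five-field record follows the field order of D3D12_DRAW_INDEXED_ARGUMENTS and VkDrawIndexedIndirectCommand. MeshRange and BuildArgumentBuffer are hypothetical names, and laying out instance data back to back is an assumption made only for this example. Note that each mesh may have its own index and vertex count.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// One record of an indexed indirect draw. Field order matches
// D3D12_DRAW_INDEXED_ARGUMENTS and VkDrawIndexedIndirectCommand.
struct DrawIndexedArgs {
    std::uint32_t indexCount;
    std::uint32_t instanceCount;
    std::uint32_t firstIndex;
    std::int32_t  vertexOffset;
    std::uint32_t firstInstance;
};
static_assert(sizeof(DrawIndexedArgs) == 20, "five tightly packed 32-bit fields");

// Hypothetical description of one mesh stored in the shared buffers.
struct MeshRange {
    std::uint32_t indexCount;
    std::uint32_t vertexCount;
    std::uint32_t instanceCount;
};

// Builds the CPU-side copy of the argument buffer: one record per mesh,
// with offsets accumulated as meshes are appended to the shared buffers.
inline std::vector<DrawIndexedArgs> BuildArgumentBuffer(const std::vector<MeshRange>& meshes) {
    std::vector<DrawIndexedArgs> args;
    args.reserve(meshes.size());
    std::uint32_t firstIndex = 0;
    std::int32_t vertexOffset = 0;
    std::uint32_t firstInstance = 0;
    for (const MeshRange& mesh : meshes) {
        args.push_back({mesh.indexCount, mesh.instanceCount, firstIndex, vertexOffset, firstInstance});
        firstIndex += mesh.indexCount;
        vertexOffset += static_cast<std::int32_t>(mesh.vertexCount);
        firstInstance += mesh.instanceCount;
    }
    return args;
}
```

In a real renderer this array would be uploaded to a GPU buffer (or written by a compute shader) and consumed by a single indirect draw call; that part depends on the graphics API and is left out here.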

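3.2.5 From RHICommandList to the GPU

RHI stands for Rendering Hardware Interface. It is an abstraction layer over the different graphics APIs, and RHICommandList records the API-independent intermediate draw commands and data.

After an RHICommandList has recorded a series of intermediate draw commands, the RHI thread translates them one by one into calls to the target graphics API. Take the FRHICommandList::DrawIndexedPrimitive interface as an example: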

// Engine\Source\Runtime\RHI\Public\RHICommandList.h

void FRHICommandList::DrawIndexedPrimitive(FRHIIndexBuffer* IndexBuffer, int32 BaseVertexIndex, uint32 FirstInstance, uint32 NumVertices, uint32 StartIndex, uint32 NumPrimitives, uint32 NumInstances)
{
    if (!IndexBuffer)
    {
        UE_LOG(LogRHI, Fatal, TEXT("Tried to call DrawIndexedPrimitive with null IndexBuffer!"));
    }

    // Bypass the RHI thread and execute directly.
    if (Bypass())
    {
        GetContext().RHIDrawIndexedPrimitive(IndexBuffer, BaseVertexIndex, FirstInstance, NumVertices, StartIndex, NumPrimitives, NumInstances);
        return;
    }
    
    // Allocate and record the draw command.
    ALLOC_COMMAND(FRHICommandDrawIndexedPrimitive)(IndexBuffer, BaseVertexIndex, FirstInstance, NumVertices, StartIndex, NumPrimitives, NumInstances);
}

// Declaration of FRHICommandDrawIndexedPrimitive
FRHICOMMAND_MACRO(FRHICommandDrawIndexedPrimitive)
{
    // Data required by the command.
    FRHIIndexBuffer* IndexBuffer;
    int32 BaseVertexIndex;
    uint32 FirstInstance;
    uint32 NumVertices;
    uint32 StartIndex;
    uint32 NumPrimitives;
    uint32 NumInstances;
    
    FRHICommandDrawIndexedPrimitive(FRHIIndexBuffer* InIndexBuffer, int32 InBaseVertexIndex, uint32 InFirstInstance, uint32 InNumVertices, uint32 InStartIndex, uint32 InNumPrimitives, uint32 InNumInstances)
        : IndexBuffer(InIndexBuffer)
        , BaseVertexIndex(InBaseVertexIndex)
        , FirstInstance(InFirstInstance)
        , NumVertices(InNumVertices)
        , StartIndex(InStartIndex)
        , NumPrimitives(InNumPrimitives)
        , NumInstances(InNumInstances)
    {
    }
    
    // Interface that executes this command.
    RHI_API void Execute(FRHICommandListBase& CmdList);
};


// Engine\Source\Runtime\RHI\Public\RHICommandListCommandExecutes.inl

// Implementation of FRHICommandDrawIndexedPrimitive's Execute interface.
void FRHICommandDrawIndexedPrimitive::Execute(FRHICommandListBase& CmdList)
{
    RHISTAT(DrawIndexedPrimitive);
    INTERNAL_DECORATOR(RHIDrawIndexedPrimitive)(IndexBuffer, BaseVertexIndex, FirstInstance, NumVertices, StartIndex, NumPrimitives, NumInstances);
}

// The INTERNAL_DECORATOR macro simply calls the corresponding method on the IRHICommandContext held by the RHICommandList.
#if !defined(INTERNAL_DECORATOR)
    #define INTERNAL_DECORATOR(Method) CmdList.GetContext().Method
#endif


// Engine\Source\Runtime\RHI\Public\RHICommandList.h

// Macro that allocates an RHI command
#define ALLOC_COMMAND(...) new ( AllocCommand(sizeof(__VA_ARGS__), alignof(__VA_ARGS__)) ) __VA_ARGS__

// Function that allocates an RHI command.
void* FRHICommandListBase::AllocCommand(int32 AllocSize, int32 Alignment)
{
    checkSlow(!IsExecuting());
    // Allocate memory from the command memory manager.
    FRHICommandBase* Result = (FRHICommandBase*) MemManager.Alloc(AllocSize, Alignment);
    ++NumCommands;
    // Append the newly allocated command to the tail of the linked list.
    *CommandLink = Result;
    CommandLink = &Result->Next;
    return Result;
}

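As shown above, the predefined macros FRHICOMMAND_MACRO, INTERNAL_DECORATOR, and ALLOC_COMMAND turn the intermediate draw commands of the RHICommandList into calls on the target graphics API by way of IRHICommandContext, so that the draw commands can subsequently be submitted to the GPU.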
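
The record-then-execute pattern behind these macros can be reduced to a small standalone sketch. Everything in it is a simplified assumption rather than the engine's implementation: Context stands in for IRHICommandContext, CommandList::Record plays the role of ALLOC_COMMAND (placement new into a linear buffer, then append to a singly linked list), a fixed-capacity byte buffer replaces the real memory manager, and a virtual Execute is used for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <new>
#include <string>
#include <utility>
#include <vector>

// Hypothetical back end, standing in for IRHICommandContext.
struct Context {
    std::vector<std::string> log;
    void DrawIndexed(std::uint32_t numPrimitives, std::uint32_t numInstances) {
        log.push_back("draw " + std::to_string(numPrimitives) + "x" + std::to_string(numInstances));
    }
};

// Base of every recorded command; commands form a singly linked list.
struct CommandBase {
    CommandBase* next = nullptr;
    virtual ~CommandBase() = default;
    virtual void Execute(Context& ctx) = 0;
};

struct DrawIndexedCommand final : CommandBase {
    std::uint32_t numPrimitives;
    std::uint32_t numInstances;
    DrawIndexedCommand(std::uint32_t primitives, std::uint32_t instances)
        : numPrimitives(primitives), numInstances(instances) {}
    void Execute(Context& ctx) override { ctx.DrawIndexed(numPrimitives, numInstances); }
};

class CommandList {
public:
    explicit CommandList(std::size_t capacityBytes) : storage_(capacityBytes) {}
    CommandList(const CommandList&) = delete;
    CommandList& operator=(const CommandList&) = delete;
    ~CommandList() {
        for (CommandBase* cmd = head_; cmd != nullptr;) {
            CommandBase* next = cmd->next;
            cmd->~CommandBase();
            cmd = next;
        }
    }

    // Records a command: placement-new it into the buffer and link it at the tail.
    template <typename T, typename... Args>
    void Record(Args&&... args) {
        T* cmd = new (Alloc(sizeof(T), alignof(T))) T(std::forward<Args>(args)...);
        *tailLink_ = cmd;
        tailLink_ = &cmd->next;
    }

    // Replays every recorded command, in recording order, against a context.
    void Execute(Context& ctx) {
        for (CommandBase* cmd = head_; cmd != nullptr; cmd = cmd->next) {
            cmd->Execute(ctx);
        }
    }

private:
    // Bump allocation out of the fixed buffer, respecting the requested alignment.
    void* Alloc(std::size_t size, std::size_t alignment) {
        void* ptr = storage_.data() + used_;
        std::size_t space = storage_.size() - used_;
        void* aligned = std::align(alignment, size, ptr, space);
        assert(aligned != nullptr && "command buffer exhausted");
        used_ = (storage_.size() - space) + size;
        return aligned;
    }

    std::vector<unsigned char> storage_;
    std::size_t used_ = 0;
    CommandBase* head_ = nullptr;
    CommandBase** tailLink_ = &head_;
};
```

Recording costs little more than a pointer bump and a constructor call, and nothing reaches the back end until Execute runs, which is what allows recording and execution to live on different threads.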

 

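3.3 Static and Dynamic Draw Paths

3.3.1 Overview of Draw Paths

Section 3.2 already hinted at both the static and the dynamic path, but was mostly told from the perspective of the dynamic path. In fact, to optimize the drawing of static meshes, UE splits off a static draw path that can be optimized specifically. The static path comes in two variants: one that needs View information, and one that does not and can therefore cache more aggressively.

UE has three mesh draw paths (in the original diagram, orange marks data generated dynamically every frame and blue marks data generated once and then cached). The first is the dynamic draw path: everything from FPrimitiveSceneProxy to RHICommandList is created every frame, so it is the least efficient but the most flexible. The second is the static path that requires a View: it can cache FMeshBatch data, with medium efficiency and medium flexibility. The third is the static draw path that does not require a view: it can cache both FMeshBatch and FMeshDrawCommand, so it is the most efficient but the least flexible and has the most preconditions.

The cached data of the static draw path only needs to be generated once, which reduces render thread time and improves efficiency. Static meshes, for example, inject FStaticMeshBatch by implementing the DrawStaticElements interface, and DrawStaticElements is normally called when the SceneProxy is added to the scene.

3.3.2 Dynamic Draw Path

The dynamic draw path rebuilds FMeshBatch data every frame without caching it, so it is the most extensible but the least efficient. It is commonly used for particle effects, skeletal animation, procedural dynamic meshes, and any mesh whose data must be updated every frame. FMeshBatch instances are collected through the GetDynamicMeshElements interface; see section 3.2 (Model Drawing Pipeline) for details.

FParallelMeshDrawCommandPass is a general-purpose mesh pass. It is recommended only for performance-critical mesh passes, because it supports only parallel and cached rendering. Using the parallel or cached path requires careful design, because no data in the mesh draw commands or shader bindings may be modified after InitViews. Section 3.2 already showed code that uses FParallelMeshDrawCommandPass; to illustrate its usage further, here is a relatively concise example from shadow rendering: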

// Engine\Source\Runtime\Renderer\Private\ShadowRendering.h

class FProjectedShadowInfo : public FRefCountedObject
{
    (......)
    
    // Declare the FParallelMeshDrawCommandPass instance
    FParallelMeshDrawCommandPass ShadowDepthPass;
    
    (......)
};


// Engine\Source\Runtime\Renderer\Private\ShadowDepthRendering.cpp

void FProjectedShadowInfo::RenderDepthInner(FRHICommandListImmediate& RHICmdList, FSceneRenderer* SceneRenderer, FBeginShadowRenderPassFunction BeginShadowRenderPass, bool bDoParallelDispatch)
{
    (......)

    // Parallel mode
    if (bDoParallelDispatch)
    {
        bool bFlush = CVarRHICmdFlushRenderThreadTasksShadowPass.GetValueOnRenderThread() > 0
            || CVarRHICmdFlushRenderThreadTasks.GetValueOnRenderThread() > 0;
        FScopedCommandListWaitForTasks Flusher(bFlush);

        {
            // Build the parallel command list set that holds the generated RHICommandLists.
            FShadowParallelCommandListSet ParallelCommandListSet(*ShadowDepthView, SceneRenderer, RHICmdList, CVarRHICmdShadowDeferredContexts.GetValueOnRenderThread() > 0, !bFlush, DrawRenderState, *this, BeginShadowRenderPass);

            // Dispatch the draw commands
            ShadowDepthPass.DispatchDraw(&ParallelCommandListSet, RHICmdList);
        }
    }
    // Non-parallel mode
    else
    {
        ShadowDepthPass.DispatchDraw(nullptr, RHICmdList);
    }
}

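Simple and convenient to use, isn't it? That is because UE does a great deal of encapsulation and detail handling behind the scenes.

Besides FParallelMeshDrawCommandPass, there is an even simpler way to issue draw commands: DrawDynamicMeshPass. It only needs a view, an RHICommandList, and a lambda. Its declaration and a usage example follow: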

// Engine\Source\Runtime\Renderer\Public\MeshPassProcessor.inl

// Declaration of DrawDynamicMeshPass
template<typename LambdaType>
void DrawDynamicMeshPass(const FSceneView& View, FRHICommandList& RHICmdList, const LambdaType& BuildPassProcessorLambda, bool bForceStereoInstancingOff = false);


// Engine\Source\Runtime\Renderer\Private\DepthRendering.cpp

void FDeferredShadingSceneRenderer::RenderPrePassEditorPrimitives(FRHICommandList& RHICmdList, const FViewInfo& View, const FMeshPassProcessorRenderState& DrawRenderState, EDepthDrawingMode DepthDrawingMode, bool bRespectUseAsOccluderFlag) 
{
    (......)

    bool bDirty = false;
    if (!View.Family->EngineShowFlags.CompositeEditorPrimitives)
    {
        const bool bNeedToSwitchVerticalAxis = RHINeedsToSwitchVerticalAxis(ShaderPlatform);
        const FScene* LocalScene = Scene;

        // Call DrawDynamicMeshPass to process the depth pass.
        DrawDynamicMeshPass(View, RHICmdList,
            [&View, &DrawRenderState, LocalScene, DepthDrawingMode, bRespectUseAsOccluderFlag](FDynamicPassMeshDrawListContext* DynamicMeshPassContext)
            {
                FDepthPassMeshProcessor PassMeshProcessor(
                    LocalScene,
                    &View,
                    DrawRenderState,
                    bRespectUseAsOccluderFlag,
                    DepthDrawingMode,
                    false,
                    DynamicMeshPassContext);

                const uint64 DefaultBatchElementMask = ~0ull;
                    
                for (int32 MeshIndex = 0; MeshIndex < View.ViewMeshElements.Num(); MeshIndex++)
                {
                    const FMeshBatch& MeshBatch = View.ViewMeshElements[MeshIndex];
                    PassMeshProcessor.AddMeshBatch(MeshBatch, DefaultBatchElementMask, nullptr);
                }
            });

        (......)
    }
}

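3.3.3 Static Draw Path

The static draw path can usually be cached, so it is also called the cached draw path. It applies to objects such as static models (which can be designated in the mesh properties panel of the UE editor).

A static model is cached when its FPrimitiveSceneInfo calls AddToScene. The code and annotations follow: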

// Engine\Source\Runtime\Renderer\Private\PrimitiveSceneInfo.cpp

void FPrimitiveSceneInfo::AddToScene(FRHICommandListImmediate& RHICmdList, FScene* Scene, const TArrayView<FPrimitiveSceneInfo*>& SceneInfos, bool bUpdateStaticDrawLists, bool bAddToStaticDrawLists, bool bAsyncCreateLPIs)
{
    (......)

    {
        SCOPED_NAMED_EVENT(FPrimitiveSceneInfo_AddToScene_AddStaticMeshes, FColor::Magenta);
        // Process static meshes
        if (bUpdateStaticDrawLists)
        {
            AddStaticMeshes(RHICmdList, Scene, SceneInfos, bAddToStaticDrawLists);
        }
    }

    (......)
}

void FPrimitiveSceneInfo::AddStaticMeshes(FRHICommandListImmediate& RHICmdList, FScene* Scene, const TArrayView<FPrimitiveSceneInfo*>& SceneInfos, bool bAddToStaticDrawLists)
{
    LLM_SCOPE(ELLMTag::StaticMesh);

    {
        // Process static primitives in parallel.
        ParallelForTemplate(SceneInfos.Num(), [Scene, &SceneInfos](int32 Index)
        {
            SCOPED_NAMED_EVENT(FPrimitiveSceneInfo_AddStaticMeshes_DrawStaticElements, FColor::Magenta);
            FPrimitiveSceneInfo* SceneInfo = SceneInfos[Index];
            // Cache the primitive's static elements.
            FBatchingSPDI BatchingSPDI(SceneInfo);
            BatchingSPDI.SetHitProxy(SceneInfo->DefaultDynamicHitProxy);
            // Call the proxy's DrawStaticElements, which adds the collected FStaticMeshBatch instances to SceneInfo->StaticMeshes.
            SceneInfo->Proxy->DrawStaticElements(&BatchingSPDI);
            SceneInfo->StaticMeshes.Shrink();
            SceneInfo->StaticMeshRelevances.Shrink();

            check(SceneInfo->StaticMeshRelevances.Num() == SceneInfo->StaticMeshes.Num());
        });
    }

    {
        // Add the static mesh batches of every PrimitiveSceneInfo to the scene's StaticMeshes list.
        SCOPED_NAMED_EVENT(FPrimitiveSceneInfo_AddStaticMeshes_UpdateSceneArrays, FColor::Blue);
        for (FPrimitiveSceneInfo* SceneInfo : SceneInfos)
        {
            for (int32 MeshIndex = 0; MeshIndex < SceneInfo->StaticMeshes.Num(); MeshIndex++)
            {
                FStaticMeshBatchRelevance& MeshRelevance = SceneInfo->StaticMeshRelevances[MeshIndex];
                FStaticMeshBatch& Mesh = SceneInfo->StaticMeshes[MeshIndex];

                // Add the static mesh to the scene's static mesh list.
                FSparseArrayAllocationInfo SceneArrayAllocation = Scene->StaticMeshes.AddUninitialized();
                Scene->StaticMeshes[SceneArrayAllocation.Index] = &Mesh;
                Mesh.Id = SceneArrayAllocation.Index;
                MeshRelevance.Id = SceneArrayAllocation.Index;

                // Handle per-element visibility (if required).
                if (Mesh.bRequiresPerElementVisibility)
                {
                    // Use a separate index into StaticMeshBatchVisibility, since most meshes don't use it
                    Mesh.BatchVisibilityId = Scene->StaticMeshBatchVisibility.AddUninitialized().Index;
                    Scene->StaticMeshBatchVisibility[Mesh.BatchVisibilityId] = true;
                }
            }
        }
    }

    // Cache the static MeshDrawCommands
    if (bAddToStaticDrawLists)
    {
        CacheMeshDrawCommands(RHICmdList, Scene, SceneInfos);
    }
}

void FPrimitiveSceneInfo::CacheMeshDrawCommands(FRHICommandListImmediate& RHICmdList, FScene* Scene, const TArrayView<FPrimitiveSceneInfo*>& SceneInfos)
{
    //@todo - only need material uniform buffers to be created since we are going to cache pointers to them
    // Any updates (after initial creation) don't need to be forced here
    FMaterialRenderProxy::UpdateDeferredCachedUniformExpressions();

    SCOPED_NAMED_EVENT(FPrimitiveSceneInfo_CacheMeshDrawCommands, FColor::Emerald);

    QUICK_SCOPE_CYCLE_COUNTER(STAT_CacheMeshDrawCommands);
    FMemMark Mark(FMemStack::Get());

    // Compute the number of batches; each batch of up to BATCH_SIZE primitives is one parallel work item.
    static constexpr int BATCH_SIZE = 64;
    const int NumBatches = (SceneInfos.Num() + BATCH_SIZE - 1) / BATCH_SIZE;

    // Per-batch worker callback.
    auto DoWorkLambda = [Scene, SceneInfos](int32 Index)
    {
        SCOPED_NAMED_EVENT(FPrimitiveSceneInfo_CacheMeshDrawCommand, FColor::Green);

        struct FMeshInfoAndIndex
        {
            int32 InfoIndex;
            int32 MeshIndex;
        };

        TArray<FMeshInfoAndIndex, TMemStackAllocator<>> MeshBatches;
        MeshBatches.Reserve(3 * BATCH_SIZE);

        // Walk the range of this batch and process each PrimitiveSceneInfo
        int LocalNum = FMath::Min((Index * BATCH_SIZE) + BATCH_SIZE, SceneInfos.Num());
        for (int LocalIndex = (Index * BATCH_SIZE); LocalIndex < LocalNum; LocalIndex++)
        {
            FPrimitiveSceneInfo* SceneInfo = SceneInfos[LocalIndex];
            check(SceneInfo->StaticMeshCommandInfos.Num() == 0);
            SceneInfo->StaticMeshCommandInfos.AddDefaulted(EMeshPass::Num * SceneInfo->StaticMeshes.Num());
            FPrimitiveSceneProxy* SceneProxy = SceneInfo->Proxy;

            // Volumetric translucent shadows must be updated every frame and cannot be cached.
            if (!SceneProxy->CastsVolumetricTranslucentShadow())
            {
                // Add every static mesh of the PrimitiveSceneInfo to the MeshBatches list.
                for (int32 MeshIndex = 0; MeshIndex < SceneInfo->StaticMeshes.Num(); MeshIndex++)
                {
                    FStaticMeshBatch& Mesh = SceneInfo->StaticMeshes[MeshIndex];
                    // Check whether caching the MeshDrawCommand is supported
                    if (SupportsCachingMeshDrawCommands(Mesh))
                    {
                        MeshBatches.Add(FMeshInfoAndIndex{ LocalIndex, MeshIndex });
                    }
                }
            }
        }

        // Iterate over all predefined passes and add the MeshDrawCommand generated for each static element to that pass's cache list.
        for (int32 PassIndex = 0; PassIndex < EMeshPass::Num; PassIndex++)
        {
            const EShadingPath ShadingPath = Scene->GetShadingPath();
            EMeshPass::Type PassType = (EMeshPass::Type)PassIndex;

            if ((FPassProcessorManager::GetPassFlags(ShadingPath, PassType) & EMeshPassFlags::CachedMeshCommands) != EMeshPassFlags::None)
            {
                // Declare the cached draw command info
                FCachedMeshDrawCommandInfo CommandInfo(PassType);

                // Fetch the pass's containers from the scene to build the FCachedPassMeshDrawListContext.
                FCriticalSection& CachedMeshDrawCommandLock = Scene->CachedMeshDrawCommandLock[PassType];
                FCachedPassMeshDrawList& SceneDrawList = Scene->CachedDrawLists[PassType];
                FStateBucketMap& CachedMeshDrawCommandStateBuckets = Scene->CachedMeshDrawCommandStateBuckets[PassType];
                FCachedPassMeshDrawListContext CachedPassMeshDrawListContext(CommandInfo, CachedMeshDrawCommandLock, SceneDrawList, CachedMeshDrawCommandStateBuckets, *Scene);

                // Create the pass's FMeshPassProcessor
                PassProcessorCreateFunction CreateFunction = FPassProcessorManager::GetCreateFunction(ShadingPath, PassType);
                FMeshPassProcessor* PassMeshProcessor = CreateFunction(Scene, nullptr, &CachedPassMeshDrawListContext);

                if (PassMeshProcessor != nullptr)
                {
                    for (const FMeshInfoAndIndex& MeshAndInfo : MeshBatches)
                    {
                        FPrimitiveSceneInfo* SceneInfo = SceneInfos[MeshAndInfo.InfoIndex];
                        FStaticMeshBatch& Mesh = SceneInfo->StaticMeshes[MeshAndInfo.MeshIndex];
                        
                        CommandInfo = FCachedMeshDrawCommandInfo(PassType);
                        FStaticMeshBatchRelevance& MeshRelevance = SceneInfo->StaticMeshRelevances[MeshAndInfo.MeshIndex];

                        check(!MeshRelevance.CommandInfosMask.Get(PassType));

                        check(!Mesh.bRequiresPerElementVisibility);
                        uint64 BatchElementMask = ~0ull;
                        // Add the MeshBatch to the PassMeshProcessor, which converts the FMeshBatch into FMeshDrawCommands internally.
                        PassMeshProcessor->AddMeshBatch(Mesh, BatchElementMask, SceneInfo->Proxy);

                        if (CommandInfo.CommandIndex != -1 || CommandInfo.StateBucketId != -1)
                        {
                            static_assert(sizeof(MeshRelevance.CommandInfosMask) * 8 >= EMeshPass::Num, "CommandInfosMask is too small to contain all mesh passes.");
                            MeshRelevance.CommandInfosMask.Set(PassType);
                            MeshRelevance.CommandInfosBase++;

                            int CommandInfoIndex = MeshAndInfo.MeshIndex * EMeshPass::Num + PassType;
                            check(SceneInfo->StaticMeshCommandInfos[CommandInfoIndex].MeshPass == EMeshPass::Num);
                            // Cache the CommandInfo in the PrimitiveSceneInfo.
                            SceneInfo->StaticMeshCommandInfos[CommandInfoIndex] = CommandInfo;
                            
                            (......)
                        }
                    }
                    // Destroy the FMeshPassProcessor
                    PassMeshProcessor->~FMeshPassProcessor();
                }
            }
        }

        (......)
    };

    // Parallel mode
    if (FApp::ShouldUseThreadingForPerformance())
    {
        ParallelForTemplate(NumBatches, DoWorkLambda, EParallelForFlags::PumpRenderingThread);
    }
    // Single-threaded mode
    else
    {
        for (int Idx = 0; Idx < NumBatches; Idx++)
        {
            DoWorkLambda(Idx);
        }
    }

    FGraphicsMinimalPipelineStateId::InitializePersistentIds();
                    
    (.....)
}

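As the code above shows, a static mesh caches its FMeshBatch when it is added to the scene, and may also cache the corresponding FMeshDrawCommand. The key function that decides whether FMeshDrawCommand caching is supported is SupportsCachingMeshDrawCommands, implemented as follows: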

// Engine\Source\Runtime\Engine\Private\PrimitiveSceneProxy.cpp

bool SupportsCachingMeshDrawCommands(const FMeshBatch& MeshBatch)
{
    return
        // The FMeshBatch has exactly one element.
        (MeshBatch.Elements.Num() == 1) &&

        // The vertex factory supports caching FMeshDrawCommands
        MeshBatch.VertexFactory->GetType()->SupportsCachingMeshDrawCommands();
}


// Engine\Source\Runtime\RenderCore\Public\VertexFactory.h

bool FVertexFactoryType::SupportsCachingMeshDrawCommands() const 
{ 
    return bSupportsCachingMeshDrawCommands; 
}

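So an FMeshDrawCommand can be cached only if the FMeshBatch has exactly one element and the vertex factory it uses supports caching.

Currently only FLocalVertexFactory (UStaticMeshComponent) supports it; the other vertex factories depend on the view to set up their shader bindings.

If any condition is not met, the FMeshDrawCommand cannot be cached. In more detail, the following must hold:

  • The pass is one of the EMeshPass::Type enum values.
  • The EMeshPassFlags::CachedMeshCommands flag is passed correctly when the custom mesh pass processor is registered.
  • The mesh pass processor can set up all shader binding data without relying on FSceneView, because FSceneView is null during caching.

Note that if any data referenced by a cached draw command changes, the command must be invalidated and regenerated.

Calling FPrimitiveSceneInfo::BeginDeferredUpdateStaticMeshes invalidates the draw commands of the given primitive.

Setting Scene->bScenesPrimitivesNeedStaticMeshElementUpdate to true invalidates all cached commands, which hurts performance badly; avoid it or use it sparingly.

Because invalidating the cache costs rendering performance, an alternative is to put the mutable data into the pass's UniformBuffer and let the shader branch on that data, which removes the dependency on view-based shader bindings.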
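
The cost difference between invalidating one primitive and invalidating everything can be seen in a toy cache. This is a sketch under assumptions, not the engine's data structure: CommandCache, CachedCommand, and the materialVersion field are hypothetical, and rebuilding a command is reduced to storing a version number and bumping a counter.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Hypothetical cached command; it only remembers which material version it was built from.
struct CachedCommand {
    std::uint32_t materialVersion;
};

class CommandCache {
public:
    // Returns the cached command, rebuilding it only if it is missing or was invalidated.
    const CachedCommand& Get(int primitiveId, std::uint32_t currentMaterialVersion) {
        auto it = entries_.find(primitiveId);
        if (it == entries_.end() || it->second.dirty) {
            entries_[primitiveId] = Entry{CachedCommand{currentMaterialVersion}, false};
            ++rebuilds_;
            it = entries_.find(primitiveId);
        }
        return it->second.command;
    }

    // Invalidates the command of one primitive (cheap: one rebuild later).
    void Invalidate(int primitiveId) {
        auto it = entries_.find(primitiveId);
        if (it != entries_.end()) {
            it->second.dirty = true;
        }
    }

    // Invalidates everything (expensive: every command is rebuilt on next use).
    void InvalidateAll() {
        for (auto& entry : entries_) {
            entry.second.dirty = true;
        }
    }

    int Rebuilds() const { return rebuilds_; }

private:
    struct Entry {
        CachedCommand command;
        bool dirty;
    };
    std::unordered_map<int, Entry> entries_;
    int rebuilds_ = 0;
};
```

With two cached primitives, a targeted Invalidate costs one rebuild, while InvalidateAll costs one rebuild per primitive in the scene, which is why the global flag should be a last resort.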

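Unlike the dynamic draw path, static mesh elements are collected through the FPrimitiveSceneProxy::DrawStaticElements interface, which is implemented by concrete subclasses. Here is the implementation in the subclass FStaticMeshSceneProxy: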

// Engine\Source\Runtime\Engine\Private\StaticMeshRender.cpp

void FStaticMeshSceneProxy::DrawStaticElements(FStaticPrimitiveDrawInterface* PDI)
{
    checkSlow(IsInParallelRenderingThread());
    
    // Whether bUseViewOwnerDepthPriorityGroup is enabled
    if (!HasViewDependentDPG())
    {
        // Determine the DPG the primitive should be drawn in.
        uint8 PrimitiveDPG = GetStaticDepthPriorityGroup();
        int32 NumLODs = RenderData->LODResources.Num();
        //Never use the dynamic path in this path, because only unselected elements will use DrawStaticElements
        bool bIsMeshElementSelected = false;
        const auto FeatureLevel = GetScene().GetFeatureLevel();
        const bool IsMobile = IsMobilePlatform(GetScene().GetShaderPlatform());
        const int32 NumRuntimeVirtualTextureTypes = RuntimeVirtualTextureMaterialTypes.Num();

        //check if a LOD is being forced
        if (ForcedLodModel > 0) 
        {
            // Get the LOD level (index)
            int32 LODIndex = FMath::Clamp(ForcedLodModel, ClampedMinLOD + 1, NumLODs) - 1;
            const FStaticMeshLODResources& LODModel = RenderData->LODResources[LODIndex];

            // Draw all sections.
            for(int32 SectionIndex = 0; SectionIndex < LODModel.Sections.Num(); SectionIndex++)
            {
                const int32 NumBatches = GetNumMeshBatches();
                PDI->ReserveMemoryForMeshes(NumBatches * (1 + NumRuntimeVirtualTextureTypes));

                // Add the elements of every batch to the PDI.
                for (int32 BatchIndex = 0; BatchIndex < NumBatches; BatchIndex++)
                {
                    FMeshBatch BaseMeshBatch;

                    if (GetMeshElement(LODIndex, BatchIndex, SectionIndex, PrimitiveDPG, bIsMeshElementSelected, true, BaseMeshBatch))
                    {
                        (......)
                        {
                            // Submit to the PDI for drawing
                            PDI->DrawMesh(BaseMeshBatch, FLT_MAX);
                        }
                    }
                }
            }
        } 
        
        (......)
    }
}

As shown, DrawStaticElements receives an FStaticPrimitiveDrawInterface instance that collects all static elements of the PrimitiveSceneProxy. The declarations and implementations of FStaticPrimitiveDrawInterface and its subclass FBatchingSPDI follow:

// Engine\Source\Runtime\Engine\Public\SceneManagement.h

class FStaticPrimitiveDrawInterface
{
public:
    virtual void SetHitProxy(HHitProxy* HitProxy) = 0;
    virtual void ReserveMemoryForMeshes(int32 MeshNum) = 0;

    // The PDI draw interface
    virtual void DrawMesh(const FMeshBatch& Mesh, float ScreenSize) = 0;
};


// Engine\Source\Runtime\Renderer\Private\PrimitiveSceneInfo.cpp

class FBatchingSPDI : public FStaticPrimitiveDrawInterface
{
public:
    (......)

    // Implementation of the PDI draw interface
    virtual void DrawMesh(const FMeshBatch& Mesh, float ScreenSize) final override
    {
        if (Mesh.HasAnyDrawCalls())
        {
            FPrimitiveSceneProxy* PrimitiveSceneProxy = PrimitiveSceneInfo->Proxy;
            PrimitiveSceneProxy->VerifyUsedMaterial(Mesh.MaterialRenderProxy);

            // Create a new FStaticMeshBatch and add it to the PrimitiveSceneInfo's StaticMeshes list.
            FStaticMeshBatch* StaticMesh = new(PrimitiveSceneInfo->StaticMeshes) FStaticMeshBatch(
                PrimitiveSceneInfo,
                Mesh,
                CurrentHitProxy ? CurrentHitProxy->Id : FHitProxyId()
                );

            const ERHIFeatureLevel::Type FeatureLevel = PrimitiveSceneInfo->Scene->GetFeatureLevel();
            StaticMesh->PreparePrimitiveUniformBuffer(PrimitiveSceneProxy, FeatureLevel);

            // Volumetric self shadow mesh commands need to be generated every frame, as they depend on single frame uniform buffers with self shadow data.
            const bool bSupportsCachingMeshDrawCommands = SupportsCachingMeshDrawCommands(*StaticMesh, FeatureLevel) && !PrimitiveSceneProxy->CastsVolumetricTranslucentShadow();

            // Handle relevance
            bool bUseSkyMaterial = Mesh.MaterialRenderProxy->GetMaterial(FeatureLevel)->IsSky();
            bool bUseSingleLayerWaterMaterial = Mesh.MaterialRenderProxy->GetMaterial(FeatureLevel)->GetShadingModels().HasShadingModel(MSM_SingleLayerWater);
            FStaticMeshBatchRelevance* StaticMeshRelevance = new(PrimitiveSceneInfo->StaticMeshRelevances) FStaticMeshBatchRelevance(
                *StaticMesh, 
                ScreenSize, 
                bSupportsCachingMeshDrawCommands,
                bUseSkyMaterial,
                bUseSingleLayerWaterMaterial,
                FeatureLevel
            );
        }
    }

private:
    FPrimitiveSceneInfo* PrimitiveSceneInfo;
    TRefCountPtr<HHitProxy> CurrentHitProxy;
};

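The main job of FBatchingSPDI::DrawMesh is to wrap each FMeshBatch submitted by the PrimitiveSceneProxy into an FStaticMeshBatch stored on the PrimitiveSceneInfo, and then to fill in the relevance data of that mesh.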
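
The collector pattern used by DrawStaticElements and FBatchingSPDI can be sketched in isolation. The types below (MeshBatch, StaticDrawInterface, BatchingCollector, PrimitiveInfo, TwoSectionProxy) are hypothetical simplifications: the proxy pushes its batches through an abstract interface, and the collector stores them together with a parallel per-batch array, skipping batches that have nothing to draw.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, heavily reduced mesh batch.
struct MeshBatch {
    std::string material;
    int numPrimitives;
};

// Stand-in for FStaticPrimitiveDrawInterface: the proxy only sees this interface.
class StaticDrawInterface {
public:
    virtual ~StaticDrawInterface() = default;
    virtual void DrawMesh(const MeshBatch& mesh, float screenSize) = 0;
};

// Stand-in for the per-primitive storage that receives the cached batches.
struct PrimitiveInfo {
    std::vector<MeshBatch> staticMeshes;
    std::vector<float> screenSizes;  // parallel to staticMeshes, like a relevance array
};

// Stand-in for FBatchingSPDI: "drawing" means storing the batch for later use.
class BatchingCollector final : public StaticDrawInterface {
public:
    explicit BatchingCollector(PrimitiveInfo& info) : info_(info) {}

    void DrawMesh(const MeshBatch& mesh, float screenSize) override {
        if (mesh.numPrimitives <= 0) {
            return;  // nothing to draw, so nothing to cache
        }
        info_.staticMeshes.push_back(mesh);
        info_.screenSizes.push_back(screenSize);
    }

private:
    PrimitiveInfo& info_;
};

// Stand-in for a scene proxy with three sections, one of them empty.
struct TwoSectionProxy {
    void DrawStaticElements(StaticDrawInterface& pdi) const {
        pdi.DrawMesh({"Body", 120}, 1.0f);
        pdi.DrawMesh({"Empty", 0}, 1.0f);
        pdi.DrawMesh({"Glass", 12}, 0.5f);
    }
};
```

The design choice worth noting is the inversion of control: the proxy does not know where its batches end up, so the same DrawStaticElements implementation can feed a caching collector, a debug counter, or any other sink.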

 

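3.4 Summary of the Rendering Mechanism

3.4.1 Draw Pipeline Optimization Techniques

The preceding sections described in detail how UE turns a primitive, step by step, from a Component into final draw commands. The main purpose of this design is rendering performance. The optimization techniques involved are summarized below:

  • Draw call merging

Because all FMeshDrawCommands are captured up front rather than submitted to the GPU immediately, draw call merging has a solid foundation. In the current version, however, merging relies on D3D11 features: the shader bindings decide whether commands can be merged into a single instanced call. Aggregated merging based on D3D12 is not implemented yet.

Besides merging, sorting places similar commands next to each other in time, which improves CPU and GPU cache hit rates and reduces the number of commands issued.

  • Dynamic instancing

To merge two draw calls, they must have identical shader bindings (FMeshDrawCommand::MatchesForDynamicInstancing returns true).

Currently only cached mesh draw commands are dynamically instanced, which in turn is limited by whether FLocalVertexFactory supports caching. In addition, some special cases prevent merging:

  • Lightmaps that produce very small textures (adjustable through the MaxLightmapRadius parameter in DefaultEngine.ini).
  • Per-component vertex colors.
  • SpeedTree wind nodes.

The console command r.MeshDrawCommands.LogDynamicInstancingStats 1 reports how much dynamic instancing is saving.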
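
A minimal sketch of the merging idea, assuming each command carries a single stateKey that is equal exactly when two commands could be instanced together. That key is a hypothetical simplification; the engine compares the full command state through MatchesForDynamicInstancing, and MergeForInstancing is not an engine function.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical draw command: stateKey stands for "everything that must match
// for two draws to be merged" (pipeline, shader bindings, mesh buffers).
struct DrawCommand {
    std::uint32_t stateKey;
    std::uint32_t numInstances;
};

// Sorts commands so that equal keys become adjacent, then folds each run of
// equal keys into one command whose instance count is the sum of the run.
inline std::vector<DrawCommand> MergeForInstancing(std::vector<DrawCommand> commands) {
    std::stable_sort(commands.begin(), commands.end(),
                     [](const DrawCommand& a, const DrawCommand& b) {
                         return a.stateKey < b.stateKey;
                     });

    std::vector<DrawCommand> merged;
    for (const DrawCommand& cmd : commands) {
        if (!merged.empty() && merged.back().stateKey == cmd.stateKey) {
            merged.back().numInstances += cmd.numInstances;
        } else {
            merged.push_back(cmd);
        }
    }
    return merged;
}
```

Five single-instance commands with keys 7, 3, 7, 3, 9 collapse into three draw calls in this sketch, two of which draw two instances each.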

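  • Parallel drawing

Most mesh drawing work does not run on the render thread but is dispatched in parallel through the TaskGraph system. The parallel parts include pass context setup and dynamic command generation, sorting, and merging.

The degree of parallelism depends on the number of CPU cores of the device. Once parallel drawing is enabled there is a join stage that waits for all parallel threads to finish (FSceneRenderer::WaitForTasksClearSnapshotsAndDeleteSceneRenderer starts the wait for parallel drawing).

  • Cached draw commands

To raise the share and efficiency of caching, UE separates the drawing of dynamic and static objects into a dynamic draw path and a static draw path. The static path can cache FMeshBatch and FMeshDrawCommand as soon as the primitive is added to the scene, which gives the benefit of generating once and drawing many times.

  • Higher cache hit rates

Both CPU and GPU caches follow the principles of temporal locality and spatial locality. Temporal locality means that recently accessed data is likely to hit the cache when it is accessed again. Spatial locality means that data adjacent to what is currently being processed is likely to be in the cache, which also covers prefetch hits.

UE improves cache hit rates through the following means:

  • Data-oriented design rather than object-oriented design.

    • For example, the layout of FMeshDrawCommand.
  • Contiguous storage.

    • FMeshDrawCommand is stored in a TChunkedArray.
  • Memory alignment.

    • Custom aligners and memory allocators.
  • Lightweight data structures.

  • Contiguous access.

    • Draw commands are traversed sequentially.
  • Sorted draw commands.

    • Similar commands are placed together to make full use of the cache's temporal locality.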
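
The effect of sorting on locality can be illustrated with a packed sort key. The bit layout below (pipeline id in the high 32 bits, mesh id in the low 32 bits) and the names PackSortKey, PipelineOf, VisibleCommand, and SortVisible are assumptions made for this sketch, not the engine's actual sort key layout. Sorting a flat array of small records by one integer keeps commands with the same pipeline adjacent.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical layout: high 32 bits = pipeline state id, low 32 bits = mesh id.
// Comparing the packed integer therefore orders by pipeline first, then by mesh.
inline std::uint64_t PackSortKey(std::uint32_t pipelineId, std::uint32_t meshId) {
    return (static_cast<std::uint64_t>(pipelineId) << 32) | meshId;
}

inline std::uint32_t PipelineOf(std::uint64_t sortKey) {
    return static_cast<std::uint32_t>(sortKey >> 32);
}

// Small, flat record: a key plus an index into the (separately stored) commands.
struct VisibleCommand {
    std::uint64_t sortKey;
    std::uint32_t commandIndex;
};

// Sorts the visible commands in place by their packed key.
inline void SortVisible(std::vector<VisibleCommand>& visible) {
    std::sort(visible.begin(), visible.end(),
              [](const VisibleCommand& a, const VisibleCommand& b) {
                  return a.sortKey < b.sortKey;
              });
}
```

Because the records are small and stored contiguously, the sort and the later submission loop both walk memory linearly, and consecutive commands tend to reuse the same pipeline state.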

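3.4.2 Debugging Console Variables

The console variables related to parallel drawing are listed below; they can be used to adjust or debug performance and behavior at runtime:

Console variable | Description
r.MeshDrawCommands.ParallelPassSetup | Toggles parallel pass setup for mesh draw commands.
r.MeshDrawCommands.UseCachedCommands | Toggles caching of draw commands.
r.MeshDrawCommands.DynamicInstancing | Toggles dynamic instancing.
r.MeshDrawCommands.LogDynamicInstancingStats | Logs dynamic instancing statistics; typically used to check how much dynamic instancing is saving.
r.RHICmdBasePassDeferredContexts | Toggles parallel drawing of the base pass.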

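3.4.3 Limitations

UE's current model draw paths introduce many steps and concepts in order to maximize rendering efficiency. This does not come for free, though; there is no such thing as a free lunch. Overall, the mechanism has the following drawbacks:

  • The system is large and complex, which raises the learning cost for newcomers.
  • Refactoring and extension become more expensive. For example, multi-pass drawing or an additional dedicated pass cannot be added quickly; doing so requires understanding and modifying low-level engine source code.
  • UE's heavyweight draw pipeline encapsulation has a baseline overhead. For simple scenes it may actually perform worse than rendering engines without such encapsulation.

Of two benefits, choose the greater: this design is the result of UE's long-running trade-offs and refinement.

That said, the draw pipeline is built for the future and accommodates technologies such as virtualized textures and geometry, RDG, the GPU Driven Rendering Pipeline, and real-time ray tracing.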

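3.4.4 Exercises for This Part

The first two parts came without exercises. Starting with this one, a few small exercises are assigned to help readers deepen their understanding of UE's rendering system:

  • Concisely restate the stages of the model draw pipeline, its design concepts, and what each of them is for.

  • Describe which parts of the current model draw pipeline could be optimized further.

  • Add a Mesh Component that can draw an arbitrary number of materials.

  • Add a dedicated pass that draws the depth of translucent and masked objects.

All of these are open-ended questions without a standard answer. Readers with ideas are welcome to reply in the comments, and the author will try to respond.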

 

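Special Notes

  • Thanks to the authors of all references. Some images come from the references and the web and will be removed on request.
  • This series is the author's original work and is published only on cnblogs (部落格園). Sharing a link to this article is welcome, but reposting without permission is not allowed!
  • The series is ongoing; for the full table of contents, see the content outline (內容綱目).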

 

References
