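11.1 Overview
RDG stands for Rendering Dependency Graph. It is a rendering subsystem introduced in UE 4.22: a scheduling system built on a directed acyclic graph (DAG) that performs whole-frame optimization of the rendering pipeline.
It takes advantage of modern graphics APIs (DirectX 12, Vulkan, and Metal 2) to improve performance through automatic asynchronous compute scheduling and more efficient memory and barrier management.
Traditional graphics APIs (DirectX 11, OpenGL) require the driver to invoke complex heuristics to decide when and how to perform critical scheduling operations on the GPU, such as flushing caches, managing and reusing memory, and performing layout transitions. Because these interfaces are immediate-mode in nature, handling the various edge cases requires complex bookkeeping and state tracking, which ultimately hurts performance and hinders parallelism.
Modern graphics APIs (DirectX 12, Vulkan, and Metal 2) differ from traditional ones in that they shift the burden of low-level GPU management to the application. This lets the application use its high-level knowledge of the rendering pipeline to drive scheduling, which improves performance and simplifies the rendering stack.
The idea behind RDG is not to execute passes on the GPU immediately. Instead, it first collects all the passes to be rendered, then compiles and executes the graph in dependency order, culling and optimizing along the way.
Combining the graph's whole-frame knowledge with the capabilities of modern graphics APIs lets RDG perform complex scheduling tasks behind the scenes:
- Automatic scheduling and fencing of asynchronous compute passes.
- Keeping memory aliased between resources that are live during disjoint intervals of the frame.
- Starting barriers and layout transitions early to avoid pipeline stalls.
In addition, RDG uses the dependency graph to provide rich validation during pass setup, automatically catching issues that affect functionality and performance, which improves the development workflow.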
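The "collect first, then compile and execute" idea can be illustrated with a toy graph in standard C++. This is only a minimal sketch under stated assumptions, not UE code: ToyPass, ToyGraphBuilder, and MarkOutput are hypothetical names, resources are plain integer ids, and the only optimization shown is culling passes that do not contribute to a requested output.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Toy model only: resources are plain integer ids, not GPU objects.
using ResourceId = int;

struct ToyPass
{
    std::string Name;
    std::vector<ResourceId> Reads;
    std::vector<ResourceId> Writes;
    std::function<void()> Execute; // deferred work, run only after compilation
};

class ToyGraphBuilder
{
public:
    // Setup phase: record the pass, do not run it.
    void AddPass(ToyPass Pass)
    {
        assert(!Pass.Name.empty());
        Passes.push_back(std::move(Pass));
    }

    // Mark a resource as needed after the graph finishes (e.g. the final image).
    void MarkOutput(ResourceId Id) { Outputs.push_back(Id); }

    // Compile + execute: cull passes that do not contribute to any output,
    // then run the survivors in submission order. Returns executed pass names.
    std::vector<std::string> Execute()
    {
        std::vector<bool> bKeep(Passes.size(), false);
        std::vector<ResourceId> Needed = Outputs;

        // Walk backwards: a pass is kept if it writes a needed resource,
        // and everything it reads then becomes needed too.
        for (std::size_t i = Passes.size(); i-- > 0;)
        {
            const ToyPass& Pass = Passes[i];
            for (ResourceId Written : Pass.Writes)
            {
                if (Contains(Needed, Written)) { bKeep[i] = true; break; }
            }
            if (bKeep[i])
            {
                Needed.insert(Needed.end(), Pass.Reads.begin(), Pass.Reads.end());
            }
        }

        std::vector<std::string> Executed;
        for (std::size_t i = 0; i < Passes.size(); ++i)
        {
            if (!bKeep[i]) continue;
            if (Passes[i].Execute) Passes[i].Execute();
            Executed.push_back(Passes[i].Name);
        }
        return Executed;
    }

private:
    static bool Contains(const std::vector<ResourceId>& List, ResourceId Id)
    {
        for (ResourceId Item : List) { if (Item == Id) return true; }
        return false;
    }

    std::vector<ToyPass> Passes;
    std::vector<ResourceId> Outputs;
};
```

In this sketch, a pass whose outputs nobody consumes never runs its lambda, which is the same reason a real graph can drop unused work without the caller changing its setup code.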
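RDG is not a concept or technique original to UE. As early as GDC 2017, Frostbite had presented its implementation and use of Frame Graph. Frame Graph aims to isolate the engine's rendering features and high-level rendering logic (Renderer) from low-level resources (shaders, render contexts, graphics APIs, and so on) to enable further decoupling and optimization, the most important of which are multithreading and parallel rendering.
A frame graph is a high-level representation of render passes and resources that contains all the information used in a frame. Ordering and dependencies can be specified between passes. The following figure shows an example:
Order and dependency graph of deferred rendering in the Frostbite engine, implemented with a frame graph.
It is no exaggeration to say that UE's RDG is a customized implementation built on the Frame Graph idea. By UE 4.26, RDG had been widely adopted: scene rendering, post-processing, ray tracing, and other modules use RDG instead of issuing RHI commands directly.
This article covers the following aspects of UE's RDG:
- Basic concepts and types of RDG.
- How to use RDG.
- Internal mechanisms and principles of RDG.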
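11.2 RDG Basics
This chapter first describes the main types, concepts, and interfaces involved in RDG.
11.2.1 RDG Basic Types
RDG's basic types and interfaces are mainly found in RenderGraphUtils.h and RenderGraphDefinitions.h. Some of them are annotated below: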
// Engine\Source\Runtime\RenderCore\Public\RenderGraphDefinitions.h
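// RDG pass flags.
enum class ERDGPassFlags : uint8
{
None = 0, // Used by the parameterless AddPass functions.
Raster = 1 << 0, // Pass uses rasterization on the graphics pipe.
Compute = 1 << 1, // Pass uses compute on the graphics pipe.
AsyncCompute = 1 << 2, // Pass uses compute on the async compute pipe.
Copy = 1 << 3, // Pass uses copy commands on the graphics pipe.
NeverCull = 1 << 4, // Pass is never culled; used for special passes.
SkipRenderPass = 1 << 5, // Skips BeginRenderPass/EndRenderPass, leaving them to the user. Only valid when combined with Raster. Disables pass merging.
UntrackedAccess = 1 << 6, // Pass accesses raw RHI resources, which may be registered with RDG, but all resources are kept in their current state. This flag prevents the graph from scheduling split barriers across the pass; any splitting is deferred until after the pass executes. Resources may not change state during pass execution. Affects barrier performance. Cannot be combined with AsyncCompute.
Readback = Copy | NeverCull, // Pass uses copy commands but writes to a staging resource.
CommandMask = Raster | Compute | AsyncCompute | Copy, // Mask of flags denoting the kinds of RHI commands submitted to a pass.
ScopeMask = NeverCull | UntrackedAccess // Mask of flags that can be used by a pass flag scope.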
};
// Buffer flags.
enum class ERDGBufferFlags : uint8
{
None = 0, // No flags.
MultiFrame = 1 << 0 // Persists across multiple frames.
};
// Texture flags.
enum class ERDGTextureFlags : uint8
{
None = 0,
MultiFrame = 1 << 0, // Persists across multiple frames.
MaintainCompression = 1 << 1, // Prevents metadata decompression on this texture.
};
// UAV flags.
enum class ERDGUnorderedAccessViewFlags : uint8
{
None = 0,
SkipBarrier = 1 << 0 // Skips the barrier.
};
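// Parent resource type.
enum class ERDGParentResourceType : uint8
{
Texture,
Buffer,
MAX
};
// View type.
enum class ERDGViewType : uint8
{
TextureUAV, // Texture UAV (used to write data).
TextureSRV, // Texture SRV (used to read data).
BufferUAV, // Buffer UAV (used to write data).
BufferSRV, // Buffer SRV (used to read data).
MAX
};
// Specifies the texture metadata plane when creating a view.
enum class ERDGTextureMetaDataAccess : uint8
{
None = 0, // The primary plane is used with default compression behavior.
CompressedSurface, // The primary plane is used without decompressing it.
Depth, // The depth plane is used with default compression behavior.
Stencil, // The stencil plane is used with default compression behavior.
HTile, // The HTile plane.
FMask, // The FMask plane.
CMask // The CMask plane.
};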
// Simple C++ object allocator that uses a MemStack allocator and tracks and destroys the objects it creates.
class FRDGAllocator final
{
public:
FRDGAllocator();
~FRDGAllocator();
// Allocates raw memory.
FORCEINLINE void* Alloc(uint32 SizeInBytes, uint32 AlignInBytes)
{
return MemStack.Alloc(SizeInBytes, AlignInBytes);
}
// Allocates POD memory without tracking a destructor.
template <typename PODType>
FORCEINLINE PODType* AllocPOD()
{
return reinterpret_cast<PODType*>(Alloc(sizeof(PODType), alignof(PODType)));
}
// Allocates a C++ object with destructor tracking.
template <typename ObjectType, typename... TArgs>
FORCEINLINE ObjectType* AllocObject(TArgs&&... Args)
{
TTrackedAlloc<ObjectType>* TrackedAlloc = new(MemStack) TTrackedAlloc<ObjectType>(Forward<TArgs&&>(Args)...);
check(TrackedAlloc);
TrackedAllocs.Add(TrackedAlloc);
return TrackedAlloc->Get();
}
// Allocates a C++ object without destructor tracking. (Dangerous, use with care.)
template <typename ObjectType, typename... TArgs>
FORCEINLINE ObjectType* AllocNoDestruct(TArgs&&... Args)
{
return new (MemStack) ObjectType(Forward<TArgs&&>(Args)...);
}
// Releases all allocated memory.
void ReleaseAll();
private:
class FTrackedAlloc
{
public:
virtual ~FTrackedAlloc() = default;
};
template <typename ObjectType>
class TTrackedAlloc : public FTrackedAlloc
{
public:
template <typename... TArgs>
FORCEINLINE TTrackedAlloc(TArgs&&... Args) : Object(Forward<TArgs&&>(Args)...) {}
FORCEINLINE ObjectType* Get() { return &Object; }
private:
ObjectType Object;
};
// The underlying allocator.
FMemStackBase MemStack;
// All tracked allocations.
TArray<FTrackedAlloc*, SceneRenderingAllocator> TrackedAllocs;
};
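The destructor-tracking idea behind AllocObject can be sketched in standard C++ as follows. This is a simplified stand-in under stated assumptions, not the UE implementation: ToyArena and FDestructionProbe are hypothetical names, a fixed-size buffer replaces FMemStackBase, a type-erased destroy function replaces TTrackedAlloc's virtual destructor, and destroying in reverse order is a choice made for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <utility>
#include <vector>

class ToyArena
{
public:
    explicit ToyArena(std::size_t CapacityInBytes)
        : Storage(new unsigned char[CapacityInBytes]), Capacity(CapacityInBytes) {}

    ToyArena(const ToyArena&) = delete;
    ToyArena& operator=(const ToyArena&) = delete;
    ~ToyArena() { ReleaseAll(); }

    // Raw bump allocation. Returns nullptr when the arena is exhausted.
    void* Alloc(std::size_t SizeInBytes, std::size_t AlignInBytes)
    {
        void* Ptr = Storage.get() + Offset;
        std::size_t Space = Capacity - Offset;
        if (!std::align(AlignInBytes, SizeInBytes, Ptr, Space)) return nullptr;
        Offset = static_cast<std::size_t>(static_cast<unsigned char*>(Ptr) - Storage.get()) + SizeInBytes;
        return Ptr;
    }

    // Allocation with destructor tracking: the object is destroyed in ReleaseAll().
    template <typename ObjectType, typename... TArgs>
    ObjectType* AllocObject(TArgs&&... Args)
    {
        void* Memory = Alloc(sizeof(ObjectType), alignof(ObjectType));
        assert(Memory != nullptr);
        ObjectType* Object = new (Memory) ObjectType(std::forward<TArgs>(Args)...);
        FTracked Entry{Object, [](void* P) { static_cast<ObjectType*>(P)->~ObjectType(); }};
        Tracked.push_back(Entry);
        return Object;
    }

    // Destroys tracked objects in reverse allocation order, then rewinds the arena.
    void ReleaseAll()
    {
        for (std::size_t i = Tracked.size(); i-- > 0;)
        {
            Tracked[i].Destroy(Tracked[i].Object);
        }
        Tracked.clear();
        Offset = 0;
    }

private:
    struct FTracked
    {
        void* Object;
        void (*Destroy)(void*);
    };

    std::unique_ptr<unsigned char[]> Storage;
    std::size_t Capacity = 0;
    std::size_t Offset = 0;
    std::vector<FTracked> Tracked;
};

// Hypothetical helper type that counts destructor calls, used to observe the arena.
struct FDestructionProbe
{
    explicit FDestructionProbe(int* InCounter) : Counter(InCounter) {}
    ~FDestructionProbe() { ++*Counter; }
    int* Counter;
};
```

Allocating two FDestructionProbe objects and then calling ReleaseAll() runs both destructors once, while memory obtained through the raw Alloc path is simply rewound without any destructor call, which is why the untracked variant in the listing is marked as dangerous.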
// Engine\Source\Runtime\RenderCore\Public\RenderGraphUtils.h
// Clears unused graph resources.
extern RENDERCORE_API void ClearUnusedGraphResourcesImpl(const FShaderParameterBindings& ShaderBindings, ...);
(......)
// Registers an external texture, optionally with a fallback instance.
FRDGTextureRef RegisterExternalTextureWithFallback(FRDGBuilder& GraphBuilder, ...);
inline FRDGTextureRef TryRegisterExternalTexture(FRDGBuilder& GraphBuilder, ...);
inline FRDGBufferRef TryRegisterExternalBuffer(FRDGBuilder& GraphBuilder, ...);
// Utility class for compute shaders.
struct RENDERCORE_API FComputeShaderUtils
{
// The ideal group size is 8x8, which occupies at least one whole wave on GCN and two warps on Nvidia.
static constexpr int32 kGolden2DGroupSize = 8;
static FIntVector GetGroupCount(const int32 ThreadCount, const int32 GroupSize);
// Dispatches a compute shader to the RHI command list with its parameters.
template<typename TShaderClass>
static void Dispatch(FRHIComputeCommandList& RHICmdList, const TShaderRef<TShaderClass>& ComputeShader, const typename TShaderClass::FParameters& Parameters, FIntVector GroupCount);
// Indirectly dispatches a compute shader to the RHI command list with its parameters.
template<typename TShaderClass>
static void DispatchIndirect(FRHIComputeCommandList& RHICmdList, const TShaderRef<TShaderClass>& ComputeShader, const typename TShaderClass::FParameters& Parameters, FRHIVertexBuffer* IndirectArgsBuffer, uint32 IndirectArgOffset);
// Dispatches a compute shader to the render graph builder with its parameters.
template<typename TShaderClass>
static void AddPass(FRDGBuilder& GraphBuilder,FRDGEventName&& PassName,ERDGPassFlags PassFlags,const TShaderRef<TShaderClass>& ComputeShader,typename TShaderClass::FParameters* Parameters,FIntVector GroupCount);
(......)
// Clears a UAV.
static void ClearUAV(FRDGBuilder& GraphBuilder, FGlobalShaderMap* ShaderMap, FRDGBufferUAVRef UAV, uint32 ClearValue);
static void ClearUAV(FRDGBuilder& GraphBuilder, FGlobalShaderMap* ShaderMap, FRDGBufferUAVRef UAV, FVector4 ClearValue);
};
// Adds a pass that copies a texture.
void AddCopyTexturePass(FRDGBuilder& GraphBuilder, FRDGTextureRef InputTexture, FRDGTextureRef OutputTexture, const FRHICopyTextureInfo& CopyInfo);
(......)
// Adds a pass that copies to a resolve target.
void AddCopyToResolveTargetPass(FRDGBuilder& GraphBuilder, FRDGTextureRef InputTexture, FRDGTextureRef OutputTexture, const FResolveParams& ResolveParams);
// Passes that clear various kinds of resources.
void AddClearUAVPass(FRDGBuilder& GraphBuilder, FRDGBufferUAVRef BufferUAV, uint32 Value);
void AddClearUAVFloatPass(FRDGBuilder& GraphBuilder, FRDGBufferUAVRef BufferUAV, float Value);
void AddClearUAVPass(FRDGBuilder& GraphBuilder, FRDGTextureUAVRef TextureUAV, const FUintVector4& ClearValues);
void AddClearRenderTargetPass(FRDGBuilder& GraphBuilder, FRDGTextureRef Texture);
void AddClearDepthStencilPass(FRDGBuilder& GraphBuilder,FRDGTextureRef Texture,bool bClearDepth,float Depth,bool bClearStencil,uint8 Stencil);
void AddClearStencilPass(FRDGBuilder& GraphBuilder, FRDGTextureRef Texture);
(......)
// Adds a pass that reads back a texture.
void AddEnqueueCopyPass(FRDGBuilder& GraphBuilder, FRHIGPUTextureReadback* Readback, FRDGTextureRef SourceTexture, FResolveRect Rect = FResolveRect());
// Adds a pass that reads back a buffer.
void AddEnqueueCopyPass(FRDGBuilder& GraphBuilder, FRHIGPUBufferReadback* Readback, FRDGBufferRef SourceBuffer, uint32 NumBytes);
// Resource creation.
FRDGBufferRef CreateStructuredBuffer(FRDGBuilder& GraphBuilder, ...);
FRDGBufferRef CreateVertexBuffer(FRDGBuilder& GraphBuilder, ...);
// Adds a pass without parameters.
template <typename ExecuteLambdaType>
void AddPass(FRDGBuilder& GraphBuilder, FRDGEventName&& Name, ExecuteLambdaType&& ExecuteLambda);
template <typename ExecuteLambdaType>
void AddPass(FRDGBuilder& GraphBuilder, ExecuteLambdaType&& ExecuteLambda);
// Other special passes.
void AddBeginUAVOverlapPass(FRDGBuilder& GraphBuilder);
void AddEndUAVOverlapPass(FRDGBuilder& GraphBuilder);
(......)
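Two of the ideas in the listing above, combining pass flags as a bitmask and turning a thread count into a group count, can be sketched in standard C++. This is a hedged illustration, not UE code: EToyPassFlags mirrors only part of ERDGPassFlags, IsValidPassFlags encodes just the two constraints stated in the flag comments (SkipRenderPass requires Raster, UntrackedAccess excludes AsyncCompute), and ToyGetGroupCount assumes the group count is a simple round-up division.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical mirror of a subset of ERDGPassFlags, for illustration only.
enum class EToyPassFlags : std::uint8_t
{
    None            = 0,
    Raster          = 1 << 0,
    Compute         = 1 << 1,
    AsyncCompute    = 1 << 2,
    Copy            = 1 << 3,
    NeverCull       = 1 << 4,
    SkipRenderPass  = 1 << 5,
    UntrackedAccess = 1 << 6,
    Readback        = Copy | NeverCull
};

constexpr EToyPassFlags operator|(EToyPassFlags A, EToyPassFlags B)
{
    return static_cast<EToyPassFlags>(static_cast<std::uint8_t>(A) | static_cast<std::uint8_t>(B));
}

constexpr EToyPassFlags operator&(EToyPassFlags A, EToyPassFlags B)
{
    return static_cast<EToyPassFlags>(static_cast<std::uint8_t>(A) & static_cast<std::uint8_t>(B));
}

// True if any bit of Test is set in Value.
constexpr bool HasAnyFlags(EToyPassFlags Value, EToyPassFlags Test)
{
    return (Value & Test) != EToyPassFlags::None;
}

// True if every bit of Test is set in Value.
constexpr bool HasAllFlags(EToyPassFlags Value, EToyPassFlags Test)
{
    return (Value & Test) == Test;
}

// Checks the two constraints mentioned in the flag comments:
// SkipRenderPass needs Raster, and UntrackedAccess excludes AsyncCompute.
constexpr bool IsValidPassFlags(EToyPassFlags Flags)
{
    return !(HasAnyFlags(Flags, EToyPassFlags::SkipRenderPass) && !HasAnyFlags(Flags, EToyPassFlags::Raster))
        && !HasAllFlags(Flags, EToyPassFlags::UntrackedAccess | EToyPassFlags::AsyncCompute);
}

// Same value as kGolden2DGroupSize in the listing: an 8x8 group is 64 threads.
constexpr int kToyGolden2DGroupSize = 8;

// Number of groups needed to cover ThreadCount threads (round-up division).
constexpr int ToyGetGroupCount(int ThreadCount, int GroupSize)
{
    return (ThreadCount + GroupSize - 1) / GroupSize;
}

static_assert(ToyGetGroupCount(1920, kToyGolden2DGroupSize) == 240, "1920 threads fit exactly in 240 groups of 8");
```

Rounding up matters when the thread count is not a multiple of the group size: a width of 1921 needs 241 groups, and the shader is then expected to discard the out-of-range threads itself.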
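11.2.2 RDG Resources
RDG resources do not use RHI resources directly. Instead, they wrap a reference to an RHI resource, with a dedicated wrapper for each resource type that carries additional information. Some of the RDG definitions are as follows: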
// Engine\Source\Runtime\RenderCore\Public\RenderGraphResources.h
class FRDGResource
{
public:
// The copy constructor is deleted.
FRDGResource(const FRDGResource&) = delete;
virtual ~FRDGResource() = default;
//////////////////////////////////////////////////////////////////////////
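// The following methods may only be called during RDG pass execution.
// Marks this resource as actually used by the pass; this tracks which pass dependencies were unnecessary.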
#if RDG_ENABLE_DEBUG
virtual void MarkResourceAsUsed();
#else
inline void MarkResourceAsUsed() {}
#endif
// Returns the RHI resource referenced by this RDG resource.
FRHIResource* GetRHI() const
{
ValidateRHIAccess();
return ResourceRHI;
}
//////////////////////////////////////////////////////////////////////////
protected:
FRDGResource(const TCHAR* InName);
// Sets this resource up as a simple passthrough container for an RHI resource.
void SetPassthroughRHI(FRHIResource* InResourceRHI)
{
ResourceRHI = InResourceRHI;
#if RDG_ENABLE_DEBUG
DebugData.bAllowRHIAccess = true;
DebugData.bPassthrough = true;
#endif
}
bool IsPassthrough() const
{
#if RDG_ENABLE_DEBUG
return DebugData.bPassthrough;
#else
return false;
#endif
}
/** Verify that the RHI resource can be accessed at a pass execution. */
void ValidateRHIAccess() const
{
#if RDG_ENABLE_DEBUG
checkf(DebugData.bAllowRHIAccess,
TEXT("Accessing the RHI resource of %s at this time is not allowed. If you hit this check in pass, ")
TEXT("that is due to this resource not being referenced in the parameters of your pass."),
Name);
#endif
}
FRHIResource* GetRHIUnchecked() const
{
return ResourceRHI;
}
// RHI resource reference.
FRHIResource* ResourceRHI = nullptr;
private:
// Debug data.
#if RDG_ENABLE_DEBUG
class FDebugData
{
private:
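// Tracks at runtime whether the resource is actually used by the pass lambda, to detect unnecessary resource dependencies on the pass.
bool bIsActuallyUsedByPass = false;
// Tracks whether the underlying RHI resource is allowed to be accessed during pass execution.
bool bAllowRHIAccess = false;
// If true, the resource is not attached to any builder and exists as a dummy container for staging code to RDG.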
bool bPassthrough = false;
} DebugData;
#endif
};
class FRDGUniformBuffer : public FRDGResource
{
public:
// Returns the RHI resource.
FRHIUniformBuffer* GetRHI() const
{
return static_cast<FRHIUniformBuffer*>(FRDGResource::GetRHI());
}
(......)
protected:
template <typename TParameterStruct>
explicit FRDGUniformBuffer(TParameterStruct* InParameters, const TCHAR* InName)
: FRDGResource(InName)
, ParameterStruct(InParameters)
, bGlobal(ParameterStruct.HasStaticSlot())
{}
private:
// Parameter struct.
const FRDGParameterStruct ParameterStruct;
// RHI resource.
TRefCountPtr<FRHIUniformBuffer> UniformBufferRHI;
// RDG handle.
FRDGUniformBufferHandle Handle;
// Whether the buffer is bound globally or locally.
uint8 bGlobal : 1;
};
// RDG uniform buffer class template.
template <typename ParameterStructType>
class TRDGUniformBuffer : public FRDGUniformBuffer
{
public:
const TRDGParameterStruct<ParameterStructType>& GetParameters() const;
TUniformBufferRef<ParameterStructType> GetRHIRef() const;
const ParameterStructType* operator->() const;
(......)
};
// A render graph resource whose allocation lifetime is tracked by the graph. May have child resources (e.g. views) that reference it.
class FRDGParentResource : public FRDGResource
{
public:
// Parent resource type.
const ERDGParentResourceType Type;
bool IsExternal() const;
protected:
FRDGParentResource(const TCHAR* InName, ERDGParentResourceType InType);
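// Whether this is an external resource.
uint8 bExternal : 1;
// Whether this is an extracted resource.
uint8 bExtracted : 1;
// Whether this resource needs acquire / discard.
uint8 bTransient : 1;
// Whether this resource is the last owner of its allocation.
uint8 bLastOwner : 1;
// Whether this resource will be culled.
uint8 bCulled : 1;
// Whether this resource is used by an async compute pass.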
uint8 bUsedByAsyncComputePass : 1;
private:
// Number of references.
uint16 ReferenceCount = 0;
// Initial and final states of the resource assigned by the user, if known.
ERHIAccess AccessInitial = ERHIAccess::Unknown;
ERHIAccess AccessFinal = ERHIAccess::Unknown;
FRDGPassHandle AcquirePass;
FRDGPassHandle FirstPass;
FRDGPassHandle LastPass;
(......)
};
// Descriptor used to create a render texture.
struct RENDERCORE_API FRDGTextureDesc
{
static FRDGTextureDesc Create2D(...);
static FRDGTextureDesc Create2DArray(...);
static FRDGTextureDesc Create3D(...);
static FRDGTextureDesc CreateCube(...);
static FRDGTextureDesc CreateCubeArray(...);
bool IsTexture2D() const;
bool IsTexture3D() const;
bool IsTextureCube() const;
bool IsTextureArray() const;
bool IsMipChain() const;
bool IsMultisample() const;
FIntVector GetSize() const;
// Subresource layout.
FRDGTextureSubresourceLayout GetSubresourceLayout() const;
bool IsValid() const;
// Clear value.
FClearValueBinding ClearValue;
ETextureDimension Dimension = ETextureDimension::Texture2D;
// Texture creation flags.
ETextureCreateFlags Flags = TexCreate_None;
// Pixel format.
EPixelFormat Format = PF_Unknown;
// Extent of the texture in x and y.
FIntPoint Extent = FIntPoint(1, 1);
// Depth of a 3D texture.
uint16 Depth = 1;
uint16 ArraySize = 1;
// Number of mip levels.
uint8 NumMips = 1;
// Number of samples.
uint8 NumSamples = 1;
};
// Translates a pooled render target descriptor into an RDG texture descriptor.
inline FRDGTextureDesc Translate(const FPooledRenderTargetDesc& InDesc, ERenderTargetTexture InTexture = ERenderTargetTexture::Targetable);
// Translates an RDG texture descriptor into a pooled render target descriptor.
inline FPooledRenderTargetDesc Translate(const FRDGTextureDesc& InDesc);
// Pooled texture.
class RENDERCORE_API FRDGPooledTexture
{
public:
// Descriptor.
const FRDGTextureDesc Desc;
// Reference counting.
uint32 GetRefCount() const;
uint32 AddRef() const;
uint32 Release() const;
private:
FRDGPooledTexture(FRHITexture* InTexture, const FRDGTextureDesc& InDesc, const FUnorderedAccessViewRHIRef& FirstMipUAV);
// Initializes the cached UAVs.
void InitViews(const FUnorderedAccessViewRHIRef& FirstMipUAV);
void Finalize();
void Reset();
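// The underlying RHI texture.
FRHITexture* Texture = nullptr;
// The RDG texture that currently owns this pooled texture.
FRDGTexture* Owner = nullptr;
// Subresource layout.
FRDGTextureSubresourceLayout Layout;
// Subresource state.
FRDGTextureSubresourceState State;
// Cached UAVs / SRVs for the RHI texture.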
TArray<FUnorderedAccessViewRHIRef, TInlineAllocator<1>> MipUAVs;
TArray<TPair<FRHITextureSRVCreateInfo, FShaderResourceViewRHIRef>, TInlineAllocator<1>> SRVs;
FUnorderedAccessViewRHIRef HTileUAV;
FShaderResourceViewRHIRef HTileSRV;
FUnorderedAccessViewRHIRef StencilUAV;
FShaderResourceViewRHIRef FMaskSRV;
FShaderResourceViewRHIRef CMaskSRV;
mutable uint32 RefCount = 0;
};
// RDG texture.
class RENDERCORE_API FRDGTexture final : public FRDGParentResource
{
public:
// Creates a passthrough texture suitable for filling RHI uniform buffers with RDG parameters, for passes not yet ported to RDG.
static FRDGTextureRef GetPassthrough(const TRefCountPtr<IPooledRenderTarget>& PooledRenderTarget);
// Descriptor and flags.
const FRDGTextureDesc Desc;
const ERDGTextureFlags Flags;
//////////////////////////////////////////////////////////////////////////
//! The following methods may only be called during pass execution.
IPooledRenderTarget* GetPooledRenderTarget() const;
FRHITexture* GetRHI() const;
//////////////////////////////////////////////////////////////////////////
FRDGTextureSubresourceLayout GetSubresourceLayout() const;
FRDGTextureSubresourceRange GetSubresourceRange() const;
FRDGTextureSubresourceRange GetSubresourceRangeSRV() const;
private:
FRDGTexture(const TCHAR* InName, const FRDGTextureDesc& InDesc, ERDGTextureFlags InFlags, ERenderTargetTexture InRenderTargetTexture);
void SetRHI(FPooledRenderTarget* PooledRenderTarget, FRDGTextureRef& OutPreviousOwner);
void Finalize();
FRHITexture* GetRHIUnchecked() const;
bool IsLastOwner() const;
FRDGTextureSubresourceState& GetState();
const ERenderTargetTexture RenderTargetTexture;
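// Layout used to facilitate subresource transitions.
FRDGTextureSubresourceLayout Layout;
// The next texture to own the PooledTexture allocation during execution.
FRDGTextureHandle NextOwner;
// Handle registered with the builder.
FRDGTextureHandle Handle;
// Pooled texture.
IPooledRenderTarget* PooledRenderTarget = nullptr;
FRDGPooledTexture* PooledTexture = nullptr;
// Cached state pointer from the pooled texture.
FRDGTextureSubresourceState* State = nullptr;
// Strictly valid only while a strong reference is held.
TRefCountPtr<IPooledRenderTarget> Allocation;
// Tracks merged subresource states while the graph is being built.
FRDGTextureTransientSubresourceStateIndirect MergeState;
// Tracks the producer pass of each subresource while the graph is being built.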
TRDGTextureSubresourceArray<FRDGPassHandle> LastProducers;
};
// Pooled buffer.
class RENDERCORE_API FRDGPooledBuffer
{
public:
const FRDGBufferDesc Desc;
FRHIUnorderedAccessView* GetOrCreateUAV(FRDGBufferUAVDesc UAVDesc);
FRHIShaderResourceView* GetOrCreateSRV(FRDGBufferSRVDesc SRVDesc);
FRHIVertexBuffer* GetVertexBufferRHI() const;
FRHIIndexBuffer* GetIndexBufferRHI() const;
FRHIStructuredBuffer* GetStructuredBufferRHI() const;
uint32 GetRefCount() const;
uint32 AddRef() const;
uint32 Release() const;
(......)
private:
FRDGPooledBuffer(const FRDGBufferDesc& InDesc);
// Vertex / index / structured buffers.
FVertexBufferRHIRef VertexBuffer;
FIndexBufferRHIRef IndexBuffer;
FStructuredBufferRHIRef StructuredBuffer;
// UAVs / SRVs.
TMap<FRDGBufferUAVDesc, FUnorderedAccessViewRHIRef, FDefaultSetAllocator, TUAVFuncs<FRDGBufferUAVDesc, FUnorderedAccessViewRHIRef>> UAVs;
TMap<FRDGBufferSRVDesc, FShaderResourceViewRHIRef, FDefaultSetAllocator, TSRVFuncs<FRDGBufferSRVDesc, FShaderResourceViewRHIRef>> SRVs;
void Reset();
void Finalize();
const TCHAR* Name = nullptr;
// Owner.
FRDGBufferRef Owner = nullptr;
FRDGSubresourceState State;
mutable uint32 RefCount = 0;
uint32 LastUsedFrame = 0;
};
// Buffer tracked by the render graph.
class RENDERCORE_API FRDGBuffer final : public FRDGParentResource
{
public:
const FRDGBufferDesc Desc;
const ERDGBufferFlags Flags;
//////////////////////////////////////////////////////////////////////////
//! The following methods may only be called during pass execution.
// Returns the RHI resource.
FRHIVertexBuffer* GetIndirectRHICallBuffer() const;
FRHIVertexBuffer* GetRHIVertexBuffer() const;
FRHIStructuredBuffer* GetRHIStructuredBuffer() const;
//////////////////////////////////////////////////////////////////////////
private:
FRDGBuffer(const TCHAR* InName, const FRDGBufferDesc& InDesc, ERDGBufferFlags InFlags);
// Sets the RHI resource.
void SetRHI(FRDGPooledBuffer* InPooledBuffer, FRDGBufferRef& OutPreviousOwner);
void Finalize();
FRDGSubresourceState& GetState() const;
// RDG handle.
FRDGBufferHandle Handle;
// The last pass that produced this resource.
FRDGPassHandle LastProducer;
// The next owner.
FRDGBufferHandle NextOwner;
// The assigned pooled buffer.
FRDGPooledBuffer* PooledBuffer = nullptr;
// Subresource state.
FRDGSubresourceState* State = nullptr;
TRefCountPtr<FRDGPooledBuffer> Allocation;
FRDGSubresourceState* MergeState = nullptr;
};
(......)
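The RDG system wraps essentially all RHI resources so that it can further control and manage them, precisely controlling their lifetimes, reference relationships, and debug information. This in turn allows them to be optimized and culled to improve rendering performance.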
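One payoff of tracking each resource's first and last pass (FirstPass and LastPass in the listing above) is that resources with disjoint lifetimes can share an allocation. The following is a minimal sketch of that idea in standard C++, not the UE algorithm: ToyLifetime, ToyAliasResult, and AssignAliasSlots are hypothetical names, all resources are assumed to have the same size and format, and a greedy first-fit policy is assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Inclusive range of pass indices during which a resource is alive.
struct ToyLifetime
{
    int FirstPass;
    int LastPass;
};

struct ToyAliasResult
{
    std::vector<int> SlotPerResource; // allocation slot chosen for each resource
    int SlotCount = 0;                // number of distinct allocations needed
};

// Greedy first-fit assignment. Resources must be sorted by FirstPass
// (creation order) and use non-negative pass indices.
ToyAliasResult AssignAliasSlots(const std::vector<ToyLifetime>& Resources)
{
    ToyAliasResult Result;
    std::vector<int> SlotLastPass; // last pass that uses each slot so far
    int PreviousFirstPass = 0;

    for (const ToyLifetime& Resource : Resources)
    {
        assert(Resource.FirstPass <= Resource.LastPass);
        assert(Resource.FirstPass >= PreviousFirstPass);
        PreviousFirstPass = Resource.FirstPass;

        // Reuse the first slot whose previous user finished before this resource starts.
        int Slot = -1;
        for (std::size_t i = 0; i < SlotLastPass.size(); ++i)
        {
            if (SlotLastPass[i] < Resource.FirstPass)
            {
                Slot = static_cast<int>(i);
                break;
            }
        }

        if (Slot < 0)
        {
            // No free slot: a new allocation is required.
            Slot = static_cast<int>(SlotLastPass.size());
            SlotLastPass.push_back(Resource.LastPass);
        }
        else
        {
            SlotLastPass[static_cast<std::size_t>(Slot)] = Resource.LastPass;
        }
        Result.SlotPerResource.push_back(Slot);
    }

    Result.SlotCount = static_cast<int>(SlotLastPass.size());
    return Result;
}
```

For lifetimes [0,1], [1,3], [2,2], and [4,5], only the first two overlap, so four resources fit in two allocations. A real implementation also has to match sizes and formats and insert the acquire and discard operations mentioned in the pass listing.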
11.2.3 RDG Pass
The RDG pass module involves concepts such as barriers, resource transitions, and RDG passes:
// Engine\Source\Runtime\RHI\Public\RHI.h
// Opaque data structure used to represent a pending resource transition in the RHI.
struct FRHITransition
{
public:
template <typename T>
inline T* GetPrivateData()
{
uintptr_t Addr = Align(uintptr_t(this + 1), GRHITransitionPrivateData_AlignInBytes);
return reinterpret_cast<T*>(Addr);
}
template <typename T>
inline const T* GetPrivateData() const
{
return const_cast<FRHITransition*>(this)->GetPrivateData<T>();
}
private:
FRHITransition(const FRHITransition&) = delete;
FRHITransition(FRHITransition&&) = delete;
FRHITransition(ERHIPipeline SrcPipelines, ERHIPipeline DstPipelines);
~FRHITransition();
// Returns the total allocation size.
static uint64 GetTotalAllocationSize();
// Returns the alignment in bytes.
static uint64 GetAlignment();
// Marks the begin of the transition on the given pipeline.
inline void MarkBegin(ERHIPipeline Pipeline) const
{
int8 Mask = int8(Pipeline);
int8 PreviousValue = FPlatformAtomics::InterlockedAnd(&State, ~Mask);
if (PreviousValue == Mask)
{
Cleanup();
}
}
// Marks the end of the transition on the given pipeline.
inline void MarkEnd(ERHIPipeline Pipeline) const
{
int8 Mask = int8(Pipeline) << int32(ERHIPipeline::Num);
int8 PreviousValue = FPlatformAtomics::InterlockedAnd(&State, ~Mask);
if (PreviousValue == Mask)
{
Cleanup();
}
}
// Cleans up the transition, including the RHI transition and its allocated memory.
inline void Cleanup() const;
mutable int8 State;
#if DO_CHECK
mutable ERHIPipeline AllowedSrc;
mutable ERHIPipeline AllowedDst;
#endif
#if ENABLE_RHI_VALIDATION
// Fence.
RHIValidation::FFence* Fence = nullptr;
// Pending begin operations.
RHIValidation::FOperationsList PendingOperationsBegin;
// Pending end operations.
RHIValidation::FOperationsList PendingOperationsEnd;
#endif
};
// Engine\Source\Runtime\RenderCore\Public\RenderGraphPass.h
// RDG barrier batch.
class RENDERCORE_API FRDGBarrierBatch
{
public:
FRDGBarrierBatch(const FRDGBarrierBatch&) = delete;
bool IsSubmitted() const;
FString GetName() const;
protected:
FRDGBarrierBatch(const FRDGPass* InPass, const TCHAR* InName);
void SetSubmitted();
ERHIPipeline GetPipeline() const;
private:
bool bSubmitted = false;
// Graphics or AsyncCompute.
ERHIPipeline Pipeline;
#if RDG_ENABLE_DEBUG
const FRDGPass* Pass;
const TCHAR* Name;
#endif
};
// Barrier batch begin.
class RENDERCORE_API FRDGBarrierBatchBegin final : public FRDGBarrierBatch
{
public:
FRDGBarrierBatchBegin(const FRDGPass* InPass, const TCHAR* InName, TOptional<ERHIPipeline> InOverridePipelineForEnd = {});
~FRDGBarrierBatchBegin();
// Adds a resource transition to the batch.
void AddTransition(FRDGParentResourceRef Resource, const FRHITransitionInfo& Info);
const FRHITransition* GetTransition() const;
bool IsTransitionValid() const;
void SetUseCrossPipelineFence();
// Submits the barriers / resource transitions.
void Submit(FRHIComputeCommandList& RHICmdList);
private:
TOptional<ERHIPipeline> OverridePipelineToEnd;
bool bUseCrossPipelineFence = false;
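// The transition stored after submission; it is reset to null when the batch is ended.
const FRHITransition* Transition = nullptr;
// Array of asynchronous resource transitions to perform.
TArray<FRHITransitionInfo, TInlineAllocator<1, SceneRenderingAllocator>> Transitions;
#if RDG_ENABLE_DEBUG
// Array of RDG resources matching the Transitions array, for debugging only.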
TArray<FRDGParentResource*, SceneRenderingAllocator> Resources;
#endif
};
// Barrier batch end.
class RENDERCORE_API FRDGBarrierBatchEnd final : public FRDGBarrierBatch
{
public:
FRDGBarrierBatchEnd(const FRDGPass* InPass, const TCHAR* InName);
~FRDGBarrierBatchEnd();
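// Reserves memory.
void ReserveMemory(uint32 ExpectedDependencyCount);
// Inserts a dependency on a begin batch. A begin batch can be inserted into more than one end batch.
void AddDependency(FRDGBarrierBatchBegin* BeginBatch);
// Submits the resource transitions.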
void Submit(FRHIComputeCommandList& RHICmdList);
private:
// Begin batches whose transitions this end batch completes.
TArray<FRDGBarrierBatchBegin*, TInlineAllocator<1, SceneRenderingAllocator>> Dependencies;
};
// Base class of RDG passes.
class RENDERCORE_API FRDGPass
{
public:
FRDGPass(FRDGEventName&& InName, FRDGParameterStruct InParameterStruct, ERDGPassFlags InFlags);
FRDGPass(const FRDGPass&) = delete;
virtual ~FRDGPass() = default;
// Pass data accessors.
const TCHAR* GetName() const;
FORCEINLINE const FRDGEventName& GetEventName() const;
FORCEINLINE ERDGPassFlags GetFlags() const;
FORCEINLINE ERHIPipeline GetPipeline() const;
// RDG pass parameters.
FORCEINLINE FRDGParameterStruct GetParameters() const;
FORCEINLINE FRDGPassHandle GetHandle() const;
bool IsMergedRenderPassBegin() const;
bool IsMergedRenderPassEnd() const;
bool SkipRenderPassBegin() const;
bool SkipRenderPassEnd() const;
bool IsAsyncCompute() const;
bool IsAsyncComputeBegin() const;
bool IsAsyncComputeEnd() const;
bool IsGraphicsFork() const;
bool IsGraphicsJoin() const;
// Producer handles.
const FRDGPassHandleArray& GetProducers() const;
// Cross-pipeline producer.
FRDGPassHandle GetCrossPipelineProducer() const;
// Cross-pipeline consumer.
FRDGPassHandle GetCrossPipelineConsumer() const;
// Fork pass.
FRDGPassHandle GetGraphicsForkPass() const;
// Join pass.
FRDGPassHandle GetGraphicsJoinPass() const;
#if RDG_CPU_SCOPES
FRDGCPUScopes GetCPUScopes() const;
#endif
#if RDG_GPU_SCOPES
FRDGGPUScopes GetGPUScopes() const;
#endif
private:
// Prologue barriers.
FRDGBarrierBatchBegin& GetPrologueBarriersToBegin(FRDGAllocator& Allocator);
FRDGBarrierBatchEnd& GetPrologueBarriersToEnd(FRDGAllocator& Allocator);
// Epilogue barriers.
FRDGBarrierBatchBegin& GetEpilogueBarriersToBeginForGraphics(FRDGAllocator& Allocator);
FRDGBarrierBatchBegin& GetEpilogueBarriersToBeginForAsyncCompute(FRDGAllocator& Allocator);
FRDGBarrierBatchBegin& GetEpilogueBarriersToBeginFor(FRDGAllocator& Allocator, ERHIPipeline PipelineForEnd);
//////////////////////////////////////////////////////////////////////////
//! User Methods to Override
// Execution implementation.
virtual void ExecuteImpl(FRHIComputeCommandList& RHICmdList) = 0;
//////////////////////////////////////////////////////////////////////////
// Executes the pass.
void Execute(FRHIComputeCommandList& RHICmdList);
// Pass data.
const FRDGEventName Name;
const FRDGParameterStruct ParameterStruct;
const ERDGPassFlags Flags;
const ERHIPipeline Pipeline;
FRDGPassHandle Handle;
// Pass flags.
union
{
struct
{
uint32 bSkipRenderPassBegin : 1;
uint32 bSkipRenderPassEnd : 1;
uint32 bAsyncComputeBegin : 1;
uint32 bAsyncComputeEnd : 1;
uint32 bAsyncComputeEndExecute : 1;
uint32 bGraphicsFork : 1;
uint32 bGraphicsJoin : 1;
uint32 bUAVAccess : 1;
IF_RDG_ENABLE_DEBUG(uint32 bFirstTextureAllocated : 1);
};
uint32 PackedBits = 0;
};
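// Handle of the latest cross-pipeline producer.
FRDGPassHandle CrossPipelineProducer;
// Handle of the earliest cross-pipeline consumer.
FRDGPassHandle CrossPipelineConsumer;
// (AsyncCompute only) The graphics passes that fork / join the async compute interval this pass is in.
FRDGPassHandle GraphicsForkPass;
FRDGPassHandle GraphicsJoinPass;
// The passes that process the prologue / epilogue barriers for this pass.
FRDGPassHandle PrologueBarrierPass;
FRDGPassHandle EpilogueBarrierPass;
// List of producer passes.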
FRDGPassHandleArray Producers;
// Texture state.
struct FTextureState
{
FRDGTextureTransientSubresourceState State;
FRDGTextureTransientSubresourceStateIndirect MergeState;
uint16 ReferenceCount = 0;
};
// Buffer state.
struct FBufferState
{
FRDGSubresourceState State;
FRDGSubresourceState* MergeState = nullptr;
uint16 ReferenceCount = 0;
};
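// Maps textures / buffers to information on how they are used in the pass.
TSortedMap<FRDGTexture*, FTextureState, SceneRenderingAllocator> TextureStates;
TSortedMap<FRDGBuffer*, FBufferState, SceneRenderingAllocator> BufferStates;
// Lists of pass parameters scheduled to begin (ResourcesToBegin) or end (ResourcesToEnd) during execution of this pass.
TArray<FRDGPass*, TInlineAllocator<1, SceneRenderingAllocator>> ResourcesToBegin;
TArray<FRDGPass*, TInlineAllocator<1, SceneRenderingAllocator>> ResourcesToEnd;
// List of textures to acquire *after* the pass completes and *before* discards. Acquires apply to all allocated textures.
TArray<FRHITexture*, SceneRenderingAllocator> TexturesToAcquire;
// List of textures to discard *after* the pass completes and *after* acquires. Discards apply only to textures marked as transient, where the last alias of the texture uses the automatic discard behavior (to support a cleaner hand-off to the user or back to the pool).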
TArray<FRHITexture*, SceneRenderingAllocator> TexturesToDiscard;
FRDGBarrierBatchBegin* PrologueBarriersToBegin = nullptr;
FRDGBarrierBatchEnd* PrologueBarriersToEnd = nullptr;
FRDGBarrierBatchBegin* EpilogueBarriersToBeginForGraphics = nullptr;
FRDGBarrierBatchBegin* EpilogueBarriersToBeginForAsyncCompute = nullptr;
EAsyncComputeBudget AsyncComputeBudget = EAsyncComputeBudget::EAll_4;
};
// RDG pass that executes a lambda.
template <typename ParameterStructType, typename ExecuteLambdaType>
class TRDGLambdaPass : public FRDGPass
{
(......)
TRDGLambdaPass(FRDGEventName&& InName, const ParameterStructType* InParameterStruct, ERDGPassFlags InPassFlags, ExecuteLambdaType&& InExecuteLambda);
private:
// Execution implementation.
void ExecuteImpl(FRHIComputeCommandList& RHICmdList) override
{
check(!kSupportsRaster || RHICmdList.IsImmediate());
// Invokes the lambda.
ExecuteLambda(static_cast<TRHICommandList&>(RHICmdList));
}
// The lambda instance.
ExecuteLambdaType ExecuteLambda;
};
// Lambda pass with empty parameters.
template <typename ExecuteLambdaType>
class TRDGEmptyLambdaPass : public TRDGLambdaPass<FEmptyShaderParameters, ExecuteLambdaType>
{
public:
TRDGEmptyLambdaPass(FRDGEventName&& InName, ERDGPassFlags InPassFlags, ExecuteLambdaType&& InExecuteLambda);
private:
FEmptyShaderParameters EmptyShaderParameters;
};
// Sentinel pass used for the graph prologue / epilogue.
class FRDGSentinelPass final : public FRDGPass
{
public:
FRDGSentinelPass(FRDGEventName&& Name);
private:
void ExecuteImpl(FRHIComputeCommandList&) override;
FEmptyShaderParameters EmptyShaderParameters;
};
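As shown above, the RDG pass is fairly complex and is one of the core types of the RDG system, involving consumers, producers, transition dependencies, various resource states, and other data and processing. Judging from the declarations above, RDG passes come in the following types: FRDGPass (the base class), TRDGLambdaPass, TRDGEmptyLambdaPass, and FRDGSentinelPass.
RDG passes and render passes are not in one-to-one correspondence: several RDG passes may be merged into a single render pass, as detailed in later chapters. The most complex aspects of RDG passes are multithreading, resource state transitions, and dependency handling. This section does not cover them; later chapters discuss them in detail.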
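The MarkBegin / MarkEnd bookkeeping in FRHITransition can be modeled with a small atomic bitmask. This is a hedged sketch in standard C++, not the UE implementation: ToyTransition and its constants are hypothetical, and it assumes the state starts with the source pipelines in the low bits and the destination pipelines shifted up by the pipeline count, which is what the shift in MarkEnd suggests. Cleanup is represented by a flag instead of freeing memory.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical pipeline bits, modeled on the Graphics / AsyncCompute pair.
constexpr std::uint8_t kToyGraphics = 1 << 0;
constexpr std::uint8_t kToyAsyncCompute = 1 << 1;
constexpr int kToyNumPipelines = 2;

class ToyTransition
{
public:
    // Low bits: pipelines that still have to begin the transition.
    // High bits: pipelines that still have to end it.
    ToyTransition(std::uint8_t SrcPipelines, std::uint8_t DstPipelines)
        : State(static_cast<std::uint8_t>(SrcPipelines | (DstPipelines << kToyNumPipelines)))
    {
    }

    void MarkBegin(std::uint8_t Pipeline)
    {
        ClearBits(Pipeline);
    }

    void MarkEnd(std::uint8_t Pipeline)
    {
        ClearBits(static_cast<std::uint8_t>(Pipeline << kToyNumPipelines));
    }

    bool IsCleanedUp() const { return bCleanedUp.load(); }

private:
    // Atomically clears Mask. If the previous state contained only the bits
    // being cleared, this call was the last outstanding begin or end.
    void ClearBits(std::uint8_t Mask)
    {
        const std::uint8_t Previous = State.fetch_and(static_cast<std::uint8_t>(~Mask));
        if (Previous == Mask)
        {
            bCleanedUp.store(true); // stands in for Cleanup()
        }
    }

    std::atomic<std::uint8_t> State;
    std::atomic<bool> bCleanedUp{false};
};
```

For a transition from the graphics pipeline to async compute, the state starts as 0b1001. MarkBegin on graphics leaves 0b1000, and the later MarkEnd on async compute observes exactly its own bit as the previous value, so it is the call that triggers cleanup regardless of which thread issues it.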
11.2.4 FRDGBuilder
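FRDGBuilder is the heart and engine of the RDG system, and also its steward. It is responsible for collecting render passes and parameters, compiling passes and data, handling resource dependencies, culling and optimizing data, and providing the execution interface. It is declared as follows: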
class RENDERCORE_API FRDGBuilder
{
public:
FRDGBuilder(FRHICommandListImmediate& InRHICmdList, FRDGEventName InName = {}, const char* UnaccountedCSVStat = kDefaultUnaccountedCSVStat);
FRDGBuilder(const FRDGBuilder&) = delete;
// Finds an external texture; returns null if not found.
FRDGTextureRef FindExternalTexture(FRHITexture* Texture) const;
FRDGTextureRef FindExternalTexture(IPooledRenderTarget* ExternalPooledTexture, ERenderTargetTexture Texture) const;
// Registers an external pooled render target with RDG so that RDG can track it. A pooled render target may contain two RHI textures: MSAA and non-MSAA.
FRDGTextureRef RegisterExternalTexture(
const TRefCountPtr<IPooledRenderTarget>& ExternalPooledTexture,
ERenderTargetTexture Texture = ERenderTargetTexture::ShaderResource,
ERDGTextureFlags Flags = ERDGTextureFlags::None);
FRDGTextureRef RegisterExternalTexture(
const TRefCountPtr<IPooledRenderTarget>& ExternalPooledTexture,
const TCHAR* NameIfNotRegistered,
ERenderTargetTexture RenderTargetTexture = ERenderTargetTexture::ShaderResource,
ERDGTextureFlags Flags = ERDGTextureFlags::None);
// Registers an external buffer with RDG so that RDG can track it.
FRDGBufferRef RegisterExternalBuffer(const TRefCountPtr<FRDGPooledBuffer>& ExternalPooledBuffer, ERDGBufferFlags Flags = ERDGBufferFlags::None);
FRDGBufferRef RegisterExternalBuffer(const TRefCountPtr<FRDGPooledBuffer>& ExternalPooledBuffer, ERDGBufferFlags Flags, ERHIAccess AccessFinal);
FRDGBufferRef RegisterExternalBuffer(
const TRefCountPtr<FRDGPooledBuffer>& ExternalPooledBuffer,
const TCHAR* NameIfNotRegistered,
ERDGBufferFlags Flags = ERDGBufferFlags::None);
// Resource creation interface.
FRDGTextureRef CreateTexture(const FRDGTextureDesc& Desc, const TCHAR* Name, ERDGTextureFlags Flags = ERDGTextureFlags::None);
FRDGBufferRef CreateBuffer(const FRDGBufferDesc& Desc, const TCHAR* Name, ERDGBufferFlags Flags = ERDGBufferFlags::None);
FRDGTextureSRVRef CreateSRV(const FRDGTextureSRVDesc& Desc);
FRDGBufferSRVRef CreateSRV(const FRDGBufferSRVDesc& Desc);
FORCEINLINE FRDGBufferSRVRef CreateSRV(FRDGBufferRef Buffer, EPixelFormat Format);
FRDGTextureUAVRef CreateUAV(const FRDGTextureUAVDesc& Desc, ERDGUnorderedAccessViewFlags Flags = ERDGUnorderedAccessViewFlags::None);
FORCEINLINE FRDGTextureUAVRef CreateUAV(FRDGTextureRef Texture, ERDGUnorderedAccessViewFlags Flags = ERDGUnorderedAccessViewFlags::None);
FRDGBufferUAVRef CreateUAV(const FRDGBufferUAVDesc& Desc, ERDGUnorderedAccessViewFlags Flags = ERDGUnorderedAccessViewFlags::None);
FORCEINLINE FRDGBufferUAVRef CreateUAV(FRDGBufferRef Buffer, EPixelFormat Format, ERDGUnorderedAccessViewFlags Flags = ERDGUnorderedAccessViewFlags::None);
template <typename ParameterStructType>
TRDGUniformBufferRef<ParameterStructType> CreateUniformBuffer(ParameterStructType* ParameterStruct);
// Allocates memory whose lifetime is managed by RDG.
void* Alloc(uint32 SizeInBytes, uint32 AlignInBytes);
template <typename PODType>
PODType* AllocPOD();
template <typename ObjectType, typename... TArgs>
ObjectType* AllocObject(TArgs&&... Args);
template <typename ParameterStructType>
ParameterStructType* AllocParameters();
// Adds a pass with parameters and a lambda.
template <typename ParameterStructType, typename ExecuteLambdaType>
FRDGPassRef AddPass(FRDGEventName&& Name, const ParameterStructType* ParameterStruct, ERDGPassFlags Flags, ExecuteLambdaType&& ExecuteLambda);
// Adds a pass with only a lambda and no parameters.
template <typename ExecuteLambdaType>
FRDGPassRef AddPass(FRDGEventName&& Name, ERDGPassFlags Flags, ExecuteLambdaType&& ExecuteLambda);
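// Queues extraction of a pooled texture into the given pointer at the end of builder execution. For RDG-created resources, this extends the lifetime of the GPU resource until execution, at which point the pointer is filled. If specified, the texture is transitioned to the AccessFinal state; otherwise it is transitioned to kDefaultAccessFinal.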
void QueueTextureExtraction(FRDGTextureRef Texture, TRefCountPtr<IPooledRenderTarget>* OutPooledTexturePtr);
void QueueTextureExtraction(FRDGTextureRef Texture, TRefCountPtr<IPooledRenderTarget>* OutPooledTexturePtr, ERHIAccess AccessFinal);
// Queues extraction of a buffer into the given pointer at the end of builder execution.
void QueueBufferExtraction(FRDGBufferRef Buffer, TRefCountPtr<FRDGPooledBuffer>* OutPooledBufferPtr);
void QueueBufferExtraction(FRDGBufferRef Buffer, TRefCountPtr<FRDGPooledBuffer>* OutPooledBufferPtr, ERHIAccess AccessFinal);
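// Preallocates a resource. For RDG-created resources only, this forces immediate allocation of the underlying pooled resource, effectively promoting it to an external resource. This increases memory pressure but allows the pooled resource to be queried with GetPooled{Texture, Buffer}. Mainly used for incrementally porting code to RDG.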
void PreallocateTexture(FRDGTextureRef Texture);
void PreallocateBuffer(FRDGBufferRef Buffer);
// Immediately returns the underlying resource. Only allowed for registered or preallocated resources.
const TRefCountPtr<IPooledRenderTarget>& GetPooledTexture(FRDGTextureRef Texture) const;
const TRefCountPtr<FRDGPooledBuffer>& GetPooledBuffer(FRDGBufferRef Buffer) const;
// Sets the access state to transition to after execution.
void SetTextureAccessFinal(FRDGTextureRef Texture, ERHIAccess Access);
void SetBufferAccessFinal(FRDGBufferRef Buffer, ERHIAccess Access);
void RemoveUnusedTextureWarning(FRDGTextureRef Texture);
void RemoveUnusedBufferWarning(FRDGBufferRef Buffer);
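// Executes the queued passes, managing render target setup (RHI render passes), resource transitions, and queued texture extraction.
void Execute();
// Per-frame update of the render graph resource pools.
static void TickPoolElements();
// The command list used by RDG.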
FRHICommandListImmediate& RHICmdList;
private:
static const ERHIAccess kDefaultAccessInitial = ERHIAccess::Unknown;
static const ERHIAccess kDefaultAccessFinal = ERHIAccess::SRVMask;
static const char* const kDefaultUnaccountedCSVStat;
// The async compute command list used by RDG.
FRHIAsyncComputeCommandListImmediate& RHICmdListAsyncCompute;
FRDGAllocator Allocator;
const FRDGEventName BuilderName;
ERDGPassFlags OverridePassFlags(const TCHAR* PassName, ERDGPassFlags Flags, bool bAsyncComputeSupported);
FORCEINLINE FRDGPassHandle GetProloguePassHandle() const;
FORCEINLINE FRDGPassHandle GetEpiloguePassHandle() const;
// Registries of RDG objects.
FRDGPassRegistry Passes;
FRDGTextureRegistry Textures;
FRDGBufferRegistry Buffers;
FRDGViewRegistry Views;
FRDGUniformBufferRegistry UniformBuffers;
// Passes marked for culling.
FRDGPassBitArray PassesToCull;
// Passes with no parameters.
FRDGPassBitArray PassesWithEmptyParameters;
// Maps external resources to their registered render graph counterparts, for de-duplication.
TSortedMap<FRHITexture*, FRDGTexture*, TInlineAllocator<4, SceneRenderingAllocator>> ExternalTextures;
TSortedMap<const FRDGPooledBuffer*, FRDGBuffer*, TInlineAllocator<4, SceneRenderingAllocator>> ExternalBuffers;
FRDGPass* ProloguePass = nullptr;
FRDGPass* EpiloguePass = nullptr;
// Resources queued for extraction.
TArray<TPair<FRDGTextureRef, TRefCountPtr<IPooledRenderTarget>*>, TInlineAllocator<4, SceneRenderingAllocator>> ExtractedTextures;
TArray<TPair<FRDGBufferRef, TRefCountPtr<FRDGPooledBuffer>*>, TInlineAllocator<4, SceneRenderingAllocator>> ExtractedBuffers;
// Texture state used for intermediate operations; kept here to avoid reallocation.
FRDGTextureTransientSubresourceStateIndirect ScratchTextureState;
EAsyncComputeBudget AsyncComputeBudgetScope = EAsyncComputeBudget::EAll_4;
// Compiles the graph.
void Compile();
// Clears the builder state.
void Clear();
// Begins the RHI lifetime of a resource.
void BeginResourceRHI(FRDGUniformBuffer* UniformBuffer);
void BeginResourceRHI(FRDGPassHandle, FRDGTexture* Texture);
void BeginResourceRHI(FRDGPassHandle, FRDGTextureSRV* SRV);
void BeginResourceRHI(FRDGPassHandle, FRDGTextureUAV* UAV);
void BeginResourceRHI(FRDGPassHandle, FRDGBuffer* Buffer);
void BeginResourceRHI(FRDGPassHandle, FRDGBufferSRV* SRV);
void BeginResourceRHI(FRDGPassHandle, FRDGBufferUAV* UAV);
// Ends the RHI lifetime of a resource (releases pass references).
void EndResourceRHI(FRDGPassHandle, FRDGTexture* Texture, uint32 ReferenceCount);
void EndResourceRHI(FRDGPassHandle, FRDGBuffer* Buffer, uint32 ReferenceCount);
// Pass setup and execution.
void SetupPassInternal(FRDGPass* Pass, FRDGPassHandle PassHandle, ERHIPipeline PassPipeline);
void SetupPass(FRDGPass* Pass);
void SetupEmptyPass(FRDGPass* Pass);
void ExecutePass(FRDGPass* Pass);
// Pass prologue and epilogue.
void ExecutePassPrologue(FRHIComputeCommandList& RHICmdListPass, FRDGPass* Pass);
void ExecutePassEpilogue(FRHIComputeCommandList& RHICmdListPass, FRDGPass* Pass);
// Collects pass resources and barriers.
void CollectPassResources(FRDGPassHandle PassHandle);
void CollectPassBarriers(FRDGPassHandle PassHandle, FRDGPassHandle& LastUntrackedPassHandle);
// Adds a pass dependency.
void AddPassDependency(FRDGPassHandle ProducerHandle, FRDGPassHandle ConsumerHandle);
// Adds an epilogue transition.
void AddEpilogueTransition(FRDGTextureRef Texture, FRDGPassHandle LastUntrackedPassHandle);
void AddEpilogueTransition(FRDGBufferRef Buffer, FRDGPassHandle LastUntrackedPassHandle);
// Adds a regular transition.
void AddTransition(FRDGPassHandle PassHandle, FRDGTextureRef Texture, const FRDGTextureTransientSubresourceStateIndirect& StateAfter, FRDGPassHandle LastUntrackedPassHandle);
void AddTransition(FRDGPassHandle PassHandle, FRDGBufferRef Buffer, FRDGSubresourceState StateAfter, FRDGPassHandle LastUntrackedPassHandle);
void AddTransitionInternal(
FRDGParentResource* Resource,
FRDGSubresourceState StateBefore,
FRDGSubresourceState StateAfter,
FRDGPassHandle LastUntrackedPassHandle,
const FRHITransitionInfo& TransitionInfo);
// Gets the RHI render pass info.
FRHIRenderPassInfo GetRenderPassInfo(const FRDGPass* Pass) const;
// Allocates a subresource state.
FRDGSubresourceState* AllocSubresource(const FRDGSubresourceState& Other);
#if RDG_ENABLE_DEBUG
void VisualizePassOutputs(const FRDGPass* Pass);
void ClobberPassOutputs(const FRDGPass* Pass);
#endif
};
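As the driver of the RDG system, FRDGBuilder stores the graph data, handles state transitions, automatically manages resource lifetimes and barriers, culls unused resources, collects, compiles, and executes passes, and extracts textures and buffers. Its internal machinery is fairly complex and is dissected in detail in the following sections.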
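11.3 RDG Mechanism
This chapter describes how RDG works: its mechanisms, process, and principles, along with its advantages and characteristics for rendering.
Readers who only want to learn how to use RDG can skip this chapter and go straight to 11.4 RDG Development.
11.3.1 Overview of the RDG Mechanism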
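The Rendering Dependency Graph framework sets up lambda scopes, designed as passes, that use deferred execution to issue GPU commands to the RHI. Passes are created with FRDGBuilder::AddPass(). Creating a pass requires shader parameters. These can be any shader parameters, but the framework is mainly interested in render graph resources.
The structure that holds all of a pass's parameters should be allocated with FRDGBuilder::AllocParameters() to ensure a correct lifetime, because execution of the lambda is deferred.
A render graph resource created with FRDGBuilder::CreateTexture() or FRDGBuilder::CreateBuffer() only records the resource descriptor. The graph allocates the resource when it is needed. The render graph tracks the resource's lifetime, and releases and reuses its memory once no remaining pass references it.
Every render graph resource used by a pass must appear in the pass parameters given to FRDGBuilder::AddPass(), because the render graph needs to know which resources each pass uses.
Resources are only guaranteed to be allocated while a pass executes. Accessing them should therefore happen only inside the lambda scope of a pass created with FRDGBuilder::AddPass(). Failing to list a resource that a pass uses may cause problems.
It is important not to reference more graph resources in the parameters than the pass needs, because doing so artificially extends what the graph believes about those resources' lifetimes. This can increase memory usage or prevent passes from overlapping in execution. One helper for this is ClearUnusedGraphResources(), which automatically clears references to resources that the shader does not use. A warning is emitted if a resource is not used in a pass.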
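The lambda scope of a pass may execute at any time after FRDGBuilder::AddPass(). For debugging purposes, it can execute directly inside AddPass() when immediate mode is on. When an error occurs during pass execution, immediate mode gives you the call stack of the pass setup, which may contain the source of the error. Immediate mode can be enabled with the command-line argument -rdgimmediate or the console variable r.RDG.ImmediateMode=1.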
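Pooled render target textures (FPooledRenderTarget) produced by legacy code can be used in the render graph by registering them with FRDGBuilder::RegisterExternalTexture().
With pass dependency information available, execution may prioritize different hardware goals, such as memory pressure or GPU execution concurrency between passes. The execution order of passes is therefore not guaranteed. The only guarantee is that work on intermediate resources is carried out the same way it would be on the GPU in immediate mode.
Render graph passes should not modify the state of external data structures, because this can lead to edge cases that depend on pass execution order. Render graph resources that must survive after execution (for example the viewport back buffer or the TAA history) should be extracted with FRDGBuilder::QueueTextureExtraction(). If a pass is detected to be useless for producing any resource scheduled for extraction or for modifying an external texture, the pass may not be executed at all, and a warning is emitted.
Unless there is a strong technical reason (such as stereo rendering, which renders multiple views in one go for VR), do not bundle several pieces of work on different resources into the same pass. Doing so creates more dependencies on the bundle as a whole, while each individual piece of work may need only a subset of them. The scheduler might otherwise have been able to overlap part of that work with other GPU work. Bundling may also keep transient resources allocated for longer, potentially raising the peak memory pressure of the whole frame.
Although AddPass() only expects a lambda scope for deferred execution, that does not mean you have to write one yourself. Simpler toolboxes such as FComputeShaderUtils and FPixelShaderUtils cover most cases.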
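The deferred-execution idea described in this section can be illustrated with a minimal, self-contained sketch. MiniGraphBuilder and all of its members are hypothetical stand-ins invented for this illustration and are not the UE API: AddPass only records a lambda, AllocParameters keeps parameter structs alive until execution, and Execute runs the recorded lambdas later.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical miniature of a graph builder (not the UE API).
class MiniGraphBuilder
{
public:
    using FCommandLog = std::vector<std::string>;
    using FPassLambda = std::function<void(FCommandLog&)>;

    // Allocates a parameter struct owned by the builder, so that it outlives
    // the AddPass call and is still valid when the lambda finally runs.
    template <typename T>
    T* AllocParameters()
    {
        std::shared_ptr<T> Owner = std::make_shared<T>();
        T* Raw = Owner.get();
        ParameterStorage.push_back(std::move(Owner));
        return Raw;
    }

    // Records the pass. The lambda is NOT executed here.
    void AddPass(std::string Name, FPassLambda Lambda)
    {
        Passes.push_back({std::move(Name), std::move(Lambda)});
    }

    // Runs every recorded lambda in submission order, then releases the
    // passes and their parameters. Returns the recorded "GPU commands".
    FCommandLog Execute()
    {
        FCommandLog Log;
        for (FPass& Pass : Passes)
        {
            Pass.Lambda(Log);
        }
        Passes.clear();
        ParameterStorage.clear();
        return Log;
    }

    std::size_t NumPendingPasses() const { return Passes.size(); }

private:
    struct FPass
    {
        std::string Name;
        FPassLambda Lambda;
    };

    std::vector<FPass> Passes;
    std::vector<std::shared_ptr<void>> ParameterStorage;
};
```

In this sketch a lambda that captures a pointer returned by AllocParameters stays valid until Execute, whereas a pointer to a stack variable of the setup function would not. That is the reason the real framework asks for parameters to be allocated through the builder.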
11.3.2 FRDGBuilder::AddPass
FRDGBuilder::AddPass adds a pass, consisting of pass parameters and a lambda, to the RDG system. Its logic is as follows:
// Engine\Source\Runtime\RenderCore\Public\RenderGraphBuilder.inl
template <typename ParameterStructType, typename ExecuteLambdaType>
FRDGPassRef FRDGBuilder::AddPass(FRDGEventName&& Name, const ParameterStructType* ParameterStruct, ERDGPassFlags Flags, ExecuteLambdaType&& ExecuteLambda)
{
using LambdaPassType = TRDGLambdaPass<ParameterStructType, ExecuteLambdaType>;
(......)
// Allocate the RDG pass instance.
FRDGPass* Pass = Allocator.AllocObject<LambdaPassType>(
MoveTemp(Name),
ParameterStruct,
OverridePassFlags(Name.GetTCHAR(), Flags, LambdaPassType::kSupportsAsyncCompute),
MoveTemp(ExecuteLambda));
// Add it to the pass list.
Passes.Insert(Pass);
// Set up the pass.
SetupPass(Pass);
return Pass;
}
AddPass is fairly simple: it constructs an FRDGPass instance from the arguments, adds it to the list, and sets up the pass data. SetupPass works as follows:
void FRDGBuilder::SetupPass(FRDGPass* Pass)
{
// Fetch the pass data.
const FRDGParameterStruct PassParameters = Pass->GetParameters();
const FRDGPassHandle PassHandle = Pass->GetHandle();
const ERDGPassFlags PassFlags = Pass->GetFlags();
const ERHIPipeline PassPipeline = Pass->GetPipeline();
bool bPassUAVAccess = false;
// ---- Texture states ----
Pass->TextureStates.Reserve(PassParameters.GetTextureParameterCount() + (PassParameters.HasRenderTargets() ? (MaxSimultaneousRenderTargets + 1) : 0));
// Enumerate every texture access and update its state, data, and reference count.
EnumerateTextureAccess(PassParameters, PassFlags, [&](FRDGViewRef TextureView, FRDGTextureRef Texture, ERHIAccess Access, FRDGTextureSubresourceRange Range)
{
const FRDGViewHandle NoUAVBarrierHandle = GetHandleIfNoUAVBarrier(TextureView);
const EResourceTransitionFlags TransitionFlags = GetTextureViewTransitionFlags(TextureView, Texture);
auto& PassState = Pass->TextureStates.FindOrAdd(Texture);
PassState.ReferenceCount++;
const bool bWholeTextureRange = Range.IsWholeResource(Texture->GetSubresourceLayout());
bool bWholePassState = IsWholeResource(PassState.State);
// Convert the pass state to subresource dimensionality if we've found a subresource range.
if (!bWholeTextureRange && bWholePassState)
{
InitAsSubresources(PassState.State, Texture->Layout);
bWholePassState = false;
}
const auto AddSubresourceAccess = [&](FRDGSubresourceState& State)
{
State.Access = MakeValidAccess(State.Access | Access);
State.Flags |= TransitionFlags;
State.NoUAVBarrierFilter.AddHandle(NoUAVBarrierHandle);
State.Pipeline = PassPipeline;
};
if (bWholePassState)
{
AddSubresourceAccess(GetWholeResource(PassState.State));
}
else
{
EnumerateSubresourceRange(PassState.State, Texture->Layout, Range, AddSubresourceAccess);
}
bPassUAVAccess |= EnumHasAnyFlags(Access, ERHIAccess::UAVMask);
});
// ---- Buffer states ----
Pass->BufferStates.Reserve(PassParameters.GetBufferParameterCount());
// Enumerate every buffer access and update its state, data, and reference count.
EnumerateBufferAccess(PassParameters, PassFlags, [&](FRDGViewRef BufferView, FRDGBufferRef Buffer, ERHIAccess Access)
{
const FRDGViewHandle NoUAVBarrierHandle = GetHandleIfNoUAVBarrier(BufferView);
auto& PassState = Pass->BufferStates.FindOrAdd(Buffer);
PassState.ReferenceCount++;
PassState.State.Access = MakeValidAccess(PassState.State.Access | Access);
PassState.State.NoUAVBarrierFilter.AddHandle(NoUAVBarrierHandle);
PassState.State.Pipeline = PassPipeline;
bPassUAVAccess |= EnumHasAnyFlags(Access, ERHIAccess::UAVMask);
});
Pass->bUAVAccess = bPassUAVAccess;
const bool bEmptyParameters = !Pass->TextureStates.Num() && !Pass->BufferStates.Num();
PassesWithEmptyParameters.Add(bEmptyParameters);
// On the graphics pipe a pass can begin/end its own resources. Async compute is scheduled during compilation.
if (PassPipeline == ERHIPipeline::Graphics && !bEmptyParameters)
{
Pass->ResourcesToBegin.Add(Pass);
Pass->ResourcesToEnd.Add(Pass);
}
// Internal pass setup.
SetupPassInternal(Pass, PassHandle, PassPipeline);
}
Next, SetupPassInternal:
void FRDGBuilder::SetupPassInternal(FRDGPass* Pass, FRDGPassHandle PassHandle, ERHIPipeline PassPipeline)
{
// Point the fork/join and barrier pass handles at the pass itself.
Pass->GraphicsJoinPass = PassHandle;
Pass->GraphicsForkPass = PassHandle;
Pass->PrologueBarrierPass = PassHandle;
Pass->EpilogueBarrierPass = PassHandle;
(......)
// In immediate mode, for any pass other than the epilogue pass:
if (GRDGImmediateMode && Pass != EpiloguePass)
{
// Simply redirect the merge state to the pass state, since the graph is not compiled.
// Texture merge states.
for (auto& TexturePair : Pass->TextureStates)
{
auto& PassState = TexturePair.Value;
const uint32 SubresourceCount = PassState.State.Num();
PassState.MergeState.SetNum(SubresourceCount);
for (uint32 Index = 0; Index < SubresourceCount; ++Index)
{
if (PassState.State[Index].Access != ERHIAccess::Unknown)
{
PassState.MergeState[Index] = &PassState.State[Index];
PassState.MergeState[Index]->SetPass(PassHandle);
}
}
}
// Buffer merge states.
for (auto& BufferPair : Pass->BufferStates)
{
auto& PassState = BufferPair.Value;
PassState.MergeState = &PassState.State;
PassState.MergeState->SetPass(PassHandle);
}
FRDGPassHandle LastUntrackedPassHandle = GetProloguePassHandle();
// Collect pass resources.
CollectPassResources(PassHandle);
// Collect pass barriers.
CollectPassBarriers(PassHandle, LastUntrackedPassHandle);
// Execute the pass right away.
ExecutePass(Pass);
}
}
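To sum up, AddPass builds an RDG pass instance from its arguments and records the pass's texture and buffer states. The internal setup then initializes the pass's fork/join and barrier pass handles to the pass itself. In immediate mode, it also redirects the texture and buffer merge states to the pass states and executes the pass directly.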
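The per-resource bookkeeping that SetupPass performs (find or add the state entry, increment the reference count, OR in the access bits, and note whether any access is a UAV) can be sketched in isolation. The types below are simplified assumptions made for illustration: EMiniAccess, FAccessState, FAccessPass, and AddTextureAccess do not exist in UE, and textures are identified by name instead of by FRDGTextureRef.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Simplified access bits (stand-in for ERHIAccess).
enum EMiniAccess : std::uint32_t
{
    Access_Unknown  = 0,
    Access_SRVRead  = 1u << 0,
    Access_RTVWrite = 1u << 1,
    Access_UAVWrite = 1u << 2
};

// Per-pass state of one resource.
struct FAccessState
{
    std::uint32_t Access = Access_Unknown;
    std::uint32_t ReferenceCount = 0;
};

// Hypothetical pass holding only what this sketch needs.
struct FAccessPass
{
    std::map<std::string, FAccessState> TextureStates;
    bool bUAVAccess = false;
};

// Mirrors the shape of the texture callback in SetupPass:
// find-or-add the state, count the reference, accumulate access bits.
void AddTextureAccess(FAccessPass& Pass, const std::string& Texture, std::uint32_t Access)
{
    FAccessState& State = Pass.TextureStates[Texture]; // operator[] acts as FindOrAdd
    State.ReferenceCount++;
    State.Access |= Access;
    Pass.bUAVAccess = Pass.bUAVAccess || (Access & Access_UAVWrite) != 0;
}
```

A texture bound twice in the same pass therefore ends up with one state entry, a reference count of two, and the union of both accesses, which is the information the later compile step consumes.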
11.3.3 FRDGBuilder::Compile
The compilation logic of FRDGBuilder is very complex and performs many processing and optimization steps:
void FRDGBuilder::Compile()
{
uint32 RasterPassCount = 0;
uint32 AsyncComputePassCount = 0;
// Per-pass flag bit arrays.
FRDGPassBitArray PassesOnAsyncCompute(false, Passes.Num());
FRDGPassBitArray PassesOnRaster(false, Passes.Num());
FRDGPassBitArray PassesWithUntrackedOutputs(false, Passes.Num());
FRDGPassBitArray PassesToNeverCull(false, Passes.Num());
const FRDGPassHandle ProloguePassHandle = GetProloguePassHandle();
const FRDGPassHandle EpiloguePassHandle = GetEpiloguePassHandle();
const auto IsCrossPipeline = [&](FRDGPassHandle A, FRDGPassHandle B)
{
return PassesOnAsyncCompute[A] != PassesOnAsyncCompute[B];
};
const auto IsSortedBefore = [&](FRDGPassHandle A, FRDGPassHandle B)
{
return A < B;
};
const auto IsSortedAfter = [&](FRDGPassHandle A, FRDGPassHandle B)
{
return A > B;
};
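// Build producer/consumer dependencies across the graph, and build packed metadata bit arrays for better cache coherency when searching for passes that match specific criteria.
// Search roots are also identified here for culling. Passes with untracked RHI outputs (e.g. SHADER_PARAMETER_{BUFFER, TEXTURE}_UAV) cannot be culled, nor can any pass that writes to an external resource.
// Resource extractions extend lifetimes to the epilogue pass, which is always a root of the graph. The prologue and epilogue are helper passes and are therefore never culled.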
{
SCOPED_NAMED_EVENT(FRDGBuilder_Compile_Culling_Dependencies, FColor::Emerald);
// Adds a culling dependency.
const auto AddCullingDependency = [&](FRDGPassHandle& ProducerHandle, FRDGPassHandle PassHandle, ERHIAccess Access)
{
if (Access != ERHIAccess::Unknown)
{
if (ProducerHandle.IsValid())
{
// Add the pass dependency.
AddPassDependency(ProducerHandle, PassHandle);
}
// If the access is writable, record this pass as the new producer.
if (IsWritableAccess(Access))
{
ProducerHandle = PassHandle;
}
}
};
// Walk all passes and process each pass's texture and buffer states.
for (FRDGPassHandle PassHandle = Passes.Begin(); PassHandle != Passes.End(); ++PassHandle)
{
FRDGPass* Pass = Passes[PassHandle];
bool bUntrackedOutputs = Pass->GetParameters().HasExternalOutputs();
// Process all texture states of the pass.
for (auto& TexturePair : Pass->TextureStates)
{
FRDGTextureRef Texture = TexturePair.Key;
auto& LastProducers = Texture->LastProducers;
auto& PassState = TexturePair.Value.State;
const bool bWholePassState = IsWholeResource(PassState);
const bool bWholeProducers = IsWholeResource(LastProducers);
// The producer array must be at least as large as the pass state array.
if (bWholeProducers && !bWholePassState)
{
InitAsSubresources(LastProducers, Texture->Layout);
}
// Add culling dependencies.
for (uint32 Index = 0, Count = LastProducers.Num(); Index < Count; ++Index)
{
AddCullingDependency(LastProducers[Index], PassHandle, PassState[bWholePassState ? 0 : Index].Access);
}
bUntrackedOutputs |= Texture->bExternal;
}
// Process all buffer states of the pass.
for (auto& BufferPair : Pass->BufferStates)
{
FRDGBufferRef Buffer = BufferPair.Key;
AddCullingDependency(Buffer->LastProducer, PassHandle, BufferPair.Value.State.Access);
bUntrackedOutputs |= Buffer->bExternal;
}
// Record the remaining pass flags and counters.
const ERDGPassFlags PassFlags = Pass->GetFlags();
const bool bAsyncCompute = EnumHasAnyFlags(PassFlags, ERDGPassFlags::AsyncCompute);
const bool bRaster = EnumHasAnyFlags(PassFlags, ERDGPassFlags::Raster);
const bool bNeverCull = EnumHasAnyFlags(PassFlags, ERDGPassFlags::NeverCull);
PassesOnRaster[PassHandle] = bRaster;
PassesOnAsyncCompute[PassHandle] = bAsyncCompute;
PassesToNeverCull[PassHandle] = bNeverCull;
PassesWithUntrackedOutputs[PassHandle] = bUntrackedOutputs;
AsyncComputePassCount += bAsyncCompute ? 1 : 0;
RasterPassCount += bRaster ? 1 : 0;
}
// The prologue/epilogue are marked as having untracked outputs; they handle external resource import/export respectively.
PassesWithUntrackedOutputs[ProloguePassHandle] = true;
PassesWithUntrackedOutputs[EpiloguePassHandle] = true;
// Culling dependencies for extracted textures.
for (const auto& Query : ExtractedTextures)
{
FRDGTextureRef Texture = Query.Key;
for (FRDGPassHandle& ProducerHandle : Texture->LastProducers)
{
AddCullingDependency(ProducerHandle, EpiloguePassHandle, Texture->AccessFinal);
}
Texture->ReferenceCount++;
}
// Culling dependencies for extracted buffers.
for (const auto& Query : ExtractedBuffers)
{
FRDGBufferRef Buffer = Query.Key;
AddCullingDependency(Buffer->LastProducer, EpiloguePassHandle, Buffer->AccessFinal);
Buffer->ReferenceCount++;
}
}
// -------- Pass culling --------
if (GRDGCullPasses)
{
TArray<FRDGPassHandle, TInlineAllocator<32, SceneRenderingAllocator>> PassStack;
// All passes start out culled.
PassesToCull.Init(true, Passes.Num());
// Collect the root passes: those with untracked outputs or flagged as never-cull.
for (FRDGPassHandle PassHandle = Passes.Begin(); PassHandle != Passes.End(); ++PassHandle)
{
if (PassesWithUntrackedOutputs[PassHandle] || PassesToNeverCull[PassHandle])
{
PassStack.Add(PassHandle);
}
}
// Non-recursive, stack-based depth-first traversal that marks every pass reachable from a root as not culled.
while (PassStack.Num())
{
const FRDGPassHandle PassHandle = PassStack.Pop();
if (PassesToCull[PassHandle])
{
PassesToCull[PassHandle] = false;
PassStack.Append(Passes[PassHandle]->Producers);
#if STATS
--GRDGStatPassCullCount;
#endif
}
}
}
else // Culling disabled: initialize all passes as not culled.
{
PassesToCull.Init(false, Passes.Num());
}
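// -------- Pass barriers --------
// Walk the culled graph and compile barriers for each subresource. Some transitions are redundant, e.g. read-to-read.
// RDG uses a conservative heuristic for merging; choosing not to merge does not necessarily mean a transition is performed.
// These are two distinct steps. Merged states track the first and last pass interval. Pass references are also accumulated onto each resource.
// This must happen after culling, since culled passes cannot contribute references.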
{
SCOPED_NAMED_EVENT(FRDGBuilder_Compile_Barriers, FColor::Emerald);
for (FRDGPassHandle PassHandle = Passes.Begin(); PassHandle != Passes.End(); ++PassHandle)
{
// Skip culled passes and passes without parameters.
if (PassesToCull[PassHandle] || PassesWithEmptyParameters[PassHandle])
{
continue;
}
// Merges subresource states.
const auto MergeSubresourceStates = [&](ERDGParentResourceType ResourceType, FRDGSubresourceState*& PassMergeState, FRDGSubresourceState*& ResourceMergeState, const FRDGSubresourceState& PassState)
{
// Skip merging for resources in the Unknown state.
if (PassState.Access == ERHIAccess::Unknown)
{
return;
}
if (!ResourceMergeState || !FRDGSubresourceState::IsMergeAllowed(ResourceType, *ResourceMergeState, PassState))
{
// A cross-pipeline, non-mergeable state change requires a new pass dependency for fencing.
if (ResourceMergeState && ResourceMergeState->Pipeline != PassState.Pipeline)
{
AddPassDependency(ResourceMergeState->LastPass, PassHandle);
}
// Allocate a new pending merge state and assign it to the pass state.
ResourceMergeState = AllocSubresource(PassState);
ResourceMergeState->SetPass(PassHandle);
}
else
{
// Merge the pass state into the merged state.
ResourceMergeState->Access |= PassState.Access;
ResourceMergeState->LastPass = PassHandle;
}
PassMergeState = ResourceMergeState;
};
const bool bAsyncComputePass = PassesOnAsyncCompute[PassHandle];
// The pass currently being processed.
FRDGPass* Pass = Passes[PassHandle];
// Texture states of the current pass.
for (auto& TexturePair : Pass->TextureStates)
{
FRDGTextureRef Texture = TexturePair.Key;
auto& PassState = TexturePair.Value;
// Accumulate reference counts.
Texture->ReferenceCount += PassState.ReferenceCount;
Texture->bUsedByAsyncComputePass |= bAsyncComputePass;
const bool bWholePassState = IsWholeResource(PassState.State);
const bool bWholeMergeState = IsWholeResource(Texture->MergeState);
// For simplicity, the merge and pass state dimensionality should match.
if (bWholeMergeState && !bWholePassState)
{
InitAsSubresources(Texture->MergeState, Texture->Layout);
}
else if (!bWholeMergeState && bWholePassState)
{
InitAsWholeResource(Texture->MergeState);
}
const uint32 SubresourceCount = PassState.State.Num();
PassState.MergeState.SetNum(SubresourceCount);
// Merge the subresource states.
for (uint32 Index = 0; Index < SubresourceCount; ++Index)
{
MergeSubresourceStates(ERDGParentResourceType::Texture, PassState.MergeState[Index], Texture->MergeState[Index], PassState.State[Index]);
}
}
// Buffer states of the current pass.
for (auto& BufferPair : Pass->BufferStates)
{
FRDGBufferRef Buffer = BufferPair.Key;
auto& PassState = BufferPair.Value;
Buffer->ReferenceCount += PassState.ReferenceCount;
Buffer->bUsedByAsyncComputePass |= bAsyncComputePass;
MergeSubresourceStates(ERDGParentResourceType::Buffer, PassState.MergeState, Buffer->MergeState, PassState.State);
}
}
}
// Async compute passes.
if (AsyncComputePassCount > 0)
{
SCOPED_NAMED_EVENT(FRDGBuilder_Compile_AsyncCompute, FColor::Emerald);
FRDGPassBitArray PassesWithCrossPipelineProducer(false, Passes.Num());
FRDGPassBitArray PassesWithCrossPipelineConsumer(false, Passes.Num());
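// Walk the active passes to find, for each pass, the latest cross-pipeline producer and the earliest cross-pipeline consumer. This narrows the search space later when building async compute overlap regions.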
for (FRDGPassHandle PassHandle = Passes.Begin(); PassHandle != Passes.End(); ++PassHandle)
{
if (PassesToCull[PassHandle] || PassesWithEmptyParameters[PassHandle])
{
continue;
}
FRDGPass* Pass = Passes[PassHandle];
// Walk the producers and record producer/consumer cross-pipeline links.
for (FRDGPassHandle ProducerHandle : Pass->GetProducers())
{
const FRDGPassHandle ConsumerHandle = PassHandle;
if (!IsCrossPipeline(ProducerHandle, ConsumerHandle))
{
continue;
}
FRDGPass* Consumer = Pass;
FRDGPass* Producer = Passes[ProducerHandle];
// Find the earliest consumer on the other pipeline for the producer.
if (Producer->CrossPipelineConsumer.IsNull() || IsSortedBefore(ConsumerHandle, Producer->CrossPipelineConsumer))
{
Producer->CrossPipelineConsumer = PassHandle;
PassesWithCrossPipelineConsumer[ProducerHandle] = true;
}
// Find the latest producer on the other pipeline for the consumer.
if (Consumer->CrossPipelineProducer.IsNull() || IsSortedAfter(ProducerHandle, Consumer->CrossPipelineProducer))
{
Consumer->CrossPipelineProducer = ProducerHandle;
PassesWithCrossPipelineProducer[ConsumerHandle] = true;
}
}
}
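// Establish fork/join overlap regions for async compute, used for fencing and for resource allocation/deallocation. Async compute passes cannot allocate or release their resource references until the fork/join completes, since the two pipelines run in parallel. All resource lifetimes on async compute are therefore extended to cover the full async region.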
const auto IsCrossPipelineProducer = [&](FRDGPassHandle A)
{
return PassesWithCrossPipelineConsumer[A];
};
const auto IsCrossPipelineConsumer = [&](FRDGPassHandle A)
{
return PassesWithCrossPipelineProducer[A];
};
// Finds the cross-pipeline producer.
const auto FindCrossPipelineProducer = [&](FRDGPassHandle PassHandle)
{
FRDGPassHandle LatestProducerHandle = ProloguePassHandle;
FRDGPassHandle ConsumerHandle = PassHandle;
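// We want the latest producer on the other pipeline in order to establish a fork point. Since N resources can be consumed from N producer passes, only the last one matters.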
while (ConsumerHandle != Passes.Begin())
{
if (!PassesToCull[ConsumerHandle] && !IsCrossPipeline(ConsumerHandle, PassHandle) && IsCrossPipelineConsumer(ConsumerHandle))
{
const FRDGPass* Consumer = Passes[ConsumerHandle];
if (IsSortedAfter(Consumer->CrossPipelineProducer, LatestProducerHandle))
{
LatestProducerHandle = Consumer->CrossPipelineProducer;
}
}
--ConsumerHandle;
}
return LatestProducerHandle;
};
// Finds the cross-pipeline consumer.
const auto FindCrossPipelineConsumer = [&](FRDGPassHandle PassHandle)
{
check(PassHandle != EpiloguePassHandle);
FRDGPassHandle EarliestConsumerHandle = EpiloguePassHandle;
FRDGPassHandle ProducerHandle = PassHandle;
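// We want the earliest consumer on the other pipeline, since that establishes the join point between the pipes. Since a pass can produce for N consumers on the other pipeline, only the first one to execute matters.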
while (ProducerHandle != Passes.End())
{
if (!PassesToCull[ProducerHandle] && !IsCrossPipeline(ProducerHandle, PassHandle) && IsCrossPipelineProducer(ProducerHandle))
{
const FRDGPass* Producer = Passes[ProducerHandle];
if (IsSortedBefore(Producer->CrossPipelineConsumer, EarliestConsumerHandle))
{
EarliestConsumerHandle = Producer->CrossPipelineConsumer;
}
}
++ProducerHandle;
}
return EarliestConsumerHandle;
};
// Inserts a fork from a graphics pass to an async compute pass.
const auto InsertGraphicsToAsyncComputeFork = [&](FRDGPass* GraphicsPass, FRDGPass* AsyncComputePass)
{
FRDGBarrierBatchBegin& EpilogueBarriersToBeginForAsyncCompute = GraphicsPass->GetEpilogueBarriersToBeginForAsyncCompute(Allocator);
GraphicsPass->bGraphicsFork = 1;
EpilogueBarriersToBeginForAsyncCompute.SetUseCrossPipelineFence();
AsyncComputePass->bAsyncComputeBegin = 1;
AsyncComputePass->GetPrologueBarriersToEnd(Allocator).AddDependency(&EpilogueBarriersToBeginForAsyncCompute);
};
// Inserts a join from an async compute pass back to a graphics pass.
const auto InsertAsyncToGraphicsComputeJoin = [&](FRDGPass* AsyncComputePass, FRDGPass* GraphicsPass)
{
FRDGBarrierBatchBegin& EpilogueBarriersToBeginForGraphics = AsyncComputePass->GetEpilogueBarriersToBeginForGraphics(Allocator);
AsyncComputePass->bAsyncComputeEnd = 1;
EpilogueBarriersToBeginForGraphics.SetUseCrossPipelineFence();
GraphicsPass->bGraphicsJoin = 1;
GraphicsPass->GetPrologueBarriersToEnd(Allocator).AddDependency(&EpilogueBarriersToBeginForGraphics);
};
FRDGPass* PrevGraphicsForkPass = nullptr;
FRDGPass* PrevGraphicsJoinPass = nullptr;
FRDGPass* PrevAsyncComputePass = nullptr;
// Walk all async compute passes, extend resource lifetimes, and set up the fork and join points between graphics and async compute.
for (FRDGPassHandle PassHandle = Passes.Begin(); PassHandle != Passes.End(); ++PassHandle)
{
if (!PassesOnAsyncCompute[PassHandle] || PassesToCull[PassHandle])
{
continue;
}
FRDGPass* AsyncComputePass = Passes[PassHandle];
// Find the fork pass and the join pass.
const FRDGPassHandle GraphicsForkPassHandle = FindCrossPipelineProducer(PassHandle);
const FRDGPassHandle GraphicsJoinPassHandle = FindCrossPipelineConsumer(PassHandle);
AsyncComputePass->GraphicsForkPass = GraphicsForkPassHandle;
AsyncComputePass->GraphicsJoinPass = GraphicsJoinPassHandle;
FRDGPass* GraphicsForkPass = Passes[GraphicsForkPassHandle];
FRDGPass* GraphicsJoinPass = Passes[GraphicsJoinPassHandle];
// Extend the lifetime of resources used on async compute to the fork/join graphics passes.
GraphicsForkPass->ResourcesToBegin.Add(AsyncComputePass);
GraphicsJoinPass->ResourcesToEnd.Add(AsyncComputePass);
// Fork from the graphics pass to the async compute pass.
if (PrevGraphicsForkPass != GraphicsForkPass)
{
InsertGraphicsToAsyncComputeFork(GraphicsForkPass, AsyncComputePass);
}
// Join the previous async compute pass into the previous graphics join pass.
if (PrevGraphicsJoinPass != GraphicsJoinPass && PrevAsyncComputePass)
{
InsertAsyncToGraphicsComputeJoin(PrevAsyncComputePass, PrevGraphicsJoinPass);
}
PrevAsyncComputePass = AsyncComputePass;
PrevGraphicsForkPass = GraphicsForkPass;
PrevGraphicsJoinPass = GraphicsJoinPass;
}
// The last async compute pass in the graph must be joined back to the epilogue pass manually.
if (PrevAsyncComputePass)
{
InsertAsyncToGraphicsComputeJoin(PrevAsyncComputePass, EpiloguePass);
PrevAsyncComputePass->bAsyncComputeEndExecute = 1;
}
}
// Walk all passes on the graphics pipe and merge raster passes that share the same render targets into a single RHI render pass.
if (GRDGMergeRenderPasses && RasterPassCount > 0)
{
SCOPED_NAMED_EVENT(FRDGBuilder_Compile_RenderPassMerge, FColor::Emerald);
TArray<FRDGPassHandle, SceneRenderingAllocator> PassesToMerge;
FRDGPass* PrevPass = nullptr;
const FRenderTargetBindingSlots* PrevRenderTargets = nullptr;
const auto CommitMerge = [&]
{
if (PassesToMerge.Num())
{
const FRDGPassHandle FirstPassHandle = PassesToMerge[0];
const FRDGPassHandle LastPassHandle = PassesToMerge.Last();
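// Given an interval of passes to merge into a single render pass, [B, X, X, X, X, E], the begin pass (B) calls BeginRenderPass and the end pass (E) calls EndRenderPass.
// In addition, the begin pass handles all prologue barriers for the whole merged interval and the end pass handles all epilogue barriers. This avoids resource transitions inside the render pass and batches transitions more efficiently.
// It is assumed that dependencies between passes in the merge set were already filtered out during traversal.
// (B) is the first pass in the merge sequence.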
{
FRDGPass* Pass = Passes[FirstPassHandle];
Pass->bSkipRenderPassEnd = 1;
Pass->EpilogueBarrierPass = LastPassHandle;
}
// (X) are the intermediate passes.
for (int32 PassIndex = 1, PassCount = PassesToMerge.Num() - 1; PassIndex < PassCount; ++PassIndex)
{
const FRDGPassHandle PassHandle = PassesToMerge[PassIndex];
FRDGPass* Pass = Passes[PassHandle];
Pass->bSkipRenderPassBegin = 1;
Pass->bSkipRenderPassEnd = 1;
Pass->PrologueBarrierPass = FirstPassHandle;
Pass->EpilogueBarrierPass = LastPassHandle;
}
// (E) is the last pass in the merge sequence.
{
FRDGPass* Pass = Passes[LastPassHandle];
Pass->bSkipRenderPassBegin = 1;
Pass->PrologueBarrierPass = FirstPassHandle;
}
#if STATS
GRDGStatRenderPassMergeCount += PassesToMerge.Num();
#endif
}
PassesToMerge.Reset();
PrevPass = nullptr;
PrevRenderTargets = nullptr;
};
// Walk the raster passes and merge those with identical render targets into one render pass.
for (FRDGPassHandle PassHandle = Passes.Begin(); PassHandle != Passes.End(); ++PassHandle)
{
// Skip culled passes.
if (PassesToCull[PassHandle])
{
continue;
}
// Only raster passes are considered.
if (PassesOnRaster[PassHandle])
{
FRDGPass* NextPass = Passes[PassHandle];
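// Passes where the user controls the render pass cannot be merged with others, and raster passes with UAV access cannot be merged either because of potential interdependencies.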
if (EnumHasAnyFlags(NextPass->GetFlags(), ERDGPassFlags::SkipRenderPass) || NextPass->bUAVAccess)
{
CommitMerge();
continue;
}
// A graphics fork pass cannot be merged with preceding raster passes.
if (NextPass->bGraphicsFork)
{
CommitMerge();
}
const FRenderTargetBindingSlots& RenderTargets = NextPass->GetParameters().GetRenderTargets();
if (PrevPass)
{
// Compare render targets to decide whether the passes can be merged.
if (PrevRenderTargets->CanMergeBefore(RenderTargets)
#if WITH_MGPU
&& PrevPass->GPUMask == NextPass->GPUMask
#endif
)
{
// If so, add the pass to the PassesToMerge list.
if (!PassesToMerge.Num())
{
PassesToMerge.Add(PrevPass->GetHandle());
}
PassesToMerge.Add(PassHandle);
}
else
{
CommitMerge();
}
}
PrevPass = NextPass;
PrevRenderTargets = &RenderTargets;
}
else if (!PassesOnAsyncCompute[PassHandle])
{
// A non-raster pass on the graphics pipe invalidates the render target merge.
CommitMerge();
}
}
CommitMerge();
}
}
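As the Compile code above shows, RDG compilation is very involved and has many steps. In order, it builds producer/consumer dependencies, determines culling and other per-pass flags, adjusts resource lifetimes, culls passes, processes resource transitions and barriers, resolves dependencies and references for async compute passes, finds and establishes fork and join pass nodes, and merges all raster passes that share the same render targets.
The Compile code also relies on several important helper interfaces, analyzed one by one below: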
// Adds a pass dependency: inserts the producer (ProducerHandle) into the producer list (Producers) of the consumer (ConsumerHandle).
void FRDGBuilder::AddPassDependency(FRDGPassHandle ProducerHandle, FRDGPassHandle ConsumerHandle)
{
FRDGPass* Consumer = Passes[ConsumerHandle];
auto& Producers = Consumer->Producers;
if (Producers.Find(ProducerHandle) == INDEX_NONE)
{
Producers.Add(ProducerHandle);
}
};
// Initializes the array as per-subresource states.
template <typename ElementType, typename AllocatorType>
inline void InitAsSubresources(TRDGTextureSubresourceArray<ElementType, AllocatorType>& SubresourceArray, const FRDGTextureSubresourceLayout& Layout, const ElementType& Element = {})
{
const uint32 SubresourceCount = Layout.GetSubresourceCount();
SubresourceArray.SetNum(SubresourceCount, false);
for (uint32 SubresourceIndex = 0; SubresourceIndex < SubresourceCount; ++SubresourceIndex)
{
SubresourceArray[SubresourceIndex] = Element;
}
}
// Initializes the array as a single whole-resource state.
template <typename ElementType, typename AllocatorType>
FORCEINLINE void InitAsWholeResource(TRDGTextureSubresourceArray<ElementType, AllocatorType>& SubresourceArray, const ElementType& Element = {})
{
SubresourceArray.SetNum(1, false);
SubresourceArray[0] = Element;
}
// Allocates a subresource state.
FRDGSubresourceState* FRDGBuilder::AllocSubresource(const FRDGSubresourceState& Other)
{
FRDGSubresourceState* State = Allocator.AllocPOD<FRDGSubresourceState>();
*State = Other;
return State;
}
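The culling stage of Compile (record the last producer of each resource, link consumers to producers, then run a stack-based depth-first search from the root passes) can be reduced to the following standalone sketch. It is a simplified assumption of the real logic: FCullPass and ComputeCulledPasses are hypothetical names, resources are plain indices, subresources are ignored, and bIsRoot stands in for NeverCull, untracked outputs, external resources, and queued extractions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical pass description used only by this sketch.
struct FCullPass
{
    std::vector<std::size_t> Reads;      // resource indices read by the pass
    std::vector<std::size_t> Writes;     // resource indices written by the pass
    bool bIsRoot = false;                // must never be culled
    std::vector<std::size_t> Producers;  // filled in by ComputeCulledPasses
};

// Returns one flag per pass: true means the pass is culled.
std::vector<bool> ComputeCulledPasses(std::vector<FCullPass>& Passes, std::size_t ResourceCount)
{
    const std::size_t None = static_cast<std::size_t>(-1);
    std::vector<std::size_t> LastProducer(ResourceCount, None);

    // Step 1: build producer dependencies in submission order.
    for (std::size_t PassIndex = 0; PassIndex < Passes.size(); ++PassIndex)
    {
        FCullPass& Pass = Passes[PassIndex];
        const auto AddDependency = [&](std::size_t Resource)
        {
            const std::size_t Producer = LastProducer[Resource];
            if (Producer != None && Producer != PassIndex)
            {
                Pass.Producers.push_back(Producer);
            }
        };
        for (std::size_t Resource : Pass.Reads)
        {
            AddDependency(Resource);
        }
        for (std::size_t Resource : Pass.Writes)
        {
            AddDependency(Resource);
            LastProducer[Resource] = PassIndex; // a writer becomes the new producer
        }
    }

    // Step 2: everything starts culled; roots seed the stack.
    std::vector<bool> Culled(Passes.size(), true);
    std::vector<std::size_t> Stack;
    for (std::size_t PassIndex = 0; PassIndex < Passes.size(); ++PassIndex)
    {
        if (Passes[PassIndex].bIsRoot)
        {
            Stack.push_back(PassIndex);
        }
    }

    // Step 3: non-recursive depth-first search over producer links.
    while (!Stack.empty())
    {
        const std::size_t PassIndex = Stack.back();
        Stack.pop_back();
        if (Culled[PassIndex])
        {
            Culled[PassIndex] = false;
            const std::vector<std::size_t>& Producers = Passes[PassIndex].Producers;
            Stack.insert(Stack.end(), Producers.begin(), Producers.end());
        }
    }
    return Culled;
}
```

With four passes where pass 0 writes resource 0, pass 1 reads 0 and writes 1, pass 2 writes an otherwise unused resource 2, and pass 3 reads 1 and is a root, only pass 2 stays culled because nothing reachable from the root depends on it.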
11.3.4 FRDGBuilder::Execute
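With pass collection (AddPass) and graph compilation covered, we can look at executing the render graph. FRDGBuilder::Execute handles this, and it calls Compile itself before running the passes: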
void FRDGBuilder::Execute()
{
SCOPED_NAMED_EVENT(FRDGBuilder_Execute, FColor::Emerald);
// Create the epilogue pass at the end of the graph, before compiling.
EpiloguePass = Passes.Allocate<FRDGSentinelPass>(Allocator, RDG_EVENT_NAME("Graph Epilogue"));
SetupEmptyPass(EpiloguePass);
const FRDGPassHandle ProloguePassHandle = GetProloguePassHandle();
const FRDGPassHandle EpiloguePassHandle = GetEpiloguePassHandle();
FRDGPassHandle LastUntrackedPassHandle = ProloguePassHandle;
// Non-immediate mode.
if (!GRDGImmediateMode)
{
// Compile before executing; see section 11.3.3.
Compile();
{
SCOPE_CYCLE_COUNTER(STAT_RDG_CollectResourcesTime);
// Collect pass resources.
for (FRDGPassHandle PassHandle = Passes.Begin(); PassHandle != Passes.End(); ++PassHandle)
{
if (!PassesToCull[PassHandle])
{
CollectPassResources(PassHandle);
}
}
// Release the reference held for each extracted texture.
for (const auto& Query : ExtractedTextures)
{
EndResourceRHI(EpiloguePassHandle, Query.Key, 1);
}
// Release the reference held for each extracted buffer.
for (const auto& Query : ExtractedBuffers)
{
EndResourceRHI(EpiloguePassHandle, Query.Key, 1);
}
}
// Collect pass barriers.
{
SCOPE_CYCLE_COUNTER(STAT_RDG_CollectBarriersTime);
for (FRDGPassHandle PassHandle = Passes.Begin(); PassHandle != Passes.End(); ++PassHandle)
{
if (!PassesToCull[PassHandle])
{
CollectPassBarriers(PassHandle, LastUntrackedPassHandle);
}
}
}
}
// Add an epilogue transition for every texture.
for (FRDGTextureHandle TextureHandle = Textures.Begin(); TextureHandle != Textures.End(); ++TextureHandle)
{
FRDGTextureRef Texture = Textures[TextureHandle];
if (Texture->GetRHIUnchecked())
{
AddEpilogueTransition(Texture, LastUntrackedPassHandle);
Texture->Finalize();
}
}
// Add an epilogue transition for every buffer.
for (FRDGBufferHandle BufferHandle = Buffers.Begin(); BufferHandle != Buffers.End(); ++BufferHandle)
{
FRDGBufferRef Buffer = Buffers[BufferHandle];
if (Buffer->GetRHIUnchecked())
{
AddEpilogueTransition(Buffer, LastUntrackedPassHandle);
Buffer->Finalize();
}
}
// Execute the passes.
if (!GRDGImmediateMode)
{
QUICK_SCOPE_CYCLE_COUNTER(STAT_FRDGBuilder_Execute_Passes);
for (FRDGPassHandle PassHandle = Passes.Begin(); PassHandle != Passes.End(); ++PassHandle)
{
// Execute passes that were not culled.
if (!PassesToCull[PassHandle])
{
ExecutePass(Passes[PassHandle]);
}
}
}
else
{
ExecutePass(EpiloguePass);
}
RHICmdList.SetGlobalUniformBuffers({});
#if WITH_MGPU
(......)
#endif
// Perform texture extractions.
for (const auto& Query : ExtractedTextures)
{
*Query.Value = Query.Key->PooledRenderTarget;
}
// Perform buffer extractions.
for (const auto& Query : ExtractedBuffers)
{
*Query.Value = Query.Key->PooledBuffer;
}
// Clean up.
Clear();
}
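The extraction step at the end of Execute (output pointers registered earlier are only filled once the graph has run) can be sketched on its own. Everything here is a hypothetical simplification: FPooledTextureStub, FGraphTextureStub, and FMiniExtractionGraph are invented names, and std::shared_ptr stands in for TRefCountPtr.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Stand-in for a pooled GPU resource.
struct FPooledTextureStub
{
    int Id = 0;
};

// Stand-in for a graph texture: only a descriptor until execution.
struct FGraphTextureStub
{
    std::shared_ptr<FPooledTextureStub> Pooled;
};

class FMiniExtractionGraph
{
public:
    // Records a texture; no pooled resource exists yet.
    FGraphTextureStub* CreateTexture()
    {
        Textures.push_back(std::unique_ptr<FGraphTextureStub>(new FGraphTextureStub()));
        return Textures.back().get();
    }

    // Queues an extraction; the output pointer is filled during Execute.
    void QueueTextureExtraction(FGraphTextureStub* Texture, std::shared_ptr<FPooledTextureStub>* OutPooled)
    {
        Extractions.push_back(std::make_pair(Texture, OutPooled));
    }

    void Execute()
    {
        // Allocate the underlying resources (the real graph does this per pass).
        int NextId = 1;
        for (std::unique_ptr<FGraphTextureStub>& Texture : Textures)
        {
            Texture->Pooled = std::make_shared<FPooledTextureStub>();
            Texture->Pooled->Id = NextId++;
        }
        // Fill the queued output pointers, which keeps the resources alive
        // after the graph itself is cleared.
        for (const auto& Query : Extractions)
        {
            *Query.second = Query.first->Pooled;
        }
        Extractions.clear();
        Textures.clear();
    }

private:
    std::vector<std::unique_ptr<FGraphTextureStub>> Textures;
    std::vector<std::pair<FGraphTextureStub*, std::shared_ptr<FPooledTextureStub>*>> Extractions;
};
```

The caller's pointer stays empty until Execute returns, so code that needs a resource such as a history texture in the next frame should read the extracted pointer only after execution.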
Execution relies on ExecutePass, which runs a single pass. Its logic is as follows:
void FRDGBuilder::ExecutePass(FRDGPass* Pass)
{
QUICK_SCOPE_CYCLE_COUNTER(STAT_FRDGBuilder_ExecutePass);
SCOPED_GPU_MASK(RHICmdList, Pass->GPUMask);
IF_RDG_CPU_SCOPES(CPUScopeStacks.BeginExecutePass(Pass));
// GPU scopes.
#if RDG_GPU_SCOPES
const bool bUsePassEventScope = Pass != EpiloguePass && Pass != ProloguePass;
if (bUsePassEventScope)
{
GPUScopeStacks.BeginExecutePass(Pass);
}
#endif
#if WITH_MGPU
if (!bWaitedForTemporalEffect && NameForTemporalEffect != NAME_None)
{
RHICmdList.WaitForTemporalEffect(NameForTemporalEffect);
bWaitedForTemporalEffect = true;
}
#endif
// A pass executes in this order: 1. prologue -> 2. pass body -> 3. epilogue.
// The whole sequence runs on the command list of the pass's pipeline.
FRHIComputeCommandList& RHICmdListPass = (Pass->GetPipeline() == ERHIPipeline::AsyncCompute)
? static_cast<FRHIComputeCommandList&>(RHICmdListAsyncCompute)
: RHICmdList;
// 1. Execute the prologue.
ExecutePassPrologue(RHICmdListPass, Pass);
// 2. Execute the pass body.
Pass->Execute(RHICmdListPass);
// 3. Execute the epilogue.
ExecutePassEpilogue(RHICmdListPass, Pass);
#if RDG_GPU_SCOPES
if (bUsePassEventScope)
{
GPUScopeStacks.EndExecutePass(Pass);
}
#endif
// When an async compute region ends, dispatch the async compute command list immediately.
if (Pass->bAsyncComputeEnd)
{
FRHIAsyncComputeCommandListImmediate::ImmediateDispatch(RHICmdListAsyncCompute);
}
// With the debug GPU flush enabled and async compute disabled, submit commands, flush to the GPU, and wait until the GPU is idle.
if (GRDGDebugFlushGPU && !GRDGAsyncCompute)
{
RHICmdList.SubmitCommandsAndFlushGPU();
RHICmdList.BlockUntilGPUIdle();
}
}
Executing a pass has three main steps: 1. prologue, 2. pass body, 3. epilogue. Their logic is as follows:
// 1. prologue
void FRDGBuilder::ExecutePassPrologue(FRHIComputeCommandList& RHICmdListPass, FRDGPass* Pass)
{
// Submit the prologue begin barriers.
if (Pass->PrologueBarriersToBegin)
{
Pass->PrologueBarriersToBegin->Submit(RHICmdListPass);
}
// Submit the prologue end barriers.
if (Pass->PrologueBarriersToEnd)
{
Pass->PrologueBarriersToEnd->Submit(RHICmdListPass);
}
// Uniform buffers are initialized on first use, since access checks allow GetRHI to be called on RDG resources.
Pass->GetParameters().EnumerateUniformBuffers([&](FRDGUniformBufferRef UniformBuffer)
{
BeginResourceRHI(UniformBuffer);
});
// Set the async compute budget.
if (Pass->GetPipeline() == ERHIPipeline::AsyncCompute)
{
RHICmdListPass.SetAsyncComputeBudget(Pass->AsyncComputeBudget);
}
const ERDGPassFlags PassFlags = Pass->GetFlags();
if (EnumHasAnyFlags(PassFlags, ERDGPassFlags::Raster))
{
if (!EnumHasAnyFlags(PassFlags, ERDGPassFlags::SkipRenderPass) && !Pass->SkipRenderPassBegin())
{
// 呼叫命令佇列的BeginRenderPass介面.
static_cast<FRHICommandList&>(RHICmdListPass).BeginRenderPass(Pass->GetParameters().GetRenderPassInfo(), Pass->GetName());
}
}
}
// 2. pass主體
void FRDGPass::Execute(FRHIComputeCommandList& RHICmdList)
{
QUICK_SCOPE_CYCLE_COUNTER(STAT_FRDGPass_Execute);
// 設定統一緩衝區.
RHICmdList.SetGlobalUniformBuffers(ParameterStruct.GetGlobalUniformBuffers());
// 執行Pass的實現.
ExecuteImpl(RHICmdList);
}
void TRDGLambdaPass::ExecuteImpl(FRHIComputeCommandList& RHICmdList) override
{
// 執行Lambda.
ExecuteLambda(static_cast<TRHICommandList&>(RHICmdList));
}
// 3. epilogue
void FRDGBuilder::ExecutePassEpilogue(FRHIComputeCommandList& RHICmdListPass, FRDGPass* Pass)
{
QUICK_SCOPE_CYCLE_COUNTER(STAT_FRDGBuilder_ExecutePassEpilogue);
const ERDGPassFlags PassFlags = Pass->GetFlags();
// 呼叫命令佇列的EndRenderPass.
if (EnumHasAnyFlags(PassFlags, ERDGPassFlags::Raster) && !EnumHasAnyFlags(PassFlags, ERDGPassFlags::SkipRenderPass) && !Pass->SkipRenderPassEnd())
{
static_cast<FRHICommandList&>(RHICmdListPass).EndRenderPass();
}
// 放棄資源轉換.
for (FRHITexture* Texture : Pass->TexturesToDiscard)
{
RHIDiscardTransientResource(Texture);
}
// 獲取(Acquire)轉換資源.
for (FRHITexture* Texture : Pass->TexturesToAcquire)
{
RHIAcquireTransientResource(Texture);
}
const FRDGParameterStruct PassParameters = Pass->GetParameters();
// 提交用於圖形管線的尾聲屏障.
if (Pass->EpilogueBarriersToBeginForGraphics)
{
Pass->EpilogueBarriersToBeginForGraphics->Submit(RHICmdListPass);
}
// 提交用於非同步計算的尾聲屏障.
if (Pass->EpilogueBarriersToBeginForAsyncCompute)
{
Pass->EpilogueBarriersToBeginForAsyncCompute->Submit(RHICmdListPass);
}
}
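As shown above, execution first compiles all Passes and then runs each Pass's prologue, body, and epilogue in turn. In effect, the command list's BeginRenderPass, the rendering code, and EndRenderPass are distributed across these three steps. The Pass body itself is simple: it invokes the Pass's Lambda instance and passes in the command list being used.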
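The three-step order can be modelled outside the engine. The following is a minimal standalone sketch, not UE code: FToyCommandList, FToyPass, ExecuteToyPass and RunToyPassDemo are hypothetical names, and the command list only records strings. It assumes that a raster pass without the SkipRenderPass flag owns its BeginRenderPass/EndRenderPass pair, as in the excerpt above.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for an RHI command list: it only records command names.
struct FToyCommandList
{
    std::vector<std::string> Commands;
};

// Hypothetical stand-in for an RDG pass.
struct FToyPass
{
    bool bRaster = true;          // plays the role of ERDGPassFlags::Raster
    bool bSkipRenderPass = false; // plays the role of ERDGPassFlags::SkipRenderPass
    std::function<void(FToyCommandList&)> Lambda;
};

// Prologue -> body -> epilogue, in the same order as FRDGBuilder::ExecutePass.
inline void ExecuteToyPass(FToyCommandList& CmdList, const FToyPass& Pass)
{
    const bool bOwnsRenderPass = Pass.bRaster && !Pass.bSkipRenderPass;

    // 1. Prologue: barriers first, then BeginRenderPass for raster passes.
    CmdList.Commands.push_back("PrologueBarriers");
    if (bOwnsRenderPass)
    {
        CmdList.Commands.push_back("BeginRenderPass");
    }

    // 2. Body: simply invoke the pass lambda with the command list.
    assert(Pass.Lambda);
    Pass.Lambda(CmdList);

    // 3. Epilogue: EndRenderPass for raster passes, then the epilogue barriers.
    if (bOwnsRenderPass)
    {
        CmdList.Commands.push_back("EndRenderPass");
    }
    CmdList.Commands.push_back("EpilogueBarriers");
}

// Records one pass and returns the resulting command stream.
inline std::vector<std::string> RunToyPassDemo(bool bSkipRenderPass)
{
    FToyCommandList CmdList;
    FToyPass Pass;
    Pass.bSkipRenderPass = bSkipRenderPass;
    Pass.Lambda = [](FToyCommandList& C) { C.Commands.push_back("Draw"); };
    ExecuteToyPass(CmdList, Pass);
    return CmdList.Commands;
}
```

With bSkipRenderPass set to false the lambda's Draw command is wrapped by BeginRenderPass and EndRenderPass; with it set to true only the barriers and the Draw remain, and the lambda would be responsible for the render pass itself.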
The final stage of execution is cleanup, analyzed below:
void FRDGBuilder::Clear()
{
    // Clear external resources.
    ExternalTextures.Empty();
    ExternalBuffers.Empty();
    // Clear extracted resources.
    ExtractedTextures.Empty();
    ExtractedBuffers.Empty();
    // Clear the main graph data.
    Passes.Clear();
    Views.Clear();
    Textures.Clear();
    Buffers.Clear();
    // Clear uniform buffers and release the allocator.
    UniformBuffers.Clear();
    Allocator.ReleaseAll();
}
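11.3.5 RDG Mechanism Summary
UE's RDG runs on the rendering thread by default. Although it merges RDG Passes that share the same render targets, that does not mean Passes execute in parallel; they execute serially. Normally a Pass does not submit its commands and wait for the GPU at the end of its execution, but it does when GPU-flush debugging is enabled and async compute is disabled.
FRDGBuilder has no globally unique instance. It is usually declared as a local variable that completes the whole flow of collecting, compiling, and executing Passes within a limited lifetime. Modules that declare FRDGBuilder instances include distance fields, render textures, the scene renderer, scene capture, ray tracing, post-processing, hair, virtual textures, and others.
The lifecycle of an FRDGBuilder can be divided into four phases: collecting Passes, compiling Passes, executing Passes, and cleanup.
The collection phase gathers every Pass (Lambda) of the rendering modules that can produce RHI rendering commands. Collected Passes are not executed immediately; execution is deferred. AddPass first creates an FRDGPass instance and appends it to the Pass list, then runs SetupPass. SetupPass mainly processes the states, references, dependencies, and flags of textures and buffers.
The compile phase is more complex and has many steps. It builds the producer/consumer dependencies, determines culling and other Pass flags, adjusts resource lifetimes, culls Passes, processes the resource transitions and barriers of each Pass, processes the dependencies and references of async compute Passes, finds and sets up the fork and join Pass nodes, and merges all raster Passes that share the same render targets.
The execute phase compiles first and then executes every eligible Pass according to the compile result. Executing a single Pass runs its prologue, body, and epilogue in order, which corresponds to the command list's BeginRenderPass, the Pass body (Lambda) rendering code, and EndRenderPass. Executing the Pass body is straightforward: it invokes the Pass's Lambda instance.
Finally, the cleanup phase clears or resets all data and memory inside the FRDGBuilder instance.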
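The four phases can be illustrated with a deliberately small model. The sketch below is a standalone approximation under stated assumptions, not the engine's algorithm: FMiniPass, FMiniGraphBuilder and RunMiniGraphDemo are hypothetical names, resources are plain integer ids, and culling keeps a Pass only if it writes a resource that is extracted or read by a later kept Pass. The real compile step also handles barriers, async compute and render pass merging, which are omitted here.

```cpp
#include <cassert>
#include <functional>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical pass record: names, resource ids it reads/writes, and its deferred lambda.
struct FMiniPass
{
    std::string Name;
    std::vector<int> Reads;
    std::vector<int> Writes;
    std::function<void()> Lambda;
    bool bCulled = true; // every pass starts culled until compile proves it is needed
};

class FMiniGraphBuilder
{
public:
    // Phase 1: collect. The lambda is stored, not run.
    void AddPass(std::string Name, std::vector<int> Reads, std::vector<int> Writes, std::function<void()> Lambda)
    {
        FMiniPass Pass;
        Pass.Name = std::move(Name);
        Pass.Reads = std::move(Reads);
        Pass.Writes = std::move(Writes);
        Pass.Lambda = std::move(Lambda);
        Passes.push_back(std::move(Pass));
    }

    // Marks a resource as a graph output, which keeps its producers alive.
    void QueueExtraction(int Resource) { Extracted.insert(Resource); }

    // Phases 2 to 4: compile, execute, clean up.
    void Execute()
    {
        Compile();
        for (FMiniPass& Pass : Passes)
        {
            if (!Pass.bCulled)
            {
                Pass.Lambda();
            }
        }
        Clear();
    }

private:
    // Walk passes from last to first; keep a pass if it writes a needed resource.
    void Compile()
    {
        std::set<int> Needed = Extracted;
        for (auto It = Passes.rbegin(); It != Passes.rend(); ++It)
        {
            bool bProducesNeeded = false;
            for (int Written : It->Writes)
            {
                if (Needed.count(Written) > 0)
                {
                    bProducesNeeded = true;
                }
            }
            if (bProducesNeeded)
            {
                It->bCulled = false;
                Needed.insert(It->Reads.begin(), It->Reads.end());
            }
        }
    }

    void Clear()
    {
        Passes.clear();
        Extracted.clear();
    }

    std::vector<FMiniPass> Passes;
    std::set<int> Extracted;
};

// Returns the names of the passes that actually ran.
inline std::vector<std::string> RunMiniGraphDemo()
{
    const int SceneColor = 0, Bloom = 1, DebugTex = 2;
    std::vector<std::string> Log;
    FMiniGraphBuilder GraphBuilder;
    GraphBuilder.AddPass("BasePass", {}, {SceneColor}, [&Log] { Log.push_back("BasePass"); });
    GraphBuilder.AddPass("DebugView", {SceneColor}, {DebugTex}, [&Log] { Log.push_back("DebugView"); });
    GraphBuilder.AddPass("Bloom", {SceneColor}, {Bloom}, [&Log] { Log.push_back("Bloom"); });
    assert(Log.empty()); // nothing has run yet: execution is deferred
    GraphBuilder.QueueExtraction(Bloom);
    GraphBuilder.Execute();
    return Log;
}
```

In the demo, DebugView writes a texture that nobody consumes, so it is culled; BasePass survives because Bloom, whose output is extracted, reads SceneColor.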
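Compared with using RHICommandList directly, FRDGBuilder offers the following features and optimizations across its whole run:
- Resources referenced by RDG Passes should be allocated or managed by RDG; even externally registered resources must stay alive while RDG uses them. RDG manages resource lifetimes automatically, extends them across fork and join Passes, and releases and reuses resources once they are no longer referenced.
- Resources are not allocated at the moment they are requested; they are allocated or created on first use.
- RDG has the concept of subresources. With a suitable layout, subresources are packed into larger resource blocks; one subresource can be forwarded to another, and views and aliases of subresources are created automatically, including aliases of resources that future render Passes will create. This manages allocation, release, and reuse effectively, reduces total memory usage and fragmentation, reduces CPU and GPU IO, and improves memory efficiency.
- RDG Passes are managed per FRDGBuilder instance, which automatically sorts, references, forks, and joins Passes, handles their resource references and dependencies, and culls unused Passes and resources. RDG also handles dependencies and references between Graphics Passes and Async Compute Passes correctly, chaining them in order along the DAG and handling their shared resource usage and state transitions.
- RDG can merge the rendering of RDG Passes, provided those Passes use the same render targets. This reduces the number of Begin/EndRenderPass calls at the RHI layer, and with it the number of RHI rendering commands and resource state transitions.
- RDG automatically handles resource dependencies, barriers, and state transitions between Passes, discards redundant transitions (such as read-to-read and write-to-write), and can merge and batch resource state transitions, which further reduces the number of rendering commands.
- RDG Passes execute on the rendering thread, serially; TaskGraph is not used to execute them in parallel.
  In theory they could be executed in parallel, but this is only a conjecture; whether it is feasible needs to be verified in practice.
- RDG provides rich debugging modes and information and supports an immediate execution mode, which helps developers locate problems quickly and reduces the time and difficulty of fixing bugs.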
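The point about discarding redundant transitions can be made concrete with a small counter. This is a standalone sketch under simplifying assumptions, not the engine's barrier logic: EToyAccess, FToyUse and CountTransitions are hypothetical names, each resource has a single state, and a transition is emitted only when the requested access differs from the current one. Real hardware may still need a UAV barrier between two dependent writes, which this model ignores.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Hypothetical access states; a real RHI has many more.
enum class EToyAccess { Unknown, Read, Write };

// One use of a resource by a pass, in submission order.
struct FToyUse
{
    int Resource;
    EToyAccess Access;
};

// Counts the transitions that remain after dropping same-state ones
// (read-to-read and write-to-write in this simplified model).
inline int CountTransitions(const std::vector<FToyUse>& Uses)
{
    std::map<int, EToyAccess> State;
    int Count = 0;
    for (const FToyUse& Use : Uses)
    {
        const auto Found = State.find(Use.Resource);
        const EToyAccess Previous = (Found == State.end()) ? EToyAccess::Unknown : Found->second;
        if (Previous != Use.Access)
        {
            ++Count;
            State[Use.Resource] = Use.Access;
        }
    }
    return Count;
}

// Five uses of one texture: write, read, read, write, write.
inline int RunTransitionDemo()
{
    const std::vector<FToyUse> Uses = {
        {0, EToyAccess::Write},
        {0, EToyAccess::Read},
        {0, EToyAccess::Read},  // read-to-read: no transition
        {0, EToyAccess::Write},
        {0, EToyAccess::Write}, // write-to-write: no transition in this model
    };
    assert(Uses.size() == 5);
    return CountTransitions(Uses);
}
```

A naive scheme that transitions before every use would emit five transitions for the demo sequence; tracking the current state reduces that to three.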
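Of course, FRDGBuilder also has some side effects:
- It adds another layer of concepts and encapsulation to the rendering system, which makes the rendering layer more complex and raises the learning cost.
- It makes development more complex: because execution is deferred, some bugs do not produce immediate feedback.
- The lifetime of some Passes or resources may be extended more than strictly necessary.
Figure: async compute diagram from Frostbite. The SSAO and SSAO Filter Passes are placed on the async queue and write and read the Raw AO texture. Even though they finish before the sync point, the lifetime of Raw AO is still extended to the sync point.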
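The lifetime extension described for Raw AO can be sketched as a small calculation. The code below is a standalone illustration under stated assumptions, not engine code: FTimelinePass, ComputeLastUse and RunLifetimeDemo are hypothetical names, Passes are indexed in submission order, there is a single async compute range, and JoinPassIndex is the graphics Pass that waits for it (the sync point).

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical pass record: which pipeline it runs on and which resource ids it touches.
struct FTimelinePass
{
    bool bAsyncCompute;
    std::vector<int> Resources;
};

// Returns, per resource, the index of the pass after which its memory may be reused.
// Uses on the async compute pipeline before the join are extended to the join pass,
// because the graphics timeline cannot know the async work finished any earlier.
inline std::vector<int> ComputeLastUse(const std::vector<FTimelinePass>& Passes, int ResourceCount, int JoinPassIndex)
{
    assert(JoinPassIndex >= 0 && JoinPassIndex < static_cast<int>(Passes.size()));
    std::vector<int> LastUse(ResourceCount, -1); // -1 means the resource is never used
    for (int PassIndex = 0; PassIndex < static_cast<int>(Passes.size()); ++PassIndex)
    {
        for (int Resource : Passes[PassIndex].Resources)
        {
            int End = PassIndex;
            if (Passes[PassIndex].bAsyncCompute && PassIndex < JoinPassIndex)
            {
                End = JoinPassIndex;
            }
            LastUse[Resource] = std::max(LastUse[Resource], End);
        }
    }
    return LastUse;
}

// A frame loosely shaped like the Frostbite example in the caption above.
inline std::vector<int> RunLifetimeDemo()
{
    const int Depth = 0, RawAO = 1, FilteredAO = 2, ShadowMap = 3;
    const std::vector<FTimelinePass> Passes = {
        {false, {Depth}},             // 0: depth prepass (graphics)
        {true,  {Depth, RawAO}},      // 1: SSAO (async compute)
        {true,  {RawAO, FilteredAO}}, // 2: SSAO filter (async compute)
        {false, {ShadowMap}},         // 3: shadows (graphics, overlaps the async work)
        {false, {FilteredAO}},        // 4: lighting (graphics, sync point)
    };
    return ComputeLastUse(Passes, 4, 4);
}
```

In the demo, Raw AO is last touched by Pass 2, yet its computed lifetime ends at Pass 4, while the shadow map, used only on the graphics pipeline, ends at Pass 3.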
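11.4 RDG Development
This chapter explains how to use UE's RDG system.
11.4.1 Creating RDG Resources
Sample code for creating RDG resources (textures, buffers, UAVs, SRVs, and so on):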
// ---- Creating an RDG texture ----
// Build the RDG texture description.
FRDGTextureDesc TextureDesc = Input.Texture->Desc;
TextureDesc.Reset();
TextureDesc.Format = PF_FloatRGBA;
TextureDesc.ClearValue = FClearValueBinding::None;
TextureDesc.Flags &= ~TexCreate_DepthStencilTargetable;
TextureDesc.Flags |= TexCreate_RenderTargetable;
TextureDesc.Extent = OutputViewport.Extent;
// Create the RDG texture.
FRDGTextureRef MyRDGTexture = GraphBuilder.CreateTexture(TextureDesc, TEXT("MyRDGTexture"));
// ---- Creating a UAV for the RDG texture ----
FRDGTextureUAVRef MyRDGTextureUAV = GraphBuilder.CreateUAV(MyRDGTexture);
// ---- Creating an SRV for the RDG texture ----
FRDGTextureSRVRef MyRDGTextureSRV = GraphBuilder.CreateSRV(FRDGTextureSRVDesc::CreateWithPixelFormat(MyRDGTexture, PF_FloatRGBA));
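Before creating a resource such as a texture, create its descriptor. When creating a UAV or SRV for the resource, pass in the previously created resource instance so that it is reused. For an SRV, the resource instance is an argument of the SRV descriptor: create the descriptor first, then the SRV.
The code above uses textures as the example. Buffers are created in a similar way, so no separate example is given.
11.4.2 Registering External Resources
The resources in the previous section are created and managed by RDG, and RDG is responsible for their lifetime. Can a resource that was not created by RDG be used in RDG? Yes: the FRDGBuilder::RegisterExternalXXX interfaces register external resources with the RDG system. Registering a texture as an example: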
// Create an RHI resource outside RDG.
FRHIResourceCreateInfo CreateInfo;
FTexture2DRHIRef MyRHITexture = RHICreateTexture2D(1024, 768, PF_B8G8R8A8, 1, 1, TexCreate_CPUReadback, CreateInfo);
// Register the externally created RHI resource as an RDG resource.
FRDGTextureRef MyExternalRDGTexture = GraphBuilder.RegisterExternalTexture(MyRHITexture);
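Note that RDG cannot control or manage the lifetime of externally registered resources. The caller must keep the external resource alive for as long as RDG uses it; otherwise errors or even crashes will follow.
To get the RHI resource instance from an RDG resource, use code like this: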
// FRDGTextureRef is a pointer type, so use -> rather than a dot.
FRHITexture* MyRHITexture = MyRDGTexture->GetRHI();
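The conversion relationship between RHI resources and RDG resources is shown in a figure:
The code above uses textures as the example. Buffers are registered in a similar way.
11.4.3 Extracting Resources
The previous chapter on RDG mechanisms explained that RDG does not execute Passes as soon as they are collected; execution is deferred, and so is resource creation and allocation. This raises a problem: a rendered resource cannot be assigned to a variable in immediate fashion, and the code has to adapt to deferred execution. Resource extraction that fits deferred execution is done through these interfaces:
- FRDGBuilder::QueueTextureExtraction
- FRDGBuilder::QueueBufferExtraction
A usage example: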
// Create the RDG texture.
FRDGTextureRef MyRDGTexture;
FRDGTextureDesc MyTextureDesc = FRDGTextureDesc::Create2D(OutputExtent, HistoryPixelFormat, FClearValueBinding::Black, TexCreate_ShaderResource | TexCreate_UAV);
MyRDGTexture = GraphBuilder.CreateTexture(MyTextureDesc, TEXT("MyRDGTexture"), ERDGTextureFlags::MultiFrame);
// Create a UAV and use it as a shader parameter of the Pass.
(......)
PassParameters->MyRDGTextureUAV = GraphBuilder.CreateUAV(MyRDGTexture);
(......)
// Add a Pass that renders into MyRDGTextureUAV.
FComputeShaderUtils::AddPass(GraphBuilder, RDG_EVENT_NAME("MyCustomPass", ...), ComputeShader, PassParameters, FComputeShaderUtils::GetGroupCount(8, 8));
// Queue the extraction. OutputRT is an object (not an uninitialized pointer) and must outlive GraphBuilder.Execute().
TRefCountPtr<IPooledRenderTarget> OutputRT;
GraphBuilder.QueueTextureExtraction(MyRDGTexture, &OutputRT);
// Work with the extracted OutputRT afterwards (it is only filled in once the graph has executed).
(......)
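Note, however, that because Passes, resource creation, and extraction are all deferred, an extracted resource can only be handed back after the graph has executed, which in practice means it is provided for use in the next frame.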
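The deferred nature of extraction can be shown with a toy builder. The following is a minimal standalone sketch, not UE code: FToyTexture, FExtractGraphBuilder and RunExtractionDemo are hypothetical names, std::shared_ptr stands in for TRefCountPtr, and textures are reduced to a name. It assumes only what the text states: the output pointer is filled in when the graph executes, not when the extraction is queued.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a pooled render target.
struct FToyTexture
{
    std::string Name;
};

class FExtractGraphBuilder
{
public:
    // Only a description is stored; no texture object exists yet.
    int CreateTexture(const std::string& Name)
    {
        Descs.push_back(Name);
        return static_cast<int>(Descs.size()) - 1;
    }

    // Remember where to write the result; nothing is written now.
    void QueueTextureExtraction(int Handle, std::shared_ptr<FToyTexture>* OutTexture)
    {
        assert(OutTexture != nullptr);
        assert(Handle >= 0 && Handle < static_cast<int>(Descs.size()));
        Extractions.push_back({Handle, OutTexture});
    }

    // Resources are realized and the queued outputs are filled in here.
    void Execute()
    {
        for (const auto& Extraction : Extractions)
        {
            *Extraction.second = std::make_shared<FToyTexture>(FToyTexture{Descs[Extraction.first]});
        }
        Extractions.clear();
    }

private:
    std::vector<std::string> Descs;
    std::vector<std::pair<int, std::shared_ptr<FToyTexture>*>> Extractions;
};

// Returns true if the output was empty before Execute and valid after it.
inline bool RunExtractionDemo()
{
    std::shared_ptr<FToyTexture> OutputRT; // must outlive Execute()
    FExtractGraphBuilder GraphBuilder;
    const int Handle = GraphBuilder.CreateTexture("MyRDGTexture");
    GraphBuilder.QueueTextureExtraction(Handle, &OutputRT);
    const bool bEmptyBeforeExecute = (OutputRT == nullptr);
    GraphBuilder.Execute();
    return bEmptyBeforeExecute && OutputRT != nullptr && OutputRT->Name == "MyRDGTexture";
}
```

Any code that reads OutputRT between QueueTextureExtraction and Execute sees an empty pointer, which is why the extracted result is normally stored and consumed later.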
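A small exercise: to use an extracted resource within the current frame, would it work to add a special parameterless Pass that operates on the extracted resource? Why or why not?
11.4.4 Adding Passes
The unit of execution in the RDG system is the RDG Pass. Its dependencies, references, inputs, and outputs are all set up through FRDGBuilder::AddPass. One example: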
// Create the shader parameters of the Pass.
FMyPS::FParameters* PassParameters = GraphBuilder.AllocParameters<FMyPS::FParameters>();
PassParameters->InputTexture = InputTexture;
// Bind render target slot 0. The render target should be a different texture from the one being sampled.
PassParameters->RenderTargets[0] = FRenderTargetBinding(OutputTexture, OutputLoadAction);
PassParameters->InputSampler = BilinearSampler;
// Set up the shaders.
TShaderMapRef<FScreenPassVS> VertexShader(View.ShaderMap);
TShaderMapRef<FMyPS> PixelShader(View.ShaderMap);
const FScreenPassPipelineState PipelineState(VertexShader, PixelShader, AdditiveBlendState);
// Add the RDG Pass.
GraphBuilder.AddPass(
    RDG_EVENT_NAME("MyRDGPass"),
    PassParameters,
    ERDGPassFlags::Raster,
    // The Pass Lambda.
    [PixelShader, PassParameters, PipelineState] (FRHICommandListImmediate& RHICmdList)
    {
        // Set the viewport.
        RHICmdList.SetViewport(0, 0, 0.0f, 1024, 768, 1.0f);
        // Set the PSO.
        SetScreenPassPipelineState(RHICmdList, PipelineState);
        // Set the shader parameters.
        SetShaderParameters(RHICmdList, PixelShader, PixelShader.GetPixelShader(), *PassParameters);
        // Draw a rectangle.
        DrawRectangle(RHICmdList, 0, 0, 1024, 768, 0, 0, 1.0f, 1.0f, FIntPoint(1024, 768), FIntPoint(1024, 768), PipelineState.VertexShader, EDRF_Default);
    });
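A Pass added to the RDG system can be a traditional Graphics Pass, a Compute Shader Pass, or a parameterless Pass. RDG Passes and RHI Passes do not map one to one: several RDG Passes may be merged into a single RHI Pass for execution. See section 11.3.4 FRDGBuilder::Execute in the previous chapter for details.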
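How several RDG Passes can collapse into one RHI render pass is sketched below. This is a standalone simplification, not the engine's merge logic: FRasterPassDesc, RecordMerged and RunMergeDemo are hypothetical names, a render target is a single integer id, and two consecutive raster Passes merge only when their ids match. The two booleans play the role of SkipRenderPassBegin and SkipRenderPassEnd from the execution code shown earlier.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical raster pass description: a name and the id of its render target.
struct FRasterPassDesc
{
    std::string Name;
    int RenderTarget;
};

// Records the RHI-level command stream, letting consecutive passes with the
// same render target share one BeginRenderPass/EndRenderPass pair.
inline std::vector<std::string> RecordMerged(const std::vector<FRasterPassDesc>& Passes)
{
    std::vector<std::string> Commands;
    for (std::size_t Index = 0; Index < Passes.size(); ++Index)
    {
        const bool bSkipBegin = Index > 0 &&
            Passes[Index - 1].RenderTarget == Passes[Index].RenderTarget;
        const bool bSkipEnd = Index + 1 < Passes.size() &&
            Passes[Index + 1].RenderTarget == Passes[Index].RenderTarget;

        if (!bSkipBegin)
        {
            Commands.push_back("BeginRenderPass");
        }
        Commands.push_back("Draw:" + Passes[Index].Name);
        if (!bSkipEnd)
        {
            Commands.push_back("EndRenderPass");
        }
    }
    return Commands;
}

// Three RDG passes, the first two sharing render target 0.
inline std::vector<std::string> RunMergeDemo()
{
    const std::vector<FRasterPassDesc> Passes = {
        {"Opaque", 0},
        {"Decals", 0},
        {"Bloom", 1},
    };
    assert(Passes.size() == 3);
    return RecordMerged(Passes);
}
```

The three RDG Passes in the demo produce only two Begin/End pairs: Opaque and Decals share one, and Bloom gets its own because it targets a different texture.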
11.4.5 Creating an FRDGBuilder
The code for creating and using an FRDGBuilder is very simple:
void RenderMyStuff(FRHICommandListImmediate& RHICmdList)
{
    // ---- Create a local FRDGBuilder object ----
    FRDGBuilder GraphBuilder(RHICmdList, RDG_EVENT_NAME("GraphBuilder_RenderMyStuff"));
    (......)
    // ---- Add Passes ----
    GraphBuilder.AddPass(...);
    (......)
    GraphBuilder.AddPass(...);
    (......)
    // ---- Queue resource extraction ----
    GraphBuilder.QueueTextureExtraction(...);
    (......)
    // ---- Execute the FRDGBuilder ----
    GraphBuilder.Execute();
}
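It is worth stressing that FRDGBuilder instances are usually local. Several FRDGBuilder instances exist across UE, mainly in relatively independent modules such as the scene renderer, post-processing, and ray tracing.
Running an FRDGBuilder has three working steps: collecting Passes, compiling Passes, and executing Passes. Because FRDGBuilder::Execute already covers both compiling and executing, there is no need to call the FRDGBuilder::Compile interface explicitly.
11.4.6 RDG Debugging
The RDG system exposes a number of console variables. Their names and descriptions are as follows: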
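Console variable | Description |
---|---|
r.RDG.AsyncCompute | Controls the async compute policy: 0 - disabled; 1 - enabled for Passes tagged for async compute (default); 2 - enabled for all compute Passes that use a compute command list. |
r.RDG.Breakpoint | Breaks into the debugger when certain conditions are met. 0 - disabled; 1 to 4 - different special debugging modes. |
r.RDG.ClobberResources | Clears all render targets and texture/buffer UAVs with the specified clear color at allocation time. Used for debugging. |
r.RDG.CullPasses | Whether RDG culls unused Passes. 0 - disabled; 1 - enabled (default). |
r.RDG.Debug | Allows warnings to be output for inefficiencies found during wiring and execution. |
r.RDG.Debug.FlushGPU | Flushes commands to the GPU after every Pass executes. When set, async compute is disabled (r.RDG.AsyncCompute=0). |
r.RDG.Debug.GraphFilter | Filters certain debug events to a specific graph. |
r.RDG.Debug.PassFilter | Filters certain debug events to a specific Pass. |
r.RDG.Debug.ResourceFilter | Filters certain debug events to a specific resource. |
r.RDG.DumpGraph | Dumps several visualization logs to disk. 0 - disabled; 1 - shows producer/consumer Pass dependencies; 2 - shows resource states and transitions; 3 - shows graphics and async compute overlap. |
r.RDG.ExtendResourceLifetimes | RDG extends resource lifetimes to the full length of the graph. Increases memory usage. |
r.RDG.ImmediateMode | Executes each Pass as it is created. Useful for getting the call stack of the wiring code when a crash occurs inside a Pass Lambda. |
r.RDG.MergeRenderPasses | The graph merges identical, consecutive render passes into a single render pass. 0 - disabled; 1 - enabled (default). |
r.RDG.OverlapUAVs | RDG overlaps UAV work when requested. If disabled, UAV barriers are always inserted. |
r.RDG.TransitionLog | Outputs resource transitions to the console. |
r.RDG.VerboseCSVStats | Controls the verbosity of RDG's CSV profiling stats. 0 - emits one CSV profile for graph execution; 1 - emits a CSV profile for each phase of graph execution. |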
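Besides the RDG console variables listed above, some commands display useful information about the running RDG system.
vis
Lists all valid textures. Entering it may print information like the following: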
VisualizeTexture/Vis <CheckpointName> [<Mode>] [PIP/UV0/UV1/UV2] [BMP] [FRAC/SAT] [FULL]:
Mode (examples):
RGB = RGB in range 0..1 (default)
*8 = RGB * 8
A = alpha channel in range 0..1
R = red channel in range 0..1
G = green channel in range 0..1
B = blue channel in range 0..1
A*16 = Alpha * 16
RGB/2 = RGB / 2
SubResource:
MIP5 = Mip level 5 (0 is default)
INDEX5 = Array Element 5 (0 is default)
InputMapping:
PIP = like UV1 but as picture in picture with normal rendering (default)
UV0 = UV in left top
UV1 = full texture
UV2 = pixel perfect centered
Flags:
BMP = save out bitmap to the screenshots folder (not on console, normalized)
STENCIL = Stencil normally displayed in alpha channel of depth. This option is used for BMP to get a stencil only BMP.
FRAC = use frac() in shader (default)
SAT = use saturate() in shader
FULLLIST = show full list, otherwise we hide some textures in the printout
BYNAME = sort list by name
BYSIZE = show list by size
TextureId:
0 = <off>
LogConsoleResponse: 13 = (2D 1x1 PF_DepthStencil) DepthDummy 1 KB
LogConsoleResponse: 18 = (2D 976x492 PF_FloatRGBA RT) SceneColor 3752 KB
LogConsoleResponse: 19 = (2D 128x32 PF_G16R16) PreintegratedGF 16 KB
LogConsoleResponse: 23 = (2D 64x64 PF_FloatRGBA VRam) LTCMat 32 KB VRamInKB(Start/Size):<NONE>
LogConsoleResponse: 24 = (2D 64x64 PF_G16R16F VRam) LTCAmp 16 KB VRamInKB(Start/Size):<NONE>
LogConsoleResponse: 26 = (2D 976x492 PF_FloatRGBA UAV) SSRTemporalAA 3752 KB
LogConsoleResponse: 27 = (2D 976x492 PF_FloatR11G11B10 RT UAV) SSGITemporalAccumulation0 1876 KB
LogConsoleResponse: 29 = (2D 976x492 PF_R32_UINT RT UAV) DenoiserMetadata0 1876 KB
LogConsoleResponse: 30 = (2D 976x492 PF_FloatRGBA RT UAV VRam) SceneColorDeferred 3752 KB VRamInKB(Start/Size):<NONE>
LogConsoleResponse: 31 = (2D 976x492 PF_DepthStencil VRam) SceneDepthZ 2345 KB VRamInKB(Start/Size):<NONE>
LogConsoleResponse: 37 = (3D 64x64x16 PF_FloatRGBA UAV) HairLUT 512 KB
LogConsoleResponse: 38 = (3D 64x64x16 PF_FloatRGBA UAV) HairLUT 512 KB
LogConsoleResponse: 39 = (2D 64x64 PF_R32_FLOAT UAV) HairCoverageLUT 16 KB
LogConsoleResponse: 47 = (2D 98x64 PF_A16B16G16R16) SSProfiles 49 KB
LogConsoleResponse: 48 = (2D 256x64 PF_FloatRGBA RT) AtmosphereTransmittance 128 KB
LogConsoleResponse: 49 = (2D 64x16 PF_FloatRGBA RT) AtmosphereIrradiance 8 KB
LogConsoleResponse: 50 = (2D 64x16 PF_FloatRGBA RT) AtmosphereDeltaE 8 KB
LogConsoleResponse: 51 = (3D 256x128x2 PF_FloatRGBA RT) AtmosphereInscatter 512 KB
LogConsoleResponse: 52 = (3D 256x128x2 PF_FloatRGBA RT) AtmosphereDeltaSR 512 KB
LogConsoleResponse: 53 = (3D 256x128x2 PF_FloatRGBA RT) AtmosphereDeltaSM 512 KB
LogConsoleResponse: 54 = (3D 256x128x2 PF_FloatRGBA RT) AtmosphereDeltaJ 512 KB
LogConsoleResponse: 55 = (2D 1x1 PF_A32B32G32R32F RT UAV) EyeAdaptation 1 KB
LogConsoleResponse: 56 = (3D 32x32x32 PF_A2B10G10R10 RT VRam) CombineLUTs 128 KB VRamInKB(Start/Size):<NONE>
LogConsoleResponse: 68 = (2D 976x492 PF_R8G8 RT UAV) SSGITemporalAccumulation1 938 KB
LogConsoleResponse: 89 = (2D 976x246 PF_R32_UINT RT UAV) QuadOverdrawBuffer 938 KB
LogConsoleResponse: 91 = (2D 976x492 PF_FloatRGBA RT UAV) LightAccumulation 3752 KB
LogConsoleResponse: 92 = (Cube[2] 128 PF_FloatRGBA) ReflectionEnvs 2048 KB
LogConsoleResponse: 93 = (3D 64x64x64 PF_FloatRGBA RT UAV) TranslucentVolumeDir2 2048 KB
LogConsoleResponse: 95 = (2D 1x1 PF_A32B32G32R32F RT UAV) EyeAdaptation 1 KB
LogConsoleResponse: 96 = (3D 64x64x64 PF_FloatRGBA RT UAV) TranslucentVolume2 2048 KB
LogConsoleResponse: 97 = (3D 64x64x64 PF_FloatRGBA RT UAV) TranslucentVolumeDir1 2048 KB
LogConsoleResponse: 98 = (3D 64x64x64 PF_FloatRGBA RT UAV) TranslucentVolume1 2048 KB
LogConsoleResponse: 99 = (3D 64x64x64 PF_FloatRGBA RT UAV) TranslucentVolumeDir0 2048 KB
LogConsoleResponse: 101 = (3D 64x64x64 PF_FloatRGBA RT UAV) TranslucentVolume0 2048 KB
LogConsoleResponse: 102 = (2D 976x492 PF_G8 RT UAV) ScreenSpaceAO 469 KB
LogConsoleResponse: 106 = (2D 488x246 PF_DepthStencil) SmallDepthZ 1173 KB
LogConsoleResponse: 107 = (2D 1x1 PF_A32B32G32R32F RT UAV) EyeAdaptation 1 KB
LogConsoleResponse: CheckpointName (what was rendered this frame, use <Name>@<Number> to get intermediate versions):
LogConsoleResponse: Pool: 43/112 MB (referenced/allocated)
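11.5 Summary
This chapter covered UE's RDG: its basic concepts, usage, rendering flow, and main mechanisms, to give readers a general understanding of RDG. Further technical details and principles are left for readers to discover by studying the UE source code. Many details of using RDG are not covered here; the official RDG 101: A Crash Course fills those gaps.
11.5.1 Exercises
As usual, this chapter ends with a few exercises to help deepen the understanding of RDG:
- What are the steps of RDG? What does each step do, and what characterizes it?
- How do RDG resources and RHI resources differ, and how are they related? How are they converted into each other?
- Implement custom CS and PS drawing code with RDG.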
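Special notes
- Thanks to the authors of all the references. Some images come from the references and the web; they will be removed on request if they infringe.
- This series is the author's original work and is published only on Cnblogs (博客園). Sharing the link to this article is welcome, but reposting without permission is not allowed.
- The series is ongoing; see the table of contents for the complete list.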