Dissecting the Unreal Rendering System (06) - UE5 Special, Part 1 (Features and Nanite)

Posted by 0嚮往0 on 2021-06-24

 

 

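6.1 Overview

Back in May 2020, Epic released Lumen in the Land of Nanite, a video showcasing UE5's rendering features. It demonstrated Nanite, based on virtualized micropolygon geometry, and Lumen, a real-time global illumination technique, bringing a film-quality audiovisual experience to real-time games.

At the time, Epic promised a UE5 preview in the first half of 2021, and it kept its word: UE5 Early Access (EA) was released in late May 2021. We can therefore study UE5's editor, toolchain and new rendering features, along with the UE5 EA source code and the sample project AncientGame released with it.

Overview of the UE5 editor. The main scene is the AncientWorld map of the AncientGame sample project released with UE5 EA.

6.1.1 Contents

Based on UE5 Early Access (EA), this article covers the following aspects of UE5:

  • UE5 basics and getting started.
  • New features in UE5.
  • Changes to the UE5 rendering system.
  • Nanite virtualized micropolygon geometry.
  • Lumen real-time dynamic global illumination.
  • Other UE5-related techniques.

To keep the length manageable, the article is split into two parts: Part 1 covers the editor, new features and Nanite; Part 2 covers Lumen, other rendering techniques, and a summary.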

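6.1.2 Basic Concepts

Some of the rendering concepts used in this article are explained in the table below:

Concept   Abbreviation   Literal meaning                  Description
Lumen     -              lumen (unit of luminous flux)    UE5's real-time global illumination technique.
Nanite    -              nanobot                          UE5's virtualized micropolygon geometry technique.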

 

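6.2 New Features in UE5

This section covers installing UE5, the editor, and the new features that differ from UE4.

6.2.1 The UE5 Editor

6.2.1.1 Downloading the Editor and Resources

  • Downloading the UE5 engine

Step 1: update the Epic Games Launcher and restart it.

Step 2: open the UE5 page and click "Download Early Access".

Step 3: on the Library page, find the 5.0.0 entry and click it to download.

  • Downloading the AncientGame project

Switch to the UE5 page, scroll all the way to the bottom, and click the Get Sample button of the sample project that demonstrates UE5's new features:

  • Downloading the EA source code

Open the UE5 page and click the "Access Source Code" button:

Alternatively, open the 5.0 Early Access release page directly and download Source Code (zip) or Source Code (tar.gz). After extracting it, set it up and build it following the same process as UE4.

6.2.1.2 Launching the Sample Project

Open the AncientGame project with the downloaded or self-built UE5 editor. If it starts successfully, the following screen appears:

Press Ctrl+Space to open the Content Drawer, go to AncientContent/Maps, and open the AncientWorld map:

After the level opens, the screen may go completely black. Don't panic: this is expected, because UE5 has to perform a lot of data preprocessing before it can render the scene normally:

Once the lengthy data processing finishes, the main AncientWorld scene can be previewed: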

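6.2.1.3 Editor Areas

Compared with UE4, the layout and UI style of the UE5 main editor window have changed noticeably. The UI is flatter and closer to a DCC tool, and the layout emphasizes the level editing viewport while shrinking the space taken by panels such as the component details and the Content Browser:

The areas in the figure above serve the following purposes:

1. Menu bar, similar to UE4.

2. Tools for adding components, searching content, level Blueprints, Sequencer, and so on.

3. Editing modes, such as brush editing.

4. Play and preview.

5. Settings, including world, project and plugin settings.

6. Panels such as World Partition and Data Layers.

7. Level Actor list and Details panel, similar to UE4.

8. Background task status and source control.

9. Content Browser and command line. The Content Browser can be shown or hidden quickly with Ctrl+Space. Unlike UE4, the command line no longer has to be opened with the ~ key, which makes setting console variables easier and speeds up debugging.

10. The main level editing viewport. Its features are similar to UE4, with some differences. For example, the Lit view mode gains a Nanite visualization group that shows information about the virtualized micropolygon technique:

The Lit mode of the level viewport gains a Nanite visualization group. The noise in the background is not a bug; it is Nanite's triangle visualization.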

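6.2.2 New Rendering Features

This subsection describes UE5's new rendering features.

6.2.2.1 Nanite Virtualized Micropolygons

"Nanite" means nanobot. UE5 uses it as the name of its new generation of mesh processing and shading technology, and the intent is obvious: to replace and upgrade traditional culling, rasterization and shading that operate at the granularity of whole mesh LODs, and instead process meshes and triangles at a very fine granularity.

In UE5, Nanite's full name is Nanite Virtualized Geometry (virtualized micropolygon geometry). It handles high-precision meshes automatically and supports highly detailed surfaces with pixel-sized triangles as well as massive numbers of objects. It processes only the data that is needed, at the appropriate level of detail, which avoids both losing surface detail and processing too much data. Before rendering, Nanite performs extensive preprocessing on mesh data, stores the result in a highly compressed, fine-grained binary stream, and generates and manages its LODs automatically.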

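Nanite micropolygons in the AncientGame sample project. Top left: the temple; top right: the corresponding micropolygon visualization; bottom left and bottom right: the project's boss and mountain details.

There are three ways to enable Nanite on a mesh in UE5:

  • When importing a static mesh, enable it in the import settings.

  • In the Static Mesh Editor, enable it in the mesh's Details panel.

  • Enable it in bulk from the Content Browser's right-click menu.

Enabling Nanite brings many benefits:

  • Compared with UE4, orders of magnitude more geometric complexity, with far higher triangle and object counts, can be processed and rendered in real time.
  • Real-time frame performance is no longer constrained by polygon counts, draw calls or mesh memory usage.
  • High-precision models made in DCC tools such as ZBrush can (potentially) be imported directly.
  • High-precision meshes can be used directly instead of baked normal maps.
  • LODs are handled automatically; mesh LODs no longer need to be set up and managed by hand.
  • LOD transitions cause little or no loss of quality.

A Nanite mesh is similar to a traditional static mesh: it is still essentially a triangle mesh. The key differences are the amount of detail it carries and its high degree of data compression. Most importantly, Nanite uses an entirely new system to render this data format extremely efficiently.

A traditional static mesh needs a flag set to enable Nanite. Nanite meshes support multiple UV sets and vertex colors. Materials are assigned to sections of the mesh, so those materials can use different shading models and dynamic effects (evaluated in the shader). Material assignments can be swapped dynamically, as with any other static mesh, and Nanite does not need any material baking.

Because Nanite offers better rendering performance and lower memory and disk usage, enable it on static meshes wherever possible. To benefit most from Nanite, a static mesh should ideally:

  • Contain a huge number of triangles, or triangles that will be very small on screen.
  • Have many instances in the same scene.
  • Act as an occluder of other Nanite geometry.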

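In the current version, Nanite supports the following component types:

  • Static Mesh
  • Instanced Static Mesh
  • Hierarchical Instanced Static Mesh
  • Geometry Collection

Animated types that Nanite does NOT support include, but are not limited to:

  • Skeletal animation
  • Morph Targets (Blend Shapes)
  • World Position Offset in materials
  • Spline meshes

In addition, Nanite does NOT support the following features:

  • Custom depth or stencil
  • Vertex painting on instances

Note that, unlike traditional static meshes, Nanite meshes do not store per-vertex tangents in the mesh data (according to the official documentation, to reduce data size). Instead, tangents are computed dynamically in the pixel shader. Because this tangent space is derived differently from the traditional one, it can cause discontinuities along edges.

Nanite also does NOT support materials with the following settings:

  • Any Blend Mode other than Opaque.
    • Neither Masked nor Translucent is supported.
  • Deferred decals.
  • Wireframe.
  • Pixel Depth Offset.
  • World Position Offset.
  • Custom per-instance data.
  • Two-sided materials.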

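Nanite cannot correctly render materials that use the following features (the meshes may disappear):

  • The Vertex Interpolator node.
  • Customized UVs.

Nanite does NOT support the following rendering features:

  • View-dependent object filtering, namely:

    • Minimum screen radius
    • Distance culling
    • Filtering via FPrimitiveSceneProxy::IsShown()
    • Scene Captures that use any of the following:
      • Hidden components
      • Hidden actors
      • Show-only components
      • Show-only actors
  • Forward rendering

  • Stereo rendering for VR

  • Split screen

  • MSAA

  • Lighting channels

  • Ray tracing against the fully detailed Nanite mesh

  • Some visualization modes

Nanite is supported on PlayStation 5, Xbox Series S|X, and PCs with the following GPUs running the latest drivers:

  • NVIDIA: Maxwell or newer
  • AMD: GCN or newer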

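When monitoring Nanite performance, pay attention to the following:

  • Aggregate Geometry: many tiny, disconnected pieces that merge into a single volume at a distance, such as hair, leaves and grass. Such geometry works against both LOD and occlusion culling.

  • Closely Stacked Surfaces: Nanite merges the surfaces lying close to the topmost visible surface, so all of the stacked objects get drawn, regardless of whether they occlude or hide each other.

    Top: the normal AncientGame view; bottom: the corresponding visualization of closely stacked surfaces. The black areas are dynamic characters, which Nanite does not support.

    In most cases closely stacked surfaces reduce draw calls, but in some situations they have the opposite effect. Moving the camera in the Overdraw visualization mode shows how these stacked surfaces are rendered:

  • Faceted and Hard-edge Normals: ideally, a mesh has fewer vertices than triangles. A vertex-to-triangle ratio of 2:1 or higher may indicate a problem, especially when the triangle count is high. A ratio of 3:1 means the mesh is fully faceted: every triangle has its own three vertices, none shared with any other triangle, usually because missing smoothing makes the normals differ. The two figures below compare faceted and smooth normals:

    Above: faceted normals on the left, smoothing-group normals on the right.

    Above: Nanite triangle visualization with faceted normals on the left and with smoothing-group normals on the right. With smoothed normals, Nanite draws the mesh with fewer triangles.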
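
The vertex-to-triangle rule of thumb above is easy to check offline. For a large closed mesh whose vertices are all shared, Euler's formula (V - E + F = 2 with E = 3F/2) gives V of roughly F/2, a ratio near 0.5, while a fully faceted mesh reaches exactly 3.0. The sketch below uses hypothetical helper names (it is not an Unreal API); only the 2:1 and 3:1 thresholds come from the text, and the labels are illustrative.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: vertices per triangle for an indexed triangle mesh.
double VertexToTriangleRatio(std::uint64_t NumVertices, std::uint64_t NumTriangles)
{
    if (NumTriangles == 0)
    {
        return 0.0; // avoid dividing by zero for an empty mesh
    }
    return static_cast<double>(NumVertices) / static_cast<double>(NumTriangles);
}

// Rough classification using the 2:1 and 3:1 thresholds from the text.
std::string ClassifyVertexRatio(double Ratio)
{
    if (Ratio >= 3.0)
    {
        return "fully faceted"; // every triangle owns its three vertices
    }
    if (Ratio >= 2.0)
    {
        return "possibly problematic"; // many split vertices, e.g. hard edges
    }
    return "ok";
}
```

A cube shows the three cases: 8 shared vertices for 12 triangles gives about 0.67, 24 vertices with per-face normals gives 2.0, and 36 fully unshared vertices gives 3.0.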

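Besides the visualization modes mentioned above, UE5 provides many others, and the Overview mode shows all of them at once:

UE5 provides several Nanite visualization modes, each showing different data.

Nanite's Overview mode shows thumbnails of all visualization modes.

The console command NaniteStats shows Nanite statistics for the current frame in real time:

The right side shows Nanite's culling and geometry statistics.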

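6.2.2.2 Lumen Dynamic Global Illumination

Lumen is UE5's fully dynamic global illumination and reflection system, and it is the default GI and reflection method in UE5. Lumen renders diffuse interreflection with infinite bounces, as well as indirect specular reflections, in large, detailed environments at scales ranging from millimeters to kilometers.

To enable Lumen, turn on the following options in Project Settings (all of them are enabled by default):

Once Lumen is enabled, Lumen GI replaces SSGI and DFAO, Lumen reflections replace SSR, static lighting is disabled, and all lightmaps are hidden.

Lumen supports the following rendering features:

  • Lumen global illumination

Lumen GI solves the indirect lighting of scene objects: light reflected off a directly lit surface tints nearby surfaces, a phenomenon also known as color bleeding. Because meshes also block and absorb part of the indirect light, Lumen also produces correct indirect shadowing.

Lumen GI computes indirect lighting and shadowing dynamically, in real time.

Lumen keeps full-resolution normal detail while computing indirect lighting at a lower resolution, which is what makes real-time rendering possible.

  • Lumen sky lighting

Lumen solves sky lighting in its Final Gather, so sky light differs clearly between interiors and exteriors, with interiors being darker. Lumen's sky lighting also provides lower-quality GI for translucency and volumetric fog.

  • Lumen emissive materials

Emissive materials propagate light through Lumen's Final Gather at no additional cost. However, both the size and the brightness of emissive areas are limited; exceeding these limits causes noise artifacts.

  • Lumen reflections

Lumen solves indirect specular reflections for materials across the entire roughness range.

Lumen also supports reflections on Clear Coat materials.

In the EA version, Lumen's support for light features is as follows:

  • All light types are supported: directional, point, spot, rect and sky lights.
  • Light functions are supported ONLY on directional lights.
  • Static lights are NOT supported, because enabling Lumen disables static lighting and lightmaps.

Many Lumen parameters can be configured in Project Settings and Post Process Volumes, such as the software ray tracing mode, detail tracing mode, global tracing mode, hardware ray tracing mode, and the GI and reflection settings.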

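6.2.2.3 Virtual Shadow Maps

Virtual Shadow Maps (VSM) is a new shadow mapping method that delivers consistent, high-resolution shadowing that works with film-quality assets and large, dynamically lit open worlds.

Enable VSM through the Shadow Map Method option in Project Settings (enabled by default):

When enabled, VSM replaces the traditional shadowing techniques, including stationary precomputed shadows, distance field shadows, preview shadows, per-object shadows, cascaded shadows and mobile dynamic shadows.

With VSM enabled, Shadow Map Ray Tracing (SMRT) can use it to produce more accurate and sharper shadows and related effects, including penumbrae, soft shadows and contact-hardening shadows.

With SMRT, point lights get soft shadows and contact-hardening shadows.

Left: PCF blurs and removes important surface detail; right: SMRT produces more plausible soft, contact-hardening shadows.

Traditional renderers use CSM to optimize directional light shadows. Similarly, UE5 uses a clipmap for directional lights.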

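A single virtual shadow map cannot provide enough resolution to cover a large area. Directional lights therefore use a clipmap structure that extends outward around the camera, where each clipmap level has its own 16K VSM. Every level has the same resolution but covers twice the radius of the previous level.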
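
The level structure described above can be sketched as follows: level i covers a radius of R0 * 2^i, so a point at distance d from the camera falls into the smallest level whose radius reaches d, and because every level has the same 16K resolution, the world-space size of a shadow texel doubles from one level to the next. This is a simplified model under stated assumptions: the first-level radius R0 and the maximum level are hypothetical parameters, and UE5's actual level selection is not reproduced here.

```cpp
#include <cassert>
#include <cmath>

constexpr double kVsmResolution = 16384.0; // 16K texels per side per clipmap level (from the text)

// Radius covered by clipmap level `Level`; each level doubles the previous one.
double ClipmapLevelRadius(double Level0Radius, int Level)
{
    return std::ldexp(Level0Radius, Level); // Level0Radius * 2^Level
}

// Smallest level whose radius contains a point at `Distance` from the camera,
// clamped to MaxLevel (hypothetical parameter).
int SelectClipmapLevel(double Level0Radius, double Distance, int MaxLevel)
{
    int Level = 0;
    while (Level < MaxLevel && ClipmapLevelRadius(Level0Radius, Level) < Distance)
    {
        ++Level;
    }
    return Level;
}

// World-space width of one shadow texel: the level's diameter spread over 16K texels.
double TexelWorldSize(double Level0Radius, int Level)
{
    return 2.0 * ClipmapLevelRadius(Level0Radius, Level) / kVsmResolution;
}
```

With R0 = 1024 units, a point 5000 units away lands in level 3 (radius 8192), whose texels are 1 unit wide, compared with 0.125 units at level 0.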

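Left: clipmap visualization; right: VSM page visualization.

Spot lights use a single 16K VSM with a mip chain to handle shadow LOD instead of a clipmap; point lights use a cube map with one 16K VSM per face, six in total.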
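
Some quick arithmetic shows why these shadow maps have to be virtual rather than fully allocated. The sketch assumes 32-bit depth texels and 128x128-texel pages (both are assumptions for illustration, not values taken from the text). Under those assumptions a single fully resident 16K VSM would need 1 GiB and a point light's six faces 6 GiB, whereas only the pages that actually receive shadow lookups need physical memory.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t kVirtualResolution = 16384; // texels per side (from the text)
constexpr std::uint64_t kBytesPerTexel = 4;         // assumption: 32-bit depth
constexpr std::uint64_t kPageSize = 128;            // assumption: 128x128-texel pages

// Memory needed if NumMaps virtual shadow maps were allocated in full.
constexpr std::uint64_t FullyResidentBytes(std::uint64_t NumMaps)
{
    return NumMaps * kVirtualResolution * kVirtualResolution * kBytesPerTexel;
}

// Number of pages in one virtual shadow map.
constexpr std::uint64_t PagesPerMap()
{
    return (kVirtualResolution / kPageSize) * (kVirtualResolution / kPageSize);
}

// Physical memory for a given number of resident pages.
constexpr std::uint64_t ResidentBytes(std::uint64_t NumResidentPages)
{
    return NumResidentPages * kPageSize * kPageSize * kBytesPerTexel;
}
```

Under these assumptions each 16K VSM is a 128x128 grid of 16384 pages of 64 KiB each, so keeping 256 pages resident costs 16 MiB instead of 1 GiB.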

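Top: spot light; bottom: the corresponding single-VSM visualization.

To optimize shadow rendering performance, UE5 uses techniques such as caching and coarse pages.

6.2.2.4 Temporal Super Resolution

Temporal Super Resolution (TSR) is a new-generation temporal anti-aliasing algorithm that replaces traditional TAA. Its features are:

  • Produces output close to native 4K quality from a 1080p input, allowing higher frame rates and better rendering fidelity.
  • Less ghosting against high-frequency backgrounds.
  • Less flickering on highly complex geometry.
  • Runs on Shader Model 5 capable hardware: D3D11, D3D12, Vulkan, PS5 and XSX; Metal is not yet supported.
  • Shaders are optimized specifically for the PS5 and XSX GPU architectures.

TSR can be enabled or disabled in Project Settings; UE5 enables it by default.

Left: native 4K rendering at 20.57 FPS; right: TSR upscaling 1080p to 4K quality, raising the frame rate to 44.22 FPS.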
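
The numbers in this comparison can be put in perspective with a little arithmetic: 1080p has exactly a quarter of the pixels of 4K, and the two frame rates correspond to frame times of about 48.6 ms and 22.6 ms, a speedup of roughly 2.15x. That the speedup stays well below 4x suggests (an inference, not a measurement from the source) that part of the frame cost does not scale with internal resolution, including the TSR pass itself. The helpers below are illustrative only.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Number of pixels in a Width x Height render target.
constexpr std::uint64_t PixelCount(std::uint64_t Width, std::uint64_t Height)
{
    return Width * Height;
}

// Frame time in milliseconds for a given frame rate.
double FrameTimeMs(double Fps)
{
    return 1000.0 / Fps;
}

// How many times faster configuration B runs than configuration A.
double Speedup(double FpsA, double FpsB)
{
    return FpsB / FpsA;
}
```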

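6.2.2.5 Mobile Rendering

UE5 improves several mobile rendering modules:

  • The mobile renderer now uses RDG (Render Dependency Graph).
  • Distance Field Ambient Occlusion and Global Distance Fields are now supported.
  • The DirectX Shader Compiler (DXC) becomes the default shader compiler for Android Vulkan, and DXC support is added for OpenGL ES.
  • The mobile deferred renderer is improved: better stability and performance in several modules, including IBL, deferred decals, IES lighting and other lighting features that match PC quality, while using fewer shader permutations.

6.2.3 Other New Features

6.2.3.1 World Partition

World Partition is a new data management and streaming system, usable both in the editor and at runtime. It completely removes the need to manually divide a world into numerous sublevels to manage streaming and reduce data contention.

With World Partition, the world exists as a single persistent level. In the editor, the world is divided into a grid, and map data can be loaded partially for the regions of interest. When cooking or starting PIE, the world is divided into grid cells optimized for runtime streaming, each of which becomes an independent streaming level.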
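
The grid idea above can be illustrated with a minimal 2D sketch: a world position maps to a cell by flooring each coordinate divided by the cell size, and a streaming source loads every cell overlapped by its range. The cell size, the square (rather than circular) range and all names here are hypothetical; World Partition's real runtime grids and loading ranges are configured per project and are not modeled.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <set>
#include <utility>

using FCellCoord = std::pair<std::int64_t, std::int64_t>;

// Map a 2D world position to its grid cell; floor keeps negative coordinates correct.
FCellCoord CellFromPosition(double X, double Y, double CellSize)
{
    return { static_cast<std::int64_t>(std::floor(X / CellSize)),
             static_cast<std::int64_t>(std::floor(Y / CellSize)) };
}

// All cells overlapped by the axis-aligned square [center - Radius, center + Radius].
std::set<FCellCoord> CellsInRange(double X, double Y, double Radius, double CellSize)
{
    const FCellCoord Min = CellFromPosition(X - Radius, Y - Radius, CellSize);
    const FCellCoord Max = CellFromPosition(X + Radius, Y + Radius, CellSize);
    std::set<FCellCoord> Cells;
    for (std::int64_t CX = Min.first; CX <= Max.first; ++CX)
    {
        for (std::int64_t CY = Min.second; CY <= Max.second; ++CY)
        {
            Cells.insert({CX, CY});
        }
    }
    return Cells;
}
```

With 100-unit cells, a source at (100, 100) with a 10-unit range straddles four cells, while one at (50, 50) touches only cell (0, 0); a position such as (150, -10) maps to cell (1, -1).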

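Open the World Partition editor via Window > World Partition in the menu bar. Drag with the left mouse button to select all cells in a region; right-click to open a context menu with options to load the selected cells, unload the selected cells, or move the camera to the current cell:

World Partition also supports Data Layers, HLOD (Hierarchical Level of Detail), Level Instancing, One File Per Actor and more.

6.2.3.2 Animation

UE5's animation system adds Full Body IK, Control Rig, Motion Warping, animation scripting tools, and support in Sequencer.

Full Body IK illustration.

Control Rig illustration.

Motion Warping illustration.

6.2.3.3 Physics

Physics is also a highlight among UE5's new features, mainly:

  • Chaos

Chaos is UE5's lightweight physics simulation solution, built from the ground up to meet the needs of next-generation games. Its main features are:

1. Rigid Body Dynamics

2. Rigid Body Animation Nodes and Cloth Physics

3. Destruction

4. Ragdoll Physics

5. Vehicles

6. Physics Fields. The Physics Field system lets users directly affect Chaos physics simulation at runtime within a specific region of space. Fields can be configured to affect the simulation in various ways, such as applying forces to rigid bodies, breaking Geometry Collection clusters, and anchoring or disabling fractured rigid bodies.

7. Fluid Simulation

8. Hair Simulation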

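6.2.3.4 Gameplay

The Gameplay framework adds Game Features and Modular Gameplay, Data Registries, and the Enhanced Input system.

The Data Registry editor.

6.2.3.5 Performance and Platform Management

  • Memory Analysis Tools for Unreal Insights

UE5 improves memory tracking and debugging through the Memory Insights module of Unreal Insights, which supports:

1. Viewing a snapshot of all allocated memory at any given time during a session.

2. Comparing snapshots of all allocated memory taken at two different times.

3. Viewing the call stack of each memory allocation.

4. Identifying long-lived and short-lived (temporary) allocations.

5. Finding memory leaks.

Overview of Memory Insights in Unreal Insights.

  • Platform toolchain

The reliability of the remote iOS build process on Mac is improved, and a cross-platform library is added to make interacting with iOS devices over USB more reliable.

There are also improved audio, a redesigned VR template, Unreal Turnkey, and more.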

 

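6.3 Changes to the UE5 Rendering System

This chapter compares the UE5 EA source code with UE4.26 and systematically describes the differences. For convenience, Beyond Compare's folder comparison is used:

The two versions differ considerably. The following sections focus on the differences in the core modules, the rendering system and shaders, which are mainly in the following folders: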

  • Engine\Source\Runtime\Core
  • Engine\Source\Runtime\CoreUObject
  • Engine\Source\Runtime\D3D12RHI
  • Engine\Source\Runtime\Engine
  • Engine\Source\Runtime\MeshDescription
  • Engine\Source\Runtime\Renderer
  • Engine\Source\Runtime\RHI
  • Engine\Source\Runtime\RHICore
  • Engine\Shaders

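6.3.1 Core and CoreUObject

  • Core:
    • Modifies and adds foundational modules such as Algo, IO, Async, Hash, Memory, Math, Container, String, Serialization and UObject.
    • Modifies support modules such as FileCache, HAL, Logging, Misc, Modules and Stats.
    • Modifies platform modules such as Android, Apple, IOS, HoloLens, Mac, Unix and Windows.
  • UObject:
    • Extensively modifies and refines the UObject modules, including Class, Obj, Package, Property and related modules.
    • Adds modules such as ObjectHandle, ObjectPathId, PackageResourceManager, PropertyClassPrt and PropertyObjectPrt.
  • AssetRegistry:
    • Modifies part of the AssetData logic.
    • Improves AssetBundle-related logic and adds the AssetDataTagMap concept.
    • Adds AssetDataTagMapSerializationDetails.
  • Package:
    • Adds or refines some PackageName interfaces.
    • Adds PackageAccessTracking, PackageAccessTrackingOps, PackagePath and related logic.
  • Serialization:
    • Modifies foundational serialization modules such as BulkData, ArchiveUObject and JSON reading/writing.
    • Extensively modifies loading, async loading, package saving and related modules.
    • Adds FilePackageStore, MappedName and related modules.

Clearly, the core and foundational modules have been extensively modified and refactored across the board.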

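6.3.2 RHI and RHICore

  • RHI:

    • Adds flags and support for several rendering features, including some mobile features.
    • Adds concepts and interfaces such as MSAA, BufferUsageFlags, RHIPipeline, EResourceTransitionFlags and Transition.
    • Removes the FRHIVertexBuffer, FRHIIndexBuffer and FRHIStructuredBuffer types, unifies them into FRHIBuffer, and updates the related interfaces.
    • Adds Texture2DArray interfaces and removes the old per-resource-type RHI creation interfaces in favor of the unified RHICreate functions.
    • Refines ray tracing types and interfaces, such as RayTraceDispatchIndirect and IsRayTracingShaderFrequency.
    • GlobalUniformBuffer is renamed StaticUniformBuffer.
    • ERHIFeatureLevel adds SM6, and platform-related detection and data interfaces are refined.
    • Adds the PCD3D_SM6 shader platform and removes some shader platforms, such as PS4, XBOXONE_D3D12 and OPENGL_ES31_EXT.
    • FGenericDataDrivenShaderPlatformInfo gains many flags and corresponding getters.
    • Adds types and interfaces such as EShaderCodeResourceBindingType, EUniformBufferBindingFlags, FRHITextureCreateInfo, ERHITextureMetaDataAccess, ERHITextureSRVOverrideSRGBType, FRHITextureSRVCreateInfo, FRHITextureUAVCreateInfo, FRHIBufferCreateInfo, FRHIBufferSRVCreateInfo and FRHIBufferUAVCreateInfo.
    • Refines the interfaces of types such as FRHIGraphicsPipelineState, FRHITextureReference and RHIValidation.
    • Adds transient resource allocation modules such as RHITransientResourceAllocator and RHIValidationTransientResourceAllocator.
    • Modifies or adds types such as FTransientState, FGlobalUniformBuffers and FPipelineState.
  • RHICore:

    • A new module containing RHICoreShader, RHICoreTransientResourceAllocator, RHIPoolAllocator and other modules.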

6.3.3 Renderer

  • Adds instance culling modules such as InstanceCulling and InstanceCullingManager, including types such as FInstanceCullingRdgParams, EInstanceCullingMode, FInstanceCullingContext and FInstanceCullingManager, mainly used by Nanite.

  • Virtual texturing adds or refines data reading/writing and interfaces such as FVirtualTextureFeedbackBuffer, RenderPages and RenderPagesStandAlone.

  • Global distance field data (FGlobalDistanceFieldParameterData) gains mipmap and VT data and interfaces.

  • HairStrands adds types such as EHairBindingType, EHairInterpolationType and FHairStrandsInstance.

  • Adds or refines FVertexFactoryShaderPermutationParameters types.

  • MeshPassProcessor:

    • EMeshPass adds dedicated passes such as VSMShadowDepth, LumenCardCapture and EditorLevelInstance.
    • Adds types such as EFVisibleMeshDrawCommandFlags, FCompareFMeshDrawCommands and FMeshPassProcessorRenderState.
    • Adds the FSimpleMeshDrawCommandPass type for places where draw commands do not need parallel processing, reducing overhead.
    • FVisibleMeshDrawCommand gains EFVisibleMeshDrawCommandFlags, plus FMeshDrawCommandSortKey, InRunArray and other members used for GPU culling.
  • The FinalizeCommand stage of FDynamicPassMeshDrawListContext adds a NewVisibleMeshDrawCommand.Setup step.

  • Deprecates uniform buffer interfaces such as SetInstancedViewUniformBuffer, SetPassUniformBuffer and GetInstancedViewUniformBuffer.

  • Adds the Nanite module:

    • Adds NaniteRender: types and processing interfaces such as FNaniteCommandInfo, ENaniteMeshPass, FNaniteDrawListContext, FCullingContext, FRasterContext, FRasterResults, FNaniteShader, FNaniteMaterialVS, FNaniteMeshProcessor, FNaniteMaterialTables, ERasterTechnique, ERasterScheduling, EOutputBufferMode and FPackedView.
    • FPrimitiveFlagsCompact gains the bIsNaniteMesh flag.
    • FPrimitiveSceneInfo gains instancing- and ray-tracing-related data and interfaces such as NaniteCommandInfos, NaniteMaterialIds, LumenPrimitiveIndex, CachedRayTracingMeshCommandsHashPerLOD, bRegisteredWithVelocityData, InstanceDataOffset and NumInstanceDataEntries.
  • Adds the Lumen module:

    • It touches a great many modules; in summary: DiffuseIndirect, Scene, HardwareRayTracing, Mesh, Probe, Radiance, Radiosity, TranslucencyVolume, Voxel, plus data structures and utilities.
  • Migrates traditional shader binding interfaces to RDG; for example, SHADER_PARAMETER_TEXTURE becomes SHADER_PARAMETER_RDG_TEXTURE.

  • Adds or refines RuntimeVirtualTextureProducer, FSceneTextureExtracts, EMobileSceneTextureSetupMode and Strata.

  • Enhances post-processing, e.g. by adding Temporal Super Resolution (TSR).

  • Enhances the ray tracing modules, such as RTGI, RTAO, RTR, RTShadow and RTSkyLight.

  • DeferredShadingRenderer:

    • Adds modules, types and steps related to Lumen, IndirectLightRender, SSRT, TranslucencyLightingVolume and so on.
    • Adds types such as FLumenCardRenderer, TPipelineState, EDiffuseIndirectMethod, EAmbientOcclusionMethod, EReflectionsMethod, FPerViewPipelineState and FFamilyPipelineState.
  • Adds or enhances rendering modules such as BasePass, DepthPass, SDF, GPUScene, GlobalDistanceField, BVH, GenerateConservativeDepthBuffer, LightRendering, IndirectLightRendering, MeshDrawCommands, Shader, ScreenSpace, Shadow and MobileRender.

From UE4.26 to 5.0 EA, the important and foundational rendering modules have all been modified to a greater or lesser degree.

6.3.4 Engine

  • Adds modules such as IRenderCaptureProvider, NaniteResources and NaniteStreamingManager.
  • Adds the LevelInstance module, which handles data I/O, packaging, rendering and editor support for level instancing.
  • Adds modules such as ActorReferencesUtils, AssetCompilingManager, AsyncCompilationHelpers, CanvasRender, ComputeKernelCollection, DerivedMeshDataTaskUtils, InstancedStaticMeshDelegates, InstanceUniformShaderParameters, MeshCardRepresentation, NaniteSceneProxy, ObjectCacheContext, StaticMeshCompiler and TextureCompiler, covering asset compilation, instancing, buffers, meshes, Nanite and more.
  • PrimitiveSceneProxy:
    • Adds data and interfaces such as LevelInstance, bLevelInstanceEditingState, TranslucencySortDistanceOffset and MeshCardRepresentation.
    • Adds data and interfaces such as SupportsNaniteRendering, IsNaniteMesh, bSupportsMeshCardRepresentation, IsAlwaysVisible, GetPrimitiveInstances, RayTracingGroupId and RayTracingGroupCullingPriority.
  • SceneInterface and SceneManagement:
    • Adds types such as FInstanceCullingManagerResources, and uses FGPUScenePrimitiveCollector instead of FPrimitiveUniformShaderParameters.
    • Adds interfaces for HairStrands, PhysicsField, LumenSceneCard, ComputeFramework and so on.
  • SceneView:
    • Adds interfaces such as ComputeNearPlane and GetScreenPositionScaleBias.
    • Adds shader bindings such as FViewShaderParameters, GeneralPurposeTweak2, PrecomputedIndirectLightingColorScale, GlobalDistanceField, VirtualTexture, PhysicsField, Lumen, Instance and Page.
  • Enhances types and interfaces for distance fields, GPUSkinCache, Material, StaticMesh, SkeletalMesh, Texture and more.

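6.3.5 Shaders

  • Adds Nanite:

    • Adds ClusterCulling, Culling, HZBCull, InstanceCulling, MaterialCulling, Shadow, GBuffer, Imposter, DataDecode, DataPacked, Rasterizer, WritePixel and more.
  • Adds Lumen:

    • As on the C++ side, it touches a great many modules; in summary: DiffuseIndirect, Scene, HardwareRayTracing, Mesh, Probe, Radiance, Radiosity, TranslucencyVolume, Voxel, plus data structures and utilities.
    • Adds FinalGather, which handles the LumenProbe hierarchy and occlusion, as well as indexing and sampling.
  • Adds Strata:

    • DeferredLighting, EnvironmentLighting, Evaluation, ForwardLighting, Material and other modules.
  • Adds VirtualShadowMap:

    • Adds a module that builds per-page draw commands.
    • Adds cache management.
    • Adds modules such as Page, Projection and SMRT.
  • Adds InstanceCulling:

    • The BuildInstanceDrawCommands module, which lets GPUScene cull and generate draw commands dynamically in a compute shader.
    • The CullInstances module, which culls off-screen or occluded primitives in a compute shader.
    • The InstanceCullingCommon module, which defines basic types and interfaces.
  • Enhances the HairStrands modules, such as HairCards, HairScatter, BsdfPlot, ClusterCulling, Shadow, DeepShadow, EnvironmentLighting, GBuffer, Material, Visibility and Voxel.

  • Enhances modules such as PathTracing, RayTracing, SSD, SSRT and TAA.

  • Removes the LPV module.

  • Refines the foundational, material, shading, lighting, shadowing and dedicated-pass modules, such as AnisotropyPass, BasePass, MobileBasePass, BRDF, BurleySSS, CapsuleLight, RectLight, TranslucentLighting, ClusteredDeferredShading, LightGrid, ForwardLighting, DeferredLight, DiffuseIndirect, DistanceFieldAO, DistanceFieldLighting, DistanceFieldShadow, GlobalDistanceField, Math, Decal, GpuSkin, Halton, MonteCarlo, HZB, LocalVertexFactory, MaterialTemplate, Particle, PlanarReflection, PostProcess, Reflection, SceneData, ShadingCommon, ShadingModels, Shadow and Volumetric.

6.3.6 Summary of the UE5 Rendering System

The previous subsections show that the biggest changes are the new Nanite, Lumen, VSM, InstanceCulling and LevelInstance modules and techniques, together with changes to the types and interfaces of the Engine and Renderer modules.

The main change in the RHI layer is that the various vertex, index and structured buffers are unified into FRHIBuffer.

The Renderer layer enhances ray tracing, especially screen-space ray tracing, makes broader use of distance fields, and removes LPV.

The Engine layer mainly makes corresponding modifications and adjustments around the changes in the RHI and Renderer layers.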

 

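6.4 Nanite

This chapter describes the preprocessing, rendering and optimization techniques of UE5's Nanite virtualized micropolygon geometry.

Searching the UE5 EA source project for "Nanite" finds 3026 matches in 195 files:

Because the scope is so broad, covering every detail is impossible. After some screening, I will focus on the Nanite source code in the following modules:

  • How the editor builds Nanite meshes.
  • Management, loading and assembly of Nanite resources in the Engine module.
  • Nanite's rendering process and optimization techniques in the Renderer module.
  • Nanite's rendering steps and algorithms in the shaders.

6.4.1 Nanite Basics

This section covers the basic concepts, types and background knowledge related to Nanite.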

6.4.1.1 FMeshNaniteSettings

// Engine\Source\Runtime\Engine\Classes\Engine\EngineTypes.h

// Shadow map method.
namespace EShadowMapMethod
{
    enum Type
    {
        // Traditional shadow maps. Culled per component, which performs poorly in high-polygon scenes.
        ShadowMaps UMETA(DisplayName = "Shadow Maps"),
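        // Renders geometry for shadows into virtual depth maps, giving high-quality next-gen shadowing with simple setup. Combined with Nanite, culling is highly efficient.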
        VirtualShadowMaps UMETA(DisplayName = "Virtual Shadow Maps (Beta)")
    };
}

// Settings applied when building Nanite data.
struct FMeshNaniteSettings
{
    // Whether Nanite is enabled for this mesh.
    uint8 bEnabled : 1;
    // Position precision. The step size is 2^(-PositionPrecision) cm. MIN_int32 means automatic.
    int32 PositionPrecision;
    // Fraction of LOD0 triangles to keep. 1.0 means no reduction, 0.0 means no triangles.
    float PercentTriangles;

    FMeshNaniteSettings(): bEnabled(false), PositionPrecision(MIN_int32), PercentTriangles(0.0f){}
    FMeshNaniteSettings(const FMeshNaniteSettings& Other);
    
    bool operator==(const FMeshNaniteSettings& Other) const;
    bool operator!=(const FMeshNaniteSettings& Other) const;
};
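
The PositionPrecision comment implies a uniform grid with a step of 2^(-PositionPrecision) cm. The sketch below shows that quantization round trip and the triangle budget implied by PercentTriangles. The helper names are hypothetical, and Nanite's actual encoder (per-cluster offsets and bit widths, automatic precision selection when the value is MIN_int32) is not reproduced here.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Quantization step in cm for a given PositionPrecision: 2^(-PositionPrecision).
double PositionStepCm(int PositionPrecision)
{
    return std::ldexp(1.0, -PositionPrecision);
}

// Snap a coordinate (cm) to the nearest point of the grid implied by PositionPrecision.
std::int64_t QuantizePosition(double ValueCm, int PositionPrecision)
{
    return std::llround(ValueCm / PositionStepCm(PositionPrecision));
}

// Convert a grid index back to centimeters.
double DequantizePosition(std::int64_t Quantized, int PositionPrecision)
{
    return static_cast<double>(Quantized) * PositionStepCm(PositionPrecision);
}

// Triangle budget for a given PercentTriangles (1.0 = keep all, 0.0 = none), clamped to [0, 1].
std::uint64_t TargetTriangleCount(std::uint64_t Lod0Triangles, float PercentTriangles)
{
    const double Clamped = std::fmin(1.0, std::fmax(0.0, static_cast<double>(PercentTriangles)));
    return static_cast<std::uint64_t>(std::llround(static_cast<double>(Lod0Triangles) * Clamped));
}
```

With PositionPrecision = 4 the step is 1/16 cm, so 1.03 cm snaps to grid index 16 (1.0 cm); the 0.03 cm error stays within half a step (0.03125 cm).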

6.4.1.2 StaticMesh

// Engine\Source\Runtime\Engine\Classes\Engine\StaticMesh.h

class UStaticMesh : public UStreamableRenderAsset, (......)
{
    (......)
    
public:
    // Nanite settings of the static mesh.
    FMeshNaniteSettings NaniteSettings;
    
    // Returns true if the mesh has valid Nanite render data.
    bool HasValidNaniteData() const
    {
        if (const FStaticMeshRenderData* SMRenderData = GetRenderData())
        {
            return SMRenderData->NaniteResources.PageStreamingStates.Num() > 0;
        }
        return false;
    }
    
    (......)
    
    // Interfaces for the high-resolution source model.
    FStaticMeshSourceModel& GetHiResSourceModel();
    const FStaticMeshSourceModel& GetHiResSourceModel() const;
    FStaticMeshSourceModel&& MoveHiResSourceModel();
    void SetHiResSourceModel(FStaticMeshSourceModel&& SourceModel);
    
    bool LoadHiResMeshDescription(FMeshDescription& OutMeshDescription) const;
    bool CloneHiResMeshDescription(FMeshDescription& OutMeshDescription) const;
    FMeshDescription* CreateHiResMeshDescription();
    FMeshDescription* CreateHiResMeshDescription(FMeshDescription MeshDescription);
    FMeshDescription* GetHiResMeshDescription() const;
    bool IsHiResMeshDescriptionValid() const;
    void CommitHiResMeshDescription(const FCommitMeshDescriptionParams& Params);
    void ClearHiResMeshDescription();
    
    (......)
    
private:
    // High-resolution source model.
    FStaticMeshSourceModel HiResSourceModel;
    
    (......)
};


// Engine\Source\Runtime\Engine\Public\StaticMeshResources.h

// Render data required by a static mesh.
class FStaticMeshRenderData
{
public:
    (......)
    
    // Nanite render resources.
    Nanite::FResources NaniteResources;

    (......)
};

6.4.1.3 NaniteResource

// Engine\Source\Runtime\Engine\Public\Rendering\NaniteResources.h

// Maximum-count constants.
#define MAX_STREAMING_REQUESTS                    ( 128u * 1024u )
#define MAX_CLUSTER_TRIANGLES                    128
#define MAX_CLUSTER_VERTICES                    256
#define MAX_CLUSTER_INDICES                        ( MAX_CLUSTER_TRIANGLES * 3 )
#define MAX_NANITE_UVS                            4
#define NUM_ROOT_PAGES                            1u

// Whether to use triangle strip indices.
#define USE_STRIP_INDICES                        1

// Cluster constants.
#define CLUSTER_PAGE_GPU_SIZE_BITS                17
#define CLUSTER_PAGE_GPU_SIZE                    ( 1 << CLUSTER_PAGE_GPU_SIZE_BITS )
#define CLUSTER_PAGE_DISK_SIZE                    ( CLUSTER_PAGE_GPU_SIZE * 2 )
#define MAX_CLUSTERS_PER_PAGE_BITS                10
#define MAX_CLUSTERS_PER_PAGE_MASK                ( ( 1 << MAX_CLUSTERS_PER_PAGE_BITS ) - 1 )
#define MAX_CLUSTERS_PER_PAGE                    ( 1 << MAX_CLUSTERS_PER_PAGE_BITS )
#define MAX_CLUSTERS_PER_GROUP_BITS                9
#define MAX_CLUSTERS_PER_GROUP_MASK                ( ( 1 << MAX_CLUSTERS_PER_GROUP_BITS ) - 1 )
#define MAX_CLUSTERS_PER_GROUP                    ( ( 1 << MAX_CLUSTERS_PER_GROUP_BITS ) - 1 )
#define MAX_CLUSTERS_PER_GROUP_TARGET            128

// Constants for the hierarchy, GPU pages, instances, groups, etc.
#define MAX_HIERACHY_CHILDREN_BITS                6
#define MAX_HIERACHY_CHILDREN                    ( 1 << MAX_HIERACHY_CHILDREN_BITS )
#define MAX_GPU_PAGES_BITS                        14
#define    MAX_GPU_PAGES                            ( 1 << MAX_GPU_PAGES_BITS )
#define MAX_INSTANCES_BITS                        24
#define MAX_INSTANCES                            ( 1 << MAX_INSTANCES_BITS )
#define MAX_NODES_PER_PRIMITIVE_BITS            16
#define MAX_RESOURCE_PAGES_BITS                    20
#define MAX_RESOURCE_PAGES                        (1 << MAX_RESOURCE_PAGES_BITS)
#define MAX_GROUP_PARTS_BITS                    3
#define MAX_GROUP_PARTS_MASK                    ((1 << MAX_GROUP_PARTS_BITS) - 1)
#define MAX_GROUP_PARTS                            (1 << MAX_GROUP_PARTS_BITS)

#define PERSISTENT_CLUSTER_CULLING_GROUP_SIZE    64

// BVH
#define MAX_BVH_NODE_FANOUT_BITS                3
#define MAX_BVH_NODE_FANOUT                        (1 << MAX_BVH_NODE_FANOUT_BITS)
#define MAX_BVH_NODES_PER_GROUP                    (PERSISTENT_CLUSTER_CULLING_GROUP_SIZE / MAX_BVH_NODE_FANOUT)

#define NUM_CULLING_FLAG_BITS                    3
#define NUM_PACKED_CLUSTER_FLOAT4S                8
#define MAX_POSITION_QUANTIZATION_BITS            21  // (21*3 = 63) < 64
#define NORMAL_QUANTIZATION_BITS                 9

#define MAX_TEXCOORD_QUANTIZATION_BITS            15
#define MAX_COLOR_QUANTIZATION_BITS                 8

#define NUM_STREAMING_PRIORITY_CATEGORY_BITS     2
#define STREAMING_PRIORITY_CATEGORY_MASK        ((1u << NUM_STREAMING_PRIORITY_CATEGORY_BITS) - 1u)

#define VIEW_FLAG_HZBTEST                        0x1

#define MAX_TRANSCODE_GROUPS_PER_PAGE            128

#define VERTEX_COLOR_MODE_WHITE                    0
#define VERTEX_COLOR_MODE_CONSTANT                1
#define VERTEX_COLOR_MODE_VARIABLE                2

#define NANITE_USE_SCRATCH_BUFFERS                1

#define NANITE_CLUSTER_FLAG_LEAF                0x1

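```cpp
// Illustrative sketch, not part of NaniteResources.h: a few sizes implied by
// the #define constants above, recomputed with hypothetical constexpr copies
// (the k* names are ours) so the arithmetic is checked at compile time.
#include <cassert>
#include <cstdint>

constexpr std::uint64_t kClusterPageGpuSize  = 1ull << 17;              // CLUSTER_PAGE_GPU_SIZE: 128 KiB
constexpr std::uint64_t kClusterPageDiskSize = kClusterPageGpuSize * 2; // CLUSTER_PAGE_DISK_SIZE: 256 KiB
constexpr std::uint64_t kMaxGpuPages         = 1ull << 14;              // MAX_GPU_PAGES: 16384
constexpr std::uint32_t kMaxClusterIndices   = 128u * 3u;               // MAX_CLUSTER_INDICES
constexpr std::uint32_t kMaxBvhNodesPerGroup = 64u / (1u << 3);         // MAX_BVH_NODES_PER_GROUP

// A 14-bit GPU page index can refer to at most 16384 pages of 128 KiB, i.e. 2 GiB.
constexpr std::uint64_t kMaxAddressableGpuPageBytes = kMaxGpuPages * kClusterPageGpuSize;

static_assert(kClusterPageDiskSize == 256u * 1024u, "CLUSTER_PAGE_DISK_SIZE is 256 KiB");
static_assert(kMaxAddressableGpuPageBytes == (1ull << 31), "16384 * 128 KiB = 2 GiB");
static_assert(kMaxClusterIndices == 384u, "128 triangles * 3 indices");
static_assert(kMaxBvhNodesPerGroup == 8u, "64-thread group / fan-out of 8");
static_assert(3 * 21 < 64, "MAX_POSITION_QUANTIZATION_BITS: three 21-bit coordinates fit in 64 bits");
```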

namespace Nanite
{

// Unsigned integer vector.
struct FUIntVector
{
    uint32 X, Y, Z;

    bool operator==(const FUIntVector& V) const;
    FORCEINLINE friend FArchive& operator<<(FArchive& Ar, FUIntVector& V);
};

// Packed hierarchy node.
struct FPackedHierarchyNode
{
    FSphere        LODBounds[MAX_BVH_NODE_FANOUT]; // Spheres used as LOD bounds.
    
    struct
    {
        FVector        BoxBoundsCenter;
        uint32        MinLODError_MaxParentLODError;
    } Misc0[MAX_BVH_NODE_FANOUT];

    struct
    {
        FVector        BoxBoundsExtent;
        uint32        ChildStartReference;
    } Misc1[MAX_BVH_NODE_FANOUT];
    
    struct
    {
        uint32        ResourcePageIndex_NumPages_GroupPartSize;
    } Misc2[MAX_BVH_NODE_FANOUT];
};

// Material triangle.
struct FMaterialTriangle
{
    uint32 Index0;
    uint32 Index1;
    uint32 Index2;
    uint32 MaterialIndex;
    uint32 RangeCount;
};

// Extracts the NumBits-wide field starting at bit Offset of Value.
uint32 GetBits(uint32 Value, uint32 NumBits, uint32 Offset)
{
    uint32 Mask = (1u << NumBits) - 1u;
    return (Value >> Offset) & Mask;
}
// Writes Bits into the NumBits-wide field starting at bit Offset of Value.
void SetBits(uint32& Value, uint32 Bits, uint32 NumBits, uint32 Offset)
{
    uint32 Mask = (1u << NumBits) - 1u;
    Mask <<= Offset;
    Value = (Value & ~Mask) | (Bits << Offset);
}
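```cpp
// Illustrative sketch, not part of NaniteResources.h: standalone copies of
// GetBits/SetBits (renamed with a Sketch suffix) showing how FPackedCluster
// shares one uint32 between two fields, e.g. NumVerts_PositionOffset:
// NumVerts in bits 0..8 (9 bits, enough for MAX_CLUSTER_VERTICES = 256) and
// PositionOffset in bits 9..31 (23 bits). Unlike the excerpt above, this copy
// asserts that the value fits, so an oversized field cannot spill into its neighbor.
#include <cassert>
#include <cstdint>

inline std::uint32_t GetBitsSketch(std::uint32_t Value, std::uint32_t NumBits, std::uint32_t Offset)
{
    const std::uint32_t Mask = (1u << NumBits) - 1u; // requires NumBits < 32
    return (Value >> Offset) & Mask;
}

inline void SetBitsSketch(std::uint32_t& Value, std::uint32_t Bits, std::uint32_t NumBits, std::uint32_t Offset)
{
    std::uint32_t Mask = (1u << NumBits) - 1u;
    assert(Bits <= Mask);                       // the value must fit in NumBits
    Mask <<= Offset;
    Value = (Value & ~Mask) | (Bits << Offset); // clear the field, then write it
}
```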

// Packed cluster consumed by the GPU.
struct FPackedCluster
{
    // Members needed for rasterization.
    FIntVector    QuantizedPosStart;
    uint32        NumVerts_PositionOffset;                // NumVerts:9, PositionOffset:23

    FVector        MeshBoundsMin;
    uint32        NumTris_IndexOffset;                    // NumTris:8, IndexOffset: 24

    FVector        MeshBoundsDelta;
    uint32        BitsPerIndex_QuantizedPosShift_PosBits;    // BitsPerIndex:4, QuantizedPosShift:6, QuantizedPosBits:5.5.5
    
    // Members needed for culling.
    FSphere        LODBounds;

    FVector        BoxBoundsCenter;
    uint32        LODErrorAndEdgeLength;
    
    FVector        BoxBoundsExtent;
    uint32        Flags;

    // Members needed for materials.
    uint32        AttributeOffset_BitsPerAttribute;    // AttributeOffset: 22, BitsPerAttribute: 10
    uint32        DecodeInfoOffset_NumUVs_ColorMode;    // DecodeInfoOffset: 22, NumUVs: 3, ColorMode: 2
    uint32        UV_Prec;                            // U0:4, V0:4, U1:4, V1:4, U2:4, V2:4, U3:4, V3:4
    uint32        PackedMaterialInfo;

    uint32        ColorMin;
    uint32        ColorBits;                            // R:4, G:4, B:4, A:4
    uint32        GroupIndex;                            // Debug only
    uint32        Pad0;

    uint32        GetNumVerts() const                        { return GetBits(NumVerts_PositionOffset, 9, 0); }
    uint32        GetPositionOffset() const                { return GetBits(NumVerts_PositionOffset, 23, 9); }

    uint32        GetNumTris() const                        { return GetBits(NumTris_IndexOffset, 8, 0); }
    uint32        GetIndexOffset() const                    { return GetBits(NumTris_IndexOffset, 24, 8); }

    uint32        GetBitsPerIndex() const                    { return GetBits(BitsPerIndex_QuantizedPosShift_PosBits, 4, 0); }
    uint32        GetQuantizedPosShift() const            { return GetBits(BitsPerIndex_QuantizedPosShift_PosBits, 6, 4); }
    uint32        GetPosBitsX() const                        { return GetBits(BitsPerIndex_QuantizedPosShift_PosBits, 5, 10); }
    uint32        GetPosBitsY() const                        { return GetBits(BitsPerIndex_QuantizedPosShift_PosBits, 5, 15); }
    uint32        GetPosBitsZ() const                        { return GetBits(BitsPerIndex_QuantizedPosShift_PosBits, 5, 20); }

    uint32        GetAttributeOffset() const                { return GetBits(AttributeOffset_BitsPerAttribute, 22, 0); }
    uint32        GetBitsPerAttribute() const                { return GetBits(AttributeOffset_BitsPerAttribute, 10, 22); }
    
    void        SetNumVerts(uint32 NumVerts)            { SetBits(NumVerts_PositionOffset, NumVerts, 9, 0); }
    void        SetPositionOffset(uint32 Offset)        { SetBits(NumVerts_PositionOffset, Offset, 23, 9); }

    void        SetNumTris(uint32 NumTris)                { SetBits(NumTris_IndexOffset, NumTris, 8, 0); }
    void        SetIndexOffset(uint32 Offset)            { SetBits(NumTris_IndexOffset, Offset, 24, 8); }

    void        SetBitsPerIndex(uint32 BitsPerIndex)    { SetBits(BitsPerIndex_QuantizedPosShift_PosBits, BitsPerIndex, 4, 0); }
    void        SetQuantizedPosShift(uint32 PosShift)    { SetBits(BitsPerIndex_QuantizedPosShift_PosBits, PosShift, 6, 4); }
    void        SetPosBitsX(uint32 NumBits)                { SetBits(BitsPerIndex_QuantizedPosShift_PosBits, NumBits, 5, 10); }
    void        SetPosBitsY(uint32 NumBits)                { SetBits(BitsPerIndex_QuantizedPosShift_PosBits, NumBits, 5, 15); }
    void        SetPosBitsZ(uint32 NumBits)                { SetBits(BitsPerIndex_QuantizedPosShift_PosBits, NumBits, 5, 20); }

    void        SetAttributeOffset(uint32 Offset)        { SetBits(AttributeOffset_BitsPerAttribute, Offset, 22, 0); }
    void        SetBitsPerAttribute(uint32 Bits)        { SetBits(AttributeOffset_BitsPerAttribute, Bits, 10, 22); }

    void        SetDecodeInfoOffset(uint32 Offset)        { SetBits(DecodeInfoOffset_NumUVs_ColorMode, Offset, 22, 0); }
    void        SetNumUVs(uint32 Num)                    { SetBits(DecodeInfoOffset_NumUVs_ColorMode, Num, 3, 22); }
    void        SetColorMode(uint32 Mode)                { SetBits(DecodeInfoOffset_NumUVs_ColorMode, Mode, 2, 22+3); }
};
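```cpp
// Illustrative sketch, not part of NaniteResources.h: a layout mirror of
// FPackedCluster using plain types, assuming FVector/FIntVector are three
// 4-byte components and FSphere is four floats (center + radius), as in the
// float-based UE5 EA math types. Each group of members above fills exactly
// 16 bytes, so the whole struct is 8 float4s = 128 bytes, which matches
// NUM_PACKED_CLUSTER_FLOAT4S.
#include <cassert>
#include <cstddef>
#include <cstdint>

struct FPackedClusterLayoutSketch
{
    std::int32_t  QuantizedPosStart[3]; std::uint32_t NumVerts_PositionOffset;                // bytes 0..15
    float         MeshBoundsMin[3];     std::uint32_t NumTris_IndexOffset;                    // bytes 16..31
    float         MeshBoundsDelta[3];   std::uint32_t BitsPerIndex_QuantizedPosShift_PosBits; // bytes 32..47
    float         LODBounds[4];                                                               // bytes 48..63
    float         BoxBoundsCenter[3];   std::uint32_t LODErrorAndEdgeLength;                  // bytes 64..79
    float         BoxBoundsExtent[3];   std::uint32_t Flags;                                  // bytes 80..95
    std::uint32_t AttributeOffset_BitsPerAttribute, DecodeInfoOffset_NumUVs_ColorMode, UV_Prec, PackedMaterialInfo; // 96..111
    std::uint32_t ColorMin, ColorBits, GroupIndex, Pad0;                                      // bytes 112..127
};

constexpr std::size_t kNumPackedClusterFloat4s = 8; // NUM_PACKED_CLUSTER_FLOAT4S
static_assert(sizeof(FPackedClusterLayoutSketch) == kNumPackedClusterFloat4s * 16, "8 x float4 = 128 bytes");
static_assert(offsetof(FPackedClusterLayoutSketch, LODBounds) == 48, "culling data starts at byte 48");
static_assert(offsetof(FPackedClusterLayoutSketch, AttributeOffset_BitsPerAttribute) == 96, "material data starts at byte 96");
```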

// Page streaming state.
struct FPageStreamingState
{
    uint32            BulkOffset;
    uint32            BulkSize;
    uint32            PageUncompressedSize;
    uint32            DependenciesStart;
    uint32            DependenciesNum;
};

// Hierarchy fixup.
class FHierarchyFixup
{
public:
    FHierarchyFixup() {}

    FHierarchyFixup( uint32 InPageIndex, uint32 NodeIndex, uint32 ChildIndex, uint32 InClusterGroupPartStartIndex, uint32 PageDependencyStart, uint32 PageDependencyNum )
    {
        PageIndex = InPageIndex;
        HierarchyNodeAndChildIndex = ( NodeIndex << MAX_HIERACHY_CHILDREN_BITS ) | ChildIndex;
        ClusterGroupPartStartIndex = InClusterGroupPartStartIndex;
        PageDependencyStartAndNum = (PageDependencyStart << MAX_GROUP_PARTS_BITS) | PageDependencyNum;
    }

    uint32 GetPageIndex() const                        { return PageIndex; }
    uint32 GetNodeIndex() const                        { return HierarchyNodeAndChildIndex >> MAX_HIERACHY_CHILDREN_BITS; }
    uint32 GetChildIndex() const                    { return HierarchyNodeAndChildIndex & ( MAX_HIERACHY_CHILDREN - 1 ); }
    uint32 GetClusterGroupPartStartIndex() const    { return ClusterGroupPartStartIndex; }
    uint32 GetPageDependencyStart() const            { return PageDependencyStartAndNum >> MAX_GROUP_PARTS_BITS; }
    uint32 GetPageDependencyNum() const                { return PageDependencyStartAndNum & MAX_GROUP_PARTS_MASK; }

    uint32 PageIndex;
    uint32 HierarchyNodeAndChildIndex;
    uint32 ClusterGroupPartStartIndex;
    uint32 PageDependencyStartAndNum;
};

// Cluster fixup.
class FClusterFixup
{
public:
    FClusterFixup() {}

    FClusterFixup( uint32 PageIndex, uint32 ClusterIndex, uint32 PageDependencyStart, uint32 PageDependencyNum )
    {
        PageAndClusterIndex = ( PageIndex << MAX_CLUSTERS_PER_PAGE_BITS ) | ClusterIndex;
        PageDependencyStartAndNum = (PageDependencyStart << MAX_GROUP_PARTS_BITS) | PageDependencyNum;
    }
    
    uint32 GetPageIndex() const                { return PageAndClusterIndex >> MAX_CLUSTERS_PER_PAGE_BITS; }
    uint32 GetClusterIndex() const            { return PageAndClusterIndex & (MAX_CLUSTERS_PER_PAGE - 1u); }
    uint32 GetPageDependencyStart() const    { return PageDependencyStartAndNum >> MAX_GROUP_PARTS_BITS; }
    uint32 GetPageDependencyNum() const        { return PageDependencyStartAndNum & MAX_GROUP_PARTS_MASK; }

    uint32 PageAndClusterIndex;
    uint32 PageDependencyStartAndNum;
};
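```cpp
// Illustrative sketch, not part of NaniteResources.h: the index packing used by
// FHierarchyFixup and FClusterFixup, written as hypothetical free functions.
// Bit widths are copied from the constants above: MAX_HIERACHY_CHILDREN_BITS = 6,
// MAX_CLUSTERS_PER_PAGE_BITS = 10, MAX_GROUP_PARTS_BITS = 3. A 20-bit resource
// page index (MAX_RESOURCE_PAGES_BITS) plus a 10-bit cluster index needs 30 bits,
// so it fits in one uint32. The low field must be < 2^LowBits; the constructors
// above do not check this, and an oversized value would corrupt the high field.
#include <cassert>
#include <cstdint>

constexpr std::uint32_t kChildBits = 6;
constexpr std::uint32_t kClusterBits = 10;
constexpr std::uint32_t kGroupPartBits = 3;

// Place High above a LowBits-wide Low field.
constexpr std::uint32_t Pack(std::uint32_t High, std::uint32_t Low, std::uint32_t LowBits)
{
    return (High << LowBits) | Low;
}

constexpr std::uint32_t UnpackHigh(std::uint32_t Packed, std::uint32_t LowBits)
{
    return Packed >> LowBits;
}

constexpr std::uint32_t UnpackLow(std::uint32_t Packed, std::uint32_t LowBits)
{
    return Packed & ((1u << LowBits) - 1u);
}
```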

// Page disk header.
struct FPageDiskHeader
{
    uint32 GpuSize;
    uint32 NumClusters;
    uint32 NumRawFloat4s;
    uint32 NumTexCoords;
    uint32 DecodeInfoOffset;
    uint32 StripBitmaskOffset;
    uint32 VertexRefBitmaskOffset;
};

// Cluster disk header.
struct FClusterDiskHeader
{
    uint32 IndexDataOffset;
    uint32 VertexRefDataOffset;
    uint32 PositionDataOffset;
    uint32 AttributeDataOffset;
    uint32 NumPrevRefVerticesBeforeDwords;
    uint32 NumPrevNewVerticesBeforeDwords;
};

// Chunk fixup.
class FFixupChunk    //TODO: rename to something else
{
public:
    struct FHeader
    {
        uint16 NumClusters = 0;
        uint16 NumHierachyFixups = 0;
        uint16 NumClusterFixups = 0;
        uint16 Pad = 0;
    } Header;
    
    uint8 Data[ sizeof(FHierarchyFixup) * MAX_CLUSTERS_PER_PAGE + sizeof( FClusterFixup ) * MAX_CLUSTERS_PER_PAGE ];

    FClusterFixup&        GetClusterFixup( uint32 Index ) const { check( Index < Header.NumClusterFixups );  return ( (FClusterFixup*)( Data + Header.NumHierachyFixups * sizeof( FHierarchyFixup ) ) )[ Index ]; }
    FHierarchyFixup&    GetHierarchyFixup( uint32 Index ) const { check( Index < Header.NumHierachyFixups ); return ((FHierarchyFixup*)Data)[ Index ]; }
    uint32                GetSize() const { return sizeof( Header ) + Header.NumHierachyFixups * sizeof( FHierarchyFixup ) + Header.NumClusterFixups * sizeof( FClusterFixup ); }
};

// Instance draw parameters.
struct FInstanceDraw
{
    uint32 InstanceId;
    uint32 ViewId;
};

// Nanite render resources.
struct FResources
{
    // Persistent state.
    TArray< uint8 >                    RootClusterPage;        // Root page is loaded on resource load, so we always have something to draw.
    FByteBulkData                    StreamableClusterPages;    // Remaining pages are streamed on demand.
    TArray< uint16 >                ImposterAtlas;
    TArray< FPackedHierarchyNode >    HierarchyNodes;
    TArray< uint32 >                HierarchyRootOffsets;
    TArray< FPageStreamingState >    PageStreamingStates;
    TArray< uint32 >                PageDependencies;
    int32                            PositionPrecision = 0;
    bool    bLZCompressed            = false;

    // Runtime state.
    uint32    RuntimeResourceID        = 0xFFFFFFFFu;
    int32    HierarchyOffset            = INDEX_NONE;
    int32    RootPageIndex            = INDEX_NONE;
    uint32    NumHierarchyNodes        = 0;
    
    (......)
    
    ENGINE_API void InitResources();
    ENGINE_API bool ReleaseResources();
    ENGINE_API void Serialize(FArchive& Ar, UObject* Owner);
};

// GPU-side buffers holding Nanite resource data.
class FGlobalResources : public FRenderResource
{
public:
    struct PassBuffers
    {
        // Buffer of candidate (i.e. not yet culled) nodes and clusters.
        TRefCountPtr<FRDGPooledBuffer> CandidateNodesAndClustersBuffer;
        TRefCountPtr<FRDGPooledBuffer> StatsRasterizeArgsSWHWBuffer;
    };

    uint32 StatsRenderFlags = 0;
    uint32 StatsDebugFlags  = 0;

public:
    virtual void InitRHI() override;
    virtual void ReleaseRHI() override;

    ENGINE_API void    Update(FRDGBuilder& GraphBuilder); // Called once per frame before any Nanite rendering has occurred.

    ENGINE_API static uint32 GetMaxCandidateClusters();
    ENGINE_API static uint32 GetMaxVisibleClusters();
    ENGINE_API static uint32 GetMaxNodes();

    (......)
    
private:
    PassBuffers MainPassBuffers;
    PassBuffers PostPassBuffers;

    class FVertexFactory* VertexFactory = nullptr;

    TRefCountPtr<FRDGPooledBuffer> StatsBuffer;

    // Dummy structured buffer with stride8
    TRefCountPtr<FRDGPooledBuffer> StructureBufferStride8;

#if NANITE_USE_SCRATCH_BUFFERS
    TRefCountPtr<FRDGPooledBuffer> PrimaryVisibleClustersBuffer;
    // Used for scratch memory (transient only)
    TRefCountPtr<FRDGPooledBuffer> ScratchVisibleClustersBuffer;
    TRefCountPtr<FRDGPooledBuffer> ScratchOccludedInstancesBuffer;
#endif
};

extern ENGINE_API TGlobalResource< FGlobalResources > GGlobalResources;

} // namespace Nanite
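To make the bit packing in FClusterFixup concrete, here is a minimal standalone sketch: in each uint32, the page index (or dependency start) occupies the high bits and the cluster index (or dependency count) the low bits. The struct name and the constant values are assumptions for illustration only; the real MAX_CLUSTERS_PER_PAGE_BITS and MAX_GROUP_PARTS_BITS are defined by the engine and may differ.

```cpp
#include <cassert>
#include <cstdint>

// Assumed bit widths for illustration only; the engine defines its own constants.
constexpr uint32_t kMaxClustersPerPageBits = 8;
constexpr uint32_t kMaxClustersPerPage     = 1u << kMaxClustersPerPageBits;
constexpr uint32_t kMaxGroupPartsBits      = 3;
constexpr uint32_t kMaxGroupPartsMask      = (1u << kMaxGroupPartsBits) - 1u;

// Hypothetical stand-in for FClusterFixup: each 32-bit word packs two fields,
// the larger "page/start" field in the high bits and the small index/count in the low bits.
struct FClusterFixupSketch
{
    uint32_t PageAndClusterIndex;
    uint32_t PageDependencyStartAndNum;

    FClusterFixupSketch(uint32_t PageIndex, uint32_t ClusterIndex, uint32_t DepStart, uint32_t DepNum)
    {
        assert(ClusterIndex < kMaxClustersPerPage);  // must fit in the low bits
        assert(DepNum <= kMaxGroupPartsMask);        // must fit in the low bits
        PageAndClusterIndex       = (PageIndex << kMaxClustersPerPageBits) | ClusterIndex;
        PageDependencyStartAndNum = (DepStart << kMaxGroupPartsBits) | DepNum;
    }

    uint32_t GetPageIndex() const           { return PageAndClusterIndex >> kMaxClustersPerPageBits; }
    uint32_t GetClusterIndex() const        { return PageAndClusterIndex & (kMaxClustersPerPage - 1u); }
    uint32_t GetPageDependencyStart() const { return PageDependencyStartAndNum >> kMaxGroupPartsBits; }
    uint32_t GetPageDependencyNum() const   { return PageDependencyStartAndNum & kMaxGroupPartsMask; }
};
```

Under these assumed widths, FClusterFixupSketch(37, 200, 12, 5) stores (37 << 8) | 200 = 9672 in PageAndClusterIndex, and the getters recover all four fields. Packing two fields per word keeps each fixup at 8 bytes, which matters because FFixupChunk reserves room for one per cluster in a page.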

6.4.1.4 Cluster, ClusterGroup, Page

Because building Nanite data involves many concepts, they are explained together here.

The most central and fundamental concept in Nanite is the Cluster. A Cluster is a group of adjacent triangles:

Top: normal rendering; middle: triangle visualization; bottom: Cluster visualization.
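The idea of a cluster as a group of adjacent triangles can be illustrated with a simple greedy sketch: starting from an unassigned seed triangle, grow a cluster breadth-first across shared edges until it reaches a size cap. This is a deliberately simplified stand-in with a hypothetical function name. The engine appears to use a proper graph partitioner instead (note the FGraphPartitioner parameter of FCluster::Split), targeting FCluster::ClusterSize = 128 triangles per cluster.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical helper: greedy breadth-first clustering over shared edges.
// A simplified stand-in, NOT the engine's partitioner.
// Indexes is a triangle list (3 indices per triangle); returns a cluster id per triangle.
std::vector<int> GreedyClusterTriangles(const std::vector<uint32_t>& Indexes, int MaxTrisPerCluster)
{
    assert(MaxTrisPerCluster > 0);
    const int NumTris = static_cast<int>(Indexes.size() / 3);

    // Canonical (smaller, larger) key for the undirected edge starting at Corner.
    auto EdgeKey = [&Indexes](int Tri, int Corner) {
        uint32_t A = Indexes[Tri * 3 + Corner];
        uint32_t B = Indexes[Tri * 3 + (Corner + 1) % 3];
        return A < B ? std::make_pair(A, B) : std::make_pair(B, A);
    };

    // Triangle adjacency: map each undirected edge to the triangles that use it.
    std::map<std::pair<uint32_t, uint32_t>, std::vector<int>> EdgeToTris;
    for (int Tri = 0; Tri < NumTris; ++Tri)
        for (int Corner = 0; Corner < 3; ++Corner)
            EdgeToTris[EdgeKey(Tri, Corner)].push_back(Tri);

    std::vector<int> ClusterOf(NumTris, -1);
    int NextCluster = 0;
    for (int Seed = 0; Seed < NumTris; ++Seed)
    {
        if (ClusterOf[Seed] != -1) continue;

        // Grow a new cluster from this seed until it is full or runs out of unassigned neighbors.
        const int Cluster = NextCluster++;
        int Size = 0;
        std::queue<int> Frontier;
        Frontier.push(Seed);
        while (!Frontier.empty() && Size < MaxTrisPerCluster)
        {
            const int Tri = Frontier.front();
            Frontier.pop();
            if (ClusterOf[Tri] != -1) continue; // already claimed (a triangle may be queued twice)
            ClusterOf[Tri] = Cluster;
            ++Size;
            for (int Corner = 0; Corner < 3; ++Corner)
                for (int Neighbor : EdgeToTris[EdgeKey(Tri, Corner)])
                    if (ClusterOf[Neighbor] == -1) Frontier.push(Neighbor);
        }
    }
    return ClusterOf;
}
```

For the index list {0,1,2, 2,1,3, 4,5,6} (a quad made of two triangles plus one isolated triangle) and a cap of 2, both quad triangles land in cluster 0 and the isolated triangle in cluster 1. Greedy growth is easy to follow but says nothing about cluster shape or boundary length, which a dedicated partitioner typically tries to minimize.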

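A Cluster can be dynamically combined with adjacent Clusters, or with Clusters of an adjacent LOD, so that the image stays consistent and no noticeable popping occurs, as shown in the following video: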

Cluster-based rendering is not original to UE; it had already been used by Ubisoft and the Frostbite engine. For details, see GPU-Driven Rendering Pipeline and Optimizing the Graphics Pipeline with Compute.
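Seamless switching between Clusters of adjacent LODs relies on choosing one consistent set of clusters for a given error budget. A common formulation, sketched here under assumptions, draws a cluster when its own simplification error is within the threshold but the error of the coarser level that would replace it is not. The record type and function are hypothetical; the FClusterGroup fields MinLODError and MaxParentLODError suggest the engine keeps comparable per-group errors, but the engine is understood to test a view-dependent, projected error rather than a fixed number.

```cpp
#include <cassert>
#include <limits>
#include <vector>

// Hypothetical per-cluster record; names are illustrative, not the engine's.
struct FClusterLODInfoSketch
{
    float LODError;        // error of this cluster's simplification level
    float ParentLODError;  // error of the coarser level that would replace it
};

// Root clusters have no coarser replacement, so their parent error is infinite.
constexpr float kRootParentError = std::numeric_limits<float>::infinity();

// Select the "cut" through the cluster hierarchy for an error threshold:
// draw a cluster when it is accurate enough but its coarser parent is not.
// Assumes leaves have zero error and errors never decrease from child to parent,
// so that along any chain of refinements exactly one level passes the test.
std::vector<int> SelectClustersForThreshold(const std::vector<FClusterLODInfoSketch>& Clusters, float Threshold)
{
    std::vector<int> Selected;
    for (int i = 0; i < static_cast<int>(Clusters.size()); ++i)
    {
        const FClusterLODInfoSketch& C = Clusters[i];
        if (C.LODError <= Threshold && C.ParentLODError > Threshold)
            Selected.push_back(i);
    }
    return Selected;
}
```

With two leaf clusters (error 0, parent error 1) and their root (error 1, no parent), a threshold of 0.5 selects the two leaves and a threshold of 2 selects only the root. Because each cluster is decided independently, the test only yields a crack-free cut if errors are monotonic, which is why a builder would typically clamp a parent's error to at least that of its children.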

Below are the definitions of Cluster and the other basic types:

// Engine\Source\Developer\NaniteBuilder\Private\Cluster.h

// Mesh cluster; a model is partitioned into a number of clusters.
class FCluster
{
public:
    FCluster();
    FCluster( FCluster& SrcCluster, uint32 TriBegin, uint32 TriEnd, const TArray< uint32 >& TriIndexes );
    FCluster( const TArray< const FCluster*, TInlineAllocator<16> >& MergeList );
    FCluster(const TArray< FStaticMeshBuildVertex >& InVerts,const TArrayView< const uint32 >& InIndexes,
const TArrayView< const int32 >& InMaterialIndexes,const TBitArray<>& InBoundaryEdges,uint32 TriBegin, uint32 TriEnd, const TArray< uint32 >& TriIndexes, uint32 NumTexCoords, bool bHasColors );

    // Simplify the cluster; the desired triangle count can be specified.
    float    Simplify( uint32 NumTris );
    // Split the cluster.
    void    Split( FGraphPartitioner& Partitioner ) const;

    (......)

    static const uint32    ClusterSize = 128;

    // Counters.
    uint32        NumVerts = 0;
    uint32        NumTris = 0;
    uint32        NumTexCoords = 0;
    bool        bHasColors = false;

    // Mesh data.
    TArray< float >        Verts; // Vertices.
    TArray< uint32 >    Indexes; // Indices.
    TArray< int32 >        MaterialIndexes; // Material indices.
    TBitArray<>            BoundaryEdges; // Boundary edges.
    TBitArray<>            ExternalEdges; // External edges.
    uint32                NumExternalEdges; // Number of external edges.

    TMap< uint32, uint32 >    AdjacentClusters; // Adjacent clusters.

    // Bounds data.
    FBounds        Bounds; // Bounding box.
    FSphere        SphereBounds;
    FSphere        LODBounds;
    FVector        MeshBoundsMin; // Mesh bounds (minimum corner).
    FVector        MeshBoundsDelta;
    
    float        SurfaceArea = 0.0f;
    uint32        GUID = 0;
    int32        MipLevel = 0;

    // Quantized position data.
    TArray<FIntVector>    QuantizedPositions;
    FIntVector    QuantizedPosStart    = { 0u, 0u, 0u };
    uint32        QuantizedPosShift    = 0u;
    FIntVector  QuantizedPosBits    = {};

    float        EdgeLength = 0.0f;
    float        LODError = 0.0f;
    
    // Data about the group this cluster belongs to.
    uint32        GroupIndex            = MAX_uint32;
    uint32        GroupPartIndex        = MAX_uint32;
    uint32        GeneratingGroupIndex= MAX_uint32;

    // Material ranges.
    TArray<FMaterialRange, TInlineAllocator<4>> MaterialRanges;

    // Strip index data.
    FStripDesc        StripDesc;
    TArray<uint8>    StripIndexData;
};

// Engine\Source\Developer\NaniteBuilder\Private\ClusterDAG.h

// Cluster group: a collection of clusters.
struct FClusterGroup
{
    // Bounds.
    FSphere                Bounds;
    FSphere                LODBounds;
    // Errors.
    float                MinLODError;
    float                MaxParentLODError;
    // Mip level and mesh index.
    int32                MipLevel;
    uint32                MeshIndex;
    
    // Page index range.
    uint32                PageIndexStart;
    uint32                PageIndexNum;
    // Child indices.
    TArray< uint32 >    Children;

    friend FArchive& operator<<(FArchive& Ar, FClusterGroup& Group);
};

// Engine\Source\Developer\NaniteBuilder\Private\NaniteEncode.cpp

// All or part of an FClusterGroup after it has been split.
struct FClusterGroupPart
{
    TArray<uint32>    Clusters;    // May be reordered during page allocation, so a list has to be stored here.
    FBounds            Bounds;      // Bounding box.
    uint32            PageIndex;   // Page index.
    uint32            GroupIndex;     // Index of the owning group.
    uint32            HierarchyNodeIndex;  // Hierarchy node index.
    uint32            HierarchyChildIndex; // Hierarchy child index.
    uint32            PageClusterOffset;   // Offset into the page's cluster list.
};

// Sections of a page (the size of each part).
struct FPageSections
{
    uint32 Cluster            = 0;
    uint32 MaterialTable    = 0;
    uint32 DecodeInfo        = 0;
    uint32 Index            = 0;
    uint32 Position            = 0;
    uint32 Attribute        = 0;

    uint32 GetMaterialTableSize() const        { return Align(MaterialTable, 16); }
    uint32 GetClusterOffset() const            { return 0; }
    uint32 GetMaterialTableOffset() const    { return Cluster; }
    uint32 GetDecodeInfoOffset() const        { return Cluster + GetMaterialTableSize(); }
    uint32 GetIndexOffset() const            { return Cluster + GetMaterialTableSize() + DecodeInfo; }
    uint32 GetPositionOffset() const        { return Cluster + GetMaterialTableSize() + DecodeInfo + Index; }
    uint32 GetAttributeOffset() const        { return Cluster + GetMaterialTableSize() + DecodeInfo + Index + Position; }
    uint32 GetTotal() const                    { return Cluster + GetMaterialTableSize() + DecodeInfo + Index + Position + Attribute; }

    FPageSections GetOffsets() const
    {
        return FPageSections{ GetClusterOffset(), GetMaterialTableOffset(), GetDecodeInfoOffset(), GetIndexOffset(), GetPositionOffset(), GetAttributeOffset() };
    }

    void operator+=(const FPageSections& Other)
    {
        Cluster            +=    Other.Cluster;
        MaterialTable    +=    Other.MaterialTable;
        DecodeInfo        +=    Other.DecodeInfo;
        Index            +=    Other.Index;
        Position        +=    Other.Position;
        Attribute        +=    Other.Attribute;
    }
};

// Cluster page.
struct FPage
{
    uint32    PartsStartIndex = 0; // Start index of the FClusterGroupParts.
    uint32    PartsNum = 0; // Number of FClusterGroupParts.
    uint32    NumClusters = 0; // Number of clusters.

    FPageSections    GpuSizes; // GPU sizes.
};

// Encoding info.
struct FEncodingInfo
{
    uint32 BitsPerIndex; // Bits per index.
    uint32 BitsPerAttribute; // Bits per attribute.
    uint32 UVPrec; // UV precision.
    
    uint32        ColorMode; // Color mode.
    FIntVector4 ColorMin;  // Minimum color.
    FIntVector4 ColorBits; // Bits per color channel.

    FPageSections GpuSizes; // GPU sizes.

    // UV encoding info.
    FGeometryEncodingUVInfo UVInfos[MAX_NANITE_UVS];
};

// Intermediate node of the cluster hierarchy, used while building the hierarchy.
struct FIntermediateNode
{
    uint32                PartIndex    = MAX_uint32; // FClusterGroupPart index.
    uint32                MipLevel    = MAX_int32;  // Mip level.
    bool                bLeaf        = false; // Whether this is a leaf node.
    
    FBounds                Bound;    // Bounding box.
    TArray< uint32 >    Children; // Child nodes.
};

// Engine\Source\Developer\NaniteBuilder\Private\ImposterAtlas.h

// Atlas that clusters are rasterized into.
class FImposterAtlas
{
public:
    static constexpr uint32    AtlasSize    = 12;
    static constexpr uint32    TileSize    = 12;

                FImposterAtlas( TArray< uint16 >& InPixels, const FBounds& MeshBounds );
    // Rasterize all triangles of the given cluster into this FImposterAtlas.
    void        Rasterize( const FIntPoint& TilePos, const FCluster& Cluster, uint32 ClusterIndex );

private:
    TArray< uint16 >&    Pixels; 

    FVector        BoundsCenter;
    FVector        BoundsExtent;

    FMatrix        GetLocalToImposter( const FIntPoint& TilePos ) const;
};
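FPageSections places the sections of a GPU page back to back in a fixed order: cluster data, material table, decode info, index data, position data, then attribute data, with only the material table padded to a multiple of 16. The sketch below reproduces that offset arithmetic in standalone C++; AlignUp stands in for the engine's Align helper, and the struct name is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Round Value up to a multiple of Alignment (Alignment must be a power of two).
constexpr uint32_t AlignUp(uint32_t Value, uint32_t Alignment)
{
    return (Value + Alignment - 1u) & ~(Alignment - 1u);
}

// Simplified mirror of FPageSections: the size of each section of a page.
struct FPageSectionsSketch
{
    uint32_t Cluster = 0, MaterialTable = 0, DecodeInfo = 0, Index = 0, Position = 0, Attribute = 0;

    // Same ordering as the engine struct: sections are laid out back to back,
    // and only the material table is padded (to 16).
    FPageSectionsSketch GetOffsets() const
    {
        FPageSectionsSketch O;
        O.Cluster       = 0;
        O.MaterialTable = Cluster;
        O.DecodeInfo    = O.MaterialTable + AlignUp(MaterialTable, 16);
        O.Index         = O.DecodeInfo + DecodeInfo;
        O.Position      = O.Index + Index;
        O.Attribute     = O.Position + Position;
        return O;
    }

    // Total size = offset of the last section + its size.
    uint32_t GetTotal() const { return GetOffsets().Attribute + Attribute; }
};
```

For example, with section sizes 256, 20, 8, 100, 300 and 400, the material table is padded from 20 to 32, giving offsets 0, 256, 288, 296, 396 and 696 and a total of 1096, matching what GetOffsets() and GetTotal() in the engine struct compute.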

6.4.2 Building Nanite Data

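This subsection describes the preprocessing Nanite performs before rendering, including how Nanite's static data is built and the call flow involved.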

6.4.2.1 BuildNaniteFromHiResSourceModel

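Nanite builds the data it needs from the highest-resolution model through the BuildNaniteFromHiResSourceModel function. It is similar to FStaticMeshBuilder::Build(), but skips the mesh reduction step. This process is called Nanite fractional cut (Nanite-fractional-cut). The process is as follows: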

// Engine\Source\Developer\MeshBuilder\Private\StaticMeshBuilder.cpp

static bool BuildNaniteFromHiResSourceModel(
    UStaticMesh* StaticMesh, 
    const FMeshNaniteSettings NaniteSettings, 
    FBoxSphereBounds& HiResBoundsOut, 
    Nanite::FResources& NaniteResourcesOut)
{
    // Skip static meshes that have no valid hi-res mesh.
    if (ensure(StaticMesh->IsHiResMeshDescriptionValid()) == false)
    {
        return false;
    }

    TRACE_CPUPROFILER_EVENT_SCOPE(FStaticMeshBuilder::BuildNaniteFromHiResSourceModel);

    // Fetch the model data.
    FMeshDescription HiResMeshDescription = *StaticMesh->GetHiResMeshDescription();
    FStaticMeshSourceModel& HiResSrcModel = StaticMesh->GetHiResSourceModel();
    FMeshBuildSettings& HiResBuildSettings = HiResSrcModel.BuildSettings;

    // Compute tangents, lightmap UVs, etc.
    FMeshDescriptionHelper MeshDescriptionHelper(&HiResBuildSettings);
    MeshDescriptionHelper.SetupRenderMeshDescription(StaticMesh, HiResMeshDescription);

    // Build temporary RenderData to pass on to the subsequent Nanite build stage.
    FStaticMeshRenderData HiResTempRenderData;
    HiResTempRenderData.AllocateLODResources(1);
    // Note that this fetches the LOD with index 0 (i.e. the highest-resolution data).
    FStaticMeshLODResources& HiResStaticMeshLOD = HiResTempRenderData.LODResources[0];
    HiResStaticMeshLOD.MaxDeviation = 0.0f;

    // Prepare the PerSectionIndices array so that the index buffer supplied to the GPU can be optimized.
    TArray<TArray<uint32>> PerSectionIndices;
    PerSectionIndices.AddDefaulted(HiResMeshDescription.PolygonGroups().Num());
    HiResStaticMeshLOD.Sections.Empty(HiResMeshDescription.PolygonGroups().Num());

    // Build the vertex and index buffers. WedgeMap and RemapVerts are not needed.
    TArray<int32> WedgeMap, RemapVerts;
    TArray<FStaticMeshBuildVertex> StaticMeshBuildVertices;
    BuildVertexBuffer(StaticMesh, HiResMeshDescription, HiResBuildSettings, WedgeMap, HiResStaticMeshLOD.Sections, PerSectionIndices, StaticMeshBuildVertices, MeshDescriptionHelper.GetOverlappingCorners(), RemapVerts);
    WedgeMap.Empty();

    const uint32 NumTextureCoord = HiResMeshDescription.VertexInstanceAttributes().GetAttributesRef<FVector2D>(MeshAttribute::VertexInstance::TextureCoordinate).GetNumChannels();

    // Only the render data and vertex data are needed from here on, so the MeshDescription can be freed.
    HiResMeshDescription.Empty();

    // Concatenate the per-section index buffers.
    TArray<uint32> CombinedIndices;
    bool bNeeds32BitIndices = false;
    BuildCombinedSectionIndices(PerSectionIndices, HiResStaticMeshLOD, CombinedIndices, bNeeds32BitIndices);

    // Compute the bounds from the hi-res mesh before the Nanite build, because the build modifies StaticMeshBuildVertices.
    ComputeBoundsFromVertexList(StaticMeshBuildVertices, HiResBoundsOut.Origin, HiResBoundsOut.BoxExtent, HiResBoundsOut.SphereRadius);

    // The Nanite build requires section material indices to already be resolved from the SectionInfoMap, because the indices are baked into FMaterialTriangles.
    for (int32 SectionIndex = 0; SectionIndex < HiResStaticMeshLOD.Sections.Num(); SectionIndex++)
    {
        HiResStaticMeshLOD.Sections[SectionIndex].MaterialIndex = StaticMesh->GetSectionInfoMap().Get(0, SectionIndex).MaterialIndex;
    }

    // Run the Nanite build.
    {
        TRACE_CPUPROFILER_EVENT_SCOPE(FStaticMeshBuilder::BuildNaniteFromHiResSourceModel::Nanite);
        Nanite::IBuilderModule& NaniteBuilderModule = Nanite::IBuilderModule::Get();
        if (!NaniteBuilderModule.Build(NaniteResourcesOut, StaticMeshBuildVertices, CombinedIndices, HiResStaticMeshLOD.Sections, NumTextureCoord, NaniteSettings))
        {
            UE_LOG(LogStaticMesh, Error, TEXT("Failed to build Nanite for HiRes static mesh. See previous line(s) for details."));
            return false;
        }
    }

    return true;
}

The code above relies on several important functions, which are analyzed below:

// Engine\Source\Runtime\Engine\Private\StaticMesh.cpp

// Whether a valid hi-res mesh exists.
bool UStaticMesh::IsHiResMeshDescriptionValid() const
{
    const FStaticMeshSourceModel& SourceModel = GetHiResSourceModel();
    return SourceModel.IsMeshDescriptionValid();
}


// Engine\Source\Developer\MeshBuilder\Private\MeshDescriptionHelper.cpp

void FMeshDescriptionHelper::SetupRenderMeshDescription(UObject* Owner, FMeshDescription& RenderMeshDescription)
{
    TRACE_CPUPROFILER_EVENT_SCOPE(FMeshDescriptionHelper::GetRenderMeshDescription);

    UStaticMesh* StaticMesh = Cast<UStaticMesh>(Owner);

    const bool bNaniteBuildEnabled = StaticMesh->NaniteSettings.bEnabled;
    float ComparisonThreshold = (BuildSettings->bRemoveDegenerates && !bNaniteBuildEnabled) ? THRESH_POINTS_ARE_SAME : 0.0f;
    
    // Make sure polygon normals, tangents and binormals are computed; this also removes degenerate triangles from the render mesh description.
    FStaticMeshOperations::ComputeTriangleTangentsAndNormals(RenderMeshDescription, ComparisonThreshold);

    FVertexInstanceArray& VertexInstanceArray = RenderMeshDescription.VertexInstances();

    FStaticMeshAttributes Attributes(RenderMeshDescription);
    TVertexInstanceAttributesRef<FVector> Normals = Attributes.GetVertexInstanceNormals();
    TVertexInstanceAttributesRef<FVector> Tangents = Attributes.GetVertexInstanceTangents();
    TVertexInstanceAttributesRef<float> BinormalSigns = Attributes.GetVertexInstanceBinormalSigns();

    // Find overlapping corners to accelerate adjacency lookups.
    FStaticMeshOperations::FindOverlappingCorners(OverlappingCorners, RenderMeshDescription, ComparisonThreshold);

    // Static meshes always blend the normals of overlapping corners.
    EComputeNTBsFlags ComputeNTBsOptions = EComputeNTBsFlags::BlendOverlappingNormals;
    ComputeNTBsOptions |= BuildSettings->bComputeWeightedNormals ? EComputeNTBsFlags::WeightedNTBs : EComputeNTBsFlags::None;
    ComputeNTBsOptions |= BuildSettings->bRecomputeNormals ? EComputeNTBsFlags::Normals : EComputeNTBsFlags::None;
    ComputeNTBsOptions |= BuildSettings->bUseMikkTSpace ? EComputeNTBsFlags::UseMikkTSpace : EComputeNTBsFlags::None;

    // Tangents are not recomputed (and degenerate triangles are not ignored) for Nanite meshes.
    if (!bNaniteBuildEnabled)
    {
        ComputeNTBsOptions |= BuildSettings->bRemoveDegenerates ? EComputeNTBsFlags::IgnoreDegenerateTriangles : EComputeNTBsFlags::None;
        ComputeNTBsOptions |= BuildSettings->bRecomputeTangents ? EComputeNTBsFlags::Tangents : EComputeNTBsFlags::None;
    }

    // Compute any missing normals or tangents.
    FStaticMeshOperations::ComputeTangentsAndNormals(RenderMeshDescription, ComputeNTBsOptions);

    // Generate lightmap UVs.
    if (BuildSettings->bGenerateLightmapUVs && VertexInstanceArray.Num() > 0)
    {
        TVertexInstanceAttributesRef<FVector2D> VertexInstanceUVs = Attributes.GetVertexInstanceUVs();
        int32 NumIndices = VertexInstanceUVs.GetNumChannels();
        //Verify the src light map channel
        if (BuildSettings->SrcLightmapIndex >= NumIndices)
        {
            BuildSettings->SrcLightmapIndex = 0;
        }
        //Verify the destination light map channel
        if (BuildSettings->DstLightmapIndex >= NumIndices)
        {
            //Make sure we do not add illegal UV Channel index
            if (BuildSettings->DstLightmapIndex >= MAX_MESH_TEXTURE_COORDS_MD)
            {
                BuildSettings->DstLightmapIndex = MAX_MESH_TEXTURE_COORDS_MD - 1;
            }

            //Add some unused UVChannel to the mesh description for the lightmapUVs
            VertexInstanceUVs.SetNumChannels(BuildSettings->DstLightmapIndex + 1);
            BuildSettings->DstLightmapIndex = NumIndices;
        }
        FStaticMeshOperations::CreateLightMapUVLayout(RenderMeshDescription,
            BuildSettings->SrcLightmapIndex,
            BuildSettings->DstLightmapIndex,
            BuildSettings->MinLightmapResolution,
            (ELightmapUVVersion)StaticMesh->GetLightmapUVVersion(),
            OverlappingCorners);
    }
}


// Engine\Source\Developer\MeshBuilder\Private\StaticMeshBuilder.cpp

// Build the vertex buffer.
void BuildVertexBuffer(
      UStaticMesh *StaticMesh
    , const FMeshDescription& MeshDescription
    , const FMeshBuildSettings& BuildSettings
    , TArray<int32>& OutWedgeMap
    , FStaticMeshSectionArray& OutSections
    , TArray<TArray<uint32> >& OutPerSectionIndices
    , TArray< FStaticMeshBuildVertex >& StaticMeshBuildVertices
    , const FOverlappingCorners& OverlappingCorners
    , TArray<int32>& RemapVerts)
{
    TRACE_CPUPROFILER_EVENT_SCOPE(BuildVertexBuffer);

    TArray<int32> RemapVertexInstanceID;
    // Set up the vertex buffer elements.
    const int32 NumVertexInstances = MeshDescription.VertexInstances().GetArraySize();
    StaticMeshBuildVertices.Reserve(NumVertexInstances);

    FStaticMeshConstAttributes Attributes(MeshDescription);

    TPolygonGroupAttributesConstRef<FName> PolygonGroupImportedMaterialSlotNames = Attributes.GetPolygonGroupMaterialSlotNames();
    TVertexAttributesConstRef<FVector> VertexPositions = Attributes.GetVertexPositions();
    TVertexInstanceAttributesConstRef<FVector> VertexInstanceNormals = Attributes.GetVertexInstanceNormals();
    TVertexInstanceAttributesConstRef<FVector> VertexInstanceTangents = Attributes.GetVertexInstanceTangents();
    TVertexInstanceAttributesConstRef<float> VertexInstanceBinormalSigns = Attributes.GetVertexInstanceBinormalSigns();
    TVertexInstanceAttributesConstRef<FVector4> VertexInstanceColors = Attributes.GetVertexInstanceColors();
    TVertexInstanceAttributesConstRef<FVector2D> VertexInstanceUVs = Attributes.GetVertexInstanceUVs();

    const bool bHasColors = VertexInstanceColors.IsValid();
    const bool bIgnoreTangents = StaticMesh->NaniteSettings.bEnabled;

    const uint32 NumTextureCoord = VertexInstanceUVs.GetNumChannels();
    const FMatrix ScaleMatrix = FScaleMatrix(BuildSettings.BuildScale3D).Inverse().GetTransposed();

    TMap<FPolygonGroupID, int32> PolygonGroupToSectionIndex;

    for (const FPolygonGroupID PolygonGroupID : MeshDescription.PolygonGroups().GetElementIDs())
    {
        int32& SectionIndex = PolygonGroupToSectionIndex.FindOrAdd(PolygonGroupID);
        SectionIndex = OutSections.Add(FStaticMeshSection());
        FStaticMeshSection& StaticMeshSection = OutSections[SectionIndex];
        StaticMeshSection.MaterialIndex = StaticMesh->GetMaterialIndexFromImportedMaterialSlotName(PolygonGroupImportedMaterialSlotNames[PolygonGroupID]);
        if (StaticMeshSection.MaterialIndex == INDEX_NONE)
        {
            StaticMeshSection.MaterialIndex = PolygonGroupID.GetValue();
        }
    }

    int32 ReserveIndicesCount = MeshDescription.Triangles().Num() * 3;

    // Fill the remap array.
    RemapVerts.AddZeroed(ReserveIndicesCount);
    for (int32& RemapIndex : RemapVerts)
    {
        RemapIndex = INDEX_NONE;
    }

    // Initialize the wedge map OutWedgeMap.
    OutWedgeMap.Reset();
    OutWedgeMap.AddZeroed(ReserveIndicesCount);

    float VertexComparisonThreshold = BuildSettings.bRemoveDegenerates ? THRESH_POINTS_ARE_SAME : 0.0f;

    int32 WedgeIndex = 0;
    for (const FTriangleID TriangleID : MeshDescription.Triangles().GetElementIDs())
    {
        const FPolygonGroupID PolygonGroupID = MeshDescription.GetTrianglePolygonGroup(TriangleID);
        const int32 SectionIndex = PolygonGroupToSectionIndex[PolygonGroupID];
        TArray<uint32>& SectionIndices = OutPerSectionIndices[SectionIndex];

        TArrayView<const FVertexID> VertexIDs = MeshDescription.GetTriangleVertices(TriangleID);

        FVector CornerPositions[3];
        for (int32 TriVert = 0; TriVert < 3; ++TriVert)
        {
            CornerPositions[TriVert] = VertexPositions[VertexIDs[TriVert]];
        }
        FOverlappingThresholds OverlappingThresholds;
        OverlappingThresholds.ThresholdPosition = VertexComparisonThreshold;
        // Skip degenerate triangles (two or more corners coincide).
        if (PointsEqual(CornerPositions[0], CornerPositions[1], OverlappingThresholds)
            || PointsEqual(CornerPositions[0], CornerPositions[2], OverlappingThresholds)
            || PointsEqual(CornerPositions[1], CornerPositions[2], OverlappingThresholds))
        {
            WedgeIndex += 3;
            continue;
        }

        TArrayView<const FVertexInstanceID> VertexInstanceIDs = MeshDescription.GetTriangleVertexInstances(TriangleID);
        for (int32 TriVert = 0; TriVert < 3; ++TriVert, ++WedgeIndex)
        {
            const FVertexInstanceID VertexInstanceID = VertexInstanceIDs[TriVert];
            const FVector& VertexPosition = CornerPositions[TriVert];
            const FVector& VertexInstanceNormal = VertexInstanceNormals[VertexInstanceID];
            const FVector& VertexInstanceTangent = VertexInstanceTangents[VertexInstanceID];
            const float VertexInstanceBinormalSign = VertexInstanceBinormalSigns[VertexInstanceID];

            FStaticMeshBuildVertex StaticMeshVertex;

            StaticMeshVertex.Position = VertexPosition * BuildSettings.BuildScale3D;
            // For Nanite meshes, assign a fixed tangent and bitangent directly.
            if( bIgnoreTangents )
            {
                StaticMeshVertex.TangentX = FVector( 1.0f, 0.0f, 0.0f );
                StaticMeshVertex.TangentY = FVector( 0.0f, 1.0f, 0.0f );
            }
            else
            {
                StaticMeshVertex.TangentX = ScaleMatrix.TransformVector(VertexInstanceTangent).GetSafeNormal();
                StaticMeshVertex.TangentY = ScaleMatrix.TransformVector(FVector::CrossProduct(VertexInstanceNormal, VertexInstanceTangent) * VertexInstanceBinormalSign).GetSafeNormal();
            }
            StaticMeshVertex.TangentZ = ScaleMatrix.TransformVector(VertexInstanceNormal).GetSafeNormal();
                
            if (bHasColors)
            {
                const FVector4& VertexInstanceColor = VertexInstanceColors[VertexInstanceID];
                const FLinearColor LinearColor(VertexInstanceColor);
                StaticMeshVertex.Color = LinearColor.ToFColor(true);
            }
            else
            {
                StaticMeshVertex.Color = FColor::White;
            }

            const uint32 MaxNumTexCoords = FMath::Min<int32>(MAX_MESH_TEXTURE_COORDS_MD, MAX_STATIC_TEXCOORDS);
            for (uint32 UVIndex = 0; UVIndex < MaxNumTexCoords; ++UVIndex)
            {
                if(UVIndex < NumTextureCoord)
                {
                    StaticMeshVertex.UVs[UVIndex] = VertexInstanceUVs.Get(VertexInstanceID, UVIndex);
                }
                else
                {
                    StaticMeshVertex.UVs[UVIndex] = FVector2D(0.0f, 0.0f);
                }
            }
                    
            // Never add duplicate vertex instances. WedgeIndex is used because OverlappingCorners was built from it.
            const TArray<int32>& DupVerts = OverlappingCorners.FindIfOverlapping(WedgeIndex);

            int32 Index = INDEX_NONE;
            for (int32 k = 0; k < DupVerts.Num(); k++)
            {
                if (DupVerts[k] >= WedgeIndex)
                {
                    break;
                }
                int32 Location = RemapVerts.IsValidIndex(DupVerts[k]) ? RemapVerts[DupVerts[k]] : INDEX_NONE;
                if (Location != INDEX_NONE && AreVerticesEqual(StaticMeshVertex, StaticMeshBuildVertices[Location], VertexComparisonThreshold))
                {
                    Index = Location;
                    break;
                }
            }
            if (Index == INDEX_NONE)
            {
                Index = StaticMeshBuildVertices.Add(StaticMeshVertex);
            }
            RemapVerts[WedgeIndex] = Index;
            OutWedgeMap[WedgeIndex] = Index;
            SectionIndices.Add( Index );
        }
    }

    // Optimize before setting up the buffers.
    if (NumVertexInstances < 100000 * 3)
    {
        BuildOptimizationHelper::CacheOptimizeVertexAndIndexBuffer(StaticMeshBuildVertices, OutPerSectionIndices, OutWedgeMap);
    }
}

// Build the combined section indices.
static void BuildCombinedSectionIndices(
    const TArray<TArray<uint32>>& PerSectionIndices, 
    FStaticMeshLODResources& StaticMeshLODInOut, 
    TArray<uint32>& CombinedIndicesOut,
    bool& bNeeds32BitIndicesOut )
{
    bNeeds32BitIndicesOut = false;
    for (int32 SectionIndex = 0; SectionIndex < StaticMeshLODInOut.Sections.Num(); SectionIndex++)
    {
        FStaticMeshSection& Section = StaticMeshLODInOut.Sections[SectionIndex];
        const TArray<uint32>& SectionIndices = PerSectionIndices[SectionIndex];
        Section.FirstIndex = 0;
        Section.NumTriangles = 0;
        Section.MinVertexIndex = 0;
        Section.MaxVertexIndex = 0;

        if (SectionIndices.Num())
        {
            Section.FirstIndex = CombinedIndicesOut.Num();
            Section.NumTriangles = SectionIndices.Num() / 3;

            CombinedIndicesOut.AddUninitialized(SectionIndices.Num());
            uint32* DestPtr = &CombinedIndicesOut[Section.FirstIndex];
            uint32 const* SrcPtr = SectionIndices.GetData();

            Section.MinVertexIndex = *SrcPtr;
            Section.MaxVertexIndex = *SrcPtr;

            for (int32 Index = 0; Index < SectionIndices.Num(); Index++)
            {
                uint32 VertIndex = *SrcPtr++;

                bNeeds32BitIndicesOut |= (VertIndex > MAX_uint16);
                Section.MinVertexIndex = FMath::Min<uint32>(VertIndex, Section.MinVertexIndex);
                Section.MaxVertexIndex = FMath::Max<uint32>(VertIndex, Section.MaxVertexIndex);
                *DestPtr++ = VertIndex;
            }
        }
    }
}

// Compute the bounding box and bounding sphere from the vertices.
static void ComputeBoundsFromVertexList(const TArray<FStaticMeshBuildVertex>& Vertices, FVector& OriginOut, FVector& ExtentOut, float& RadiusOut)
{
    // Compute the bounding box.
    FBox BoundingBox(ForceInit);
    for (int32 VertexIndex = 0; VertexIndex < Vertices.Num(); VertexIndex++)
    {
        BoundingBox += Vertices[VertexIndex].Position;
    }
    BoundingBox.GetCenterAndExtents(OriginOut, ExtentOut);

    // Compute the sphere, using the box center as the sphere center.
    RadiusOut = 0.0f;
    for (int32 VertexIndex = 0; VertexIndex < Vertices.Num(); VertexIndex++)
    {
        RadiusOut = FMath::Max((Vertices[VertexIndex].Position-OriginOut).Size(), RadiusOut);
    }
}

Much of the logic above is the same as for ordinary static meshes, but there are the following differences:

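  • Nanite's source model is the very-high-resolution model HiResSourceModel.
  • Nanite meshes skip the tangent and bitangent computation as well as the mesh reduction step.
  • Finally, Nanite::IBuilderModule::Build is called to actually build the Nanite mesh data; see the next subsection for details.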
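BuildVertexBuffer turns per-corner data (wedges) into a shared vertex buffer plus index lists, reusing an existing vertex whenever an overlapping corner has equal attributes. Below is a simplified sketch of that welding step under stated assumptions: the vertex layout and function name are hypothetical, and vertices are compared exactly through a std::map keyed on all attributes, whereas the engine only compares candidates found through FOverlappingCorners and uses AreVerticesEqual with VertexComparisonThreshold.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical, simplified vertex: position, normal and one UV set.
struct SimpleVertex
{
    std::array<float, 8> Data; // px, py, pz, nx, ny, nz, u, v
};

// Welds corners ("wedges") with identical attributes into a shared vertex buffer.
// Uses exact, tolerance-free comparison (std::array's lexicographic operator<).
void WeldWedges(const std::vector<SimpleVertex>& Wedges,
                std::vector<SimpleVertex>& OutVertices,
                std::vector<uint32_t>& OutIndices)
{
    std::map<std::array<float, 8>, uint32_t> Seen; // attributes -> vertex index
    OutVertices.clear();
    OutIndices.clear();
    for (const SimpleVertex& W : Wedges)
    {
        auto It = Seen.find(W.Data);
        if (It == Seen.end())
        {
            // First time these attributes appear: append a new vertex.
            const uint32_t NewIndex = static_cast<uint32_t>(OutVertices.size());
            OutVertices.push_back(W);
            It = Seen.emplace(W.Data, NewIndex).first;
        }
        OutIndices.push_back(It->second); // one index per wedge
    }
}
```

Welding the six corners of a two-triangle quad yields four unique vertices and the index list {0, 1, 2, 0, 2, 3}. Keying a map on the whole attribute tuple keeps the sketch short; a corner-based lookup like the engine's avoids comparing corners that cannot overlap in the first place and allows a position tolerance.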

6.4.2.2 BuildNaniteData

This subsection describes how Nanite mesh data is built.

// Engine\Source\Developer\NaniteBuilder\Private\NaniteBuilder.cpp

bool FBuilderModule::Build(
    FResources& Resources,
    TArray< FStaticMeshBuildVertex>& Vertices,
    TArray< uint32 >& TriangleIndices,
    TArray< FStaticMeshSection, TInlineAllocator<1>>& Sections,
    uint32 NumTexCoords,
    const FMeshNaniteSettings& Settings)
{
    TRACE_CPUPROFILER_EVENT_SCOPE(Nanite::Build);

    check(Sections.Num() > 0 && Sections.Num() <= 64);

    // Build an array associating each triangle with its material index.
    TArray<int32> MaterialIndices;
    {
        TRACE_CPUPROFILER_EVENT_SCOPE(Nanite::BuildSections);
        // There is one material index per triangle.
        MaterialIndices.Reserve(TriangleIndices.Num() / 3);
        for (int32 SectionIndex = 0; SectionIndex < Sections.Num(); SectionIndex++)
        {
            FStaticMeshSection& Section = Sections[SectionIndex];

            check(Section.MaterialIndex != INDEX_NONE);
            for (uint32 i = 0; i < Section.NumTriangles; ++i)
            {
                MaterialIndices.Add(Section.MaterialIndex);
            }
        }
    }

    TArray<uint32> MeshTriangleCounts;
    MeshTriangleCounts.Add(TriangleIndices.Num() / 3);

    // Make sure every triangle has a material index.
    check(MaterialIndices.Num() * 3 == TriangleIndices.Num());

    // Build the Nanite data.
    return BuildNaniteData(
        Resources,
        Vertices,
        TriangleIndices,
        MaterialIndices,
        MeshTriangleCounts,
        Sections,
        NumTexCoords,
        Settings
    );
}

// Build the Nanite data.
static bool BuildNaniteData(
    FResources& Resources,
    TArray< FStaticMeshBuildVertex >& Verts, // TODO: Do not require this vertex type for all users of Nanite
    TArray< uint32 >& Indexes,
    TArray< int32 >& MaterialIndexes,
    TArray<uint32>& MeshTriangleCounts,
    TArray< FStaticMeshSection, TInlineAllocator<1> >& Sections,
    uint32 NumTexCoords,
    const FMeshNaniteSettings& Settings
)
{
    TRACE_CPUPROFILER_EVENT_SCOPE(Nanite::BuildData);

    if (NumTexCoords > MAX_NANITE_UVS) NumTexCoords = MAX_NANITE_UVS;

    FBounds    VertexBounds;
    uint32 Channel = 255; // Used to detect whether the vertices carry meaningful color data.
    for( auto& Vert : Verts )
    {
        VertexBounds += Vert.Position;

        Channel &= Vert.Color.R;
        Channel &= Vert.Color.G;
        Channel &= Vert.Color.B;
        Channel &= Vert.Color.A;
    }

    const uint32 NumMeshes = MeshTriangleCounts.Num();
    
    // Color data is only considered present if the colors are not all white.
    bool bHasColors = Channel != 255;

    TArray< uint32 > ClusterCountPerMesh;
    TArray< FCluster > Clusters;
    {
        uint32 BaseTriangle = 0;
        // Iterate over the meshes (MeshTriangleCounts has one entry per mesh) and build one or more clusters for each.
        for (uint32 NumTriangles : MeshTriangleCounts)
        {
            uint32 NumClustersBefore = Clusters.Num();
            if (NumTriangles)
            {
                // Build one or more clusters for this mesh. TArrayView is used to reference the existing arrays without copying them.
                // ClusterTriangles is analyzed in detail later.
                ClusterTriangles(Verts, TArrayView< const uint32 >( &Indexes[BaseTriangle * 3], NumTriangles * 3 ),
                                        TArrayView< const int32 >( &MaterialIndexes[BaseTriangle], NumTriangles ),
                                        Clusters, VertexBounds, NumTexCoords, bHasColors);
            }
            // Record the number of clusters for each mesh.
            ClusterCountPerMesh.Add(Clusters.Num() - NumClustersBefore);
            BaseTriangle += NumTriangles;
        }
    }
    
    const int32 OldTriangleCount = Indexes.Num() / 3;
    const int32 MinTriCount = 2000;
    // Whether to replace the original static mesh data with a coarse representation.
    const bool bUseCoarseRepresentation = Settings.PercentTriangles < 1.0f && OldTriangleCount > MinTriCount;

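    // If the original vertex buffer is going to be replaced by the coarse representation, discard the old copies now,
    // since the data has already been copied into the cluster representation. Doing this before the longer DAG reduce phase shortens the period of peak memory usage.
    // This is especially important when building multiple huge Nanite meshes in parallel.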
    if (bUseCoarseRepresentation)
    {
        check(MeshTriangleCounts.Num() == 1);
        Verts.Empty();
        Indexes.Empty();
        MaterialIndexes.Empty();
    }

    uint32 Time0 = FPlatformTime::Cycles();

    FBounds MeshBounds;
    TArray<FClusterGroup> Groups; // List of cluster groups.
    {
        TRACE_CPUPROFILER_EVENT_SCOPE(Nanite::Build::DAG.Reduce);
        
        uint32 ClusterStart = 0;
        for (uint32 MeshIndex = 0; MeshIndex < NumMeshes; MeshIndex++)
        {
            uint32 NumClusters = ClusterCountPerMesh[MeshIndex];
            // Build the DAG (Directed Acyclic Graph) to progressively reduce and simplify the mesh, appending the resulting clusters and groups to the corresponding arrays.
            BuildDAG( Groups, Clusters, ClusterStart, NumClusters, MeshIndex, MeshBounds );
            ClusterStart += NumClusters;
        }
    }

    uint32 ReduceTime = FPlatformTime::Cycles();
    UE_LOG(LogStaticMesh, Log, TEXT("Reduce [%.2fs]"), FPlatformTime::ToMilliseconds(ReduceTime - Time0) / 1000.0f);

    // Use the coarse representation.
    if (bUseCoarseRepresentation)
    {
        const uint32 CoarseStartTime = FPlatformTime::Cycles();
        int32 CoarseTriCount = FMath::Max(MinTriCount, int32((float(OldTriangleCount) * Settings.PercentTriangles)));

        TArray<FStaticMeshSection, TInlineAllocator<1>> CoarseSections = Sections;
        // Build the coarse representation.
        BuildCoarseRepresentation(Groups, Clusters, Verts, Indexes, CoarseSections, NumTexCoords, CoarseTriCount);

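        // Fix up the mesh section info with the coarse mesh ranges, preserving the original order and keeping
        // materials that end up with no assigned triangles (because of decimation).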

        for (FStaticMeshSection& Section : Sections)
        {
            // For each section, try to find a matching entry in the coarse version.
            const FStaticMeshSection* CoarseSection = CoarseSections.FindByPredicate(
                [&Section](const FStaticMeshSection& CoarseSectionIter)
            {
                return CoarseSectionIter.MaterialIndex == Section.MaterialIndex;
            });

            // Found a matching entry.
            if (CoarseSection != nullptr)
            {
                Section.FirstIndex     = CoarseSection->FirstIndex;
                Section.NumTriangles   = CoarseSection->NumTriangles;
                Section.MinVertexIndex = CoarseSection->MinVertexIndex;
                Section.MaxVertexIndex = CoarseSection->MaxVertexIndex;
            }
            // No matching entry found.
            else
            {
                // The section was removed by decimation; set a placeholder entry.
                Section.FirstIndex     = 0;
                Section.NumTriangles   = 0;
                Section.MinVertexIndex = 0;
                Section.MaxVertexIndex = 0;
            }
        }

        const uint32 CoarseEndTime = FPlatformTime::Cycles();
        UE_LOG(LogStaticMesh, Log, TEXT("Coarse [%.2fs], original tris: %d, coarse tris: %d"), FPlatformTime::ToMilliseconds(CoarseEndTime - CoarseStartTime) / 1000.0f, OldTriangleCount, CoarseTriCount);
    }

    uint32 EncodeTime0 = FPlatformTime::Cycles();

    // Encode the Nanite mesh.
    Encode( Resources, Settings, Clusters, Groups, MeshBounds, NumMeshes, NumTexCoords, bHasColors );

    uint32 EncodeTime1 = FPlatformTime::Cycles();
    UE_LOG( LogStaticMesh, Log, TEXT("Encode [%.2fs]"), FPlatformTime::ToMilliseconds( EncodeTime1 - EncodeTime0 ) / 1000.0f );

    // Only generate an imposter when there is a single mesh.
    const bool bGenerateImposter = (NumMeshes == 1);
    if (bGenerateImposter)
    {
        uint32 ImposterStartTime = FPlatformTime::Cycles();
        auto& RootChildren = Groups.Last().Children;
    
        // Imposter atlas that writes into Resources.ImposterAtlas.
        FImposterAtlas ImposterAtlas( Resources.ImposterAtlas, MeshBounds );

        // Generate the imposter tiles in parallel.
        ParallelFor(FMath::Square(FImposterAtlas::AtlasSize),
            [&](int32 TileIndex)
        {
            FIntPoint TilePos(
                TileIndex % FImposterAtlas::AtlasSize,
                TileIndex / FImposterAtlas::AtlasSize);

            // Rasterize every child cluster of the root (last) group into this atlas tile.
            for (int32 ClusterIndex = 0; ClusterIndex < RootChildren.Num(); ClusterIndex++)
            {
                ImposterAtlas.Rasterize(TilePos, Clusters[RootChildren[ClusterIndex]], ClusterIndex);
            }
        });

        UE_LOG(LogStaticMesh, Log, TEXT("Imposter [%.2fs]"), FPlatformTime::ToMilliseconds(FPlatformTime::Cycles() - ImposterStartTime ) / 1000.0f);
    }

    uint32 Time1 = FPlatformTime::Cycles();

    UE_LOG( LogStaticMesh, Log, TEXT("Nanite build [%.2fs]\n"), FPlatformTime::ToMilliseconds( Time1 - Time0 ) / 1000.0f );

    return true;
}
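The two decisions at the start of the coarse path above (whether to coarsen at all, and how many triangles the coarse mesh keeps) reduce to two small expressions. A minimal standalone sketch of that arithmetic, with `kMinTriCount = 2000` taken from the listing; the function names are hypothetical and do not exist in the engine:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Threshold from the listing (MinTriCount): meshes at or below this size are never coarsened.
constexpr int32_t kMinTriCount = 2000;

// Hypothetical helper mirroring bUseCoarseRepresentation: coarsen only when a reduction
// is requested (PercentTriangles < 1) and the mesh is larger than the minimum.
bool UseCoarseRepresentation(float PercentTriangles, int32_t OldTriangleCount)
{
    return PercentTriangles < 1.0f && OldTriangleCount > kMinTriCount;
}

// Hypothetical helper mirroring CoarseTriCount: the requested fraction of the original
// triangle count, clamped so it never drops below kMinTriCount.
int32_t CoarseTriangleTarget(float PercentTriangles, int32_t OldTriangleCount)
{
    const int32_t Requested = static_cast<int32_t>(static_cast<float>(OldTriangleCount) * PercentTriangles);
    return std::max(kMinTriCount, Requested);
}
```

For example, a 100,000-triangle mesh with `PercentTriangles = 0.5` keeps 50,000 triangles, while a request for 1% is clamped up to 2,000.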

6.4.2.3 ClusterTriangles

// Build one or more clusters for a section.
static void ClusterTriangles(
    const TArray< FStaticMeshBuildVertex >& Verts,
    const TArrayView< const uint32 >& Indexes,
    const TArrayView< const int32 >& MaterialIndexes,
    TArray< FCluster >& Clusters,    // Append
    const FBounds& MeshBounds,
    uint32 NumTexCoords,
    bool bHasColors )
{
    uint32 Time0 = FPlatformTime::Cycles();

    LOG_CRC( Verts );
    LOG_CRC( Indexes );

    uint32 NumTriangles = Indexes.Num() / 3;

    // Shared edges: index of the matching opposite edge, or ~0u if there is none.
    TArray< uint32 > SharedEdges; 
    SharedEdges.AddUninitialized( Indexes.Num() );

    // Boundary edges: one bit per edge, set when no matching opposite edge exists.
    TBitArray<> BoundaryEdges; 
    BoundaryEdges.Init( false, Indexes.Num() );

    // Edge hash table.
    FHashTable EdgeHash( 1 << FMath::FloorLog2( Indexes.Num() ), Indexes.Num() );

    // Build the edge hash in parallel.
    ParallelFor( Indexes.Num(),
        [&]( int32 EdgeIndex )
        {

            uint32 VertIndex0 = Indexes[ EdgeIndex ];
            uint32 VertIndex1 = Indexes[ Cycle3( EdgeIndex ) ];
    
            const FVector& Position0 = Verts[ VertIndex0 ].Position;
            const FVector& Position1 = Verts[ VertIndex1 ].Position;
                
            uint32 Hash0 = HashPosition( Position0 );
            uint32 Hash1 = HashPosition( Position1 );
            uint32 Hash = Murmur32( { Hash0, Hash1 } );

            // Note that insertion uses the concurrent version, Add_Concurrent.
            EdgeHash.Add_Concurrent( Hash, EdgeIndex );
        });

    const int32 NumDwords = FMath::DivideAndRoundUp( BoundaryEdges.Num(), NumBitsPerDWORD );

    ParallelFor( NumDwords,
        [&]( int32 DwordIndex )
        {
            const int32 NumIndexes = Indexes.Num();
            const int32 NumBits = FMath::Min( NumBitsPerDWORD, NumIndexes - DwordIndex * NumBitsPerDWORD );

            uint32 Mask = 1;
            uint32 Dword = 0;
            for( int32 BitIndex = 0; BitIndex < NumBits; BitIndex++, Mask <<= 1 )
            {
                // Compute the edge index.
                int32 EdgeIndex = DwordIndex * NumBitsPerDWORD + BitIndex;

                uint32 VertIndex0 = Indexes[ EdgeIndex ];
                uint32 VertIndex1 = Indexes[ Cycle3( EdgeIndex ) ];
    
                const FVector& Position0 = Verts[ VertIndex0 ].Position;
                const FVector& Position1 = Verts[ VertIndex1 ].Position;
                
                uint32 Hash0 = HashPosition( Position0 );
                uint32 Hash1 = HashPosition( Position1 );
                uint32 Hash = Murmur32( { Hash1, Hash0 } );
    
                // Find edges that share both vertices but run in the opposite direction.
                /*
                      /\
                     /  \
                    o-<<-o
                    o->>-o
                     \  /
                      \/
                */
                uint32 FoundEdge = ~0u;
                for( uint32 OtherEdgeIndex = EdgeHash.First( Hash ); EdgeHash.IsValid( OtherEdgeIndex ); OtherEdgeIndex = EdgeHash.Next( OtherEdgeIndex ) )
                {
                    uint32 OtherVertIndex0 = Indexes[ OtherEdgeIndex ];
                    uint32 OtherVertIndex1 = Indexes[ Cycle3( OtherEdgeIndex ) ];
            
                    if( Position0 == Verts[ OtherVertIndex1 ].Position &&
                        Position1 == Verts[ OtherVertIndex0 ].Position )
                    {
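                        // Found a matching edge.
                        // The hash table does not iterate in a deterministic order, so take a stable match (the smallest index), not just the first one.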
                        FoundEdge = FMath::Min( FoundEdge, OtherEdgeIndex );
                    }
                }
                SharedEdges[ EdgeIndex ] = FoundEdge;
            
                if( FoundEdge == ~0u )
                {
                    Dword |= Mask;
                }
            }
            
            if( Dword )
            {
                BoundaryEdges.GetData()[ DwordIndex ] = Dword;
            }
        });

    // Disjoint set (union-find) of triangles.
    FDisjointSet DisjointSet( NumTriangles );

    for( uint32 EdgeIndex = 0, Num = SharedEdges.Num(); EdgeIndex < Num; EdgeIndex++ )
    {
        uint32 OtherEdgeIndex = SharedEdges[ EdgeIndex ];
        if( OtherEdgeIndex != ~0u )
        {
            // OtherEdgeIndex is the smallest index that matches EdgeIndex.
            // ThisEdgeIndex is the smallest index that matches OtherEdgeIndex.

            uint32 ThisEdgeIndex = SharedEdges[ OtherEdgeIndex ];
            check( ThisEdgeIndex != ~0u );
            check( ThisEdgeIndex <= EdgeIndex );

            if( EdgeIndex > ThisEdgeIndex )
            {
                // A previous element already points to OtherEdgeIndex.
                SharedEdges[ EdgeIndex ] = ~0u;
            }
            else if( EdgeIndex > OtherEdgeIndex )
            {
                // Second time this edge pair is seen: union the two triangles.
                DisjointSet.UnionSequential( EdgeIndex / 3, OtherEdgeIndex / 3 );
            }
        }
    }

    uint32 BoundaryTime = FPlatformTime::Cycles();
    UE_LOG( LogStaticMesh, Log, TEXT("Boundary [%.2fs], tris: %i, UVs %i%s"), FPlatformTime::ToMilliseconds( BoundaryTime - Time0 ) / 1000.0f, Indexes.Num() / 3, NumTexCoords, bHasColors ? TEXT(", Color") : TEXT("") );

    LOG_CRC( SharedEdges );

    // Partition the triangles.
    FGraphPartitioner Partitioner( NumTriangles );

    {
        TRACE_CPUPROFILER_EVENT_SCOPE(Nanite::Build::PartitionGraph);

        // Returns the centroid of a triangle.
        auto GetCenter = [ &Verts, &Indexes ]( uint32 TriIndex )
        {
            FVector Center;
            Center  = Verts[ Indexes[ TriIndex * 3 + 0 ] ].Position;
            Center += Verts[ Indexes[ TriIndex * 3 + 1 ] ].Position;
            Center += Verts[ Indexes[ TriIndex * 3 + 2 ] ].Position;
            return Center * (1.0f / 3.0f);
        };
        // Build locality links.
        Partitioner.BuildLocalityLinks( DisjointSet, MeshBounds, GetCenter );

        auto* RESTRICT Graph = Partitioner.NewGraph( NumTriangles * 3 );

        // Fill in the graph data.
        for( uint32 i = 0; i < NumTriangles; i++ )
        {
            Graph->AdjacencyOffset[i] = Graph->Adjacency.Num();

            uint32 TriIndex = Partitioner.Indexes[i];

            for( int k = 0; k < 3; k++ )
            {
                uint32 EdgeIndex = SharedEdges[ 3 * TriIndex + k ];
                // Add an adjacency edge to the neighboring triangle.
                if( EdgeIndex != ~0u )
                {
                    Partitioner.AddAdjacency( Graph, EdgeIndex / 3, 4 * 65 );
                }
            }

            // Add locality links.
            Partitioner.AddLocalityLinks( Graph, TriIndex, 1 );
        }
        Graph->AdjacencyOffset[ NumTriangles ] = Graph->Adjacency.Num();

        // Strictly partition into clusters.
        Partitioner.PartitionStrict( Graph, FCluster::ClusterSize - 4, FCluster::ClusterSize, true );
        check( Partitioner.Ranges.Num() );

        LOG_CRC( Partitioner.Ranges );
    }

    // Compute the ideal number of clusters.
    const uint32 OptimalNumClusters = FMath::DivideAndRoundUp< int32 >( Indexes.Num(), FCluster::ClusterSize * 3 );

    uint32 ClusterTime = FPlatformTime::Cycles();
    UE_LOG( LogStaticMesh, Log, TEXT("Clustering [%.2fs]. Ratio: %f"), FPlatformTime::ToMilliseconds( ClusterTime - BoundaryTime ) / 1000.0f, (float)Partitioner.Ranges.Num() / OptimalNumClusters );

    const uint32 BaseCluster = Clusters.Num();
    Clusters.AddDefaulted( Partitioner.Ranges.Num() );

    // Author's note: single-threaded when there are more than 32 ranges? Could this be reversed?
    const bool bSingleThreaded = Partitioner.Ranges.Num() > 32;
    {
        TRACE_CPUPROFILER_EVENT_SCOPE(Nanite::Build::BuildClusters);
        // Build the clusters in parallel.
        ParallelFor( Partitioner.Ranges.Num(),
            [&]( int32 Index )
            {
                auto& Range = Partitioner.Ranges[ Index ];

                // Create a single cluster instance.
                Clusters[ BaseCluster + Index ] = FCluster( Verts,
                                                            Indexes,
                                                            MaterialIndexes,
                                                            BoundaryEdges, Range.Begin, Range.End, Partitioner.Indexes, NumTexCoords, bHasColors );

                // A negative edge length marks the cluster as a leaf.
                Clusters[ BaseCluster + Index ].EdgeLength *= -1.0f;
            }, bSingleThreaded);
    }

    uint32 LeavesTime = FPlatformTime::Cycles();
    UE_LOG( LogStaticMesh, Log, TEXT("Leaves [%.2fs]"), FPlatformTime::ToMilliseconds( LeavesTime - ClusterTime ) / 1000.0f );
}
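The boundary pass above boils down to this: for each directed edge (v0 → v1) of a triangle, look for an edge running the opposite way (v1 → v0). If none exists, the edge is a boundary edge; if one does, the two triangles are joined in a disjoint set. The sketch below reproduces that logic on plain vertex indices. It is a simplification under stated assumptions: `std::unordered_multimap` stands in for `FHashTable`, vertices are matched by index rather than by position, and `FEdgeInfo`, `FindSharedEdges`, `FSimpleDisjointSet` and `CountConnectedPieces` are hypothetical names. As in the listing, edge `e` belongs to triangle `e / 3`, and `Cycle3(e)` gives the next corner of the same triangle.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <numeric>
#include <unordered_map>
#include <vector>

// Next corner within the same triangle: 0->1, 1->2, 2->0 (same mapping as Cycle3 in the listing).
inline uint32_t Cycle3(uint32_t Value)
{
    const uint32_t Base = Value - Value % 3;
    return Base + (Value % 3 + 1) % 3;
}

// Packs a directed edge (V0 -> V1) into one 64-bit key.
inline uint64_t EdgeKey(uint32_t V0, uint32_t V1) { return (uint64_t(V0) << 32) | V1; }

struct FEdgeInfo
{
    std::vector<uint32_t> SharedEdges;   // Smallest opposite edge index, or ~0u if none.
    std::vector<bool>     BoundaryEdges; // True where no opposite edge exists.
};

FEdgeInfo FindSharedEdges(const std::vector<uint32_t>& Indexes)
{
    const uint32_t NumEdges = uint32_t(Indexes.size());
    std::unordered_multimap<uint64_t, uint32_t> EdgeHash;
    for (uint32_t e = 0; e < NumEdges; ++e)
        EdgeHash.emplace(EdgeKey(Indexes[e], Indexes[Cycle3(e)]), e);

    FEdgeInfo Info{std::vector<uint32_t>(NumEdges, ~0u), std::vector<bool>(NumEdges, false)};
    for (uint32_t e = 0; e < NumEdges; ++e)
    {
        // Look up the reversed edge (V1 -> V0); keep the smallest match so the result is stable.
        const auto Range = EdgeHash.equal_range(EdgeKey(Indexes[Cycle3(e)], Indexes[e]));
        for (auto It = Range.first; It != Range.second; ++It)
            Info.SharedEdges[e] = std::min(Info.SharedEdges[e], It->second);
        Info.BoundaryEdges[e] = (Info.SharedEdges[e] == ~0u);
    }
    return Info;
}

// Minimal union-find standing in for FDisjointSet.
struct FSimpleDisjointSet
{
    std::vector<uint32_t> Parent;
    explicit FSimpleDisjointSet(uint32_t Num) : Parent(Num) { std::iota(Parent.begin(), Parent.end(), 0u); }
    uint32_t Find(uint32_t i)
    {
        while (Parent[i] != i)
        {
            Parent[i] = Parent[Parent[i]]; // Path halving.
            i = Parent[i];
        }
        return i;
    }
    void Union(uint32_t A, uint32_t B)
    {
        const uint32_t RootA = Find(A);
        const uint32_t RootB = Find(B);
        Parent[RootA] = RootB;
    }
};

// Unions triangles that share an edge and returns the number of connected pieces.
uint32_t CountConnectedPieces(const std::vector<uint32_t>& Indexes)
{
    const uint32_t NumTriangles = uint32_t(Indexes.size() / 3);
    const FEdgeInfo Info = FindSharedEdges(Indexes);
    FSimpleDisjointSet Set(NumTriangles);
    for (uint32_t e = 0; e < uint32_t(Info.SharedEdges.size()); ++e)
        if (Info.SharedEdges[e] != ~0u)
            Set.Union(e / 3, Info.SharedEdges[e] / 3);
    uint32_t NumPieces = 0;
    for (uint32_t t = 0; t < NumTriangles; ++t)
        NumPieces += (Set.Find(t) == t) ? 1u : 0u;
    return NumPieces;
}
```

For a quad split into triangles `{0, 1, 2}` and `{2, 1, 3}`, edge 1 (1 → 2) and edge 3 (2 → 1) match each other and the other four edges are boundary edges. Unlike the listing, which unions only on the second sighting of a pair, this sketch unions on both sightings; that is redundant but harmless for a union-find.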

6.4.2.4 FGraphPartitioner

The code in the previous subsection uses FGraphPartitioner to partition triangles into clusters. Its code is analyzed below:

// Engine\Source\Developer\NaniteBuilder\Private\GraphPartitioner.h

(......)

// Uses the third-party open-source METIS library.
#include "metis.h"

(......)

// Graph partitioner (splits triangles into clusters, and clusters into groups).
class FGraphPartitioner
{
public:
    // Graph data.
    struct FGraphData
    {
        int32    Offset; // Position of this graph's first element in the Indexes array.
        int32    Num;    // Number of elements.

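        TArray< idx_t >    Adjacency; // Adjacency list (neighbor indices).
        TArray< idx_t >    AdjacencyCost; // Adjacency edge weights.
        TArray< idx_t >    AdjacencyOffset; // Start of each element's neighbors in Adjacency (Num + 1 entries).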
    };

    // Half-open range [Begin, End).
    struct FRange
    {
        uint32    Begin;
        uint32    End;

        bool operator<( const FRange& Other) const { return Begin < Other.Begin; }
    };
    TArray< FRange >    Ranges;
    TArray< uint32 >    Indexes;

public:
                FGraphPartitioner( uint32 InNumElements );

    // Create a new graph data instance.
    FGraphData*    NewGraph( uint32 NumAdjacency ) const;

    // Add an adjacency edge.
    void        AddAdjacency( FGraphData* Graph, uint32 AdjIndex, idx_t Cost );
    // Add locality links.
    void        AddLocalityLinks( FGraphData* Graph, uint32 Index, idx_t Cost );

    // Build locality links.
    template< typename FGetCenter >
    void        BuildLocalityLinks( FDisjointSet& DisjointSet, const FBounds& Bounds, FGetCenter& GetCenter );

    // Partition the graph.
    void        Partition( FGraphData* Graph, int32 InMinPartitionSize, int32 InMaxPartitionSize );
    // Strict partitioning: recursively bisects until every partition fits within the maximum size.
    void        PartitionStrict( FGraphData* Graph, int32 InMinPartitionSize, int32 InMaxPartitionSize, bool bThreaded );

private:
    // Bisect the graph.
    void        BisectGraph( FGraphData* Graph, FGraphData* ChildGraphs[2] );
    // Recursively bisect the graph.
    void        RecursiveBisectGraph( FGraphData* Graph );

    uint32        NumElements;
    int32        MinPartitionSize = 0;
    int32        MaxPartitionSize = 0;

    // Number of partitions. Atomic so it can be incremented from multiple threads.
    TAtomic< uint32 >    NumPartitions;

    TArray< idx_t >        PartitionIDs;
    TArray< int32 >        SwappedWith;
    TArray< uint32 >    SortedTo;

    // Locality links.
    TMultiMap< uint32, uint32 >    LocalityLinks;
};

(......)

// Engine\Source\Developer\NaniteBuilder\Private\GraphPartitioner.cpp

(......)

// Bisect the graph.
void FGraphPartitioner::BisectGraph( FGraphData* Graph, FGraphData* ChildGraphs[2] )
{
    ChildGraphs[0] = nullptr;
    ChildGraphs[1] = nullptr;

    // Callback that records a partition.
    auto AddPartition =
        [ this ]( int32 Offset, int32 Num )
        {
            FRange& Range = Ranges[ NumPartitions++ ];
            Range.Begin    = Offset;
            Range.End    = Offset + Num;
        };

    // If the whole graph already fits within MaxPartitionSize, add it directly as a single partition.
    if( Graph->Num <= MaxPartitionSize )
    {
        AddPartition( Graph->Offset, Graph->Num );
        return;
    }

    // Compute the target partition size and partition count.
    const int32 TargetPartitionSize = ( MinPartitionSize + MaxPartitionSize ) / 2;
    const int32 TargetNumPartitions = FMath::Max( 2, FMath::DivideAndRoundNearest( Graph->Num, TargetPartitionSize ) );

    check( Graph->AdjacencyOffset.Num() == Graph->Num + 1 );

    idx_t NumConstraints = 1;
    idx_t NumParts = 2;
    idx_t EdgesCut = 0;

    real_t PartitionWeights[] = {
        float( TargetNumPartitions / 2 ) / TargetNumPartitions,
        1.0f - float( TargetNumPartitions / 2 ) / TargetNumPartitions
    };

    // Set METIS's default options.
    idx_t Options[ METIS_NOPTIONS ];
    METIS_SetDefaultOptions( Options );

    // Allow a loose tolerance at the top levels; strict balance doesn't matter until closer to the partition size.
    bool bLoose = TargetNumPartitions >= 128 || MaxPartitionSize / MinPartitionSize > 1;
    bool bSlow = Graph->Num < 4096;
    
    Options[ METIS_OPTION_UFACTOR ] = bLoose ? 200 : 1;
    //Options[ METIS_OPTION_NCUTS ] = Graph->Num < 1024 ? 8 : ( Graph->Num < 4096 ? 4 : 1 );
    //Options[ METIS_OPTION_NCUTS ] = bSlow ? 4 : 1;
    //Options[ METIS_OPTION_NITER ] = bSlow ? 20 : 10;
    //Options[ METIS_OPTION_IPTYPE ] = METIS_IPTYPE_RANDOM;
    //Options[ METIS_OPTION_MINCONN ] = 1;

    // Call METIS recursive bisection (two parts).
    int r = METIS_PartGraphRecursive(
        &Graph->Num,
        &NumConstraints,            // number of balancing constraints
        Graph->AdjacencyOffset.GetData(),
        Graph->Adjacency.GetData(),
        NULL,                        // Vert weights
        NULL,                        // Vert sizes for computing the total communication volume
        Graph->AdjacencyCost.GetData(),    // Edge weights
        &NumParts,
        PartitionWeights,            // Target partition weight
        NULL,                        // Allowed load imbalance tolerance
        Options,
        &EdgesCut,
        PartitionIDs.GetData() + Graph->Offset
    );

    // Make sure the METIS call succeeded.
    if( ensureAlways( r == METIS_OK ) )
    {
        // Partition the array in place.
        // Both sides remain sorted, but their order is reversed.
        int32 Front = Graph->Offset;
        int32 Back =  Graph->Offset + Graph->Num - 1;
        while( Front <= Back )
        {
            while( Front <= Back && PartitionIDs[ Front ] == 0 )
            {
                SwappedWith[ Front ] = Front;
                Front++;
            }
            while( Front <= Back && PartitionIDs[ Back ] == 1 )
            {
                SwappedWith[ Back ] = Back;
                Back--;
            }

            if( Front < Back )
            {
                Swap( Indexes[ Front ], Indexes[ Back ] );

                SwappedWith[ Front ] = Back;
                SwappedWith[ Back ] = Front;
                Front++;
                Back--;
            }
        }

        int32 Split = Front;

        int32 Num[2];
        Num[0] = Split - Graph->Offset;
        Num[1] = Graph->Offset + Graph->Num - Split;
                
        check( Num[0] > 1 );
        check( Num[1] > 1 );

        // If both children fit within MaxPartitionSize, add them directly.
        if( Num[0] <= MaxPartitionSize && Num[1] <= MaxPartitionSize )
        {
            AddPartition( Graph->Offset,    Num[0] );
            AddPartition( Split,            Num[1] );
        }
        else
        {
            // Create the two child graphs.
            for( int32 i = 0; i < 2; i++ )
            {
                ChildGraphs[i] = new FGraphData;
                ChildGraphs[i]->Adjacency.Reserve( Graph->Adjacency.Num() >> 1 );
                ChildGraphs[i]->AdjacencyCost.Reserve( Graph->Adjacency.Num() >> 1 );
                ChildGraphs[i]->AdjacencyOffset.Reserve( Num[i] + 1 );
                ChildGraphs[i]->Num = Num[i];
            }

            ChildGraphs[0]->Offset = Graph->Offset;
            ChildGraphs[1]->Offset = Split;

            // Iterate over all elements, adding each one's adjacency edges to ChildGraphs[0] or ChildGraphs[1].
            for( int32 i = 0; i < Graph->Num; i++ )
            {
                // A small trick: if i < ChildGraphs[0]->Num the subscript is 0 and ChildGraphs[0] is used; otherwise ChildGraphs[1].
                FGraphData* ChildGraph = ChildGraphs[ i >= ChildGraphs[0]->Num ];

                ChildGraph->AdjacencyOffset.Add( ChildGraph->Adjacency.Num() );
                
                int32 OrgIndex = SwappedWith[ Graph->Offset + i ] - Graph->Offset;
                for( idx_t AdjIndex = Graph->AdjacencyOffset[ OrgIndex ]; AdjIndex < Graph->AdjacencyOffset[ OrgIndex + 1 ]; AdjIndex++ )
                {
                    idx_t Adj     = Graph->Adjacency[ AdjIndex ];
                    idx_t AdjCost = Graph->AdjacencyCost[ AdjIndex ];

                    // Remap to child
                    Adj = SwappedWith[ Graph->Offset + Adj ] - ChildGraph->Offset;

                    // Edge connects to node in this graph
                    if( 0 <= Adj && Adj < ChildGraph->Num )
                    {
                        ChildGraph->Adjacency.Add( Adj );
                        ChildGraph->AdjacencyCost.Add( AdjCost );
                    }
                }
            }
            ChildGraphs[0]->AdjacencyOffset.Add( ChildGraphs[0]->Adjacency.Num() );
            ChildGraphs[1]->AdjacencyOffset.Add( ChildGraphs[1]->Adjacency.Num() );
        }
    }
}

// Strict partitioning.
void FGraphPartitioner::PartitionStrict( FGraphData* Graph, int32 InMinPartitionSize, int32 InMaxPartitionSize, bool bThreaded )
{
    MinPartitionSize = InMinPartitionSize;
    MaxPartitionSize = InMaxPartitionSize;

    PartitionIDs.AddUninitialized( NumElements );
    SwappedWith.AddUninitialized( NumElements );

    // Adding to atomically so size big enough to not need to grow.
    int32 NumPartitionsExpected = FMath::DivideAndRoundUp( Graph->Num, MinPartitionSize );
    Ranges.AddUninitialized( NumPartitionsExpected * 2 );
    NumPartitions = 0;

    // Multithreaded path.
    if( bThreaded && NumPartitionsExpected > 4 )
    {    
        extern CORE_API int32 GUseNewTaskBackend;
        // Use the new task backend.
        if (GUseNewTaskBackend)
        {
            // Local work queue.
            TLocalWorkQueue<FGraphData> LocalWork(Graph);
            // Self is the lambda itself (recursion via MakeYCombinator).
            LocalWork.Run(MakeYCombinator([this, &LocalWork](auto Self, FGraphData* Graph) -> void
            {
                FGraphData* ChildGraphs[2];
                // Bisect the graph.
                BisectGraph( Graph, ChildGraphs );
                delete Graph;

                if( ChildGraphs[0] && ChildGraphs[1] )
                {
                    // Process the first child.
                    // Only add a new worker if the remaining work is large enough.
                    if (ChildGraphs[0]->Num > 256)
                    {
                        LocalWork.AddTask(ChildGraphs[0]);
                        LocalWork.AddWorkers(1);
                    }
                    else // Otherwise recurse inline.
                    {
                        Self(ChildGraphs[0]);
                    }
                    
                    // Process the second child.
                    Self(ChildGraphs[1]);
                }
            }));
        }
        // Old task backend: use the legacy TaskGraph system.
        else
        {
            const ENamedThreads::Type DesiredThread = IsInGameThread() ? ENamedThreads::AnyThread : ENamedThreads::AnyBackgroundThreadNormalTask;

            // Build task.
            class FBuildTask
            {
            public:
                FBuildTask( FGraphPartitioner* InPartitioner, FGraphData* InGraph, ENamedThreads::Type InDesiredThread)
                    : Partitioner( InPartitioner )
                    , Graph( InGraph )
                    , DesiredThread( InDesiredThread )
                {}

                void DoTask( ENamedThreads::Type CurrentThread, const FGraphEventRef& MyCompletionEvent )
                {
                    FGraphData* ChildGraphs[2];
                    Partitioner->BisectGraph( Graph, ChildGraphs );
                    delete Graph;

                    if( ChildGraphs[0] && ChildGraphs[1] )
                    {
                        if( ChildGraphs[0]->Num > 256 )
                        {
                            FGraphEventRef Task = TGraphTask< FBuildTask >::CreateTask().ConstructAndDispatchWhenReady( Partitioner, ChildGraphs[0], DesiredThread);
                            MyCompletionEvent->DontCompleteUntil( Task );
                        }
                        else
                        {
                            FBuildTask( Partitioner, ChildGraphs[0], DesiredThread).DoTask( CurrentThread, MyCompletionEvent );
                        }

                        FBuildTask( Partitioner, ChildGraphs[1], DesiredThread).DoTask( CurrentThread, MyCompletionEvent );
                    }
                }

                static FORCEINLINE TStatId GetStatId()
                {
                    RETURN_QUICK_DECLARE_CYCLE_STAT(FBuildTask, STATGROUP_ThreadPoolAsyncTasks);
                }

                static FORCEINLINE ESubsequentsMode::Type    GetSubsequentsMode()    { return ESubsequentsMode::TrackSubsequents; }

                FORCEINLINE ENamedThreads::Type GetDesiredThread() const
                {
                    return DesiredThread;
                }

            private:
                FGraphPartitioner*  Partitioner;
                FGraphData*         Graph;
                ENamedThreads::Type DesiredThread;
            };

            FGraphEventRef BuildTask = TGraphTask< FBuildTask >::CreateTask( nullptr ).ConstructAndDispatchWhenReady( this, Graph, DesiredThread);
            FTaskGraphInterface::Get().WaitUntilTaskCompletes( BuildTask );
        }
    }
    else
    {
        RecursiveBisectGraph( Graph );
    }

    Ranges.SetNum( NumPartitions );

    if( bThreaded )
    {
        // Force a deterministic order
        Ranges.Sort();
    }

    PartitionIDs.Empty();
    SwappedWith.Empty();
}
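After METIS assigns each element a partition ID of 0 or 1, BisectGraph splits the element range in place with two cursors: one moves forward over 0s, the other moves backward over 1s, mismatched pairs are swapped, and every swap is recorded in SwappedWith. The sketch below isolates that step on a `std::vector`; METIS is not called, and `FSplitResult` and `SplitByPartitionID` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

struct FSplitResult
{
    int32_t Split;                    // First position belonging to side 1.
    std::vector<int32_t> SwappedWith; // Position each slot was swapped with (itself if untouched).
};

// Moves every element whose PartitionIDs entry is 0 to the front, in place.
// As in the listing, PartitionIDs is not swapped; it is only read by original position.
FSplitResult SplitByPartitionID(std::vector<uint32_t>& Indexes, const std::vector<int32_t>& PartitionIDs)
{
    const int32_t Num = int32_t(Indexes.size());
    FSplitResult Result{0, std::vector<int32_t>(Num)};
    int32_t Front = 0;
    int32_t Back = Num - 1;
    while (Front <= Back)
    {
        while (Front <= Back && PartitionIDs[Front] == 0) { Result.SwappedWith[Front] = Front; Front++; }
        while (Front <= Back && PartitionIDs[Back] == 1)  { Result.SwappedWith[Back] = Back;   Back--;  }
        if (Front < Back)
        {
            // Front holds a side-1 element and Back a side-0 element: exchange them.
            std::swap(Indexes[Front], Indexes[Back]);
            Result.SwappedWith[Front] = Back;
            Result.SwappedWith[Back] = Front;
            Front++;
            Back--;
        }
    }
    Result.Split = Front;
    return Result;
}
```

Because every entry is either a fixed point or one half of a single swap, SwappedWith is its own inverse. That is why the listing can use the same array both to find an element's original adjacency (`OrgIndex`) and to remap neighbor indices into the child graph.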

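A few additional notes on how Nanite partitions meshes:

  • Building Nanite data relies heavily on parallelism to shorten build times, including (but not limited to) building the edge hash, detecting shared and boundary edges, building clusters, partitioning graphs, and generating the imposter.
  • When partitioning, GUseNewTaskBackend decides whether to use the new task backend or the legacy TaskGraph. The new task backend was added in UE5 and is lighter and simpler.
  • Bisecting the graph relies on several key interfaces of the third-party open-source library METIS: METIS_SetDefaultOptions, METIS_PartGraphKway, and METIS_PartGraphRecursive.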
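The task-splitting pattern in PartitionStrict (recurse inline on small children, hand children larger than 256 elements to another worker, then sort Ranges at the end because completion order varies) can be sketched with standard C++ alone. This is a simplified stand-in, not UE's task system: `std::async` replaces TLocalWorkQueue/TGraphTask, a mutex-protected vector replaces the atomic counter into a pre-sized array, the "bisection" simply halves a contiguous range instead of calling METIS, and `FToyPartitioner` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <future>
#include <mutex>
#include <vector>

struct FRange
{
    uint32_t Begin;
    uint32_t End; // Half-open: [Begin, End).
    bool operator<(const FRange& Other) const { return Begin < Other.Begin; }
};

class FToyPartitioner
{
public:
    explicit FToyPartitioner(uint32_t InMaxPartitionSize) : MaxPartitionSize(InMaxPartitionSize) {}

    std::vector<FRange> Partition(uint32_t NumElements)
    {
        Ranges.clear();
        Bisect(0, NumElements);
        // Ranges arrive in thread-completion order, so sort them for a deterministic result.
        std::sort(Ranges.begin(), Ranges.end());
        return Ranges;
    }

private:
    void Bisect(uint32_t Offset, uint32_t Num)
    {
        if (Num <= MaxPartitionSize)
        {
            std::lock_guard<std::mutex> Lock(RangesMutex);
            Ranges.push_back({Offset, Offset + Num});
            return;
        }
        // Stand-in for METIS: split the range in half.
        const uint32_t Num0 = Num / 2;
        const uint32_t Num1 = Num - Num0;
        if (Num0 > kParallelThreshold)
        {
            // Large child: run it on another thread while this thread handles the second child.
            auto Task = std::async(std::launch::async, [this, Offset, Num0] { Bisect(Offset, Num0); });
            Bisect(Offset + Num0, Num1);
            Task.get();
        }
        else
        {
            Bisect(Offset, Num0); // Small child: recurse inline.
            Bisect(Offset + Num0, Num1);
        }
    }

    static constexpr uint32_t kParallelThreshold = 256; // Same cutoff as the listing.
    uint32_t MaxPartitionSize;
    std::mutex RangesMutex;
    std::vector<FRange> Ranges;
};
```

The threshold keeps the number of spawned tasks proportional to the amount of work: tiny subgraphs are cheaper to finish on the current thread than to schedule.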

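METIS is a set of serial programs for partitioning graphs, partitioning finite element meshes, and producing fill-reducing orderings for sparse matrices. The algorithms implemented in METIS are based on the multilevel recursive-bisection, multilevel k-way, and multi-constraint partitioning schemes developed in the Karypis lab. According to its official description, its key features are:

  • High-quality partitions. The partitions produced by METIS are consistently better than those produced by other widely used algorithms, and 10% to 50% better than those produced by spectral partitioning algorithms.
  • Extremely fast. Extensive experiments show that METIS is one to two orders of magnitude faster than other widely used partitioning algorithms. On current workstations and PCs, graphs with millions of vertices can be partitioned into 256 parts in a few seconds.
  • Low-fill orderings. The fill-reducing orderings produced by METIS are significantly better than those produced by other widely used algorithms, including multiple minimum degree. For many classes of problems arising in scientific computing and linear programming, METIS can reduce the storage and computational requirements of sparse matrix factorization by up to an order of magnitude. Unlike multiple minimum degree, the elimination trees produced by METIS are suitable for parallel direct factorization. METIS also computes these orderings very quickly: matrices with millions of rows can be reordered in a few seconds on current workstations and PCs.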

There is also a parallel version, ParMETIS. For details, see the official page: Family of Graph and Hypergraph Partitioning Software
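FGraphData stores its graph in the compressed sparse row (CSR) layout that METIS expects: in the METIS_PartGraphRecursive call above, AdjacencyOffset is passed as METIS's `xadj` (Num + 1 entries), Adjacency as `adjncy`, and AdjacencyCost as `adjwgt`. The neighbors of element `i` are therefore `Adjacency[AdjacencyOffset[i]]` up to, but excluding, `Adjacency[AdjacencyOffset[i + 1]]`. The sketch below builds that layout from per-element neighbor lists; `FCsrGraph` mirrors FGraphData's fields with `std::vector<int32_t>` in place of `TArray<idx_t>`, and `BuildCsr` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Mirrors the adjacency fields of FGraphData.
struct FCsrGraph
{
    std::vector<int32_t> Adjacency;       // METIS adjncy: concatenated neighbor lists.
    std::vector<int32_t> AdjacencyCost;   // METIS adjwgt: one weight per Adjacency entry.
    std::vector<int32_t> AdjacencyOffset; // METIS xadj: start of each element's neighbors, plus a closing entry.
};

// Builds a CSR graph from (neighbor, cost) lists. METIS expects an undirected graph, so each
// edge must appear in both endpoints' lists with the same cost.
FCsrGraph BuildCsr(const std::vector<std::vector<std::pair<int32_t, int32_t>>>& Neighbors)
{
    FCsrGraph Graph;
    for (const auto& List : Neighbors)
    {
        Graph.AdjacencyOffset.push_back(int32_t(Graph.Adjacency.size()));
        for (const auto& Entry : List)
        {
            Graph.Adjacency.push_back(Entry.first);
            Graph.AdjacencyCost.push_back(Entry.second);
        }
    }
    // Closing offset, like Graph->AdjacencyOffset[ NumTriangles ] in ClusterTriangles.
    Graph.AdjacencyOffset.push_back(int32_t(Graph.Adjacency.size()));
    return Graph;
}
```

In the test below, cost 260 mirrors the `4 * 65` weight ClusterTriangles gives to shared edges, and cost 1 the weight of a locality link.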

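The figure above shows several pairs of images. In each pair, the left image is segmented manually according to the Princeton segmentation criteria (darker lines mark the manually cut edges), and the right image is segmented automatically by an algorithm (red marks the boundaries). The automatic results closely match the manual ones.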

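Automatic segmentation methods include approaches that combine deep learning with computer vision, as well as classical, mathematically grounded algorithms such as METIS. METIS's partitioning algorithm has three phases: Coarsening, Partitioning, and Uncoarsening.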

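In the Coarsening phase, the graph is shrunk by collapsing a matching, i.e. a set of edges that share no vertices, and the goal is to make that matching as large as possible. Finding an optimal partition is NP-complete, and computing an exact maximum matching at every level would be too costly, so METIS uses fast heuristics that produce a maximal matching instead.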

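Coarsening tries to maximize the matching: in the figure, group (a) clearly does not contain the largest possible number of vertex-disjoint edges, while group (b) does.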
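Coarsening does not need an exact maximum matching. METIS uses greedy heuristics such as heavy-edge matching, which visits vertices in random order and pairs each unmatched vertex with the unmatched neighbor joined by the heaviest edge, so heavily connected vertices get collapsed together. A simplified sketch on a CSR graph (a fixed visiting order instead of a random one; `HeavyEdgeMatching` is a hypothetical name):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Greedy heavy-edge matching on a CSR graph (xadj/adjncy/adjwgt, the layout passed to METIS).
// Returns Match[v] = partner of v, or v itself if it stays unmatched.
std::vector<int32_t> HeavyEdgeMatching(const std::vector<int32_t>& Xadj,
                                       const std::vector<int32_t>& Adjncy,
                                       const std::vector<int32_t>& Adjwgt)
{
    const int32_t NumVerts = int32_t(Xadj.size()) - 1;
    std::vector<int32_t> Match(NumVerts, -1);
    for (int32_t v = 0; v < NumVerts; ++v) // METIS visits vertices in random order; fixed here.
    {
        if (Match[v] != -1)
            continue;
        int32_t Best = -1;
        int32_t BestWeight = -1;
        for (int32_t e = Xadj[v]; e < Xadj[v + 1]; ++e)
        {
            const int32_t u = Adjncy[e];
            if (u != v && Match[u] == -1 && Adjwgt[e] > BestWeight)
            {
                Best = u;
                BestWeight = Adjwgt[e];
            }
        }
        if (Best == -1)
        {
            Match[v] = v; // No free neighbor: v becomes a coarse vertex on its own.
        }
        else
        {
            Match[v] = Best; // Collapse (v, Best) into one coarse vertex.
            Match[Best] = v;
        }
    }
    return Match;
}
```

The result is always a valid (vertex-disjoint) matching, but the greedy order matters: on a path 0-1-2-3 whose heaviest edge is 1-2, visiting vertex 0 first pairs 0-1 and 2-3 and never uses the heavy edge.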

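In the Partitioning phase there are two steps: first, pick a random root vertex; second, grow a region from it by breadth-first search (BFS), adding the vertices that result in fewer cut edges.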
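The initial bisection of the coarsest graph can be computed by graph growing: start from a root and add vertices in BFS order until one side holds half of them. The sketch below uses plain BFS; a greedy variant would instead pick the frontier vertex that most reduces the cut. The root is passed in rather than chosen randomly so results are reproducible, and `GrowBisection` and `EdgeCut` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <queue>
#include <vector>

// Grows part 0 from Root in breadth-first order until it holds half the vertices.
// Returns Part[v] in {0, 1}. If Root's component runs out early, part 0 stays smaller
// (no reseeding in this sketch).
std::vector<int32_t> GrowBisection(const std::vector<int32_t>& Xadj, const std::vector<int32_t>& Adjncy, int32_t Root)
{
    const int32_t NumVerts = int32_t(Xadj.size()) - 1;
    const int32_t Target = NumVerts / 2;
    std::vector<int32_t> Part(NumVerts, 1);
    std::vector<bool> Visited(NumVerts, false);
    std::queue<int32_t> Frontier;
    Frontier.push(Root);
    Visited[Root] = true;
    int32_t Grown = 0;
    while (!Frontier.empty() && Grown < Target)
    {
        const int32_t v = Frontier.front();
        Frontier.pop();
        Part[v] = 0;
        ++Grown;
        for (int32_t e = Xadj[v]; e < Xadj[v + 1]; ++e)
        {
            const int32_t u = Adjncy[e];
            if (!Visited[u])
            {
                Visited[u] = true;
                Frontier.push(u);
            }
        }
    }
    return Part;
}

// Number of edges whose endpoints lie in different parts (each undirected edge is stored twice).
int32_t EdgeCut(const std::vector<int32_t>& Xadj, const std::vector<int32_t>& Adjncy, const std::vector<int32_t>& Part)
{
    int32_t Cut = 0;
    for (int32_t v = 0; v + 1 < int32_t(Xadj.size()); ++v)
        for (int32_t e = Xadj[v]; e < Xadj[v + 1]; ++e)
            Cut += (Part[v] != Part[Adjncy[e]]) ? 1 : 0;
    return Cut / 2;
}
```

On the path 0-1-2-3-4-5, growing from vertex 0 cuts one edge, while growing from vertex 2 cuts two. Because the result depends on the root, partitioners typically try several roots and keep the smallest cut.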

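The key idea of the Uncoarsening phase: each coarse (parent) vertex contains a group of finer (child) vertices. As the partition is projected back onto the finer graph, vertices are moved from one partition to the other to reduce the number of cut edges.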
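The refinement step can be illustrated with a single greedy pass: a vertex switches sides only when that strictly lowers the cut and the destination part stays within a size limit. METIS uses more elaborate Kernighan-Lin/Fiduccia-Mattheyses-style refinement; this sketch is only the simplest form of the idea, and `GreedyRefinePass` and its `MaxPartSize` parameter are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// One greedy refinement pass on a CSR graph with a 0/1 partition vector.
// A vertex moves only if that strictly lowers the unweighted edge cut and the
// destination part stays within MaxPartSize. Returns the number of moves.
int32_t GreedyRefinePass(const std::vector<int32_t>& Xadj, const std::vector<int32_t>& Adjncy,
                         std::vector<int32_t>& Part, int32_t MaxPartSize)
{
    const int32_t NumVerts = int32_t(Xadj.size()) - 1;
    int32_t Size[2] = {0, 0};
    for (int32_t v = 0; v < NumVerts; ++v)
        Size[Part[v]]++;

    int32_t Moved = 0;
    for (int32_t v = 0; v < NumVerts; ++v)
    {
        int32_t Internal = 0;
        int32_t External = 0;
        for (int32_t e = Xadj[v]; e < Xadj[v + 1]; ++e)
        {
            if (Part[Adjncy[e]] == Part[v])
                ++Internal;
            else
                ++External;
        }
        const int32_t Gain = External - Internal; // Cut edges removed by moving v.
        const int32_t To = 1 - Part[v];
        if (Gain > 0 && Size[To] < MaxPartSize)
        {
            Size[Part[v]]--;
            Size[To]++;
            Part[v] = To;
            ++Moved;
        }
    }
    return Moved;
}
```

The balance limit is what makes refinement a trade-off: on the path 0-1-2-3-4-5 split as `{0, 0, 1, 0, 1, 1}`, no move is allowed when each side is capped at 3 vertices, while a cap of 4 lets vertex 2 switch sides and lowers the cut from 3 to 1. Full FM refinement additionally accepts temporary negative-gain moves and rolls back to the best cut seen, which this single pass does not.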

6.4.2.5 BuildDAG

// Engine\Source\Developer\NaniteBuilder\Private\ClusterDAG.cpp

// Build the directed acyclic graph (DAG) of clusters.
void BuildDAG( TArray< FClusterGroup >& Groups, TArray< FCluster >& Clusters, uint32 ClusterRangeStart, uint32 ClusterRangeNum, uint32 MeshIndex, FBounds& MeshBounds )
{
    uint32 LevelOffset    = ClusterRangeStart;
    
    TAtomic< uint32 >    NumClusters( Clusters.Num() );
    uint32                NumExternalEdges = 0;

    bool bFirstLevel = true;

    while( true )
    {
        TArrayView< FCluster > LevelClusters( &Clusters[LevelOffset], bFirstLevel ? ClusterRangeNum : (Clusters.Num() - LevelOffset) );
        bFirstLevel = false;
        
        for( FCluster& Cluster : LevelClusters )
        {
            NumExternalEdges    += Cluster.NumExternalEdges;
            MeshBounds            += Cluster.Bounds;
        }

        if( LevelClusters.Num() < 2 )
            break;

        // If this level has no more clusters than MaxGroupSize, put them all into a single group.
        if( LevelClusters.Num() <= MaxGroupSize )
        {
            TArray< uint32, TInlineAllocator< MaxGroupSize > > Children;

            uint32 MaxParents = 0;
            for( FCluster& Cluster : LevelClusters )
            {
                MaxParents += FMath::DivideAndRoundUp< uint32 >( Cluster.Indexes.Num(), FCluster::ClusterSize * 6 );
                Children.Add( LevelOffset++ );
            }

            LevelOffset = Clusters.Num();
            Clusters.AddDefaulted( MaxParents );
            Groups.AddDefaulted( 1 );

            // Simplify with DAGReduce and add the result to the new group.
            DAGReduce( Groups, Clusters, NumClusters, Children, Groups.Num() - 1, MeshIndex );

            // Correct num to atomic count
            Clusters.SetNum( NumClusters, false );

            continue;
        }
        
        // This level has more clusters than MaxGroupSize, so partition them into groups with FGraphPartitioner.
        
        // External edge.
        struct FExternalEdge
        {
            uint32    ClusterIndex;
            uint32    EdgeIndex;
        };
        // List of external edges.
        TArray< FExternalEdge >    ExternalEdges;
        FHashTable                ExternalEdgeHash;
        TAtomic< uint32 >        ExternalEdgeOffset(0);

        // The total NumExternalEdges is known, so a hash table that never needs to grow can be allocated.
        ExternalEdges.AddUninitialized( NumExternalEdges );
        ExternalEdgeHash.Clear( 1 << FMath::FloorLog2( NumExternalEdges ), NumExternalEdges );
        NumExternalEdges = 0;

        // Add the external edges to the hash table in parallel.
        ParallelFor( LevelClusters.Num(),
            [&]( uint32 ClusterIndex )
            {
                FCluster& Cluster = LevelClusters[ ClusterIndex ];

                for( TConstSetBitIterator<> SetBit( Cluster.ExternalEdges ); SetBit; ++SetBit )
                {
                    uint32 EdgeIndex = SetBit.GetIndex();

                    uint32 VertIndex0 = Cluster.Indexes[ EdgeIndex ];
                    uint32 VertIndex1 = Cluster.Indexes[ Cycle3( EdgeIndex ) ];
    
                    const FVector& Position0 = Cluster.GetPosition( VertIndex0 );
                    const FVector& Position1 = Cluster.GetPosition( VertIndex1 );

                    uint32 Hash0 = HashPosition( Position0 );
                    uint32 Hash1 = HashPosition( Position1 );
                    uint32 Hash = Murmur32( { Hash0, Hash1 } );

                    uint32 ExternalEdgeIndex = ExternalEdgeOffset++;
                    ExternalEdges[ ExternalEdgeIndex ] = { ClusterIndex, EdgeIndex };
                    ExternalEdgeHash.Add_Concurrent( Hash, ExternalEdgeIndex );
                }
            });

        check( ExternalEdgeOffset == ExternalEdges.Num() );

        TAtomic< uint32 > NumAdjacency(0);

        // In parallel, look for matching edges in other clusters.
        ParallelFor( LevelClusters.Num(),
            [&]( uint32 ClusterIndex )
            {
                FCluster& Cluster = LevelClusters[ ClusterIndex ];

                for( TConstSetBitIterator<> SetBit( Cluster.ExternalEdges ); SetBit; ++SetBit )
                {
                    uint32 EdgeIndex = SetBit.GetIndex();

                    uint32 VertIndex0 = Cluster.Indexes[ EdgeIndex ];
                    uint32 VertIndex1 = Cluster.Indexes[ Cycle3( EdgeIndex ) ];
    
                    const FVector& Position0 = Cluster.GetPosition( VertIndex0 );
                    const FVector& Position1 = Cluster.GetPosition( VertIndex1 );

                    uint32 Hash0 = HashPosition( Position0 );
                    uint32 Hash1 = HashPosition( Position1 );
                    uint32 Hash = Murmur32( { Hash1, Hash0 } );

                    for( uint32 ExternalEdgeIndex = ExternalEdgeHash.First( Hash ); ExternalEdgeHash.IsValid( ExternalEdgeIndex ); ExternalEdgeIndex = ExternalEdgeHash.Next( ExternalEdgeIndex ) )
                    {
                        FExternalEdge ExternalEdge = ExternalEdges[ ExternalEdgeIndex ];

                        FCluster& OtherCluster = LevelClusters[ ExternalEdge.ClusterIndex ];

                        if( OtherCluster.ExternalEdges[ ExternalEdge.EdgeIndex ] )
                        {
                            uint32 OtherVertIndex0 = OtherCluster.Indexes[ ExternalEdge.EdgeIndex ];
                            uint32 OtherVertIndex1 = OtherCluster.Indexes[ Cycle3( ExternalEdge.EdgeIndex ) ];
            
                            if( Position0 == OtherCluster.GetPosition( OtherVertIndex1 ) &&
                                Position1 == OtherCluster.GetPosition( OtherVertIndex0 ) )
                            {
                                // Found a matching edge; increment the shared-edge count for that adjacent cluster.
                                Cluster.AdjacentClusters.FindOrAdd( ExternalEdge.ClusterIndex, 0 )++;

                                // Can't break or a triple edge might be non-deterministically connected.
                                // Need to find all matching, not just first.
                            }
                        }
                    }
                }
                NumAdjacency += Cluster.AdjacentClusters.Num();

                // Force a deterministic order of adjacent clusters.
                Cluster.AdjacentClusters.KeySort(
                    [ &LevelClusters ]( uint32 A, uint32 B )
                    {
                        return LevelClusters[A].GUID < LevelClusters[B].GUID;
                    } );
            });

        // Disjoint set (union-find) of clusters.
        FDisjointSet DisjointSet( LevelClusters.Num() );

        for( uint32 ClusterIndex = 0; ClusterIndex < (uint32)LevelClusters.Num(); ClusterIndex++ )
        {
            for( auto& Pair : LevelClusters[ ClusterIndex ].AdjacentClusters )
            {
                uint32 OtherClusterIndex = Pair.Key;

                uint32 Count = LevelClusters[ OtherClusterIndex ].AdjacentClusters.FindChecked( ClusterIndex );
                check( Count == Pair.Value );

                if( ClusterIndex > OtherClusterIndex )
                {
                    DisjointSet.UnionSequential( ClusterIndex, OtherClusterIndex );
                }
            }
        }

        // Graph partitioner.
        FGraphPartitioner Partitioner( LevelClusters.Num() );

        // Sort to force a deterministic order.
        {
            TArray< uint32 > SortedIndexes;
            SortedIndexes.AddUninitialized( Partitioner.Indexes.Num() );
            RadixSort32( SortedIndexes.GetData(), Partitioner.Indexes.GetData(), Partitioner.Indexes.Num(),
                [&]( uint32 Index )
                {
                    return LevelClusters[ Index ].GUID;
                } );
            Swap( Partitioner.Indexes, SortedIndexes );
        }

        auto GetCenter = [&]( uint32 Index )
        {
            FBounds& Bounds = LevelClusters[ Index ].Bounds;
            return 0.5f * ( Bounds.Min + Bounds.Max );
        };
        // Build locality links.
        Partitioner.BuildLocalityLinks( DisjointSet, MeshBounds, GetCenter );

        auto* RESTRICT Graph = Partitioner.NewGraph( NumAdjacency );

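        // For each cluster of this level (in partitioner order), add adjacency edges and locality links.
        // Each shared edge costs 1 between siblings (same group) and 16 otherwise, plus a base cost of 4.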
        for( int32 i = 0; i < LevelClusters.Num(); i++ )
        {
            Graph->AdjacencyOffset[i] = Graph->Adjacency.Num();

            uint32 ClusterIndex = Partitioner.Indexes[i];

            for( auto& Pair : LevelClusters[ ClusterIndex ].AdjacentClusters )
            {
                uint32 OtherClusterIndex = Pair.Key;
                uint32 NumSharedEdges = Pair.Value;

                const auto& Cluster0 = Clusters[ LevelOffset + ClusterIndex ];
                const auto& Cluster1 = Clusters[ LevelOffset + OtherClusterIndex ];

                bool bSiblings = Cluster0.GroupIndex != MAX_uint32 && Cluster0.GroupIndex == Cluster1.GroupIndex;

                Partitioner.AddAdjacency( Graph, OtherClusterIndex, NumSharedEdges * ( bSiblings ? 1 : 16 ) + 4 );
            }

            Partitioner.AddLocalityLinks( Graph, ClusterIndex, 1 );
        }
        Graph->AdjacencyOffset[ Graph->Num ] = Graph->Adjacency.Num();

        LOG_CRC( Graph->Adjacency );
        LOG_CRC( Graph->AdjacencyCost );
        LOG_CRC( Graph->AdjacencyOffset );

        // Strict partitioning (group sizes between MinGroupSize and MaxGroupSize).
        Partitioner.PartitionStrict( Graph, MinGroupSize, MaxGroupSize, true );

        LOG_CRC( Partitioner.Ranges );

        // Compute an upper bound on the number of parent clusters.
        uint32 MaxParents = 0;
        for( auto& Range : Partitioner.Ranges )
        {
            uint32 NumParentIndexes = 0;
            for( uint32 i = Range.Begin; i < Range.End; i++ )
            {
                // Global indexing is needed in Reduce()
                Partitioner.Indexes[i] += LevelOffset;
                NumParentIndexes += Clusters[ Partitioner.Indexes[i] ].Indexes.Num();
            }
            MaxParents += FMath::DivideAndRoundUp( NumParentIndexes, FCluster::ClusterSize * 6 );
        }

        LevelOffset = Clusters.Num();

        Clusters.AddDefaulted( MaxParents );
        Groups.AddDefaulted( Partitioner.Ranges.Num() );

        // Reduce (merge and simplify) each group in parallel to build the next DAG level.
        ParallelFor( Partitioner.Ranges.Num(),
            [&]( int32 PartitionIndex )
            {
                auto& Range = Partitioner.Ranges[ PartitionIndex ];

                TArrayView< uint32 > Children( &Partitioner.Indexes[ Range.Begin ], Range.End - Range.Begin );
                uint32 ClusterGroupIndex = PartitionIndex + Groups.Num() - Partitioner.Ranges.Num();

                DAGReduce( Groups, Clusters, NumClusters, Children, ClusterGroupIndex, MeshIndex );
            });

        // Correct num to atomic count
        Clusters.SetNum( NumClusters, false );
    }
    
    // Max out the root node: it gets an effectively infinite parent LOD error.
    uint32 RootIndex = LevelOffset;
    FClusterGroup RootClusterGroup;
    RootClusterGroup.Children.Add( RootIndex );
    RootClusterGroup.Bounds = Clusters[ RootIndex ].SphereBounds;
    RootClusterGroup.LODBounds = FSphere( 0 );
    RootClusterGroup.MaxParentLODError = 1e10f;
    RootClusterGroup.MinLODError = -1.0f;
    RootClusterGroup.MipLevel = Clusters[RootIndex].MipLevel + 1;
    RootClusterGroup.MeshIndex = MeshIndex;
    Clusters[ RootIndex ].GroupIndex = Groups.Num();
    Groups.Add( RootClusterGroup );
}

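The code above runs DAGReduce many times, once for every partition (cluster group) produced at each level. Here is a brief analysis of its implementation: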

static void DAGReduce( TArray< FClusterGroup >& Groups, TArray< FCluster >& Clusters, TAtomic< uint32 >& NumClusters, TArrayView< uint32 > Children, int32 GroupIndex, uint32 MeshIndex )
{
    check( GroupIndex >= 0 );

    // Merge clusters.
    TArray< const FCluster*, TInlineAllocator<16> > MergeList;
    for( int32 Child : Children )
    {
        MergeList.Add( &Clusters[ Child ] );
    }
    
    // Force a deterministic order.
    MergeList.Sort(
        []( const FCluster& A, const FCluster& B )
        {
            return A.GUID < B.GUID;
        } );

    FCluster Merged( MergeList );

    int32 NumParents = FMath::DivideAndRoundUp< int32 >( Merged.Indexes.Num(), FCluster::ClusterSize * 6 );
    int32 ParentStart = 0;
    int32 ParentEnd = 0;

    float ParentMaxLODError = 0.0f;

    // Note: TargetClusterSize decreases by 2 on each iteration.
    for( int32 TargetClusterSize = FCluster::ClusterSize - 2; TargetClusterSize > FCluster::ClusterSize / 2; TargetClusterSize -= 2 )
    {
        int32 TargetNumTris = NumParents * TargetClusterSize;

        // Simplify; returns the maximum LOD error for the parents.
        ParentMaxLODError = Merged.Simplify( TargetNumTris );

        // Split into parent clusters.
        if( NumParents == 1 )
        {
            ParentEnd = ( NumClusters += NumParents );
            ParentStart = ParentEnd - NumParents;

            Clusters[ ParentStart ] = Merged;
            Clusters[ ParentStart ].Bound();
            break;
        }
        else
        {
            FGraphPartitioner Partitioner( Merged.Indexes.Num() / 3 );
            Merged.Split( Partitioner );

            if( Partitioner.Ranges.Num() <= NumParents )
            {
                NumParents = Partitioner.Ranges.Num();
                ParentEnd = ( NumClusters += NumParents );
                ParentStart = ParentEnd - NumParents;

                int32 Parent = ParentStart;
                for( auto& Range : Partitioner.Ranges )
                {
                    Clusters[ Parent ] = FCluster( Merged, Range.Begin, Range.End, Partitioner.Indexes );
                    Parent++;
                }

                break;
            }
        }
    }

    TArray< FSphere, TInlineAllocator<32> > Children_LODBounds;
    TArray< FSphere, TInlineAllocator<32> > Children_SphereBounds;
    
    // Force monotonic nesting.
    float ChildMinLODError = MAX_flt;
    for( int32 Child : Children )
    {
        bool bLeaf = Clusters[ Child ].EdgeLength < 0.0f;
        float LODError = Clusters[ Child ].LODError;

        Children_LODBounds.Add( Clusters[ Child ].LODBounds );
        Children_SphereBounds.Add( Clusters[ Child ].SphereBounds );
        ChildMinLODError = FMath::Min( ChildMinLODError, bLeaf ? -1.0f : LODError );
        ParentMaxLODError = FMath::Max( ParentMaxLODError, LODError );

        Clusters[ Child ].GroupIndex = GroupIndex;
        Groups[ GroupIndex ].Children.Add( Child );
        check( Groups[ GroupIndex ].Children.Num() <= MAX_CLUSTERS_PER_GROUP_TARGET );
    }
    
    FSphere    ParentLODBounds( Children_LODBounds.GetData(), Children_LODBounds.Num() );
    FSphere    ParentBounds( Children_SphereBounds.GetData(), Children_SphereBounds.Num() );

    // Force all parents to share the same LOD data, since they depend on each other.
    for( int32 Parent = ParentStart; Parent < ParentEnd; Parent++ )
    {
        Clusters[ Parent ].LODBounds            = ParentLODBounds;
        Clusters[ Parent ].LODError                = ParentMaxLODError;
        Clusters[ Parent ].GeneratingGroupIndex = GroupIndex;
    }

    Groups[ GroupIndex ].Bounds                = ParentBounds;
    Groups[ GroupIndex ].LODBounds            = ParentLODBounds;
    Groups[ GroupIndex ].MinLODError        = ChildMinLODError;
    Groups[ GroupIndex ].MaxParentLODError    = ParentMaxLODError;
    Groups[ GroupIndex ].MipLevel            = Merged.MipLevel - 1;
    Groups[ GroupIndex ].MeshIndex            = MeshIndex;
}
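Two numeric details of DAGReduce are easy to miss. First, NumParents = DivideAndRoundUp(Merged.Indexes.Num(), ClusterSize * 6): with 3 indices per triangle this is the merged triangle count divided by 2 * ClusterSize, so a group is simplified to roughly half its triangles, and each retry targets NumParents * TargetClusterSize triangles while TargetClusterSize steps down by 2 from ClusterSize - 2. Second, the monotonic-nesting loop makes the parents' LOD error the maximum of the simplifier's error and every child's error, while the group's MinLODError is the minimum child error (-1 for leaves). The sketch below reproduces only this arithmetic, assuming ClusterSize = 128 (an assumption about FCluster::ClusterSize in this build); ChildInfo, GroupErrors and the function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cfloat>
#include <vector>

// Assumed value of FCluster::ClusterSize (triangles per cluster).
constexpr int kClusterSize = 128;

int DivideAndRoundUp(int A, int B) { return (A + B - 1) / B; }

// Parent count for a merged group with NumIndexes indices (3 per triangle):
// half as many triangles, packed kClusterSize per parent cluster.
int NumParentsFor(int NumIndexes) { return DivideAndRoundUp(NumIndexes, kClusterSize * 6); }

// Triangle budgets tried by the retry loop, one per TargetClusterSize step.
std::vector<int> TargetTriangleBudgets(int NumParents) {
    std::vector<int> Budgets;
    for (int Target = kClusterSize - 2; Target > kClusterSize / 2; Target -= 2)
        Budgets.push_back(NumParents * Target);
    return Budgets;
}

struct ChildInfo { float LODError; bool bLeaf; };   // hypothetical

struct GroupErrors { float MinLODError; float ParentMaxLODError; };

// Monotonic nesting: parents never report a smaller error than any child.
GroupErrors PropagateErrors(const std::vector<ChildInfo>& Children, float SimplifyError) {
    GroupErrors Out{FLT_MAX, SimplifyError};
    for (const ChildInfo& Child : Children) {
        Out.MinLODError = std::min(Out.MinLODError, Child.bLeaf ? -1.0f : Child.LODError);
        Out.ParentMaxLODError = std::max(Out.ParentMaxLODError, Child.LODError);
    }
    return Out;
}
```

For example, eight full children (1024 triangles, 3072 indices) yield four parents, and the first attempt asks the simplifier for 4 * 126 = 504 triangles.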

6.4.2.6 BuildCoarseRepresentation

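BuildCoarseRepresentation builds a coarse representation of the mesh from the input cluster list and cluster group list, and outputs the corresponding vertex, index and section data. It first takes a cut through the cluster DAG targeting TargetNumTris + 4096 triangles (FindDAGCut), then simplifies that cut down to TargetNumTris: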

static void BuildCoarseRepresentation(
    const TArray<FClusterGroup>& Groups,
    const TArray<FCluster>& Clusters,
    TArray<FStaticMeshBuildVertex>& Verts,
    TArray<uint32>& Indexes,
    TArray<FStaticMeshSection, TInlineAllocator<1>>& Sections,
    uint32& NumTexCoords,
    uint32 TargetNumTris
)
{
    FCluster CoarseRepresentation = FindDAGCut(Groups, Clusters, TargetNumTris + 4096);

    CoarseRepresentation.Simplify(TargetNumTris);

    TArray< FStaticMeshSection, TInlineAllocator<1> > OldSections = Sections;

    // The UV count must be updated to match the coarse representation's new data.
    NumTexCoords = CoarseRepresentation.NumTexCoords;

    // Rebuild vertex data.
    Verts.Empty(CoarseRepresentation.NumVerts);
    for (uint32 Iter = 0, Num = CoarseRepresentation.NumVerts; Iter < Num; ++Iter)
    {
        FStaticMeshBuildVertex Vertex = {};
        Vertex.Position = CoarseRepresentation.GetPosition(Iter);
        Vertex.TangentX = FVector::ZeroVector;
        Vertex.TangentY = FVector::ZeroVector;
        Vertex.TangentZ = CoarseRepresentation.GetNormal(Iter);

        const FVector2D* UVs = CoarseRepresentation.GetUVs(Iter);
        for (uint32 UVIndex = 0; UVIndex < NumTexCoords; ++UVIndex)
        {
            Vertex.UVs[UVIndex] = UVs[UVIndex].ContainsNaN() ? FVector2D::ZeroVector : UVs[UVIndex];
        }

        if (CoarseRepresentation.bHasColors)
        {
            Vertex.Color = CoarseRepresentation.GetColor(Iter).ToFColor(false /* sRGB */);
        }

        Verts.Add(Vertex);
    }

    TArray<FMaterialTriangle, TInlineAllocator<128>> CoarseMaterialTris;
    TArray<FMaterialRange, TInlineAllocator<4>> CoarseMaterialRanges;

    // Compute material ranges for the coarse representation.
    BuildMaterialRanges(
        CoarseRepresentation.Indexes,
        CoarseRepresentation.MaterialIndexes,
        CoarseMaterialTris,
        CoarseMaterialRanges);
    check(CoarseMaterialRanges.Num() <= OldSections.Num());

    // Rebuild section data.
    Sections.Reset(CoarseMaterialRanges.Num());
    for (const FStaticMeshSection& OldSection : OldSections)
    {
        // Add new sections based on the computed material ranges.
        // Force the material order to match OldSections.
        const FMaterialRange* FoundRange = CoarseMaterialRanges.FindByPredicate([&OldSection](const FMaterialRange& Range) { return Range.MaterialIndex == OldSection.MaterialIndex; });

        // Sections can actually be removed from the coarse mesh if their source data didn't contain enough triangles.
        if (FoundRange)
        {
            // Copy properties from the original mesh section.
            FStaticMeshSection Section(OldSection);

            // Range of vertices and indices used when rendering the section.
            Section.FirstIndex = FoundRange->RangeStart * 3;
            Section.NumTriangles = FoundRange->RangeLength;
            Section.MinVertexIndex = TNumericLimits<uint32>::Max();
            Section.MaxVertexIndex = TNumericLimits<uint32>::Min();

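            // Scan only this section's triangles, [RangeStart, RangeStart + RangeLength).
            for (uint32 TriangleIndex = FoundRange->RangeStart; TriangleIndex < (FoundRange->RangeStart + FoundRange->RangeLength); ++TriangleIndex)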
            {
                const FMaterialTriangle& Triangle = CoarseMaterialTris[TriangleIndex];

                // Update the minimum vertex index.
                Section.MinVertexIndex = FMath::Min(Section.MinVertexIndex, Triangle.Index0);
                Section.MinVertexIndex = FMath::Min(Section.MinVertexIndex, Triangle.Index1);
                Section.MinVertexIndex = FMath::Min(Section.MinVertexIndex, Triangle.Index2);

                // Update the maximum vertex index.
                Section.MaxVertexIndex = FMath::Max(Section.MaxVertexIndex, Triangle.Index0);
                Section.MaxVertexIndex = FMath::Max(Section.MaxVertexIndex, Triangle.Index1);
                Section.MaxVertexIndex = FMath::Max(Section.MaxVertexIndex, Triangle.Index2);
            }

            Sections.Add(Section);
        }
    }

    // Rebuild index data.
    Indexes.Reset();
    for (const FMaterialTriangle& Triangle : CoarseMaterialTris)
    {
        Indexes.Add(Triangle.Index0);
        Indexes.Add(Triangle.Index1);
        Indexes.Add(Triangle.Index2);
    }

    // Compute tangents.
    CalcTangents(Verts, Indexes);
}
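Rebuilding a section only needs its material range, a contiguous run of triangles: the section starts at index RangeStart * 3, covers RangeLength triangles, and its vertex range is the min/max over those triangles' indices. A minimal sketch with hypothetical stand-in structs (Tri, Range, SectionInfo) that scans only the section's own triangles:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

struct Tri { uint32_t Index0, Index1, Index2; };          // like FMaterialTriangle
struct Range { uint32_t RangeStart, RangeLength; };         // like FMaterialRange
struct SectionInfo { uint32_t FirstIndex, NumTriangles, MinVertexIndex, MaxVertexIndex; };

// Fills the index/vertex ranges of one section from its triangle range.
SectionInfo MakeSection(const std::vector<Tri>& Tris, const Range& R) {
    SectionInfo S{R.RangeStart * 3, R.RangeLength,
                  std::numeric_limits<uint32_t>::max(), std::numeric_limits<uint32_t>::min()};
    for (uint32_t T = R.RangeStart; T < R.RangeStart + R.RangeLength; ++T) {
        S.MinVertexIndex = std::min({S.MinVertexIndex, Tris[T].Index0, Tris[T].Index1, Tris[T].Index2});
        S.MaxVertexIndex = std::max({S.MaxVertexIndex, Tris[T].Index0, Tris[T].Index1, Tris[T].Index2});
    }
    return S;
}
```

If the scan started at triangle 0 instead of RangeStart, the result would still bracket the section's vertices but could be wider, because earlier sections' indices would be included.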

6.4.2.7 NaniteEncode

Encode encodes the clusters and cluster groups into the Nanite resources (FResources) according to FMeshNaniteSettings:

// Engine\Source\Developer\NaniteBuilder\Private\NaniteEncode.cpp

void Encode(
    FResources& Resources,
    const FMeshNaniteSettings& Settings,
    TArray< FCluster >& Clusters,
    TArray< FClusterGroup >& Groups,
    const FBounds& MeshBounds,
    uint32 NumMeshes,
    uint32 NumTexCoords,
    bool bHasColors )
{
    // Remove degenerate triangles.
    {
        TRACE_CPUPROFILER_EVENT_SCOPE(Nanite::Build::RemoveDegenerateTriangles);
        RemoveDegenerateTriangles( Clusters );
    }

    // Build material ranges.
    {
        TRACE_CPUPROFILER_EVENT_SCOPE(Nanite::Build::BuildMaterialRanges);
        BuildMaterialRanges( Clusters );
    }

    // Constrain clusters.
#if USE_CONSTRAINED_CLUSTERS
    {
        TRACE_CPUPROFILER_EVENT_SCOPE(Nanite::Build::ConstrainClusters);
        ConstrainClusters( Groups, Clusters );
    }
    (......)
#endif

    // Compute quantized positions.
    {
        TRACE_CPUPROFILER_EVENT_SCOPE(Nanite::Build::CalculateQuantizedPositions);
        // Must run after clusters have been constrained and split.
        Resources.PositionPrecision = CalculateQuantizedPositionsUniformGrid( Clusters, MeshBounds, Settings );    
    }

    // Print material range statistics.
    {
        TRACE_CPUPROFILER_EVENT_SCOPE(Nanite::Build::PrintMaterialRangeStats);
        PrintMaterialRangeStats( Clusters );
    }

    TArray<FPage> Pages;
    TArray<FClusterGroupPart> GroupParts;
    TArray<FEncodingInfo> EncodingInfos;

    // Compute encoding info.
    {
        TRACE_CPUPROFILER_EVENT_SCOPE(Nanite::Build::CalculateEncodingInfos);
        CalculateEncodingInfos(EncodingInfos, Clusters, bHasColors, NumTexCoords);
    }

    // Assign clusters to pages.
    {
        TRACE_CPUPROFILER_EVENT_SCOPE(Nanite::Build::AssignClustersToPages);
        AssignClustersToPages(Groups, Clusters, EncodingInfos, Pages, GroupParts);
    }

    // Build the hierarchy nodes over the cluster groups.
    {
        TRACE_CPUPROFILER_EVENT_SCOPE(Nanite::Build::BuildHierarchyNodes);
        BuildHierarchies(Resources, Groups, GroupParts, NumMeshes);
    }

    // Write cluster and cluster group data into the pages.
    {
        TRACE_CPUPROFILER_EVENT_SCOPE(Nanite::Build::WritePages);
        WritePages(Resources, Pages, Groups, GroupParts, Clusters, EncodingInfos, NumTexCoords);
    }
}

The encoding process above relies on several important functions; they are analyzed one by one below:

// Engine\Source\Developer\NaniteBuilder\Private\NaniteEncode.cpp

static void RemoveDegenerateTriangles(TArray<FCluster>& Clusters)
{
    // Remove degenerate triangles from all clusters in parallel.
    ParallelFor( Clusters.Num(),
        [&]( uint32 ClusterIndex )
        {
            RemoveDegenerateTriangles( Clusters[ ClusterIndex ] );
        } );
}

// Remove degenerate triangles from a single cluster.
static void RemoveDegenerateTriangles(FCluster& Cluster)
{
    uint32 NumOldTriangles = Cluster.NumTris;
    uint32 NumNewTriangles = 0;

    for (uint32 OldTriangleIndex = 0; OldTriangleIndex < NumOldTriangles; OldTriangleIndex++)
    {
        uint32 i0 = Cluster.Indexes[OldTriangleIndex * 3 + 0];
        uint32 i1 = Cluster.Indexes[OldTriangleIndex * 3 + 1];
        uint32 i2 = Cluster.Indexes[OldTriangleIndex * 3 + 2];
        uint32 mi = Cluster.MaterialIndexes[OldTriangleIndex];

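        // A triangle is non-degenerate only if its three vertex indices are pairwise distinct.
        // Author's note: this could perhaps be improved, e.g. by also treating a triangle as degenerate when any two of its vertices are closer than some threshold (0.01f).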
        if (i0 != i1 && i0 != i2 && i1 != i2)
        {
            Cluster.Indexes[NumNewTriangles * 3 + 0] = i0;
            Cluster.Indexes[NumNewTriangles * 3 + 1] = i1;
            Cluster.Indexes[NumNewTriangles * 3 + 2] = i2;
            Cluster.MaterialIndexes[NumNewTriangles] = mi;

            NumNewTriangles++;
        }
    }
    Cluster.NumTris = NumNewTriangles;
    Cluster.Indexes.SetNum(NumNewTriangles * 3);
    Cluster.MaterialIndexes.SetNum(NumNewTriangles);
}

// Sort each cluster's triangles into material ranges and store the ranges in the cluster.
static void BuildMaterialRanges( TArray<FCluster>& Clusters )
{
    // Process in parallel.
    ParallelFor( Clusters.Num(),
        [&]( uint32 ClusterIndex )
        {
            BuildMaterialRanges( Clusters[ ClusterIndex ] );
        } );
}

static void BuildMaterialRanges(FCluster& Cluster)
{
    TArray<FMaterialTriangle, TInlineAllocator<128>> MaterialTris;
    
    // Build material ranges for a single cluster.
    BuildMaterialRanges(
        Cluster.Indexes,
        Cluster.MaterialIndexes,
        MaterialTris,
        Cluster.MaterialRanges);

    // Write the reordered indices back to the cluster.
    for (uint32 Triangle = 0; Triangle < Cluster.NumTris; ++Triangle)
    {
        Cluster.Indexes[Triangle * 3 + 0] = MaterialTris[Triangle].Index0;
        Cluster.Indexes[Triangle * 3 + 1] = MaterialTris[Triangle].Index1;
        Cluster.Indexes[Triangle * 3 + 2] = MaterialTris[Triangle].Index2;
        Cluster.MaterialIndexes[Triangle] = MaterialTris[Triangle].MaterialIndex;
    }
}
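The inner BuildMaterialRanges overload is not shown in this excerpt. Conceptually it reorders a cluster's triangles so that each material occupies one contiguous run and records each run as (RangeStart, RangeLength, MaterialIndex). The sketch below assumes the simplest possible ordering, a stable sort by material index; the engine's actual ordering rule may differ, and BuildMaterialRangesSketch, MaterialTriangle and MaterialRange are hypothetical stand-ins.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

struct MaterialTriangle { uint32_t Index0, Index1, Index2, MaterialIndex; };
struct MaterialRange { uint32_t RangeStart, RangeLength, MaterialIndex; };

// Groups triangles by material and emits one range per contiguous run.
void BuildMaterialRangesSketch(const std::vector<uint32_t>& Indexes,
                               const std::vector<int32_t>& MaterialIndexes,
                               std::vector<MaterialTriangle>& OutTris,
                               std::vector<MaterialRange>& OutRanges) {
    OutTris.clear();
    OutRanges.clear();
    for (size_t T = 0; T < MaterialIndexes.size(); ++T)
        OutTris.push_back({Indexes[T * 3], Indexes[T * 3 + 1], Indexes[T * 3 + 2],
                           static_cast<uint32_t>(MaterialIndexes[T])});
    // Stable sort keeps the original triangle order inside each material.
    std::stable_sort(OutTris.begin(), OutTris.end(),
        [](const MaterialTriangle& A, const MaterialTriangle& B) { return A.MaterialIndex < B.MaterialIndex; });
    for (uint32_t T = 0; T < static_cast<uint32_t>(OutTris.size()); ++T) {
        // Start a new range whenever the material changes.
        if (OutRanges.empty() || OutRanges.back().MaterialIndex != OutTris[T].MaterialIndex)
            OutRanges.push_back({T, 0, OutTris[T].MaterialIndex});
        OutRanges.back().RangeLength++;
    }
}
```

Writing the sorted triangles back, as the loop above does with MaterialTris, is what lets each range be addressed as a single index interval later.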

// Constrain clusters.
static void ConstrainClusters( TArray< FClusterGroup >& ClusterGroups, TArray< FCluster >& Clusters )
{
    // Gather statistics.
    uint32 TotalOldTriangles = 0;
    uint32 TotalOldVertices = 0;
    for( const FCluster& Cluster : Clusters )
    {
        TotalOldTriangles += Cluster.NumTris;
        TotalOldVertices += Cluster.NumVerts;
    }

    // Constrain clusters in parallel, with or without strip indices.
    ParallelFor( Clusters.Num(),
        [&]( uint32 i )
        {
#if USE_STRIP_INDICES // Use strip indices.
            FStripifier Stripifier;
            Stripifier.ConstrainAndStripifyCluster(Clusters[i]);
#else // Don't use strip indices.
            ConstrainClusterFIFO(Clusters[i]);
#endif
        } );
    
    uint32 TotalNewTriangles = 0;
    uint32 TotalNewVertices = 0;

    // Split clusters that still have too many vertices.
    const uint32 NumOldClusters = Clusters.Num();
    for( uint32 i = 0; i < NumOldClusters; i++ )
    {
        TotalNewTriangles += Clusters[ i ].NumTris;
        TotalNewVertices += Clusters[ i ].NumVerts;

        // If a cluster has too many vertices (more than 256), split it into two halves by triangle count.
        if( Clusters[ i ].NumVerts > 256 )
        {
            FCluster ClusterA, ClusterB;
            
            uint32 NumTrianglesA = Clusters[ i ].NumTris / 2;
            uint32 NumTrianglesB = Clusters[ i ].NumTris - NumTrianglesA;
            
            BuildClusterFromClusterTriangleRange( Clusters[ i ], ClusterA, 0, NumTrianglesA );
            BuildClusterFromClusterTriangleRange( Clusters[ i ], ClusterB, NumTrianglesA, NumTrianglesB );
            
            Clusters[ i ] = ClusterA;
            ClusterGroups[ ClusterB.GroupIndex ].Children.Add( Clusters.Num() );
            Clusters.Add( ClusterB );
        }
    }

    // Gather statistics.
    uint32 TotalNewTrianglesWithSplits = 0;
    uint32 TotalNewVerticesWithSplits = 0;
    for( const FCluster& Cluster : Clusters )
    {
        TotalNewTrianglesWithSplits += Cluster.NumTris;
        TotalNewVerticesWithSplits += Cluster.NumVerts;
    }

    (......)
}
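BuildClusterFromClusterTriangleRange is also not shown here. A plausible minimal version copies a triangle range and compacts the vertices it references, so each half stores only the vertices it actually uses. The sketch below assumes a cluster is a vertex list plus local indices; LocalCluster, ExtractTriangleRange and SplitIfTooManyVerts are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

struct LocalCluster {                    // hypothetical stand-in for FCluster
    std::vector<uint32_t> Verts;         // source vertex IDs, in first-use order
    std::vector<uint32_t> Indexes;       // local indices, 3 per triangle
    uint32_t NumTris() const { return static_cast<uint32_t>(Indexes.size() / 3); }
};

// Copies triangles [Start, Start + Count) and remaps their vertices to a compact range.
LocalCluster ExtractTriangleRange(const LocalCluster& Src, uint32_t Start, uint32_t Count) {
    LocalCluster Dst;
    std::unordered_map<uint32_t, uint32_t> Remap;   // old local index -> new local index
    for (uint32_t I = Start * 3; I < (Start + Count) * 3; ++I) {
        const uint32_t Old = Src.Indexes[I];
        auto It = Remap.find(Old);
        if (It == Remap.end()) {
            It = Remap.emplace(Old, static_cast<uint32_t>(Dst.Verts.size())).first;
            Dst.Verts.push_back(Src.Verts[Old]);
        }
        Dst.Indexes.push_back(It->second);
    }
    return Dst;
}

// Mirrors the rule above: more than MaxVerts vertices -> split once, by triangle count.
std::vector<LocalCluster> SplitIfTooManyVerts(const LocalCluster& C, uint32_t MaxVerts = 256) {
    if (C.Verts.size() <= MaxVerts) return {C};
    const uint32_t NumA = C.NumTris() / 2;
    return {ExtractTriangleRange(C, 0, NumA), ExtractTriangleRange(C, NumA, C.NumTris() - NumA)};
}
```

Like the loop in ConstrainClusters, this splits a cluster at most once; that loop only visits the first NumOldClusters entries, so newly appended halves are not re-checked.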

// Quantize positions on a uniform grid.
static int32 CalculateQuantizedPositionsUniformGrid(TArray< FCluster >& Clusters, const FBounds& MeshBounds, const FMeshNaniteSettings& Settings)
{    
    // Simplified global quantization for EA.
    const int32 MaxPositionQuantizedValue    = (1 << MAX_POSITION_QUANTIZATION_BITS) - 1;
    
    int32 PositionPrecision = Settings.PositionPrecision;
    if (PositionPrecision == MIN_int32)
    {
        // Auto: guess the required precision from the bounds at the leaf level.
        const float MaxSize = MeshBounds.GetExtent().GetMax();

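        // Heuristic: denser meshes need higher resolution.
        // Use the geometric average of cluster sizes as a proxy for density.
        // Alternative interpretation: the bit precision is the average of what the clusters need.
        // For clusters of roughly equal size this gives results very similar to the old quantization code.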
        double TotalLogSize = 0.0;
        int32 TotalNum = 0;
        for (const FCluster& Cluster : Clusters)
        {
            if (Cluster.MipLevel == 0)
            {
                float ExtentSize = Cluster.Bounds.GetExtent().Size();
                if (ExtentSize > 0.0)
                {
                    TotalLogSize += FMath::Log2(ExtentSize);
                    TotalNum++;
                }
            }
        }
        double AvgLogSize = TotalNum > 0 ? TotalLogSize / TotalNum : 0.0;
        PositionPrecision = 7 - FMath::RoundToInt(AvgLogSize);

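        // Clamp the precision. Users now have to opt in explicitly to the lowest precision settings.
        // Those settings are likely to cause problems and contribute little to disk size savings (about 0.4% on the test project), so they should not be selected automatically.
        // Example: a very low-resolution road or building frame may look fine with little precision in isolation, yet still need fairly high precision in a scene, because smaller meshes are placed on or inside it.
        const int32 AUTO_MIN_PRECISION = 4;    // Minimum precision is 1/16 cm.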
        PositionPrecision = FMath::Max(PositionPrecision, AUTO_MIN_PRECISION);
    }

    // Compute the quantization scale.
    float QuantizationScale = FMath::Exp2((float)PositionPrecision);

    // Make sure every cluster is encodable. A large enough cluster could hit the 21bpc limit; if that happens, scale down until it fits.
    for (const FCluster& Cluster : Clusters)
    {
        const FBounds& Bounds = Cluster.Bounds;
        
        int32 Iterations = 0;
        while (true)
        {
            float MinX = FMath::RoundToFloat(Bounds.Min.X * QuantizationScale);
            float MinY = FMath::RoundToFloat(Bounds.Min.Y * QuantizationScale);
            float MinZ = FMath::RoundToFloat(Bounds.Min.Z * QuantizationScale);

            float MaxX = FMath::RoundToFloat(Bounds.Max.X * QuantizationScale);
            float MaxY = FMath::RoundToFloat(Bounds.Max.Y * QuantizationScale);
            float MaxZ = FMath::RoundToFloat(Bounds.Max.Z * QuantizationScale);

            if (MinX >= (double)MIN_int32 && MinY >= (double)MIN_int32 && MinZ >= (double)MIN_int32 &&    // MIN_int32/MAX_int32 is not representable in float
                MaxX <= (double)MAX_int32 && MaxY <= (double)MAX_int32 && MaxZ <= (double)MAX_int32 &&
                ((int32)MaxX - (int32)MinX) <= MaxPositionQuantizedValue && ((int32)MaxY - (int32)MinY) <= MaxPositionQuantizedValue && ((int32)MaxZ - (int32)MinZ) <= MaxPositionQuantizedValue)
            {
                break;
            }
            
            QuantizationScale *= 0.5f;
            PositionPrecision--;
            check(++Iterations < 100);    // Endless loop?
        }
    }

    const float RcpQuantizationScale = 1.0f / QuantizationScale;

    // Quantize positions in parallel.
    ParallelFor(Clusters.Num(), [&](uint32 ClusterIndex)
    {
        FCluster& Cluster = Clusters[ClusterIndex];
        
        const uint32 NumClusterVerts = Cluster.NumVerts;
        const uint32 ClusterShift = Cluster.QuantizedPosShift;

        Cluster.QuantizedPositions.SetNumUninitialized(NumClusterVerts);

        // Quantize positions.
        FIntVector IntClusterMax = { MIN_int32,    MIN_int32, MIN_int32 };
        FIntVector IntClusterMin = { MAX_int32,    MAX_int32, MAX_int32 };

        for (uint32 i = 0; i < NumClusterVerts; i++)
        {
            const FVector Position = Cluster.GetPosition(i);

            FIntVector& IntPosition = Cluster.QuantizedPositions[i];
            float PosX = FMath::RoundToFloat(Position.X * QuantizationScale);
            float PosY = FMath::RoundToFloat(Position.Y * QuantizationScale);
            float PosZ = FMath::RoundToFloat(Position.Z * QuantizationScale);

            IntPosition = FIntVector((int32)PosX, (int32)PosY, (int32)PosZ);

            IntClusterMax.X = FMath::Max(IntClusterMax.X, IntPosition.X);
            IntClusterMax.Y = FMath::Max(IntClusterMax.Y, IntPosition.Y);
            IntClusterMax.Z = FMath::Max(IntClusterMax.Z, IntPosition.Z);
            IntClusterMin.X = FMath::Min(IntClusterMin.X, IntPosition.X);
            IntClusterMin.Y = FMath::Min(IntClusterMin.Y, IntPosition.Y);
            IntClusterMin.Z = FMath::Min(IntClusterMin.Z, IntPosition.Z);
        }

        // Store the minimum number of bits per axis.
        const uint32 NumBitsX = FMath::CeilLogTwo(IntClusterMax.X - IntClusterMin.X + 1);
        const uint32 NumBitsY = FMath::CeilLogTwo(IntClusterMax.Y - IntClusterMin.Y + 1);
        const uint32 NumBitsZ = FMath::CeilLogTwo(IntClusterMax.Z - IntClusterMin.Z + 1);
        check(NumBitsX <= MAX_POSITION_QUANTIZATION_BITS);
        check(NumBitsY <= MAX_POSITION_QUANTIZATION_BITS);
        check(NumBitsZ <= MAX_POSITION_QUANTIZATION_BITS);

        for (uint32 i = 0; i < NumClusterVerts; i++)
        {
            FIntVector& IntPosition = Cluster.QuantizedPositions[i];

            // Update the float position with the quantized data.
            Cluster.GetPosition(i) = FVector(IntPosition.X * RcpQuantizationScale, IntPosition.Y * RcpQuantizationScale, IntPosition.Z * RcpQuantizationScale);
            
            IntPosition.X -= IntClusterMin.X;
            IntPosition.Y -= IntClusterMin.Y;
            IntPosition.Z -= IntClusterMin.Z;
            check(IntPosition.X >= 0 && IntPosition.X < (1 << NumBitsX));
            check(IntPosition.Y >= 0 && IntPosition.Y < (1 << NumBitsY));
            check(IntPosition.Z >= 0 && IntPosition.Z < (1 << NumBitsZ));
        }

        // Update the bounding box.
        Cluster.Bounds.Min = FVector(IntClusterMin.X * RcpQuantizationScale, IntClusterMin.Y * RcpQuantizationScale, IntClusterMin.Z * RcpQuantizationScale);
        Cluster.Bounds.Max = FVector(IntClusterMax.X * RcpQuantizationScale, IntClusterMax.Y * RcpQuantizationScale, IntClusterMax.Z * RcpQuantizationScale);

        Cluster.MeshBoundsMin = FVector::ZeroVector;
        Cluster.MeshBoundsDelta = FVector(RcpQuantizationScale);

        Cluster.QuantizedPosBits = FIntVector(NumBitsX, NumBitsY, NumBitsZ);
        Cluster.QuantizedPosStart = IntClusterMin;
        Cluster.QuantizedPosShift = 0;

    } );
    
    return PositionPrecision;
}
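The quantization above reduces to a few formulas: the automatic precision is 7 - round(mean log2 of leaf-cluster extent sizes), clamped to at least 4 (1/16 cm); the grid step is 2^-PositionPrecision; each cluster stores its integer minimum (QuantizedPosStart) and, per axis, CeilLogTwo(max - min + 1) bits for the offsets. A minimal sketch of that arithmetic, assuming one shared precision for every cluster and omitting the loop that lowers the precision when a cluster would exceed the bit limit; all names are hypothetical.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

using Vec3 = std::array<float, 3>;
using IVec3 = std::array<int32_t, 3>;

// Smallest B with (1 << B) >= N, matching FMath::CeilLogTwo for N >= 1.
uint32_t CeilLogTwo(uint32_t N) {
    uint32_t B = 0;
    while (B < 32 && (uint64_t(1) << B) < N) ++B;
    return B;
}

// floor(x + 0.5), the rounding used by FMath::RoundToInt / RoundToFloat.
double RoundHalfUp(double X) { return std::floor(X + 0.5); }

// Auto precision from leaf-cluster extent sizes, clamped to AUTO_MIN_PRECISION = 4.
int32_t AutoPrecision(const std::vector<float>& LeafExtentSizes) {
    double TotalLogSize = 0.0;
    int32_t TotalNum = 0;
    for (float Size : LeafExtentSizes)
        if (Size > 0.0f) { TotalLogSize += std::log2(Size); ++TotalNum; }
    const double AvgLogSize = TotalNum > 0 ? TotalLogSize / TotalNum : 0.0;
    const int32_t Precision = 7 - static_cast<int32_t>(RoundHalfUp(AvgLogSize));
    return std::max(Precision, 4);
}

struct QuantizedCluster {
    IVec3 Start;                  // per-cluster minimum on the integer grid
    IVec3 Bits;                   // bits needed per axis
    std::vector<IVec3> Offsets;   // positions relative to Start
};

// Quantizes a non-empty cluster onto the 2^-Precision grid.
QuantizedCluster QuantizeCluster(const std::vector<Vec3>& Positions, int32_t Precision) {
    assert(!Positions.empty());
    const double Scale = std::exp2(static_cast<double>(Precision));
    IVec3 Min = {INT32_MAX, INT32_MAX, INT32_MAX};
    IVec3 Max = {INT32_MIN, INT32_MIN, INT32_MIN};
    std::vector<IVec3> Grid;
    for (const Vec3& P : Positions) {
        IVec3 G;
        for (int A = 0; A < 3; ++A) {
            G[A] = static_cast<int32_t>(RoundHalfUp(P[A] * Scale));
            Min[A] = std::min(Min[A], G[A]);
            Max[A] = std::max(Max[A], G[A]);
        }
        Grid.push_back(G);
    }
    QuantizedCluster Q;
    Q.Start = Min;
    for (int A = 0; A < 3; ++A)
        Q.Bits[A] = static_cast<int32_t>(CeilLogTwo(static_cast<uint32_t>(Max[A] - Min[A] + 1)));
    for (IVec3 G : Grid) {
        for (int A = 0; A < 3; ++A) G[A] -= Min[A];   // store offsets from the cluster minimum
        Q.Offsets.push_back(G);
    }
    return Q;
}

// Reconstructs one coordinate: (Start + Offset) * 2^-Precision.
float Dequantize(int32_t Start, int32_t Offset, int32_t Precision) {
    return static_cast<float>((Start + Offset) / std::exp2(static_cast<double>(Precision)));
}
```

With precision 4, a vertex at x = 1.0 lands on grid value 16, and a cluster spanning x in [-1, 1] needs CeilLogTwo(33) = 6 bits on that axis.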

// Compute encoding info for a list of clusters.
static void CalculateEncodingInfos(TArray<FEncodingInfo>& EncodingInfos, const TArray<Nanite::FCluster>& Clusters, bool bHasColors, uint32 NumTexCoords)
{
    uint32 NumClusters = Clusters.Num();
    EncodingInfos.SetNumUninitialized(NumClusters);

    for (uint32 i = 0; i < NumClusters; i++)
    {
        CalculateEncodingInfo(EncodingInfos[i], Clusters[i], bHasColors, NumTexCoords);
    }
}
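CalculateEncodingInfo, which follows, sizes each stream from the value range it has to cover. BitsPerIndex = FloorLog2(NumVerts - 1) + 1 equals ceil(log2(NumVerts)) for NumVerts > 1; each triangle costs BitsPerIndex + 2 * 5 bits (a base index plus two 5-bit offsets), and the index stream is rounded up to whole 32-bit words. For UVs, the largest gap between sorted coordinates is cut out of the encoded range and added back on decode. The sketch below reproduces only that arithmetic; the function names are hypothetical, and DecodeWithGap models the documented rule UV += (UV >= GapStart) ? GapRange : 0 on integer codes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// floor(log2(N)) for N >= 1.
uint32_t FloorLog2(uint32_t N) { uint32_t B = 0; while (N >>= 1) ++B; return B; }

// Bits per vertex index inside one cluster.
uint32_t BitsPerIndex(uint32_t NumVerts) { return NumVerts > 1 ? FloorLog2(NumVerts - 1) + 1 : 0; }

// Bytes of the index stream: per-triangle bits, rounded up to 32-bit words.
uint32_t IndexStreamBytes(uint32_t NumVerts, uint32_t NumTris) {
    const uint32_t BitsPerTriangle = BitsPerIndex(NumVerts) + 2 * 5;
    return (NumTris * BitsPerTriangle + 31) / 32 * 4;
}

struct Gap { float Start; float End; };

// Largest gap between consecutive sorted values (needs at least one value).
Gap LargestGap(std::vector<float> Values) {
    std::sort(Values.begin(), Values.end());
    Gap G{Values[0], Values[0]};
    for (size_t I = 0; I + 1 < Values.size(); ++I)
        if (Values[I + 1] - Values[I] > G.End - G.Start) G = Gap{Values[I], Values[I + 1]};
    return G;
}

// Codes at or above GapStart are shifted past the excluded gap.
int32_t DecodeWithGap(int32_t Encoded, int32_t GapStart, int32_t GapLength) {
    return Encoded + (Encoded >= GapStart ? GapLength : 0);
}
```

For a full 128-vertex, 128-triangle cluster this gives 7 bits per index, 17 bits per triangle and a 272-byte index stream.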

// Compute encoding info for a single cluster.
static void CalculateEncodingInfo(FEncodingInfo& Info, const Nanite::FCluster& Cluster, bool bHasColors, uint32 NumTexCoords)
{
    const uint32 NumClusterVerts = Cluster.NumVerts;
    const uint32 NumClusterTris = Cluster.NumTris;

    FMemory::Memzero(Info);

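    // Write triangle indices. Indices are stored in a dense bitstream using ceil(log2(NumClusterVertices)) bits per index. The shader implements unaligned bitstream reads to support this.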
    const uint32 BitsPerIndex = NumClusterVerts > 1 ? (FGenericPlatformMath::FloorLog2(NumClusterVerts - 1) + 1) : 0;
    const uint32 BitsPerTriangle = BitsPerIndex + 2 * 5;    // Base index + two 5-bit offsets
    Info.BitsPerIndex = BitsPerIndex;

    // Compute the GPU page section sizes.
    FPageSections& GpuSizes = Info.GpuSizes;
    GpuSizes.Cluster = sizeof(FPackedCluster);
    GpuSizes.MaterialTable = CalcMaterialTableSize(Cluster) * sizeof(uint32);
    GpuSizes.DecodeInfo = NumTexCoords * sizeof(FUVRange);
    GpuSizes.Index = (NumClusterTris * BitsPerTriangle + 31) / 32 * 4;

#if USE_UNCOMPRESSED_VERTEX_DATA // Use uncompressed vertex data.
    const uint32 AttribBytesPerVertex = (3 * sizeof(float) + sizeof(uint32) + NumTexCoords * 2 * sizeof(float));

    Info.BitsPerAttribute = AttribBytesPerVertex * 8;
    Info.ColorMin = FIntVector4(0, 0, 0, 0);
    Info.ColorBits = FIntVector4(8, 8, 8, 8);
    Info.ColorMode = VERTEX_COLOR_MODE_VARIABLE;
    Info.UVPrec = 0;

    GpuSizes.Position = NumClusterVerts * 3 * sizeof(float);
    GpuSizes.Attribute = NumClusterVerts * AttribBytesPerVertex;
#else // Use compressed vertex data.
    Info.BitsPerAttribute = 2 * NORMAL_QUANTIZATION_BITS;

    check(NumClusterVerts > 0);
    const bool bIsLeaf = (Cluster.GeneratingGroupIndex == INVALID_GROUP_INDEX);

    // Vertex colors.
    Info.ColorMode = VERTEX_COLOR_MODE_WHITE;
    Info.ColorMin = FIntVector4(255, 255, 255, 255);
    if (bHasColors)
    {
        FIntVector4 ColorMin = FIntVector4( 255, 255, 255, 255);
        FIntVector4 ColorMax = FIntVector4( 0, 0, 0, 0);
        for (uint32 i = 0; i < NumClusterVerts; i++)
        {
            FColor Color = Cluster.GetColor(i).ToFColor(false);
            ColorMin.X = FMath::Min(ColorMin.X, (int32)Color.R);
            ColorMin.Y = FMath::Min(ColorMin.Y, (int32)Color.G);
            ColorMin.Z = FMath::Min(ColorMin.Z, (int32)Color.B);
            ColorMin.W = FMath::Min(ColorMin.W, (int32)Color.A);
            ColorMax.X = FMath::Max(ColorMax.X, (int32)Color.R);
            ColorMax.Y = FMath::Max(ColorMax.Y, (int32)Color.G);
            ColorMax.Z = FMath::Max(ColorMax.Z, (int32)Color.B);
            ColorMax.W = FMath::Max(ColorMax.W, (int32)Color.A);
        }

        const FIntVector4 ColorDelta = ColorMax - ColorMin;
        const int32 R_Bits = FMath::CeilLogTwo(ColorDelta.X + 1);
        const int32 G_Bits = FMath::CeilLogTwo(ColorDelta.Y + 1);
        const int32 B_Bits = FMath::CeilLogTwo(ColorDelta.Z + 1);
        const int32 A_Bits = FMath::CeilLogTwo(ColorDelta.W + 1);
        
        uint32 NumColorBits = R_Bits + G_Bits + B_Bits + A_Bits;
        Info.BitsPerAttribute += NumColorBits;
        Info.ColorMin = ColorMin;
        Info.ColorBits = FIntVector4(R_Bits, G_Bits, B_Bits, A_Bits);
        if (NumColorBits > 0)
        {
            Info.ColorMode = VERTEX_COLOR_MODE_VARIABLE;
        }
        else 
        {
            if (ColorMin.X == 255 && ColorMin.Y == 255 && ColorMin.Z == 255 && ColorMin.W == 255)
                Info.ColorMode = VERTEX_COLOR_MODE_WHITE;
            else
                Info.ColorMode = VERTEX_COLOR_MODE_CONSTANT;
        }
    }

    for( uint32 UVIndex = 0; UVIndex < NumTexCoords; UVIndex++ )
    {
        FGeometryEncodingUVInfo& UVInfo = Info.UVInfos[UVIndex];
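        // Block-compress texture coordinates.
        // Texture coordinates are stored relative to the cluster's min/max UV coordinates.
        // UV seams produce very large, sparse bounding rectangles. To mitigate this, the largest gap in the U and V bounding rectangle is excluded from the encoding space.
        // Decoding this is very simple: UV += (UV >= GapStart) ? GapRange : 0;
        // Generate sorted U and V arrays.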
        TArray<float> UValues;
        TArray<float> VValues;
        UValues.AddUninitialized(NumClusterVerts);
        VValues.AddUninitialized(NumClusterVerts);
        for (uint32 i = 0; i < NumClusterVerts; i++)
        {
            const FVector2D& UV = Cluster.GetUVs(i)[ UVIndex ];
            UValues[i] = UV.X;
            VValues[i] = UV.Y;
        }

        UValues.Sort();
        VValues.Sort();

        // Find the largest gap between the sorted UVs.
        FVector2D LargestGapStart = FVector2D(UValues[0], VValues[0]);
        FVector2D LargestGapEnd = FVector2D(UValues[0], VValues[0]);
        for (uint32 i = 0; i < NumClusterVerts - 1; i++)
        {
            if (UValues[i + 1] - UValues[i] > LargestGapEnd.X - LargestGapStart.X)
            {
                LargestGapStart.X = UValues[i];
                LargestGapEnd.X = UValues[i + 1];
            }
            if (VValues[i + 1] - VValues[i] > LargestGapEnd.Y - LargestGapStart.Y)
            {
                LargestGapStart.Y = VValues[i];
                LargestGapEnd.Y = VValues[i + 1];
            }
        }

        const FVector2D UVMin = FVector2D(UValues[0], VValues[0]);
        const FVector2D UVMax = FVector2D(UValues[NumClusterVerts - 1], VValues[NumClusterVerts - 1]);
        const FVector2D UVDelta = UVMax - UVMin;

        const FVector2D UVRcpDelta = FVector2D(    UVDelta.X > SMALL_NUMBER ? 1.0f / UVDelta.X : 0.0f,
                                                UVDelta.Y > SMALL_NUMBER ? 1.0f / UVDelta.Y : 0.0f);

        const FVector2D NonGapLength = FVector2D::Max(UVDelta - (LargestGapEnd - LargestGapStart), FVector2D(0.0f, 0.0f));
        const FVector2D NormalizedGapStart = (LargestGapStart - UVMin) * UVRcpDelta;
        const FVector2D NormalizedGapEnd = (LargestGapEnd - UVMin) * UVRcpDelta;

        const FVector2D NormalizedNonGapLength = NonGapLength * UVRcpDelta;

#if 1
        const float TexCoordUnitPrecision = (1 << 14);    // TODO: Implement UI + 'Auto' mode that decides when this is necessary.

        int32 TexCoordBitsU = 0;
        if (UVDelta.X > 0)
        {
            // Even when NonGapLength = 0, UVDelta is nonzero, so at least 2 values (1 bit) are needed to distinguish high from low.
            int32 NumValues = FMath::Max(FMath::CeilToInt(NonGapLength.X * TexCoordUnitPrecision), 2);
            // Clamp to 12 bits; the temporary hack below shows this is good enough.
            TexCoordBitsU = FMath::Min((int32)FMath::CeilLogTwo(NumValues), 12);
        }

        int32 TexCoordBitsV = 0;
        if (UVDelta.Y > 0)
        {
            int32 NumValues = FMath::Max(FMath::CeilToInt(NonGapLength.Y * TexCoordUnitPrecision), 2);
            TexCoordBitsV = FMath::Min((int32)FMath::CeilLogTwo(NumValues), 12);
        }
#else
        // Temporary hack to work around an encoding issue.
        const int32 TexCoordBitsU = 12;
        const int32 TexCoordBitsV = 12;
#endif

        // Record the UV precision: 4 bits each for U and V, 8 bits per UV channel.
        Info.UVPrec |= ((TexCoordBitsV << 4) | TexCoordBitsU) << (UVIndex * 8);

        const int32 TexCoordMaxValueU = (1 << TexCoordBitsU) - 1;
        const int32 TexCoordMaxValueV = (1 << TexCoordBitsV) - 1;

        const int32 NU = (int32)FMath::Clamp(NormalizedNonGapLength.X > SMALL_NUMBER ? (TexCoordMaxValueU - 2) / NormalizedNonGapLength.X : 0.0f, (float)TexCoordMaxValueU, (float)0xFFFF);
        const int32 NV = (int32)FMath::Clamp(NormalizedNonGapLength.Y > SMALL_NUMBER ? (TexCoordMaxValueV - 2) / NormalizedNonGapLength.Y : 0.0f, (float)TexCoordMaxValueV, (float)0xFFFF);

        int32 GapStartU = TexCoordMaxValueU + 1;
        int32 GapStartV = TexCoordMaxValueV + 1;
        int32 GapLengthU = 0;
        int32 GapLengthV = 0;
        if (NU > TexCoordMaxValueU)
        {
            GapStartU = int32(NormalizedGapStart.X * NU + 0.5f) + 1;
            const int32 GapEndU = int32(NormalizedGapEnd.X * NU + 0.5f);
            GapLengthU = FMath::Max(GapEndU - GapStartU, 0);
        }
        if (NV > TexCoordMaxValueV)
        {
            GapStartV = int32(NormalizedGapStart.Y * NV + 0.5f) + 1;
            const int32 GapEndV = int32(NormalizedGapEnd.Y * NV + 0.5f);
            GapLengthV = FMath::Max(GapEndV - GapStartV, 0);
        }

        UVInfo.UVRange.Min = UVMin;
        UVInfo.UVRange.Scale = FVector2D(NU > 0 ? UVDelta.X / NU : 0.0f, NV > 0 ? UVDelta.Y / NV : 0.0f);
        
        check(GapStartU >= 0);
        check(GapStartV >= 0);
        UVInfo.UVRange.GapStart[0] = GapStartU;
        UVInfo.UVRange.GapStart[1] = GapStartV;
        UVInfo.UVRange.GapLength[0] = GapLengthU;
        UVInfo.UVRange.GapLength[1] = GapLengthV;
        
        UVInfo.UVDelta = UVDelta;
        UVInfo.UVRcpDelta = UVRcpDelta;
        UVInfo.NU = NU;
        UVInfo.NV = NV;

        Info.BitsPerAttribute += TexCoordBitsU + TexCoordBitsV;
    }

    const uint32 PositionBitsPerVertex = Cluster.QuantizedPosBits.X + Cluster.QuantizedPosBits.Y + Cluster.QuantizedPosBits.Z;
    GpuSizes.Position = (NumClusterVerts * PositionBitsPerVertex + 31) / 32 * 4;
    GpuSizes.Attribute = (NumClusterVerts * Info.BitsPerAttribute + 31) / 32 * 4;
#endif
}

/*
Build the streaming pages.
Page layout:
    Fixup Chunk (loaded into CPU memory only)
    FPackedCluster
    MaterialRangeTable
    GeometryData
*/
static void AssignClustersToPages(
    TArray< FClusterGroup >& ClusterGroups,
    TArray< FCluster >& Clusters,
    const TArray< FEncodingInfo >& EncodingInfos,
    TArray<FPage>& Pages,
    TArray<FClusterGroupPart>& Parts
    )
{
    check(Pages.Num() == 0);
    check(Parts.Num() == 0);

    const uint32 NumClusterGroups = ClusterGroups.Num();
    Pages.AddDefaulted();

    SortGroupClusters(ClusterGroups, Clusters);
    TArray<uint32> ClusterGroupPermutation = CalculateClusterGroupPermutation(ClusterGroups);

    for (uint32 i = 0; i < NumClusterGroups; i++)
    {
        // Pick the best next group.
        uint32 GroupIndex = ClusterGroupPermutation[i];
        FClusterGroup& Group = ClusterGroups[GroupIndex];
        uint32 GroupStartPage = INVALID_PAGE_INDEX;
    
        for (uint32 ClusterIndex : Group.Children)
        {
            // Pick the best next cluster.
            FCluster& Cluster = Clusters[ClusterIndex];
            const FEncodingInfo& EncodingInfo = EncodingInfos[ClusterIndex];

            // Add to the current page.
            FPage* Page = &Pages.Top();
            if (Page->GpuSizes.GetTotal() + EncodingInfo.GpuSizes.GetTotal() > CLUSTER_PAGE_GPU_SIZE || Page->NumClusters + 1 > MAX_CLUSTERS_PER_PAGE)
            {
                // The page is full; start a new one.
                Pages.AddDefaulted();
                Page = &Pages.Top();
            }
            
            // Check whether a new FClusterGroupPart needs to be started.
            if (Page->PartsNum == 0 || Parts[Page->PartsStartIndex + Page->PartsNum - 1].GroupIndex != GroupIndex)
            {
                if (Page->PartsNum == 0)
                {
                    Page->PartsStartIndex = Parts.Num();
                }
                Page->PartsNum++;

                FClusterGroupPart& Part = Parts.AddDefaulted_GetRef();
                Part.GroupIndex = GroupIndex;
            }

            // Add the cluster to the page.
            uint32 PageIndex = Pages.Num() - 1;
            uint32 PartIndex = Parts.Num() - 1;

            FClusterGroupPart& Part = Parts.Last();
            if (Part.Clusters.Num() == 0)
            {
                Part.PageClusterOffset = Page->NumClusters;
                Part.PageIndex = PageIndex;
            }
            Part.Clusters.Add(ClusterIndex);
            check(Part.Clusters.Num() <= MAX_CLUSTERS_PER_GROUP);

            Cluster.GroupPartIndex = PartIndex;
            
            if (GroupStartPage == INVALID_PAGE_INDEX)
            {
                GroupStartPage = PageIndex;
            }
            
            Page->GpuSizes += EncodingInfo.GpuSizes;
            Page->NumClusters++;
        }

        Group.PageIndexStart = GroupStartPage;
        Group.PageIndexNum = Pages.Num() - GroupStartPage;
        check(Group.PageIndexNum >= 1);
        check(Group.PageIndexNum <= MAX_GROUP_PARTS_MASK);
    }

    // Recompute the bounds of each group part.
    for (FClusterGroupPart& Part : Parts)
    {
        check(Part.Clusters.Num() <= MAX_CLUSTERS_PER_GROUP);
        check(Part.PageIndex < (uint32)Pages.Num());

        FBounds Bounds;
        for (uint32 ClusterIndex : Part.Clusters)
        {
            Bounds += Clusters[ClusterIndex].Bounds;
        }
        Part.Bounds = Bounds;
    }
}

// Build the ClusterGroup hierarchy.
static void BuildHierarchies(FResources& Resources, const TArray<FClusterGroup>& Groups, TArray<FClusterGroupPart>& Parts, uint32 NumMeshes)
{
    TArray<TArray<uint32>> PartsByMesh;
    PartsByMesh.SetNum(NumMeshes);

    // Assign each group part to the mesh it belongs to.
    const uint32 NumTotalParts = Parts.Num();
    for (uint32 PartIndex = 0; PartIndex < NumTotalParts; PartIndex++)
    {
        FClusterGroupPart& Part = Parts[PartIndex];
        PartsByMesh[Groups[Part.GroupIndex].MeshIndex].Add(PartIndex);
    }

    for (uint32 MeshIndex = 0; MeshIndex < NumMeshes; MeshIndex++)
    {
        const TArray<uint32>& PartIndices = PartsByMesh[MeshIndex];
        const uint32 NumParts = PartIndices.Num();
        
        int32 MaxMipLevel = 0;
        for (uint32 i = 0; i < NumParts; i++)
        {
            MaxMipLevel = FMath::Max(MaxMipLevel, Groups[Parts[PartIndices[i]].GroupIndex].MipLevel);
        }

        TArray< FIntermediateNode >    Nodes;
        Nodes.SetNum(NumParts);

        // Build leaf nodes for each mip (LOD) level of the mesh.
        TArray<TArray<uint32>> NodesByMip;
        NodesByMip.SetNum(MaxMipLevel + 1);
        for (uint32 i = 0; i < NumParts; i++)
        {
            const uint32 PartIndex = PartIndices[i];
            const FClusterGroupPart& Part = Parts[PartIndex];
            const FClusterGroup& Group = Groups[Part.GroupIndex];

            const int32 MipLevel = Group.MipLevel;
            FIntermediateNode& Node = Nodes[i];
            Node.Bound = Part.Bounds;
            Node.PartIndex = PartIndex;
            Node.MipLevel = Group.MipLevel;
            Node.bLeaf = true;
            NodesByMip[Group.MipLevel].Add(i);
        }

        uint32 RootIndex = 0;
        if (Nodes.Num() == 1)
        {
            // Only a single leaf. It needs special handling because the root is always an internal node.
            FIntermediateNode& Node = Nodes.AddDefaulted_GetRef();
            Node.Children.Add(0);
            Node.Bound = Nodes[0].Bound;
            RootIndex = 1;
        }
        else
        {
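            // Build the hierarchy:
            // Nanite meshes contain cluster data for many levels of detail. Clusters from different levels can vary wildly in size, which makes building a good hierarchy a challenge.
            // Apart from the visibility bounds, the hierarchy also tracks conservative LOD error metrics for its child nodes.
            // The runtime traversal descends as long as the child nodes are visible and the conservative LOD error is not more detailed than what we are looking for.
            // We have to be very careful when mixing clusters from different LODs, as less detailed clusters can easily bloat both the bounds and the error metrics.
            // We have tried a number of LOD-mixing approaches, but so far building a separate hierarchy per LOD level, and then a hierarchy of those hierarchies, gives the best and most predictable results.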
            TArray<uint32> LevelRoots;
            for (int32 MipLevel = 0; MipLevel <= MaxMipLevel; MipLevel++)
            {
                if (NodesByMip[MipLevel].Num() > 0)
                {
                    // Build a hierarchy for this mip level using top-down splitting.
                    uint32 NodeIndex = BuildHierarchyTopDown(Nodes, NodesByMip[MipLevel], true);

                    if (Nodes[NodeIndex].bLeaf || Nodes[NodeIndex].Children.Num() == MAX_BVH_NODE_FANOUT)
                    {
                        // Leaf or completely filled node: add it directly.
                        LevelRoots.Add(NodeIndex);
                    }
                    else
                    {
                        // Incomplete node: discard it and add its children as roots instead.
                        LevelRoots.Append(Nodes[NodeIndex].Children);
                    }
                }
            }
            // Build the top-level hierarchy, i.e. a hierarchy of the per-mip hierarchies.
            RootIndex = BuildHierarchyTopDown(Nodes, LevelRoots, false);
        }

        check(Nodes.Num() > 0);

#if BVH_BUILD_WRITE_GRAPHVIZ
        WriteDotGraph(Nodes);
#endif

        TArray< FHierarchyNode > HierarchyNodes;
        BuildHierarchyRecursive(HierarchyNodes, Nodes, Groups, Parts, RootIndex);

        // Convert the hierarchy to the packed format.
        const uint32 NumHierarchyNodes = HierarchyNodes.Num();
        const uint32 PackedBaseIndex = Resources.HierarchyNodes.Num();
        Resources.HierarchyRootOffsets.Add(PackedBaseIndex);
        Resources.HierarchyNodes.AddDefaulted(NumHierarchyNodes);
        for (uint32 i = 0; i < NumHierarchyNodes; i++)
        {
            // Pack the hierarchy node.
            PackHierarchyNode(Resources.HierarchyNodes[PackedBaseIndex + i], HierarchyNodes[i], Groups, Parts);
        }
    }
}

// Write the pages.
static void WritePages(    FResources& Resources,
                        TArray<FPage>& Pages,
                        const TArray<FClusterGroup>& Groups,
                        const TArray<FClusterGroupPart>& Parts,
                        const TArray<FCluster>& Clusters,
                        const TArray<FEncodingInfo>& EncodingInfos,
                        uint32 NumTexCoords)
{
    check(Resources.PageStreamingStates.Num() == 0);

    const bool bLZCompress = true;

    TArray< uint8 > StreamableBulkData;
    
    const uint32 NumPages = Pages.Num();
    const uint32 NumClusters = Clusters.Num();
    Resources.PageStreamingStates.SetNum(NumPages);

    // Initialize the fixup chunk headers and total the GPU size.
    uint32 TotalGPUSize = 0;
    TArray<FFixupChunk> FixupChunks;
    FixupChunks.SetNum(NumPages);
    for (uint32 PageIndex = 0; PageIndex < NumPages; PageIndex++)
    {
        const FPage& Page = Pages[PageIndex];
        FFixupChunk& FixupChunk = FixupChunks[PageIndex];
        FixupChunk.Header.NumClusters = Page.NumClusters;

        uint32 NumHierarchyFixups = 0;
        for (uint32 i = 0; i < Page.PartsNum; i++)
        {
            const FClusterGroupPart& Part = Parts[Page.PartsStartIndex + i];
            NumHierarchyFixups += Groups[Part.GroupIndex].PageIndexNum;
        }

        FixupChunk.Header.NumHierachyFixups = NumHierarchyFixups;    // NumHierarchyFixups must be set before writing cluster fixups
        TotalGPUSize += Page.GpuSizes.GetTotal();
    }

    // Add external (cross-page) cluster fixups to the pages.
    for (const FClusterGroupPart& Part : Parts)
    {
        check(Part.PageIndex < NumPages);

        const FClusterGroup& Group = Groups[Part.GroupIndex];
        for (uint32 ClusterPositionInPart = 0; ClusterPositionInPart < (uint32)Part.Clusters.Num(); ClusterPositionInPart++)
        {
            const FCluster& Cluster = Clusters[Part.Clusters[ClusterPositionInPart]];
            if (Cluster.GeneratingGroupIndex != INVALID_GROUP_INDEX)
            {
                const FClusterGroup& GeneratingGroup = Groups[Cluster.GeneratingGroupIndex];
                check(GeneratingGroup.PageIndexNum >= 1);

                if (GeneratingGroup.PageIndexStart == Part.PageIndex && GeneratingGroup.PageIndexNum == 1)
                    continue;    // Dependencies already met by current page. Fixup directly instead.

                uint32 PageDependencyStart = GeneratingGroup.PageIndexStart;
                uint32 PageDependencyNum = GeneratingGroup.PageIndexNum;
                RemoveRootPagesFromRange(PageDependencyStart, PageDependencyNum);    // Root page should never be a dependency

                const FClusterFixup ClusterFixup = FClusterFixup(Part.PageIndex, Part.PageClusterOffset + ClusterPositionInPart, PageDependencyStart, PageDependencyNum);
                for (uint32 i = 0; i < GeneratingGroup.PageIndexNum; i++)
                {
                    FFixupChunk& FixupChunk = FixupChunks[GeneratingGroup.PageIndexStart + i];
                    FixupChunk.GetClusterFixup(FixupChunk.Header.NumClusterFixups++) = ClusterFixup;
                }
            }
        }
    }

    // Generate page dependencies.
    for (uint32 PageIndex = 0; PageIndex < NumPages; PageIndex++)
    {
        const FFixupChunk& FixupChunk = FixupChunks[PageIndex];
        FPageStreamingState& PageStreamingState = Resources.PageStreamingStates[PageIndex];
        PageStreamingState.DependenciesStart = Resources.PageDependencies.Num();

        for (uint32 i = 0; i < FixupChunk.Header.NumClusterFixups; i++)
        {
            uint32 FixupPageIndex = FixupChunk.GetClusterFixup(i).GetPageIndex();
            check(FixupPageIndex < NumPages);
            if (IsRootPage(FixupPageIndex) || FixupPageIndex == PageIndex)    // Never emit dependencies to ourselves or a root page.
                continue;

            // Only add the dependency if it is not already in the set.
            // O(n^2), but in practice the number of dependencies is small.
            bool bFound = false;
            for (uint32 j = PageStreamingState.DependenciesStart; j < (uint32)Resources.PageDependencies.Num(); j++)
            {
                if (Resources.PageDependencies[j] == FixupPageIndex)
                {
                    bFound = true;
                    break;
                }
            }

            if (bFound)
                continue;

            Resources.PageDependencies.Add(FixupPageIndex);
        }
        PageStreamingState.DependenciesNum = Resources.PageDependencies.Num() - PageStreamingState.DependenciesStart;
    }

    // Process the pages.
    struct FPageResult
    {
        TArray<uint8> Data;
        uint32 UncompressedSize;
    };
    TArray< FPageResult > PageResults;
    PageResults.SetNum(NumPages);

    // Process the pages in parallel.
    ParallelFor(NumPages, [&Resources, &Pages, &Groups, &Parts, &Clusters, &EncodingInfos, &FixupChunks, &PageResults, NumTexCoords, bLZCompress](int32 PageIndex)
    {
        const FPage& Page = Pages[PageIndex];
        FFixupChunk& FixupChunk = FixupChunks[PageIndex];

        // Add hierarchy fixups.
        {
            // Parts include the hierarchy fixups for all the other parts of the same group.
            uint32 NumHierarchyFixups = 0;
            for (uint32 i = 0; i < Page.PartsNum; i++)
            {
                const FClusterGroupPart& Part = Parts[Page.PartsStartIndex + i];
                const FClusterGroup& Group = Groups[Part.GroupIndex];
                const uint32 HierarchyRootOffset = Resources.HierarchyRootOffsets[Group.MeshIndex];

                uint32 PageDependencyStart = Group.PageIndexStart;
                uint32 PageDependencyNum = Group.PageIndexNum;
                RemoveRootPagesFromRange(PageDependencyStart, PageDependencyNum);

                // Add fixups to all parts of the group
                for (uint32 j = 0; j < Group.PageIndexNum; j++)
                {
                    const FPage& Page2 = Pages[Group.PageIndexStart + j];
                    for (uint32 k = 0; k < Page2.PartsNum; k++)
                    {
                        const FClusterGroupPart& Part2 = Parts[Page2.PartsStartIndex + k];
                        if (Part2.GroupIndex == Part.GroupIndex)
                        {
                            const uint32 GlobalHierarchyNodeIndex = HierarchyRootOffset + Part2.HierarchyNodeIndex;
                            FixupChunk.GetHierarchyFixup(NumHierarchyFixups++) = FHierarchyFixup(Part2.PageIndex, GlobalHierarchyNodeIndex, Part2.HierarchyChildIndex, Part2.PageClusterOffset, PageDependencyStart, PageDependencyNum);
                            break;
                        }
                    }
                }
            }
            check(NumHierarchyFixups == FixupChunk.Header.NumHierachyFixups);
        }

        // Pack clusters and generate material range data
        TArray<uint32>                CombinedStripBitmaskData;
        TArray<uint32>                CombinedVertexRefBitmaskData;
        TArray<uint32>                CombinedVertexRefData;
        TArray<uint8>                CombinedIndexData;
        TArray<uint8>                CombinedPositionData;
        TArray<uint8>                CombinedAttributeData;
        TArray<uint32>                MaterialRangeData;
        TArray<uint16>                CodedVerticesPerCluster;
        TArray<uint32>                NumVertexBytesPerCluster;
        TArray<FPackedCluster>        PackedClusters;

        PackedClusters.SetNumUninitialized(Page.NumClusters);
        CodedVerticesPerCluster.SetNumUninitialized(Page.NumClusters);
        NumVertexBytesPerCluster.SetNumUninitialized(Page.NumClusters);
        
        const uint32 NumPackedClusterDwords = Page.NumClusters * sizeof(FPackedCluster) / sizeof(uint32);

        FPageSections GpuSectionOffsets = Page.GpuSizes.GetOffsets();
        TMap<FVariableVertex, uint32> UniqueVertices;

        for (uint32 i = 0; i < Page.PartsNum; i++)
        {
            const FClusterGroupPart& Part = Parts[Page.PartsStartIndex + i];
            for (uint32 j = 0; j < (uint32)Part.Clusters.Num(); j++)
            {
                const uint32 ClusterIndex = Part.Clusters[j];
                const FCluster& Cluster = Clusters[ClusterIndex];
                const FEncodingInfo& EncodingInfo = EncodingInfos[ClusterIndex];

                const uint32 LocalClusterIndex = Part.PageClusterOffset + j;
                FPackedCluster& PackedCluster = PackedClusters[LocalClusterIndex];
                PackCluster(PackedCluster, Cluster, EncodingInfos[ClusterIndex], NumTexCoords);

                PackedCluster.PackedMaterialInfo = PackMaterialInfo(Cluster, MaterialRangeData, NumPackedClusterDwords);
                check((GpuSectionOffsets.Index & 3) == 0);
                check((GpuSectionOffsets.Position & 3) == 0);
                check((GpuSectionOffsets.Attribute & 3) == 0);
                PackedCluster.SetIndexOffset(GpuSectionOffsets.Index);
                PackedCluster.SetPositionOffset(GpuSectionOffsets.Position);
                PackedCluster.SetAttributeOffset(GpuSectionOffsets.Attribute);
                PackedCluster.SetDecodeInfoOffset(GpuSectionOffsets.DecodeInfo);
                
                GpuSectionOffsets += EncodingInfo.GpuSizes;

                const uint32 PrevVertexBytes = CombinedPositionData.Num();
                uint32 NumCodedVertices = 0;
                EncodeGeometryData(    LocalClusterIndex, Cluster, EncodingInfo, NumTexCoords, 
                                    CombinedStripBitmaskData, CombinedIndexData,
                                    CombinedVertexRefBitmaskData, CombinedVertexRefData, CombinedPositionData, CombinedAttributeData,
                                    UniqueVertices, NumCodedVertices);

                NumVertexBytesPerCluster[LocalClusterIndex] = CombinedPositionData.Num() - PrevVertexBytes;
                CodedVerticesPerCluster[LocalClusterIndex] = NumCodedVertices;
            }
        }
        check(GpuSectionOffsets.Cluster                        == Page.GpuSizes.GetMaterialTableOffset());
        check(Align(GpuSectionOffsets.MaterialTable, 16)    == Page.GpuSizes.GetDecodeInfoOffset());
        check(GpuSectionOffsets.DecodeInfo                    == Page.GpuSizes.GetIndexOffset());
        check(GpuSectionOffsets.Index                        == Page.GpuSizes.GetPositionOffset());
        check(GpuSectionOffsets.Position                    == Page.GpuSizes.GetAttributeOffset());
        check(GpuSectionOffsets.Attribute                    == Page.GpuSizes.GetTotal());

        // Dword-align the index data.
        CombinedIndexData.SetNumZeroed((CombinedIndexData.Num() + 3) & -4);

        // Apply page-internal fixups directly to PackedClusters.
        for (uint32 LocalPartIndex = 0; LocalPartIndex < Page.PartsNum; LocalPartIndex++)
        {
            const FClusterGroupPart& Part = Parts[Page.PartsStartIndex + LocalPartIndex];
            const FClusterGroup& Group = Groups[Part.GroupIndex];
            uint32 GeneratingGroupIndex = MAX_uint32;
            for (uint32 ClusterPositionInPart = 0; ClusterPositionInPart < (uint32)Part.Clusters.Num(); ClusterPositionInPart++)
            {
                const FCluster& Cluster = Clusters[Part.Clusters[ClusterPositionInPart]];
                if (Cluster.GeneratingGroupIndex != INVALID_GROUP_INDEX)
                {
                    const FClusterGroup& GeneratingGroup = Groups[Cluster.GeneratingGroupIndex];
                    uint32 PageDependencyStart = Group.PageIndexStart;
                    uint32 PageDependencyNum = Group.PageIndexNum;
                    RemoveRootPagesFromRange(PageDependencyStart, PageDependencyNum);

                    if (GeneratingGroup.PageIndexStart == PageIndex && GeneratingGroup.PageIndexNum == 1)
                    {
                        // Dependencies already met by the current page: fix up directly.
                        PackedClusters[Part.PageClusterOffset + ClusterPositionInPart].Flags &= ~NANITE_CLUSTER_FLAG_LEAF;    // Mark parent as no longer leaf
                    }
                }
            }
        }

        // Begin the page.
        FPageResult& PageResult = PageResults[PageIndex];
        PageResult.Data.SetNum(CLUSTER_PAGE_DISK_SIZE);
        FBlockPointer PagePointer(PageResult.Data.GetData(), PageResult.Data.Num());

        // Disk header.
        FPageDiskHeader* PageDiskHeader = PagePointer.Advance<FPageDiskHeader>(1);

        // 16-byte align the material range data so it is easy to copy during GPU transcoding.
        MaterialRangeData.SetNum(Align(MaterialRangeData.Num(), 4));

        static_assert(sizeof(FUVRange) % 16 == 0, "sizeof(FUVRange) must be a multiple of 16");
        static_assert(sizeof(FPackedCluster) % 16 == 0, "sizeof(FPackedCluster) must be a multiple of 16");
        PageDiskHeader->NumClusters = Page.NumClusters;
        PageDiskHeader->GpuSize = Page.GpuSizes.GetTotal();
        PageDiskHeader->NumRawFloat4s = Page.NumClusters * (sizeof(FPackedCluster) + NumTexCoords * sizeof(FUVRange)) / 16 +  MaterialRangeData.Num() / 4;
        PageDiskHeader->NumTexCoords = NumTexCoords;

        // Cluster headers.
        FClusterDiskHeader* ClusterDiskHeaders = PagePointer.Advance<FClusterDiskHeader>(Page.NumClusters);

        // Write the clusters in SoA (Structure-of-Arrays) layout.
        {
            const uint32 NumClusterFloat4Propeties = sizeof(FPackedCluster) / 16;
            for (uint32 float4Index = 0; float4Index < NumClusterFloat4Propeties; float4Index++)
            {
                for (const FPackedCluster& PackedCluster : PackedClusters)
                {
                    uint8* Dst = PagePointer.Advance<uint8>(16);
                    FMemory::Memcpy(Dst, (uint8*)&PackedCluster + float4Index * 16, 16);
                }
            }
        }
        
        // Material table.
        uint32 MaterialTableSize = MaterialRangeData.Num() * MaterialRangeData.GetTypeSize();
        uint8* MaterialTable = PagePointer.Advance<uint8>(MaterialTableSize);
        FMemory::Memcpy(MaterialTable, MaterialRangeData.GetData(), MaterialTableSize);
        check(MaterialTableSize == Page.GpuSizes.GetMaterialTableSize());

        // Decode information.
        PageDiskHeader->DecodeInfoOffset = PagePointer.Offset();
        for (uint32 i = 0; i < Page.PartsNum; i++)
        {
            const FClusterGroupPart& Part = Parts[Page.PartsStartIndex + i];
            for (uint32 j = 0; j < (uint32)Part.Clusters.Num(); j++)
            {
                const uint32 ClusterIndex = Part.Clusters[j];
                FUVRange* DecodeInfo = PagePointer.Advance<FUVRange>(NumTexCoords);
                for (uint32 k = 0; k < NumTexCoords; k++)
                {
                    DecodeInfo[k] = EncodingInfos[ClusterIndex].UVInfos[k].UVRange;
                }
            }
        }
        
        // Index data.
        {
            uint8* IndexData = PagePointer.GetPtr<uint8>();
#if USE_STRIP_INDICES
            for (uint32 i = 0; i < Page.PartsNum; i++)
            {
                const FClusterGroupPart& Part = Parts[Page.PartsStartIndex + i];
                for (uint32 j = 0; j < (uint32)Part.Clusters.Num(); j++)
                {
                    const uint32 LocalClusterIndex = Part.PageClusterOffset + j;
                    const uint32 ClusterIndex = Part.Clusters[j];
                    const FCluster& Cluster = Clusters[ClusterIndex];

                    ClusterDiskHeaders[LocalClusterIndex].IndexDataOffset = PagePointer.Offset();
                    ClusterDiskHeaders[LocalClusterIndex].NumPrevNewVerticesBeforeDwords = Cluster.StripDesc.NumPrevNewVerticesBeforeDwords;
                    ClusterDiskHeaders[LocalClusterIndex].NumPrevRefVerticesBeforeDwords = Cluster.StripDesc.NumPrevRefVerticesBeforeDwords;
                    
                    PagePointer.Advance<uint8>(Cluster.StripIndexData.Num());
                }
            }

            uint32 IndexDataSize = CombinedIndexData.Num() * CombinedIndexData.GetTypeSize();
            FMemory::Memcpy(IndexData, CombinedIndexData.GetData(), IndexDataSize);
            PagePointer.Align(sizeof(uint32));

            PageDiskHeader->StripBitmaskOffset = PagePointer.Offset();
            uint32 StripBitmaskDataSize = CombinedStripBitmaskData.Num() * CombinedStripBitmaskData.GetTypeSize();
            uint8* StripBitmaskData = PagePointer.Advance<uint8>(StripBitmaskDataSize);
            FMemory::Memcpy(StripBitmaskData, CombinedStripBitmaskData.GetData(), StripBitmaskDataSize);
            
#else
            for (uint32 i = 0; i < Page.NumClusters; i++)
            {
                ClusterDiskHeaders[i].IndexDataOffset = PagePointer.Offset();
                PagePointer.Advance<uint8>(PackedClusters[i].GetNumTris() * 3);
            }
            PagePointer.Align(sizeof(uint32));

            uint32 IndexDataSize = CombinedIndexData.Num() * CombinedIndexData.GetTypeSize();
            FMemory::Memcpy(IndexData, CombinedIndexData.GetData(), IndexDataSize);
#endif
        }

        // Write the vertex reference bitmask.
        {
            PageDiskHeader->VertexRefBitmaskOffset = PagePointer.Offset();
            const uint32 VertexRefBitmaskSize = Page.NumClusters * (MAX_CLUSTER_VERTICES / 8);
            uint8* VertexRefBitmask = PagePointer.Advance<uint8>(VertexRefBitmaskSize);
            FMemory::Memcpy(VertexRefBitmask, CombinedVertexRefBitmaskData.GetData(), VertexRefBitmaskSize);
            check(CombinedVertexRefBitmaskData.Num() * CombinedVertexRefBitmaskData.GetTypeSize() == VertexRefBitmaskSize);
        }

        // Write the vertex references.
        {
            uint8* VertexRefs = PagePointer.GetPtr<uint8>();
            for (uint32 i = 0; i < Page.NumClusters; i++)
            {
                ClusterDiskHeaders[i].VertexRefDataOffset = PagePointer.Offset();
                uint32 NumVertexRefs = PackedClusters[i].GetNumVerts() - CodedVerticesPerCluster[i];
                PagePointer.Advance<uint32>(NumVertexRefs);
            }
            FMemory::Memcpy(VertexRefs, CombinedVertexRefData.GetData(), CombinedVertexRefData.Num() * CombinedVertexRefData.GetTypeSize());
        }

        // Write the positions.
        {
            uint8* PositionData = PagePointer.GetPtr<uint8>();
            for (uint32 i = 0; i < Page.NumClusters; i++)
            {
                ClusterDiskHeaders[i].PositionDataOffset = PagePointer.Offset();
                PagePointer.Advance<uint8>(NumVertexBytesPerCluster[i]);
            }
            check( (PagePointer.GetPtr<uint8>() - PositionData) == CombinedPositionData.Num() * CombinedPositionData.GetTypeSize());

            FMemory::Memcpy(PositionData, CombinedPositionData.GetData(), CombinedPositionData.Num() * CombinedPositionData.GetTypeSize());
        }

        // Write the attributes.
        {
            uint8* AttribData = PagePointer.GetPtr<uint8>();
            for (uint32 i = 0; i < Page.NumClusters; i++)
            {
                const uint32 BytesPerAttribute = (PackedClusters[i].GetBitsPerAttribute() + 7) / 8;
                ClusterDiskHeaders[i].AttributeDataOffset = PagePointer.Offset();
                PagePointer.Advance<uint8>(Align(CodedVerticesPerCluster[i] * BytesPerAttribute, 4));
            }
            check((uint32)(PagePointer.GetPtr<uint8>() - AttribData) == CombinedAttributeData.Num() * CombinedAttributeData.GetTypeSize());
            FMemory::Memcpy(AttribData, CombinedAttributeData.GetData(), CombinedAttributeData.Num()* CombinedAttributeData.GetTypeSize());
        }

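        // Compress the page losslessly with LZ4 (NAME_LZ4), a codec from the Lempel-Ziv (LZ77) family; LZW (Lempel-Ziv-Welch) is a different LZ variant.
        // For background on LZ-style compression (LZW in particular), see: http://athena.ecs.csus.edu/~wang/DLZW.pdf.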
        if (bLZCompress) 
        {
            TArray<uint8> DataCopy(PageResult.Data.GetData(), PagePointer.Offset());
            PageResult.UncompressedSize = DataCopy.Num();
            
            int32 CompressedSize = PageResult.Data.Num();
            verify(FCompression::CompressMemory(NAME_LZ4, PageResult.Data.GetData(), CompressedSize, DataCopy.GetData(), DataCopy.Num()));

            PageResult.Data.SetNum(CompressedSize, false);
        }
        else // No compression.
        {
            PageResult.Data.SetNum(PagePointer.Offset(), false);
            PageResult.UncompressedSize = PageResult.Data.Num();
        }
    });

    // Write the pages.
    uint32 TotalUncompressedSize = 0;
    uint32 TotalCompressedSize = 0;
    uint32 TotalFixupSize = 0;
    for (uint32 PageIndex = 0; PageIndex < NumPages; PageIndex++)
    {
        const FPage& Page = Pages[PageIndex];
        
        FFixupChunk& FixupChunk = FixupChunks[PageIndex];
        TArray<uint8>& BulkData = IsRootPage(PageIndex) ? Resources.RootClusterPage : StreamableBulkData;

        FPageStreamingState& PageStreamingState = Resources.PageStreamingStates[PageIndex];
        PageStreamingState.BulkOffset = BulkData.Num();

        // Write the fixup chunk.
        uint32 FixupChunkSize = FixupChunk.GetSize();
        check(FixupChunk.Header.NumHierachyFixups < MAX_CLUSTERS_PER_PAGE);
        check(FixupChunk.Header.NumClusterFixups < MAX_CLUSTERS_PER_PAGE);
        BulkData.Append((uint8*)&FixupChunk, FixupChunkSize);
        TotalFixupSize += FixupChunkSize;

        // Copy the page data into BulkData.
        TArray<uint8>& PageData = PageResults[PageIndex].Data;
        BulkData.Append(PageData.GetData(), PageData.Num());
        TotalUncompressedSize += PageResults[PageIndex].UncompressedSize;
        TotalCompressedSize += PageData.Num();

        PageStreamingState.BulkSize = BulkData.Num() - PageStreamingState.BulkOffset;
        PageStreamingState.PageUncompressedSize = PageResults[PageIndex].UncompressedSize;
    }

    uint32 TotalDiskSize = Resources.RootClusterPage.Num() + StreamableBulkData.Num();
    UE_LOG(LogStaticMesh, Log, TEXT("WritePages:"), NumPages);
    UE_LOG(LogStaticMesh, Log, TEXT("  %d pages written."), NumPages);
    UE_LOG(LogStaticMesh, Log, TEXT("  GPU size: %d bytes. %.3f bytes per page. %.3f%% utilization."), TotalGPUSize, TotalGPUSize / float(NumPages), TotalGPUSize / (float(NumPages) * CLUSTER_PAGE_GPU_SIZE) * 100.0f);
    UE_LOG(LogStaticMesh, Log, TEXT("  Uncompressed page data: %d bytes. Compressed page data: %d bytes. Fixup data: %d bytes."), TotalUncompressedSize, TotalCompressedSize, TotalFixupSize);
    UE_LOG(LogStaticMesh, Log, TEXT("  Total disk size: %d bytes. %.3f bytes per page."), TotalDiskSize, TotalDiskSize/ float(NumPages));

    // Store the streamable page data.
    Resources.StreamableClusterPages.Lock(LOCK_READ_WRITE);
    uint8* Ptr = (uint8*)Resources.StreamableClusterPages.Realloc(StreamableBulkData.Num());
    FMemory::Memcpy(Ptr, StreamableBulkData.GetData(), StreamableBulkData.Num());
    Resources.StreamableClusterPages.Unlock();
    Resources.StreamableClusterPages.SetBulkDataFlags(BULKDATA_Force_NOT_InlinePayload);
    Resources.bLZCompressed = bLZCompress;
}
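The page-writing loop above can be summarized with a small standalone sketch. It is not engine code: `PageInputSketch`, `PageStreamingStateSketch`, and `WritePagesSketch` are hypothetical stand-ins for the builder's per-page data, `FPageStreamingState`, and the page loop, and `std::vector<uint8_t>` replaces `TArray<uint8>`. As in the listing, each page is appended as its fixup chunk followed by its (possibly LZ-compressed) payload, root pages go to a separate buffer, and every page records the offset and size of its byte range.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for FPageStreamingState: the byte range a page occupies in its bulk buffer.
struct PageStreamingStateSketch {
    uint32_t BulkOffset = 0;
    uint32_t BulkSize = 0;
};

// Hypothetical builder output for one page: its serialized fixup chunk and its payload.
struct PageInputSketch {
    std::vector<uint8_t> Fixup;
    std::vector<uint8_t> Data;
    bool bIsRoot = false;
};

// Appends each page (fixup chunk first, then payload) to the root or streamable buffer
// and records where that page starts and how many bytes it takes.
std::vector<PageStreamingStateSketch> WritePagesSketch(const std::vector<PageInputSketch>& Pages,
                                                       std::vector<uint8_t>& RootBulk,
                                                       std::vector<uint8_t>& StreamableBulk) {
    std::vector<PageStreamingStateSketch> States(Pages.size());
    for (std::size_t i = 0; i < Pages.size(); ++i) {
        std::vector<uint8_t>& Bulk = Pages[i].bIsRoot ? RootBulk : StreamableBulk;
        States[i].BulkOffset = static_cast<uint32_t>(Bulk.size());
        Bulk.insert(Bulk.end(), Pages[i].Fixup.begin(), Pages[i].Fixup.end());
        Bulk.insert(Bulk.end(), Pages[i].Data.begin(), Pages[i].Data.end());
        States[i].BulkSize = static_cast<uint32_t>(Bulk.size()) - States[i].BulkOffset;
    }
    return States;
}
```

Because each page records its own byte range, a single page can later be located in the bulk data without parsing its neighbors.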

6.4.2.8 FImposterAtlas::Rasterize

// Engine\Source\Developer\NaniteBuilder\Private\ImposterAtlas.cpp

// Rasterize the given cluster into the imposter atlas.
void FImposterAtlas::Rasterize( const FIntPoint& TilePos, const FCluster& Cluster, uint32 ClusterIndex )
{
    constexpr uint32 ViewSize = TileSize;// * SuperSample;

    FIntRect Scissor( 0, 0, ViewSize, ViewSize );

    // Get the local-to-imposter transform matrix.
    FMatrix LocalToImposter = GetLocalToImposter( TilePos );

    TArray< FVector, TInlineAllocator<128> > Positions;
    Positions.SetNum( Cluster.NumVerts, false );

    // Fetch the cluster's vertex positions and transform them into imposter space.
    for( uint32 VertIndex = 0; VertIndex < Cluster.NumVerts; VertIndex++ )
    {
        FVector Position = Cluster.GetPosition( VertIndex );
        Position = LocalToImposter.TransformPosition( Position );

        Positions[ VertIndex ].X = ( Position.X * 0.5f + 0.5f ) * ViewSize;
        Positions[ VertIndex ].Y = ( Position.Y * 0.5f + 0.5f ) * ViewSize;
        Positions[ VertIndex ].Z = ( Position.Z * 0.5f + 0.5f ) * 254.0f + 1.0f;    // zero is reserved as masked
    }

    // Iterate over all triangles and rasterize them into the imposter.
    for( uint32 TriIndex = 0; TriIndex < Cluster.NumTris; TriIndex++ )
    {
        FVector Verts[3];
        Verts[0] = Positions[ Cluster.Indexes[ TriIndex * 3 + 0 ] ];
        Verts[1] = Positions[ Cluster.Indexes[ TriIndex * 3 + 1 ] ];
        Verts[2] = Positions[ Cluster.Indexes[ TriIndex * 3 + 2 ] ];

        // Rasterize the triangle.
        RasterizeTri( Verts, Scissor, 0,
            // Store the rasterized result.
            [&]( int32 x, int32 y, float z )
            {
                uint32 Depth = FMath::RoundToInt( FMath::Clamp( z, 1.0f, 255.0f ) );
                uint16 PixelValue = ( Depth << 8 ) | ( ClusterIndex << 7 ) | TriIndex;
                //uint32 PixelIndex = x + y * ViewSize;
                uint32 PixelIndex = x + ( y + ( TilePos.X + TilePos.Y * AtlasSize ) * TileSize ) * TileSize;
                Pixels[ PixelIndex ] = FMath::Max( Pixels[ PixelIndex ], PixelValue );
            } );
    }
}
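The 16-bit pixel value written above packs three fields: the quantized depth in bits 15..8, `ClusterIndex` in bit 7, and `TriIndex` in bits 6..0. The sketch below reproduces that layout with hypothetical helper names (`QuantizeImposterDepth`, `PackImposterPixel`, `UnpackImposterPixel`, `WriteImposterPixel`); `QuantizeImposterDepth` folds the per-vertex remapping and the per-pixel clamp and round into one step. It assumes, as the bit widths imply, that `ClusterIndex` is 0 or 1 and `TriIndex` is below 128; larger values would bleed into neighboring fields. Because depth occupies the high byte, the `FMath::Max` in the listing behaves like a depth test that keeps the fragment with the larger quantized depth, while 0 stays free to mark empty (masked) pixels.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Maps an imposter-space z in [-1, 1] to [1, 255]; 0 stays reserved for "empty" (masked).
uint32_t QuantizeImposterDepth(float z) {
    const float scaled = (z * 0.5f + 0.5f) * 254.0f + 1.0f;
    return static_cast<uint32_t>(std::lround(std::clamp(scaled, 1.0f, 255.0f)));
}

// Same bit layout as the listing: [15:8] depth, [7] cluster bit, [6:0] triangle index.
uint16_t PackImposterPixel(uint32_t depth, uint32_t clusterIndex, uint32_t triIndex) {
    assert(depth <= 255 && clusterIndex <= 1 && triIndex < 128); // assumption: each field fits
    return static_cast<uint16_t>((depth << 8) | (clusterIndex << 7) | triIndex);
}

struct ImposterPixelSketch { uint32_t Depth, ClusterIndex, TriIndex; };

ImposterPixelSketch UnpackImposterPixel(uint16_t v) {
    return { uint32_t(v >> 8), uint32_t((v >> 7) & 1u), uint32_t(v & 0x7Fu) };
}

// Depth sits in the high byte, so max() keeps the fragment with the larger depth.
void WriteImposterPixel(uint16_t& dst, uint16_t value) { dst = std::max(dst, value); }
```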

// Engine\Source\Developer\NaniteBuilder\Private\Rasterizer.h

// Software-rasterize the given triangle, invoking the FWritePixel callback for each covered pixel.
template< typename FWritePixel >
void RasterizeTri( const FVector Verts[3], const FIntRect& ScissorRect, uint32 SubpixelDilate, FWritePixel WritePixel )
{
    constexpr uint32 SubpixelBits        = 8;
    constexpr uint32 SubpixelSamples    = 1 << SubpixelBits;

    FVector v01 = Verts[1] - Verts[0];
    FVector v02 = Verts[2] - Verts[0];

    float DetXY = v01.X * v02.Y - v01.Y * v02.X;
    if( DetXY >= 0.0f )
    {
        // Backface cull.
        // If culling were disabled, the vertices would need to be swapped to correct the winding for the rest of the code.
        return;
    }

    FVector2D GradZ;
    GradZ.X = ( v01.Z * v02.Y - v01.Y * v02.Z ) / DetXY;
    GradZ.Y = ( v01.X * v02.Z - v01.Z * v02.X ) / DetXY;

    // 24.8 fixed point
    FIntPoint Vert0 = ToIntPoint( Verts[0] * SubpixelSamples );
    FIntPoint Vert1 = ToIntPoint( Verts[1] * SubpixelSamples );
    FIntPoint Vert2 = ToIntPoint( Verts[2] * SubpixelSamples );

    // Bounding rectangle.
    FIntRect RectSubpixel( Vert0, Vert0 );
    RectSubpixel.Include( Vert1 );
    RectSubpixel.Include( Vert2 );
    RectSubpixel.InflateRect( SubpixelDilate );

    // Round to the nearest pixel.
    FIntRect RectPixel = ( ( RectSubpixel + (SubpixelSamples / 2) - 1 ) ) / SubpixelSamples;

    // Clip to the viewport (scissor rect).
    RectPixel.Clip( ScissorRect );
    
    // Early out if no pixels are covered.
    if( RectPixel.IsEmpty() )
        return;

    // 12.8 fixed point
    FIntPoint Edge01 = Vert0 - Vert1;
    FIntPoint Edge12 = Vert1 - Vert2;
    FIntPoint Edge20 = Vert2 - Vert0;

    // Rebase the vertices relative to MinPixel, with a half-pixel offset.
    // 12.8 fixed point
    // Max triangle size = 2047x2047 pixels.
    const FIntPoint BaseSubpixel = RectPixel.Min * SubpixelSamples + (SubpixelSamples / 2);
    Vert0 -= BaseSubpixel;
    Vert1 -= BaseSubpixel;
    Vert2 -= BaseSubpixel;

    auto EdgeC = [=]( const FIntPoint& Edge, const FIntPoint& Vert )
    {
        int64 ex = Edge.X;
        int64 ey = Edge.Y;
        int64 vx = Vert.X;
        int64 vy = Vert.Y;

        // Half-edge constants
        // 24.16 fixed point
        int64 C = ey * vx - ex * vy;

        // Correct for the fill convention
        // Top left rule for CCW
        C -= ( Edge.Y < 0 || ( Edge.Y == 0 && Edge.X > 0 ) ) ? 0 : 1;

        // Dilate the edges.
        C += ( FMath::Abs( Edge.X ) + FMath::Abs( Edge.Y ) ) * SubpixelDilate;

        // Step in pixel increments.
        // The low bits are always the same, so they don't matter when testing the sign.
        // 24.8 fixed point
        return int32( C >> SubpixelBits );
    };

    int32 C0 = EdgeC( Edge01, Vert0 );
    int32 C1 = EdgeC( Edge12, Vert1 );
    int32 C2 = EdgeC( Edge20, Vert2 );
    float Z0 = Verts[0].Z - ( GradZ.X * Vert0.X + GradZ.Y * Vert0.Y ) / SubpixelSamples;
    
    int32 CY0 = C0;
    int32 CY1 = C1;
    int32 CY2 = C2;
    float ZY = Z0;

    // Walk every pixel in the bounding rectangle and fill those inside the triangle.
    for( int32 y = RectPixel.Min.Y; y < RectPixel.Max.Y; y++ )
    {
        int32 CX0 = CY0;
        int32 CX1 = CY1;
        int32 CX2 = CY2;
        float ZX = ZY;

        for( int32 x = RectPixel.Min.X; x < RectPixel.Max.X; x++ )
        {
            // If all three edge-function values are non-negative (the sign bit of their OR is clear), the pixel is inside the triangle, so call WritePixel.
            if( ( CX0 | CX1 | CX2 ) >= 0 )
            {
                WritePixel( x, y, ZX );
            }

            CX0 -= Edge01.Y;
            CX1 -= Edge12.Y;
            CX2 -= Edge20.Y;
            ZX += GradZ.X;
        }

        CY0 += Edge01.X;
        CY1 += Edge12.X;
        CY2 += Edge20.X;
        ZY += GradZ.Y;
    }
}
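To isolate the core idea of `RasterizeTri`, here is a simplified, standalone C++ version. It keeps the 8-bit subpixel fixed-point vertices, the bounding-box walk, the top-left fill convention, and the `( CX0 | CX1 | CX2 ) >= 0` sign test, but it is only a sketch under its own assumptions: it evaluates each edge function directly at the pixel center instead of stepping it incrementally, normalizes the winding instead of back-face culling, and omits edge dilation and Z interpolation. `Vec2f`, `Fixed2`, `EdgeFn`, `IsTopLeft`, and `RasterizeTriSketch` are hypothetical names, and the edge orientation (y-down, positive area) is this sketch's own convention, not necessarily the engine's.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <utility>
#include <vector>

struct Vec2f { float x, y; };
struct Fixed2 { int64_t x, y; }; // 8-bit subpixel fixed point

constexpr int64_t kSubpixelBits = 8;
constexpr int64_t kSubpixelSamples = int64_t(1) << kSubpixelBits;

Fixed2 ToFixed(Vec2f v) {
    return { static_cast<int64_t>(std::llround(v.x * kSubpixelSamples)),
             static_cast<int64_t>(std::llround(v.y * kSubpixelSamples)) };
}

// cross(b - a, p - a): positive when p lies on the interior side of edge a->b
// (for triangles normalized to positive area below).
int64_t EdgeFn(Fixed2 a, Fixed2 b, Fixed2 p) {
    return (b.x - a.x) * (p.y - a.y) - (b.y - a.y) * (p.x - a.x);
}

// Top-left rule for this winding in y-down screen space: samples exactly on an edge
// belong to the triangle only if the edge is a left edge or a horizontal top edge.
bool IsTopLeft(Fixed2 a, Fixed2 b) {
    const int64_t dx = b.x - a.x, dy = b.y - a.y;
    return dy < 0 || (dy == 0 && dx > 0);
}

template <typename FWritePixel>
void RasterizeTriSketch(Vec2f a, Vec2f b, Vec2f c, int width, int height, FWritePixel WritePixel) {
    Fixed2 v[3] = { ToFixed(a), ToFixed(b), ToFixed(c) };
    const int64_t area = EdgeFn(v[0], v[1], v[2]);
    if (area == 0) return;               // degenerate triangle
    if (area < 0) std::swap(v[1], v[2]); // normalize the winding (the engine culls instead)

    // Pixel bounding box clipped to the viewport; the edge test filters any extra pixels.
    const int64_t minX = std::max<int64_t>(0, std::min({v[0].x, v[1].x, v[2].x}) / kSubpixelSamples);
    const int64_t minY = std::max<int64_t>(0, std::min({v[0].y, v[1].y, v[2].y}) / kSubpixelSamples);
    const int64_t maxX = std::min<int64_t>(width - 1, std::max({v[0].x, v[1].x, v[2].x}) / kSubpixelSamples);
    const int64_t maxY = std::min<int64_t>(height - 1, std::max({v[0].y, v[1].y, v[2].y}) / kSubpixelSamples);

    // Fill-convention bias: a sample exactly on an edge (value 0) survives only on top-left edges.
    const int64_t bias[3] = { IsTopLeft(v[0], v[1]) ? 0 : -1,
                              IsTopLeft(v[1], v[2]) ? 0 : -1,
                              IsTopLeft(v[2], v[0]) ? 0 : -1 };

    for (int64_t y = minY; y <= maxY; ++y) {
        for (int64_t x = minX; x <= maxX; ++x) {
            // Sample at the pixel center.
            const Fixed2 p{ x * kSubpixelSamples + kSubpixelSamples / 2,
                            y * kSubpixelSamples + kSubpixelSamples / 2 };
            const int64_t w0 = EdgeFn(v[0], v[1], p) + bias[0];
            const int64_t w1 = EdgeFn(v[1], v[2], p) + bias[1];
            const int64_t w2 = EdgeFn(v[2], v[0], p) + bias[2];
            // The sign bit of the OR is set iff at least one edge value is negative.
            if ((w0 | w1 | w2) >= 0) WritePixel(static_cast<int>(x), static_cast<int>(y));
        }
    }
}
```

The fill convention is what keeps shared edges watertight: when a 4x4 square is split along its diagonal into two triangles, the pixel centers lying exactly on the diagonal are claimed by exactly one of the two triangles, so every pixel is written once.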

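6.4.2.9 Nanite Data Build Summary

This subsection summarizes the Nanite data build process. The initial entry point is BuildNaniteFromHiResSourceModel:

  • Fetch Nanite's high-resolution model from the UStaticMesh's HiResSourceModel.
  • Compute tangents, lightmap UVs, and so on.
  • Build temporary RenderData to pass to the subsequent Nanite build stages.
  • Build the per-section indices, vertex buffers, and index buffers.
  • BuildCombinedSectionIndices: concatenate the per-section index buffers.
  • ComputeBoundsFromVertexList: compute the bounds from the high-resolution mesh before the Nanite build.
  • Resolve the section material indices from SectionInfoMap.
  • NaniteBuilderModule.Build: run the Nanite builder module.

The main steps of NaniteBuilderModule.Build are outlined below: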

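  • Build an array that associates triangle indices with material indices.
  • BuildNaniteData: build the Nanite data.
    • Process vertex colors.
    • Iterate over all sections and build clusters for each one.
      • ClusterTriangles: split a section into one or more clusters.
        • Initialize shared edges, boundary edges, edge hashes, and related data.
        • Process the edge hashes in parallel.
        • Find shared edges and boundary edges in parallel.
        • Handle disjoint triangle sets.
        • Partition the mesh with FGraphPartitioner.
          • Internally, FGraphPartitioner uses the third-party open-source library METIS, which produces high-quality partitions efficiently and is also known for low fill-in (fill-reducing orderings), so both the quality and the speed of mesh partitioning are good. METIS's partitioning algorithm has three phases: coarsening, partitioning, and uncoarsening (refinement).
        • Build the clusters in parallel.
    • Check whether a coarse representation should replace the original static mesh data.
      • A coarse representation is used only if the Nanite setting PercentTriangles is less than 1 and the source mesh has more than 2000 triangles.
    • Call BuildDAG on all sections to build a directed acyclic graph (DAG) that accelerates triangle reduction and mesh simplification.
    • If a coarse representation is used, call BuildCoarseRepresentation to build its data, then fix up the mesh section info with the coarse mesh's ranges while preserving the original ordering and materials.
    • Encode: encode the Nanite mesh.
      • RemoveDegenerateTriangles: remove the degenerate triangles of all clusters.
      • BuildMaterialRanges: build the material ranges of all clusters.
      • ConstrainClusters: constrain the clusters to their ClusterGroups.
      • CalculateQuantizedPositionsUniformGrid: compute the quantized positions.
      • CalculateEncodingInfos: compute the encoding info.
      • AssignClustersToPages: assign clusters to pages.
      • BuildHierarchies: build the hierarchy nodes of the ClusterGroups.
      • WritePages: write the cluster and ClusterGroup data into pages.
    • If needed (only when there is a single section), generate an FImposterAtlas.
      • Generate imposters for all clusters in parallel, rasterizing every triangle of each cluster into the FImposterAtlas.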
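The coarse-representation condition from the list reduces to a simple predicate. The helper name `ShouldUseCoarseRepresentationSketch` is hypothetical; only the two thresholds come from the text above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: a coarse representation replaces the original static mesh data only
// when the Nanite settings keep less than 100% of the triangles (PercentTriangles < 1)
// and the source mesh has more than 2000 triangles.
bool ShouldUseCoarseRepresentationSketch(float PercentTriangles, uint32_t NumSourceTriangles) {
    return PercentTriangles < 1.0f && NumSourceTriangles > 2000u;
}
```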

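The Nanite build uses many optimization techniques, including but not limited to:

  • Heavy use of ParallelFor to spread the work across threads and shorten build times.
  • METIS for high-quality mesh partitioning.
  • Layered concepts such as Cluster, ClusterGroup, and ClusterGroupPart to organize the mesh data coherently.
  • DAG, Hierarchy, Mip Level, Coarse Representation, and similar structures to accelerate and optimize mesh building and partitioning.
  • Precomputing the Page, ImposterAtlas, and other data the GPU needs at render time.
  • Aggressive data compression, such as LZ compression, fixed-point numbers, and bit packing.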
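Fixed-point quantization, one of the compression techniques listed above, can be illustrated with a uniform-grid sketch: each coordinate is snapped to a grid with step 2^-Precision, stored as an unsigned offset from the minimum, and the largest offset determines how many bits are needed. This shows only the general technique; `QuantizeAxis`, `DequantizeAxis`, and `QuantizedAxisSketch` are hypothetical, and the actual encoding performed by `CalculateQuantizedPositionsUniformGrid` and the page format are not reproduced here.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// One axis of positions quantized to a uniform grid (hypothetical layout, not Nanite's page format).
struct QuantizedAxisSketch {
    int32_t Min = 0;               // grid coordinate of the smallest value
    uint32_t NumBits = 0;          // bits needed to store the largest offset
    std::vector<uint32_t> Offsets; // grid coordinates relative to Min
};

// Snaps each coordinate to a grid with step 2^-Precision. Assumes the input is non-empty
// and small enough that its grid coordinates fit in int32.
QuantizedAxisSketch QuantizeAxis(const std::vector<float>& Coords, int Precision) {
    assert(!Coords.empty());
    const float Scale = std::ldexp(1.0f, Precision); // grid cells per unit
    std::vector<int32_t> Grid;
    for (float c : Coords) Grid.push_back(static_cast<int32_t>(std::lround(c * Scale)));

    QuantizedAxisSketch Out;
    Out.Min = *std::min_element(Grid.begin(), Grid.end());
    uint32_t MaxOffset = 0;
    for (int32_t g : Grid) {
        Out.Offsets.push_back(static_cast<uint32_t>(g - Out.Min));
        MaxOffset = std::max(MaxOffset, Out.Offsets.back());
    }
    // Count the bits of the largest offset (64-bit shift avoids shifting a 32-bit value by 32).
    while ((static_cast<uint64_t>(MaxOffset) >> Out.NumBits) != 0) ++Out.NumBits;
    return Out;
}

// Reconstructs coordinate i from its grid offset.
float DequantizeAxis(const QuantizedAxisSketch& Q, std::size_t i, int Precision) {
    const int32_t g = Q.Min + static_cast<int32_t>(Q.Offsets[i]);
    return std::ldexp(static_cast<float>(g), -Precision);
}
```

Rounding to the nearest grid point bounds the reconstruction error by half a grid step, so the choice of Precision trades bits per vertex against positional accuracy.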

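One more note: Nanite does not use the geometry image technique that had been rumored, although its core ideas are fairly similar.

Reportedly, UE5's engineers include someone who has co-authored papers with Professor Xianfeng Gu (顧險峰), a student of the mathematician Shing-Tung Yau and a pioneer of geometry images.

For more on geometry images, see Professor Gu's paper Geometry images and his WeChat public account 老顧談幾何 ("Lao Gu on Geometry").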

6.4.3 Nanite Rendering

This section covers the code and logic of Nanite's rendering stage.

6.4.3.1 Nanite Rendering Overview

UE5 makes substantial changes to its rendering modules to support Nanite. In summary:

  • Engine module:

    • Adds the FMeshNaniteSettings type, analyzed in the previous section.
    • FStaticMeshRenderData gains Nanite::FResources data and the related handling logic.
    • UStaticMeshComponent gains bDisplayNaniteProxyMesh and FMeshNaniteSettings data.
    • Components such as UHierarchicalInstancedStaticMeshComponent and UInstancedStaticMeshComponent gain Nanite data and SceneProxy support.
    • Adds the InstanceUniformShaderParameters module.
  • Renderer module:

    • Adds the NaniteRender module, containing types and interfaces such as FNaniteCommandInfo, ENaniteMeshPass, FNaniteDrawListContext, FCullingContext, FRasterContext, FRasterResults, FNaniteShader, FNaniteMaterialVS, FNaniteMeshProcessor, FNaniteMaterialTables, ERasterTechnique, ERasterScheduling, EOutputBufferMode, and FPackedView.

    • FPrimitiveSceneInfo gains NaniteCommandInfos, NaniteMaterialIds, and LumenPrimitiveIndex, plus instancing and ray tracing data such as CachedRayTracingMeshCommandsHashPerLOD, bRegisteredWithVelocityData, InstanceDataOffset, and NumInstanceDataEntries, along with the related interfaces.

    • Adds the NaniteResources module, containing types such as Nanite::FSceneProxy, Nanite::FResources, and Nanite::FVertexFactory.

    • Adds the NaniteStreamingManager module, containing types such as Nanite::FPageKey, Nanite::FGPUStreamingRequest, Nanite::FStreamingRequest, Nanite::FStreamingPageInfo, Nanite::FRootPageInfo, Nanite::FPendingPage, Nanite::FAsyncState, Nanite::FStreamingManager, Nanite::, and Nanite::.

    • FPrimitiveSceneProxy gains SupportsNaniteRendering, IsNaniteMesh, bSupportsMeshCardRepresentation, IsAlwaysVisible, GetPrimitiveInstances, RayTracingGroupId, and more.

    • SceneInterface and SceneManagement gain types such as FInstanceCullingManagerResources, and use FGPUScenePrimitiveCollector in place of FPrimitiveUniformShaderParameters.

    • SceneView gains shader bindings such as FViewShaderParameters, PrecomputedIndirectLightingColorScale, GlobalDistanceField, VirtualTexture, PhysicsField, Lumen, Instance, and Page.

    • Types such as FPrimitiveFlagsCompact gain a bIsNaniteMesh flag.

  • Shader module:

    • Adds ClusterCulling, Culling, HZBCull, InstanceCulling, MaterialCulling, Shadow, GBuffer, Imposter, DataDecode, DataPacked, Rasterizer, WritePixel, and others.

6.4.3.2 Nanite Rendering Basics

This subsection examines the main concepts, types, and interfaces involved in Nanite rendering.

  • InstanceUniformShaderParameters
// Engine\Source\Runtime\Engine\Public\InstanceUniformShaderParameters.h

#define INSTANCE_SCENE_DATA_FLAG_CAST_SHADOWS            0x1
#define INSTANCE_SCENE_DATA_FLAG_DETERMINANT_SIGN        0x2
#define INSTANCE_SCENE_DATA_FLAG_HAS_IMPOSTER            0x4

// Nanite instance info.
class FNaniteInfo
{
public:
    uint32 RuntimeResourceID; // Runtime resource ID.
    uint32 HierarchyOffset_AndHasImposter; // Hierarchy offset packed together with the has-imposter flag.

    FNaniteInfo()
    : RuntimeResourceID(0xFFFFFFFFu)
    , HierarchyOffset_AndHasImposter(0xFFFFFFFFu)
    {
    }

    FNaniteInfo(uint32 InRuntimeResourceID, int32 InHierarchyOffset, bool bHasImposter)
    : RuntimeResourceID(InRuntimeResourceID)
    , HierarchyOffset_AndHasImposter((InHierarchyOffset << 1) | (bHasImposter ? 1u : 0u))
    {
    }
};

// Per-instance data of a Nanite primitive.
struct FPrimitiveInstance
{
    FMatrix  InstanceToLocal;
    FMatrix  PrevInstanceToLocal;
    FMatrix  LocalToWorld;
    FMatrix  PrevLocalToWorld;
    FVector4 NonUniformScale;
    FVector4 InvNonUniformScaleAndDeterminantSign;
    FBoxSphereBounds RenderBounds;
    FBoxSphereBounds LocalBounds;
    FVector4 LightMapAndShadowMapUVBias;
    uint32 PrimitiveId; // Primitive ID.
    FNaniteInfo NaniteInfo; // Nanite info.
    uint32 LastUpdateSceneFrameNumber;
    float PerInstanceRandom;
    uint32 Flags;
};

(......)

// Declaration of FInstanceUniformShaderParameters. It must match the shader-side FInstanceSceneData exactly.
BEGIN_GLOBAL_SHADER_PARAMETER_STRUCT(FInstanceUniformShaderParameters,ENGINE_API)
    SHADER_PARAMETER(FMatrix,  LocalToWorld)
    SHADER_PARAMETER(FMatrix,  PrevLocalToWorld)
    SHADER_PARAMETER(FVector4, NonUniformScale)
    SHADER_PARAMETER(FVector4, InvNonUniformScaleAndDeterminantSign)
    SHADER_PARAMETER(FVector,  LocalBoundsCenter)
    SHADER_PARAMETER(uint32,   PrimitiveId)
    SHADER_PARAMETER(FVector,  LocalBoundsExtent)
    SHADER_PARAMETER(uint32,   LastUpdateSceneFrameNumber)
    SHADER_PARAMETER(uint32,   NaniteRuntimeResourceID)
    SHADER_PARAMETER(uint32,   NaniteHierarchyOffset)
    SHADER_PARAMETER(float,    PerInstanceRandom)
    SHADER_PARAMETER(uint32,   Flags)
    SHADER_PARAMETER(FVector4, LightMapAndShadowMapUVBias)
END_GLOBAL_SHADER_PARAMETER_STRUCT()

// Per-instance scene shader data.
struct FInstanceSceneShaderData
{
    // Must match GetInstanceData() in SceneData.ush.
    enum { InstanceDataStrideInFloat4s = 10 };

    FVector4 Data[InstanceDataStrideInFloat4s];

    (......)
};
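The `FNaniteInfo` constructor packs the hierarchy offset and the has-imposter flag into one 32-bit word (offset shifted left by one, flag in bit 0), and the `INSTANCE_SCENE_DATA_FLAG_*` values are independent bits of `Flags`. The helpers below (hypothetical names) show matching pack and unpack logic on the CPU. They cast to an unsigned type before shifting, which avoids the undefined behavior of left-shifting a negative `int32` under pre-C++20 rules. Note that the default value `0xFFFFFFFF` would decode as offset `0x7FFFFFFF` with the flag set, so it is presumably treated as an invalid sentinel rather than decoded.

```cpp
#include <cassert>
#include <cstdint>

// Flag bits from the listing above (INSTANCE_SCENE_DATA_FLAG_*).
constexpr uint32_t kInstanceFlagCastShadows     = 0x1;
constexpr uint32_t kInstanceFlagDeterminantSign = 0x2;
constexpr uint32_t kInstanceFlagHasImposter     = 0x4;

// Same layout as the FNaniteInfo constructor: offset in bits 31..1, has-imposter flag in bit 0.
uint32_t PackHierarchyOffsetSketch(int32_t HierarchyOffset, bool bHasImposter) {
    return (static_cast<uint32_t>(HierarchyOffset) << 1) | (bHasImposter ? 1u : 0u);
}

int32_t UnpackHierarchyOffsetSketch(uint32_t Packed) { return static_cast<int32_t>(Packed >> 1); }

bool UnpackHasImposterSketch(uint32_t Packed) { return (Packed & 1u) != 0; }

// Tests one INSTANCE_SCENE_DATA_FLAG_* bit in an instance's Flags word.
bool HasInstanceFlag(uint32_t Flags, uint32_t Flag) { return (Flags & Flag) != 0; }
```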
  • NaniteStreamingManager
// Engine\Source\Runtime\Engine\Public\Rendering\NaniteStreamingManager.h

namespace Nanite
{

// Page key.
struct FPageKey
{
    // Runtime resource ID.
    uint32 RuntimeResourceID;
    // Page index.
    uint32 PageIndex;
};

// Key hash.
FORCEINLINE uint32 GetTypeHash( const FPageKey& Key )
{
    return Key.RuntimeResourceID * 0xFC6014F9u + Key.PageIndex * 0x58399E77u;
}

// Key comparison.
FORCEINLINE bool operator==( const FPageKey& A, const FPageKey& B )
{
    return A.RuntimeResourceID == B.RuntimeResourceID && A.PageIndex == B.PageIndex;
}
FORCEINLINE bool operator!=(const FPageKey& A, const FPageKey& B)
{
    return !(A == B);
}

// Request data BEFORE deduplication.
struct FGPUStreamingRequest
{
    uint32        RuntimeResourceID;
    uint32        PageIndex_NumPages;
    uint32        Priority;
};

// Request data AFTER deduplication.
struct FStreamingRequest
{
    FPageKey    Key;
    uint32        Priority;
};

// Streaming page info.
struct FStreamingPageInfo
{
    FStreamingPageInfo* Next;
    FStreamingPageInfo* Prev;

    FPageKey    RegisteredKey;
    FPageKey    ResidentKey;
    
    uint32        GPUPageIndex;
    uint32        LatestUpdateIndex;
    uint32        RefCount;
};

// Root page info.
struct FRootPageInfo
{
    uint32    RuntimeResourceID;
    uint32    NumClusters;
};

// Pending page.
struct FPendingPage
{
#if !WITH_EDITOR
    uint8*                    MemoryPtr;
    FIoRequest                Request;
    IAsyncReadFileHandle*    AsyncHandle;
    IAsyncReadRequest*        AsyncRequest;
#endif

    uint32                    GPUPageIndex;
    FPageKey                InstallKey;
#if !UE_BUILD_SHIPPING
    uint32                    BytesLeftToStream;
#endif
};

// Async update state.
struct FAsyncState
{
    FRHIGPUBufferReadback*    LatestReadbackBuffer        = nullptr;
    const uint32*            LatestReadbackBufferPtr        = nullptr;
    uint32                    NumReadyPages                = 0;
    bool                    bUpdateActive                = false;
    bool                    bBuffersTransitionedToWrite = false;
};

// Nanite streaming manager.
class FStreamingManager : public FRenderResource
{
public:
    FStreamingManager();
    
    // Initialize/release RHI resources.
    virtual void InitRHI() override;
    virtual void ReleaseRHI() override;

    // Add/remove resources.
    void    Add( FResources* Resources );
    void    Remove( FResources* Resources );

    // Must be called once per frame before any Nanite rendering occurs, and before EndAsyncUpdate.
    ENGINE_API void BeginAsyncUpdate(FRDGBuilder& GraphBuilder);
    // Must be called once per frame before any Nanite rendering occurs, and after BeginAsyncUpdate.
    ENGINE_API void EndAsyncUpdate(FRDGBuilder& GraphBuilder);
    ENGINE_API bool IsAsyncUpdateInProgress();
    // Called once per frame, after the last request has been added.
    ENGINE_API void    SubmitFrameStreamingRequests(FRDGBuilder& GraphBuilder);
    
    (......)

private:
    friend class FStreamingUpdateTask;

    // Heap buffer, holding both the data buffer and its upload buffer.
    struct FHeapBuffer
    {
        int32                    TotalUpload = 0;

        FGrowOnlySpanAllocator    Allocator;

        FScatterUploadBuffer    UploadBuffer;
        FRWByteAddressBuffer    DataBuffer;

        void    Release()
        {
            UploadBuffer.Release();
            DataBuffer.Release();
        }
    };

    // FPackedCluster*, GeometryData { Index, Position, TexCoord, TangentX, TangentZ }*
    FHeapBuffer                ClusterPageData;    
    FHeapBuffer                ClusterPageHeaders;
    FScatterUploadBuffer    ClusterFixupUploadBuffer;
    FHeapBuffer                Hierarchy; // Hierarchy.
    FHeapBuffer                RootPages; // Root pages.
    TRefCountPtr< FRDGPooledBuffer > StreamingRequestsBuffer;
    
    uint32                    MaxStreamingPages;
    uint32                    MaxPendingPages;
    uint32                    MaxPageInstallsPerUpdate;
    uint32                    MaxStreamingReadbackBuffers;

    // Readback state.
    uint32                    ReadbackBuffersWriteIndex;
    uint32                    ReadbackBuffersNumPending;

    TArray<uint32>            NextRootPageVersion;
    uint32                    NextUpdateIndex;
    uint32                    NumRegisteredStreamingPages;
    uint32                    NumPendingPages;
    uint32                    NextPendingPageIndex;

    TArray<FRootPageInfo>    RootPageInfos;

#if !UE_BUILD_SHIPPING
    uint64                    PrevUpdateTick;
#endif

    TArray< FRHIGPUBufferReadback* >        StreamingRequestReadbackBuffers;
    TArray< FResources* >                    PendingAdds;

    TMap< uint32, FResources* >                RuntimeResourceMap;
    TMap< FPageKey, FStreamingPageInfo* >    RegisteredStreamingPagesMap;        // This is updated immediately.
    TMap< FPageKey, FStreamingPageInfo* >    CommittedStreamingPageMap;            // This update is deferred to the point where the page has been loaded and committed to memory.
    TArray< FStreamingRequest >                PrioritizedRequestsHeap;
    FStreamingPageInfo                        StreamingPageLRU;

    FStreamingPageInfo*                        StreamingPageInfoFreeList;
    TArray< FStreamingPageInfo >            StreamingPageInfos;
    // Fixup info for resident streaming pages. It must be kept so that the pages can be released later.
    TArray< FFixupChunk* >                    StreamingPageFixupChunks;

    TArray< FPendingPage >                    PendingPages;
#if !WITH_EDITOR
    TArray< uint8 >                            PendingPageStagingMemory;
#endif
    TArray< uint8 >                            PendingPageStagingMemoryLZ;

    FRequestsHashTable*                        RequestsHashTable = nullptr;
    FStreamingPageUploader*                    PageUploader = nullptr;

    FGraphEventArray                        AsyncTaskEvents;
    FAsyncState                                AsyncState;

    // Page operations.
    void CollectDependencyPages( FResources* Resources, TSet< FPageKey >& DependencyPages, const FPageKey& Key );
    void SelectStreamingPages( FResources* Resources, TArray< FPageKey >& SelectedPages, TSet<FPageKey>& SelectedPagesSet, uint32 RuntimeResourceID, uint32 PageIndex, uint32 MaxSelectedPages );

    // Register/unregister pages.
    void RegisterStreamingPage( FStreamingPageInfo* Page, const FPageKey& Key );
    void UnregisterPage( const FPageKey& Key );
    void MovePageToFreeList( FStreamingPageInfo* Page );

    void ApplyFixups( const FFixupChunk& FixupChunk, const FResources& Resources, uint32 PageIndex, uint32 GPUPageIndex );

    bool ArePageDependenciesCommitted(uint32 RuntimeResourceID, uint32 PageIndex, uint32 DependencyPageStart, uint32 DependencyPageNum);

    // Returns whether any work was done and whether the page/hierarchy buffers were transitioned to a compute-writable state.
    bool ProcessNewResources( FRDGBuilder& GraphBuilder);
    
    uint32 DetermineReadyPages();
    void InstallReadyPages( uint32 NumReadyPages );

    // Async update.
    void AsyncUpdate();

    void ClearStreamingRequestCount(FRDGBuilder& GraphBuilder, FRDGBufferUAVRef BufferUAVRef);
};

// Declaration of the global Nanite streaming manager.
extern ENGINE_API TGlobalResource< FStreamingManager > GStreamingManager;
}
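The request types above suggest a two-stage flow: GPU-side requests (`FGPUStreamingRequest`, read back through `StreamingRequestReadbackBuffers`) are deduplicated on the CPU into `FStreamingRequest` entries keyed by `FPageKey`. The sketch below shows such a deduplication step with `std::unordered_map`, reusing the hash constants of `GetTypeHash( const FPageKey& )`. It is illustrative only: it ignores the `PageIndex_NumPages` packing (whose bit layout isn't shown here), the type names are hypothetical, and keeping the highest priority when duplicates merge is an assumption of this sketch rather than the engine's documented rule.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

struct PageKey {
    uint32_t RuntimeResourceID;
    uint32_t PageIndex;
    bool operator==(const PageKey& o) const {
        return RuntimeResourceID == o.RuntimeResourceID && PageIndex == o.PageIndex;
    }
};

// Same mixing constants as GetTypeHash( const FPageKey& ) in the listing (uint32 wraparound).
struct PageKeyHash {
    std::size_t operator()(const PageKey& k) const {
        return static_cast<std::size_t>(k.RuntimeResourceID * 0xFC6014F9u + k.PageIndex * 0x58399E77u);
    }
};

struct StreamingRequestSketch { PageKey Key; uint32_t Priority; };

// Collapses duplicate page requests into one entry per key (assumed rule: keep the max priority),
// then returns them highest priority first.
std::vector<StreamingRequestSketch> DeduplicateRequests(const std::vector<StreamingRequestSketch>& Raw) {
    std::unordered_map<PageKey, uint32_t, PageKeyHash> Merged;
    for (const auto& r : Raw) {
        auto [it, inserted] = Merged.try_emplace(r.Key, r.Priority);
        if (!inserted) it->second = std::max(it->second, r.Priority);
    }
    std::vector<StreamingRequestSketch> Out;
    for (const auto& [key, prio] : Merged) Out.push_back({key, prio});
    std::sort(Out.begin(), Out.end(),
              [](const auto& a, const auto& b) { return a.Priority > b.Priority; });
    return Out;
}
```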
  • NaniteRender
// Engine\Source\Runtime\Renderer\Private\Nanite\NaniteRender.h

static constexpr uint32 NANITE_MAX_MATERIALS = 64;
static constexpr uint32 MAX_VIEWS_PER_CULL_RASTERIZE_PASS_BITS = 12;                                        
static constexpr uint32 MAX_VIEWS_PER_CULL_RASTERIZE_PASS_MASK    = ( ( 1 << MAX_VIEWS_PER_CULL_RASTERIZE_PASS_BITS ) - 1 );
static constexpr uint32 MAX_VIEWS_PER_CULL_RASTERIZE_PASS        = ( 1 << MAX_VIEWS_PER_CULL_RASTERIZE_PASS_BITS );

(......)

// Nanite uniform buffer parameters.
BEGIN_GLOBAL_SHADER_PARAMETER_STRUCT(FNaniteUniformParameters, )
    SHADER_PARAMETER(FIntVector4,                    SOAStrides)
    SHADER_PARAMETER(FIntVector4,                    MaterialConfig) // .x mode, .yz grid size, .w unused
    SHADER_PARAMETER(uint32,                        MaxNodes)
    SHADER_PARAMETER(uint32,                        MaxVisibleClusters)
    SHADER_PARAMETER(uint32,                        RenderFlags)
    SHADER_PARAMETER(FVector4,                        RectScaleOffset) // xy: scale, zw: offset
    SHADER_PARAMETER_SRV(ByteAddressBuffer,            ClusterPageData)
    SHADER_PARAMETER_SRV(ByteAddressBuffer,            ClusterPageHeaders)
    SHADER_PARAMETER_SRV(ByteAddressBuffer,            VisibleClustersSWHW)
    SHADER_PARAMETER_SRV(StructuredBuffer<uint>,    VisibleMaterials)
    SHADER_PARAMETER_TEXTURE(Texture2D<uint2>,        MaterialRange)
    SHADER_PARAMETER_TEXTURE(Texture2D<UlongType>,    VisBuffer64)
    SHADER_PARAMETER_TEXTURE(Texture2D<UlongType>,    DbgBuffer64)
    SHADER_PARAMETER_TEXTURE(Texture2D<uint>,        DbgBuffer32)
END_SHADER_PARAMETER_STRUCT()

(......)

// Rasterization parameters.
BEGIN_SHADER_PARAMETER_STRUCT( FRasterParameters, )
    SHADER_PARAMETER_RDG_TEXTURE_UAV( RWTexture2D< uint >,        OutDepthBuffer ) // Depth
    SHADER_PARAMETER_RDG_TEXTURE_UAV( RWTexture2D< UlongType >,    OutVisBuffer64 ) // Visibility
    SHADER_PARAMETER_RDG_TEXTURE_UAV( RWTexture2D< UlongType >,    OutDbgBuffer64 ) // Debug data
    SHADER_PARAMETER_RDG_TEXTURE_UAV( RWTexture2D< uint >,        OutDbgBuffer32 )
    SHADER_PARAMETER_RDG_TEXTURE_UAV( RWTexture2D< uint >,        LockBuffer ) // Lock buffer
END_SHADER_PARAMETER_STRUCT()

// Nanite draw command info.
class FNaniteCommandInfo
{
public:

    static constexpr int32 MAX_STATE_BUCKET_ID = (1 << 14) - 1; // Must match NaniteDataDecode.ush
    
    void SetStateBucketId(int32 InStateBucketId)
    {
        StateBucketId = InStateBucketId;
    }

    int32 GetStateBucketId() const
    {
        check(StateBucketId < MAX_STATE_BUCKET_ID);
        return StateBucketId;
    }

    uint32 GetMaterialId() const
    {
        return GetMaterialId(GetStateBucketId());
    }

    static uint32 GetMaterialId(int32 StateBucketId)
    {
        float DepthId = GetDepthId(StateBucketId);
        return *reinterpret_cast<uint32*>(&DepthId);
    }

    static float GetDepthId(int32 StateBucketId)
    {
        return float(StateBucketId + 1) / float(MAX_STATE_BUCKET_ID);
    }

private:
    // Index of the corresponding FMeshDrawCommand in FScene::NaniteDrawCommands.
    int32 StateBucketId = INDEX_NONE;
};

struct MeshDrawCommandKeyFuncs;

// Nanite draw list context; compare with its non-Nanite counterpart.
class FNaniteDrawListContext : public FMeshPassDrawListContext
{
public:
    FNaniteDrawListContext(FRWLock& InNaniteDrawCommandLock, FStateBucketMap& InNaniteDrawCommands);
    virtual FMeshDrawCommand& AddCommand(FMeshDrawCommand& Initializer, uint32 NumElements) override final;

    virtual void FinalizeCommand(
        const FMeshBatch& MeshBatch,
        int32 BatchElementIndex,
        int32 DrawPrimitiveId,
        int32 ScenePrimitiveId,
        ERasterizerFillMode MeshFillMode,
        ERasterizerCullMode MeshCullMode,
        FMeshDrawCommandSortKey SortKey,
        EFVisibleMeshDrawCommandFlags Flags,
        const FGraphicsMinimalPipelineStateInitializer& PipelineState,
        const FMeshProcessorShaders* ShadersForDebugging,
        FMeshDrawCommand& MeshDrawCommand
        ) override final;

    (......)
    
private:
    FRWLock* NaniteDrawCommandLock;
    FStateBucketMap* NaniteDrawCommands; // Nanite draw commands.
    FNaniteCommandInfo CommandInfo; // Nanite command info.
    FMeshDrawCommand MeshDrawCommandForStateBucketing;
};

// Base class of Nanite shaders.
class FNaniteShader : public FGlobalShader
{
public:
    (......)
    
    static bool ShouldCompilePermutation(const FGlobalShaderPermutationParameters& Parameters);
    static void ModifyCompilationEnvironment(const FGlobalShaderPermutationParameters& Parameters, FShaderCompilerEnvironment& OutEnvironment);
};

// Vertex shader that draws a full-screen pass at a specified depth; runs on all platforms.
class FNaniteMaterialVS : public FNaniteShader
{
    DECLARE_GLOBAL_SHADER(FNaniteMaterialVS);

    BEGIN_SHADER_PARAMETER_STRUCT(FParameters, )
        SHADER_PARAMETER(float, MaterialDepth)
    END_SHADER_PARAMETER_STRUCT()

    (......)

    void GetShaderBindings(
        const FScene* Scene,
        ERHIFeatureLevel::Type FeatureLevel,
        const FPrimitiveSceneProxy* PrimitiveSceneProxy,
        const FMaterialRenderProxy& MaterialRenderProxy,
        const FMaterial& Material,
        const FMeshPassProcessorRenderState& DrawRenderState,
        const FMeshMaterialShaderElementData& ShaderElementData,
        FMeshDrawSingleShaderBindings& ShaderBindings) const
    {
        ShaderBindings.Add(NaniteUniformBuffer, DrawRenderState.GetNaniteUniformBuffer());
    }

private:
    LAYOUT_FIELD(FShaderParameter, MaterialDepth);
    LAYOUT_FIELD(FShaderUniformBufferParameter, NaniteUniformBuffer);
};

// Nanite mesh processor.
class FNaniteMeshProcessor : public FMeshPassProcessor
{
public:
    (......)

    virtual void AddMeshBatch(const FMeshBatch& RESTRICT MeshBatch, uint64 BatchElementMask, const FPrimitiveSceneProxy* RESTRICT PrimitiveSceneProxy, int32 StaticMeshId = -1) override final;

private:
    FMeshPassProcessorRenderState PassDrawRenderState;
};

// Create a Nanite mesh processor instance.
FMeshPassProcessor* CreateNaniteMeshProcessor(const FScene* Scene, const FSceneView* InViewIfDynamicMeshCommand, FMeshPassDrawListContext* InDrawListContext);

// Nanite material tables.
class FNaniteMaterialTables
{
public:
    FNaniteMaterialTables(uint32 MaxMaterials = NANITE_MAX_MATERIALS);
    ~FNaniteMaterialTables();

    void Release();

    void UpdateBufferState(FRDGBuilder& GraphBuilder, uint32 NumPrimitives);

    void Begin(FRHICommandListImmediate& RHICmdList, uint32 NumPrimitives, uint32 NumPrimitiveUpdates);
    void* GetDepthTablePtr(uint32 PrimitiveIndex, uint32 EntryCount);
    void Finish(FRHICommandListImmediate& RHICmdList);

    FRHIShaderResourceView* GetDepthTableSRV() const { return DepthTableDataBuffer.SRV; }

private:
    uint32 MaxMaterials = 0;
    uint32 NumPrimitiveUpdates = 0;
    uint32 NumDepthTableUpdates = 0;
    uint32 NumHitProxyTableUpdates = 0;

    // Upload buffers (filled on the CPU) and their GPU data buffers.
    FScatterUploadBuffer DepthTableUploadBuffer;
    FRWByteAddressBuffer DepthTableDataBuffer;
    FScatterUploadBuffer HitProxyTableUploadBuffer;
    FRWByteAddressBuffer HitProxyTableDataBuffer;
};

namespace Nanite
{
// Rasterization technique.
enum class ERasterTechnique : uint8
{
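    LockBufferFallback = 0, // Approximates 64-bit atomics with a fallback lock buffer when they are unavailable (has race conditions).
    PlatformAtomics = 1, // Uses platform-provided 64-bit atomics.
    NVAtomics = 2, // Uses 64-bit atomics from the NVIDIA extension.
    AMDAtomicsD3D11 = 3, // Uses 64-bit atomics from the AMD extension (D3D11).
    AMDAtomicsD3D12 = 4, // Uses 64-bit atomics from the AMD extension (D3D12).
    DepthOnly = 5, // Uses 32-bit atomics for depth only, with no payload.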

    NumTechniques
};

// Rasterization scheduling mode.
enum class ERasterScheduling : uint8
{
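    HardwareOnly = 0, // Fixed-function hardware rasterization only.
    HardwareThenSoftware = 1, // Rasterize large triangles in hardware, then small triangles in software (compute shader).
    HardwareAndSoftwareOverlap = 2, // Rasterize large triangles in hardware while small triangles are rasterized in software (compute shader), with the two overlapping.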
};

// Output buffer mode, used to choose the rasterization mode when the raster context is created.
enum class EOutputBufferMode : uint8
{
    VisBuffer, // Visibility buffer (the default mode), outputs ID and depth.
    DepthOnly, // Rasterizes depth only, into a 32-bit buffer.
};

// Packed view.
struct FPackedView
{
    FMatrix        TranslatedWorldToView;
    FMatrix        TranslatedWorldToClip;
    FMatrix        ViewToClip;
    FMatrix        ClipToWorld;
    
    FMatrix        PrevTranslatedWorldToView;
    FMatrix        PrevTranslatedWorldToClip;
    FMatrix        PrevViewToClip;
    FMatrix        PrevClipToWorld;

    FIntVector4    ViewRect;
    FVector4    ViewSizeAndInvSize;
    FVector4    ClipSpaceScaleOffset;
    FVector4    PreViewTranslation;
    FVector4    PrevPreViewTranslation;
    FVector4    WorldCameraOrigin;
    FVector4    ViewForwardAndNearPlane;
    
    FVector2D    LODScales;
    float        MinBoundsRadiusSq;
    uint32        StreamingPriorityCategory_AndFlags;
    
    FIntVector4 TargetLayerIdX_AndMipLevelY_AndNumMipLevelsZ;
    FIntVector4    HZBTestViewRect;    // In full resolution

    // Computes the LOD scales, assuming the view size and projection have already been set. Depends on the global GNaniteMaxPixelsPerEdge.
    void UpdateLODScales();
};

// Culling context.
struct FCullingContext
{
    FGlobalShaderMap* ShaderMap;

    uint32            DrawPassIndex;
    uint32            NumInstancesPreCull;
    uint32            RenderFlags;
    uint32            DebugFlags;
    TRefCountPtr<IPooledRenderTarget>    PrevHZB; // If non-null, HZB culling is enabled.
    FIntRect        HZBBuildViewRect;
    bool            bTwoPassOcclusion;
    bool            bSupportsMultiplePasses;

    FIntVector4        SOAStrides;

    FRDGBufferRef    MainRasterizeArgsSWHW;
    FRDGBufferRef    PostRasterizeArgsSWHW;

    FRDGBufferRef    SafeMainRasterizeArgsSWHW;
    FRDGBufferRef    SafePostRasterizeArgsSWHW;

    FRDGBufferRef    MainAndPostPassPersistentStates;
    FRDGBufferRef    VisibleClustersSWHW;
    FRDGBufferRef    OccludedInstances;
    FRDGBufferRef    OccludedInstancesArgs;
    FRDGBufferRef    TotalPrevDrawClustersBuffer;
    FRDGBufferRef    StreamingRequests;
    FRDGBufferRef    ViewsBuffer;
    FRDGBufferRef    InstanceDrawsBuffer;
    FRDGBufferRef    StatsBuffer;
};

// Rasterization context.
struct FRasterContext
{
    FGlobalShaderMap*    ShaderMap;

    FVector2D            RcpViewSize;
    FIntPoint            TextureSize;
    ERasterTechnique    RasterTechnique;
    ERasterScheduling    RasterScheduling;

    FRasterParameters    Parameters;

    FRDGTextureRef        LockBuffer;
    FRDGTextureRef        DepthBuffer;
    FRDGTextureRef        VisBuffer64;
    FRDGTextureRef        DbgBuffer64;
    FRDGTextureRef        DbgBuffer32;

    uint32                VisualizeModeBitMask;
    bool                VisualizeActive;
};

// Rasterization results.
struct FRasterResults
{
    FIntVector4        SOAStrides;
    uint32            MaxVisibleClusters;
    uint32            MaxNodes;
    uint32            RenderFlags;

    FRDGBufferRef    ViewsBuffer{};
    FRDGBufferRef    VisibleClustersSWHW{};

    FRDGTextureRef    VisBuffer64{};
    FRDGTextureRef    DbgBuffer64{};
    FRDGTextureRef    DbgBuffer32{};

    FRDGTextureRef    MaterialDepth{};
    FRDGTextureRef    NaniteMask{};
    FRDGTextureRef    VelocityBuffer{};

    TArray<FVisualizeResult, TInlineAllocator<32>> Visualizations;
};

// Initialize the culling context.
FCullingContext    InitCullingContext(FRDGBuilder& GraphBuilder, const FScene& Scene, ...);
// Initialize the rasterization context.
FRasterContext InitRasterContext(FRDGBuilder& GraphBuilder, ERHIFeatureLevel::Type FeatureLevel, ...);

// Packed view parameters.
struct FPackedViewParams
{
    FViewMatrices ViewMatrices;
    FViewMatrices PrevViewMatrices;
    FIntRect ViewRect;
    FIntPoint RasterContextSize;
    uint32 StreamingPriorityCategory = 0;
    float MinBoundsRadius = 0.0f;
    float LODScaleFactor = 1.0f;
    uint32 Flags = 0;

    int32 TargetLayerIndex = 0;
    int32 PrevTargetLayerIndex = INDEX_NONE;
    int32 TargetMipLevel = 0;
    int32 TargetMipCount = 1;

    FIntRect HZBTestViewRect = {0, 0, 0, 0};
};

FPackedView CreatePackedView( const FPackedViewParams& Params );
FPackedView CreatePackedViewFromViewInfo(const FViewInfo& View, FIntPoint RasterContextSize, ...);

// Rasterization state.
struct FRasterState
{
    bool bNearClip = true; // Whether near-plane clipping is enabled.
    ERasterizerCullMode CullMode = CM_CW; // Rasterizer cull mode; defaults to CM_CW (clockwise).
};

// Cull and rasterize.
void CullRasterize(
    FRDGBuilder& GraphBuilder,
    const FScene& Scene,
    const TArray<FPackedView, SceneRenderingAllocator>& Views,
    FCullingContext& CullingContext,
    const FRasterContext& RasterContext,
    const FRasterState& RasterState = FRasterState(),
    const TArray<FInstanceDraw, SceneRenderingAllocator>* OptionalInstanceDraws = nullptr,
    bool bExtractStats = false
);

// Cull and rasterize into a set of virtual shadow maps.
void CullRasterize(
    FRDGBuilder& GraphBuilder,
    const FScene& Scene,
    const TArray<FPackedView, SceneRenderingAllocator>& Views,
    uint32 NumPrimaryViews,    // Number of non-mip views
    FCullingContext& CullingContext,
    const FRasterContext& RasterContext,
    const FRasterState& RasterState = FRasterState(),
    const TArray<FInstanceDraw, SceneRenderingAllocator>* OptionalInstanceDraws = nullptr,
    FVirtualShadowMapArray* VirtualShadowMapArray = nullptr,
    bool bExtractStats = false
);

// Extract the rasterization results.
void ExtractResults(FRDGBuilder& GraphBuilder, const FCullingContext& CullingContext, const FRasterContext& RasterContext, FRasterResults& RasterResults);

// Emit a shadow map.
void EmitShadowMap(FRDGBuilder& GraphBuilder, const FRasterContext& RasterContext, const FRDGTextureRef DepthBuffer, ...);

// Emit cubemap shadows.
void EmitCubemapShadow(FRDGBuilder& GraphBuilder, const FRasterContext& RasterContext, const FRDGTextureRef CubemapDepthBuffer, ...);

// Emit depth targets.
void EmitDepthTargets(FRDGBuilder& GraphBuilder, const FScene& Scene, const FViewInfo& View, ...);

// Draw the base pass.
void DrawBasePass(FRDGBuilder& GraphBuilder, const FSceneTextures& SceneTextures, const FDBufferTextures& DBufferTextures, const FScene& Scene, const FViewInfo& View, const FRasterResults& RasterResults);

// Draw the Lumen mesh capture pass.
void DrawLumenMeshCapturePass(FRDGBuilder& GraphBuilder, const FScene& Scene, ...);

(......)
}

// Whether Nanite should be rendered.
extern bool ShouldRenderNanite(const FScene* Scene, const FViewInfo& View, bool bCheckForAtomicSupport = true);
  • NaniteSceneProxy
// Engine\Source\Runtime\Engine\Public\NaniteSceneProxy.h

namespace Nanite
{

// Base class of the Nanite scene proxies.
class FSceneProxyBase : public FPrimitiveSceneProxy
{
public:
    struct FMaterialSection
    {
        UMaterialInterface* Material = nullptr;
        int32 MaterialIndex = INDEX_NONE;
    };

public:
    ENGINE_API SIZE_T GetTypeHash() const override;

    FSceneProxyBase(UPrimitiveComponent* Component)
    : FPrimitiveSceneProxy(Component)
    {
        bIsNaniteMesh  = true;
        bAlwaysVisible = true;
    }

    // Checks whether a material can be rendered by Nanite: opaque, not a decal, not masked, and with no normal or separate translucency.
    static bool IsNaniteRenderable(FMaterialRelevance MaterialRelevance)
    {
        return MaterialRelevance.bOpaque &&
            !MaterialRelevance.bDecal &&
            !MaterialRelevance.bMasked &&
            !MaterialRelevance.bNormalTranslucency &&
            !MaterialRelevance.bSeparateTranslucency;
    }

    virtual bool CanBeOccluded() const override;
    inline const TArray<FMaterialSection>& GetMaterialSections() const;
    inline int32 GetMaterialMaxIndex() const;
    virtual const TArray<FPrimitiveInstance>* GetPrimitiveInstances() const;
    virtual TArray<FPrimitiveInstance>* GetPrimitiveInstances();
    virtual uint8 GetCurrentFirstLODIdx_RenderThread() const override;

protected:
    ENGINE_API void DrawStaticElementsInternal(FStaticPrimitiveDrawInterface* PDI, const FLightCacheInterface* LCI);

protected:
    TArray<FMaterialSection> MaterialSections;
    TArray<FPrimitiveInstance> Instances;
    int32 MaterialMaxIndex = INDEX_NONE;
};

// Nanite scene proxy.
class FSceneProxy : public FSceneProxyBase
{
public:
    FSceneProxy(UStaticMeshComponent* Component);
    FSceneProxy(UInstancedStaticMeshComponent* Component);
    FSceneProxy(UHierarchicalInstancedStaticMeshComponent* Component);
    virtual ~FSceneProxy() = default;

public:
    // FPrimitiveSceneProxy interface.
    virtual FPrimitiveViewRelevance    GetViewRelevance(const FSceneView* View) const override;
    virtual void GetLightRelevance(const FLightSceneProxy* LightSceneProxy, bool& bDynamic, bool& bRelevant, bool& bLightMapped, bool& bShadowMapped) const override;

    // Gather static or dynamic mesh elements.
    virtual void DrawStaticElements(FStaticPrimitiveDrawInterface* PDI) override;
    virtual void GetDynamicMeshElements(const TArray<const FSceneView*>& Views, const FSceneViewFamily& ViewFamily, uint32 VisibilityMap, FMeshElementCollector& Collector) const override;

    // Ray tracing interface.
#if RHI_RAYTRACING
    virtual bool IsRayTracingRelevant() const { return true; }
    virtual bool IsRayTracingStaticRelevant() const { return false; }
    virtual void GetDynamicRayTracingInstances(FRayTracingMaterialGatheringContext& Context, TArray<struct FRayTracingInstance>& OutRayTracingInstances) override;
#endif

    virtual uint32 GetMemoryFootprint() const override;

    virtual void GetLCIs(FLCIArray& LCIs) override
    {
        FLightCacheInterface* LCI = &MeshInfo;
        LCIs.Add(LCI);
    }

    // Distance field interface.
    virtual void GetDistancefieldAtlasData(const FDistanceFieldVolumeData*& OutDistanceFieldData, float& SelfShadowBias) const override;
    virtual void GetDistancefieldInstanceData(TArray<FMatrix>& ObjectLocalToWorldTransforms) const override;
    virtual bool HasDistanceFieldRepresentation() const override;
    
    // GI interface.
    virtual const FCardRepresentationData* GetMeshCardRepresentation() const override;
    virtual int32 GetLightMapCoordinateIndex() const override;

    // Get the static mesh.
    const UStaticMesh* GetStaticMesh() const
    {
        return StaticMesh;
    }

protected:
    virtual void CreateRenderThreadResources() override;

    class FMeshInfo : public FLightCacheInterface
    {
    public:
        FMeshInfo(const UStaticMeshComponent* InComponent);

        // FLightCacheInterface.
        virtual FLightInteraction GetInteraction(const FLightSceneProxy* LightSceneProxy) const override;

    private:
        TArray<FGuid> IrrelevantLights;
    };

    bool IsCollisionView(const FEngineShowFlags& EngineShowFlags, bool& bDrawSimpleCollision, bool& bDrawComplexCollision) const;

protected:
    FMeshInfo MeshInfo;

    FResources* Resources = nullptr;

    const FStaticMeshRenderData* RenderData;
    const FDistanceFieldVolumeData* DistanceFieldData;
    const FCardRepresentationData* CardRepresentationData;

    FMaterialRelevance MaterialRelevance;

    uint32 bReverseCulling : 1;
    uint32 bHasMaterialErrors : 1;

    const UStaticMesh* StaticMesh = nullptr;

#if RHI_RAYTRACING
    TArray<FRayTracingGeometry*> RayTracingGeometries;
#endif

    (......)
};

} // namespace Nanite
  • RenderUtils
// Engine\Source\Runtime\RenderCore\Public\RenderUtils.h

(......)

// Checks whether the platform supports Nanite rendering.
RENDERCORE_API bool DoesPlatformSupportNanite(EShaderPlatform Platform)
{
    // Make sure the platform has valid DDPI (FGenericDataDrivenShaderPlatformInfo).
    const bool bValidPlatform = FDataDrivenShaderPlatformInfo::IsValid(Platform);
    // Nanite requires GPUScene.
    const bool bSupportGPUScene = FDataDrivenShaderPlatformInfo::GetSupportsGPUScene(Platform);
    // Nanite-specific check.
    const bool bSupportNanite = FDataDrivenShaderPlatformInfo::GetSupportsNanite(Platform);

    const bool bFullCheck = bValidPlatform && bSupportGPUScene && bSupportNanite;
    return bFullCheck;
}

// Returns true if Nanite can be used.
inline bool UseNanite(EShaderPlatform ShaderPlatform, bool bCheckForAtomicSupport = true);
// Returns true if virtual shadow maps (VSM) can be used.
inline bool UseVirtualShadowMaps(EShaderPlatform ShaderPlatform, const FStaticFeatureLevel FeatureLevel);
// Returns true if VSMs can be used for non-Nanite geometry; requires r.Shadow.Virtual.NonNaniteVSM to be nonzero and UseVirtualShadowMaps to return true.
inline bool UseNonNaniteVirtualShadowMaps(EShaderPlatform ShaderPlatform, const FStaticFeatureLevel FeatureLevel);
  • Others
// Engine\Source\Runtime\Engine\Classes\Components\StaticMeshComponent.h

class ENGINE_API UStaticMeshComponent : public UMeshComponent
{
    (......)
    
    uint8 bDisplayNaniteProxyMesh:1; // For Nanite-enabled meshes: if true, display only the proxy mesh.
    
    (......)
};

// Engine\Source\Runtime\Engine\Public\PrimitiveSceneProxy.h

class FPrimitiveSceneProxy
{
    inline bool IsNaniteMesh() const
    {
        return bIsNaniteMesh;
    }
    
    (......)
    
private:
    uint8 bIsNaniteMesh : 1; // Whether this is a Nanite mesh.
    
    (......)
};

// Returns true if the given mesh can be rendered by Nanite.
ENGINE_API extern bool SupportsNaniteRendering(const FVertexFactory* RESTRICT VertexFactory, const FPrimitiveSceneProxy* RESTRICT PrimitiveSceneProxy);
ENGINE_API extern bool SupportsNaniteRendering(const FVertexFactory* RESTRICT VertexFactory, const FPrimitiveSceneProxy* RESTRICT PrimitiveSceneProxy, const class FMaterialRenderProxy* MaterialRenderProxy, ERHIFeatureLevel::Type FeatureLevel);

// Engine\Source\Runtime\Renderer\Public\MeshPassProcessor.h

struct FMeshPassProcessorRenderState
{
public:
    void SetNaniteUniformBuffer(FRHIUniformBuffer* InNaniteUniformBuffer);
    FRHIUniformBuffer* GetNaniteUniformBuffer() const;
    
    (......)
    
private:
    FRHIUniformBuffer* NaniteUniformBuffer = nullptr; // Nanite uniform buffer.
    
    (......)
};

// Engine\Source\Runtime\RenderCore\Public\VertexFactory.h

// Vertex factory flags.
enum class EVertexFactoryFlags : uint32
{
    None                            = 0u,
    UsedWithMaterials               = 1u << 1,
    SupportsStaticLighting          = 1u << 2,
    SupportsDynamicLighting         = 1u << 3,
    SupportsPrecisePrevWorldPos     = 1u << 4,
    SupportsPositionOnly            = 1u << 5,
    SupportsCachingMeshDrawCommands = 1u << 6,
    SupportsPrimitiveIdStream       = 1u << 7,
    SupportsNaniteRendering         = 1u << 8, // Supports Nanite rendering.
};

// Whether Nanite rendering is supported.
bool SupportsNaniteRendering() const { return HasFlags(EVertexFactoryFlags::SupportsNaniteRendering); }

// Engine\Source\Runtime\RHI\Public\RHIDefinitions.h

// Generic data-driven shader platform info.
class RHI_API FGenericDataDrivenShaderPlatformInfo
{
    static FORCEINLINE_DEBUGGABLE const bool GetSupportsNanite(const FStaticShaderPlatform Platform)
    {
        return Infos[Platform].bSupportsNanite;
    }
    
    (......)
};
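SupportsNaniteRendering() above is a single-bit test on EVertexFactoryFlags. The snippet below is a minimal, standalone sketch of that pattern and is not engine code: the enum keeps only a few values copied from the listing, while operator| and the FVertexFactoryTypeSketch struct with its HasFlags helper are hypothetical stand-ins written for illustration (in the engine, enum-class flags usually get such operators from the ENUM_CLASS_FLAGS macro).

```cpp
#include <cassert>
#include <cstdint>

// A few EVertexFactoryFlags values copied from the listing above.
enum class EVertexFactoryFlags : uint32_t
{
    None                    = 0u,
    UsedWithMaterials       = 1u << 1,
    SupportsStaticLighting  = 1u << 2,
    SupportsNaniteRendering = 1u << 8,
};

// Hypothetical stand-in for what ENUM_CLASS_FLAGS provides: combine flags with |.
constexpr EVertexFactoryFlags operator|(EVertexFactoryFlags A, EVertexFactoryFlags B)
{
    return static_cast<EVertexFactoryFlags>(static_cast<uint32_t>(A) | static_cast<uint32_t>(B));
}

// Hypothetical stand-in for a vertex factory type that stores its flags.
struct FVertexFactoryTypeSketch
{
    EVertexFactoryFlags Flags = EVertexFactoryFlags::None;

    // True only if every bit of Required is set in Flags.
    bool HasFlags(EVertexFactoryFlags Required) const
    {
        const uint32_t Mask = static_cast<uint32_t>(Required);
        return (static_cast<uint32_t>(Flags) & Mask) == Mask;
    }

    // Same shape as the SupportsNaniteRendering() accessor in the listing.
    bool SupportsNaniteRendering() const
    {
        return HasFlags(EVertexFactoryFlags::SupportsNaniteRendering);
    }
};
```

Requiring every bit of the argument, rather than any bit, lets the same helper answer multi-flag queries as well.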

6.4.3.3 Nanite Rendering Flow

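Nanite's main rendering steps also take place in FDeferredShadingSceneRenderer::Render. Below are the Nanite-related steps, together with the important steps covered in earlier articles: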

void FDeferredShadingSceneRenderer::Render(FRDGBuilder& GraphBuilder)
{
    // Determine whether to render with Nanite.
    const bool bNaniteEnabled = UseNanite(ShaderPlatform) && ViewFamily.EngineShowFlags.NaniteMeshes;

    // Update the primitive scene infos.
    Scene->UpdateAllPrimitiveSceneInfos(GraphBuilder, true);

    // Use GPUScene (scoped begin/end helper).
    FGPUSceneScopeBeginEndHelper GPUSceneScopeBeginEndHelper(Scene->GPUScene, GPUSceneDynamicContext, Scene);

    bool bVisualizeNanite = false;
    if (bNaniteEnabled) // Only when Nanite is enabled
    {
        // Update Nanite's global resources, including the unordered-access (UAV) buffers it manages.
        Nanite::GGlobalResources.Update(GraphBuilder);
        // Begin the asynchronous update of the Nanite streaming manager.
        Nanite::GStreamingManager.BeginAsyncUpdate(GraphBuilder);

        // Handle Nanite visualization modes.
        FNaniteVisualizationData& NaniteVisualization = GetNaniteVisualizationData();
        if (Views.Num() > 0)
        {
            const FName& NaniteViewMode = Views[0].CurrentNaniteVisualizationMode;
            if (NaniteVisualization.Update(NaniteViewMode))
            {
                ViewFamily.EngineShowFlags.SetVisualizeNanite(true);
            }
            bVisualizeNanite = NaniteVisualization.IsActive() && ViewFamily.EngineShowFlags.VisualizeNanite;
        }
    }
    
    (......)
    
    // Whether Nanite materials should be applied.
    const bool bShouldApplyNaniteMaterials
         = !ViewFamily.EngineShowFlags.ShaderComplexity
        && !ViewFamily.UseDebugViewPS()
        && !ViewFamily.EngineShowFlags.Wireframe
        && !ViewFamily.EngineShowFlags.LightMapDensity;
    
    (......)
    
    // Instance culling manager.
    FInstanceCullingManager InstanceCullingManager(GInstanceCullingManagerResources, Scene->GPUScene.IsEnabled());
    
    bDoInitViewAftersPrepass = InitViews(GraphBuilder, ..., InstanceCullingManager);
    
    (......)
    
    // Process GPUScene.
    {
        (......)

        // Update GPUScene.
        Scene->GPUScene.Update(GraphBuilder, *Scene);

        (......)
        
        // Upload dynamic primitive shader data to the GPU.
        for (int32 ViewIndex = 0; ViewIndex < Views.Num(); ViewIndex++)
        {
            FViewInfo& View = Views[ViewIndex];
            Scene->GPUScene.UploadDynamicPrimitiveShaderDataForView(GraphBuilder, Scene, View);
        }

        // Instance culling.
        {
            InstanceCullingManager.CullInstances(GraphBuilder, Scene->GPUScene);
        }

        (......)
    }
    
    (......)

    if (bNaniteEnabled)
    {
        Nanite::ListStatFilters(this);

        // Must be called every frame before any Nanite rendering.
        Nanite::GStreamingManager.EndAsyncUpdate(GraphBuilder);
    }
    
    (......)
    
    // Depth prepass.
    RenderPrePass(GraphBuilder, SceneTextures.Depth.Target, InstanceCullingManager);

    (......)

    // Nanite rasterization
    TArray<Nanite::FRasterResults, TInlineAllocator<2>> NaniteRasterResults;
    if (bNaniteEnabled && Views.Num() > 0)
    {
        LLM_SCOPE_BYTAG(Nanite);

        NaniteRasterResults.AddDefaulted(Views.Num());

        RDG_GPU_STAT_SCOPE(GraphBuilder, NaniteRaster);
        const FIntPoint RasterTextureSize = SceneTextures.Depth.Target->Desc.Extent;

        const FViewInfo& PrimaryViewRef = Views[0];
        const FIntRect PrimaryViewRect = PrimaryViewRef.ViewRect;
        
        // Primary raster view
        {
            Nanite::FRasterState RasterState;

            Nanite::FRasterContext RasterContext = Nanite::InitRasterContext(GraphBuilder, FeatureLevel, RasterTextureSize);

            const bool bTwoPassOcclusion = true;
            const bool bUpdateStreaming = true;
            const bool bSupportsMultiplePasses = false;
            const bool bForceHWRaster = RasterContext.RasterScheduling == Nanite::ERasterScheduling::HardwareOnly;
            const bool bPrimaryContext = true;
            const bool bDiscardNonMoving = ViewFamily.EngineShowFlags.DrawOnlyVSMInvalidatingGeo != 0;

            // Iterate over all views
            for (int32 ViewIndex = 0; ViewIndex < Views.Num(); ViewIndex++)
            {
                const FViewInfo& View = Views[ViewIndex];

                // Initialize the culling context.
                Nanite::FCullingContext CullingContext = Nanite::InitCullingContext(
                    GraphBuilder,
                    *Scene,
                    !bIsEarlyDepthComplete ? View.PrevViewInfo.NaniteHZB : View.PrevViewInfo.HZB,
                    View.ViewRect,
                    bTwoPassOcclusion,
                    bUpdateStreaming,
                    bSupportsMultiplePasses,
                    bForceHWRaster,
                    bPrimaryContext,
                    bDiscardNonMoving
                );

                static FString EmptyFilterName = TEXT(""); // Empty filter represents primary view.
                const bool bExtractStats = Nanite::IsStatFilterActive(EmptyFilterName);

                Nanite::FPackedView PackedView = Nanite::CreatePackedViewFromViewInfo(View, RasterTextureSize, VIEW_FLAG_HZBTEST, /*StreamingPriorityCategory*/ 3);

                // Rasterize with culling.
                Nanite::CullRasterize(
                    GraphBuilder,
                    *Scene,
                    { PackedView },
                    CullingContext,
                    RasterContext,
                    RasterState,
                    /*OptionalInstanceDraws*/ nullptr,
                    bExtractStats
                );

                Nanite::FRasterResults& RasterResults = NaniteRasterResults[ViewIndex];

                // If a depth prepass is needed, emit the depth targets now.
                if (bNeedsPrePass)
                {
                    Nanite::EmitDepthTargets(
                        GraphBuilder,
                        *Scene,
                        Views[ViewIndex],
                        CullingContext.SOAStrides,
                        CullingContext.VisibleClustersSWHW,
                        CullingContext.ViewsBuffer,
                        SceneTextures.Depth.Target,
                        RasterContext.VisBuffer64,
                        RasterResults.MaterialDepth,
                        RasterResults.NaniteMask,
                        RasterResults.VelocityBuffer,
                        bNeedsPrePass
                    );
                }

                // Build the hierarchical Z-buffer (HZB).
                if (!bIsEarlyDepthComplete && bTwoPassOcclusion && View.ViewState)
                {
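                    // There won't be a complete SceneDepth for the post pass, so the complete HZB can't be used for the main pass; otherwise it would poke holes in the post-pass HZB and break occlusion culling.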
                    RDG_EVENT_SCOPE(GraphBuilder, "Nanite::BuildHZB");

                    FRDGTextureRef SceneDepth = SystemTextures.Black;
                    FRDGTextureRef GraphHZB = nullptr;

                    // Build the furthest-depth HZB.
                    BuildHZBFurthest(
                        GraphBuilder,
                        SceneDepth,
                        RasterContext.VisBuffer64,
                        PrimaryViewRect,
                        FeatureLevel,
                        ShaderPlatform,
                        TEXT("Nanite.HZB"),
                        /* OutFurthestHZBTexture = */ &GraphHZB );
                    
                    GraphBuilder.QueueTextureExtraction( GraphHZB, &View.ViewState->PrevFrameViewInfo.NaniteHZB );
                }

                Nanite::ExtractResults(GraphBuilder, CullingContext, RasterContext, RasterResults);
            }
        }
    }

    (......)
    
    // Render the base pass, including Nanite.
    {
        RenderBasePass(GraphBuilder, SceneTextures, DBufferTextures, BasePassDepthStencilAccess, ForwardScreenSpaceShadowMaskTexture, InstanceCullingManager);
        AddServiceLocalQueuePass(GraphBuilder);
        
        if (bNaniteEnabled && bShouldApplyNaniteMaterials)
        {
            for (int32 ViewIndex = 0; ViewIndex < Views.Num(); ++ViewIndex)
            {
                const FViewInfo& View = Views[ViewIndex];
                Nanite::FRasterResults& RasterResults = NaniteRasterResults[ViewIndex];

                // If depth wasn't emitted in a prepass, emit it now.
                if (!bNeedsPrePass)
                {
                    Nanite::EmitDepthTargets(
                        GraphBuilder,
                        *Scene,
                        Views[ViewIndex],
                        RasterResults.SOAStrides,
                        RasterResults.VisibleClustersSWHW,
                        RasterResults.ViewsBuffer,
                        SceneTextures.Depth.Target,
                        RasterResults.VisBuffer64,
                        RasterResults.MaterialDepth,
                        RasterResults.NaniteMask,
                        RasterResults.VelocityBuffer,
                        bNeedsPrePass
                    );
                }

                // Draw the Nanite base pass.
                Nanite::DrawBasePass(
                    GraphBuilder,
                    SceneTextures,
                    DBufferTextures,
                    *Scene,
                    View,
                    RasterResults
                );
            }
        }

        if (!bAllowReadOnlyDepthBasePass)
        {
            AddResolveSceneDepthPass(GraphBuilder, Views, SceneTextures.Depth);
        }

        (......)
    }

    (......)
    
    if (bNaniteEnabled)
    {
        // Compute volumetric fog.
        if (!bOcclusionBeforeBasePass)
        {
            ComputeVolumetricFog(GraphBuilder);
        }

        // Submit this frame's streaming requests.
        Nanite::GStreamingManager.SubmitFrameStreamingRequests(GraphBuilder);
    }

    (......)
    
    // Render deferred lights.
    RenderLights(GraphBuilder, SceneTextures, ...);

    (......)

    // Render translucency.
    RenderTranslucency(GraphBuilder, SceneTextures, ...);
    
    (......)
    
    // Post-processing.
    AddPostProcessingPasses(GraphBuilder, View, PostProcessingInputs, NaniteResults, InstanceCullingManager);
    
    (......)
}

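As this shows, the Nanite rendering flow is quite similar to the regular path: it first updates primitive data, GPUScene and culling data, then renders the BasePass and lighting, and finally translucency and post-processing. It differs from the regular path by adding stages such as the GStreamingManager, the FInstanceCullingManager, HZB construction and Nanite rasterization. Below, a RenderDoc capture of the sample project AncientGame shows the main UE5-related steps: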

The UE5 rendering process as captured by RenderDoc; the steps in red boxes are UE5-specific.

6.4.3.4 Nanite Culling

Instance culling for Nanite is handled by FInstanceCullingManager, which is used throughout FDeferredShadingSceneRenderer::Render. Below are its definition and the declarations of related types:

// Engine\Source\Runtime\Engine\Public\SceneManagement.h

// Instance culling manager resources, used by FInstanceCullingManager.
class FInstanceCullingManagerResources : public FRenderResource
{
public:
    // Maximum number of indirect instances: 1024*1024 = 1,048,576 (about 1.05 million).
    static constexpr uint32 MaxIndirectInstances = 1024 * 1024;

    // Initialize and release RHI resources.
    virtual void InitRHI() override;
    virtual void ReleaseRHI() override;

    // Data accessors.
    FRHIBuffer* GetInstancesIdBuffer() const { return InstanceIdsBuffer.Buffer; }
    FRHIShaderResourceView* GetInstancesIdBufferSrv() const { return InstanceIdsBuffer.SRV.GetReference(); }
    FRHIShaderResourceView* GetPageInfoBufferSrv() const { return PageInfoBuffer.SRV.GetReference(); }
    FUnorderedAccessViewRHIRef GetInstancesIdBufferUav() const { return InstanceIdsBuffer.UAV; }
    FUnorderedAccessViewRHIRef GetPageInfoBufferUav() const { return PageInfoBuffer.UAV; }

private:
    FRWBuffer PageInfoBuffer; // Page info buffer.
    FRWBuffer InstanceIdsBuffer; // Instance ID buffer.
};

// Global FInstanceCullingManagerResources object.
extern ENGINE_API TGlobalResource<FInstanceCullingManagerResources> GInstanceCullingManagerResources;


// Engine\Source\Runtime\Renderer\Private\InstanceCulling\InstanceCullingManager.h

// Intermediate instance culling data.
class FInstanceCullingIntermediate
{
public:
    // One visibility bit per instance for each registered view; filled in by CullInstances.
    FRDGBufferRef VisibleInstanceFlags = nullptr;
    // Write offset used by all instance-ID expansion to allocate space in the global instance ID buffer; initialized to 0 by CullInstances.
    FRDGBufferRef InstanceIdOutOffsetBuffer = nullptr;
    
    // Number of instances.
    int32 NumInstances = 0;
    // Number of views.
    int32 NumViews = 0;
};

// Instance culling result.
struct FInstanceCullingResult
{
    // Indirect draw arguments buffer.
    FRDGBufferRef DrawIndirectArgsBuffer = nullptr;
    // Instance ID offset buffer.
    FRDGBufferRef InstanceIdOffsetBuffer = nullptr;

    // Copy the draw parameters into FInstanceCullingDrawParams.
    void GetDrawParameters(FInstanceCullingDrawParams &OutParams) const
    {
        OutParams.DrawIndirectArgsBuffer = DrawIndirectArgsBuffer;
        OutParams.InstanceIdOffsetBuffer = InstanceIdOffsetBuffer;
    }

    // Get the draw parameters if a result exists; otherwise null them out.
    static void CondGetDrawParameters(const FInstanceCullingResult* InstanceCullingResult, FInstanceCullingDrawParams& OutParams)
    {
        if (InstanceCullingResult)
        {
            InstanceCullingResult->GetDrawParameters(OutParams);
        }
        else
        {
            OutParams.DrawIndirectArgsBuffer = nullptr;
            OutParams.InstanceIdOffsetBuffer = nullptr;
        }
    }
};

// Manages allocation of indirect arguments and culling jobs for all instanced draws, using GPUScene culling.
class FInstanceCullingManager
{
public:
    FInstanceCullingManager(FInstanceCullingManagerResources& InResources, bool bInIsEnabled);
    
    // Maximum average number of instances that primitives expand to.
    static constexpr uint32 MaxAverageInstanceFactor = 128;
    
    bool IsEnabled() const { return bIsEnabled; }
    // Register a view for culling; returns the view's ID.
    int32 RegisterView(const Nanite::FPackedViewParams& Params);
    int32 RegisterView(const FViewInfo& ViewInfo);

    // Cull instances. Must be called after views are initialized and registered, after GPUScene is updated, and before rendering commands are submitted.
    void CullInstances(FRDGBuilder& GraphBuilder, FGPUScene& GPUScene);
    
    // Populated by CullInstances; used when performing final culling and rendering.
    FInstanceCullingIntermediate CullingIntermediate;
    
private:
    FInstanceCullingManagerResources& Resources;
    TArray<Nanite::FPackedView> CullingViews;
    bool bIsEnabled;
    
    (......)
};

Next, let's analyze the code of FInstanceCullingManager::CullInstances:

// Engine\Source\Runtime\Renderer\Private\InstanceCulling\InstanceCullingManager.cpp

void FInstanceCullingManager::CullInstances(FRDGBuilder& GraphBuilder, FGPUScene& GPUScene)
{
#if GPUCULL_TODO
    // Get the view and instance counts.
    int32 NumViews = CullingViews.Num();
    int32 NumInstances = GPUScene.InstanceDataAllocator.GetMaxSize();
    RDG_EVENT_SCOPE(GraphBuilder, "CullInstances [%d Views X %d Instances]", NumViews, NumInstances);
    
    (......)

    TArray<uint32> NullArray;
    NullArray.AddZeroed(1);

    // Initialize the intermediate culling data (CullingIntermediate).
    CullingIntermediate.InstanceIdOutOffsetBuffer = CreateStructuredBuffer(GraphBuilder, TEXT("InstanceCulling.OutputOffsetBufferOut"), NullArray);

    int32 NumInstanceFlagWords = FMath::DivideAndRoundUp(NumInstances, int32(sizeof(uint32) * 8));

    CullingIntermediate.NumInstances = NumInstances;
    CullingIntermediate.NumViews = NumViews;

    if (NumInstances && NumViews) // GPU culling is needed only when there is at least one instance and one view.
    {
        // Create a buffer that stores one bit per instance per view.
        CullingIntermediate.VisibleInstanceFlags = GraphBuilder.CreateBuffer(FRDGBufferDesc::CreateStructuredDesc(sizeof(uint32), NumInstanceFlagWords * NumViews), TEXT("InstanceCulling.VisibleInstanceFlags"));
        FRDGBufferUAVRef VisibleInstanceFlagsUAV = GraphBuilder.CreateUAV(CullingIntermediate.VisibleInstanceFlags);

        if (CVarCullInstances.GetValueOnRenderThread() != 0)
        {
            // Clear the UAV to 0 (every instance starts out invisible).
            AddClearUAVPass(GraphBuilder, VisibleInstanceFlagsUAV, 0);

            // Set up the parameters for the instance culling CS.
            FCullInstancesCs::FParameters* PassParameters = GraphBuilder.AllocParameters<FCullInstancesCs::FParameters>();

            // Instance and primitive data from GPUScene.
            PassParameters->GPUSceneInstanceSceneData = GPUScene.InstanceDataBuffer.SRV;
            PassParameters->GPUScenePrimitiveSceneData = GPUScene.PrimitiveBuffer.SRV;
            PassParameters->InstanceDataSOAStride = GPUScene.InstanceDataSOAStride;
            PassParameters->NumInstances = NumInstances;
            PassParameters->NumInstanceFlagWords = NumInstanceFlagWords;

            // Each GPU-side view is a Nanite::FPackedView; InViews is a StructuredBuffer<Nanite::FPackedView>.
            PassParameters->InViews = GraphBuilder.CreateSRV(CreateStructuredBuffer(GraphBuilder, TEXT("InstanceCulling.CullingViews"), CullingViews));
            PassParameters->NumViews = NumViews;

            // Buffer that receives the visibility results.
            PassParameters->InstanceVisibilityFlagsOut = VisibleInstanceFlagsUAV;

            // The compute shader is FCullInstancesCs, analyzed below.
            auto ComputeShader = GetGlobalShaderMap(GMaxRHIFeatureLevel)->GetShader<FCullInstancesCs>();

            // Add the culling compute pass.
            FComputeShaderUtils::AddPass(
                GraphBuilder,
                RDG_EVENT_NAME("CullInstancesCs"),
                ComputeShader,
                PassParameters,
                FComputeShaderUtils::GetGroupCount(NumInstances, FCullInstancesCs::NumThreadsPerGroup)
            );
        }
        else // Instance culling disabled via CVarCullInstances
        {
            // Mark all instances as visible.
            AddClearUAVPass(GraphBuilder, VisibleInstanceFlagsUAV, 0xFFFFFFFF);
        }
    }
#endif // GPUCULL_TODO
}
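Together, CullInstances and the shader analyzed below define a simple layout for InstanceCulling.VisibleInstanceFlags: each view owns a contiguous run of NumInstanceFlagWords 32-bit words, and instance i of view v is bit i % 32 of word v * NumInstanceFlagWords + i / 32. The sketch below mirrors that layout on the CPU, assuming a std::vector<uint32_t> in place of the RDG buffer; FVisibleInstanceFlagsSketch and its methods are hypothetical names, not engine API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical CPU mirror of the VisibleInstanceFlags buffer layout.
struct FVisibleInstanceFlagsSketch
{
    uint32_t NumInstances = 0;
    uint32_t NumViews = 0;
    uint32_t NumInstanceFlagWords = 0; // 32-bit words per view
    std::vector<uint32_t> Words;

    FVisibleInstanceFlagsSketch(uint32_t InNumInstances, uint32_t InNumViews)
        : NumInstances(InNumInstances)
        , NumViews(InNumViews)
        // Same rounding as FMath::DivideAndRoundUp(NumInstances, 32).
        , NumInstanceFlagWords((InNumInstances + 31u) / 32u)
        // Zero-filled, like AddClearUAVPass(..., 0) before culling.
        , Words(static_cast<std::size_t>(NumInstanceFlagWords) * InNumViews, 0u)
    {
    }

    // Same addressing as CullInstancesCs; the shader uses InterlockedOr for this.
    void MarkVisible(uint32_t ViewId, uint32_t InstanceId)
    {
        const uint32_t WordOffset = NumInstanceFlagWords * ViewId + InstanceId / 32u;
        Words[WordOffset] |= 1u << (InstanceId % 32u);
    }

    bool IsVisible(uint32_t ViewId, uint32_t InstanceId) const
    {
        const uint32_t WordOffset = NumInstanceFlagWords * ViewId + InstanceId / 32u;
        return ((Words[WordOffset] >> (InstanceId % 32u)) & 1u) != 0u;
    }
};
```

For example, 70 instances need 3 words per view, so instance 33 of view 1 lands in bit 1 of word 4.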

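The logic in CullInstances above builds the parameters for the culling shader FCullInstancesCs and calls FComputeShaderUtils::AddPass to perform the culling. Next, let's look at the code of FCullInstancesCs: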

// Engine\Source\Runtime\Renderer\Private\InstanceCulling\InstanceCullingManager.cpp

class FCullInstancesCs : public FGlobalShader
{
    DECLARE_GLOBAL_SHADER(FCullInstancesCs);
    SHADER_USE_PARAMETER_STRUCT(FCullInstancesCs, FGlobalShader)
public:
    static constexpr int32 NumThreadsPerGroup = 64;
    static bool ShouldCompilePermutation(const FGlobalShaderPermutationParameters& Parameters)
    {
        return UseGPUScene(Parameters.Platform);
    }
    static void ModifyCompilationEnvironment(const FGlobalShaderPermutationParameters& Parameters, FShaderCompilerEnvironment& OutEnvironment)
    {
        FGlobalShader::ModifyCompilationEnvironment(Parameters, OutEnvironment);
        OutEnvironment.SetDefine(TEXT("INDIRECT_ARGS_NUM_WORDS"), FInstanceCullingContext::IndirectArgsNumWords);
        OutEnvironment.SetDefine(TEXT("VF_SUPPORTS_PRIMITIVE_SCENE_DATA"), 1);
        OutEnvironment.SetDefine(TEXT("USE_GLOBAL_GPU_SCENE_DATA"), 1);
        OutEnvironment.SetDefine(TEXT("NUM_THREADS_PER_GROUP"), NumThreadsPerGroup);
        OutEnvironment.SetDefine(TEXT("NANITE_MULTI_VIEW"), 1);
    }

    // Declare the parameters used by the shader.
    BEGIN_SHADER_PARAMETER_STRUCT(FParameters, )
        SHADER_PARAMETER_SRV(StructuredBuffer<float4>, GPUSceneInstanceSceneData)
        SHADER_PARAMETER_SRV(StructuredBuffer<float4>, GPUScenePrimitiveSceneData)
        SHADER_PARAMETER(uint32, InstanceDataSOAStride)

        SHADER_PARAMETER_RDG_BUFFER_SRV(StructuredBuffer< Nanite::FPackedView >, InViews)

        // Buffer that receives the visibility results.
        SHADER_PARAMETER_RDG_BUFFER_UAV(RWStructuredBuffer<uint>, InstanceVisibilityFlagsOut)

        SHADER_PARAMETER(int32, NumInstances)
        SHADER_PARAMETER(int32, NumInstanceFlagWords)
        SHADER_PARAMETER(int32, NumViews)
    END_SHADER_PARAMETER_STRUCT()
};

// Implement the shader.
IMPLEMENT_GLOBAL_SHADER(FCullInstancesCs, "/Engine/Private/InstanceCulling/CullInstances.usf", "CullInstancesCs", SF_Compute);

The implementation macro on the last line shows that FCullInstancesCs uses the shader file CullInstances.usf. Let's analyze it:

// Engine\Shaders\Private\InstanceCulling\CullInstances.usf

#include "../Common.ush"
#include "../SceneData.ush"
#include "../Nanite/NaniteDataDecode.ush"
#include "../Nanite/HZBCull.ush"

RWStructuredBuffer<uint> InstanceVisibilityFlagsOut;
uint NumInstances;
uint NumInstanceFlagWords;
uint NumViews;
uint InstanceDataSOAStride;

// Main entry point for instance culling.
[numthreads(NUM_THREADS_PER_GROUP, 1, 1)]
void CullInstancesCs(uint InstanceId : SV_DispatchThreadID)
{
    // Guard against an out-of-range InstanceId.
    if (InstanceId >= NumInstances)
    {
        return;
    }

    const bool bNearClip = true;

    // Fetch the instance data, then compute this instance's bit mask and word offset.
    FInstanceSceneData InstanceData = GetInstanceData(InstanceId, InstanceDataSOAStride);
    uint WordMask = 1U << (InstanceId % 32U);
    uint InstanceWordOffset = InstanceId / 32U;

    // Valid if PrimitiveId is not the invalid value (0xFFFFFFFF) and the local bounds extent is nonzero.
    bool bIsValid = InstanceData.PrimitiveId != 0xFFFFFFFFu && dot(InstanceData.LocalBoundsExtent, InstanceData.LocalBoundsExtent) > 0.0f;

    // For each view, test the instance's bounding box against that view's frustum.
    for (uint ViewId = 0; ViewId < NumViews; ++ViewId)
    {
        uint Flag = WordMask;
        if (bIsValid)
        {
            FNaniteView NaniteView = GetNaniteView(ViewId);

            // Build the local-to-clip transform (via translated world space).
            float4x4 LocalToTranslatedWorld = InstanceData.LocalToWorld;
            LocalToTranslatedWorld[3].xyz += NaniteView.PreViewTranslation.xyz;
            float4x4 LocalToClip = mul(LocalToTranslatedWorld, NaniteView.TranslatedWorldToClip);

            // Box vs. frustum test.
            FFrustumCullData Cull = BoxCullFrustum(InstanceData.LocalBoundsCenter, InstanceData.LocalBoundsExtent, LocalToClip, bNearClip, false);

            if (!Cull.bIsVisible)
            {
                Flag = 0U;
            }
        }

        // Set this instance's bit in InstanceVisibilityFlagsOut unless it was culled (invalid instances are left visible).
        if (Flag != 0U)
        {
            uint WordOffset = NumInstanceFlagWords * ViewId + InstanceWordOffset;
            // Use the atomic InterlockedOr: 32 instances share each word, so plain writes would race.
            InterlockedOr(InstanceVisibilityFlagsOut[WordOffset], Flag);
        }
    }
}
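The shader delegates the actual test to BoxCullFrustum from HZBCull.ush, whose body isn't shown here. The following is only a sketch of a common way to write such a test, not the engine's implementation: the eight corners of the local box are transformed to clip space as row vectors (matching mul(v, M) in the HLSL above), and the box is rejected only if every corner lies outside the same clip plane. It assumes the clip volume -w <= x, y <= w and 0 <= z <= w and does not model the bNearClip switch; all type and function names are hypothetical.

```cpp
#include <cassert>

struct FVec3 { float X, Y, Z; };
struct FVec4 { float X, Y, Z, W; };
// Row-major 4x4 matrix; points are row vectors, P' = P * M (like HLSL mul(v, M)).
struct FMat4 { float M[4][4]; };

FVec4 TransformPoint(const FVec3& P, const FMat4& Mat)
{
    const float In[4] = { P.X, P.Y, P.Z, 1.0f };
    float Out[4] = { 0.0f, 0.0f, 0.0f, 0.0f };
    for (int Col = 0; Col < 4; ++Col)
        for (int Row = 0; Row < 4; ++Row)
            Out[Col] += In[Row] * Mat.M[Row][Col];
    return { Out[0], Out[1], Out[2], Out[3] };
}

// Conservative test: returns false only when all 8 corners are outside one plane.
bool IsBoxPossiblyVisible(const FVec3& Center, const FVec3& Extent, const FMat4& LocalToClip)
{
    unsigned AllOutside = 0x3Fu; // one bit per plane: -x, +x, -y, +y, z < 0, z > w
    for (int i = 0; i < 8; ++i)
    {
        const FVec3 Corner = {
            Center.X + ((i & 1) ? Extent.X : -Extent.X),
            Center.Y + ((i & 2) ? Extent.Y : -Extent.Y),
            Center.Z + ((i & 4) ? Extent.Z : -Extent.Z) };
        const FVec4 C = TransformPoint(Corner, LocalToClip);

        unsigned Outside = 0u;
        if (C.X < -C.W) Outside |= 1u << 0;
        if (C.X >  C.W) Outside |= 1u << 1;
        if (C.Y < -C.W) Outside |= 1u << 2;
        if (C.Y >  C.W) Outside |= 1u << 3;
        if (C.Z <  0.0f) Outside |= 1u << 4;
        if (C.Z >  C.W) Outside |= 1u << 5;

        AllOutside &= Outside; // keep only planes that every corner is outside of
    }
    return AllOutside == 0u;
}
```

The test is conservative: a large box straddling a frustum corner can pass even when it is invisible. For culling that only costs extra work and never drops a visible object.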

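With the VisibleInstanceFlags visibility data in hand, later passes can dynamically generate draw commands and draw arguments from it, giving a GPU-culled, GPU-driven rendering pipeline.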
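To illustrate how a later pass might turn those bits into draw data, the hypothetical sketch below walks one view's words and compacts the visible instance IDs into a list whose length could serve as the instance count of an indirect draw. It is a CPU analogue written for this article; the engine does this work on the GPU, and the exact formats of buffers such as InstanceIdsBuffer and DrawIndirectArgsBuffer from the listings above are not reproduced here.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Collect the IDs of visible instances for one view from a VisibleInstanceFlags-style
// bit array that stores NumInstanceFlagWords 32-bit words per view (hypothetical helper).
std::vector<uint32_t> CompactVisibleInstances(const std::vector<uint32_t>& Words,
                                              uint32_t NumInstanceFlagWords,
                                              uint32_t NumInstances,
                                              uint32_t ViewId)
{
    std::vector<uint32_t> VisibleIds;
    const uint32_t Base = NumInstanceFlagWords * ViewId;
    for (uint32_t WordIndex = 0; WordIndex < NumInstanceFlagWords; ++WordIndex)
    {
        uint32_t Bits = Words[Base + WordIndex];
        while (Bits != 0u)
        {
            // Index of the lowest set bit (a portable loop instead of a CTZ intrinsic).
            uint32_t Bit = 0;
            while (((Bits >> Bit) & 1u) == 0u)
            {
                ++Bit;
            }
            const uint32_t InstanceId = WordIndex * 32u + Bit;
            // Skip padding bits past the last instance.
            if (InstanceId < NumInstances)
            {
                VisibleIds.push_back(InstanceId);
            }
            Bits &= Bits - 1u; // clear the lowest set bit
        }
    }
    return VisibleIds;
}
```

The NumInstances check matters because, when CVarCullInstances is off, CullInstances clears the whole buffer to 0xFFFFFFFF, which also sets the padding bits past the last instance.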

6.4.3.5 Nanite Rasterization

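Nanite rasterization mainly creates and initializes an FCullingContext instance for each view, then calls CullRasterize, stores the rasterization results and builds the HZB. The key code is: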

for (int32 ViewIndex = 0; ViewIndex < Views.Num(); ViewIndex++)
{
    const FViewInfo& View = Views[ViewIndex];

    // Initialize the culling context.
    Nanite::FCullingContext CullingContext = Nanite::InitCullingContext(
        GraphBuilder, *Scene,
        !bIsEarlyDepthComplete ? View.PrevViewInfo.NaniteHZB : View.PrevViewInfo.HZB,
        View.ViewRect,
        bTwoPassOcclusion, bUpdateStreaming, bSupportsMultiplePasses, bForceHWRaster, bPrimaryContext, bDiscardNonMoving);

    static FString EmptyFilterName = TEXT(""); 
    const bool bExtractStats = Nanite::IsStatFilterActive(EmptyFilterName);

    Nanite::FPackedView PackedView = Nanite::CreatePackedViewFromViewInfo(View, RasterTextureSize, VIEW_FLAG_HZBTEST, 3);

    // Rasterize with culling.
    Nanite::CullRasterize(GraphBuilder, *Scene, { PackedView }, CullingContext, RasterContext, RasterState, nullptr, bExtractStats);

    Nanite::FRasterResults& RasterResults = NaniteRasterResults[ViewIndex];

    // If a depth prepass is needed, emit the depth targets now.
    if (bNeedsPrePass)
    {
        Nanite::EmitDepthTargets(GraphBuilder, *Scene, Views[ViewIndex], CullingContext.SOAStrides, CullingContext.VisibleClustersSWHW, CullingContext.ViewsBuffer, SceneTextures.Depth.Target, RasterContext.VisBuffer64, RasterResults.MaterialDepth, RasterResults.NaniteMask, RasterResults.VelocityBuffer, bNeedsPrePass);
    }

    // Build the HZB.
    if (!bIsEarlyDepthComplete && bTwoPassOcclusion && View.ViewState)
    {
        RDG_EVENT_SCOPE(GraphBuilder, "Nanite::BuildHZB");

        FRDGTextureRef SceneDepth = SystemTextures.Black;
        FRDGTextureRef GraphHZB = nullptr;

        BuildHZBFurthest(GraphBuilder, SceneDepth, RasterContext.VisBuffer64, PrimaryViewRect, FeatureLevel, ShaderPlatform, TEXT("Nanite.HZB"), &GraphHZB);

        GraphBuilder.QueueTextureExtraction( GraphHZB, &View.ViewState->PrevFrameViewInfo.NaniteHZB );
    }

    // Extract the rasterization and culling results.
    Nanite::ExtractResults(GraphBuilder, CullingContext, RasterContext, RasterResults);
}
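BuildHZBFurthest reduces the rasterized depth into a mip chain. Each texel in that chain keeps the furthest depth of the region it covers, so a single coarse texel conservatively bounds everything behind it. The sketch below performs that reduction on the CPU under simplifying assumptions: the base level is square and power-of-two, and a larger value means farther away. UE uses a reversed-Z depth buffer by default, where "furthest" is the smallest device-Z value, so the real reduction would take the minimum instead. `FDepthMip` and `BuildHZBFurthestSketch` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical toy mip level, stored row-major.
struct FDepthMip
{
    int Width = 0;
    int Height = 0;
    std::vector<float> Depth; // Width * Height texels
    float At(int X, int Y) const { return Depth[static_cast<std::size_t>(Y) * Width + X]; }
};

// Builds a "furthest" HZB. Assumption: larger depth = farther (with reversed-Z,
// use std::min instead). Each texel of mip N+1 is the furthest of a 2x2 block in mip N.
std::vector<FDepthMip> BuildHZBFurthestSketch(const FDepthMip& Base)
{
    assert(Base.Width == Base.Height && Base.Width > 0);
    assert((Base.Width & (Base.Width - 1)) == 0); // power of two
    assert(Base.Depth.size() == static_cast<std::size_t>(Base.Width) * Base.Height);

    std::vector<FDepthMip> Mips{ Base };
    while (Mips.back().Width > 1)
    {
        const FDepthMip& Src = Mips.back();
        FDepthMip Dst;
        Dst.Width = Src.Width / 2;
        Dst.Height = Src.Height / 2;
        Dst.Depth.resize(static_cast<std::size_t>(Dst.Width) * Dst.Height);
        for (int Y = 0; Y < Dst.Height; ++Y)
        {
            for (int X = 0; X < Dst.Width; ++X)
            {
                // Keeping the furthest value makes the coarse texel a conservative bound.
                Dst.Depth[static_cast<std::size_t>(Y) * Dst.Width + X] = std::max({
                    Src.At(2 * X, 2 * Y), Src.At(2 * X + 1, 2 * Y),
                    Src.At(2 * X, 2 * Y + 1), Src.At(2 * X + 1, 2 * Y + 1) });
            }
        }
        Mips.push_back(std::move(Dst)); // Src is not used after this point
    }
    return Mips;
}
```

An occlusion test then reads a mip coarse enough for the object's screen bounds to touch only a few texels. It rejects the object only when the object is farther than every texel it overlaps.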

Let's analyze the code of Nanite::CullRasterize in detail:

// Engine\Source\Runtime\Renderer\Private\Nanite\NaniteRender.cpp

void CullRasterize(
    FRDGBuilder& GraphBuilder,
    const FScene& Scene,
    const TArray<FPackedView, SceneRenderingAllocator>& Views,
    uint32 NumPrimaryViews,    // Number of non-mip views
    FCullingContext& CullingContext,
    const FRasterContext& RasterContext,
    const FRasterState& RasterState,
    const TArray<FInstanceDraw, SceneRenderingAllocator>* OptionalInstanceDraws,
    // VirtualShadowMapArray is the supplier of virtual to physical translation, probably could abstract this a bit better,
    FVirtualShadowMapArray* VirtualShadowMapArray,
    bool bExtractStats
)
{
    // If there are too many views, split rasterization across multiple passes. This can only happen for depth-only rendering.
    if (Views.Num() > MAX_VIEWS_PER_CULL_RASTERIZE_PASS)
    {
        CullRasterizeMultiPass(GraphBuilder, Scene, Views, NumPrimaryViews, CullingContext, RasterContext, RasterState, OptionalInstanceDraws, VirtualShadowMapArray, bExtractStats);
        return;
    }

    RDG_EVENT_SCOPE(GraphBuilder, "Nanite::CullRasterize");

    (......)

    // Create the structured buffer of views.
    {
        const uint32 ViewsBufferElements = FMath::RoundUpToPowerOfTwo(Views.Num());
        CullingContext.ViewsBuffer = CreateStructuredBuffer(GraphBuilder, TEXT("Nanite.Views"), Views.GetTypeSize(), ViewsBufferElements, Views.GetData(), Views.Num() * Views.GetTypeSize());
    }

    // Set up the culling context's structured buffer of instance draws.
    if (OptionalInstanceDraws)
    {
        const uint32 InstanceDrawsBufferElements = FMath::RoundUpToPowerOfTwo(OptionalInstanceDraws->Num());
        CullingContext.InstanceDrawsBuffer = CreateStructuredBuffer
        (
            GraphBuilder,
            TEXT("Nanite.InstanceDraws"),
            OptionalInstanceDraws->GetTypeSize(),
            InstanceDrawsBufferElements,
            OptionalInstanceDraws->GetData(),
            OptionalInstanceDraws->Num() * OptionalInstanceDraws->GetTypeSize()
        );
        CullingContext.NumInstancesPreCull = OptionalInstanceDraws->Num();
    }
    else
    {
        CullingContext.InstanceDrawsBuffer = nullptr;
        CullingContext.NumInstancesPreCull = Scene.GPUScene.InstanceDataAllocator.GetMaxSize();
    }

    (......)
    
    // Culling parameters.
    FCullingParameters CullingParameters;
    {
        CullingParameters.InViews        = GraphBuilder.CreateSRV(CullingContext.ViewsBuffer);
        CullingParameters.NumViews        = Views.Num();
        CullingParameters.NumPrimaryViews = NumPrimaryViews;
        CullingParameters.DisocclusionLodScaleFactor = GNaniteDisocclusionHack ? 0.01f : 1.0f;    // TODO: Get rid of this hack
        CullingParameters.HZBTexture    = RegisterExternalTextureWithFallback(GraphBuilder, CullingContext.PrevHZB, GSystemTextures.BlackDummy);
        CullingParameters.HZBSize        = CullingContext.PrevHZB ? CullingContext.PrevHZB->GetDesc().Extent : FVector2D(0.0f);
        CullingParameters.HZBSampler    = TStaticSamplerState< SF_Point, AM_Clamp, AM_Clamp, AM_Clamp >::GetRHI();
        CullingParameters.SOAStrides    = CullingContext.SOAStrides;
        CullingParameters.MaxCandidateClusters    = Nanite::FGlobalResources::GetMaxCandidateClusters();
        CullingParameters.MaxVisibleClusters    = Nanite::FGlobalResources::GetMaxVisibleClusters();
        CullingParameters.RenderFlags    = CullingContext.RenderFlags;
        CullingParameters.DebugFlags    = CullingContext.DebugFlags;
        CullingParameters.CompactedViewInfo = nullptr;
        CullingParameters.CompactedViewsAllocation = nullptr;
    }

    FVirtualTargetParameters VirtualTargetParameters;
    // Handle the VSM (virtual shadow map) array.
    if (VirtualShadowMapArray)
    {
        VirtualTargetParameters.VirtualShadowMap = VirtualShadowMapArray->GetUniformBuffer(GraphBuilder);
        VirtualTargetParameters.PageFlags = GraphBuilder.CreateSRV(VirtualShadowMapArray->PageFlagsRDG, PF_R32_UINT);
        VirtualTargetParameters.HPageFlags = GraphBuilder.CreateSRV(VirtualShadowMapArray->HPageFlagsRDG, PF_R32_UINT);
        VirtualTargetParameters.PageRectBounds = GraphBuilder.CreateSRV(VirtualShadowMapArray->PageRectBoundsRDG);

        // If an HZB from the previous frame is provided, the previous frame's page table is needed as well.
        FRDGBufferRef HZBPageTableRDG = VirtualShadowMapArray->PageTableRDG;
        if (CullingContext.PrevHZB)
        {
            check( VirtualShadowMapArray->CacheManager );
            TRefCountPtr<FRDGPooledBuffer> HZBPageTable = VirtualShadowMapArray->CacheManager->PrevBuffers.PageTable;
            check( HZBPageTable );
            HZBPageTableRDG = GraphBuilder.RegisterExternalBuffer( HZBPageTable, TEXT( "Shadow.Virtual.HZBPageTable" ) );
        }
        VirtualTargetParameters.ShadowHZBPageTable = GraphBuilder.CreateSRV( HZBPageTableRDG, PF_R32_UINT );
    }
    
    // Set up the GPUScene data.
    FGPUSceneParameters GPUSceneParameters;
    GPUSceneParameters.GPUSceneInstanceSceneData = Scene.GPUScene.InstanceDataBuffer.SRV;
    GPUSceneParameters.GPUScenePrimitiveSceneData = Scene.GPUScene.PrimitiveBuffer.SRV;
    GPUSceneParameters.GPUSceneFrameNumber = Scene.GPUScene.GetSceneFrameNumber();
    
    // Cull (compact) the VSM views.
    if (VirtualShadowMapArray && CVarCompactVSMViews.GetValueOnRenderThread() != 0)
    {
        RDG_GPU_STAT_SCOPE(GraphBuilder, NaniteInstanceCullVSM);

        // Compact the views to remove unneeded (empty) mip views. This has to be done on the GPU, because only the GPU knows which pages each mip owns.
        const uint32 ViewsBufferElements = FMath::RoundUpToPowerOfTwo(Views.Num());
        FRDGBufferRef CompactedViews = GraphBuilder.CreateBuffer(FRDGBufferDesc::CreateStructuredDesc(sizeof(FPackedView), ViewsBufferElements), TEXT("Shadow.Virtual.CompactedViews"));
        FRDGBufferRef CompactedViewInfo = GraphBuilder.CreateBuffer(FRDGBufferDesc::CreateStructuredDesc(sizeof(FCompactedViewInfo), Views.Num()), TEXT("Shadow.Virtual.CompactedViewInfo"));

        const static uint32 TheZeros[2] = { 0U, 0U };
        FRDGBufferRef CompactedViewsAllocation = CreateStructuredBuffer(GraphBuilder, TEXT("Shadow.Virtual.CompactedViewsAllocation"), sizeof(uint32), 2, TheZeros, sizeof(TheZeros), ERDGInitialDataFlags::NoCopy);
        {
            FCompactViewsVSM_CS::FParameters* PassParameters = GraphBuilder.AllocParameters< FCompactViewsVSM_CS::FParameters >();

            PassParameters->GPUSceneParameters = GPUSceneParameters;
            PassParameters->CullingParameters = CullingParameters;
            PassParameters->VirtualShadowMap = VirtualTargetParameters;

            PassParameters->CompactedViewsOut = GraphBuilder.CreateUAV(CompactedViews);
            PassParameters->CompactedViewInfoOut = GraphBuilder.CreateUAV(CompactedViewInfo);
            PassParameters->CompactedViewsAllocationOut = GraphBuilder.CreateUAV(CompactedViewsAllocation);

            auto ComputeShader = CullingContext.ShaderMap->GetShader<FCompactViewsVSM_CS>();

            // Compact and cull the VSM views with a compute shader.
            FComputeShaderUtils::AddPass(
                GraphBuilder,
                RDG_EVENT_NAME("CompactViewsVSM"),
                ComputeShader,
                PassParameters,
                FComputeShaderUtils::GetGroupCount(NumPrimaryViews, 64)
            );
        }

        // Replace the original view data with the compacted views.
        CullingParameters.InViews = GraphBuilder.CreateSRV(CompactedViews);
        CullingContext.ViewsBuffer = CompactedViews;
        CullingParameters.CompactedViewInfo = GraphBuilder.CreateSRV(CompactedViewInfo);
        CullingParameters.CompactedViewsAllocation = GraphBuilder.CreateSRV(CompactedViewsAllocation);
    }
    
    // Initialize the culling context's arguments.
    {
        FInitArgs_CS::FParameters* PassParameters = GraphBuilder.AllocParameters< FInitArgs_CS::FParameters >();

        PassParameters->RenderFlags = CullingParameters.RenderFlags;

        PassParameters->OutMainAndPostPassPersistentStates    = GraphBuilder.CreateUAV( CullingContext.MainAndPostPassPersistentStates );
        PassParameters->InOutMainPassRasterizeArgsSWHW        = GraphBuilder.CreateUAV( CullingContext.MainRasterizeArgsSWHW );

        uint32 ClampedDrawPassIndex = FMath::Min(CullingContext.DrawPassIndex, 2u);

        if (CullingContext.bTwoPassOcclusion)
        {
            PassParameters->OutOccludedInstancesArgs = GraphBuilder.CreateUAV( CullingContext.OccludedInstancesArgs );
            PassParameters->InOutPostPassRasterizeArgsSWHW = GraphBuilder.CreateUAV( CullingContext.PostRasterizeArgsSWHW );
        }
        
        if (CullingContext.RenderFlags & RENDER_FLAG_HAVE_PREV_DRAW_DATA)
        {
            PassParameters->InOutTotalPrevDrawClusters = GraphBuilder.CreateUAV(CullingContext.TotalPrevDrawClustersBuffer);
        }
        else
        {
            // Use any UAV just to keep render graph happy that something is bound, but the shader doesn't actually touch this.
            PassParameters->InOutTotalPrevDrawClusters = PassParameters->OutMainAndPostPassPersistentStates;
        }

        FInitArgs_CS::FPermutationDomain PermutationVector;
        PermutationVector.Set<FInitArgs_CS::FOcclusionCullingDim>( CullingContext.bTwoPassOcclusion );
        PermutationVector.Set<FInitArgs_CS::FDrawPassIndexDim>( ClampedDrawPassIndex );
        
        auto ComputeShader = CullingContext.ShaderMap->GetShader< FInitArgs_CS >( PermutationVector );

        // The arguments are also initialized by a compute shader.
        FComputeShaderUtils::AddPass(
            GraphBuilder,
            RDG_EVENT_NAME( "InitArgs" ),
            ComputeShader,
            PassParameters,
            FIntVector( 1, 1, 1 )
        );
    }

    // Allocate the candidate buffers. They only live for the duration of CullRasterize.
    FRDGBufferRef MainCandidateNodesAndClustersBuffer = nullptr;
    FRDGBufferRef PostCandidateNodesAndClustersBuffer = nullptr;
    AllocateCandidateBuffers(GraphBuilder, CullingContext.ShaderMap, &MainCandidateNodesAndClustersBuffer, CullingContext.bTwoPassOcclusion ? &PostCandidateNodesAndClustersBuffer : nullptr);

    // Instance, hierarchy and cluster culling: either the no-occlusion pass or the occlusion main pass.
    AddPass_InstanceHierarchyAndClusterCull(
        GraphBuilder,
        Scene,
        CullingParameters,
        Views,
        NumPrimaryViews,
        CullingContext,
        RasterContext,
        RasterState,
        GPUSceneParameters,
        MainCandidateNodesAndClustersBuffer,
        PostCandidateNodesAndClustersBuffer,
        CullingContext.bTwoPassOcclusion ? CULLING_PASS_OCCLUSION_MAIN : CULLING_PASS_NO_OCCLUSION,
        VirtualShadowMapArray,
        VirtualTargetParameters
    );

    // Rasterize.
    AddPass_Rasterize(
        GraphBuilder,
        Views,
        RasterContext,
        RasterState,
        CullingContext.SOAStrides,
        CullingContext.RenderFlags,
        CullingContext.ViewsBuffer,
        CullingContext.VisibleClustersSWHW,
        nullptr,
        CullingContext.SafeMainRasterizeArgsSWHW,
        CullingContext.TotalPrevDrawClustersBuffer,
        GPUSceneParameters,
        true,
        VirtualShadowMapArray,
        VirtualTargetParameters
    );
    
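    // Occlusion post pass. Retest the instances and clusters that were not visible in the previous frame, and render them if they are visible this frame.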
    if (CullingContext.bTwoPassOcclusion)
    {
        // Build an HZB from the previous occluders (those rasterized in the main pass) to retest the remaining candidates against.
        {
            RDG_EVENT_SCOPE(GraphBuilder, "BuildPreviousOccluderHZB");
            
            FSceneTextureParameters SceneTextures = GetSceneTextureParameters(GraphBuilder);

            FRDGTextureRef SceneDepth = SceneTextures.SceneDepthTexture;
            FRDGTextureRef RasterizedDepth = RasterContext.VisBuffer64;

            if( RasterContext.RasterTechnique == ERasterTechnique::DepthOnly )
            {
                SceneDepth = GraphBuilder.RegisterExternalTexture( GSystemTextures.BlackDummy );
                RasterizedDepth = RasterContext.DepthBuffer;
            }

            FRDGTextureRef OutFurthestHZBTexture;

            FIntRect ViewRect(0, 0, RasterContext.TextureSize.X, RasterContext.TextureSize.Y);
            if (Views.Num() == 1)
            {
                ViewRect = FIntRect(Views[0].ViewRect.X, Views[0].ViewRect.Y, Views[0].ViewRect.Z, Views[0].ViewRect.W);
            }
            
            // Build the HZB.
            BuildHZBFurthest(
                GraphBuilder,
                SceneDepth,
                RasterizedDepth,
                CullingContext.HZBBuildViewRect,
                Scene.GetFeatureLevel(),
                Scene.GetShaderPlatform(),
                TEXT("Nanite.PreviousOccluderHZB"),
                /* OutFurthestHZBTexture = */ &OutFurthestHZBTexture);

            CullingParameters.HZBTexture = OutFurthestHZBTexture;
            CullingParameters.HZBSize = CullingParameters.HZBTexture->Desc.Extent;
        }

        // Post pass.
        AddPass_InstanceHierarchyAndClusterCull(
            GraphBuilder,
            Scene,
            CullingParameters,
            Views,
            NumPrimaryViews,
            CullingContext,
            RasterContext,
            RasterState,
            GPUSceneParameters,
            MainCandidateNodesAndClustersBuffer,
            PostCandidateNodesAndClustersBuffer,
            CULLING_PASS_OCCLUSION_POST,
            VirtualShadowMapArray,
            VirtualTargetParameters
        );

        // Rasterize the post pass.
        AddPass_Rasterize(
            GraphBuilder,
            Views,
            RasterContext,
            RasterState,
            CullingContext.SOAStrides,
            CullingContext.RenderFlags,
            CullingContext.ViewsBuffer,
            CullingContext.VisibleClustersSWHW,
            CullingContext.MainRasterizeArgsSWHW,
            CullingContext.SafePostRasterizeArgsSWHW,
            CullingContext.TotalPrevDrawClustersBuffer,
            GPUSceneParameters,
            false,
            VirtualShadowMapArray,
            VirtualTargetParameters
        );
    }

    if (RasterContext.RasterTechnique != ERasterTechnique::DepthOnly)
    {
        // The cluster indices and counts rendered by previous passes are irrelevant for depth-only rendering.
        CullingContext.DrawPassIndex++;
        CullingContext.RenderFlags |= RENDER_FLAG_HAVE_PREV_DRAW_DATA;
    }

    (......)
}
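The two-pass structure of CullRasterize can be reduced to a toy model. The main pass is tested against the previous frame's HZB. The post pass then retests the rejected work against an HZB built from the main pass's output. In this sketch, which is not UE code, the screen is one-dimensional, each object has a single depth, and a flat scan over the occluder depths stands in for the HZB lookup. `FToyObject`, `IsOccluded`, `RasterizeToy` and `TwoPassOcclusionCull` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical toy object on a 1D screen: covers pixels [X0, X1) at one depth.
// Assumption: larger depth = farther, 1.0f is the far plane (UE itself uses reversed-Z).
struct FToyObject
{
    int X0;
    int X1;
    float Depth;
};

// Conservative test: hidden only if behind the furthest occluder depth in the
// object's footprint. A real HZB answers this with a few coarse-mip fetches.
bool IsOccluded(const FToyObject& Obj, const std::vector<float>& Occluders)
{
    assert(0 <= Obj.X0 && Obj.X0 < Obj.X1 && Obj.X1 <= static_cast<int>(Occluders.size()));
    const float Furthest = *std::max_element(Occluders.begin() + Obj.X0, Occluders.begin() + Obj.X1);
    return Obj.Depth > Furthest;
}

void RasterizeToy(const FToyObject& Obj, std::vector<float>& DepthBuffer)
{
    for (int X = Obj.X0; X < Obj.X1; ++X)
    {
        DepthBuffer[X] = std::min(DepthBuffer[X], Obj.Depth); // depth test: keep nearest
    }
}

// Returns the indices of the drawn objects, in draw order.
std::vector<int> TwoPassOcclusionCull(const std::vector<FToyObject>& Objects,
                                      const std::vector<float>& PrevFrameDepth)
{
    std::vector<float> DepthBuffer(PrevFrameDepth.size(), 1.0f); // cleared to far
    std::vector<int> Drawn;
    std::vector<int> Deferred;

    // Main pass: test against last frame's occluders. Failures are deferred,
    // not discarded, because last frame's depth may be stale.
    for (int i = 0; i < static_cast<int>(Objects.size()); ++i)
    {
        if (IsOccluded(Objects[i], PrevFrameDepth))
        {
            Deferred.push_back(i);
        }
        else
        {
            RasterizeToy(Objects[i], DepthBuffer);
            Drawn.push_back(i);
        }
    }

    // Occluders for the post pass come from what the main pass drew (the role of
    // "Nanite.PreviousOccluderHZB"). It is a snapshot, so post-pass objects
    // do not occlude each other.
    const std::vector<float> MainPassOccluders = DepthBuffer;

    // Post pass: retest deferred objects and draw the ones now visible.
    for (int i : Deferred)
    {
        if (!IsOccluded(Objects[i], MainPassOccluders))
        {
            RasterizeToy(Objects[i], DepthBuffer);
            Drawn.push_back(i);
        }
    }
    return Drawn;
}
```

In the test setup, the previous frame's depth is 0.2 across the whole screen. A wall at depth 0.2 covers the left half. Object 1 sits behind the wall, and object 2 sits on the right half, where nothing is drawn this frame. The wall passes the main pass. Object 1 stays culled after the retest. Object 2 is recovered by the post pass, so the draw order is {0, 2}.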

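Several sizing decisions in CullRasterize are plain integer arithmetic. When Views.Num() exceeds MAX_VIEWS_PER_CULL_RASTERIZE_PASS, the call is forwarded to CullRasterizeMultiPass. Buffer element counts are rounded up with FMath::RoundUpToPowerOfTwo. Dispatch sizes come from FComputeShaderUtils::GetGroupCount(N, 64), which covers N items with groups of 64 threads. The helpers below sketch that arithmetic under hypothetical names. The batching assumes consecutive chunks of views. The body of CullRasterizeMultiPass is not shown in this excerpt, so its actual split may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper: smallest power of two >= Value, the rounding used when
// sizing the views and instance-draw buffers.
uint32_t RoundUpToPowerOfTwoSketch(uint32_t Value)
{
    assert(Value >= 1u && Value <= (1u << 31));
    uint32_t Result = 1u;
    while (Result < Value)
    {
        Result <<= 1;
    }
    return Result;
}

// Hypothetical helper: ceil(NumItems / GroupSize), the number of thread groups
// needed so every item gets one thread.
uint32_t GroupCountSketch(uint32_t NumItems, uint32_t GroupSize)
{
    assert(GroupSize > 0u);
    return (NumItems + GroupSize - 1u) / GroupSize;
}

// Hypothetical batching: consecutive (FirstView, NumViews) chunks of at most
// MaxViewsPerPass views, one chunk per cull/rasterize pass.
std::vector<std::pair<uint32_t, uint32_t>> SplitViewsIntoPasses(uint32_t NumViews, uint32_t MaxViewsPerPass)
{
    assert(MaxViewsPerPass > 0u);
    std::vector<std::pair<uint32_t, uint32_t>> Batches;
    for (uint32_t First = 0u; First < NumViews; First += MaxViewsPerPass)
    {
        Batches.emplace_back(First, std::min(MaxViewsPerPass, NumViews - First));
    }
    return Batches;
}
```

Rounding the element count up to a power of two keeps buffer sizes in a few discrete buckets, which makes them easier to reuse across frames. The shader is still told the real count (for example, NumViews = Views.Num()).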
Next, we turn to two functions, AddPass_InstanceHierarchyAndClusterCull and AddPass_Rasterize, starting with AddPass_InstanceHierarchyAndClusterCull:

void AddPass_InstanceHierarchyAndClusterCull(
    FRDGBuilder& GraphBuilder,
    const FScene& Scene,
    const FCullingParameters& CullingParameters,
    const TArray<FPackedView, SceneRenderingAllocator>& Views,
    const uint32 NumPrimaryViews,
    const FCullingContext& CullingContext,
    const FRasterContext& RasterContext,
    const FRasterState& RasterState,
    const FGPUSceneParameters &GPUSceneParameters,
    FRDGBufferRef MainCandidateNodesAndClusters,
    FRDGBufferRef PostCandidateNodesAndClusters,
    uint32 CullingPass,
    FVirtualShadowMapArray *VirtualShadowMapArray,
    FVirtualTargetParameters &VirtualTargetParameters
    )
{
    (......)

    const bool bMultiView = Views.Num() > 1 || VirtualShadowMapArray != nullptr;

    if (VirtualShadowMapArray)
    {
        (......)
    }
    // Instance culling.
    else if (CullingContext.NumInstancesPreCull > 0 || CullingPass == CULLING_PASS_OCCLUSION_POST)
    {
        RDG_GPU_STAT_SCOPE( GraphBuilder, NaniteInstanceCull );
        
        // Set up the parameters of the instance-culling CS.
        FInstanceCull_CS::FParameters* PassParameters = GraphBuilder.AllocParameters< FInstanceCull_CS::FParameters >();

        PassParameters->NumInstances                        = CullingContext.NumInstancesPreCull;
        PassParameters->MaxNodes                            = Nanite::FGlobalResources::GetMaxNodes();
        PassParameters->ImposterMaxPixels                    = GNaniteImposterMaxPixels;

        PassParameters->GPUSceneParameters = GPUSceneParameters;
        PassParameters->RasterParameters = RasterContext.Parameters;
        PassParameters->CullingParameters = CullingParameters;

        const ERasterTechnique Technique = RasterContext.RasterTechnique;
        PassParameters->OnlyCastShadowsPrimitives = Technique == ERasterTechnique::DepthOnly ? 1 : 0;

        PassParameters->ImposterAtlas = Nanite::GStreamingManager.GetRootPagesSRV();

        PassParameters->OutMainAndPostPassPersistentStates    = GraphBuilder.CreateUAV( CullingContext.MainAndPostPassPersistentStates );
        
        if (CullingContext.StatsBuffer)
        {
            PassParameters->OutStatsBuffer                    = GraphBuilder.CreateUAV(CullingContext.StatsBuffer);
        }

        // Set different parameters depending on the culling pass.
        if( CullingPass == CULLING_PASS_NO_OCCLUSION )
        {
            if( CullingContext.InstanceDrawsBuffer )
            {
                PassParameters->InInstanceDraws            = GraphBuilder.CreateSRV( CullingContext.InstanceDrawsBuffer );
            }
            PassParameters->OutCandidateNodesAndClusters    = GraphBuilder.CreateUAV( MainCandidateNodesAndClusters);
        }
        else if( CullingPass == CULLING_PASS_OCCLUSION_MAIN )
        {
            PassParameters->OutOccludedInstances        = GraphBuilder.CreateUAV( CullingContext.OccludedInstances );
            PassParameters->OutOccludedInstancesArgs    = GraphBuilder.CreateUAV( CullingContext.OccludedInstancesArgs );
            PassParameters->OutCandidateNodesAndClusters    = GraphBuilder.CreateUAV( MainCandidateNodesAndClusters );
        }
        else
        {
            PassParameters->InInstanceDraws                = GraphBuilder.CreateSRV( CullingContext.OccludedInstances );
            PassParameters->InOccludedInstancesArgs        = GraphBuilder.CreateSRV( CullingContext.OccludedInstancesArgs );
            PassParameters->OutCandidateNodesAndClusters    = GraphBuilder.CreateUAV( PostCandidateNodesAndClusters);
        }
        
        check(CullingContext.ViewsBuffer);

        // Set up the shader permutation.
        const uint32 InstanceCullingPass = CullingContext.InstanceDrawsBuffer != nullptr ? CULLING_PASS_EXPLICIT_LIST : CullingPass;
        FInstanceCull_CS::FPermutationDomain PermutationVector;
        PermutationVector.Set<FInstanceCull_CS::FCullingPassDim>(InstanceCullingPass);
        PermutationVector.Set<FInstanceCull_CS::FMultiViewDim>(bMultiView);
        PermutationVector.Set<FInstanceCull_CS::FNearClipDim>(RasterState.bNearClip);
        PermutationVector.Set<FInstanceCull_CS::FDebugFlagsDim>(CullingContext.DebugFlags != 0);
        PermutationVector.Set<FInstanceCull_CS::FRasterTechniqueDim>(int32(RasterContext.RasterTechnique));

        auto ComputeShader = CullingContext.ShaderMap->GetShader<FInstanceCull_CS>(PermutationVector);
        
        // Post-pass instance culling.
        if( InstanceCullingPass == CULLING_PASS_OCCLUSION_POST )
        {
            PassParameters->IndirectArgs = CullingContext.OccludedInstancesArgs;
            FComputeShaderUtils::AddPass(
                GraphBuilder,
                RDG_EVENT_NAME( "Post Pass: InstanceCull" ),
                ComputeShader,
                PassParameters,
                PassParameters->IndirectArgs,
                0
            );
        }
        else // Main-pass instance culling.
        {
            FComputeShaderUtils::AddPass(
                GraphBuilder,
                InstanceCullingPass == CULLING_PASS_OCCLUSION_MAIN ?    RDG_EVENT_NAME( "Main Pass: InstanceCull" ) : 
                InstanceCullingPass == CULLING_PASS_NO_OCCLUSION ?        RDG_EVENT_NAME( "Main Pass: InstanceCull - No occlusion" ) :
                                                                        RDG_EVENT_NAME( "Main Pass: InstanceCull - Explicit list" ),
                ComputeShader,
                PassParameters,
                FComputeShaderUtils::GetGroupCount(CullingContext.NumInstancesPreCull, 64)
            );
        }
    }

    // Cluster culling.
    {
        RDG_GPU_STAT_SCOPE(GraphBuilder, NaniteClusterCull);
        FPersistentClusterCull_CS::FParameters* PassParameters = GraphBuilder.AllocParameters< FPersistentClusterCull_CS::FParameters >();

        // Cluster culling uses GPUScene, GStreamingManager and other parameters.
        PassParameters->GPUSceneParameters    = GPUSceneParameters;
        PassParameters->CullingParameters    = CullingParameters;
        PassParameters->MaxNodes            = Nanite::FGlobalResources::GetMaxNodes();
        
        PassParameters->ClusterPageHeaders    = Nanite::GStreamingManager.GetClusterPageHeadersSRV();
        PassParameters->ClusterPageData        = Nanite::GStreamingManager.GetClusterPageDataSRV();
        PassParameters->HierarchyBuffer        = Nanite::GStreamingManager.GetHierarchySRV();
        
        check(CullingContext.DrawPassIndex == 0 || CullingContext.RenderFlags & RENDER_FLAG_HAVE_PREV_DRAW_DATA); // sanity check
        // Handle data from previous draw passes (RENDER_FLAG_HAVE_PREV_DRAW_DATA).
        if (CullingContext.RenderFlags & RENDER_FLAG_HAVE_PREV_DRAW_DATA)
        {
            PassParameters->InTotalPrevDrawClusters = GraphBuilder.CreateSRV(CullingContext.TotalPrevDrawClustersBuffer);
        }
        else
        {
            FRDGBufferRef Dummy = GraphBuilder.RegisterExternalBuffer(Nanite::GGlobalResources.GetStructureBufferStride8(), TEXT("Nanite.StructuredBufferStride8"));
            PassParameters->InTotalPrevDrawClusters = GraphBuilder.CreateSRV(Dummy);
        }

        PassParameters->MainAndPostPassPersistentStates    = GraphBuilder.CreateUAV( CullingContext.MainAndPostPassPersistentStates );
        
        // Candidate nodes and clusters.
        if( CullingPass == CULLING_PASS_NO_OCCLUSION || CullingPass == CULLING_PASS_OCCLUSION_MAIN )
        {
            PassParameters->InOutCandidateNodesAndClusters    = GraphBuilder.CreateUAV( MainCandidateNodesAndClusters );
            PassParameters->VisibleClustersArgsSWHW    = GraphBuilder.CreateUAV( CullingContext.MainRasterizeArgsSWHW );
            
            if( CullingPass == CULLING_PASS_OCCLUSION_MAIN )
            {
                PassParameters->OutOccludedNodesAndClusters    = GraphBuilder.CreateUAV( PostCandidateNodesAndClusters );
            }
        }
        else
        {
            PassParameters->InOutCandidateNodesAndClusters    = GraphBuilder.CreateUAV( PostCandidateNodesAndClusters );
            PassParameters->OffsetClustersArgsSWHW    = GraphBuilder.CreateSRV( CullingContext.MainRasterizeArgsSWHW );
            PassParameters->VisibleClustersArgsSWHW    = GraphBuilder.CreateUAV( CullingContext.PostRasterizeArgsSWHW );
        }

        // Output UAVs: visible clusters and streaming requests.
        PassParameters->OutVisibleClustersSWHW            = GraphBuilder.CreateUAV( CullingContext.VisibleClustersSWHW );
        PassParameters->OutStreamingRequests            = GraphBuilder.CreateUAV( CullingContext.StreamingRequests );

        if (VirtualShadowMapArray)
        {
            PassParameters->VirtualShadowMap = VirtualTargetParameters;
            PassParameters->OutDynamicCasterFlags = GraphBuilder.CreateUAV(VirtualShadowMapArray->DynamicCasterPageFlagsRDG, PF_R32_UINT);
        }

        if (CullingContext.StatsBuffer)
        {
            PassParameters->OutStatsBuffer = GraphBuilder.CreateUAV(CullingContext.StatsBuffer);
        }

        PassParameters->LargePageRectThreshold = CVarLargePageRectThreshold.GetValueOnRenderThread();

        check(CullingContext.ViewsBuffer);

        // Shader permutation.
        FPersistentClusterCull_CS::FPermutationDomain PermutationVector;
        PermutationVector.Set<FPersistentClusterCull_CS::FCullingPassDim>(CullingPass);
        PermutationVector.Set<FPersistentClusterCull_CS::FMultiViewDim>(bMultiView);
        PermutationVector.Set<FPersistentClusterCull_CS::FNearClipDim>(RasterState.bNearClip);
        PermutationVector.Set<FPersistentClusterCull_CS::FVirtualTextureTargetDim>(VirtualShadowMapArray != nullptr);
        PermutationVector.Set<FPersistentClusterCull_CS::FClusterPerPageDim>(GNaniteClusterPerPage && VirtualShadowMapArray != nullptr);
        PermutationVector.Set<FPersistentClusterCull_CS::FDebugFlagsDim>(CullingContext.DebugFlags != 0);

        auto ComputeShader = CullingContext.ShaderMap->GetShader<FPersistentClusterCull_CS>(PermutationVector);

        // Add the CS pass.
        FComputeShaderUtils::AddPass(
            GraphBuilder,
            CullingPass == CULLING_PASS_NO_OCCLUSION    ? RDG_EVENT_NAME( "Main Pass: PersistentCull - No occlusion" ) :
            CullingPass == CULLING_PASS_OCCLUSION_MAIN    ? RDG_EVENT_NAME( "Main Pass: PersistentCull" ) :
            RDG_EVENT_NAME( "Post Pass: PersistentCull" ),
            ComputeShader,
            PassParameters,
            FIntVector(GRHIPersistentThreadGroupCount, 1, 1)
        );
    }

    // Calculate the rasterizer arguments so that the subsequent rasterization passes are correct and safe.
    {
        FCalculateSafeRasterizerArgs_CS::FParameters* PassParameters = GraphBuilder.AllocParameters< FCalculateSafeRasterizerArgs_CS::FParameters >();

        const bool bPrevDrawData    = (CullingContext.RenderFlags & RENDER_FLAG_HAVE_PREV_DRAW_DATA) != 0;
        const bool bPostPass        = (CullingPass == CULLING_PASS_OCCLUSION_POST) != 0;

        if (bPrevDrawData)
        {
            PassParameters->InTotalPrevDrawClusters        = GraphBuilder.CreateSRV(CullingContext.TotalPrevDrawClustersBuffer);
        }
        
        if (bPostPass)
        {
            PassParameters->OffsetClustersArgsSWHW        = GraphBuilder.CreateSRV(CullingContext.MainRasterizeArgsSWHW);
            PassParameters->InRasterizerArgsSWHW        = GraphBuilder.CreateSRV(CullingContext.PostRasterizeArgsSWHW);
            PassParameters->OutSafeRasterizerArgsSWHW    = GraphBuilder.CreateUAV(CullingContext.SafePostRasterizeArgsSWHW);
        }
        else
        {
            PassParameters->InRasterizerArgsSWHW        = GraphBuilder.CreateSRV(CullingContext.MainRasterizeArgsSWHW);
            PassParameters->OutSafeRasterizerArgsSWHW    = GraphBuilder.CreateUAV(CullingContext.SafeMainRasterizeArgsSWHW);
        }
        
        PassParameters->MaxVisibleClusters                = Nanite::FGlobalResources::GetMaxVisibleClusters();
        PassParameters->RenderFlags                        = CullingContext.RenderFlags;
        
        FCalculateSafeRasterizerArgs_CS::FPermutationDomain PermutationVector;
        PermutationVector.Set<FCalculateSafeRasterizerArgs_CS::FHasPrevDrawData>(bPrevDrawData);
        PermutationVector.Set<FCalculateSafeRasterizerArgs_CS::FIsPostPass>(bPostPass);

        auto ComputeShader = CullingContext.ShaderMap->GetShader< FCalculateSafeRasterizerArgs_CS >(PermutationVector);

        FComputeShaderUtils::AddPass(
            GraphBuilder,
            bPostPass ? RDG_EVENT_NAME("Post Pass: CalculateSafeRasterizerArgs") : RDG_EVENT_NAME("Main Pass: CalculateSafeRasterizerArgs"),
            ComputeShader,
            PassParameters,
            FIntVector(1, 1, 1)
        );
    }
}
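The cluster-culling CS is dispatched with a fixed FIntVector(GRHIPersistentThreadGroupCount, 1, 1), not a size derived from the workload. That is characteristic of a persistent-threads design: resident thread groups keep pulling candidate nodes and clusters from a shared queue until it drains, so the whole hierarchy is traversed in a single dispatch. The following single-threaded sketch simulates that queue. It is a simplification, not the shader's algorithm. `FToyNode`, `PersistentCullSketch` and the `IsNodeVisible` predicate are hypothetical; the predicate stands in for the frustum, HZB and LOD tests. On the GPU the output order would not be deterministic.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <functional>
#include <vector>

// Hypothetical toy hierarchy node: interior nodes list children, leaves
// reference one cluster.
struct FToyNode
{
    std::vector<uint32_t> Children; // empty for a leaf
    uint32_t ClusterIndex = 0;      // only meaningful for a leaf
};

// Queue-driven hierarchy traversal; one loop plays the role of all persistent threads.
std::vector<uint32_t> PersistentCullSketch(const std::vector<FToyNode>& Nodes,
                                           uint32_t RootIndex,
                                           const std::function<bool(uint32_t)>& IsNodeVisible,
                                           uint32_t MaxVisibleClusters)
{
    assert(RootIndex < Nodes.size());
    std::vector<uint32_t> VisibleClusters;
    std::deque<uint32_t> Candidates{ RootIndex }; // shared work queue
    while (!Candidates.empty())
    {
        // One iteration = one persistent thread processing one work item.
        const uint32_t NodeIndex = Candidates.front();
        Candidates.pop_front();
        if (!IsNodeVisible(NodeIndex))
        {
            continue; // culled: the whole subtree is skipped
        }
        const FToyNode& Node = Nodes[NodeIndex];
        if (Node.Children.empty())
        {
            // Fixed-capacity output, in the spirit of MaxVisibleClusters.
            if (VisibleClusters.size() < MaxVisibleClusters)
            {
                VisibleClusters.push_back(Node.ClusterIndex);
            }
        }
        else
        {
            for (uint32_t Child : Node.Children)
            {
                assert(Child < Nodes.size());
                Candidates.push_back(Child); // new work for any idle thread
            }
        }
    }
    return VisibleClusters;
}
```

Compared with launching one dispatch per hierarchy level, the queue lets work flow straight from parent nodes to children without round trips through indirect arguments. The cost is that threads must synchronize on the shared queue.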

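The shader behind CalculateSafeRasterizerArgs is not analyzed here. Its inputs are the culling pass's rasterizer arguments, the main pass's arguments (used as an offset in the post pass), and MaxVisibleClusters. Together they suggest that it clamps the counts so the rasterizer never reads past the visible-cluster buffer. The sketch below illustrates that kind of clamping and is not the shader's logic. It assumes post-pass clusters are stored after the main-pass clusters, and that the software and hardware counts share one MaxVisibleClusters budget. `FRasterCounts` and `CalculateSafeCountsSketch` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical per-pass cluster counts, split by rasterizer.
struct FRasterCounts
{
    uint32_t SW; // clusters for the software (compute) rasterizer
    uint32_t HW; // clusters for the hardware rasterizer
};

// Clamps requested counts to the space left after Offset clusters. Assumption:
// SW and HW share one buffer of MaxVisibleClusters entries.
FRasterCounts CalculateSafeCountsSketch(FRasterCounts Requested, FRasterCounts Offset, uint32_t MaxVisibleClusters)
{
    const uint64_t Used = uint64_t(Offset.SW) + Offset.HW;
    const uint32_t Budget = Used >= MaxVisibleClusters ? 0u : static_cast<uint32_t>(MaxVisibleClusters - Used);

    FRasterCounts Safe{};
    Safe.SW = std::min(Requested.SW, Budget);
    Safe.HW = std::min(Requested.HW, Budget - Safe.SW);
    assert(uint64_t(Safe.SW) + Safe.HW <= Budget);
    return Safe;
}
```

Because the culling pass writes its counts with atomics on the GPU, the counts can exceed the buffer capacity when the scene overflows. A separate clamping pass keeps the following indirect rasterizer dispatches within bounds.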
The code above issues several Compute Shader passes. For reasons of space, their shader code is not analyzed here. Next, we focus on AddPass_Rasterize:

void AddPass_Rasterize(
    FRDGBuilder& GraphBuilder,
    const TArray<FPackedView, SceneRenderingAllocator>& Views,
    const FRasterContext& RasterContext,
    const FRasterState& RasterState,
    FIntVector4 SOAStrides, 
    uint32 RenderFlags,
    FRDGBufferRef ViewsBuffer,
    FRDGBufferRef VisibleClustersSWHW,
    FRDGBufferRef ClusterOffsetSWHW,
    FRDGBufferRef IndirectArgs,
    FRDGBufferRef TotalPrevDrawClustersBuffer,
    const FGPUSceneParameters& GPUSceneParameters,
    bool bMainPass,
    FVirtualShadowMapArray* VirtualShadowMapArray,
    FVirtualTargetParameters& VirtualTargetParameters
)
{
    (......)

    // Allocate the rasterization parameters.
    auto* RasterPassParameters = GraphBuilder.AllocParameters<FHWRasterizePS::FParameters>();
    auto* CommonPassParameters = &RasterPassParameters->Common;

    // Set the cluster page data and page headers.
    CommonPassParameters->ClusterPageData = GStreamingManager.GetClusterPageDataSRV();
    CommonPassParameters->ClusterPageHeaders = GStreamingManager.GetClusterPageHeadersSRV();

    // View buffer data.
    if (ViewsBuffer)
    {
        CommonPassParameters->InViews = GraphBuilder.CreateSRV(ViewsBuffer);
    }

    // Draw parameters.
    CommonPassParameters->GPUSceneParameters = GPUSceneParameters;
    CommonPassParameters->RasterParameters = RasterContext.Parameters;
    CommonPassParameters->VisualizeModeBitMask = RasterContext.VisualizeModeBitMask;
    CommonPassParameters->SOAStrides = SOAStrides;
    CommonPassParameters->MaxVisibleClusters = Nanite::FGlobalResources::GetMaxVisibleClusters();
    CommonPassParameters->RenderFlags = RenderFlags;
    if (RasterState.CullMode == CM_CCW)
    {
        CommonPassParameters->RenderFlags |= RENDER_FLAG_REVERSE_CULLING;
    }
    CommonPassParameters->VisibleClustersSWHW = GraphBuilder.CreateSRV(VisibleClustersSWHW);
    
    if (VirtualShadowMapArray)
    {
        CommonPassParameters->VirtualShadowMap = VirtualTargetParameters;
    }

    if (!bMainPass)
    {
        CommonPassParameters->InClusterOffsetSWHW = GraphBuilder.CreateSRV(ClusterOffsetSWHW);
    }
    CommonPassParameters->IndirectArgs = IndirectArgs;

    const bool bHavePrevDrawData = (RenderFlags & RENDER_FLAG_HAVE_PREV_DRAW_DATA);
    if (bHavePrevDrawData)
    {
        CommonPassParameters->InTotalPrevDrawClusters = GraphBuilder.CreateSRV(TotalPrevDrawClustersBuffer);
    }

    const ERasterTechnique Technique = RasterContext.RasterTechnique;
    const ERasterScheduling Scheduling = RasterContext.RasterScheduling;
    const bool bNearClip = RasterState.bNearClip;
    const bool bMultiView = Views.Num() > 1 || VirtualShadowMapArray != nullptr;

    ERDGPassFlags ComputePassFlags = ERDGPassFlags::Compute;

    // With overlapped hardware/software scheduling, create UAVs flagged with SkipBarrier.
    if (Scheduling == ERasterScheduling::HardwareAndSoftwareOverlap)
    {
        const auto CreateSkipBarrierUAV = [&](auto& InOutUAV)
        {
            if (InOutUAV)
            {
                // Flagged with ERDGUnorderedAccessViewFlags::SkipBarrier.
                InOutUAV = GraphBuilder.CreateUAV(InOutUAV->Desc, ERDGUnorderedAccessViewFlags::SkipBarrier);
            }
        };

        // Create the UAVs with SkipBarrier so that software and hardware rasterization can overlap.
        CreateSkipBarrierUAV(CommonPassParameters->RasterParameters.OutDepthBuffer);
        CreateSkipBarrierUAV(CommonPassParameters->RasterParameters.OutVisBuffer64);
        CreateSkipBarrierUAV(CommonPassParameters->RasterParameters.OutDbgBuffer64);
        CreateSkipBarrierUAV(CommonPassParameters->RasterParameters.OutDbgBuffer32);
        CreateSkipBarrierUAV(CommonPassParameters->RasterParameters.LockBuffer);

        ComputePassFlags = ERDGPassFlags::AsyncCompute;
    }

    FIntRect ViewRect(Views[0].ViewRect.X, Views[0].ViewRect.Y, Views[0].ViewRect.Z, Views[0].ViewRect.W);
    if (bMultiView)
    {
        ViewRect.Min = FIntPoint::ZeroValue;
        ViewRect.Max = RasterContext.TextureSize;
    }

    // Handle VSM.
    if (VirtualShadowMapArray)
    {
        ViewRect.Min = FIntPoint::ZeroValue;
        if( GNaniteClusterPerPage )
        {
            ViewRect.Max = FIntPoint( FVirtualShadowMap::PageSize, FVirtualShadowMap::PageSize ) * FVirtualShadowMap::RasterWindowPages;
        }
        else
        {
            ViewRect.Max = FIntPoint( FVirtualShadowMap::VirtualMaxResolutionXY, FVirtualShadowMap::VirtualMaxResolutionXY );
        }
    }

    // First, rasterize with the traditional hardware pipeline.
    {
        const bool bUsePrimitiveShader = UsePrimitiveShader();

        const bool bUseAutoCullingShader =
            GRHISupportsPrimitiveShaders &&
            !bUsePrimitiveShader &&
            GNaniteAutoShaderCulling != 0;

        // Vertex shader permutation.
        FHWRasterizeVS::FPermutationDomain PermutationVectorVS;
        PermutationVectorVS.Set<FHWRasterizeVS::FRasterTechniqueDim>(int32(Technique));
        PermutationVectorVS.Set<FHWRasterizeVS::FAddClusterOffset>(bMainPass ? 0 : 1);
        PermutationVectorVS.Set<FHWRasterizeVS::FMultiViewDim>(bMultiView);
        PermutationVectorVS.Set<FHWRasterizeVS::FPrimShaderDim>(bUsePrimitiveShader);
        PermutationVectorVS.Set<FHWRasterizeVS::FAutoShaderCullDim>(bUseAutoCullingShader);
        PermutationVectorVS.Set<FHWRasterizeVS::FHasPrevDrawData>(bHavePrevDrawData);
        PermutationVectorVS.Set<FHWRasterizeVS::FVisualizeDim>(RasterContext.VisualizeActive && Technique != ERasterTechnique::DepthOnly);
        PermutationVectorVS.Set<FHWRasterizeVS::FNearClipDim>(bNearClip);
        PermutationVectorVS.Set<FHWRasterizeVS::FVirtualTextureTargetDim>(VirtualShadowMapArray != nullptr);
        PermutationVectorVS.Set<FHWRasterizeVS::FClusterPerPageDim>(GNaniteClusterPerPage && VirtualShadowMapArray != nullptr );

        // Set up the PS permutation.
        FHWRasterizePS::FPermutationDomain PermutationVectorPS;
        PermutationVectorPS.Set<FHWRasterizePS::FRasterTechniqueDim>(int32(Technique));
        PermutationVectorPS.Set<FHWRasterizePS::FMultiViewDim>(bMultiView);
        PermutationVectorPS.Set<FHWRasterizePS::FPrimShaderDim>(bUsePrimitiveShader);
        PermutationVectorPS.Set<FHWRasterizePS::FVisualizeDim>(RasterContext.VisualizeActive && Technique != ERasterTechnique::DepthOnly);
        PermutationVectorPS.Set<FHWRasterizePS::FNearClipDim>(bNearClip);
        PermutationVectorPS.Set<FHWRasterizePS::FVirtualTextureTargetDim>(VirtualShadowMapArray != nullptr);
        PermutationVectorPS.Set<FHWRasterizePS::FClusterPerPageDim>( GNaniteClusterPerPage && VirtualShadowMapArray != nullptr );

        auto VertexShader = RasterContext.ShaderMap->GetShader<FHWRasterizeVS>(PermutationVectorVS);
        auto PixelShader  = RasterContext.ShaderMap->GetShader<FHWRasterizePS>(PermutationVectorPS);

        // Add the rasterization pass.
        GraphBuilder.AddPass(
            bMainPass ? RDG_EVENT_NAME("Main Pass: Rasterize") : RDG_EVENT_NAME("Post Pass: Rasterize"),
            RasterPassParameters,
            ERDGPassFlags::Raster | ERDGPassFlags::SkipRenderPass,
            [VertexShader, PixelShader, RasterPassParameters, ViewRect, bUsePrimitiveShader, bMainPass](FRHICommandListImmediate& RHICmdList)
        {
            // Render pass info.
            FRHIRenderPassInfo RPInfo;
            // Resolve parameters.
            RPInfo.ResolveParameters.DestRect.X1 = ViewRect.Min.X;
            RPInfo.ResolveParameters.DestRect.Y1 = ViewRect.Min.Y;
            RPInfo.ResolveParameters.DestRect.X2 = ViewRect.Max.X;
            RPInfo.ResolveParameters.DestRect.Y2 = ViewRect.Max.Y;
            
            RHICmdList.BeginRenderPass(RPInfo, bMainPass ? TEXT("Main Pass: Rasterize") : TEXT("Post Pass: Rasterize"));
            RHICmdList.SetViewport(ViewRect.Min.X, ViewRect.Min.Y, 0.0f, FMath::Min(ViewRect.Max.X, 32767), FMath::Min(ViewRect.Max.Y, 32767), 1.0f);

            FGraphicsPipelineStateInitializer GraphicsPSOInit;
            RHICmdList.ApplyCachedRenderTargets(GraphicsPSOInit);

            // PSO.
            GraphicsPSOInit.BlendState = TStaticBlendState<>::GetRHI();
            GraphicsPSOInit.RasterizerState = GetStaticRasterizerState<false>(FM_Solid, CM_CW);
            GraphicsPSOInit.DepthStencilState = TStaticDepthStencilState<false, CF_Always>::GetRHI();
            GraphicsPSOInit.PrimitiveType = bUsePrimitiveShader ? PT_PointList : PT_TriangleList;
            GraphicsPSOInit.BoundShaderState.VertexDeclarationRHI = GEmptyVertexDeclaration.VertexDeclarationRHI;
            GraphicsPSOInit.BoundShaderState.VertexShaderRHI = VertexShader.GetVertexShader();
            GraphicsPSOInit.BoundShaderState.PixelShaderRHI = PixelShader.GetPixelShader();

            SetGraphicsPipelineState( RHICmdList, GraphicsPSOInit );
            
            SetShaderParameters(RHICmdList, VertexShader, VertexShader.GetVertexShader(), RasterPassParameters->Common);
            SetShaderParameters(RHICmdList, PixelShader, PixelShader.GetPixelShader(), *RasterPassParameters);

            RHICmdList.SetStreamSource( 0, nullptr, 0 );
            // Note: this is an indirect draw, and IndirectArgs is the output of AddPass_InstanceHierarchyAndClusterCull.
            RHICmdList.DrawPrimitiveIndirect(RasterPassParameters->Common.IndirectArgs->GetIndirectRHICallBuffer(), 16);
            RHICmdList.EndRenderPass();
        });
    }

    // Software rasterization (in a compute shader).
    if (Scheduling != ERasterScheduling::HardwareOnly)
    {
        // Set up the software rasterization CS permutation.
        FMicropolyRasterizeCS::FPermutationDomain PermutationVectorCS;
        PermutationVectorCS.Set<FMicropolyRasterizeCS::FAddClusterOffset>(bMainPass ? 0 : 1);
        PermutationVectorCS.Set<FMicropolyRasterizeCS::FMultiViewDim>(bMultiView);
        PermutationVectorCS.Set<FMicropolyRasterizeCS::FHasPrevDrawData>(bHavePrevDrawData);
        PermutationVectorCS.Set<FMicropolyRasterizeCS::FRasterTechniqueDim>(int32(Technique));
        PermutationVectorCS.Set<FMicropolyRasterizeCS::FVisualizeDim>(RasterContext.VisualizeActive && Technique != ERasterTechnique::DepthOnly);
        PermutationVectorCS.Set<FMicropolyRasterizeCS::FNearClipDim>(bNearClip);
        PermutationVectorCS.Set<FMicropolyRasterizeCS::FVirtualTextureTargetDim>(VirtualShadowMapArray != nullptr);
        PermutationVectorCS.Set<FMicropolyRasterizeCS::FClusterPerPageDim>(GNaniteClusterPerPage&& VirtualShadowMapArray != nullptr);

        auto ComputeShader = RasterContext.ShaderMap->GetShader<FMicropolyRasterizeCS>(PermutationVectorCS);

        // Indirect dispatch; the rasterization data and parameters live in CommonPassParameters.
        FComputeShaderUtils::AddPass(
            GraphBuilder,
            bMainPass ? RDG_EVENT_NAME("Main Pass: Rasterize") : RDG_EVENT_NAME("Post Pass: Rasterize"),
            ComputePassFlags,
            ComputeShader,
            CommonPassParameters,
            CommonPassParameters->IndirectArgs,
            0);
    }
}

To examine the hardware and software rasterization processes in more depth, we need to go into their shader logic:

// Engine\Shaders\Private\Nanite\Rasterizer.usf

(......)

// Rasterize one triangle (software rasterization path).
void RasterizeTri(
    FNaniteView NaniteView,
    int4 ViewRect,
    uint PixelValue,
#if VISUALIZE
    uint2 VisualizeValues,
#endif
    float3 Verts[3],
    bool bUsePageTable )
{
    float3 v01 = Verts[1] - Verts[0];
    float3 v02 = Verts[2] - Verts[0];

    // Backface culling
    float DetXY = v01.x * v02.y - v01.y * v02.x;
    if( DetXY >= 0.0f )
    {
        return;
    }

    float InvDet = rcp( DetXY );
    float2 GradZ;
    GradZ.x = ( v01.z * v02.y - v01.y * v02.z ) * InvDet;
    GradZ.y = ( v01.x * v02.z - v01.z * v02.x ) * InvDet;

    // 16.8 fixed point
    float2 Vert0 = Verts[0].xy;
    float2 Vert1 = Verts[1].xy;
    float2 Vert2 = Verts[2].xy;

    // Bounding rectangle
    const float2 MinSubpixel = min3( Vert0, Vert1, Vert2 );
    const float2 MaxSubpixel = max3( Vert0, Vert1, Vert2 );

    // Round to the nearest pixel
    int2 MinPixel = (int2)floor( ( MinSubpixel + (SUBPIXEL_SAMPLES / 2) - 1 ) * (1.0 / SUBPIXEL_SAMPLES) );
    int2 MaxPixel = (int2)floor( ( MaxSubpixel - (SUBPIXEL_SAMPLES / 2) - 1 ) * (1.0 / SUBPIXEL_SAMPLES) );

    // Clip to the view rect.
    MinPixel = max( MinPixel, ViewRect.xy );
    MaxPixel = min( MaxPixel, ViewRect.zw - 1 );
    
    // Cull triangles that cover no pixels
    if( any( MinPixel > MaxPixel ) )
        return;

    // Limit the rasterization bounds to a reasonable maximum.
    MaxPixel = min( MaxPixel, MinPixel + 63 );

    // 4.8 fixed point
    float2 Edge01 = -v01.xy;
    float2 Edge12 = Vert1 - Vert2;
    float2 Edge20 = v02.xy;
    
    // Rebase the vertices on MinPixel, with a half-pixel offset
    // 4.8 fixed point
    // Max triangle size = 127x127 pixels
    const float2 BaseSubpixel = (float2)MinPixel * SUBPIXEL_SAMPLES + (SUBPIXEL_SAMPLES / 2);
    Vert0 -= BaseSubpixel;
    Vert1 -= BaseSubpixel;
    Vert2 -= BaseSubpixel;

    // Half-edge constants
    // 8.16 fixed point
    float C0 = Edge01.y * Vert0.x - Edge01.x * Vert0.y;
    float C1 = Edge12.y * Vert1.x - Edge12.x * Vert1.y;
    float C2 = Edge20.y * Vert2.x - Edge20.x * Vert2.y;

    // Correct for the fill convention
    // Top left rule for CCW
    C0 -= saturate(Edge01.y + saturate(1.0f - Edge01.x));
    C1 -= saturate(Edge12.y + saturate(1.0f - Edge12.x));
    C2 -= saturate(Edge20.y + saturate(1.0f - Edge20.x));

    float Z0 = Verts[0].z - ( GradZ.x * Vert0.x + GradZ.y * Vert0.y );
    GradZ *= SUBPIXEL_SAMPLES;

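    // Per-row edge values, scaled by 1/SUBPIXEL_SAMPLES so that stepping one whole pixel changes each edge value by exactly Edge.x or Edge.y. SUBPIXEL_SAMPLES sets the subpixel precision of the snapped vertices.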
    float CY0 = C0 * (1.0f / SUBPIXEL_SAMPLES);
    float CY1 = C1 * (1.0f / SUBPIXEL_SAMPLES);
    float CY2 = C2 * (1.0f / SUBPIXEL_SAMPLES);
    float ZY = Z0;

    // Decide whether to use the scanline path
#if COMPILER_SUPPORTS_WAVE_VOTE
    bool bScanLine = WaveActiveAnyTrue( MaxPixel.x - MinPixel.x > 4 );
#else
    bool bScanLine = false;
#endif
    
    if( bScanLine ) // Scanline algorithm.
    {
        float3 Edge012 = { Edge01.y, Edge12.y, Edge20.y };
        bool3 bOpenEdge = Edge012 < 0;
        float3 InvEdge012 = Edge012 == 0 ? 1e8 : rcp( Edge012 );

        int y = MinPixel.y;
        while( true )
        {
            // No longer fixed point
            float3 CrossX = float3( CY0, CY1, CY2 ) * InvEdge012;

            float3 MinX = bOpenEdge ? CrossX : 0;
            float3 MaxX = bOpenEdge ? MaxPixel.x - MinPixel.x : CrossX;

            float x0 = ceil( max3( MinX.x, MinX.y, MinX.z ) );
            float x1 = min3( MaxX.x, MaxX.y, MaxX.z );
            float ZX = ZY + GradZ.x * x0;

            x0 += MinPixel.x;
            x1 += MinPixel.x;
            // Walk every pixel of the span in x and write the pixel data.
            for( float x = x0; x <= x1; x++ )
            {
                // Write the pixel value and depth.
                WritePixel(OutVisBuffer64, PixelValue, uint2(x,y), ZX, NaniteView, bUsePageTable);
            #if VISUALIZE
                WritePixel(OutDbgBuffer64, VisualizeValues.x, uint2(x,y), ZX, NaniteView, bUsePageTable);
                InterlockedAdd(OutDbgBuffer32[uint2(x,y)], VisualizeValues.y);
            #endif

                ZX += GradZ.x;
            }

            if( y >= MaxPixel.y )
                break;

            // Step one row in y
            CY0 += Edge01.x;
            CY1 += Edge12.x;
            CY2 += Edge20.x;
            ZY += GradZ.y;
            y++;
        }
    }
    else // Non-scanline path (bounding-box walk; every pixel is tested against the triangle)
    {
        int y = MinPixel.y;

        while (true)
        {
            int x = MinPixel.x;
            // All three edge values are non-negative: the pixel is inside the triangle.
            if (min3(CY0, CY1, CY2) >= 0)
            {
                WritePixel(OutVisBuffer64, PixelValue, uint2(x, y), ZY, NaniteView, bUsePageTable);
            #if VISUALIZE
                WritePixel(OutDbgBuffer64, VisualizeValues.x, uint2(x, y), ZY, NaniteView, bUsePageTable);
                InterlockedAdd(OutDbgBuffer32[uint2(x, y)], VisualizeValues.y);
            #endif
            }

            if (x < MaxPixel.x)
            {
                float CX0 = CY0 - Edge01.y;
                float CX1 = CY1 - Edge12.y;
                float CX2 = CY2 - Edge20.y;
                float ZX = ZY + GradZ.x;
                x++;

                HOIST_DESCRIPTORS
                while (true)
                {
                    if (min3(CX0, CX1, CX2) >= 0)
                    {
                        WritePixel(OutVisBuffer64, PixelValue, uint2(x, y), ZX, NaniteView, bUsePageTable);
                    #if VISUALIZE
                        WritePixel(OutDbgBuffer64, VisualizeValues.x, uint2(x, y), ZX, NaniteView, bUsePageTable);
                        InterlockedAdd(OutDbgBuffer32[uint2(x, y)], VisualizeValues.y);
                    #endif
                    }

                    if (x >= MaxPixel.x)
                        break;

                    CX0 -= Edge01.y;
                    CX1 -= Edge12.y;
                    CX2 -= Edge20.y;
                    ZX += GradZ.x;
                    x++;
                }
            }

            if (y >= MaxPixel.y)
                break;

            CY0 += Edge01.x;
            CY1 += Edge12.x;
            CY2 += Edge20.x;
            ZY += GradZ.y;
            y++;
        }
    }
}

#if USE_CONSTRAINED_CLUSTERS
groupshared float3 GroupVerts[256];
#else
groupshared float3 GroupVerts[384];
#endif

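// Check the winding order. The default is clockwise (CW); returning true means it must be
// flipped to counter-clockwise (CCW), which happens when exactly one of the two conditions holds.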
bool ReverseWindingOrder(FInstanceSceneData InstanceData)
{
    bool bReverseInstanceCull = (InstanceData.InvNonUniformScaleAndDeterminantSign.w < 0.0f);
    bool bRasterStateReverseCull = (RenderFlags & RENDER_FLAG_REVERSE_CULLING);
    
    // Logical XOR
    return (bReverseInstanceCull != bRasterStateReverseCull);
}

StructuredBuffer< uint2 >    InTotalPrevDrawClusters;
Buffer<uint>                InClusterOffsetSWHW;

groupshared float4x4 LocalToSubpixelLDS;

// Micropolygon rasterization: Nanite's compute-shader software rasterizer.
[numthreads(128, 1, 1)]
void MicropolyRasterize(
    uint    VisibleIndex    : SV_GroupID,
    uint    GroupIndex        : SV_GroupIndex) 
{
    // Compute the visible-cluster index.
#if HAS_PREV_DRAW_DATA
    VisibleIndex += InTotalPrevDrawClusters[0].x;
#endif
#if ADD_CLUSTER_OFFSET
    VisibleIndex += InClusterOffsetSWHW[0];
#endif

    // Fetch the visible cluster and its instance data.
    FVisibleCluster VisibleCluster = GetVisibleCluster( VisibleIndex, VIRTUAL_TEXTURE_TARGET );
    FInstanceSceneData InstanceData = GetInstanceData( VisibleCluster.InstanceId );
    // Fetch the Nanite view.
    FNaniteView NaniteView = GetNaniteView( VisibleCluster.ViewId );

    // Fetch the virtual shadow map page info.
#if CLUSTER_PER_PAGE
    // Scalar
    uint2 vPage = VisibleCluster.vPage;
    FShadowPhysicalPage pPage = ShadowGetPhysicalPage( CalcPageTableLevelOffset( NaniteView.TargetLayerIndex, NaniteView.TargetMipLevel ) + CalcPageOffsetInLevel( NaniteView.TargetMipLevel, vPage ) );
#endif

    float4x4 LocalToSubpixel;

    // InstanceDynamicData is uniform across the group, so compute it once (on thread 0) and store the result in a groupshared variable for later use.
    if( GroupIndex == 0 )
    {
        LocalToSubpixel = CalculateInstanceDynamicData(NaniteView, InstanceData).LocalToClip;
        
        float2 Scale = float2( 0.5, -0.5 ) * NaniteView.ViewSizeAndInvSize.xy * SUBPIXEL_SAMPLES;
        float2 Bias = ( 0.5 * NaniteView.ViewSizeAndInvSize.xy + NaniteView.ViewRect.xy ) * SUBPIXEL_SAMPLES + 0.5f;

#if CLUSTER_PER_PAGE
        Bias += ( (float2)pPage.PageIndex - (float2)vPage ) * VSM_PAGE_SIZE * SUBPIXEL_SAMPLES;
#endif

        LocalToSubpixel._m00_m10_m20_m30 = LocalToSubpixel._m00_m10_m20_m30 * Scale.x + LocalToSubpixel._m03_m13_m23_m33 * Bias.x;
        LocalToSubpixel._m01_m11_m21_m31 = LocalToSubpixel._m01_m11_m21_m31 * Scale.y + LocalToSubpixel._m03_m13_m23_m33 * Bias.y;

        LocalToSubpixelLDS = LocalToSubpixel;
    }
    
    // Group memory barrier to synchronize the groupshared data.
    GroupMemoryBarrierWithGroupSync();
    LocalToSubpixel = LocalToSubpixelLDS;

    // Fetch the cluster data.
    FCluster Cluster = GetCluster(VisibleCluster.PageIndex, VisibleCluster.ClusterIndex);

    UNROLL
    for( uint i = 0; i < 2; i++ )
    {
        uint VertIndex = GroupIndex + i * 128;
        if( VertIndex < Cluster.NumVerts )
        {
            // Transform the vertex and store it in groupshared memory.
            float3 PointLocal = DecodePosition( VertIndex, Cluster );
            float4 PointClipSubpixel = mul( float4( PointLocal, 1 ), LocalToSubpixel );
            float3 Subpixel = PointClipSubpixel.xyz / PointClipSubpixel.w;
            GroupVerts[ VertIndex ] = float3(floor(Subpixel.xy), Subpixel.z);
        }
    }
    
    // Group memory barrier to synchronize the groupshared data.
    GroupMemoryBarrierWithGroupSync();

    int4 ViewRect = NaniteView.ViewRect;

#if CLUSTER_PER_PAGE
    ViewRect.xy = pPage.PageIndex * VSM_PAGE_SIZE;
    ViewRect.zw = ViewRect.xy + VSM_PAGE_SIZE;
#endif

    if (GroupIndex < Cluster.NumTris)
    {
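        // The triangle ID is this thread's index within the group (SV_GroupIndex).
        uint TriangleID = GroupIndex;
        // Read the triangle's vertex indices, flipping the winding if required.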
        uint3 TriangleIndices = ReadTriangleIndices(Cluster, TriangleID);
        if (ReverseWindingOrder(InstanceData))
        {
            TriangleIndices = uint3(TriangleIndices.x, TriangleIndices.z, TriangleIndices.y);
        }

        // Fetch the triangle's vertex positions.
        float3 Vertices[3];
        Vertices[0] = GroupVerts[TriangleIndices.x];
        Vertices[1] = GroupVerts[TriangleIndices.y];
        Vertices[2] = GroupVerts[TriangleIndices.z];

        // The pixel value packs (visible-cluster index + 1) above the 7-bit triangle ID.
        uint PixelValue = ((VisibleIndex + 1) << 7) | TriangleID;

        // Rasterize the triangle, writing its ID and depth.
        RasterizeTri(
            NaniteView,
            ViewRect,
            PixelValue,
        #if VISUALIZE
            GetVisualizeValues(),
        #endif
            Vertices,
            !CLUSTER_PER_PAGE );
    }
}

#define PIXEL_VALUE                    (RASTER_TECHNIQUE != RASTER_TECHNIQUE_DEPTHONLY)
#define VERTEX_TO_TRIANGLE_MASKS    (NANITE_PRIM_SHADER && PIXEL_VALUE)

struct VSOut
{
    noperspective  float DeviceZ            : TEXCOORD0;
#if PIXEL_VALUE
    nointerpolation uint PixelValue            : TEXCOORD1;
#endif
#if NANITE_MULTI_VIEW
    nointerpolation int4 ViewRect            : TEXCOORD2;
#endif
#if VISUALIZE
    nointerpolation uint2 VisualizeValues    : TEXCOORD3;
#endif
#if VIRTUAL_TEXTURE_TARGET
    nointerpolation int ViewId                : TEXCOORD4;
#endif
#if VERTEX_TO_TRIANGLE_MASKS
    CUSTOM_INTERPOLATION uint4 ToTriangleMasks    : TEXCOORD5;
#endif
    float4 Position                            : SV_Position;
};

// Shared VS logic for hardware rasterization: decode the vertex from the cluster, then transform it to clip space.
VSOut CommonRasterizerVS(FNaniteView NaniteView, FInstanceSceneData InstanceData, FVisibleCluster VisibleCluster, FCluster Cluster, uint VertIndex, out float4 PointClipNoScaling)
{
    VSOut Out;

    float4x4 LocalToWorld = InstanceData.LocalToWorld;

    float3 PointLocal = DecodePosition( VertIndex, Cluster );
    float3 PointRotated = LocalToWorld[0].xyz * PointLocal.xxx + LocalToWorld[1].xyz * PointLocal.yyy + LocalToWorld[2].xyz * PointLocal.zzz;
    float3 PointTranslatedWorld = PointRotated + (LocalToWorld[3].xyz + NaniteView.PreViewTranslation.xyz);
    float4 PointClip = mul( float4( PointTranslatedWorld, 1 ), NaniteView.TranslatedWorldToClip );
    PointClipNoScaling = PointClip;
#if CLUSTER_PER_PAGE
    PointClip.xy = NaniteView.ClipSpaceScaleOffset.xy * PointClip.xy + NaniteView.ClipSpaceScaleOffset.zw * PointClip.w;

    // Offset 0,0 to be at vPage for a 0, VSM_PAGE_SIZE * VSM_RASTER_WINDOW_PAGES viewport.
    PointClip.xy += PointClip.w * ( float2(-2, 2) / VSM_RASTER_WINDOW_PAGES ) * VisibleCluster.vPage;

    Out.ViewRect.xy = VisibleCluster.vPage * VSM_PAGE_SIZE;
    Out.ViewRect.zw = NaniteView.ViewRect.zw;
#elif NANITE_MULTI_VIEW
    PointClip.xy = NaniteView.ClipSpaceScaleOffset.xy * PointClip.xy + NaniteView.ClipSpaceScaleOffset.zw * PointClip.w;
    Out.ViewRect = NaniteView.ViewRect;
#endif
#if VIRTUAL_TEXTURE_TARGET
    Out.ViewId = VisibleCluster.ViewId;
#endif
    Out.Position = PointClip;
    Out.DeviceZ = PointClip.z / PointClip.w;

    // Shader workaround to avoid HW depth clipping. Should be replaced with rasterizer state ideally.
#if !NEAR_CLIP
    Out.Position.z = 0.5f * Out.Position.w;
#endif

#if VISUALIZE
    Out.VisualizeValues = GetVisualizeValues();
#endif
    return Out;
}

#if NANITE_PRIM_SHADER

#pragma argument(wavemode=wave64)
#pragma argument(realtypes)

struct PrimitiveInput
{
    uint Index        : PRIM_SHADER_SEM_VERT_INDEX;
    uint WaveIndex    : PRIM_SHADER_SEM_WAVE_INDEX;
};

struct PrimitiveOutput
{
    VSOut Out;

    uint PrimExport    : PRIM_SHADER_SEM_PRIM_EXPORT;
    uint VertCount    : PRIM_SHADER_SEM_VERT_COUNT;
    uint PrimCount    : PRIM_SHADER_SEM_PRIM_COUNT;
};

// Pack the triangle indices into one uint: x, y and z get 10, 10 and 12 bits.
uint PackTriangleExport(uint3 TriangleIndices)
{
    return TriangleIndices.x | (TriangleIndices.y << 10) | (TriangleIndices.z << 20);
}

// Unpack the triangle indices.
uint3 UnpackTriangleExport(uint Packed)
{
    const uint Index0 = (Packed & 0x3FF); // Extract the low 10 bits.
    const uint Index1 = (Packed >> 10) & 0x3FF; // Extract the middle 10 bits.
    const uint Index2 = (Packed >> 20); // Extract the high 12 bits.
    return uint3(Index0, Index1, Index2);
}

#if VERTEX_TO_TRIANGLE_MASKS // Vertex-to-triangle mask mode.
groupshared uint GroupVertexToTriangleMasks[256][4];
#endif
groupshared uint GroupTriangleCount;
groupshared uint GroupVertexCount;
groupshared uint GroupClusterIndex;

PRIM_SHADER_OUTPUT_TRIANGLES
PRIM_SHADER_PRIM_COUNT(1)
PRIM_SHADER_VERT_COUNT(1)
PRIM_SHADER_VERT_LIMIT(256)
PRIM_SHADER_AMP_FACTOR(128)
PRIM_SHADER_AMP_ENABLE

// Hardware rasterization VS entry point (primitive shader path, optionally with vertex-to-triangle masks).
PrimitiveOutput HWRasterizeVS(PrimitiveInput Input)
{
    const uint LaneIndex = WaveGetLaneIndex();
    const uint LaneCount = WaveGetLaneCount();

    const uint GroupThreadID = LaneIndex + Input.WaveIndex * LaneCount;

    if (GroupThreadID == 0)
    {
        // Input index is only initialized for lane 0, so we need to manually communicate it to all other threads in subgroup (not just wavefront).
        GroupClusterIndex = Input.Index;
    }
    
    GroupMemoryBarrierWithGroupSync();

    // The code below mirrors MicropolyRasterize.
    uint VisibleIndex = GroupClusterIndex;
#if HAS_PREV_DRAW_DATA
    VisibleIndex += InTotalPrevDrawClusters[0].y;
#endif
#if ADD_CLUSTER_OFFSET
    VisibleIndex += InClusterOffsetSWHW[GetHWClusterCounterIndex(RenderFlags)];
#endif
    VisibleIndex = (MaxVisibleClusters - 1) - VisibleIndex;

    // Should be all scalar.
    FVisibleCluster VisibleCluster = GetVisibleCluster( VisibleIndex, VIRTUAL_TEXTURE_TARGET );
    FInstanceSceneData InstanceData = GetInstanceData( VisibleCluster.InstanceId );
    FNaniteView NaniteView = GetNaniteView( VisibleCluster.ViewId );

    FInstanceDynamicData InstanceDynamicData = CalculateInstanceDynamicData(NaniteView, InstanceData);

    FCluster Cluster = GetCluster(VisibleCluster.PageIndex, VisibleCluster.ClusterIndex);

#if VERTEX_TO_TRIANGLE_MASKS
    if (GroupThreadID < Cluster.NumVerts)
    {
        GroupVertexToTriangleMasks[GroupThreadID][0] = 0;
        GroupVertexToTriangleMasks[GroupThreadID][1] = 0;
        GroupVertexToTriangleMasks[GroupThreadID][2] = 0;
        GroupVertexToTriangleMasks[GroupThreadID][3] = 0;
    }
#endif

    GroupMemoryBarrierWithGroupSync();

    PrimitiveOutput PrimOutput;
    PrimOutput.VertCount = Cluster.NumVerts;
    PrimOutput.PrimCount = Cluster.NumTris;

    bool bCullTriangle = false;

    if (GroupThreadID < Cluster.NumTris)
    {
        uint TriangleID = GroupThreadID;
        uint3 TriangleIndices = ReadTriangleIndices(Cluster, TriangleID);
        if (ReverseWindingOrder(InstanceData))
        {
            TriangleIndices = uint3(TriangleIndices.x, TriangleIndices.z, TriangleIndices.y);
        }

#if VERTEX_TO_TRIANGLE_MASKS
        const uint DwordIndex   = (GroupThreadID >> 5) & 3;
        const uint TriangleMask = 1 << (GroupThreadID & 31);
        InterlockedOr(GroupVertexToTriangleMasks[TriangleIndices.x][DwordIndex], TriangleMask);
        InterlockedOr(GroupVertexToTriangleMasks[TriangleIndices.y][DwordIndex], TriangleMask);
        InterlockedOr(GroupVertexToTriangleMasks[TriangleIndices.z][DwordIndex], TriangleMask);
#endif
        PrimOutput.PrimExport = PackTriangleExport(TriangleIndices);
    }

    GroupMemoryBarrierWithGroupSync();

    if (GroupThreadID < Cluster.NumVerts)
    {
        float4 PointClipNoScaling;
        // Decode the vertex and transform it to clip space.
        PrimOutput.Out = CommonRasterizerVS(NaniteView, InstanceData, VisibleCluster, Cluster, GroupThreadID, PointClipNoScaling);
#if VERTEX_TO_TRIANGLE_MASKS
        PrimOutput.Out.PixelValue = ((VisibleIndex + 1) << 7);
        PrimOutput.Out.ToTriangleMasks = uint4(GroupVertexToTriangleMasks[GroupThreadID][0],
                                               GroupVertexToTriangleMasks[GroupThreadID][1],
                                               GroupVertexToTriangleMasks[GroupThreadID][2],
                                               GroupVertexToTriangleMasks[GroupThreadID][3]);
#endif
    }

    return PrimOutput;
}

#else // NANITE_PRIM_SHADER (the regular vertex shader path follows)

// Hardware rasterization VS entry point (regular vertex shader path).
VSOut HWRasterizeVS(
    uint VertexID        : SV_VertexID,
    uint VisibleIndex    : SV_InstanceID
    )
{
#if HAS_PREV_DRAW_DATA
    VisibleIndex += InTotalPrevDrawClusters[0].y;
#endif

#if ADD_CLUSTER_OFFSET
    VisibleIndex += InClusterOffsetSWHW[GetHWClusterCounterIndex(RenderFlags)];
#endif
    VisibleIndex = (MaxVisibleClusters - 1) - VisibleIndex;

    uint TriIndex = VertexID / 3;
    VertexID = VertexID - TriIndex * 3;

    VSOut Out;
    Out.Position = float4(0,0,0,1);
    Out.DeviceZ = 0.0f;

    FVisibleCluster VisibleCluster = GetVisibleCluster( VisibleIndex, VIRTUAL_TEXTURE_TARGET );
    FInstanceSceneData InstanceData = GetInstanceData( VisibleCluster.InstanceId );

    FNaniteView NaniteView = GetNaniteView( VisibleCluster.ViewId );
    FCluster Cluster = GetCluster(VisibleCluster.PageIndex, VisibleCluster.ClusterIndex);

    if( TriIndex < Cluster.NumTris )
    {
        uint3 TriangleIndices = ReadTriangleIndices( Cluster, TriIndex );
        if( ReverseWindingOrder( InstanceData ) )
        {
            TriangleIndices = uint3( TriangleIndices.x, TriangleIndices.z, TriangleIndices.y );
        }

        uint VertIndex = TriangleIndices[ VertexID ];
        float4 PointClipNoScaling;
        // Decode the vertex and transform it to clip space.
        Out = CommonRasterizerVS(NaniteView, InstanceData, VisibleCluster, Cluster, VertIndex, PointClipNoScaling);
    #if PIXEL_VALUE
        Out.PixelValue  = ((VisibleIndex + 1) << 7) | TriIndex;
    #endif
    }

    return Out;
}

#endif // NANITE_PRIM_SHADER

// Hardware rasterization PS entry point.
void HWRasterizePS(VSOut In)
{
    uint2 PixelPos = (uint2)In.Position.xy;

    uint PixelValue = 0;
#if PIXEL_VALUE
    PixelValue = In.PixelValue;
#endif

#if VERTEX_TO_TRIANGLE_MASKS
    uint4 Masks0 = LoadParameterCacheP0( In.ToTriangleMasks );
    uint4 Masks1 = LoadParameterCacheP1( In.ToTriangleMasks );
    uint4 Masks2 = LoadParameterCacheP2( In.ToTriangleMasks );

    uint4 Masks = Masks0 & Masks1 & Masks2;
    uint TriangleIndex =    Masks.x ? firstbitlow( Masks.x ) :
                            Masks.y ? firstbitlow( Masks.y ) + 32 :
                            Masks.z ? firstbitlow( Masks.z ) + 64 :
                            firstbitlow( Masks.w ) + 96;

    PixelValue += TriangleIndex;
#endif

#if VIRTUAL_TEXTURE_TARGET
    FNaniteView NaniteView = GetNaniteView(In.ViewId);
#else
    FNaniteView NaniteView;
#endif

#if CLUSTER_PER_PAGE
    PixelPos += In.ViewRect.xy;
    if (all(PixelPos < In.ViewRect.zw))
#elif NANITE_MULTI_VIEW
    // In multi-view mode every view has its own scissor, so we have to scissor manually.
    if (all(PixelPos >= In.ViewRect.xy && PixelPos < In.ViewRect.zw))
#endif
    {
        // Write the pixel data: cluster/triangle ID (PixelValue) and depth (In.DeviceZ).
        WritePixel(OutVisBuffer64, PixelValue, PixelPos, In.DeviceZ, NaniteView, VIRTUAL_TEXTURE_TARGET);
    #if VISUALIZE
        WritePixel(OutDbgBuffer64, In.VisualizeValues.x, PixelPos, In.DeviceZ, NaniteView, VIRTUAL_TEXTURE_TARGET);
        InterlockedAdd(OutDbgBuffer32[PixelPos], In.VisualizeValues.y);
    #endif
    }
}

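As the analysis above shows, whether rasterization runs in software or in hardware, the only data written are the ClusterID, the triangle ID and the depth (plus extra data in visualization mode). In other words, no real shading happens at this stage. It resembles the BasePass of deferred rendering, but outputs far less information, so the resulting memory IO and video memory usage drop significantly. This technique is in fact the Visibility Buffer; for details, see section 4.2.3.5 Visibility Buffer of Dissecting the Unreal Rendering System (04) - Deferred Rendering Pipeline.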

Storage layout after Nanite rasterization: the ClusterID takes 25 bits, the triangle ID 7 bits, and the depth 32 bits.

Nanite rasterization results; from top to bottom: ClusterID, triangle ID, depth.

After Nanite rasterization, another important step is Nanite::EmitDepthTargets, which outputs the scene depth, stencil, velocity, material depth and other buffers:

The stencil buffer marks which pixels were rendered by Nanite:

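The most interesting one is the material depth. It records which material covers each frontmost pixel of the scene; in essence, it is a material ID converted to a unique depth value and stored in a depth-stencil texture. In effect, every material gets its own gray value, so that later passes can use Early Z for optimization.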

6.4.3.6 Nanite BasePass

This subsection describes how Nanite's BasePass generates the GBuffer. Its main flow in FDeferredShadingSceneRenderer::Render is as follows:

void FDeferredShadingSceneRenderer::Render(FRDGBuilder& GraphBuilder)
{
    (......)
    
    // Render the BasePass, including Nanite.
    {
        // Draw the regular (non-Nanite) BasePass.
        RenderBasePass(GraphBuilder, SceneTextures, DBufferTextures, BasePassDepthStencilAccess, ForwardScreenSpaceShadowMaskTexture, InstanceCullingManager);
        AddServiceLocalQueuePass(GraphBuilder);
        
        if (bNaniteEnabled && bShouldApplyNaniteMaterials)
        {
            for (int32 ViewIndex = 0; ViewIndex < Views.Num(); ++ViewIndex)
            {
                const FViewInfo& View = Views[ViewIndex];
                Nanite::FRasterResults& RasterResults = NaniteRasterResults[ViewIndex];

                // If depth was not emitted in the depth prepass, emit the depth targets now
                if (!bNeedsPrePass)
                {
                    Nanite::EmitDepthTargets(
                        GraphBuilder,
                        *Scene,
                        Views[ViewIndex],
                        RasterResults.SOAStrides,
                        RasterResults.VisibleClustersSWHW,
                        RasterResults.ViewsBuffer,
                        SceneTextures.Depth.Target,
                        RasterResults.VisBuffer64,
                        RasterResults.MaterialDepth,
                        RasterResults.NaniteMask,
                        RasterResults.VelocityBuffer,
                        bNeedsPrePass
                    );
                }

                // Draw the Nanite BasePass.
                Nanite::DrawBasePass(
                    GraphBuilder,
                    SceneTextures,
                    DBufferTextures,
                    *Scene,
                    View,
                    RasterResults
                );
            }
        }

        // Resolve scene depth.
        if (!bAllowReadOnlyDepthBasePass)
        {
            AddResolveSceneDepthPass(GraphBuilder, Views, SceneTextures.Depth);
        }

        (......)
    }
    
    (......)
}

Note that the code above draws the BasePass twice: once through the traditional RenderBasePass, and once in Nanite mode through Nanite::DrawBasePass. Nanite::DrawBasePass is analyzed below:

// Engine\Source\Runtime\Renderer\Private\Nanite\NaniteRender.cpp

void DrawBasePass(
    FRDGBuilder& GraphBuilder,
    const FSceneTextures& SceneTextures,
    const FDBufferTextures& DBufferTextures,
    const FScene& Scene,
    const FViewInfo& View,
    const FRasterResults& RasterResults
)
{
    (......)
    
    RDG_EVENT_SCOPE(GraphBuilder, "Nanite::BasePass");

    const int32 ViewWidth        = View.ViewRect.Max.X - View.ViewRect.Min.X;
    const int32 ViewHeight        = View.ViewRect.Max.Y - View.ViewRect.Min.Y;
    const FIntPoint ViewSize    = FIntPoint(ViewWidth, ViewHeight);

    const FRDGSystemTextures& SystemTextures = FRDGSystemTextures::Get(GraphBuilder);

    FRenderTargetBindingSlots GBufferRenderTargets;
    SceneTextures.GetGBufferRenderTargets(ERenderTargetLoadAction::ELoad, GBufferRenderTargets);

    // Initialize texture references, falling back to the black system texture when a result is missing.
    FRDGTextureRef MaterialDepth    = RasterResults.MaterialDepth ? RasterResults.MaterialDepth : SystemTextures.Black;
    FRDGTextureRef VisBuffer64        = RasterResults.VisBuffer64   ? RasterResults.VisBuffer64   : SystemTextures.Black;
    FRDGTextureRef DbgBuffer64        = RasterResults.DbgBuffer64   ? RasterResults.DbgBuffer64   : SystemTextures.Black;
    FRDGTextureRef DbgBuffer32        = RasterResults.DbgBuffer32   ? RasterResults.DbgBuffer32   : SystemTextures.Black;

    FRDGBufferRef VisibleClustersSWHW    = RasterResults.VisibleClustersSWHW;

    // Check the material culling mode. Modes 1 and 2 need wave operations (SM6); platforms without them switch to mode 4.
    if (!FDataDrivenShaderPlatformInfo::GetSupportsWaveOperations(GMaxRHIShaderPlatform) &&
        (GNaniteMaterialCulling == 1 || GNaniteMaterialCulling == 2))
    {
        UE_LOG(LogNanite, Warning, TEXT("r.Nanite.MaterialCulling set to %d which requires wave-ops (not supported on this platform), switching to mode 4"), GNaniteMaterialCulling);
        GNaniteMaterialCulling = 4;
    }

    // Work on a local copy, so this view can be overridden without changing the global setting for all views.
    int32 NaniteMaterialCulling = GNaniteMaterialCulling;
    if ((NaniteMaterialCulling == 1 || NaniteMaterialCulling == 2) && (View.ViewRect.Min.X != 0 || View.ViewRect.Min.Y != 0))
    {
        NaniteMaterialCulling = 4;

        static bool bLoggedAlready = false;
        if (!bLoggedAlready)
        {
            bLoggedAlready = true;
            UE_LOG(LogNanite, Warning, TEXT("View has non-zero viewport offset, using material culling mode 4 (overrides r.Nanite.MaterialCulling = %d)."), GNaniteMaterialCulling);
        }
    }

    // 32-bit mask culling (modes 1 and 2)
    const bool b32BitMaskCulling = (NaniteMaterialCulling == 1 || NaniteMaterialCulling == 2);
    // Tile-grid culling (modes 3 and 4)
    const bool bTileGridCulling  = (NaniteMaterialCulling == 3 || NaniteMaterialCulling == 4);

    const FIntPoint TileGridDim = bTileGridCulling ? FMath::DivideAndRoundUp(ViewSize, { 64, 64 }) : FIntPoint(1, 1);

    // Create the buffer and texture.
    FRDGBufferDesc     VisibleMaterialsDesc    = FRDGBufferDesc::CreateStructuredDesc(4, b32BitMaskCulling ? FNaniteCommandInfo::MAX_STATE_BUCKET_ID+1 : 1);
    FRDGBufferRef      VisibleMaterials        = GraphBuilder.CreateBuffer(VisibleMaterialsDesc, TEXT("Nanite.VisibleMaterials"));
    FRDGBufferUAVRef   VisibleMaterialsUAV    = GraphBuilder.CreateUAV(VisibleMaterials);
    FRDGTextureDesc    MaterialRangeDesc    = FRDGTextureDesc::Create2D(TileGridDim, PF_R32G32_UINT, FClearValueBinding::Black, TexCreate_ShaderResource | TexCreate_UAV);
    FRDGTextureRef     MaterialRange        = GraphBuilder.CreateTexture(MaterialRangeDesc, TEXT("Nanite.MaterialRange"));
    FRDGTextureUAVRef  MaterialRangeUAV        = GraphBuilder.CreateUAV(MaterialRange);
    FRDGTextureSRVDesc MaterialRangeSRVDesc    = FRDGTextureSRVDesc::Create(MaterialRange);
    FRDGTextureSRVRef  MaterialRangeSRV        = GraphBuilder.CreateSRV(MaterialRangeSRVDesc);

    // Clear the textures and buffers.
    AddClearUAVPass(GraphBuilder, VisibleMaterialsUAV, 0);
    AddClearUAVPass(GraphBuilder, MaterialRangeUAV, { 0u, 1u, 0u, 0u });

    // Classify materials for bitmask or tile-grid culling.
    if (b32BitMaskCulling || bTileGridCulling)
    {
        FClassifyMaterialsCS::FParameters* PassParameters = GraphBuilder.AllocParameters<FClassifyMaterialsCS::FParameters>();
        PassParameters->View                    = View.ViewUniformBuffer;
        PassParameters->VisibleClustersSWHW        = GraphBuilder.CreateSRV(VisibleClustersSWHW);
        PassParameters->SOAStrides                = RasterResults.SOAStrides;
        PassParameters->ClusterPageData            = Nanite::GStreamingManager.GetClusterPageDataSRV();
        PassParameters->ClusterPageHeaders        = Nanite::GStreamingManager.GetClusterPageHeadersSRV();
        PassParameters->VisBuffer64                = VisBuffer64;
        PassParameters->MaterialDepthTable        = Scene.MaterialTables[ENaniteMeshPass::BasePass].GetDepthTableSRV();

        uint32 DispatchGroupSize = 0;

        PassParameters->ViewRect = FIntVector4(View.ViewRect.Min.X, View.ViewRect.Min.Y, View.ViewRect.Max.X, View.ViewRect.Max.Y);
        if (b32BitMaskCulling)
        {
            checkf(View.ViewRect.Min.X == 0 && View.ViewRect.Min.Y == 0, TEXT("Viewport offset support is not implemented."));
            DispatchGroupSize = 8;
            PassParameters->VisibleMaterials = VisibleMaterialsUAV;

        }
        else if (bTileGridCulling)
        {
            DispatchGroupSize = 64;
            PassParameters->FetchClamp = View.ViewRect.Max - 1;
            PassParameters->MaterialRange = MaterialRangeUAV;
        }

        const FIntVector DispatchDim = FComputeShaderUtils::GetGroupCount(View.ViewRect.Max - View.ViewRect.Min, DispatchGroupSize);

        FClassifyMaterialsCS::FPermutationDomain PermutationVector;
        PermutationVector.Set<FClassifyMaterialsCS::FCullingMethodDim>(NaniteMaterialCulling);
        auto ComputeShader = View.ShaderMap->GetShader<FClassifyMaterialsCS>(PermutationVector.ToDimensionValueId());

        // Compute shader pass that classifies materials.
        FComputeShaderUtils::AddPass(
            GraphBuilder,
            RDG_EVENT_NAME("Classify Materials"),
            ComputeShader,
            PassParameters,
            DispatchDim
        );
    }

    // Render the GBuffer.
    {
        // Fill in the pass parameters.
        FNaniteEmitGBufferParameters* PassParameters = GraphBuilder.AllocParameters<FNaniteEmitGBufferParameters>();

        PassParameters->SOAStrides    = RasterResults.SOAStrides;
        PassParameters->MaxVisibleClusters    = RasterResults.MaxVisibleClusters;
        PassParameters->MaxNodes    = RasterResults.MaxNodes;
        PassParameters->RenderFlags    = RasterResults.RenderFlags;
            
        PassParameters->ClusterPageData        = Nanite::GStreamingManager.GetClusterPageDataSRV(); 
        PassParameters->ClusterPageHeaders    = Nanite::GStreamingManager.GetClusterPageHeadersSRV();

        PassParameters->VisibleClustersSWHW = GraphBuilder.CreateSRV(VisibleClustersSWHW);

        PassParameters->MaterialRange = MaterialRange;
        PassParameters->VisibleMaterials = GraphBuilder.CreateSRV(VisibleMaterials, PF_R32_UINT);

        PassParameters->VisBuffer64 = VisBuffer64; // Visibility buffer
        PassParameters->DbgBuffer64 = DbgBuffer64;
        PassParameters->DbgBuffer32 = DbgBuffer32;
        PassParameters->RenderTargets = GBufferRenderTargets; // Render targets

        // Uniform Buffer
        PassParameters->View = View.ViewUniformBuffer; // To get VTFeedbackBuffer
        PassParameters->BasePass = CreateOpaqueBasePassUniformBuffer(GraphBuilder, View, 0, {}, DBufferTextures, nullptr);

        switch (NaniteMaterialCulling)
        {
        // Render with an 8x4 grid: 32 cells in total, one bit per cell.
        case 1:
        case 2:
            PassParameters->GridSize.X = 8;
            PassParameters->GridSize.Y = 4;
            break;

        // Render in 64x64-pixel tiles.
        case 3:
        case 4:
            PassParameters->GridSize = FMath::DivideAndRoundUp(View.ViewRect.Max - View.ViewRect.Min, { 64, 64 });
            break;

        // Render a single full-screen quad.
        default:
            PassParameters->GridSize.X = 1;
            PassParameters->GridSize.Y = 1;
            break;
        }

        const FExclusiveDepthStencil MaterialDepthStencil = UseComputeDepthExport()
            ? FExclusiveDepthStencil::DepthWrite_StencilNop
            : FExclusiveDepthStencil::DepthWrite_StencilWrite;

        PassParameters->RenderTargets.DepthStencil = FDepthStencilBinding(
            MaterialDepth,
            ERenderTargetLoadAction::ELoad,
            ERenderTargetLoadAction::ELoad,
            MaterialDepthStencil
        );

        TShaderMapRef<FNaniteMaterialVS> NaniteVertexShader(View.ShaderMap);

        // Add the raster pass.
        GraphBuilder.AddPass(
            RDG_EVENT_NAME("Emit GBuffer"),
            PassParameters,
            ERDGPassFlags::Raster,
            [PassParameters, &Scene, NaniteVertexShader, ViewRect = View.ViewRect, NaniteMaterialCulling](FRHICommandListImmediate& RHICmdList)
        {
            RHICmdList.SetViewport(ViewRect.Min.X, ViewRect.Min.Y, 0.0f, ViewRect.Max.X, ViewRect.Max.Y, 1.0f);

            // Fill in the Nanite uniform buffer parameters.
            FNaniteUniformParameters UniformParams;
            UniformParams.SOAStrides = PassParameters->SOAStrides;
            UniformParams.MaxVisibleClusters = PassParameters->MaxVisibleClusters;
            UniformParams.MaxNodes = PassParameters->MaxNodes;
            UniformParams.RenderFlags = PassParameters->RenderFlags;

            UniformParams.MaterialConfig.X = NaniteMaterialCulling;
            UniformParams.MaterialConfig.Y = PassParameters->GridSize.X;
            UniformParams.MaterialConfig.Z = PassParameters->GridSize.Y;
            UniformParams.MaterialConfig.W = 0;

            UniformParams.RectScaleOffset = FVector4(1.0f, 1.0f, 0.0f, 0.0f); // Render a rect that covers the entire screen

            // Tile-grid culling modes: scale the rect so that the tile grid covers the view.
            if (NaniteMaterialCulling == 3 || NaniteMaterialCulling == 4)
            {
                FIntPoint ScaledSize = PassParameters->GridSize * 64;
                UniformParams.RectScaleOffset.X = float(ScaledSize.X) / float(ViewRect.Max.X - ViewRect.Min.X);
                UniformParams.RectScaleOffset.Y = float(ScaledSize.Y) / float(ViewRect.Max.Y - ViewRect.Min.Y);
            }

            // Cluster page data and visible clusters
            UniformParams.ClusterPageData = PassParameters->ClusterPageData;
            UniformParams.ClusterPageHeaders = PassParameters->ClusterPageHeaders;
            UniformParams.VisibleClustersSWHW = PassParameters->VisibleClustersSWHW->GetRHI();

            // Material data
            UniformParams.MaterialRange = PassParameters->MaterialRange->GetRHI();
            UniformParams.VisibleMaterials = PassParameters->VisibleMaterials->GetRHI();

            // Visibility and debug buffers
            UniformParams.VisBuffer64 = PassParameters->VisBuffer64->GetRHI();
            UniformParams.DbgBuffer64 = PassParameters->DbgBuffer64->GetRHI();
            UniformParams.DbgBuffer32 = PassParameters->DbgBuffer32->GetRHI();
            const_cast<FScene&>(Scene).UniformBuffers.NaniteUniformBuffer.UpdateUniformBufferImmediate(UniformParams);

            FGraphicsMinimalPipelineStateSet GraphicsMinimalPipelineStateSet;

            TArray<FNaniteMaterialPassCommand, SceneRenderingAllocator> NaniteMaterialPassCommands;
            // Build the Nanite material pass commands.
            BuildNaniteMaterialPassCommands(RHICmdList, Scene.NaniteDrawCommands[ENaniteMeshPass::BasePass], NaniteMaterialPassCommands);

            FMeshDrawCommandStateCache StateCache;

            const uint32 TileCount = UniformParams.MaterialConfig.Y * UniformParams.MaterialConfig.Z; // (W * H)
            // Iterate over all material pass commands and submit them one by one.
            for (auto CommandsIt = NaniteMaterialPassCommands.CreateConstIterator(); CommandsIt; ++CommandsIt)
            {
                SubmitNaniteMaterialPassCommand(*CommandsIt, NaniteVertexShader, GraphicsMinimalPipelineStateSet, TileCount, RHICmdList, StateCache);
            }
        });
    }
}
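The tile-grid modes (3 and 4) in the code above reduce the visibility buffer to one material range per 64x64 tile before any material is shaded. The following is a minimal CPU-side sketch of that reduction, assuming every covered pixel has already been resolved to an integer material ID and that a tile simply stores the smallest and largest ID it contains. `MaterialRange`, `ClassifyTiles` and the min/max layout are illustrative assumptions, not the actual `FClassifyMaterialsCS` shader or the real encoding of `Nanite.MaterialRange`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr uint32_t kNoMaterial = UINT32_MAX;  // marks pixels not covered by Nanite geometry

// Hypothetical per-tile record: smallest and largest material ID seen in the tile.
// An empty tile keeps Min > Max, so Contains() is false for every ID.
struct MaterialRange {
    uint32_t Min = UINT32_MAX;
    uint32_t Max = 0;
    bool Contains(uint32_t Id) const { return Id >= Min && Id <= Max; }
};

// Reduce a Width x Height buffer of material IDs to one MaterialRange per tile.
std::vector<MaterialRange> ClassifyTiles(const std::vector<uint32_t>& MaterialIds,
                                         int Width, int Height, int TileSize,
                                         int& OutTilesX, int& OutTilesY) {
    assert(TileSize > 0 && MaterialIds.size() == std::size_t(Width) * Height);
    // Same rounding as FMath::DivideAndRoundUp(ViewSize, { 64, 64 }).
    OutTilesX = (Width + TileSize - 1) / TileSize;
    OutTilesY = (Height + TileSize - 1) / TileSize;
    std::vector<MaterialRange> Tiles(std::size_t(OutTilesX) * OutTilesY);
    for (int Y = 0; Y < Height; ++Y) {
        for (int X = 0; X < Width; ++X) {
            const uint32_t Id = MaterialIds[std::size_t(Y) * Width + X];
            if (Id == kNoMaterial) continue;
            MaterialRange& Tile = Tiles[std::size_t(Y / TileSize) * OutTilesX + X / TileSize];
            Tile.Min = std::min(Tile.Min, Id);
            Tile.Max = std::max(Tile.Max, Id);
        }
    }
    return Tiles;
}
```

For a 1280x768 view with 64-pixel tiles, for example, this produces a 20x12 grid. A range test is conservative: an ID lying between `Min` and `Max` passes even if it never occurs in the tile, which wastes some work but never drops a visible pixel.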

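Before the BasePass is rendered, a material classification pass (Classify Materials) must run; it is what makes the subsequent material culling possible. It uses a compute shader to analyze the full-screen Visibility Buffer and outputs one pixel per 64×64 screen tile (20x12 = 240 pixels in the captured frame). This texture is called the material range and uses the R32G32_UINT format; each pixel encodes the range of materials that appear in the 64×64 region of its tile. It is visualized as follows: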

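The Wave Operations used in the code above are a Direct3D concept; the Vulkan counterpart is Subgroups. They are only available with Shader Model 6 or later. For details, see the GDC 2017 talk Wave-Programming-D3D12-Vulkan.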
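Modes 1 and 2 use the fixed 8x4 grid set up in the `GridSize` switch instead, and `Nanite.VisibleMaterials` holds one 32-bit word per state bucket, one bit per grid cell. The sketch below assumes bit `y * 8 + x` marks cell (x, y) and processes pixels serially; on the GPU, wave operations presumably let the lanes of a wave combine their bits before touching memory, which would explain why these modes need SM6. The bit order and the function name `BuildMaterialMasks` are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr int kGridX = 8;  // 8 columns
constexpr int kGridY = 4;  // 4 rows -> 32 cells, one bit each

// Build one 32-bit coverage mask per material ID (hypothetical layout: bit = y * 8 + x).
std::vector<uint32_t> BuildMaterialMasks(const std::vector<uint32_t>& MaterialIds,
                                         int Width, int Height, uint32_t NumMaterials) {
    assert(MaterialIds.size() == std::size_t(Width) * Height);
    std::vector<uint32_t> Masks(NumMaterials, 0u);
    for (int Y = 0; Y < Height; ++Y) {
        for (int X = 0; X < Width; ++X) {
            const uint32_t Id = MaterialIds[std::size_t(Y) * Width + X];
            if (Id >= NumMaterials) continue;       // pixel not covered by any material
            const int CellX = X * kGridX / Width;   // 0..7
            const int CellY = Y * kGridY / Height;  // 0..3
            Masks[Id] |= 1u << (CellY * kGridX + CellX);
        }
    }
    return Masks;
}
```

A material pass can then skip every grid cell whose bit is clear in that material's word.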

The source data used above to build the Nanite material pass draw commands is Scene.NaniteDrawCommands[ENaniteMeshPass::BasePass], which is generated in FPrimitiveSceneInfo::UpdateStaticMeshes. The call stack is as follows:

// Engine\Source\Runtime\Renderer\Private\PrimitiveSceneInfo.cpp

void FPrimitiveSceneInfo::UpdateStaticMeshes(FRHICommandListImmediate& RHICmdList, FScene* Scene, const TArrayView<FPrimitiveSceneInfo*>& SceneInfos, bool bReAddToDrawLists)
{
    (......)

    if (bReAddToDrawLists)
    {
        CacheMeshDrawCommands(RHICmdList, Scene, SceneInfos);
        // Cache the Nanite draw commands.
        CacheNaniteDrawCommands(RHICmdList, Scene, SceneInfos);
    }
}

void FPrimitiveSceneInfo::CacheNaniteDrawCommands(FRHICommandListImmediate& RHICmdList, FScene* Scene, const TArrayView<FPrimitiveSceneInfo*>& SceneInfos)
{
    (......)
    
    // Build Nanite draw commands for each primitive scene info in the scene.
    for (FPrimitiveSceneInfo* PrimitiveSceneInfo : SceneInfos)
    {
        BuildNaniteDrawCommands(RHICmdList, Scene, PrimitiveSceneInfo);
    }
    
    (......)
}

void BuildNaniteDrawCommands(FRHICommandListImmediate& RHICmdList, FScene* Scene, FPrimitiveSceneInfo* PrimitiveSceneInfo)
{
    (......)

    for (int32 MeshPass = 0; MeshPass < ENaniteMeshPass::Num; ++MeshPass)
    {
        FNaniteDrawListContext NaniteDrawListContext(Scene->NaniteDrawCommandLock[MeshPass], Scene->NaniteDrawCommands[MeshPass]);

        // Create the Nanite mesh pass processor for this mesh pass.
        FMeshPassProcessor* NaniteMeshProcessor = nullptr;
        switch (MeshPass)
        {
            case ENaniteMeshPass::BasePass:
                NaniteMeshProcessor = CreateNaniteMeshProcessor(Scene, nullptr, &NaniteDrawListContext);
                break;
            case ENaniteMeshPass::LumenCardCapture:
                NaniteMeshProcessor = CreateLumenCardNaniteMeshProcessor(Scene, nullptr, &NaniteDrawListContext);
                break;
            default:
                check(false);
        }

        // Iterate over all static meshes and build Nanite draw commands for those that support Nanite rendering.
        int32 StaticMeshesCount = PrimitiveSceneInfo->StaticMeshes.Num();
        for (int32 MeshIndex = 0; MeshIndex < StaticMeshesCount; ++MeshIndex)
        {
            FStaticMeshBatchRelevance& MeshRelevance = PrimitiveSceneInfo->StaticMeshRelevances[MeshIndex];
            FStaticMeshBatch& Mesh = PrimitiveSceneInfo->StaticMeshes[MeshIndex];

            if (MeshRelevance.bSupportsNaniteRendering)
            {
                uint64 BatchElementMask = ~0ull;
                // Add the mesh batch to the processor; the remaining steps match the traditional path and are not traced further.
                NaniteMeshProcessor->AddMeshBatch(Mesh, BatchElementMask, Proxy);
                FNaniteCommandInfo CommandInfo = NaniteDrawListContext.GetCommandInfoAndReset();
                PrimitiveSceneInfo->NaniteCommandInfos[MeshPass].Add(CommandInfo);
                const uint32 MaterialDepthId = CommandInfo.GetMaterialId();
                const uint32 SectionIndex = Mesh.SegmentIndex;
                PrimitiveSceneInfo->NaniteMaterialIds[MeshPass][SectionIndex] = MaterialDepthId;
            }
        }

        NaniteMeshProcessor->~FMeshPassProcessor();
    }
    
    (......)
}
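`FNaniteDrawListContext` writes into `Scene->NaniteDrawCommands[MeshPass]`, an `FStateBucketMap`, and the `FNaniteCommandInfo` it returns yields the material ID stored per section. The underlying idea appears to be that identical draw commands from different primitives share one state bucket, whose index then serves as the material ID. A simplified model with a hash map follows; `DrawCommandKey`, `StateBuckets` and the reference count are stand-ins, not engine types.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Simplified stand-in for FMeshDrawCommand: two commands count as identical when
// their shader and binding description (encoded here as one string) match.
using DrawCommandKey = std::string;

class StateBuckets {
public:
    // Return the bucket index for Key, creating a bucket on first use.
    // Identical commands from different primitives reuse the same index.
    uint32_t Add(const DrawCommandKey& Key) {
        auto It = Index.find(Key);
        if (It != Index.end()) {
            ++RefCounts[It->second];
            return It->second;
        }
        const uint32_t Id = static_cast<uint32_t>(RefCounts.size());
        Index.emplace(Key, Id);
        RefCounts.push_back(1);
        return Id;
    }

    std::size_t Num() const { return RefCounts.size(); }

    uint32_t RefCount(uint32_t Id) const {
        assert(Id < RefCounts.size());
        return RefCounts[Id];
    }

private:
    std::unordered_map<DrawCommandKey, uint32_t> Index;
    std::vector<uint32_t> RefCounts;  // number of primitives referencing each bucket
};
```

Because the bucket index is small and stable, it can be turned into a per-material depth value, which is what `BuildNaniteMaterialPassCommands` does through `FNaniteCommandInfo::GetDepthId`.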

Next, let's examine two important functions used by Nanite::DrawBasePass, BuildNaniteMaterialPassCommands and SubmitNaniteMaterialPassCommand:

// Engine\Source\Runtime\Renderer\Private\Nanite\NaniteRender.cpp

// Build the Nanite material pass commands.
static void BuildNaniteMaterialPassCommands(
    FRHICommandListImmediate& RHICmdList,
    const FStateBucketMap& NaniteDrawCommands,
    TArray<FNaniteMaterialPassCommand, SceneRenderingAllocator>& OutNaniteMaterialPassCommands)
{
    OutNaniteMaterialPassCommands.Reset(NaniteDrawCommands.Num());
    FGraphicsMinimalPipelineStateSet GraphicsMinimalPipelineStateSet;
    const int32 MaterialSortMode = GNaniteMaterialSortMode;

    // Build an FNaniteMaterialPassCommand for every Nanite draw command.
    for (auto& Command : NaniteDrawCommands)
    {
        // Construct the FNaniteMaterialPassCommand instance.
        FNaniteMaterialPassCommand PassCommand(Command.Key);

        Experimental::FHashElementId SetId = NaniteDrawCommands.FindId(Command.Key);

        int32 DrawIdx = SetId.GetIndex();
        PassCommand.MaterialDepth = FNaniteCommandInfo::GetDepthId(DrawIdx);

        // In sort mode 2, replace the default sort key with the pipeline state's sort key.
        if (MaterialSortMode == 2 && GRHISupportsPipelineStateSortKey)
        {
            const FMeshDrawCommand& MeshDrawCommand = Command.Key;
            const FGraphicsMinimalPipelineStateInitializer& MeshPipelineState = MeshDrawCommand.CachedPipelineId.GetPipelineState(GraphicsMinimalPipelineStateSet);
            FGraphicsPipelineState* PipelineState = PipelineStateCache::GetAndOrCreateGraphicsPipelineState(RHICmdList, MeshPipelineState.AsGraphicsPipelineStateInitializer(), EApplyRendertargetOption::DoNothing);
            if (PipelineState)
            {
                const uint64 StateSortKey = PipelineStateCache::RetrieveGraphicsPipelineStateSortKey(PipelineState);
                if (StateSortKey != 0)
                {
                    PassCommand.SortKey = StateSortKey;
                }
            }
        }

        // Append to the command list.
        OutNaniteMaterialPassCommands.Emplace(PassCommand);
    }

    // Sort the material pass commands.
    if (MaterialSortMode != 0)
    {
        OutNaniteMaterialPassCommands.Sort();
    }
}

// Submit a single material pass draw command.
static void SubmitNaniteMaterialPassCommand(
    const FMeshDrawCommand& MeshDrawCommand,
    const float MaterialDepth,
    const TShaderRef<FNaniteMaterialVS>& NaniteVertexShader,
    const FGraphicsMinimalPipelineStateSet& GraphicsMinimalPipelineStateSet,
    const uint32 InstanceFactor,
    FRHICommandList& RHICmdList,
    FMeshDrawCommandStateCache& StateCache)
{
    // Begin the draw submission.
    FMeshDrawCommand::SubmitDrawBegin(MeshDrawCommand, GraphicsMinimalPipelineStateSet, nullptr, 0, InstanceFactor, RHICmdList, StateCache);

    // All Nanite mesh draw commands share the same VS; the material depth is assigned at render time.
    {
        FNaniteMaterialVS::FParameters Parameters;
        Parameters.MaterialDepth = MaterialDepth;
        SetShaderParameters(RHICmdList, NaniteVertexShader, NaniteVertexShader.GetVertexShader(), Parameters);
    }

    // End the draw submission.
    FMeshDrawCommand::SubmitDrawEnd(MeshDrawCommand, InstanceFactor, RHICmdList);
}
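The build-and-sort step of `BuildNaniteMaterialPassCommands` can be modeled as follows: each bucket becomes a pass command with a material depth derived from its bucket index, mode 2 substitutes a pipeline-state sort key when one is available (non-zero), and any non-zero sort mode sorts the list so commands sharing a pipeline state end up adjacent. `PassCommand`, `DepthFromIndex`, the depth encoding and the default key (the bucket index) are assumptions for illustration; the engine's `FNaniteCommandInfo::GetDepthId` and default sort key may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct PassCommand {
    uint32_t BucketIndex = 0;
    float MaterialDepth = 0.0f;
    uint64_t SortKey = 0;
    bool operator<(const PassCommand& Other) const { return SortKey < Other.SortKey; }
};

// Hypothetical encoding: distinct indices map to distinct depths in (0, 1].
float DepthFromIndex(uint32_t Index, uint32_t MaxIndex) {
    return float(Index + 1) / float(MaxIndex + 1);
}

// PsoSortKeys[i] is the pipeline-state sort key of bucket i, or 0 if unavailable.
std::vector<PassCommand> BuildPassCommands(const std::vector<uint64_t>& PsoSortKeys,
                                           int SortMode, uint32_t MaxIndex) {
    assert(PsoSortKeys.size() <= std::size_t(MaxIndex) + 1);
    std::vector<PassCommand> Commands;
    Commands.reserve(PsoSortKeys.size());
    for (uint32_t i = 0; i < PsoSortKeys.size(); ++i) {
        PassCommand Cmd;
        Cmd.BucketIndex = i;
        Cmd.MaterialDepth = DepthFromIndex(i, MaxIndex);
        Cmd.SortKey = i;  // default key in this sketch
        if (SortMode == 2 && PsoSortKeys[i] != 0) {
            Cmd.SortKey = PsoSortKeys[i];  // group commands that share pipeline state
        }
        Commands.push_back(Cmd);
    }
    if (SortMode != 0) {
        std::stable_sort(Commands.begin(), Commands.end());
    }
    return Commands;
}
```

Sorting by pipeline state reduces PSO switches between consecutive material passes, while the material depth stays tied to the bucket so the depth test still selects the right pixels.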

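Curiously, drawing the BasePass only specifies a VS and no PS. So where is the PS set, or is it simply empty? To find out, a frame was captured and analyzed in RenderDoc: the PS is still the traditional BasePassPixelShader, and the GBuffer produced by this stage is essentially the same as in the traditional pipeline: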

Top left: rendered frame; top right: GBufferA; bottom left: GBufferB; bottom right: GBufferC.

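During the BasePass, Nanite submits one pass per material, which means the previously rendered material range texture and material depth can be used for fast culling. Take the following two images as an example: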

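When the material regions in the first image are rendered, pixels are quickly tested and culled using the material depth and the material range. As the second image shows, a red box marks a tile whose pixels all fail the material range test, so the whole tile is discarded by the vertex shader, while green pixels have passed both the depth test and the material range test and are sent to the PS to write the GBuffer.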
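The two-level rejection just described can be sketched on the CPU: a whole tile is skipped when the material's ID falls outside the tile's [Min, Max] range (the role played by the vertex shader), and inside surviving tiles a pixel is shaded only where the material buffer holds exactly this material. To avoid float comparisons, the sketch stores integer material IDs where the engine stores a material depth; `TileRange`, `RunMaterialPass` and the counters are hypothetical, and on the GPU the second step is the fixed-function depth test rather than code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct TileRange { uint32_t Min, Max; };  // per-tile material ID range (assumed layout)

struct ShadeStats {
    int TilesCulled = 0;   // tiles rejected before any pixel work ("red" tiles)
    int PixelsShaded = 0;  // pixels passing both tests ("green" pixels)
};

// Simulate one material pass over a TilesX x TilesY grid of TileSize-pixel tiles.
ShadeStats RunMaterialPass(uint32_t MaterialId,
                           const std::vector<TileRange>& Tiles, int TilesX, int TilesY,
                           const std::vector<uint32_t>& MaterialBuffer,
                           int Width, int Height, int TileSize) {
    assert(Tiles.size() == std::size_t(TilesX) * TilesY);
    ShadeStats Stats;
    for (int TY = 0; TY < TilesY; ++TY) {
        for (int TX = 0; TX < TilesX; ++TX) {
            const TileRange& R = Tiles[std::size_t(TY) * TilesX + TX];
            if (MaterialId < R.Min || MaterialId > R.Max) {
                ++Stats.TilesCulled;  // whole tile discarded
                continue;
            }
            // Per-pixel equality test: only pixels owned by this material are shaded.
            for (int Y = TY * TileSize; Y < std::min(Height, (TY + 1) * TileSize); ++Y)
                for (int X = TX * TileSize; X < std::min(Width, (TX + 1) * TileSize); ++X)
                    if (MaterialBuffer[std::size_t(Y) * Width + X] == MaterialId)
                        ++Stats.PixelsShaded;
        }
    }
    return Stats;
}
```

In the test below, material 7 shows why both levels are needed: the right tile's range [5, 9] lets it through even though no pixel uses material 7, and the per-pixel test then rejects everything.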

6.4.3.7 Nanite Lighting and Shadows

Nanite's lighting and shadowing are interleaved with the traditional lighting and shadowing, all inside RenderLights:

// Engine\Source\Runtime\Renderer\Private\LightRendering.cpp

void FDeferredShadingSceneRenderer::RenderLights(
    FRDGBuilder& GraphBuilder,
    FMinimalSceneTextures& SceneTextures,
    const FTranslucencyLightingVolumeTextures& TranslucencyLightingVolumeTextures,
    FRDGTextureRef LightingChannelsTexture,
    FSortedLightSetSceneInfo& SortedLightSet)
{
    (......)

    const FSimpleLightArray &SimpleLights = SortedLightSet.SimpleLights;
    const TArray<FSortedLightSceneInfo, SceneRenderingAllocator> &SortedLights = SortedLightSet.SortedLights;
    const int32 AttenuationLightStart = SortedLightSet.AttenuationLightStart;
    const int32 SimpleLightsEnd = SortedLightSet.SimpleLightsEnd;

    (......)

    {
        RDG_EVENT_SCOPE(GraphBuilder, "DirectLighting");
        
        if (ViewFamily.EngineShowFlags.DirectLighting &&
            Strata::IsStrataEnabled() && Strata::IsClassificationEnabled())
        {
            // Update the stencil buffer to tag simple/complex Strata materials once for all subsequent passes.
            Strata::AddStrataStencilPass(GraphBuilder, Views, SceneTextures);
        }

        (......)

        // Non-shadowed lights.
        if(ViewFamily.EngineShowFlags.DirectLighting)
        {
            RDG_EVENT_SCOPE(GraphBuilder, "NonShadowedLights");
            
            (......)
        }

        // Shadowed lights.
        {
            RDG_EVENT_SCOPE(GraphBuilder, "ShadowedLights");

            (......)

            // Draw lights that cast shadows or have light functions.
            for (int32 LightIndex = AttenuationLightStart; LightIndex < SortedLights.Num(); LightIndex++)
            {
                (......)

                if (bDrawShadows)
                {
                    INC_DWORD_STAT(STAT_NumShadowedLights);
                    
                    (......)

                    else // (OcclusionType == FOcclusionType::Shadowmap)
                    {
                        (......)

                        // Clear the shadow mask texture.
                        ClearShadowMask(ScreenShadowMaskTexture);

                        // Render the shadow projections.
                        RenderDeferredShadowProjections(GraphBuilder, SceneTextures, TranslucencyLightingVolumeTextures, &LightSceneInfo, ScreenShadowMaskTexture, ScreenShadowMaskSubPixelTexture, bInjectedTranslucentVolume);
                    }

                    bUsedShadowMaskTexture = true;
                }

                (......)
                
                if (bDirectLighting)
                {
                    const bool bRenderOverlap = false;
                    // Render a single light.
                    RenderLight(GraphBuilder, SceneTextures, &LightSceneInfo, ScreenShadowMaskTexture, LightingChannelsTexture, bRenderOverlap);
                }

                (......)
            }
        }
    }
}

UE5's RenderLights logic is highly similar to UE4's; the only addition is the initialization of the Strata stencil. Next, let's look at the logic of RenderLight:

void FDeferredShadingSceneRenderer::RenderLight(
    FRHICommandList& RHICmdList,
    const FViewInfo& View,
    const FLightSceneInfo* LightSceneInfo,
    FRHITexture* ScreenShadowMaskTexture,
    FRHITexture* LightingChannelsTexture,
    bool bRenderOverlap, bool bIssueDrawEvent)
{
    (......)

    // Internal lambda that renders the light.
    auto RenderInternalLight = [&](bool bStrataFastPath)
    {
    (......)
    
    // Set up the Strata depth-stencil state.
    if (Strata::IsStrataEnabled() && Strata::IsClassificationEnabled())
    {
        GraphicsPSOInit.DepthStencilState = TStaticDepthStencilState<
            false, CF_Always,
            true, CF_Equal, SO_Keep, SO_Keep, SO_Keep,
            true, CF_Equal, SO_Keep, SO_Keep, SO_Keep,
            Strata::StencilBit, 0x0>::GetRHI();
    }
    else
    {
        GraphicsPSOInit.DepthStencilState = TStaticDepthStencilState<false, CF_Always>::GetRHI();
    }

    (......)

    if (LightProxy->GetLightType() == LightType_Directional)
    {
        (......)
        
        else
        {
            (......)

            FDeferredLightPS::FPermutationDomain PermutationVector;
            
            (......)
            
            // New Strata (layered material) permutation dimensions.
            PermutationVector.Set< FDeferredLightPS::FStrata >(Strata::IsStrataEnabled());
            PermutationVector.Set< FDeferredLightPS::FStrataFastPath >(Strata::IsStrataEnabled() && Strata::IsClassificationEnabled() && bStrataFastPath);

            TShaderMapRef< FDeferredLightPS > PixelShader( View.ShaderMap, PermutationVector );
            
            (......)
        }
        
        (......)

        // Set the Strata stencil reference value.
        RHICmdList.SetStencilRef(bStrataFastPath ? Strata::StencilBit : 0u);

        // Draw the directional light as a full-screen rectangle.
        DrawRectangle(
            RHICmdList,
            0, 0,
            View.ViewRect.Width(), View.ViewRect.Height(),
            View.ViewRect.Min.X, View.ViewRect.Min.Y,
            View.ViewRect.Width(), View.ViewRect.Height(),
            View.ViewRect.Size(),
            GetSceneTextureExtent(),
            VertexShader,
            EDRF_UseTriangleOptimization);
    }
    else // Non-directional (local) lights
    {
        (......)
        
        TShaderMapRef<TDeferredLightVS<true> > VertexShader(View.ShaderMap);

        // Whether the camera is inside the light's bounding geometry.
        const bool bCameraInsideLightGeometry = ((FVector)View.ViewMatrices.GetViewOrigin() - LightBounds.Center).SizeSquared() < FMath::Square(LightBounds.W * 1.05f + View.NearClippingDistance * 2.0f)
            || !View.IsPerspectiveProjection();

        // Set the bounding geometry's rasterizer and depth state, passing in bCameraInsideLightGeometry.
        SetBoundingGeometryRasterizerAndDepthState(GraphicsPSOInit, View, bCameraInsideLightGeometry);

        (......)
        else
        {
            (......)
            
            // Strata.
            PermutationVector.Set< FDeferredLightPS::FStrata >(Strata::IsStrataEnabled());
            PermutationVector.Set< FDeferredLightPS::FStrataFastPath >(Strata::IsStrataEnabled() && Strata::IsClassificationEnabled() && bStrataFastPath);

            TShaderMapRef< FDeferredLightPS > PixelShader( View.ShaderMap, PermutationVector );
            
            (......)
        }

        (......)
        
        RHICmdList.SetStencilRef(bStrataFastPath ? Strata::StencilBit : 0u);

        (......)

        // Choose the stencil geometry according to the local light type.
        if( LightProxy->GetLightType() == LightType_Point ||
            LightProxy->GetLightType() == LightType_Rect )
        {
            StencilingGeometry::DrawSphere(RHICmdList);
        }
        else if (LightProxy->GetLightType() == LightType_Spot)
        {
            StencilingGeometry::DrawCone(RHICmdList);
        }
    }
    };

    // Render once with bStrataFastPath = false (the UE4-style lighting path when Strata is disabled).
    RenderInternalLight(false);
    
    // If Strata classification is enabled, render a second time with the Strata fast path.
    if (Strata::IsStrataEnabled() && Strata::IsClassificationEnabled())
    {
        RenderInternalLight(true);
    }
}

The lighting shader code likewise only adds Strata support, so it is not discussed further here.
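The Strata depth-stencil state set in `RenderLight` can be simulated on the CPU. With `CF_Equal`, a read mask of `Strata::StencilBit`, a write mask of 0 and a reference value of either 0 or `Strata::StencilBit`, the first `RenderInternalLight` call touches only pixels whose classification bit is clear and the second only pixels whose bit is set, so each pixel is lit exactly once. `kStrataStencilBit` is a placeholder value, since the real value of `Strata::StencilBit` is not shown in this article; the other names are illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr uint8_t kStrataStencilBit = 0x08;  // placeholder value for Strata::StencilBit

// CF_Equal stencil test with a read mask: (Ref & Mask) == (Value & Mask).
bool StencilEqual(uint8_t Ref, uint8_t Value, uint8_t ReadMask) {
    return (Ref & ReadMask) == (Value & ReadMask);
}

struct LightPassResult {
    std::vector<int> Hits;          // how many times each pixel was lit
    std::vector<int> FastPathHits;  // how many of those used the Strata fast path
};

// bClassification mirrors Strata::IsStrataEnabled() && Strata::IsClassificationEnabled().
LightPassResult SimulateLightPasses(const std::vector<uint8_t>& Stencil, bool bClassification) {
    LightPassResult Result{std::vector<int>(Stencil.size(), 0),
                           std::vector<int>(Stencil.size(), 0)};
    auto RenderInternalLight = [&](bool bStrataFastPath) {
        const uint8_t Ref = bStrataFastPath ? kStrataStencilBit : uint8_t(0);
        for (std::size_t i = 0; i < Stencil.size(); ++i) {
            // Without classification the state is TStaticDepthStencilState<false, CF_Always>:
            // there is no stencil test, so every pixel passes.
            const bool bPass = !bClassification || StencilEqual(Ref, Stencil[i], kStrataStencilBit);
            if (!bPass) continue;
            ++Result.Hits[i];
            if (bStrataFastPath) ++Result.FastPathHits[i];
        }
    };
    RenderInternalLight(false);                      // general path
    if (bClassification) RenderInternalLight(true);  // Strata fast path
    return Result;
}
```

Bits outside the read mask (0x01 in the test) do not affect the result, which is why the classification bit can share the stencil buffer with other uses.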

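It is also worth noting that UE5 computes shadows with Virtual Shadow Maps (VSM), a new shadow-mapping method designed to deliver consistent, high-resolution shadows that work with film-quality assets and large, dynamically lit open worlds.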

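VSM was first proposed by Markus Giegl et al. in 2007 in the paper Queried Virtual Shadow Maps, followed by an improved version, Fitted Virtual Shadow Maps. Years later, in 2015, Ola Olsson et al. combined it with clustered shading and related techniques in the paper More efficient virtual shadow maps for many lights.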

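The core of the technique is to render the shadow map adaptively, creating more shadow-map resolution where it is needed, without storing information from previous frames, which makes it suitable for fully dynamic scenes. As a result, it can guarantee sub-pixel-accurate shadow-map lookups, eliminating the projective and perspective aliasing of conventional shadow maps.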

VSM is based on Virtual Tiled Shadow Mapping; the algorithm is as follows:

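  • Allocate the largest texture resolution the GPU supports (historically \(4096 \times 4096\), today \(16k \times 16k\) or more).
  • Split the shadow map along X and Y into \(n \times n\) (e.g. \(16 \times 16\)) equally sized tiles, each rendered at the full maximum shadow-map texture resolution, so the effective resolution of the largest shadow map is \((16\times 4096)\times (16\times 4096) = 65536\times65536\). For each tile:
    • Render shadows into the shadow-map texture (culling with the tile's light frustum and overwriting the previous tile's shadow map).
    • Immediately use it to shadow the part of the scene covered by the current tile.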

Top: a conventional shadow map shows severe aliasing; bottom: a QVSM with 32x32 tiles of 2048x2048 greatly improves shadow precision.
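The bookkeeping behind the per-tile loop in the Virtual Tiled Shadow Mapping algorithm is a crop transform: tile (i, j) of an n x n split covers a 2/n-wide slice of the light's NDC square, so scaling by n and translating maps that slice onto the whole physical shadow map. A minimal sketch, assuming an orthographic (directional) light with NDC in [-1, 1]; rendering and shading are omitted and all names are illustrative.

```cpp
#include <cassert>
#include <cstdint>

// Scale/offset that maps one tile's slice of light-space NDC onto the whole
// physical shadow map: x' = Scale * x + OffsetX (and likewise for y).
struct TileCrop {
    float Scale;
    float OffsetX;
    float OffsetY;
};

TileCrop ComputeTileCrop(int I, int J, int N) {
    assert(N > 0 && I >= 0 && I < N && J >= 0 && J < N);
    // Tile I spans x in [-1 + 2I/N, -1 + 2(I+1)/N]. Requiring Scale * x + Offset = -1
    // at its left edge with Scale = N gives Offset = N - 1 - 2I.
    return TileCrop{float(N), float(N - 1 - 2 * I), float(N - 1 - 2 * J)};
}

// Apply the crop along one axis.
float ApplyCrop(float Scale, float Offset, float X) { return Scale * X + Offset; }

// Effective resolution along one axis when every tile uses the full physical map.
uint64_t EffectiveResolution(int N, uint64_t PhysicalSize) {
    return uint64_t(N) * PhysicalSize;
}
```

A renderer would loop over all n x n tiles, multiply the light's projection by the crop of each tile, render the casters, and then immediately shade the receivers covered by that tile, matching the two sub-steps of the algorithm.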

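In UE5's implementation, a VSM has a maximum resolution of \(16k \times 16k\) pixels, split into tiles (pages) of \(128 \times 128\), to keep performance high at a reasonable memory cost. Pages are allocated and rendered only for the on-screen pixels that need shading (determined by analyzing the depth buffer). Pages are cached across frames unless the objects or lights they cover move, which further improves performance.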
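The page bookkeeping described above can be sketched as follows: with a 16k x 16k virtual resolution and 128 x 128 pages there are 128 x 128 pages, each visible pixel's shadow-map UV (obtained by reprojecting the depth buffer) marks one page as needed, and only needed pages that are not already cached get rendered. `MarkNeededPages`, `PagesToRender` and the `std::set` containers are hypothetical; UE5 performs this marking on the GPU, and cache invalidation when objects or lights move is not modeled.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

constexpr uint32_t kVirtualSize = 16384;                      // 16k x 16k virtual shadow map
constexpr uint32_t kPageSize = 128;                           // 128 x 128 texels per page
constexpr uint32_t kPagesPerAxis = kVirtualSize / kPageSize;  // 128 pages per axis

struct ShadowUV { float U, V; };  // in [0, 1), from reprojecting a visible pixel

// Mark the pages touched by visible pixels; duplicates collapse in the set.
std::set<uint32_t> MarkNeededPages(const std::vector<ShadowUV>& Samples) {
    std::set<uint32_t> Needed;
    for (const ShadowUV& S : Samples) {
        assert(S.U >= 0.0f && S.U < 1.0f && S.V >= 0.0f && S.V < 1.0f);
        const uint32_t TexelX = uint32_t(S.U * kVirtualSize);
        const uint32_t TexelY = uint32_t(S.V * kVirtualSize);
        Needed.insert((TexelY / kPageSize) * kPagesPerAxis + TexelX / kPageSize);
    }
    return Needed;
}

// Pages to render this frame: needed but not valid in the cache.
std::vector<uint32_t> PagesToRender(const std::set<uint32_t>& Needed,
                                    const std::set<uint32_t>& Cached) {
    std::vector<uint32_t> Result;
    for (uint32_t Page : Needed)
        if (Cached.count(Page) == 0) Result.push_back(Page);
    return Result;
}
```

Memory then scales with the number of pages that visible pixels actually touch rather than with the full virtual resolution.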

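In addition, UE5 uses clipmaps for directional light shadows instead of CSM, to obtain higher shadow-map resolution. Clipmaps were introduced in 1998 by Christopher C. Tanner et al. in the paper The clipmap: a virtual mipmap. The core idea is to set an upper limit on mip level size: any level larger than this limit is clipped to a window of that size around the region of interest, and the rest of the level is not loaded into memory: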

This gives rise to the Clipmap Stack and the Clipmap Pyramid:
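The split can be expressed numerically: for a full mip chain whose level 0 is 2^N texels wide and a clip size of 2^C texels, every level larger than the clip size keeps only a 2^C window (the Clipmap Stack), while the smaller levels are kept whole (the Clipmap Pyramid). The sketch below computes the resident size of each level and the total texel count with and without clipping; `BuildClipmap` and `ResidentTexels` are illustrative names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

struct ClipmapLevel {
    uint32_t FullSize;      // width of this mip level in the full mipmap
    uint32_t ResidentSize;  // width actually kept in memory
    bool bInStack;          // true if the level is clipped (part of the Clipmap Stack)
};

std::vector<ClipmapLevel> BuildClipmap(uint32_t LogFullSize, uint32_t LogClipSize) {
    assert(LogClipSize <= LogFullSize && LogFullSize < 32);
    const uint32_t Clip = 1u << LogClipSize;
    std::vector<ClipmapLevel> Levels;
    for (uint32_t L = 0; L <= LogFullSize; ++L) {
        const uint32_t Full = 1u << (LogFullSize - L);
        Levels.push_back({Full, std::min(Full, Clip), Full > Clip});
    }
    return Levels;
}

// Total texels across all levels, either clipped (resident) or for the full mipmap.
uint64_t ResidentTexels(const std::vector<ClipmapLevel>& Levels, bool bClipped) {
    uint64_t Total = 0;
    for (const ClipmapLevel& Lv : Levels) {
        const uint64_t Size = bClipped ? Lv.ResidentSize : Lv.FullSize;
        Total += Size * Size;
    }
    return Total;
}
```

With a 16-texel base and a clip size of 4, the resident footprint drops from 341 to 53 texels; for realistic sizes the saving is far larger because the stack levels grow no further than the clip size.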

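When the camera (view) changes, the region covered by the Clipmap Stack must be remapped and the newly mapped clipmap data loaded, so that the stack keeps corresponding to the view: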

Clipmap update after the view changes; a toroidal update is used here to improve performance.
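Toroidal update avoids moving resident data when the window shifts: texel (x, y) of the virtual level is always stored at (x mod S, y mod S) in an S x S buffer, so shifting the window by (dx, dy) only requires rewriting the texels that newly enter it. A minimal sketch with illustrative names, where the window origin is its top-left virtual texel and `Load` stands in for fetching or rendering a texel:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Wrap a (possibly negative) virtual coordinate into [0, Size).
int Wrap(int X, int Size) {
    const int M = X % Size;
    return M < 0 ? M + Size : M;
}

// S x S ring buffer holding the window [OriginX, OriginX + S) x [OriginY, OriginY + S).
struct ToroidalWindow {
    int S, OriginX = 0, OriginY = 0;
    std::vector<int> Texels;
    int Updated = 0;  // number of texels rewritten by the last Move()

    ToroidalWindow(int Size, int (*Load)(int, int)) : S(Size), Texels(std::size_t(Size) * Size) {
        assert(Size > 0);
        for (int y = 0; y < S; ++y)
            for (int x = 0; x < S; ++x) Texels[std::size_t(y) * S + x] = Load(x, y);
    }

    int Fetch(int X, int Y) const { return Texels[std::size_t(Wrap(Y, S)) * S + Wrap(X, S)]; }

    // Shift the window; only texels outside the old window are reloaded. For clarity this
    // scans the whole window, while a real implementation would visit only the new strips.
    void Move(int Dx, int Dy, int (*Load)(int, int)) {
        const int OldX = OriginX, OldY = OriginY;
        OriginX += Dx;
        OriginY += Dy;
        Updated = 0;
        for (int y = OriginY; y < OriginY + S; ++y)
            for (int x = OriginX; x < OriginX + S; ++x) {
                const bool bWasResident = x >= OldX && x < OldX + S && y >= OldY && y < OldY + S;
                if (bWasResident) continue;
                Texels[std::size_t(Wrap(y, S)) * S + Wrap(x, S)] = Load(x, y);
                ++Updated;
            }
    }
};
```

Moving the window by one texel rewrites a single column or row instead of copying the whole S x S buffer, which is where the performance gain comes from.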

6.4.4 Nanite Summary

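Nanite spans the preprocessing build before rendering, culling at multiple granularities during rendering, rasterization, the BasePass and the lighting stage. Along the way it applies a large number of data structures, algorithms, rendering techniques and corresponding optimizations.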

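Contrary to earlier rumors, Nanite does not use Geometry Images. Instead, it uses multi-level coarse representations built on Clusters, ClusterGroups and Pages. This makes full use of precomputation: simplified data and its storage layout are built ahead of time so that Nanite data can be reconstructed, indexed, processed and rendered efficiently at runtime. The drawback is that Nanite only supports static meshes.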

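Nanite's rendering stages are interleaved with the traditional pipeline, going through GPUScene update, streaming management, culling, rasterization, BasePass and Readback. It takes full advantage of a GPU-driven rendering pipeline to deliver Nanite data to the render target. Each step involves many passes, rendering techniques and optimizations, for example:
  • Culling happens at several granularities (per instance, per cluster, per page, per triangle), all GPU-driven, to reduce the IO between CPU and GPU.
  • Rasterization by default is a hybrid of compute-shader software rasterization and hardware (PS) rasterization: the CS software rasterizer handles triangles with very small area (avoiding quad overdraw), while the hardware path handles larger triangles. Rasterization outputs only triangle IDs and depth (the Visibility Buffer technique), reducing GBuffer footprint and bandwidth consumption.
  • The BasePass output is the same as in the traditional pipeline, stored in GBufferA, GBufferB, and so on.
  • In the subsequent lighting stage, apart from the added Strata support, the lighting logic is essentially the same as before.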

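In addition, to improve shadow quality and reduce shadow cost, VSM and clipmaps are used, achieving real-time rendering that balances quality against cost.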

 

The following sections will be covered in UE5 Special Part 2:

6.5 Lumen

6.6 Other Rendering Techniques

6.7 Summary

 

Special Notes

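  • Thanks to the authors of all references. Some images come from the references and the web; they will be removed upon request in case of infringement.
  • This series is original work by the author and is published only on cnblogs. Sharing links to this article is welcome, but reposting without permission is not allowed.
  • The series is ongoing; for the full table of contents, see the Content Outline.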

 

References
