Preface
FPTL stands for Fine Pruned Tiled Light Lists. Ever since I started working with HDRP I have wanted to fully understand its FPTL implementation, but there is hardly any written analysis of HDRP's algorithms anywhere, domestic or foreign; most coverage amounts to a few hasty sentences. Hence this casual write-up.
The overall FPTL flow
CPU light-data preparation:
Use the ProcessedVisibleLightsBuilder jobs to convert each VisibleLight into LightRenderData (m_LightData in HDLightRenderDatabase); the main goal is to sort the light data so that it is more compact and friendlier to GPU traversal
Use the GpuLightsBuilder jobs to convert LightRenderData into LightData (the data actually consumed during rendering, plus the culling inputs lightBounds and lightVolumes)
Finally, push the light data to the GPU
GPU rendering flow:
Generate the prepass depth texture,
Clear the LightList (generally triggered by a resolution change),
Compute each light's ScrBound (GenerateLightsScreenSpaceAABBs),
Build the BigTileLightList (the optional BigTilePrepass),
Build the finer per-tile light lists (BuildPerTileLightList),
Build the VoxelLightList (transparent objects receive lighting through the VoxelLightList)
//HDRenderPipeline.LightLoop.cs
BuildGPULightListOutput BuildGPULightList(
RenderGraph renderGraph,
HDCamera hdCamera,
TileAndClusterData tileAndClusterData,
int totalLightCount,
ref ShaderVariablesLightList constantBuffer,
TextureHandle depthStencilBuffer,
TextureHandle stencilBufferCopy,
GBufferOutput gBuffer)
{
using (var builder = renderGraph.AddRenderPass<BuildGPULightListPassData>("Build Light List", out var passData, ProfilingSampler.Get(HDProfileId.BuildLightList)))
{
builder.EnableAsyncCompute(hdCamera.frameSettings.BuildLightListRunsAsync());
PrepareBuildGPULightListPassData(renderGraph, builder, hdCamera, tileAndClusterData, ref constantBuffer, totalLightCount, depthStencilBuffer, stencilBufferCopy, gBuffer, passData);
builder.SetRenderFunc(
(BuildGPULightListPassData data, RenderGraphContext context) =>
{
bool tileFlagsWritten = false;
ClearLightLists(data, context.cmd);
GenerateLightsScreenSpaceAABBs(data, context.cmd);
BigTilePrepass(data, context.cmd);
BuildPerTileLightList(data, ref tileFlagsWritten, context.cmd);
VoxelLightListGeneration(data, context.cmd);
BuildDispatchIndirectArguments(data, tileFlagsWritten, context.cmd);
});
return passData.output;
}
}
From HDAdditionalLightData (front end) to HDLightRenderDatabase (back end)
When a light is created, a MonoBehaviour called HDAdditionalLightData is automatically attached to the Light; this is implemented inside the Light's editor.
By inheriting LightEditor and adding the CustomEditorForRenderPipeline attribute, the built-in light editor is switched to the pipeline's extended light editor whenever the render pipeline changes.
//HDLightEditor.cs
[CanEditMultipleObjects]
[CustomEditorForRenderPipeline(typeof(Light), typeof(HDRenderPipelineAsset))]
sealed partial class HDLightEditor : LightEditor
{
...
protected override void OnEnable()
{
base.OnEnable();
// Auto-attach logic
m_AdditionalLightDatas = CoreEditorUtils.GetAdditionalData<HDAdditionalLightData>(targets, HDAdditionalLightData.InitDefaultHDAdditionalLightData);
m_SerializedHDLight = new SerializedHDLight(m_AdditionalLightDatas, settings);
// Update emissive mesh and light intensity when undo/redo
Undo.undoRedoPerformed += OnUndoRedo;
HDLightUI.RegisterEditor(this);
}
...
}
When a light is created, HDAdditionalLightData (the MonoBehaviour auto-attached when the Light is created) has to create an HDLightRenderEntity (a handle used for addressing) inside HDLightRenderDatabase.
As you can see, HDLightRenderDatabase, the light database used to index light data conveniently (and to get thoroughly lost in), is implemented as a singleton.
//HDAdditionalLightData.cs
internal void CreateHDLightRenderEntity(bool autoDestroy = false)
{
if (!this.lightEntity.valid)
{
HDLightRenderDatabase lightEntities = HDLightRenderDatabase.instance;
this.lightEntity = lightEntities.CreateEntity(autoDestroy);
lightEntities.AttachGameObjectData(this.lightEntity, legacyLight.GetInstanceID(), this, legacyLight.gameObject);
}
UpdateRenderEntity();
}
void OnEnable()
{
...
CreateHDLightRenderEntity();
}
//HDLightRenderDatabase.cs
//Light rendering entity. This struct acts as a handle to set / get light render information into the database.
internal struct HDLightRenderEntity
{
public int entityIndex;
public static readonly HDLightRenderEntity Invalid = new HDLightRenderEntity() { entityIndex = HDLightRenderDatabase.InvalidDataIndex };
public bool valid { get { return entityIndex != HDLightRenderDatabase.InvalidDataIndex; } }
}
//HDLightRenderDatabase.cs
internal partial class HDLightRenderDatabase
{
....
static public HDLightRenderDatabase instance
{
get
{
if (s_Instance == null)
s_Instance = new HDLightRenderDatabase();
return s_Instance;
}
}
...
}
When light-related data is modified, the lightEntity handle is used for addressing, and the HDLightRenderData inside HDLightRenderDatabase is then edited.
//HDAdditionalLightData.cs
//UpdateRenderEntity works the same way
public void SetAreaLightSize(Vector2 size)
{
...
if (lightEntity.valid)
{
ref HDLightRenderData lightRenderData = ref HDLightRenderDatabase.instance.EditLightDataAsRef(lightEntity);
lightRenderData.shapeWidth = m_ShapeWidth;
lightRenderData.shapeHeight = m_ShapeHeight;
}
...
}
//HDLightRenderDatabase.cs
....
//Gets and edits a reference. Must be not called during rendering pipeline, only during game object modification.
public ref HDLightRenderData EditLightDataAsRef(in HDLightRenderEntity entity) => ref EditLightDataAsRef(m_LightEntities[entity.entityIndex].dataIndex);
//Gets and edits a reference. Must be not called during rendering pipeline, only during game object modification.
public ref HDLightRenderData EditLightDataAsRef(int dataIndex)
{
if (dataIndex >= m_LightCount)
throw new Exception("Entity passed in is out of bounds. Index requested " + dataIndex + " and maximum length is " + m_LightCount);
unsafe
{
HDLightRenderData* data = (HDLightRenderData*)m_LightData.GetUnsafePtr<HDLightRenderData>() + dataIndex;
return ref UnsafeUtility.AsRef<HDLightRenderData>(data);
}
}
...
Likewise, when a light is deleted, the corresponding lightEntity must be destroyed. Note that DestroyEntity removes with swap-back: the last element moves into the freed slot, so the moved entity's dataIndex is patched afterwards.
//HDAdditionalLightData.cs
void OnDestroy()
{
...
DestroyHDLightRenderEntity();
}
internal void DestroyHDLightRenderEntity()
{
if (!lightEntity.valid)
return;
HDLightRenderDatabase.instance.DestroyEntity(lightEntity);
lightEntity = HDLightRenderEntity.Invalid;
}
//HDLightRenderDatabase.cs
public void DestroyEntity(HDLightRenderEntity lightEntity)
{
Assert.IsTrue(IsValid(lightEntity));
m_FreeIndices.Enqueue(lightEntity.entityIndex);
LightEntityInfo entityData = m_LightEntities[lightEntity.entityIndex];
m_LightsToEntityItem.Remove(entityData.lightInstanceID);
if (m_HDAdditionalLightData[entityData.dataIndex] != null)
--m_AttachedGameObjects;
RemoveAtSwapBackArrays(entityData.dataIndex);
if (m_LightCount == 0)
{
DeleteArrays();
}
else
{
HDLightRenderEntity entityToUpdate = m_OwnerEntity[entityData.dataIndex];
LightEntityInfo dataToUpdate = m_LightEntities[entityToUpdate.entityIndex];
dataToUpdate.dataIndex = entityData.dataIndex;
m_LightEntities[entityToUpdate.entityIndex] = dataToUpdate;
if (dataToUpdate.lightInstanceID != entityData.lightInstanceID)
m_LightsToEntityItem[dataToUpdate.lightInstanceID] = dataToUpdate;
}
}
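Putting these pieces together, the lifecycle of a light's back-end data looks roughly like this (a minimal sketch for illustration only; the null checks and update calls of the real HDAdditionalLightData are omitted):
//Hypothetical usage sketch, not actual HDRP source
var db = HDLightRenderDatabase.instance;
//OnEnable: allocate a handle and attach the GameObject data
HDLightRenderEntity entity = db.CreateEntity(autoDestroy: false);
//Property setters: address the raw struct through the handle and edit it in place
ref HDLightRenderData data = ref db.EditLightDataAsRef(entity);
data.shapeWidth = 1.0f;
//OnDestroy: release the handle (the database compacts its arrays with swap-back)
db.DestroyEntity(entity);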
HDLightRenderData is the data type with which we extend the built-in light, and the most basic payload stored in HDLightRenderDatabase.
At render time, however, we can usually only get the VisibleLights array from CullingResults, i.e. the built-in Light objects.
That forces us to call TryGetComponent<HDAdditionalLightData>() on each Light object (which is rather absurd).
//HDProcessedVisibleLightsBuilder.LightLoop.cs
private void BuildVisibleLightEntities(in CullingResults cullResults)
{
...
//They know it is bad and still have not changed it; this TODO has been sitting here forever
//TODO: this should be accelerated by a c++ API
var defaultEntity = HDLightRenderDatabase.instance.GetDefaultLightEntity();
for (int i = 0; i < cullResults.visibleLights.Length; ++i)
{
Light light = cullResults.visibleLights[i].light;
int dataIndex = HDLightRenderDatabase.instance.FindEntityDataIndex(light);
if (dataIndex == HDLightRenderDatabase.InvalidDataIndex)
{
//Shuriken lights bullshit: this happens because shuriken lights dont have the HDAdditionalLightData OnEnabled.
//Because of this, we have to forcefully create a light render entity on the rendering side. Horrible!!!
if (light.TryGetComponent<HDAdditionalLightData>(out var hdAdditionalLightData))
{
if (!hdAdditionalLightData.lightEntity.valid)
hdAdditionalLightData.CreateHDLightRenderEntity(autoDestroy: true);
}
else
dataIndex = HDLightRenderDatabase.instance.GetEntityDataIndex(defaultEntity);
}
m_VisibleLightEntityDataIndices[i] = dataIndex;
m_VisibleLightBakingOutput[i] = light.bakingOutput;
m_VisibleLightShadowCasterMode[i] = light.lightShadowCasterMode;
m_VisibleLightShadows[i] = light.shadows;
}
}
HDProcessedVisibleLightsBuilder, HDGpuLightsBuilder
HDProcessedVisibleLightsBuilder
HDProcessedVisibleLightsBuilder is, as the name says, the builder that processes VisibleLights (the builder pattern from design patterns).
Since, as mentioned above, the render pipeline's CullingResults only hands us Light objects,
HDProcessedVisibleLightsBuilder has to preprocess the lights (fetching their data and sorting it) so that each visible light's HDLightRenderData can be reached.
//LightingLoop.cs
// Compute data that will be used during the light loop for a particular light.
void PreprocessVisibleLights(CommandBuffer cmd, HDCamera hdCamera, in CullingResults cullResults, DebugDisplaySettings debugDisplaySettings, in AOVRequestData aovRequest)
{
using (new ProfilingScope(cmd, ProfilingSampler.Get(HDProfileId.ProcessVisibleLights)))
{
m_ProcessedLightsBuilder.Build(
hdCamera,
cullResults,
m_ShadowManager,
m_ShadowInitParameters,
aovRequest,
lightLoopSettings,
m_CurrentDebugDisplaySettings);
...
}
}
BuildVisibleLightEntities is the function shown earlier that calls TryGetComponent on each Light object to obtain its lightEntity, so the code is not repeated here.
//HDProcessedVisibleLightsBuilder.cs
//Builds sorted HDProcessedVisibleLight structures.
public void Build(
HDCamera hdCamera,
in CullingResults cullingResult,
HDShadowManager shadowManager,
in HDShadowInitParameters inShadowInitParameters,
in AOVRequestData aovRequestData,
in GlobalLightLoopSettings lightLoopSettings,
DebugDisplaySettings debugDisplaySettings)
{
BuildVisibleLightEntities(cullingResult);
if (m_Size == 0)
return;
FilterVisibleLightsByAOV(aovRequestData);
StartProcessVisibleLightJob(hdCamera, cullingResult.visibleLights, lightLoopSettings, debugDisplaySettings);
CompleteProcessVisibleLightJob();
SortLightKeys();
ProcessShadows(hdCamera, shadowManager, inShadowInitParameters, cullingResult);
}
The Build function schedules a job, StartProcessVisibleLightJob, whose main purpose is to classify and count the different light types and to sort the lights by importance.
m_ProcessVisibleLightCounts is the array of category counters, tallying the numbers of (Directional, Punctual, Area, Shadow, BakedShadow) lights.
m_ProcessedLightVolumeType holds each VisibleLight's LightVolumeType.
m_ProcessedEntities holds each VisibleLight's HDProcessedVisibleLight.
m_SortKeys is the array used later to sort by importance.
//HDProcessedVisibleLightsBuilder.Jobs.cs
...
#region output processed lights
[WriteOnly]
public NativeArray<int> processedVisibleLightCountsPtr;
[WriteOnly]
public NativeArray<LightVolumeType> processedLightVolumeType;
[WriteOnly]
public NativeArray<HDProcessedVisibleLight> processedEntities;
[WriteOnly]
[NativeDisableContainerSafetyRestriction]
public NativeArray<uint> sortKeys;
[WriteOnly]
[NativeDisableContainerSafetyRestriction]
public NativeArray<int> shadowLightsDataIndices;
#endregion
...
public void StartProcessVisibleLightJob(
HDCamera hdCamera,
NativeArray<VisibleLight> visibleLights,
in GlobalLightLoopSettings lightLoopSettings,
DebugDisplaySettings debugDisplaySettings)
{
if (m_Size == 0)
return;
var lightEntityCollection = HDLightRenderDatabase.instance;
var processVisibleLightJob = new ProcessVisibleLightJob()
{
//Parameters.
....
//render light entities.
lightData = lightEntityCollection.lightData,
//data of all visible light entities.
visibleLights = visibleLights,
....
//Output processed lights.
processedVisibleLightCountsPtr = m_ProcessVisibleLightCounts,
processedLightVolumeType = m_ProcessedLightVolumeType,
processedEntities = m_ProcessedEntities,
sortKeys = m_SortKeys,
shadowLightsDataIndices = m_ShadowLightsDataIndices
};
m_ProcessVisibleLightJobHandle = processVisibleLightJob.Schedule(m_Size, 32);
}
HDProcessedVisibleLight is the intermediate carrier between VisibleLight and LightData;
it records the index of the lightEntity in HDLightRenderDatabase,
the GPULightType used at render time, and the corresponding HDLightType.
The reason HDLightType exists here is that HDRP registers area lights under the Point LightType so they are not culled (and thus appear in cullingResult.visibleLights); a built-in area light can only be baked and cannot be set to realtime.
The remaining fields are not needed for now, so they are skipped.
public enum HDLightType
{
/// <summary>Spot Light. Complete this type by setting the SpotLightShape too.</summary>
Spot = LightType.Spot,
/// <summary>Directional Light.</summary>
Directional = LightType.Directional,
/// <summary>Point Light.</summary>
Point = LightType.Point,
/// <summary>Area Light. Complete this type by setting the AreaLightShape too.</summary>
Area = LightType.Area,
}
enum GPULightType
{
Directional,
Point,
Spot,
ProjectorPyramid,
ProjectorBox,
// AreaLight
Tube, // Keep Line lights before Rectangle. This is needed because of a compiler bug (see LightLoop.hlsl)
Rectangle,
// Currently not supported in real time (just use for reference)
Disc,
// Sphere,
};
internal struct HDProcessedVisibleLight
{
public int dataIndex;
public GPULightType gpuLightType;
public HDLightType lightType;
public float lightDistanceFade;
public float lightVolumetricDistanceFade;
public float distanceToCamera;
public HDProcessedVisibleLightsBuilder.ShadowMapFlags shadowMapFlags;
public bool isBakedShadowMask;
}
LightVolumeType mainly describes the shape of the light volume, so that the compute shader can apply a different culling computation per shape
internal enum LightVolumeType
{
Cone,
Sphere,
Box,
Count
}
The naming of LightCategory is questionable in my opinion. It says "light category", yet decals and volumetric fog got in too; that is because HDRP's decals and local volumetric fog also want to participate in the culling pass (both can generally describe their volume shape with a LightVolumeType).
After culling, the LightList buffer layout is likewise partitioned by LightCategory.
internal enum LightCategory
{
Punctual,
Area,
Env,
Decal,
LocalVolumetricFog, // WARNING: Currently lightlistbuild.compute assumes Local Volumetric Fog is the last element in the LightCategory enum. Do not append new LightCategory types after LocalVolumetricFog. TODO: Fix .compute code.
Count
}
Back to ProcessVisibleLightJob: it processes cullingResult.visibleLights in parallel, deriving each light's lightCategory, gpuLightType, and lightVolumeType.
It then packs lightCategory, gpuLightType, lightVolumeType, and the light index into a SortKey; as PackLightSortKey shows, directional lights carry the highest importance.
//HDProcessedVisibleLightsBuilder.Jobs.cs
//Atomically increments a counter slot
private int IncrementCounter(HDProcessedVisibleLightsBuilder.ProcessLightsCountSlots counterSlot)
{
int outputIndex = 0;
unsafe
{
int* ptr = (int*)processedVisibleLightCountsPtr.GetUnsafePtr<int>() + (int)counterSlot;
outputIndex = Interlocked.Increment(ref UnsafeUtility.AsRef<int>(ptr));
}
return outputIndex;
}
private int NextOutputIndex() => IncrementCounter(HDProcessedVisibleLightsBuilder.ProcessLightsCountSlots.ProcessedLights) - 1;
//From HDGpuLightsBuilder.cs
public static uint PackLightSortKey(LightCategory lightCategory, GPULightType gpuLightType, LightVolumeType lightVolumeType, int lightIndex)
{
//We sort directional lights to be in the beginning of the list.
//This ensures that we can access directional lights very easily after we sort them.
uint isDirectionalMSB = gpuLightType == GPULightType.Directional ? 0u : 1u;
uint sortKey = (uint)isDirectionalMSB << 31 | (uint)lightCategory << 27 | (uint)gpuLightType << 22 | (uint)lightVolumeType << 17 | (uint)lightIndex;
return sortKey;
}
//Unpacks a sort key for a light
public static void UnpackLightSortKey(uint sortKey, out LightCategory lightCategory, out GPULightType gpuLightType, out LightVolumeType lightVolumeType, out int lightIndex)
{
lightCategory = (LightCategory)((sortKey >> 27) & 0xF);
gpuLightType = (GPULightType)((sortKey >> 22) & 0x1F);
lightVolumeType = (LightVolumeType)((sortKey >> 17) & 0x1F);
lightIndex = (int)(sortKey & 0xFFFF);
}
//End of HDGpuLightsBuilder.cs
public void Execute(int index)
{
VisibleLight visibleLight = visibleLights[index];
int dataIndex = visibleLightEntityDataIndices[index];
LightBakingOutput bakingOutput = visibleLightBakingOutput[index];
LightShadows shadows = visibleLightShadows[index];
if (TrivialRejectLight(visibleLight, dataIndex))
return;
ref HDLightRenderData lightRenderData = ref GetLightData(dataIndex);
...
//Guard against exceeding the per-screen (Area, Punctual, Directional) light limits
if (!IncrementLightCounterAndTestLimit(lightCategory, gpuLightType))
return;
//Atomic operation
int outputIndex = NextOutputIndex();
sortKeys[outputIndex] = HDGpuLightsBuilder.PackLightSortKey(lightCategory, gpuLightType, lightVolumeType, index);
processedLightVolumeType[index] = lightVolumeType;
processedEntities[index] = new HDProcessedVisibleLight()
{
dataIndex = dataIndex,
gpuLightType = gpuLightType,
lightType = lightType,
lightDistanceFade = lightDistanceFade,
lightVolumetricDistanceFade = volumetricDistanceFade,
distanceToCamera = distanceToCamera,
shadowMapFlags = shadowMapFlags,
isBakedShadowMask = isBakedShadowMaskLight
};
...
}
So, after HDProcessedVisibleLightsBuilder schedules ProcessVisibleLightJob over the visibleLights, we obtain the preprocessed HDProcessedVisibleLight and LightVolumeType arrays, plus a SortKey array ordered by importance.
This SortKey array is the primary means by which the subsequent CreateGpuLightDataJob addresses light data.
(By calling UnpackLightSortKey we recover each light's index into m_ProcessedLightVolumeType / m_ProcessedEntities.)
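As a minimal sketch of that addressing pattern (illustrative only, using the names from the snippets above, not the actual job code):
//Hypothetical consumer of the sorted keys
for (int sortedIdx = 0; sortedIdx < sortedLightCount; ++sortedIdx)
{
    uint key = m_SortKeys[sortedIdx];
    HDGpuLightsBuilder.UnpackLightSortKey(key,
        out LightCategory lightCategory, out GPULightType gpuLightType,
        out LightVolumeType lightVolumeType, out int lightIndex);
    //lightIndex points back into the per-VisibleLight arrays
    HDProcessedVisibleLight processed = m_ProcessedEntities[lightIndex];
    LightVolumeType volumeType = m_ProcessedLightVolumeType[lightIndex];
    //processed.dataIndex in turn addresses HDLightRenderDatabase's light data
}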
HDGpuLightsBuilder
HDProcessedVisibleLightsBuilder's preprocessing produced the SortKeys plus each VisibleLight's HDProcessedVisibleLight and LightVolumeType.
The next step is a job that converts HDProcessedVisibleLight and LightVolumeType into the final render-time LightData / DirectionalLightData and the culling inputs SFiniteLightBound / LightVolumeData.
//LightLoop.cs
void PrepareGPULightdata(CommandBuffer cmd, HDCamera hdCamera, CullingResults cullResults)
{
using (new ProfilingScope(cmd, ProfilingSampler.Get(HDProfileId.PrepareGPULightdata)))
{
// 2. Go through all lights, convert them to GPU format.
// Simultaneously create data for culling (LightVolumeData and SFiniteLightBound)
m_GpuLightsBuilder.Build(cmd, hdCamera, cullResults,
m_ProcessedLightsBuilder,
HDLightRenderDatabase.instance, m_ShadowInitParameters, m_CurrentDebugDisplaySettings);
...
}
}
//HDGpuLightsBuilder.LightLoop.cs
public void Build(
CommandBuffer cmd,
HDCamera hdCamera,
in CullingResults cullingResult,
HDProcessedVisibleLightsBuilder visibleLights,
HDLightRenderDatabase lightEntities,
in HDShadowInitParameters shadowInitParams,
DebugDisplaySettings debugDisplaySettings)
{
...
if (totalLightsCount > 0)
{
...
StartCreateGpuLightDataJob(hdCamera, cullingResult, hdShadowSettings, visibleLights, lightEntities);
CompleteGpuLightDataJob();
CalculateAllLightDataTextureInfo(cmd, hdCamera, cullingResult, visibleLights, lightEntities, hdShadowSettings, shadowInitParams, debugDisplaySettings);
}
}
As Execute shows, directional lights are converted straight to the GPU format (DirectionalLightData), while every other light goes through StoreAndConvertLightToGPUFormat.
//HDGpuLightsBuilder.Jobs.cs
#region output processed lights
[WriteOnly]
[NativeDisableContainerSafetyRestriction]
public NativeArray<LightData> lights;
[WriteOnly]
[NativeDisableContainerSafetyRestriction]
public NativeArray<DirectionalLightData> directionalLights;
[WriteOnly]
[NativeDisableContainerSafetyRestriction]
public NativeArray<LightsPerView> lightsPerView;
[WriteOnly]
[NativeDisableContainerSafetyRestriction]
public NativeArray<SFiniteLightBound> lightBounds;
[WriteOnly]
[NativeDisableContainerSafetyRestriction]
public NativeArray<LightVolumeData> lightVolumes;
[WriteOnly]
[NativeDisableContainerSafetyRestriction]
public NativeArray<int> gpuLightCounters;
#endregion
public void Execute(int index)
{
var sortKey = sortKeys[index];
HDGpuLightsBuilder.UnpackLightSortKey(sortKey, out var lightCategory, out var gpuLightType, out var lightVolumeType, out var lightIndex);
if (gpuLightType == GPULightType.Directional)
{
int outputIndex = index;
ConvertDirectionalLightToGPUFormat(outputIndex, lightIndex, lightCategory, gpuLightType, lightVolumeType);
}
else
{
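//Directional lights pack their sort key with the MSB cleared, so after sorting
//they occupy sorted indices [0, directionalSortedLightCounts); a non-directional
//light at sorted index 'index' therefore lands in slot index - directionalSortedLightCounts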
int outputIndex = index - directionalSortedLightCounts;
StoreAndConvertLightToGPUFormat(outputIndex, lightIndex, lightCategory, gpuLightType, lightVolumeType);
}
}
public void StartCreateGpuLightDataJob(
HDCamera hdCamera,
in CullingResults cullingResult,
HDShadowSettings hdShadowSettings,
HDProcessedVisibleLightsBuilder visibleLights,
HDLightRenderDatabase lightEntities)
{
...
var createGpuLightDataJob = new CreateGpuLightDataJob()
{
//Parameters
....
//outputs
gpuLightCounters = m_LightTypeCounters,
lights = m_Lights,
directionalLights = m_DirectionalLights,
lightsPerView = m_LightsPerView,
lightBounds = m_LightBounds,
lightVolumes = m_LightVolumes
};
m_CreateGpuLightDataJobHandle = createGpuLightDataJob.Schedule(visibleLights.sortedLightCounts, 32);
}
StoreAndConvertLightToGPUFormat mirrors the directional path: besides converting the light to the GPU format (LightData), it also produces the culling inputs SFiniteLightBound and LightVolumeData.
Note that LightCategory.Punctual and LightCategory.Area lights are counted here as well (these counts are used to iterate over LightData when neither FPTL nor the cluster path is enabled).
//HDGpuLightsBuilder.Jobs.cs
private void ComputeLightVolumeDataAndBound(
LightCategory lightCategory, GPULightType gpuLightType, LightVolumeType lightVolumeType,
in VisibleLight light, in LightData lightData, in Vector3 lightDimensions, in Matrix4x4 worldToView, int outputIndex)
{
// Then Culling side
var range = lightDimensions.z;
var lightToWorld = light.localToWorldMatrix;
Vector3 positionWS = lightData.positionRWS;
Vector3 positionVS = worldToView.MultiplyPoint(positionWS);
Vector3 xAxisVS = worldToView.MultiplyVector(lightToWorld.GetColumn(0));
Vector3 yAxisVS = worldToView.MultiplyVector(lightToWorld.GetColumn(1));
Vector3 zAxisVS = worldToView.MultiplyVector(lightToWorld.GetColumn(2));
// Fill bounds
var bound = new SFiniteLightBound();
var lightVolumeData = new LightVolumeData();
lightVolumeData.lightCategory = (uint)lightCategory;
lightVolumeData.lightVolume = (uint)lightVolumeType;
if (gpuLightType == GPULightType.Spot || gpuLightType == GPULightType.ProjectorPyramid)
{
...
}
else if (gpuLightType == GPULightType.Point)
{
...
}
else if (gpuLightType == GPULightType.Tube)
{
...
}
else if (gpuLightType == GPULightType.Rectangle)
{
...
}
else if (gpuLightType == GPULightType.ProjectorBox)
{
...
}
else if (gpuLightType == GPULightType.Disc)
{
//not supported in real time at the moment
}
else
{
Debug.Assert(false, "TODO: encountered an unknown GPULightType.");
}
lightBounds[outputIndex] = bound;
lightVolumes[outputIndex] = lightVolumeData;
}
private void StoreAndConvertLightToGPUFormat(
int outputIndex, int lightIndex,
LightCategory lightCategory, GPULightType gpuLightType, LightVolumeType lightVolumeType)
{
var light = visibleLights[lightIndex];
var processedEntity = processedEntities[lightIndex];
var lightData = new LightData();
ref HDLightRenderData lightRenderData = ref GetLightData(processedEntity.dataIndex);
ConvertLightToGPUFormat(
lightCategory, gpuLightType, globalConfig,
visibleLightShadowCasterMode[lightIndex],
visibleLightBakingOutput[lightIndex],
light,
processedEntity,
lightRenderData,
out var lightDimensions,
ref lightData);
for (int viewId = 0; viewId < viewCounts; ++viewId)
{
var lightsPerViewContainer = lightsPerView[viewId];
ComputeLightVolumeDataAndBound(
lightCategory, gpuLightType, lightVolumeType,
light, lightData, lightDimensions, lightsPerViewContainer.worldToView, outputIndex + lightsPerViewContainer.boundsOffset);
}
if (useCameraRelativePosition)
lightData.positionRWS -= cameraPos;
switch (lightCategory)
{
case LightCategory.Punctual:
IncrementCounter(HDGpuLightsBuilder.GPULightTypeCountSlots.Punctual);
break;
case LightCategory.Area:
IncrementCounter(HDGpuLightsBuilder.GPULightTypeCountSlots.Area);
break;
default:
Debug.Assert(false, "TODO: encountered an unknown LightCategory.");
break;
}
#if DEBUG
if (outputIndex < 0 || outputIndex >= outputLightCounts)
throw new Exception("Trying to access an output index out of bounds. Output index is " + outputIndex + "and max length is " + outputLightCounts);
#endif
lights[outputIndex] = lightData;
}
SFiniteLightBound, LightVolumeData
SFiniteLightBound records the light's bounding-box data (the scaled box axes, the center in view space, and the circumscribed-sphere radius).
LightVolumeData mainly carries the culling data used when LightVolumeType is Box (boxInnerDist / boxInvRange), plus the culling data for Spot and ProjectorPyramid lights (cotan).
[GenerateHLSL]
struct SFiniteLightBound
{
public Vector3 boxAxisX; // Scaled by the extents (half-size)
public Vector3 boxAxisY; // Scaled by the extents (half-size)
public Vector3 boxAxisZ; // Scaled by the extents (half-size)
public Vector3 center; // Center of the bounds (box) in camera space
public float scaleXY; // Scale applied to the top of the box to turn it into a truncated pyramid (X = Y)
public float radius; // Circumscribed sphere for the bounds (box)
};
[GenerateHLSL]
struct LightVolumeData
{
public Vector3 lightPos; // Of light's "origin"
public uint lightVolume; // Type index
public Vector3 lightAxisX; // Normalized
public uint lightCategory; // Category index
public Vector3 lightAxisY; // Normalized
public float radiusSq; // Cone and sphere: light range squared
public Vector3 lightAxisZ; // Normalized
public float cotan; // Cone: cotan of the aperture (half-angle)
public Vector3 boxInnerDist; // Box: extents (half-size) of the inner box
public uint featureFlags;
public Vector3 boxInvRange; // Box: 1 / (OuterBoxExtents - InnerBoxExtents)
public float unused2;
};
Let us take the simplest case, a point light, and walk through the whole culling flow; the other light types then only differ in the per-shape math.
The point light's SFiniteLightBound / LightVolumeData
private void ComputeLightVolumeDataAndBound(
LightCategory lightCategory, GPULightType gpuLightType, LightVolumeType lightVolumeType,
in VisibleLight light, in LightData lightData, in Vector3 lightDimensions, in Matrix4x4 worldToView, int outputIndex)
{
...
else if (gpuLightType == GPULightType.Point)
{
// Construct a view-space axis-aligned bounding cube around the bounding sphere.
// This allows us to utilize the same polygon clipping technique for all lights.
// Non-axis-aligned vectors may result in a larger screen-space AABB.
Vector3 vx = new Vector3(1, 0, 0);
Vector3 vy = new Vector3(0, 1, 0);
Vector3 vz = new Vector3(0, 0, 1);
bound.center = positionVS;
bound.boxAxisX = vx * range;
bound.boxAxisY = vy * range;
bound.boxAxisZ = vz * range;
bound.scaleXY = 1.0f;
bound.radius = range;
// fill up ldata
lightVolumeData.lightAxisX = vx;
lightVolumeData.lightAxisY = vy;
lightVolumeData.lightAxisZ = vz;
lightVolumeData.lightPos = bound.center;
lightVolumeData.radiusSq = range * range;
lightVolumeData.featureFlags = (uint)LightFeatureFlags.Punctual;
}
...
}
GenerateLightsScreenSpaceAABBs
In the FPTL overview at the beginning:
step one, the depth prepass, is available in any ordinary deferred or forward pipeline, so it is not covered here.
Step two, clearing the LightList, is generally only triggered by a resolution change, so we skip it as well.
Step three is the focus of this article: using the SFiniteLightBound computed by the jobs above, Scrbound.compute derives each light's screen-space AABB.
Scrbound Dispatch
The Scrbound dispatch runs 64 threads per group, with 4 threads computing 1 light.
//HDRenderPipeline.LightLoop.cs
static void GenerateLightsScreenSpaceAABBs(BuildGPULightListPassData data, CommandBuffer cmd)
{
if (data.totalLightCount != 0)
{
using (new ProfilingScope(cmd, ProfilingSampler.Get(HDProfileId.GenerateLightAABBs)))
{
// With XR single-pass, we have one set of light bounds per view to iterate over (bounds are in view space for each view)
cmd.SetComputeBufferParam(data.screenSpaceAABBShader, data.screenSpaceAABBKernel, HDShaderIDs.g_data, data.convexBoundsBuffer);
cmd.SetComputeBufferParam(data.screenSpaceAABBShader, data.screenSpaceAABBKernel, HDShaderIDs.outputData, data.debugDataReadBackBuffer);
cmd.SetComputeBufferParam(data.screenSpaceAABBShader, data.screenSpaceAABBKernel, HDShaderIDs.g_vBoundsBuffer, data.AABBBoundsBuffer);
ConstantBuffer.Push(cmd, data.lightListCB, data.screenSpaceAABBShader, HDShaderIDs._ShaderVariablesLightList);
const int threadsPerLight = 4; // Shader: THREADS_PER_LIGHT (4)
const int threadsPerGroup = 64; // Shader: THREADS_PER_GROUP (64)
int groupCount = HDUtils.DivRoundUp(data.totalLightCount * threadsPerLight, threadsPerGroup);
cmd.DispatchCompute(data.screenSpaceAABBShader, data.screenSpaceAABBKernel, groupCount, data.viewCount, 1);
}
}
}
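As a concrete example: with 100 visible lights and a single view, groupCount = DivRoundUp(100 * 4, 64) = 7 thread groups are dispatched; inside a group, thread t cooperates on the group's light t / 4 (intraGroupLightIndex) and acts as lane t % 4 for it.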
//Scrbound.compute
#define MAX_CLIP_VERTS (10)
#define NUM_VERTS (8)
#define NUM_FACES (6)
#define NUM_PLANES (6)
#define THREADS_PER_GROUP (64)
#define THREADS_PER_LIGHT (4) // Set to 1 for debugging
#define LIGHTS_PER_GROUP (THREADS_PER_GROUP / THREADS_PER_LIGHT)
#define VERTS_PER_GROUP (NUM_VERTS * LIGHTS_PER_GROUP)
#define VERTS_PER_THREAD (NUM_VERTS / THREADS_PER_LIGHT)
#define FACES_PER_THREAD DIV_ROUND_UP(NUM_FACES, THREADS_PER_LIGHT)
Classifying the light's bounding-box vertices against the view frustum
1. First, fetch the SFiniteLightBound data just computed for the light
[numthreads(THREADS_PER_GROUP, 1, 1)]
void main(uint threadID : SV_GroupIndex, uint3 groupID : SV_GroupID)
{
const uint t = threadID;
const uint g = groupID.x;
const uint eyeIndex = groupID.y; // Currently, can only be 0 or 1
const uint intraGroupLightIndex = t / THREADS_PER_LIGHT;
const uint globalLightIndex = g * LIGHTS_PER_GROUP + intraGroupLightIndex;
const uint baseVertexOffset = intraGroupLightIndex * NUM_VERTS;
const uint eyeAdjustedInputOffset = GenerateLightCullDataIndex(globalLightIndex, g_iNrVisibLights, eyeIndex);
const SFiniteLightBound cullData = g_data[eyeAdjustedInputOffset];
const float4x4 projMat = g_mProjectionArr[eyeIndex];
const float4x4 invProjMat = g_mInvProjectionArr[eyeIndex];
//gs_CullClipFaceMasks is initialized to 0, i.e. initially no face is marked for culling/clipping
if (t % THREADS_PER_LIGHT == 0)
{
gs_CullClipFaceMasks[intraGroupLightIndex] = 0;
}
// Bounding frustum.
const float3 rbpC = cullData.center.xyz; // View-space
const float3 rbpX = cullData.boxAxisX.xyz; // Pre-scaled
const float3 rbpY = cullData.boxAxisY.xyz; // Pre-scaled
const float3 rbpZ = cullData.boxAxisZ.xyz; // Pre-scaled
const float scale = cullData.scaleXY; // scale.x = scale.y
// Bounding sphere.
const float radius = cullData.radius;
...
}
2. Compute the bounding-box corners, transform them from view space into homogeneous clip space,
and classify each vertex against the view volume (inside iff 0 <= x <= w, 0 <= y <= w, 0 <= z <= w).
If behindMask == 0, the current vertex is inside the view volume and directly updates the box's ndcAaBbMinPt / ndcAaBbMaxPt.
If behindMask != 0, the current vertex is outside the frustum; the faces adjacent to it are recorded so that, later on, those faces can be intersected with the frustum before ndcAaBbMinPt / ndcAaBbMaxPt are updated.
After the vertex loop, an atomic InterlockedOr into gs_CullClipFaceMasks synchronizes the results through LDS, yielding, for the current light, every face adjacent to a vertex outside the frustum (i.e. all faces that need further processing).
Keep in mind that 4 threads compute 1 light: intraGroupLightIndex is the index of the light those 4 parallel threads share, and the InterlockedOr here merges the masks of the 4 threads with the same intraGroupLightIndex.
//VERTS_PER_THREAD = 8/4 = 2: each thread handles two vertices, so 4 threads cover the 8 vertices of one light
for (i = 0; i < VERTS_PER_THREAD; i++)
{
uint v = i * THREADS_PER_LIGHT + t % THREADS_PER_LIGHT;
// rbpVerts[0] = rbpC - rbpX * scale - rbpY * scale - rbpZ; (-s, -s, -1)
// rbpVerts[1] = rbpC + rbpX * scale - rbpY * scale - rbpZ; (+s, -s, -1)
// rbpVerts[2] = rbpC - rbpX * scale + rbpY * scale - rbpZ; (-s, +s, -1)
// rbpVerts[3] = rbpC + rbpX * scale + rbpY * scale - rbpZ; (+s, +s, -1)
// rbpVerts[4] = rbpC - rbpX - rbpY + rbpZ; (-1, -1, +1)
// rbpVerts[5] = rbpC + rbpX - rbpY + rbpZ; (+1, -1, +1)
// rbpVerts[6] = rbpC - rbpX + rbpY + rbpZ; (-1, +1, +1)
// rbpVerts[7] = rbpC + rbpX + rbpY + rbpZ; (+1, +1, +1)
float3 m = GenerateVertexOfStandardCube(v);
m.xy *= ((v & 4) == 0) ? scale : 1; // X, Y in [-scale, scale]
float3 rbpVertVS = rbpC + m.x * rbpX + m.y * rbpY + m.z * rbpZ;
// Avoid generating (w = 0).
rbpVertVS.z = (abs(rbpVertVS.z) > FLT_MIN) ? rbpVertVS.z : FLT_MIN;
float4 hapVert = mul(projMat, float4(rbpVertVS, 1));
// Warning: the W component may be negative.
// Flipping the -W pyramid by negating all coordinates is incorrect
// and will break both classification and clipping.
// For the orthographic projection, (w = 1).
// Transform the X and Y components: [-w, w] -> [0, w].
hapVert.xy = 0.5 * hapVert.xy + (0.5 * hapVert.w);
// For each vertex, we must determine whether it is within the bounds.
// For culling and clipping, we must know, per culling plane, whether the vertex
// is in the positive or the negative half-space.
uint behindMask = 0; // Initially in front
// Consider the vertex to be inside the view volume if:
// 0 <= x <= w
// 0 <= y <= w <-- include boundary points to avoid clipping them later
// 0 <= z <= w
// w is always valid
// TODO: epsilon for numerical robustness?
//NUM_PLANES (6): 6/2 = 3 iterations, each testing one coordinate against its two opposing planes
for (uint j = 0; j < (NUM_PLANES / 2); j++)
{
float w = hapVert.w;
behindMask |= (hapVert[j] < 0 ? 1 : 0) << (2 * j + 0); // Planes crossing '0'
behindMask |= (hapVert[j] > w ? 1 : 0) << (2 * j + 1); // Planes crossing 'w'
}
if (behindMask == 0) // Inside?
{
// Clamp to the bounds in case of numerical errors (may still generate -0).
float3 rapVertNDC = saturate(hapVert.xyz * rcp(hapVert.w));
ndcAaBbMinPt = min(ndcAaBbMinPt, float4(rapVertNDC, rbpVertVS.z));
ndcAaBbMaxPt = max(ndcAaBbMaxPt, float4(rapVertNDC, rbpVertVS.z));
}
else // Outside
{
// Mark all the faces of the bounding frustum associated with this vertex.
cullClipFaceMask |= GetFaceMaskOfVertex(v);
}
gs_HapVertsX[baseVertexOffset + v] = hapVert.x;
gs_HapVertsY[baseVertexOffset + v] = hapVert.y;
gs_HapVertsZ[baseVertexOffset + v] = hapVert.z;
gs_HapVertsW[baseVertexOffset + v] = hapVert.w;
gs_BehindMasksOfVerts[baseVertexOffset + v] = behindMask;
}
InterlockedOr(gs_CullClipFaceMasks[intraGroupLightIndex], cullClipFaceMask);
GroupMemoryBarrierWithGroupSync();
cullClipFaceMask = gs_CullClipFaceMasks[intraGroupLightIndex];
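A quick worked example of the two masks: the plane loop sets bit 2j when coordinate j crosses 0 and bit 2j+1 when it crosses w, so a vertex with x < 0 (behind the left plane) and z > w (behind the far plane) gets behindMask = 0b100001. GetFaceMaskOfVertex likewise marks the 3 faces touching a vertex: cube vertex 0 at (-1, -1, -1) lies on the left (f = 0), bottom (f = 2), and front (f = 4) faces, i.e. face mask 0b010101.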
Testing whether the frustum's eight corners are inside the light volume
If the thread-synchronized cullClipFaceMask != 0, i.e. some bounding-box vertices lie outside the view volume, we test whether the eight corners of the view frustum are inside the light volume.
If a corner is inside, it is also treated as a vertex of the frustum/light-volume intersection and used to update ndcAaBbMinPt / ndcAaBbMaxPt.
// (2) Test the corners of the view volume.
if (cullClipFaceMask != 0)
{
//1. Rebuild the light-space matrix from the previously computed view-space light volume center rbpC and the axes rbpX, rbpY, rbpZ (cullData.boxAxisX/Y/Z)
//2. GenerateVertexOfStandardCube outputs coordinates in [-1, 1]; z must be remapped via *0.5+0.5 into [0, 1] before the vertices count as the frustum's eight corners in projection space
//3. To bring the frustum corners into light space, first transform them back to view space, then apply the light-space matrix
// The light is partially outside the view volume.
// Therefore, some of the corners of the view volume may be inside the light volume.
// We perform aggressive culling, so we must make sure they are accounted for.
// The light volume is a special type of cuboid - a right frustum.
// We can exploit this fact by building a light-space projection matrix.
// P_v = T * (R * S) * P_l
// P_l = (R * S)^{-1} * T^{-1} * P_v
float4x4 invTranslateToLightSpace = Translation4x4(-rbpC);
float4x4 invRotateAndScaleInLightSpace = Homogenize3x3(Invert3x3(ScaledRotation3x3(rbpX, rbpY, rbpZ)));
// TODO: avoid full inversion by using unit vectors and passing magnitudes explicitly.
// This (orthographic) projection matrix maps a view-space point to a light-space [-1, 1]^3 cube.
float4x4 lightSpaceMatrix = mul(invRotateAndScaleInLightSpace, invTranslateToLightSpace);
//Only the Spot and ProjectorPyramid GPULightTypes have a perspective light space;
//the other lights (Point, Rectangle, Tube, ProjectorBox) have scale == 1
//If you only care about the point-light path you can skip this block
if (scale != 1) // Perspective light space?
{
//(bound.scaleXY = squeeze ? 0.01f : 1.0f;) so s is 0.01 here
//e = -1.0202020f: the eye sits at z = e in light space; translating by -e maps light-space z from [-1, 1] to [n, f]
//n = 0.0202020f, f = g = 2.0202020f, aspect a = 1
//PerspectiveProjection4x4 builds the perspective projection matrix that replaces the orthographic light-space mapping with a perspective one
// Compute the parameters of the perspective projection.
float s = scale;
float e = -1 - 2 * (s * rcp(1 - s)); // Signed distance from the origin to the eye
float n = -e - 1; // Distance from the eye to the near plane
float f = -e + 1; // Distance from the eye to the far plane
float g = f; // Distance from the eye to the projection plane
float4x4 invTranslateEye = Translation4x4(float3(0, 0, -e));
float4x4 perspProjMatrix = PerspectiveProjection4x4(1, g, n, f);
lightSpaceMatrix = mul(mul(perspProjMatrix, invTranslateEye), lightSpaceMatrix);
}
for (i = 0; i < VERTS_PER_THREAD; i++)
{
uint v = i * THREADS_PER_LIGHT + t % THREADS_PER_LIGHT;
float3 rapVertCS = GenerateVertexOfStandardCube(v);
rapVertCS.z = rapVertCS.z * 0.5 + 0.5; // View's projection matrix MUST map Z to [0, 1]
float4 hbpVertVS = mul(invProjMat, float4(rapVertCS, 1)); // Clip to view space
float4 hapVertLS = mul(lightSpaceMatrix, hbpVertVS); // View to light space
// Consider the vertex to be inside the light volume if:
// -w < x < w
// -w < y < w <-- exclude boundary points, as we will not clip using these vertices
// -w < z < w <-- assume that Z-precision is not very important here
// 0 < w
// TODO: epsilon for numerical robustness?
bool inside = Max3(abs(hapVertLS.x), abs(hapVertLS.y), abs(hapVertLS.z)) < hapVertLS.w;
//If a frustum corner lies inside the light volume, treat it as a vertex of the frustum/light-volume intersection
//and use it to update ndcAaBbMinPt / ndcAaBbMaxPt.
if (inside)
{
float3 rapVertNDC = float3(rapVertCS.xy * 0.5 + 0.5, rapVertCS.z);
float rbpVertVSz = hbpVertVS.z * rcp(hbpVertVS.w);
ndcAaBbMinPt = min(ndcAaBbMinPt, float4(rapVertNDC, rbpVertVSz));
ndcAaBbMaxPt = max(ndcAaBbMaxPt, float4(rapVertNDC, rbpVertVSz));
}
}
}
Computing the intersecting faces from cullClipFaceMask
Back in step one, gs_BehindMasksOfVerts in LDS recorded, per frustum plane, whether each light-volume vertex lies behind it.
cullClipFaceMask, however, merely recorded the faces adjacent to some vertex outside the frustum;
among those faces there may be ones whose four vertices all lie behind one and the same frustum plane (checked with gs_BehindMasksOfVerts), and such faces have to be culled.
//////////////////////////////////Tool Functions/////////////////////////////////////////////////////////
// offset: the bit position at which extraction starts
// numBits: the number of bits to extract
//ex: data = 1111 1111 1111 0101 0101 0101 , offset = 12 , numBits = 12
// mask = 1111 1111 1111
// data >> 12 = 0000 0000 0000 1111 1111 1111
// result = 1111 1111 1111
// Unsigned integer bit field extraction.
// Note that the intrinsic itself generates a vector instruction.
// Wrap this function with WaveReadLaneFirst() to get scalar output.
uint BitFieldExtract(uint data, uint offset, uint numBits)
{
uint mask = (1u << numBits) - 1u;
return (data >> offset) & mask;
}
#define VERT_LIST_LEFT ((4) << 9 | (6) << 6 | (2) << 3 | (0) << 0)
#define VERT_LIST_RIGHT ((3) << 9 | (7) << 6 | (5) << 3 | (1) << 0)
#define VERT_LIST_BOTTOM ((1) << 9 | (5) << 6 | (4) << 3 | (0) << 0)
#define VERT_LIST_TOP ((6) << 9 | (7) << 6 | (3) << 3 | (2) << 0)
#define VERT_LIST_FRONT ((2) << 9 | (3) << 6 | (1) << 3 | (0) << 0)
#define VERT_LIST_BACK ((5) << 9 | (7) << 6 | (6) << 3 | (4) << 0)
//allVertLists
//VERT_LIST_RIGHT is the vertex list of the right face: 4 vertices, clockwise 3 7 5 1
//A vertex index is at most 7, so 3 bits per index suffice; one face's vertex list therefore takes 3*4 = 12 bits
//allVertLists[f >> 1]
//f>>1, i.e. f/2, selects the x/y/z component of allVertLists
//left,right   0>>1, 1>>1 => 0 0
//bottom,top   2>>1, 3>>1 => 1 1
//front,back   4>>1, 5>>1 => 2 2
//12 * (f & 1)
//f&1 is the parity of the face index and controls the 12-bit offset
//0&1=0 1&1=1
//2&1=0 3&1=1
//Odd faces (right, top, back) live in the upper 12 bits and need the 12-bit offset; even faces (left, bottom, front) live in the lower 12 bits and are read directly.
//E.g. top => f = 3, odd: extract the upper 12 bits of allVertLists[1] with offset 12;
//bottom => f = 2, even: no offset, read the lower 12 bits directly
uint GetVertexListOfFace(uint f)
{
// Warning: don't add 'static' here unless you want really bad code gen.
const uint3 allVertLists = uint3((VERT_LIST_RIGHT << 12) | VERT_LIST_LEFT,
(VERT_LIST_TOP << 12) | VERT_LIST_BOTTOM,
(VERT_LIST_BACK << 12) | VERT_LIST_FRONT);
return BitFieldExtract(allVertLists[f >> 1], 12 * (f & 1), 12);
}
bool TryCullFace(uint f, uint baseOffsetVertex)
{
//FACE_MASK => ((1 << NUM_FACES) - 1) => ((1 << 6) - 1) => 0b111111
uint cullMaskOfFace = FACE_MASK; // Initially behind
uint vertListOfFace = GetVertexListOfFace(f);
for (uint j = 0; j < 4; j++)
{
uint v = BitFieldExtract(vertListOfFace, 3 * j, 3);
//behindMask records the vertex's in/out status against the view volume's planes
//11 11 11 means behind every plane; 00 00 00 means fully inside the view volume
//If any of the four vertices has mask 000000, at least one vertex is inside the volume, the AND below ends up 0, and the face cannot be culled (the function returns false)
//
// Consider the vertex to be inside the view volume if:
// 0 <= x <= w
// 0 <= y <= w <-- include boundary points to avoid clipping them later
// 0 <= z <= w
// Non-zero if ALL the vertices are behind any of the planes.
cullMaskOfFace &= gs_BehindMasksOfVerts[baseOffsetVertex + v];
}
return (cullMaskOfFace != 0);
}
//Skips the first n set bits and returns the index of the next set bit
//ex: value = 111100 n = 3
// result = 5
//firstbitlow(111100) = 2
uint NthBitLow(uint value, uint n)
{
uint b = -1; // Consistent with the behavior of firstbitlow()
uint c = countbits(value);
if (n < c) // Validate inputs
{
uint r = n + 1; // Compute the number of remaining bits
do
{
uint f = firstbitlow(value >> (b + 1)); // Find the next set bit
b += f + r; // Make a guess (assume all [b+f+1,b+f+r] bits are set)
c = countbits(value << (32 - (b + 1))); // Count the number of bits actually set
r = (n + 1) - c; // Compute the number of remaining bits
}
while (r > 0);
}
return b;
}
///////////////////////////////////////////////////////////////////////////////////////////////////////////////////
// (3) Cull the faces.
{
const uint cullFaceMask = cullClipFaceMask;
//countbits returns the number of set bits in cullFaceMask, i.e. how many faces step one marked for cull processing
const uint numFacesToCull = countbits(cullFaceMask); // [0, 6]
//FACES_PER_THREAD = DIV_ROUND_UP(6, 4) = 2: each thread handles up to two faces
for (i = 0; i < FACES_PER_THREAD; i++)
{
uint n = i * THREADS_PER_LIGHT + t % THREADS_PER_LIGHT;
if (n < numFacesToCull)
{
//NthBitLow skips the first n set bits and returns the next set bit's index,
//i.e. f runs over the face indices whose bits are set in cullFaceMask
uint f = NthBitLow(cullFaceMask, n);
//If all 4 vertices of the face lie behind one and the same frustum plane (cullMaskOfFace != 0), the face cannot intersect the view volume and is culled.
//If instead cullMaskOfFace == 0, every plane has at least one of the face's vertices on its inner side, yet step one marked the face because some vertex is outside:
//the face straddles the frustum, cannot be culled entirely, and its bit must be kept.
if (TryCullFace(f, baseVertexOffset))
{
cullClipFaceMask ^= 1 << f; // Clear the bit
}
}
}
}
InterlockedAnd(gs_CullClipFaceMasks[intraGroupLightIndex], cullClipFaceMask);
GroupMemoryBarrierWithGroupSync();
cullClipFaceMask = gs_CullClipFaceMasks[intraGroupLightIndex];
Clipping the intersecting faces and computing the intersection vertices
The previous step left in cullClipFaceMask exactly the faces along which the light volume intersects the view volume. The next step is to compute the vertices where those faces cross the frustum planes, and then update ndcAaBbMinPt / ndcAaBbMaxPt with them.
Just like the previous step, this iterates over the faces marked in cullClipFaceMask
// (4) Clip the faces.
{
const uint clipFaceMask = cullClipFaceMask;
const uint numFacesToClip = countbits(clipFaceMask); // [0, 6]
for (i = 0; i < FACES_PER_THREAD; i++)
{
uint n = i * THREADS_PER_LIGHT + t % THREADS_PER_LIGHT;
if (n < numFacesToClip)
{
uint f = NthBitLow(clipFaceMask, n);
uint srcBegin, srcSize;
ClipFaceAgainstViewVolume(f, baseVertexOffset,
srcBegin, srcSize, t);
UpdateAaBb(srcBegin, srcSize, t, g_isOrthographic != 0, invProjMat,
ndcAaBbMinPt, ndcAaBbMaxPt);
}
}
}
ClipFaceAgainstViewVolume
RingBuffer
The ring buffer stores a face's vertices during clipping: clipping a face against a plane feeds not only the original vertices back into the buffer but also the newly created intersection vertices. For the clipping passes to proceed independently, every thread needs its own buffer space, so the buffer length is 10 * 64.
// Clipping a plane by a cube may produce a hexagon (6-gon).
// Clipping a hexagon by 4 planes may produce a decagon (10-gon).
#define MAX_CLIP_VERTS (10)
#define THREADS_PER_GROUP (64)
// ----------- Use LDS for the vertex ring buffer as otherwise on FXC we create register spilling
groupshared float gs_VertexRingBufferX[MAX_CLIP_VERTS * THREADS_PER_GROUP];
groupshared float gs_VertexRingBufferY[MAX_CLIP_VERTS * THREADS_PER_GROUP];
groupshared float gs_VertexRingBufferZ[MAX_CLIP_VERTS * THREADS_PER_GROUP];
groupshared float gs_VertexRingBufferW[MAX_CLIP_VERTS * THREADS_PER_GROUP];
float4 GetFromRingBuffer(uint threadIdx, uint entry)
{
float4 outV;
outV.x = gs_VertexRingBufferX[threadIdx * MAX_CLIP_VERTS + entry];
outV.y = gs_VertexRingBufferY[threadIdx * MAX_CLIP_VERTS + entry];
outV.z = gs_VertexRingBufferZ[threadIdx * MAX_CLIP_VERTS + entry];
outV.w = gs_VertexRingBufferW[threadIdx * MAX_CLIP_VERTS + entry];
return outV;
}
void WriteToRingBuffer(uint threadIdx, uint entry, float4 value)
{
gs_VertexRingBufferX[threadIdx * MAX_CLIP_VERTS + entry] = value.x;
gs_VertexRingBufferY[threadIdx * MAX_CLIP_VERTS + entry] = value.y;
gs_VertexRingBufferZ[threadIdx * MAX_CLIP_VERTS + entry] = value.z;
gs_VertexRingBufferW[threadIdx * MAX_CLIP_VERTS + entry] = value.w;
}
///////////////////////////////////////////////////////////////////////////////////////
Initializing the ring buffer
void ClipFaceAgainstViewVolume(uint f, uint baseVertexOffset,
out uint srcBegin, out uint srcSize,
uint threadIdx)
{
//The ring buffer holds all vertices of the current face (including clip-generated ones); tracking just the begin index and the vertex count lets the buffer space be reused repeatedly
srcBegin = 0;
srcSize = 4;
uint clipMaskOfFace = 0; // Initially in front
//Get the vertex list of this face
uint vertListOfFace = GetVertexListOfFace(f);
//First fetch the vertices stored during step one in gs_BehindMasksOfVerts / gs_HapVerts* and write them into the ring buffer.
for (uint j = 0; j < 4; j++)
{
//Extract the index of the face's j-th vertex
uint v = BitFieldExtract(vertListOfFace, 3 * j, 3);
//gs_BehindMasksOfVerts records each vertex's in/out status against the six frustum planes; clipMaskOfFace accumulates which planes the current face crosses,
//ex: vertex a: 000010  vertex b: 000110  vertex c: 000000  vertex d: 000100
//clipMaskOfFace: 000110
//clipMaskOfFace == 0 means the current face lies entirely inside the view volume
// Non-zero if ANY of the vertices are behind any of the planes.
clipMaskOfFace |= gs_BehindMasksOfVerts[baseVertexOffset + v];
//Write into the ring buffer
// Not all edges may require clipping. However, filtering the vertex list
// is somewhat expensive, so we currently don't do it.
WriteToRingBuffer(threadIdx, j, float4(gs_HapVertsX[baseVertexOffset + v], gs_HapVertsY[baseVertexOffset + v], gs_HapVertsZ[baseVertexOffset + v], gs_HapVertsW[baseVertexOffset + v]));
//vertRingBuffer[j].x = gs_HapVertsX[baseVertexOffset + v];
//vertRingBuffer[j].y = gs_HapVertsY[baseVertexOffset + v];
//vertRingBuffer[j].z = gs_HapVertsZ[baseVertexOffset + v];
//vertRingBuffer[j].w = gs_HapVertsW[baseVertexOffset + v];
}
// Sutherland-Hodgeman polygon clipping algorithm.
// It works by clipping the entire polygon against one clipping plane at a time.
while (clipMaskOfFace != 0)
{
//Iterate over the frustum planes that this face crosses
uint p = firstbitlow(clipMaskOfFace);
uint dstBegin, dstSize;
ClipPolygonAgainstPlane(p, srcBegin, srcSize, threadIdx, dstBegin, dstSize);
srcBegin = dstBegin;
srcSize = dstSize;
clipMaskOfFace ^= 1 << p; // Clear the bit to continue using firstbitlow()
}
}
Clipping a box face against a frustum plane (ClipPolygonAgainstPlane)
This step clips the box face against one frustum plane and appends the resulting vertices to the ring buffer.
struct ClipVertex
{
float4 pt; // Homogeneous coordinate after perspective
float bc; // Boundary coordinate with respect to the plane 'p'
};
ClipVertex CreateClipVertex(uint p, float4 v)
{
bool evenPlane = (p & 1) == 0;
//Axis component for each clip plane: 0>>1=0, 1>>1=0, 2>>1=1, 3>>1=1, ...
//planes 0,1 => component 0 [left/right: x]
//planes 2,3 => component 1 [bottom/top: y]
//planes 4,5 => component 2 [front/back: z]
float c = v[p >> 1];
float w = v.w;
ClipVertex cv;
cv.pt = v;
//In clip space [0, w] is inside the view volume, so the left/bottom/front clip planes sit at 0 on their respective axis
//planes 0 2 4: bc = c      (left = 0, bottom = 0, front = 0)
//planes 1 3 5: bc = w - c  (right = w, top = w, back = w)
//left_plane:0-----right_plane:w-----v:c
//bc is the vertex's signed distance to the clip plane
cv.bc = evenPlane ? c : w - c; // dot(PlaneEquation, HapVertex);
return cv;
}
float4 IntersectEdgeAgainstPlane(ClipVertex v0, ClipVertex v1)
{
//Interpolate by bc (the signed distances to the plane) to get the intersection vertex.
float alpha = saturate(v0.bc * rcp(v0.bc - v1.bc)); // Guaranteed to lie between 0 and 1
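//e.g. v0.bc = 2 (inside) and v1.bc = -1 (outside): alpha = 2 / (2 - (-1)) = 2/3,
//so the intersection lies two thirds of the way from v0 to v1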
return lerp(v0.pt, v1.pt, alpha);
}
void ClipPolygonAgainstPlane(uint p, uint srcBegin, uint srcSize,
uint threadIdx,
out uint dstBegin, out uint dstSize)
{
//Sliding window:
//dstBegin marks the index where the next plane's clipping pass starts reading.
//dstSize counts the vertices produced for the next pass
dstBegin = srcBegin + srcSize; // Start at the end; we don't use modular arithmetic here
dstSize = 0;
ClipVertex tailVert = CreateClipVertex(p, GetFromRingBuffer(threadIdx, (srcBegin + srcSize - 1) % MAX_CLIP_VERTS));
//Wrap around to stay in bounds
uint modSrcIdx = srcBegin % MAX_CLIP_VERTS;
uint modDstIdx = dstBegin % MAX_CLIP_VERTS;
//Walk the ring buffer, reading the stored vertices and converting them to ClipVertex
for (uint j = srcBegin; j < (srcBegin + srcSize); j++)
{
float4 v = GetFromRingBuffer(threadIdx, modSrcIdx);
ClipVertex leadVert = CreateClipVertex(p, v);
// Execute Blinn's line clipping algorithm.
// Classify the line segment. 4 cases:
// 0. v0 out, v1 out -> add nothing
// 1. v0 in, v1 out -> add intersection
// 2. v0 out, v1 in -> add intersection, add v1
// 3. v0 in, v1 in -> add v1
// (bc >= 0) <-> in, (bc < 0) <-> out. Beware of -0.
//bc >= 0 means the vertex's signed distance to the plane is non-negative, i.e. the vertex is on the inner side.
//One in, one out: the edge crosses the plane, so compute the intersection point and push it into the ring buffer for the next plane's pass
if ((tailVert.bc >= 0) != (leadVert.bc >= 0))
{
// The line segment is guaranteed to cross the plane.
float4 clipVert = IntersectEdgeAgainstPlane(tailVert, leadVert);
WriteToRingBuffer(threadIdx, modDstIdx, clipVert);
dstSize++;
modDstIdx++;
modDstIdx = (modDstIdx == MAX_CLIP_VERTS) ? 0 : modDstIdx;
}
//Vertices on the inner side are pushed into the ring buffer as well, to be clipped against the remaining planes
if (leadVert.bc >= 0)
{
WriteToRingBuffer(threadIdx, modDstIdx, leadVert.pt);
dstSize++;
modDstIdx++;
modDstIdx = (modDstIdx == MAX_CLIP_VERTS) ? 0 : modDstIdx;
}
modSrcIdx++;
modSrcIdx = (modSrcIdx == MAX_CLIP_VERTS) ? 0 : modSrcIdx;
tailVert = leadVert; // Avoid recomputation and overwriting the vertex in the ring buffer
}
}
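A small worked example of this loop: clip a quad with vertices A, B, C, D (in order), where A and D are inside and B and C are outside the plane. The tail starts at D. Edge D→A: both inside, emit A. Edge A→B: in→out, emit the intersection X_AB. Edge B→C: both outside, emit nothing. Edge C→D: out→in, emit the intersection X_CD followed by D. The clipped polygon is [A, X_AB, X_CD, D].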
UpdateAaBb
Update ndcAaBbMinPt / ndcAaBbMaxPt with the clipped vertices produced in the previous step.
void UpdateAaBb(uint srcBegin, uint srcSize, uint threadIdx,
bool isOrthoProj, float4x4 invProjMat,
inout float4 ndcAaBbMinPt, inout float4 ndcAaBbMaxPt)
{
//Walk the ring buffer window
uint modSrcIdx = srcBegin % MAX_CLIP_VERTS;
for (uint j = srcBegin; j < (srcBegin + srcSize); j++)
{
float4 hapVert = GetFromRingBuffer(threadIdx, modSrcIdx);
//Perspective divide into NDC
// Clamp to the bounds in case of numerical errors (may still generate -0).
float3 rapVertNDC = saturate(hapVert.xyz * rcp(hapVert.w));
float rbpVertVSz = hapVert.w;
//Under perspective projection w holds the view-space depth, but under orthographic projection w == 1, so transform back with the inverse projection
if (isOrthoProj) // Must replace (w = 1)
{
rbpVertVSz = dot(invProjMat[2], hapVert);
}
//Update ndcAaBbMinPt / ndcAaBbMaxPt
ndcAaBbMinPt = min(ndcAaBbMinPt, float4(rapVertNDC, rbpVertVSz));
ndcAaBbMaxPt = max(ndcAaBbMaxPt, float4(rapVertNDC, rbpVertVSz));
modSrcIdx++;
modSrcIdx = (modSrcIdx == MAX_CLIP_VERTS) ? 0 : modSrcIdx;
}
}
Computing RectMin/RectMax in NDC from the bounding sphere
Here we need the lines OB and OD through the origin (the camera position) that are tangent to the bounding sphere's projection onto the X-O-Z (resp. Y-O-Z) plane.
From
\(\operatorname{cross}(OB', OC') = |OB'|\,|OC'|\,\sin a'\)
\(OB' \cdot OC' = |OB'|\,|OC'|\,\cos a'\)
we get:
\(b_z\,c_x - b_x\,c_z = |OB'|\,|OC'|\,\sin a'\)
\(b_x\,c_x + b_z\,c_z = |OB'|\,|OC'|\,\cos a'\)
We do not actually need B's coordinates, only the ratio \(b_x/b_z\) (resp. \(b_y/b_z\)), because the perspective projection matrix merely scales the X and Y axes, and the conversion to NDC performs the perspective divide (division by Z) anyway.
//https://www.zhihu.com/question/289794588/answer/466643632
So, rescaling the unknowns with \(z = t\,b_z,\ x = t\,b_x,\ t = |OC'|^2 / |OB'|\), the system becomes
\(z\,c_x - x\,c_z = |OC'|^3 \sin a'\)
\(x\,c_x + z\,c_z = |OC'|^3 \cos a'\)
whose solution is
\(x = -c_z\,r + c_x\,|OB'|\)
\(z = c_x\,r + c_z\,|OB'|\)
For D, from
\(\operatorname{cross}(OD', OC') = -|OD'|\,|OC'|\,\sin a'\)
\(OD' \cdot OC' = |OD'|\,|OC'|\,\cos a'\)
we likewise obtain:
\(x = c_z\,r + c_x\,|OB'|\)
\(z = -c_x\,r + c_z\,|OB'|\)
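The solve step itself is just Cramer's rule on this 2x2 linear system, using \(\sin a' = r / |OC'|\), \(\cos a' = |OB'| / |OC'|\), and \(|OC'|^2 = c_x^2 + c_z^2\):
\[
\begin{pmatrix} -c_z & c_x \\ c_x & c_z \end{pmatrix}
\begin{pmatrix} x \\ z \end{pmatrix}
= |OC'|^2 \begin{pmatrix} r \\ |OB'| \end{pmatrix},
\qquad \det = -(c_x^2 + c_z^2) = -|OC'|^2,
\]
\[
x = \frac{|OC'|^2\,(r\,c_z - |OB'|\,c_x)}{-|OC'|^2} = c_x\,|OB'| - c_z\,r,
\qquad
z = \frac{|OC'|^2\,(-c_z\,|OB'| - c_x\,r)}{-|OC'|^2} = c_z\,|OB'| + c_x\,r.
\]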
float2 ComputeBoundsOfSphereOnProjectivePlane(float3 C, float r, float projScale, float projOffset)
{
float xMin, xMax;
// See sec. 8.2.1 of https://foundationsofgameenginedev.com/#fged2 for an alternative derivation.
// Goal: find the planes that pass through the origin O, bound the sphere, and form
// an axis-aligned rectangle at the intersection with the projection plane.
// Solution (for the X-coordinate):
// The intersection of the bounding planes and the projection plane must be vertical lines,
// which means that the bounding planes must be tangent to the Y-axis.
// The bounding planes must be also tangent to the sphere.
// Call the intersection points of the two vertical bounding planes and the bounding
// sphere B and D. Assume that B is on the left of C; D is on the right of C.
// Note that C may be behind the origin, so the same generally goes for B and D.
// BC is normal w.r.t. the bounding plane, so it is normal w.r.t. the Y-axis; |BC| = r.
// As a consequence, it lies in a plane parallel to the the O-X-Z plane.
// Consider B'C', which is an orthogonal projection of BC onto the actual O-X-Z plane.
// (Imagine sliding the sphere up or down between the bounding planes).
// We then consider a triangle OB'C' that lies entirely in the O-X-Z plane.
// The coordinates are: OB' = (b.x, 0, b.z), OC' = (c.x, 0, c.z).
float3 B, D;
// OBC is a right triangle. So is OB'C'.
// |BC| = |B'C'| = r.
// |OB'|^2 = |OC'|^2 - |B'C'|^2.
float lenSqOC_ = math.dot(C.xz, C.xz);
float lenSqOB_ = lenSqOC_ - r * r;
// If |OB'| = 0 or |OC'| = 0, the bounding planes tangent to the sphere do not exist.
if (lenSqOB_ > 0)
{
float lenOB_ = math.sqrt(lenSqOB_);
// |OB' x OC'| = |OB'| * |OC'| * Sin[a'].
// OB' . OC' = |OB'| * |OC'| * Cos[a'].
// We can determine Sin[a'] = |B'C'| / |OC'| = R / |OC'|.
// Cos[a'] = Sqrt[1 - Sin[a']^2].
// (OB' x OC') points along Y.
// (OB' x OC').y = b.z * c.x - b.x * c.z.
// Therefore, b.z * c.x - b.x * c.z = |OB'| * |OC'| * Sin[a'].
// OB' . OC' = b.x * c.x + b.z * c.z = |OB'| * |OC'| * Cos[a'].
// Since we don't care about the scale, and |OB'| != 0 and |OC'| != 0,
// we can equivalently solve
// z * c.x - x * c.z = |OC'|^3 * Sin[a'].
// x * c.x + z * c.z = |OC'|^3 * Cos[a'].
// With 2 equations and 2 unknowns, we can easily solve this linear system.
// The solutions is
// x = -c.z * r + c.x * |OB'|.
// z = c.x * r + c.z * |OB'|.
B.x = C.x * lenOB_ - (C.z * r);
B.z = C.z * lenOB_ + (C.x * r);
// (OD' x OC') points along Y.
// (OD' x OC').y = d.z * c.x - d.x * c.z.
// We must solve
// z * c.x - x * c.z = -|OC'|^3 * Sin[a'].
// x * c.x + z * c.z = |OC'|^3 * Cos[a'].
// The solution is
// x = c.z * r + c.x * |OB'|.
// z = -c.x * r + c.z * |OB'|.
D.x = C.x * lenOB_ + (C.z * r);
D.z = C.z * lenOB_ - (C.x * r);
// We can transform OB and OD as direction vectors.
// For the simplification below, see OptimizeProjectionMatrix.
float rapBx = (B.x * math.rcp(B.z)) * projScale + projOffset;
float rapDx = (D.x * math.rcp(D.z)) * projScale + projOffset;
// One problem with the above is that this direction may, for certain spheres,
// point behind the origin (B.z <= 0 or D.z <= 0).
// At this point we know that the sphere at least *partially* in front of the origin,
// and that it is we are not inside the sphere, so there is at least one valid
// plane (and one valid direction). We just need the second direction to go "in front"
// of the first one to extend the bounding box.
xMin = (B.z > 0) ? rapBx : -(float) 0x7F800000;
xMax = (D.z > 0) ? rapDx : (float) 0x7F800000;
}
else
{
// Conservative estimate (we do not cull the bounding sphere using the view frustum).
xMin = -1;
xMax = 1;
}
return new float2(xMin, xMax);
}
// (5) Compute the AABB of the bounding sphere.
if (radius > 0)
{
// Occasionally, an intersection of AABBs of a bounding sphere and a bounding frustum
// results in a tighter AABB when compared to using the AABB of the frustum alone.
// That is the case (mostly) for sphere-capped spot lights with very wide angles.
// Note that, unfortunately, it is not quite as tight as an AABB of a CSG intersection
// of a sphere and frustum. Also note that the algorithm below doesn't clip the bounding
// sphere against the view frustum before computing the bounding box, simply because it is
// too hard/expensive. I will leave it as a TODO in case someone wants to tackle this problem.
if ((rbpC.z + radius) > 0) // Is the sphere at least *partially* in front of the origin?
{
ndcAaBbMinPt.w = max(ndcAaBbMinPt.w, rbpC.z - radius);
ndcAaBbMaxPt.w = min(ndcAaBbMaxPt.w, rbpC.z + radius);
// Computing the 'z' component for an arbitrary projection matrix is hard, so we don't do it.
// See sec. 8.2.2 of https://foundationsofgameenginedev.com/#fged2 for a solution.
float2 rectMin, rectMax;
// For the 'x' and 'y' components, the solution is given below.
//Orthographic projection
if (g_isOrthographic)
{
// Compute the center and the extents (half-diagonal) of the bounding box.
float2 center = mul(projMat, float4(rbpC.xyz, 1)).xy;
float2 extents = mul(projMat, float4(radius.xx, 0, 0)).xy;
rectMin = center - extents;
rectMax = center + extents;
}
else // Perspective
{
//ComputeBoundsOfSphereOnProjectivePlane only uses the x and z components of its argument (hence the .xxz / .yyz swizzles)
float2 xBounds = ComputeBoundsOfSphereOnProjectivePlane(rbpC.xxz, radius, projMat._m00, projMat._m02); // X-Z plane
float2 yBounds = ComputeBoundsOfSphereOnProjectivePlane(rbpC.yyz, radius, projMat._m11, projMat._m12); // Y-Z plane
rectMin = float2(xBounds.r, yBounds.r);
rectMax = float2(xBounds.g, yBounds.g);
}
// Transform to the NDC coordinates.
rectMin = rectMin * 0.5 + 0.5;
rectMax = rectMax * 0.5 + 0.5;
// Note: separating the X- and Y-computations across 2 threads is not worth it.
ndcAaBbMinPt.xy = max(ndcAaBbMinPt.xy, rectMin);
ndcAaBbMaxPt.xy = min(ndcAaBbMaxPt.xy, rectMax);
}
}
Computing the final ScrBound (RectMin, RectMax)
Finally, the computed ndcAaBbMinPt and ndcAaBbMaxPt are written into g_vBoundsBuffer, which concludes the Scrbound pass.
Normally eyeIndex is 0 when VR is off; with XR single-pass stereo there are two eyes, so eyeIndex is 0 or 1 and each eye gets its own set of bounds.
So per eye, the layout of g_vBoundsBuffer is [light0.min, light1.min, ...][light0.max, light1.max, ...].
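As the code below shows: with g_iNrVisibLights = 3 and eyeIndex = 1, GenerateScreenSpaceBoundsIndices yields min = 1 * 2 * 3 + lightIndex = 6 + lightIndex and max = min + 3, so the whole buffer reads [eye0 mins][eye0 maxs][eye1 mins][eye1 maxs].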
// The returned values are used to index into our AABB screen space bounding box buffer
// Usually named g_vBoundsBuffer. The two values represent the min/max indices.
ScreenSpaceBoundsIndices GenerateScreenSpaceBoundsIndices(uint lightIndex, uint numVisibleLights, uint eyeIndex)
{
// In the monoscopic mode, there is one set of bounds (min,max -> 2 * g_iNrVisibLights)
// In stereo, there are two sets of bounds (leftMin, leftMax, rightMin, rightMax -> 4 * g_iNrVisibLights)
const uint eyeRelativeBase = eyeIndex * 2 * numVisibleLights;
ScreenSpaceBoundsIndices indices;
indices.min = eyeRelativeBase + lightIndex;
indices.max = indices.min + numVisibleLights;
return indices;
}
if ((globalLightIndex < (uint)g_iNrVisibLights) && (t % THREADS_PER_LIGHT == 0)) // Avoid bank conflicts
{
// For stereo, we have two sets of lights. Therefore, each eye has a set of mins
// followed by a set of maxs, and each set is equal to g_iNrVisibLights.
const ScreenSpaceBoundsIndices eyeAdjustedOutputOffsets = GenerateScreenSpaceBoundsIndices(globalLightIndex, g_iNrVisibLights, eyeIndex);
g_vBoundsBuffer[eyeAdjustedOutputOffsets.min] = ndcAaBbMinPt;
g_vBoundsBuffer[eyeAdjustedOutputOffsets.max] = ndcAaBbMaxPt;
}