Deferred Shading in Unity

Posted by 異次元的歸來 on 2021-11-01

Original article: https://www.cnblogs.com/back-to-the-past/p/15495584.html

What Is Deferred Shading

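In addition to forward rendering, Unity also supports deferred rendering. The rendering path can be set globally under Graphics in Edit/Project Settings:

It can also be overridden on the Main Camera:

Note that Unity's deferred rendering does not support MSAA. See [2] for the reasons.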

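Deferred rendering mainly addresses the inefficiency of forward rendering in scenes with many lights. "Deferred" means that the lighting computation is postponed to a later stage. In forward rendering, computing the final color of each pixel requires one light pass per light, and the per-light results are blended together. Every light pass recomputes the pixel's geometric information, such as normal, diffuse, and specular, which is unnecessary: it only needs to be computed once and cached. In addition, leaving early-z aside, the depth test runs after the fragment shader, so a large number of pixels that end up invisible still go through the expensive light pass. Deferred rendering adds a geometry pass up front, uses the depth test to discard invisible pixels, and uses MRT (multiple render targets) to store each pixel's geometric information in separate G-Buffers. The light pass can then sample the G-Buffers directly to compute lighting.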

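Rather than describe this further, let us compare the draw call counts of forward and deferred rendering for the same scene:

As shown, the scene contains two directional lights. First, forward rendering:

There are 457 draw calls in total. To build the screen space shadow map for directional light shadows, Unity first runs a depth pass over the scene. It then renders the shadow map for each of the two directional lights and collects shadows. Finally, for every object affected by lighting, it runs a forward base light pass and a forward add light pass.

Now look at deferred rendering:

This time there are only 329 draw calls. Unity first runs a geometry pass over the scene to fill the G-Buffers, then copies the depth produced in that stage into a separate depth buffer, then runs a reflections pass to render reflection information, and only then reaches the light pass stage. Shadow map rendering in the light pass is similar to forward rendering (render first, then collect), except that there is no depth pass, because a depth buffer already exists after the geometry pass. Only 2 draw calls actually shade the lights, one per light.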

G-Buffer

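Now let us take a scene with 1 directional light and 3 reflection probes as an example and dig into the details:

For a custom shader to support deferred rendering, it must contain a pass whose LightMode is Deferred, and deferred rendering only works when the GPU supports MRT (multiple render targets). The object also must not be transparent: Unity forces transparent objects through the forward rendering path.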

		Pass {
			Tags {
				"LightMode" = "Deferred"
			}

			CGPROGRAM

			#pragma target 3.0
			#pragma exclude_renderers nomrt

			...

			ENDCG
		}

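What happens if a shader has no pass with this LightMode? Unity will not run the geometry pass for those objects. They still go through the normal forward rendering path, and Unity additionally runs a depth pass for them after the geometry pass, as shown:

Unity's deferred rendering needs 4 G-Buffers, so the output of the geometry pass fragment shader must be defined as follows: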

struct FragmentOutput {
		float4 gBuffer0 : SV_Target0;
		float4 gBuffer1 : SV_Target1;
		float4 gBuffer2 : SV_Target2;
		float4 gBuffer3 : SV_Target3;
};

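gBuffer0 is an ARGB32 texture. Its rgb channels store the diffuse color and its a channel stores occlusion.

gBuffer1 is an ARGB32 texture. Its rgb channels store the specular color and its a channel stores smoothness, the complement of roughness (the shader below writes GetSmoothness(i) into it).

gBuffer2 is an ARGB2101010 texture: the rgb channels take 10 bits each and the a channel only 2 bits. Its rgb channels store the normal, and the a channel is unused.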

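The format of gBuffer3 depends on whether HDR is enabled. Without HDR it is an ARGB2101010 texture; with HDR it is an ARGBHalf texture, that is, 16 bits per channel. This buffer stores the lighting information of the scene. Here that mainly means emission and indirect ambient light, not the direct lighting from the scene's light sources, since that computation is deferred to a later stage. Also note that without HDR the contents of gBuffer3 must be stored in logarithmic form, so the shader has to check for this case and convert: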

#pragma multi_compile _ UNITY_HDR_ON

FragmentOutput MyFragmentProgram (Interpolators i) {
    	...
    	FragmentOutput output;
		#if !defined(UNITY_HDR_ON)
			color.rgb = exp2(-color.rgb);
		#endif
		output.gBuffer0.rgb = albedo;
		output.gBuffer0.a = GetOcclusion(i);
		output.gBuffer1.rgb = specularTint;
		output.gBuffer1.a = GetSmoothness(i);
		output.gBuffer2 = float4(i.normal * 0.5 + 0.5, 1);
		output.gBuffer3 = color;
		return output;
}
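As a numeric illustration of the logarithmic storage used when HDR is off, the sketch below applies the same `exp2(-c)` mapping as the shader and then inverts it. The helper names `encode_ldr` and `decode_ldr` are hypothetical, and the decoding step is simply the mathematical inverse, which the article itself does not show.

```python
import math

def encode_ldr(c: float) -> float:
    """Same mapping as the shader's exp2(-color): larger light values map closer to 0."""
    return 2.0 ** (-c)

def decode_ldr(e: float) -> float:
    """Mathematical inverse of encode_ldr (an assumption, not taken from the article)."""
    return -math.log2(e)

# Round-trip a few light intensities through the encoding.
for c in (0.0, 0.5, 1.0, 4.0):
    e = encode_ldr(c)
    print(c, e, decode_ldr(e))
```

The encoded value always lies in (0, 1] for non-negative input, which is why it fits into a fixed-point format such as ARGB2101010.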

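With a Deferred shader in place, look at the Frame Debugger again:

Besides the usual blend and depth settings, the geometry pass also enables the stencil test. Stencil Comp is Always, so the stencil test always passes. Stencil Pass is Replace, so on success Stencil Ref is written to the stencil buffer. The write goes through Stencil WriteMask, so only the bits enabled in the mask are written. In summary, besides rendering depth, the geometry pass also records stencil information: every pixel covered by a visible object gets the stencil value 192 & 207 = 192. Capturing the frame with RenderDoc gives the 4 G-Buffers rendered by the geometry pass, as shown:

The depth buffer is shown below, with the depth and the stencil information it holds at this point displayed separately: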

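First, the textures are upside down, which is due to the DirectX texture coordinate system. Second, the metallic reflective sphere is completely black in gBuffer0 and completely white in gBuffer1, because its material has Metallic set to 1, so it has only specular and no diffuse. gBuffer3 is entirely black except for the skybox because the scene has no indirect lighting or emission. The stencil information is as expected: only pixels where a visible object appears hold a stencil value, and that value is 192. The depth buffer obtained here is copied by RenderDeferred.CopyDepth into a buffer named Deferred Depth for use by the later reflections passes. Those passes modify the depth buffer in various ways, the stencil part in particular, so a copy of the original scene depth has to be kept, and that copy is the Deferred Depth buffer.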
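The stencil value of 192 follows from the usual masked-write rule for a Replace operation. The sketch below models that rule, assuming Stencil Ref is 192 and Stencil WriteMask is 207 (as the expression 192 & 207 suggests) and that the stencil buffer was cleared to 0 beforehand. The function name `stencil_write` is hypothetical.

```python
def stencil_write(old: int, ref: int, write_mask: int) -> int:
    """Replace op: bits selected by write_mask come from ref, all other bits keep their old value."""
    return (old & ~write_mask & 0xFF) | (ref & write_mask)

# Geometry pass values: Stencil Ref 192, WriteMask 207, buffer assumed cleared to 0.
value = stencil_write(old=0, ref=192, write_mask=207)
print(value, format(value, "08b"))
```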

Deferred Reflections-Skybox

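Now let us look at the reflections passes. In forward rendering, Unity uses reflection probes to implement reflections, and each object can blend different reflection probes. In deferred rendering, reflection probes are blended per pixel. The Frame Debugger shows that Unity uses a shader named DeferredReflections for this:

Unity first uses this shader to render the reflection of the skybox, then renders the reflection of each reflection probe in the scene in order of importance. The shader has two passes, and the Frame Debugger shows that the first one is in use here. Let us look at the code, starting with the vertex shader: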

struct unity_v2f_deferred {
    float4 pos : SV_POSITION;
    float4 uv : TEXCOORD0;
    float3 ray : TEXCOORD1;
};

float _LightAsQuad;

unity_v2f_deferred vert_deferred (float4 vertex : POSITION, float3 normal : NORMAL)
{
    unity_v2f_deferred o;
    o.pos = UnityObjectToClipPos(vertex);
    o.uv = ComputeScreenPos(o.pos);
    o.ray = UnityObjectToViewPos(vertex) * float3(-1,-1,1);

    // normal contains a ray pointing from the camera to one of near plane's
    // corners in camera space when we are drawing a full screen quad.
    // Otherwise, when rendering 3D shapes, use the ray calculated here.
    o.ray = lerp(o.ray, normal, _LightAsQuad);

    return o;
}

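From the figure above, _LightAsQuad is 1 when the skybox is drawn, so in the code above only the input vertex and normal matter. A RenderDoc capture gives:

After the perspective transform, the (x, y) components of SV_POSITION span the range from -1 to 1 and the z component is 1, which in this capture is exactly the position of the near clipping plane in clip space. In other words, the vertices output by the vertex shader cover the entire near clipping plane.

The normal input actually holds the rays, in camera space, from the camera position to the 4 corners of the near clipping plane. Therefore: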

\[\textbf{r} = (\pm x,\pm y,z) = (\pm\dfrac{w}{2}, \pm\dfrac{h}{2}, n) \\ \tan \dfrac{\theta}{2} = \dfrac{\dfrac{h}{2}}{n} \\ \text{aspect} = \dfrac{w}{h} \]

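Here n is the distance to the camera's near clipping plane, w and h are the width and height of the near clipping plane, and \(\theta\) is the camera's fov:

The screenshot shows that n is 0.3 and \(\theta\) is \(\dfrac{\pi}{3}\). The aspect can be obtained from the resolution of the GBuffer or the Depth Texture:

This gives an aspect of \(\dfrac{1150}{531}\). Substituting into the formulas above yields: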

\[\textbf{r} = (\pm0.37511,\pm0.17321,0.3) \]
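The arithmetic can be reproduced with a few lines of Python. This is only a check of the formulas above with the values quoted in the text (n = 0.3, fov = π/3, aspect = 1150/531); the function name `near_plane_corner` is hypothetical.

```python
import math

def near_plane_corner(near: float, fov_y: float, aspect: float) -> tuple:
    """Return (w/2, h/2, n): the camera-space ray to the top-right near-plane corner."""
    half_h = near * math.tan(fov_y / 2.0)   # from tan(theta/2) = (h/2) / n
    half_w = half_h * aspect                # from aspect = w / h
    return (half_w, half_h, near)

x, y, z = near_plane_corner(0.3, math.pi / 3.0, 1150.0 / 531.0)
print(round(x, 5), round(y, 5), z)
```

The other three corners differ only in the signs of x and y.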

The computed ray matches the values in RenderDoc exactly. The fragment shader code is as follows:

half4 frag (unity_v2f_deferred i) : SV_Target
{
    // Stripped from UnityDeferredCalculateLightParams, refactor into function ?
    i.ray = i.ray * (_ProjectionParams.z / i.ray.z);
    float2 uv = i.uv.xy / i.uv.w;

    // read depth and reconstruct world position
    float depth = SAMPLE_DEPTH_TEXTURE(_CameraDepthTexture, uv);
    depth = Linear01Depth (depth);
    float4 viewPos = float4(i.ray * depth,1);
    float3 worldPos = mul (unity_CameraToWorld, viewPos).xyz;

    half4 gbuffer0 = tex2D (_CameraGBufferTexture0, uv);
    half4 gbuffer1 = tex2D (_CameraGBufferTexture1, uv);
    half4 gbuffer2 = tex2D (_CameraGBufferTexture2, uv);
    UnityStandardData data = UnityStandardDataFromGbuffer(gbuffer0, gbuffer1, gbuffer2);

    float3 eyeVec = normalize(worldPos - _WorldSpaceCameraPos);
    half oneMinusReflectivity = 1 - SpecularStrength(data.specularColor);

    half3 worldNormalRefl = reflect(eyeVec, data.normalWorld);

    // Unused member don't need to be initialized
    UnityGIInput d;
    d.worldPos = worldPos;
    d.worldViewDir = -eyeVec;
    d.probeHDR[0] = unity_SpecCube0_HDR;
    d.boxMin[0].w = 1; // 1 in .w allow to disable blending in UnityGI_IndirectSpecular call since it doesn't work in Deferred

    float blendDistance = unity_SpecCube1_ProbePosition.w; // will be set to blend distance for this probe
    #ifdef UNITY_SPECCUBE_BOX_PROJECTION
    d.probePosition[0]  = unity_SpecCube0_ProbePosition;
    d.boxMin[0].xyz     = unity_SpecCube0_BoxMin - float4(blendDistance,blendDistance,blendDistance,0);
    d.boxMax[0].xyz     = unity_SpecCube0_BoxMax + float4(blendDistance,blendDistance,blendDistance,0);
    #endif

    Unity_GlossyEnvironmentData g = UnityGlossyEnvironmentSetup(data.smoothness, d.worldViewDir, data.normalWorld, data.specularColor);

    half3 env0 = UnityGI_IndirectSpecular(d, data.occlusion, g);

    UnityLight light;
    light.color = half3(0, 0, 0);
    light.dir = half3(0, 1, 0);

    UnityIndirect ind;
    ind.diffuse = 0;
    ind.specular = env0;

    half3 rgb = UNITY_BRDF_PBS (0, data.specularColor, oneMinusReflectivity, data.smoothness, data.normalWorld, -eyeVec, light, ind).rgb;

    // Calculate falloff value, so reflections on the edges of the probe would gradually blend to previous reflection.
    // Also this ensures that pixels not located in the reflection probe AABB won't
    // accidentally pick up reflections from this probe.
    half3 distance = distanceFromAABB(worldPos, unity_SpecCube0_BoxMin.xyz, unity_SpecCube0_BoxMax.xyz);
    half falloff = saturate(1.0 - length(distance)/blendDistance);

    return half4(rgb, falloff);
}

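_ProjectionParams holds Unity's projection-related parameters; its z component is the distance to the far clipping plane. The fragment shader first reads the scene depth at the current pixel and converts it to linear space with Linear01Depth. Linear01Depth accounts for reverse-z and presents a uniform result: 0 is always nearest to the camera and 1 is always farthest. With the linear depth, the shader can compute the camera-space position of the object nearest to the camera that projects onto the current pixel, and a coordinate transform then gives its world-space position. The reflection is computed by sampling the skybox cubemap using this object's information (world position, normal, view vector, reflection vector). Because this object is the nearest one for the current pixel, everything behind it is occluded and contributes nothing to the reflection, so only the reflection of the object nearest to the camera needs to be computed.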
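The position reconstruction can be sketched numerically. The code below mirrors the two shader lines `i.ray = i.ray * (_ProjectionParams.z / i.ray.z)` and `viewPos = i.ray * depth`. It assumes the linear 0..1 depth equals the view-space depth divided by the far distance; the far distance of 1000 and the function name `reconstruct_view_pos` are hypothetical.

```python
def reconstruct_view_pos(ray, far, depth01):
    """Scale the ray so that its z reaches the far plane, then scale by the linear 0..1 depth."""
    scale = far / ray[2]
    far_ray = tuple(c * scale for c in ray)
    return tuple(c * depth01 for c in far_ray)

near_corner = (0.37511, 0.17321, 0.3)   # corner ray computed in the text
far = 1000.0                            # hypothetical far clipping distance

# depth01 = near / far should give back the near-plane corner itself.
print(reconstruct_view_pos(near_corner, far, 0.3 / far))
# depth01 = 1 gives the matching point on the far plane.
print(reconstruct_view_pos(near_corner, far, 1.0))
```

Because the ray is first normalized to z = far, the z component of the result is always `far * depth01`, regardless of which ray is interpolated for the pixel.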

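The Frame Debugger shows that the reflection cube of the skybox is infinitely large, so every object lies inside it and the case of an object outside the cube need not be considered. The shader only has to drop the diffuse part of the lighting before calling UNITY_BRDF_PBS, so that only the specular part is computed and returned. The render target is a texture named Deferred Reflections. This draw also sets stencil test parameters, so only pixels that pass the stencil test are written to the texture. Stencil Ref is 128, ReadMask is 128, and Stencil Comp is Equal, so only the 8th bit is compared, and the test passes only when that bit is 1. Which pixels have the 8th bit of their stencil value set? Exactly the pixels drawn by the geometry pass. The geometry pass sets their stencil value to 192, and 192 & 128 = 128 & 128, so the test passes. This means reflections are drawn only at pixels covered by a visible object, which is reasonable: if a pixel has no object at all, it cannot have any reflection either.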
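The Equal comparison with a read mask can be written out as a small check. This is a minimal sketch of the standard masked stencil comparison; the function name `stencil_equal` is hypothetical, and the value 0 for a pixel without geometry is an assumption.

```python
def stencil_equal(ref: int, stencil: int, read_mask: int) -> bool:
    """Stencil Comp Equal: both sides are masked with ReadMask before comparing."""
    return (ref & read_mask) == (stencil & read_mask)

# Pixel written by the geometry pass (stencil 192) versus an empty pixel (assumed 0).
print(stencil_equal(128, 192, 128))
print(stencil_equal(128, 0, 128))
```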

Deferred Reflections-Reflection Probes

For the next draw call, Unity uses a shader named StencilWrite. Its code is unremarkable and appears to do nothing at all:

Shader "Hidden/Internal-StencilWrite"
{
    SubShader
    {
        Pass
        {
            CGPROGRAM
            #pragma vertex vert
            #pragma fragment frag
            #pragma target 2.0
            #include "UnityCG.cginc"
            struct a2v {
                float4 pos : POSITION;
                UNITY_VERTEX_INPUT_INSTANCE_ID
            };
            struct v2f {
                float4 vertex : SV_POSITION;
                UNITY_VERTEX_OUTPUT_STEREO
            };
            v2f vert (a2v v)
            {
                v2f o;
                UNITY_SETUP_INSTANCE_ID(v);
                UNITY_INITIALIZE_VERTEX_OUTPUT_STEREO(o);
                o.vertex = UnityObjectToClipPos(v.pos);
                return o;
            }
            fixed4 frag () : SV_Target { return fixed4(0,0,0,0); }
            ENDCG
        }
    }
    Fallback Off
}

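The render state set for this draw call, however, is interesting:

First, ColorMask is 0, so the color output by the fragment shader is not written to the Deferred Reflections buffer. Cull is Off, so both the front and back faces of the mesh are rendered. Stencil Ref is 128 and Stencil Comp is Always, so the stencil test always passes. However, Stencil ZFail is Invert: when the depth test fails, the value in the stencil buffer is bitwise inverted and written back. Note that Stencil WriteMask is 16, so only the 5th bit of the inverted result is actually written to the buffer.

What vertex data is passed to this shader? A RenderDoc capture shows that it is a cube centered at the origin of the local coordinate system with a size of 1:

What we really care about, though, is where this cube ends up after the transform to world space. The Frame Debugger shows unity_MatrixVP as: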

\[\textbf{VP} = \begin{bmatrix} 0.68 & -0.0035 & 0.43 & 0.69 \\ 0.24 & -1.7 & -0.4 & 0.25 \\ 0.00015 & 0.000081 & -0.00024 & 0.3 \\ -0.52 & -0.27 & 0.81 & 10 \end{bmatrix} \]

We also know the coordinates after the MVP transform to clip space, namely the values in SV_POSITION, so the matrix M satisfies:

\[\textbf{VP} \cdot \textbf{M} \cdot v = v' \]

\[\textbf{M} \cdot v = \textbf{VP}^{-1} \cdot v' \]

The problem thus becomes solving a system of linear equations, which gives the matrix M:

\[\textbf{M} = \begin{bmatrix} 9.01 & 0 & 0 & 0 \\ 0 & 5.01 & 0 & 2.5 \\ 0 & 0 & 9.01 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \]

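Of course, with RenderDoc all of this calculation can be skipped. Both matrices are used in the vs stage of the shader, so we only need to locate the const buffers used by that stage:

The boxed part of const buffer 1 is exactly the transpose of the VP matrix shown in the Frame Debugger. Similarly, the part in const buffer 0 is the transpose of the M matrix. With this local-to-world matrix we can work out what it actually represents. Comparing the values shows that the matrix corresponds exactly to one of the reflection probes in the scene:

This reflection probe is located at (0, 2, 0) in world space. Its bounding box measures x = 9.01, y = 5.01, z = 9.01, and the box center is offset by 2 units in the y direction. In mathematical terms, a bounding box in the local coordinate system is transformed by M to the following world-space coordinates: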

\[p_w = \textbf{M} \cdot p_l = (9.01x, 5.01y+2.5,9.01z,1)^T \]

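Substituting the local-space box center (0, 0, 0) and vertices (+/-0.5, +/-0.5, +/-0.5) into this expression gives exactly the world-space coordinates of the box center and vertices. After this lengthy process we can conclude that the vertex data fed to the StencilWrite shader is the box of the reflection probe.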
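The substitution is easy to carry out in code. The sketch below applies the matrix M from the text to the unit cube's center and 8 corners using plain Python lists; `transform_point` is a hypothetical helper.

```python
from itertools import product

# Local-to-world matrix M as given in the text (row-major).
M = [
    [9.01, 0.0, 0.0, 0.0],
    [0.0, 5.01, 0.0, 2.5],
    [0.0, 0.0, 9.01, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]

def transform_point(m, p):
    """Multiply a 4x4 matrix by the point (x, y, z, 1) and return the xyz part."""
    v = (p[0], p[1], p[2], 1.0)
    out = [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]
    return tuple(out[:3])

center = transform_point(M, (0.0, 0.0, 0.0))
corners = [transform_point(M, p) for p in product((-0.5, 0.5), repeat=3)]

print("center:", center)
print("min corner:", min(corners))
print("max corner:", max(corners))
```

The corners span 9.01 units in x and z and 5.01 units in y around the transformed center, which is the box size read off the matrix.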

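Back to what the shader does. It draws both the front and back faces of the box. If the scene depth is less than the depth of the box's front face, the depth test fails for both faces, ZFail fires twice, and the stencil value is inverted twice, which amounts to no change. If the scene depth is greater than the depth of the box's back face, both the stencil test and the depth test pass for both faces, and the stencil value is again unchanged. But if the scene depth is greater than the depth of the front face and less than the depth of the back face, ZFail fires only once and the stencil value changes. Since ~192 & 16 = 16, the 5th bit is set to 1, and the stencil value goes from 192 to 208. In other words, only pixels of objects inside the box have their stencil value rewritten. The purpose of the shader is now clear: it finds the objects inside the reflection probe's box and marks them with a new stencil value, and only those objects sample the probe's cubemap when reflections are drawn. Viewing the stencil values of the current depth buffer in RenderDoc verifies this guess.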
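The three cases can be simulated for a single pixel. This is a minimal sketch that assumes a Less Equal depth test and the conventional "larger depth is farther" ordering used in the paragraph above; the function names and the sample depths are hypothetical.

```python
WRITE_MASK = 16  # Stencil WriteMask from the text: only the 5th bit is writable

def invert_masked(stencil: int, write_mask: int = WRITE_MASK) -> int:
    """Invert op: bitwise-invert the stencil value, but only write the masked bits."""
    inverted = ~stencil & 0xFF
    return (stencil & ~write_mask & 0xFF) | (inverted & write_mask)

def mark_probe_volume(stencil: int, scene_depth: float,
                      front_depth: float, back_depth: float) -> int:
    """Draw the box's front and back faces; a failed depth test triggers ZFail -> Invert."""
    for face_depth in (front_depth, back_depth):
        depth_test_passes = face_depth <= scene_depth
        if not depth_test_passes:
            stencil = invert_masked(stencil)
    return stencil

# Box faces at depth 2 (front) and 5 (back); scene pixel in front, inside, behind.
for scene_depth in (1.0, 3.0, 9.0):
    print(scene_depth, mark_probe_volume(192, scene_depth, 2.0, 5.0))
```

Only the pixel whose scene depth lies between the two faces ends up with the 5th bit set.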

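The size of the reflection probe's box in the scene is shown below:

The stencil values inside the box differ from those outside. With this marking in place, Unity continues to draw with the DeferredReflections shader. Let us focus on what differs from the earlier skybox draw.

First, _LightAsQuad in the vertex shader is now 0, so the ray passed to the fragment shader depends entirely on the vertex position: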

    o.ray = UnityObjectToViewPos(vertex) * float3(-1,-1,1);

RenderDoc shows that the vertices passed in here are the reflection probe cube from the preceding stencil pass. The ray is therefore:

\[ray = (-x_v, -y_v, z_v) \]

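At this point the ray is not yet the ray from the camera to the point where the cube projects onto the far clipping plane. The fragment shader processes it further: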

    i.ray = i.ray * (_ProjectionParams.z / i.ray.z);

In Unity's view coordinate system, the z coordinate of a visible object is always negative. Dividing by ray.z therefore flips the sign of the z coordinate:

\[ray = (\dfrac{-x_v}{-|z_v|}, \dfrac{-y_v}{-|z_v|}, 1) \cdot f \]

\[ray = (\dfrac{x_v}{|z_v|}, \dfrac{y_v}{|z_v|}, 1) \cdot f \]

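The resulting ray can then be combined with the scene depth and transformed to world space to find the position of the object nearest to the camera within the area covered by the cube. The transform uses the unity_CameraToWorld matrix, which expects view-space vectors with a positive z component, and the computation above satisfies exactly that.

Unlike the skybox, the reflection probe also takes blendDistance into account. blendDistance means that objects outside the cube may also receive this probe's reflections, with the amount of blending determined by blendDistance together with the object's distance from the cube. blendDistance can be set in the reflection probe inspector:

blendDistance affects the box-related properties. For example, after setting the blendDistance of the box above to 1, the Frame Debugger shows:

The w component of unity_SpecCube1_ProbePosition holds the blendDistance of the current box. RenderDoc also shows that the geometry of the box has changed:

The SV_POSITION coordinates have changed as if the box had grown, and that is indeed the case: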

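The local-to-world matrix has changed so that the box grows by 2×blendDistance along each of x, y, and z. Although the box geometry is larger, unity_SpecCube0_BoxMin and unity_SpecCube0_BoxMax still hold the original extents of the box. The geometry has to grow to cover the projected area that includes blendDistance, and the original extents have to be kept in order to compute the object's distance to the original cube and blend accordingly.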
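The falloff at the end of the fragment shader, `saturate(1.0 - length(distance)/blendDistance)`, can be tried out numerically. The sketch below assumes that `distanceFromAABB` returns the per-axis distance by which a point lies outside the box (zero on axes where it is inside); that body is an assumption, since the article does not list it. The box extents are the ones implied by the matrix M above, and blendDistance is taken to be positive.

```python
import math

def distance_from_aabb(p, box_min, box_max):
    """Per-axis distance outside the box; 0 on every axis where p is inside (assumed behavior)."""
    return tuple(max(p[i] - box_max[i], box_min[i] - p[i], 0.0) for i in range(3))

def probe_falloff(p, box_min, box_max, blend_distance):
    """saturate(1 - length(distance) / blendDistance), as in the fragment shader."""
    d = distance_from_aabb(p, box_min, box_max)
    length = math.sqrt(sum(c * c for c in d))
    return min(max(1.0 - length / blend_distance, 0.0), 1.0)

box_min = (-4.505, -0.005, -4.505)   # original (unexpanded) box extents
box_max = (4.505, 5.005, 4.505)

print(probe_falloff((0.0, 2.0, 0.0), box_min, box_max, 1.0))    # inside the box
print(probe_falloff((5.005, 2.0, 0.0), box_min, box_max, 1.0))  # halfway through the blend zone
print(probe_falloff((6.0, 2.0, 0.0), box_min, box_max, 1.0))    # beyond blendDistance
```

The result is the alpha written by the shader, so a pixel beyond blendDistance receives alpha 0 and leaves the previously drawn reflection untouched.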

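The Frame Debugger shows that the blend mode here is SrcAlpha OneMinusSrcAlpha, and a pixel must also pass the stencil test to be drawn. Stencil Ref is 144, ReadMask is 16, and the comparison is Equal. Since 144 & 16 = 16, only pixels whose current stencil value has the 5th bit set pass the test. Clearly, only objects within the reflection probe's box are drawn, where the box now includes the blendDistance region outside the original box. Finally, whether the stencil test passes or fails, the 5th bit is cleared, restoring the stencil value to what it was before the stencil pass.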

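Besides this approach, Unity also uses another strategy to draw reflections. For example, enlarge the cube of the same reflection probe (by adjusting box size or blend distance); here blend distance is set to 2:

In the Frame Debugger the stencil pass has disappeared, leaving only the DeferredReflections draw. Its render state has changed, though:

The depth test has changed from Less Equal to Greater, and Cull Back has become Cull Front. In other words, only back faces are rendered and front faces are culled. The fragment shader therefore sees every object in front of the box's back faces. This is less precise than the previous method, which captures only objects inside the box, but it at least rejects objects behind the box.

Could objects that are in front of the box rather than inside it be rendered incorrectly? No. Recall blendDistance: the code computes the object's distance to the original box extents, and if it exceeds blendDistance, the alpha component of the pixel color is set to 0, so the pixel contributes nothing to the final result.

As for how Unity chooses between the two strategies, I found no documentation on this. My guess is that when many objects lie in front of the box, blending with alpha 0 is relatively expensive, so Unity runs a stencil pass and uses the stencil test to skip the unnecessary draws.

After the skybox and all reflection probes have been drawn, Unity uses the DeferredReflections shader once more to add the reflections just drawn to the lighting buffer (as the comment at the top of the pass states), this time with the shader's other pass: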

// Adds reflection buffer to the lighting buffer
Pass
{
    ZWrite Off
    ZTest Always
    Blend [_SrcBlend] [_DstBlend]

    CGPROGRAM
        #pragma target 3.0
        #pragma vertex vert
        #pragma fragment frag
        #pragma multi_compile ___ UNITY_HDR_ON

        #include "UnityCG.cginc"

        sampler2D _CameraReflectionsTexture;

        struct v2f {
            float2 uv : TEXCOORD0;
            float4 pos : SV_POSITION;
        };

        v2f vert (float4 vertex : POSITION)
        {
            v2f o;
            o.pos = UnityObjectToClipPos(vertex);
            o.uv = ComputeScreenPos (o.pos).xy;
            return o;
        }

        half4 frag (v2f i) : SV_Target
        {
            half4 c = tex2D (_CameraReflectionsTexture, i.uv);
            #ifdef UNITY_HDR_ON
            return float4(c.rgb, 0.0f);
            #else
            return float4(exp2(-c.rgb), 0.0f);
            #endif

        }
    ENDCG
}

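This pass is simple, so it is not analyzed here.

Deferred Shading Light Pass

After that, Unity enters the passes that actually draw the lights. As in forward rendering, it first renders the shadowmap, with an additional collect shadows pass for directional lights. The lighting itself is drawn with the DeferredShading shader:

The source shows that the key code is concentrated in the function CalculateLight: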

half4 CalculateLight (unity_v2f_deferred i)
{
    float3 wpos;
    float2 uv;
    float atten, fadeDist;
    UnityLight light;
    UNITY_INITIALIZE_OUTPUT(UnityLight, light);
    UnityDeferredCalculateLightParams (i, wpos, uv, light.dir, atten, fadeDist);

    light.color = _LightColor.rgb * atten;

    // unpack Gbuffer
    half4 gbuffer0 = tex2D (_CameraGBufferTexture0, uv);
    half4 gbuffer1 = tex2D (_CameraGBufferTexture1, uv);
    half4 gbuffer2 = tex2D (_CameraGBufferTexture2, uv);
    UnityStandardData data = UnityStandardDataFromGbuffer(gbuffer0, gbuffer1, gbuffer2);

    float3 eyeVec = normalize(wpos-_WorldSpaceCameraPos);
    half oneMinusReflectivity = 1 - SpecularStrength(data.specularColor.rgb);

    UnityIndirect ind;
    UNITY_INITIALIZE_OUTPUT(UnityIndirect, ind);
    ind.diffuse = 0;
    ind.specular = 0;

    half4 res = UNITY_BRDF_PBS (data.diffuseColor, data.specularColor, oneMinusReflectivity, data.smoothness, data.normalWorld, -eyeVec, light, ind);

    return res;
}

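The function is largely self-explanatory: it obtains the light information from UnityDeferredCalculateLightParams and combines it with the scene geometry stored in the G-Buffers to compute the final color. Let us see what light information UnityDeferredCalculateLightParams outputs: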

// --------------------------------------------------------
// Common lighting data calculation (direction, attenuation, ...)
void UnityDeferredCalculateLightParams (
    unity_v2f_deferred i,
    out float3 outWorldPos,
    out float2 outUV,
    out half3 outLightDir,
    out float outAtten,
    out float outFadeDist)
{
    i.ray = i.ray * (_ProjectionParams.z / i.ray.z);
    float2 uv = i.uv.xy / i.uv.w;

    // read depth and reconstruct world position
    float depth = SAMPLE_DEPTH_TEXTURE(_CameraDepthTexture, uv);
    depth = Linear01Depth (depth);
    float4 vpos = float4(i.ray * depth,1);
    float3 wpos = mul (unity_CameraToWorld, vpos).xyz;

    float fadeDist = UnityComputeShadowFadeDistance(wpos, vpos.z);

    // spot light case
    #if defined (SPOT)
        float3 tolight = _LightPos.xyz - wpos;
        half3 lightDir = normalize (tolight);

        float4 uvCookie = mul (unity_WorldToLight, float4(wpos,1));
        // negative bias because http://aras-p.info/blog/2010/01/07/screenspace-vs-mip-mapping/
        float atten = tex2Dbias (_LightTexture0, float4(uvCookie.xy / uvCookie.w, 0, -8)).w;
        atten *= uvCookie.w < 0;
        float att = dot(tolight, tolight) * _LightPos.w;
        atten *= tex2D (_LightTextureB0, att.rr).r;

        atten *= UnityDeferredComputeShadow (wpos, fadeDist, uv);

    // directional light case
    #elif defined (DIRECTIONAL) || defined (DIRECTIONAL_COOKIE)
        half3 lightDir = -_LightDir.xyz;
        float atten = 1.0;

        atten *= UnityDeferredComputeShadow (wpos, fadeDist, uv);

        #if defined (DIRECTIONAL_COOKIE)
        atten *= tex2Dbias (_LightTexture0, float4(mul(unity_WorldToLight, half4(wpos,1)).xy, 0, -8)).w;
        #endif //DIRECTIONAL_COOKIE

    // point light case
    #elif defined (POINT) || defined (POINT_COOKIE)
        float3 tolight = wpos - _LightPos.xyz;
        half3 lightDir = -normalize (tolight);

        float att = dot(tolight, tolight) * _LightPos.w;
        float atten = tex2D (_LightTextureB0, att.rr).r;

        atten *= UnityDeferredComputeShadow (tolight, fadeDist, uv);

        #if defined (POINT_COOKIE)
        atten *= texCUBEbias(_LightTexture0, float4(mul(unity_WorldToLight, half4(wpos,1)).xyz, -8)).w;
        #endif //POINT_COOKIE
    #else
        half3 lightDir = 0;
        float atten = 0;
    #endif

    outWorldPos = wpos;
    outUV = uv;
    outLightDir = lightDir;
    outAtten = atten;
    outFadeDist = fadeDist;
}

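The function outputs the world position of the object in the area covered by the light, the uv coordinates for sampling the G-Buffers, the light direction, the light attenuation, and the distance to the shadow fade center. It first computes the world position of the scene object, then uses UnityComputeShadowFadeDistance to obtain the object's distance to the shadow fade center. That function is defined as follows: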

float UnityComputeShadowFadeDistance(float3 wpos, float z)
{
    float sphereDist = distance(wpos, unity_ShadowFadeCenterAndType.xyz);
    return lerp(z, sphereDist, unity_ShadowFadeCenterAndType.w);
}

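The Frame Debugger shows that unity_ShadowFadeCenterAndType is (0,0,0,0) for all three light types (directional, point, spot), so the fade distance here is simply vpos.z. Next, the function computes the attenuation separately for each light type.

For a spot light, as in forward lighting, the spot cookie texture _LightTexture0 and the attenuation texture _LightTextureB0 are sampled to obtain the light attenuation (see the earlier article 《Unity中的多光源》 (Multiple Lights in Unity) [7]). UnityDeferredComputeShadow then samples the shadow from the shadowmap and passes the shadow fade distance obtained earlier to UnityComputeShadowFade to compute how much the shadow fades: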

half UnityComputeShadowFade(float fadeDist)
{
    return saturate(fadeDist * _LightShadowData.z + _LightShadowData.w);
}

As mentioned in the earlier article 《Unity中的shadows（三）receive shadows》 (Shadows in Unity, Part 3: Receive Shadows) [8]:

_LightShadowData = new Vector4(
    1 - light.shadowStrength,                                                             // x = 1.0 - shadowStrength
    Mathf.Max(camera.farClipPlane / QualitySettings.shadowDistance, 1.0f),                // y = max(cameraFarClip / shadowDistance, 1.0) // but not used in current built-in shader codebase
    5.0f / Mathf.Min(camera.farClipPlane, QualitySettings.shadowDistance),                // z = shadow bias
    -1.0f * (2.0f + camera.fieldOfView / 180.0f * 2.0f)                                    // w = -1.0f * (2.0f + camera.fieldOfView / 180.0f * 2.0f) // fov is regarded as 0 when orthographic.
);
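Putting UnityComputeShadowFadeDistance, UnityComputeShadowFade, and the z and w components of _LightShadowData together gives a small numeric sketch of the shadow fade. The camera values used here (far clip 1000, shadow distance 150, fov 60) are hypothetical and chosen only for illustration.

```python
def shadow_fade_distance(sphere_dist: float, view_z: float, fade_type: float) -> float:
    """lerp(z, sphereDist, unity_ShadowFadeCenterAndType.w)."""
    return view_z + (sphere_dist - view_z) * fade_type

def light_shadow_data_zw(far_clip: float, shadow_distance: float, fov_degrees: float):
    """z and w components of _LightShadowData as listed in the text."""
    z = 5.0 / min(far_clip, shadow_distance)
    w = -1.0 * (2.0 + fov_degrees / 180.0 * 2.0)
    return z, w

def shadow_fade(fade_dist: float, z: float, w: float) -> float:
    """saturate(fadeDist * _LightShadowData.z + _LightShadowData.w)."""
    return min(max(fade_dist * z + w, 0.0), 1.0)

z, w = light_shadow_data_zw(far_clip=1000.0, shadow_distance=150.0, fov_degrees=60.0)
for view_z in (50.0, 80.0, 95.0, 110.0, 150.0):
    # fade_type 0 matches the (0,0,0,0) value observed in the text, so the fade distance is vpos.z.
    dist = shadow_fade_distance(sphere_dist=0.0, view_z=view_z, fade_type=0.0)
    print(view_z, shadow_fade(dist, z, w))
```

With these values the fade factor stays at 0 for nearby pixels, rises linearly over a band of view depths, and reaches 1 before the shadow distance.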

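For a directional light, the attenuation defaults to 1; if a cookie is set, the cookie texture is sampled as well, and the shadow attenuation is then applied to obtain the final result. Point lights are handled similarly and are not covered in detail here.

Finally, let us use the Frame Debugger to look at how these three light types are drawn on the CPU side. Starting with the spot light, Unity uses a drawing approach similar to the one for reflections: a stencil pass first marks the objects inside the spot light's volume, and then the actual light pass runs. Here Unity uses a pyramid with 4 vertices to approximate the spot light: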