Causal Inference Theory Notes - Tree Based - From Uplift Tree to Uplift Forest

Posted by real-zhouyc on 2024-04-18

Original URL: https://www.cnblogs.com/zhouyc/p/18144752

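Uplift Tree

Like the causal tree, the uplift tree [8] builds the causal effect into the node-splitting criterion, but it is aimed mainly at classification tasks. The causal tree differs in three ways: 1) it uses the honest approach; 2) it guides tree construction from the bias and variance of the effect estimate, turning the classification problem into a regression problem; 3) by design it supports only two treatments.
The uplift tree is more direct: it looks at the gain in uplift before and after a node split. With two treatments, the general formulation is: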

\[D_{gain}=D(P^T(Y):P^C(Y)|A) - D(P^T(Y):P^C(Y)) \]

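\(D\) is a divergence measure: the first term is its value after the split, the second term is its value before the split, and \(A\) denotes a candidate split. \(P^T\) is the outcome distribution of the T group and \(P^C\) is that of the C group.
With a small modification, the uplift tree also supports multiple treatments.
Before the split: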

\[ \begin{aligned} & D(P^{T_1}(Y),...,P^{T_k}(Y):P^{C}(Y)) \\ & =\alpha \sum_{i=1}^{k}\lambda_iD(P^{T_i}(Y):P^{C}(Y)) \\ & +(1-\alpha) \sum_{i=1}^{k}\sum_{j=1}^{k}\gamma_{ij}D(P^{T_i}(Y):P^{T_j}(Y)) \end{aligned} \]

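Here \(\alpha\), \(\lambda\) and \(\gamma\) are hyperparameters: \(\lambda_i\) and \(\gamma_{ij}\) control the importance of each treatment, and \(\alpha\) balances the T-vs-C terms against the T-vs-T terms. The T-vs-T terms are useful, for example, when we need to impose monotonicity across treatments,

i.e. the effect of a deep subsidy > the effect of a shallow subsidy.
In general, setting \(\lambda_i=\frac{1}{k},\gamma_{ij}=\frac{1}{k^2}\) treats all T groups equally.
After the split: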

\[ \begin{aligned} & D(P^{T_1}(Y),...,P^{T_k}(Y):P^{C}(Y)|A) \\ &=\sum_{a}\frac{N(a)}{N}D(P^{T_1}(Y|a),...,P^{T_k}(Y|a):P^{C}(Y|a)) \\ \end{aligned} \]
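To make the before/after formulas concrete, here is a minimal sketch of the multi-treatment gain for a binary outcome. It assumes uniform weights (\(\lambda_i=1/k\), \(\gamma_{ij}=1/k^2\)) and uses a squared difference of positive rates as a stand-in for \(D\). The names `squared_diff`, `multi_divergence` and `split_gain` are hypothetical and are not taken from any library.

```python
from typing import Callable, List, Tuple

Divergence = Callable[[float, float], float]
# A node is summarised as (n_samples, [p_T1, ..., p_Tk], p_C),
# where each p is the positive rate of one group inside the node.
Node = Tuple[int, List[float], float]


def squared_diff(p: float, q: float) -> float:
    """Stand-in for D (an assumption): any divergence of two rates would do."""
    return (p - q) ** 2


def multi_divergence(p_t: List[float], p_c: float,
                     d: Divergence = squared_diff, alpha: float = 0.5) -> float:
    """alpha * sum_i lambda_i D(T_i:C) + (1 - alpha) * sum_ij gamma_ij D(T_i:T_j),
    with uniform weights lambda_i = 1/k and gamma_ij = 1/k^2."""
    k = len(p_t)
    t_vs_c = sum(d(p, p_c) for p in p_t) / k
    t_vs_t = sum(d(pi, pj) for pi in p_t for pj in p_t) / k ** 2
    return alpha * t_vs_c + (1 - alpha) * t_vs_t


def split_gain(parent: Node, children: List[Node],
               d: Divergence = squared_diff, alpha: float = 0.5) -> float:
    """D_gain = sum_a N(a)/N * D(... | a) - D(...) for one candidate split."""
    n_total = sum(n for n, _, _ in children)
    after = sum(n / n_total * multi_divergence(p_t, p_c, d, alpha)
                for n, p_t, p_c in children)
    _, p_t, p_c = parent
    return after - multi_divergence(p_t, p_c, d, alpha)


parent = (200, [0.30, 0.40], 0.20)
children = [(100, [0.50, 0.60], 0.20), (100, [0.10, 0.20], 0.20)]
gain = split_gain(parent, children)
```

In this toy split the left child concentrates the responsive samples, so the weighted divergence after the split exceeds the divergence before it and the gain is positive.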

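Several splitting criteria

One advantage of the uplift tree is that the split criterion can be swapped freely. We take the two-treatment case as the example; it is just a special case of the multi-treatment setting.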

DDP \(\Delta \Delta P\)

For a split \(A\) with branches \(a_0\) and \(a_1\), where \(y_0\) is the outcome class of interest, DDP is defined as:

\[\Delta \Delta P(A)=|(P^{T}(y_0|a_0)-P^{C}(y_0|a_0))-(P^{T}(y_0|a_1)-P^{C}(y_0|a_1))| \]

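This is quite intuitive and literal: compute the uplift in the left and right child nodes separately to get the first-level \(\Delta\), then take the difference between the two children to get the second \(\Delta\). The criterion therefore favours splits that maximize the divergence between the two children, that is, the heterogeneity of uplift across nodes.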
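As a small illustration of the two levels of \(\Delta\), the sketch below computes DDP from raw counts in each child. The tuple layout and the names `node_uplift` and `ddp` are assumptions made for this example only.

```python
from typing import Tuple

# (treated positives, treated total, control positives, control total)
Counts = Tuple[int, int, int, int]


def node_uplift(counts: Counts) -> float:
    """First-level delta: P^T(y|a) - P^C(y|a) inside one child node."""
    t_pos, t_n, c_pos, c_n = counts
    return t_pos / t_n - c_pos / c_n


def ddp(left: Counts, right: Counts) -> float:
    """Second-level delta: absolute difference between the children's uplifts."""
    return abs(node_uplift(left) - node_uplift(right))


left = (30, 100, 10, 100)    # uplift 0.30 - 0.10 = 0.20
right = (12, 100, 10, 100)   # uplift 0.12 - 0.10 = 0.02
score = ddp(left, right)     # roughly 0.18
```

A split whose children have nearly identical uplift scores close to zero, while a split that separates responders from non-responders scores high.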

IDDP

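IDDP [9] improves on DDP. The motivation is that DDP has several shortcomings:

1. The gain DDP produces is always positive, so the tree keeps splitting and ends up very deep. A deep tree overfits easily.
2. Following from problem 1, continual splitting tends to leave very few samples in each leaf, which makes leaf estimates high-variance and further aggravates overfitting.
3. It takes into account neither the number of samples in a leaf node nor the T/C sample ratio, which leads to unreliable or unbalanced estimates.

The authors therefore introduce an invariant version of DDP: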

\[IDDP=\frac{\Delta \Delta P^*}{I(\phi,\phi_l,\phi_r)} \]

\[\begin{split}I(\phi, \phi_l, \phi_r) = H(\frac{n_t(\phi)} {n(\phi)}, \frac{n_c(\phi)}{n(\phi)}) * 2 \frac{1+\Delta\Delta P^*}{3} + \frac{n_t(\phi)}{n(\phi)} H(\frac{n_t(\phi_l)}{n(\phi)}, \frac{n_t(\phi_r)}{n(\phi)}) \\ + \frac{n_c(\phi)}{n(\phi)} * H(\frac{n_c(\phi_l)}{n(\phi)}, \frac{n_c(\phi_r)}{n(\phi)}) + \frac{1}{2}\end{split}\]

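Here \(H\) is the entropy, \(H(p,q)=(-p*log_2(p)) + (-q*log_2(q))\). \(\phi\) is the feature space available at the current node, \(\phi_l\) is the feature space of the left child and \(\phi_r\) that of the right child. \(n_c(\phi)\) is the number of C-group samples in the current node, \(n_t(\phi)\) is the number of T-group samples, and \(n(\phi)\) is the total number of samples in the node.

Dividing by this term acts as a penalty on the current split and basically solves the three problems above, in particular the ones about node sample size and the T/C sample ratio.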
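The normalizer \(I\) can be transcribed term by term. The sketch below follows the formula exactly as written in this post and takes \(\Delta\Delta P^*\) as a given input, since its computation is not spelled out here. The names `entropy`, `iddp_normalizer` and `iddp` are hypothetical, and \(0 \cdot log_2(0)\) is treated as 0 by assumption.

```python
from math import log2


def entropy(p: float, q: float) -> float:
    """H(p, q) = -p*log2(p) - q*log2(q); zero terms are skipped (assumption)."""
    return -sum(x * log2(x) for x in (p, q) if x > 0)


def iddp_normalizer(n_t: int, n_c: int, n_t_left: int, n_c_left: int,
                    ddp_star: float) -> float:
    """I(phi, phi_l, phi_r) for a node with n_t treated and n_c control samples."""
    n = n_t + n_c
    n_t_right = n_t - n_t_left
    n_c_right = n_c - n_c_left
    return (
        entropy(n_t / n, n_c / n) * 2 * (1 + ddp_star) / 3
        + n_t / n * entropy(n_t_left / n, n_t_right / n)
        + n_c / n * entropy(n_c_left / n, n_c_right / n)
        + 0.5
    )


def iddp(n_t: int, n_c: int, n_t_left: int, n_c_left: int,
         ddp_star: float) -> float:
    """IDDP = ddp_star / I(phi, phi_l, phi_r)."""
    return ddp_star / iddp_normalizer(n_t, n_c, n_t_left, n_c_left, ddp_star)


# Balanced node (100 T, 100 C) split evenly: I = 0.8 + 0.5 + 0.5 + 0.5 = 2.3
norm = iddp_normalizer(100, 100, 50, 50, ddp_star=0.2)
value = iddp(100, 100, 50, 50, ddp_star=0.2)
```

Because \(I\) is at least 0.5, the division is always well defined.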

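KL divergence

If we think of the T group and the C group as two distributions, then a good split should pull the two distributions as far apart as possible.
At the same time, the T and C sample sizes may differ, so the distribution measure should account for asymmetry. Anyone familiar with cross-entropy will recall that KL divergence is a good choice: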

\[KL(P : Q) = \sum_{k=left, right}p_klog\frac{p_k}{q_k} \]

\[D_{KL}(P:Q)=\sum_{k=left,right}\left[P(k)\log P(k)-P(k)\log Q(k)\right] \]

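Here \(p_k\) is the distribution of the T group in node \(k\), and \(q_k\) is the distribution of the C group in node \(k\). In practice this distribution is simply the share of positive samples in each group, so both are floats in the code. With multiple treatments, compute the KL divergence of each treatment against the C group and sum the results.
The concrete computation is: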

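# Imports are not shown in the original snippet; these two are an assumption that
# lets it also run uncompiled (compiled Cython code would typically use the C log).
import cython
from math import log

@cython.cfunc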
def kl_divergence(pk: cython.float, qk: cython.float) -> cython.float:
    '''
    Calculate KL Divergence for binary classification.

    sum(np.array(pk) * np.log(np.array(pk) / np.array(qk)))

    Args
    ----
    pk : float
        The probability of 1 in one distribution.
    qk : float
        The probability of 1 in the other distribution.

    Returns
    -------
    S : float
        The KL divergence.
    '''

    eps: cython.float = 1e-6
    S: cython.float

    if qk == 0.:
        return 0.

    qk = min(max(qk, eps), 1 - eps)

    if pk == 0.:
        S = -log(1 - qk)
    elif pk == 1.:
        S = -log(qk)
    else:
        S = pk * log(pk / qk) + (1 - pk) * log((1 - pk) / (1 - qk))

    return S
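The multi-treatment rule described above (compute KL against the C group for each treatment, then sum) can be sketched in plain Python. This is a simplified variant rather than a copy of the function above: it clips both rates into \([\epsilon, 1-\epsilon]\) instead of special-casing 0 and 1, and the names `binary_kl` and `kl_vs_control` are hypothetical.

```python
from math import log
from typing import Sequence


def binary_kl(p: float, q: float, eps: float = 1e-6) -> float:
    """KL divergence between two Bernoulli distributions with positive rates p and q."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * log(p / q) + (1 - p) * log((1 - p) / (1 - q))


def kl_vs_control(p_treatments: Sequence[float], p_control: float) -> float:
    """Sum of KL(T_i : C) over all treatment groups."""
    return sum(binary_kl(p, p_control) for p in p_treatments)


single = binary_kl(0.3, 0.1)               # about 0.1537
reverse = binary_kl(0.1, 0.3)              # about 0.1163, KL is not symmetric
total = kl_vs_control([0.3, 0.2], 0.1)     # two treatments against one control
```

The two single-pair values differ, which is the asymmetry mentioned earlier.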

Euclidean Distance (ED)

The Euclidean distance is of course another possible distance measure:

\[ED(P : Q) = \sum_{k=left, right}(p_k - q_k)^2 \]

Frankly, I have not really understood what ED is good for.
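For comparison with the KL code, here is a minimal sketch of ED for a binary outcome. It assumes the sum runs over the two outcome classes (1 and 0), mirroring how the KL function above is written. The name `euclidean_distance` is hypothetical.

```python
def euclidean_distance(pk: float, qk: float) -> float:
    """Squared Euclidean distance between two Bernoulli distributions,
    summed over both outcome classes (1 and 0)."""
    return (pk - qk) ** 2 + ((1 - pk) - (1 - qk)) ** 2


d_tc = euclidean_distance(0.3, 0.1)   # 0.04 + 0.04, roughly 0.08
d_ct = euclidean_distance(0.1, 0.3)   # same value: ED is symmetric
```

Unlike KL divergence, swapping the two arguments does not change the result, and no clipping is needed when a rate is 0 or 1.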

Chi-squared distance (Chi)

\[\chi^2(P : Q) = \sum_{k=left, right}\frac{(p_k - q_k)^2}{q_k} \]

With KL divergence understood, I won't ramble on about how Chi is computed internally :)
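For completeness, a minimal sketch of the Chi computation in the same style, again assuming a binary outcome and a sum over both outcome classes. The clipping of `qk` and the name `chi_squared` are assumptions of this example.

```python
def chi_squared(pk: float, qk: float, eps: float = 1e-6) -> float:
    """Chi-squared distance between two Bernoulli distributions,
    summed over both outcome classes (1 and 0)."""
    qk = min(max(qk, eps), 1 - eps)   # keep both denominators away from zero
    return (pk - qk) ** 2 / qk + ((1 - pk) - (1 - qk)) ** 2 / (1 - qk)


chi = chi_squared(0.3, 0.1)   # 0.04 / 0.1 + 0.04 / 0.9, roughly 0.4444
```

Because each squared difference is divided by the control rate, the same gap counts for more when the control rate is small.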

Interaction Tree (IT)

Causal Inference Tree (CIT)

Contextual Treatment Selection (CTS)

Uplift Forest

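This works just like turning any tree into a forest: apply the tree directly, simple and blunt, and it instantly becomes a forest. In a distributed setting it gets somewhat more complicated.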


self.uplift_forest = [
    UpliftTreeClassifier(
        max_features=self.max_features, max_depth=self.max_depth,
        min_samples_leaf=self.min_samples_leaf,
        min_samples_treatment=self.min_samples_treatment,
        n_reg=self.n_reg,
        evaluationFunction=self.evaluationFunction,
        control_name=self.control_name,
        normalization=self.normalization,
        honesty=self.honesty,
        random_state=random_state.randint(MAX_INT))
    for _ in range(self.n_estimators)
]
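The snippet above only constructs the trees. A rough sketch of the rest of the usual bagging recipe (fit each tree on a bootstrap sample, then average the predictions) might look as follows. This is an assumption about the general technique, not a description of the library's internals. `ConstantUpliftStub` is a hypothetical depth-0 stand-in for a real uplift tree so that the example is self-contained.

```python
import random
from statistics import mean
from typing import List, Sequence, Tuple

Row = Tuple[int, int]  # (treatment flag: 1 = T, 0 = C; binary outcome)


class ConstantUpliftStub:
    """Hypothetical stand-in for an uplift tree: predicts the overall
    uplift (treated rate minus control rate) of its training sample."""

    def fit(self, rows: Sequence[Row]) -> "ConstantUpliftStub":
        treated = [y for t, y in rows if t == 1]
        control = [y for t, y in rows if t == 0]
        self.uplift_ = mean(treated) - mean(control)
        return self

    def predict(self) -> float:
        return self.uplift_


def fit_forest(rows: Sequence[Row], n_estimators: int = 10,
               seed: int = 0) -> List[ConstantUpliftStub]:
    """Fit each base learner on its own bootstrap sample of the rows."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_estimators):
        sample = [rng.choice(rows) for _ in rows]  # sample with replacement
        forest.append(ConstantUpliftStub().fit(sample))
    return forest


def predict_forest(forest: Sequence[ConstantUpliftStub]) -> float:
    """Forest prediction is the average over the individual learners."""
    return mean(tree.predict() for tree in forest)


# Toy data: treated positive rate 0.30, control positive rate 0.10.
rows = [(1, 1)] * 60 + [(1, 0)] * 140 + [(0, 1)] * 20 + [(0, 0)] * 180
forest = fit_forest(rows)
estimate = predict_forest(forest)  # close to the true uplift of 0.20
```

Each learner sees a slightly different resample, so the individual estimates scatter around 0.20 and the average is more stable than any single one.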

refs

[1] https://hwcoder.top/Uplift-1
[2] Tool: scikit-uplift
[3] Meta-learners for Estimating Heterogeneous Treatment Effects using Machine Learning
[4] Athey, Susan, and Guido Imbens. "Recursive partitioning for heterogeneous causal effects." Proceedings of the National Academy of Sciences 113.27 (2016): 7353-7360.
[5] https://zhuanlan.zhihu.com/p/115223013
[6] Athey, Susan, Julie Tibshirani, and Stefan Wager. "Generalized random forests." (2019): 1148-1178.
[7] Wager, Stefan, and Susan Athey. "Estimation and inference of heterogeneous treatment effects using random forests." Journal of the American Statistical Association 113.523 (2018): 1228-1242.
[8] Rzepakowski, P., & Jaroszewicz, S. (2012). Decision trees for uplift modeling with single and multiple treatments. Knowledge and Information Systems, 32, 303-327.
[9] Jannik Rößler, Richard Guse, and Detlef Schoder. The best of two worlds: using recent advances from uplift modeling and heterogeneous treatment effects to optimize targeting policies. International Conference on Information Systems, 2022.
