[pythonskill] Implementing hypothesis tests in Python

Posted by 陳楚桐 on 2018-08-03

Hello everyone! I recently learned some methods for running hypothesis tests in Python. Below is a short summary that ties the methods to the math behind them~

 

Hypothesis testing is a method in inferential statistics for testing a statistical hypothesis. A "statistical hypothesis" is a scientific hypothesis that can be tested by observing a model of a set of random variables [1]. Once an unknown parameter can be estimated, we naturally want to use the result to draw appropriate inferences about the true parameter value.

A statistical hypothesis about parameters is a statement about one or more parameters. The statement whose correctness we want to test is the null hypothesis; it is usually chosen by the researcher and reflects the researcher's view of the unknown parameter. The competing statement about the parameters is the alternative hypothesis, which usually reflects the opposing view the researcher holds about the possible parameter values (in other words, the alternative hypothesis is usually what the researcher really wants to establish).

Common kinds of hypothesis tests include the t-test, Z-test, chi-square test, F-test, and so on.

Reference: https://zh.wikipedia.org/wiki/%E5%81%87%E8%A8%AD%E6%AA%A2%E5%AE%9A

In short, hypothesis testing is a way of reasoning about data; depending on the experimental data and the goal, different procedures apply, such as the t-test, Z-test, chi-square test, F-test, and so on~~
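As a quick illustration of those classical tests, `scipy.stats` ships ready-made implementations. A minimal sketch on simulated data (the samples and parameter values below are made up for illustration, not taken from this post's experiments):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Two simulated samples with different true means (made-up numbers)
sample_a = rng.normal(loc=0.71, scale=0.2, size=100)
sample_b = rng.normal(loc=0.42, scale=0.2, size=100)

# Two-sample t-test: the null hypothesis is that the means are equal
t_stat, p_value = stats.ttest_ind(sample_a, sample_b)
print('t =', t_stat, 'p =', p_value)
```

With a true mean gap this large relative to the spread, the t-test rejects the null decisively; the resampling tests discussed below reach the same kind of conclusion without assuming normality.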

——————————————————————————————————————–

Permutation test on frog data

Before diving into hypothesis testing proper, you may want to review the earlier reposted article on the permutation test. It is one kind of hypothesis test, and a fairly simple and fundamental one, so we start there.

The permutation test applies when comparing two complete samples of experimental data (n1, n2):

The main workflow is:

1. Concatenate n1 and n2, then draw without replacement to form two new samples (n1′, n2′) of the same lengths as the originals:

import numpy as np

def permutation_sample(data1, data2):
    """Generate a permutation sample from two data sets."""

    # Concatenate the data sets: data
    data = np.concatenate((data1, data2))

    # Permute the concatenated array: permuted_data
    permuted_data = np.random.permutation(data)

    # Split the permuted array into two: perm_sample_1, perm_sample_2
    perm_sample_1 = permuted_data[:len(data1)]
    perm_sample_2 = permuted_data[len(data1):]

    return perm_sample_1, perm_sample_2

2. Repeat step 1 many times (e.g. 10,000); for each pair (n1′, n2′), compute the two means and take their difference:

def draw_perm_reps(data_1, data_2, func, size=1):
    """Generate multiple permutation replicates."""

    # Initialize array of replicates: perm_replicates
    perm_replicates = np.empty(size)

    for i in range(size):
        # Generate permutation sample
        perm_sample_1, perm_sample_2 = permutation_sample(data_1, data_2)

        # Compute the test statistic
        perm_replicates[i] = func(perm_sample_1, perm_sample_2)

    return perm_replicates


def diff_of_means(data_1, data_2):
    """Difference in means of two arrays."""

    # The difference of means of data_1, data_2: diff
    diff = np.mean(data_1)-np.mean(data_2)

    return diff

 

3. We end up with many (10,000) permutation replicates; their distribution approximates how the test statistic behaves if both samples come from the same population.

Here is an example:

Kleinteich and Gorb (Sci. Rep. 4, 5225, 2014) performed an interesting experiment with South American horned frogs. They held a plate connected to a force transducer, along with a bait fly, in front of them. They then measured the impact force and adhesive force of the frog's tongue when it struck the target.

Drs. Kleinteich and Gorb had previously run a series of experiments studying which factors affect the adhesive force of frog tongues.

Their measurements showed a mean adhesive force of 0.71 Newtons (N) for FROG_A (an adult frog) versus 0.42 N for FROG_B (a juvenile). Is this 0.29 N gap just an accident of the small sample, or does age really affect tongue adhesion?

So the two scientists analyzed the data with a permutation test:

# Compute difference of mean impact force from experiment: empirical_diff_means
empirical_diff_means = diff_of_means(force_a, force_b)  # difference of the means of the raw data

# Draw 10,000 permutation replicates: perm_replicates
perm_replicates = draw_perm_reps(force_a, force_b,
                                 diff_of_means, size=10000)  # distribution of mean differences over 10,000 permutations

# Compute p-value: p
p = np.sum(perm_replicates >= empirical_diff_means) / len(perm_replicates)  # fraction of permuted differences at least as large as the observed one

# Print the result
print('p-value =', p)

output:

p-value = 0.0063

In this test the null hypothesis is that FROG_A and FROG_B have identical distributions (age does not affect tongue adhesion) ("You will compute the probability of getting at least a 0.29 N difference in mean strike force under the hypothesis that the distributions of strike forces for the two frogs are identical."). The repeated resampling gives us the (estimated, plausible) distribution of the difference of means under that hypothesis, from which we compute the probability of a difference at least as extreme as the observed one. That probability is 0.63%, so a gap this extreme is very unlikely under the null. We therefore reject the original hypothesis and conclude that age does affect tongue adhesion.

That is the basic permutation test.
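Putting the helper functions together, here is a self-contained run on simulated data (these forces are generated for illustration, not the published frog measurements):

```python
import numpy as np

def permutation_sample(data1, data2):
    """Generate one permutation sample from two data sets."""
    data = np.concatenate((data1, data2))
    permuted = np.random.permutation(data)
    return permuted[:len(data1)], permuted[len(data1):]

def diff_of_means(data_1, data_2):
    """Difference in means of two arrays."""
    return np.mean(data_1) - np.mean(data_2)

np.random.seed(0)
# Simulated impact forces (N); NOT the real measurements
force_a = np.random.normal(0.71, 0.15, 20)
force_b = np.random.normal(0.42, 0.15, 20)

empirical = diff_of_means(force_a, force_b)
reps = np.empty(10000)
for i in range(10000):
    s1, s2 = permutation_sample(force_a, force_b)
    reps[i] = diff_of_means(s1, s2)

# Fraction of permuted differences at least as large as the observed one
p = np.sum(reps >= empirical) / len(reps)
print('p-value =', p)
```

With a true gap of 0.29 N and a spread of 0.15 N, essentially no permutation reproduces the observed difference, so the p-value comes out very small.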

———————————————————————————————————————————–

Bootstrap hypothesis tests

———————————————————————————————————————————–

A one-sample bootstrap hypothesis test

The scientists then continued their frog research:

Later on, the scientists studied another group of juvenile frogs, FROG_C (recall that FROG_B is also juvenile). Unfortunately the raw FROG_C measurements were lost for some reason; all that remains is their mean adhesive force, 0.55 N, while FROG_B's mean is 0.4191 N. Without the raw FROG_C data we cannot run a permutation test, so we cannot test whether FROG_B and FROG_C follow the same distribution (i.e. whether they are the same kind of frog). Are they? After racking their brains, the two scientists came up with a bold idea:

Since we cannot test the full distributions, let us instead hypothesize that FROG_B and FROG_C have the same mean adhesive force ("The mean strike force of Frog B is equal to that of Frog C."):

 

Another juvenile frog was studied, Frog C, and you want to see if Frog B and Frog C have similar impact forces. Unfortunately, you do not have Frog C's impact forces available, but you know they have a mean of 0.55 N. Because you don't have the original data, you cannot do a permutation test, and you cannot assess the hypothesis that the forces from Frog B and Frog C come from the same distribution. You will therefore test another, less restrictive hypothesis: The mean strike force of Frog B is equal to that of Frog C.

To set up the bootstrap hypothesis test, you will take the mean as our test statistic. Remember, your goal is to calculate the probability of getting a mean impact force less than or equal to what was observed for Frog B if the hypothesis that the true mean of Frog B's impact forces is equal to that of Frog C is true. You first translate all of the data of Frog B such that the mean is 0.55 N. This involves adding the mean force of Frog C and subtracting the mean force of Frog B from each measurement of Frog B. This leaves other properties of Frog B's distribution, such as the variance, unchanged.
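One caveat: the helper `draw_bs_reps` used in the code below is never defined in this post; it comes from the DataCamp course. A minimal sketch consistent with how it is called here (1-D data, a statistic function, and a replicate count):

```python
import numpy as np

def bootstrap_replicate_1d(data, func):
    """One bootstrap replicate: resample with replacement, apply func."""
    return func(np.random.choice(data, size=len(data)))

def draw_bs_reps(data, func, size=1):
    """Draw `size` bootstrap replicates of func applied to data."""
    bs_replicates = np.empty(size)
    for i in range(size):
        bs_replicates[i] = bootstrap_replicate_1d(data, func)
    return bs_replicates
```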

 

# Make an array of translated impact forces: translated_force_b
translated_force_b = force_b - np.mean(force_b) + 0.55  # shift Frog B's forces so their mean is 0.55 N (this makes the sample satisfy the null hypothesis; it is not a claim that the original data were wrong)

# Take bootstrap replicates of Frog B's translated impact forces: bs_replicates
bs_replicates = draw_bs_reps(translated_force_b, np.mean, 10000)  # 10,000 bootstrap replicates of the mean

# Compute fraction of replicates that are less than the observed Frog B force: p
p = np.sum(bs_replicates <= np.mean(force_b)) / 10000  # fraction of replicates at or below the observed Frog B mean

# Print the p-value
print('p = ', p)

output:

p =  0.0046

To restate the code in plain language: under our hypothesis, we shift FROG_B's data so that its mean becomes 0.55 N (we have no real data with mean 0.55 N, so we construct a sample that satisfies the null hypothesis while keeping FROG_B's spread) and store it in translated_force_b. We then draw 10,000 bootstrap replicates of the mean, giving the distribution of the sample mean under the null. Finally we compute the probability of a replicate at least as extreme as (less than or equal to) the observed FROG_B mean: 0.46%. That is very unlikely, so we reject the hypothesis; FROG_B's mean adhesive force is almost certainly not equal to FROG_C's.

Note the setup here: one sample with complete data, and one sample with no data at all (only a single statistic of it is known); the analysis is built entirely around that statistic.

———————————————————————————————————————————–

A bootstrap test for identical distributions

This test uses two complete samples and checks whether they come from the same distribution ("Frog A and Frog B have identical distributions of impact forces"):

# Compute difference of mean impact force from experiment: empirical_diff_means
empirical_diff_means = diff_of_means(force_a,force_b)

# Concatenate forces: forces_concat
forces_concat = np.concatenate((force_a,force_b))

# Initialize bootstrap replicates: bs_replicates
bs_replicates = np.empty(10000)

for i in range(10000):
    # Generate bootstrap sample
    bs_sample = np.random.choice(forces_concat, size=len(forces_concat))
    
    # Compute replicate
    bs_replicates[i] = diff_of_means(bs_sample[:len(force_a)],
                                     bs_sample[len(force_a):])

# Compute and print p-value: p
p = np.sum(bs_replicates >= empirical_diff_means) / len(bs_replicates)
print('p-value =', p)

output:

p-value = 0.0055

Walking through the code:

1. We first assume FROG_A and FROG_B share the same distribution, so we may pool them (forces_concat = np.concatenate((force_a, force_b)));

2. Using bootstrap resampling, we draw, with replacement, two samples of lengths len(force_a) and len(force_b) from the pool, compute the mean of each (one possible value of the population mean under the null), and take the difference of the means.

3. Repeating step 2 ten thousand times gives the distribution of the difference of means under the null hypothesis.

4. Compute the p-value: the fraction of replicates at least as extreme as the observed difference (empirical_diff_means = diff_of_means(force_a, force_b)).

The probability comes out to 0.55%, which is very small, so we reject the original hypothesis: FROG_A and FROG_B should not have the same distribution.

 

As you can see, this is similar in spirit to the permutation test: both methods start from the same question, namely, if the two samples share one distribution, how likely is a difference of means as large as 0.29 N? But which method is better? The answer from the DataCamp instructors:

 Testing the hypothesis that two samples have the same distribution may be done with a bootstrap test, but a permutation test is preferred because it is more accurate (exact, in fact).

So the permutation test is the more trustworthy of the two.

But the permutation test also has its limitations:

But therein lies the limit of a permutation test; it is not very versatile. We now want to test the hypothesis that Frog A and Frog B have the same mean impact force, but not necessarily the same distribution. This, too, is impossible with a permutation test.

When we only want to compare whether FROG_A and FROG_B have the same mean, without assuming they share the same distribution, the permutation test cannot help.

———————————————————————————————————————————–

A two-sample bootstrap hypothesis test for difference of means

# Compute mean of all forces: mean_force
mean_force = np.mean(forces_concat)

# Generate shifted arrays
force_a_shifted = force_a - np.mean(force_a) + mean_force
force_b_shifted = force_b - np.mean(force_b) + mean_force

# Compute 10,000 bootstrap replicates from shifted arrays
bs_replicates_a = draw_bs_reps(force_a_shifted, np.mean, 10000)
bs_replicates_b = draw_bs_reps(force_b_shifted, np.mean, 10000)

# Get replicates of difference of means: bs_replicates
bs_replicates = bs_replicates_a - bs_replicates_b

# Compute and print p-value: p
p = np.sum(bs_replicates >= empirical_diff_means) / len(bs_replicates)
print('p-value =', p)

output:

 p-value = 0.0043

As you can see, bootstrap analysis really is the more versatile tool: it can test not only whether two samples share a distribution, but also whether they share a mean even when the distributions may differ.
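For completeness, recent versions of SciPy (1.7+) bundle a ready-made `scipy.stats.permutation_test` covering the two-sample case; a sketch on simulated data (again, not the published measurements):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Simulated impact forces (N), not the published frog data
force_a = rng.normal(0.71, 0.15, 50)
force_b = rng.normal(0.42, 0.15, 50)

def statistic(x, y):
    # Same test statistic as diff_of_means above
    return np.mean(x) - np.mean(y)

res = stats.permutation_test((force_a, force_b), statistic,
                             permutation_type='independent',
                             n_resamples=9999, alternative='greater')
print('p-value =', res.pvalue)
```

`alternative='greater'` matches the one-sided question asked throughout this post: how often does a resampled difference reach the observed one?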

 

That's all, thank you~

