通過SQL解讀財富的分配

還是繼續昨天的問題，知乎上一個蠻有意思的問題，資料分析師做了一個很有意思的解讀。

題目是：房間裡有100個人，每人都有100元錢，他們在玩一個遊戲。每輪遊戲中，每個人都要拿出一元錢隨機給另一個人，最後這100個人的財富分佈是怎樣的？

我們用SQL來解讀一下，當然還可以測試的更充分一些。

首先我們建立一個表test_money，裡面是存放很簡單的資料，pid就是人員編號，money就是手頭的財富。

create table test_money(pid number primary key,money number);

先插入100條記錄。
insert into test_money select level,1 from dual connect by level <=100;

接著我使用如下的指令碼來批量模擬實驗，即迴圈pid,從1到100，依次扣減一塊，依次隨機新增一塊。

delete from test_money;
insert into test_money select level,1 from dual connect by level <=100;
commit;

begin
for i in 1..100 loop
update test_money set money=money-1 where pid=i;
update test_money set money=money+1 where pid=trunc(dbms_random.value(0,100)) ;
end loop;
end;
/

執行之後檢視資料，第一次是這樣的效果，可以看到，0,1,2的資料佔比差不多是95%
     MONEY   COUNT(*)
---------- ----------
         1         37
         2         25
         4          1
         3          3
         0         34
再來依次測試，資料有一定的變化，但是佔比差別不大。
     MONEY   COUNT(*)
---------- ----------
         1         42
         2         19
         4          2
         3          4
         0         33

我們能不能繼續改進一下，這麼模擬測試100次，最後得到的平均數就相對更有說服力了。

為此我們建立一個表test_money_sum，插入統計資料。

create table test_money_sum(money int,money_cnt number);
insert into test_money_sum select money,count(*)from test_money group by money;

按照這樣的邏輯，我可以很輕鬆的模擬100遍這個實驗。

使用如下的shell指令碼來批量更新資料，得到最後的結果。

for i in {1..100}
do
sqlplus -s xx/xx<<EOF
@test.sql
EOF
done

sqlplus -s xx/xx<<EOF
select money,sum(money_cnt)/100 from test_money_sum group by money;

EOF

到了出資料的時候了，100次測試之後，財富的分佈情況如下：

金錢	平均人數
0	37.36
1	36.57
2	18.12
3	6.16
4	1.43
5	0.28
6	0.07
8	0.01

如果用圖表來檢視會更好一些

而所佔的比例用這個餅圖來看更加直白。

整體來看，這個實驗還是很有意蘊的，想要突破現有的思維桎梏，達到一個新的高度，這個比例還是很低的，但是無論如何還是需要花很大的功夫的。

我們再想一下，如果這個樣本是1000萬呢。

這就涉及幾個技巧，首先是初始化資料，1000萬的資料初始化可以這麼來做。

SQL> insert into test_money select rownum,1 from xmltable('1 to 10000000');
10000000 rows created.
Elapsed: 00:00:29.50

半分鐘即可搞定，而如何批量更新呢，可以參考這個pl/sql

declare
cursor test_cur is select pid from test_money;
type rec_type is table of test_cur%rowtype index by binary_integer;
recs rec_type;
begin
open test_cur;
fetch test_cur bulk collect into recs;
close test_cur;
forall i in 1..recs.COUNT
   update test_money set money=money-1 where pid=recs(i).pid;
   update test_money set money=money+1 where pid=trunc(dbms_random.value(0,10000000)) ;
   commit;
end;
/

資料還在執行。稍後提供。

來自 “ ITPUB部落格 ” ，連結：http://blog.itpub.net/23718752/viewspace-2143006/，如需轉載，請註明出處，否則將追究法律責任。

通過SQL解讀財富的分配

相關文章