Using Python to Compute the Correlation of Second-Hand Housing Price Indices in First-Tier Cities

Posted by pythondict on 2019-12-01

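Python offers many ways to compute correlation: scipy ships with its own statistical tools, and pandas has a very convenient multi-variable correlation analysis. Today we will walk through how to use both.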

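1. Data collection

The data for Beijing, Shanghai, Guangzhou and Shenzhen in this article were collected from East Money (東方財富網), using the second-hand housing price index as the example:

The data start on 1 January 2011, and each data point is the price index for one month. The data were collected by using the browser's developer tools to find the JSON returned by the request, as follows: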
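Once the JSON response has been located, turning it into a plain Python list takes only a few lines. The sketch below is only an illustration: the article does not show the real response format, so the payload layout and the field names `data`, `date` and `index` are assumptions, and the helper `extract_index` is a hypothetical name. The three values are the first three Beijing data points listed below.

```python
import json

# Hypothetical payload: the real response format is not shown in the article,
# so the keys "data", "date" and "index" are assumptions for illustration only.
raw = (
    '{"data": ['
    '{"date": "2011-01-01", "index": 100.3},'
    '{"date": "2011-02-01", "index": 100.4},'
    '{"date": "2011-03-01", "index": 99.9}'
    ']}'
)


def extract_index(payload):
    """Return the index values from the (assumed) JSON payload, in order."""
    records = json.loads(payload)["data"]
    return [float(record["index"]) for record in records]


bj_sample = extract_index(raw)
print(bj_sample)
```

With the real response, only the key names inside `extract_index` would need to change.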

The data are as follows (2011/1/1 to 2019/10/1):

# Beijing:
bj =  [100.3,100.4,99.9,100.1,99.8,99.9,100.1,100,99.6,99.5,99.3,99.2,99.1,99.8,100.2,100.4,99.9,100.2,100.3,100.3,100.1,100,100.3,101,101,102.2,103.1,102,101.7,101.3,101.4,101.2,101.3,101.1,101.2,100.6,99.9,100,100.2,99.8,99.1,98.7,99.2,99.1,98.6,100.3,100.7,100.2,100,99.9,100.5,102.1,104.3,102.3,102.6,102,101.4,101.1,101.4,101.7,102.3,103.2,106.3,103.7,102.3,101.4,101.6,103.9,105.7,101.1,100.2,100.2,100.8,101.3,102.2,100,99.1,98.9,99.2,99.1,99.4,99.5,99.5,99.6,99.4,99.5,99.8,99.9,100.3,100.1,100.4,100,99.8,99.8,99.4,99.8,99.9,100.2,100.4,100.6,100,100,99.7,99.6,99.5,99.4]

# Guangzhou:
gz =  [101.2,100.6,99.5,101,99.8,100.1,100.2,100.7,100.6,99.5,99.2,99.6,99.6,99.6,99.8,99.6,99.9,100.5,100.7,100.9,100.6,100.4,100.5,100.5,100.4,101.7,101.5,100.7,101.1,100.9,101,101,100.4,101,101.2,100.6,101,100.3,100.2,100.7,100.1,99.7,98.9,98.6,98.7,100,100,100.2,99.8,99.7,100,101.1,102.3,101.8,101.3,101,101.2,101.1,100.7,101,101.3,101.2,103.5,102.6,101.9,101.6,101.4,102.8,103.3,101.6,100.8,101.3,101.6,102.7,103.3,101,100.5,100.8,100.1,100,100.2,99.7,100.1,99.6,99.9,100.2,100.2,100.5,101,100.3,100.3,100.6,100.2,99.8,99.7,99.6,99.7,99.8,99.5,99.6,99.7,100,100.4,100,99.7,99.9]

# Shanghai:
sh =  [100.5,100.4,100.4,100.6,100.2,100.2,100.3,100.1,100.1,99.8,99.5,99.6,99.3,99.7,99.5,100.1,100.3,100.2,100.2,100.3,100.2,100.2,100.2,100.4,100.8,101.6,102.6,101.3,100.9,101.1,100.8,100.8,101,100.9,100.7,100.5,100.1,100.6,100.2,100,99.8,99.3,99.1,99.3,99.2,100,100,100.4,100.3,100.1,100,100.6,102.2,101.2,101.6,101.1,101,100.8,101,101.2,102.7,105.3,106.2,102.5,101.4,102.2,102,103.7,103.4,100.3,99.8,99.5,99.6,100.2,100.7,100.8,100,99.9,99.6,99.8,99.9,100.3,99.7,99.9,100.1,99.6,99.4,99.8,99.7,99.7,99.9,99.9,99.8,99.8,99.9,99.7,100,99.9,100.3,100.5,100.1,99.9,100.4,100,100.6,99.8]

# Shenzhen:
sz =  [100.6,102.6,100.6,100.5,100.3,100,99.5,100,99.8,100,99.2,99.6,99.2,100,100.1,100,100,100.2,100.2,100.1,100.1,100.4,100.3,100.6,100.5,101.4,102.3,101.1,101,101.3,101,101.6,101.3,100.9,100.8,100.7,100.8,100.8,101.1,100.1,100.2,99.4,99.4,99.5,99.3,100,100.4,100.7,100.6,100.3,100.5,102.4,106.3,106.9,105.3,104.4,103.3,101,101.9,103.3,105.7,103.3,104.7,99.6,100,100.8,101.8,102,101.8,99.4,99.3,99.8,99.9,99.3,100.3,100.8,100.3,99.7,100.6,99.8,99.9,100.4,100.1,100.4,100.9,101.3,100.7,100.2,100.8,100.3,100.6,101.1,100,99.4,99.8,99.7,99.7,100.5,100.7,101.1,100,99.9,100.7,100.2,101.3,101]  

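2. Preparation

First, make sure Python is installed on your computer. If it is not, see this article: the Super-Detailed Python Installation Guide (超詳細安裝Python指南).

Then open CMD on Windows (Start > Run > cmd) or Terminal on macOS, and enter the following commands to install scipy and pandas.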

pip install scipy 
pip install pandas

3. Writing the code

3.1 Computing correlation with scipy

Computing correlation with scipy is very simple. Import the package's stats module:

import scipy.stats as stats

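Then call the function to do the calculation:

# Compute the correlation between the Guangzhou and Shenzhen second-hand housing price indices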
print(stats.pearsonr(gz, sz))

The result (the correlation coefficient first, then the p-value):

F:\push\20191130>python 1.py
(0.4673289851643741,  4.4100775485723706e-07)
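To make it clearer what the first number means, here is a minimal sketch that computes the Pearson coefficient by hand (covariance of the two series divided by the product of their standard deviations) and compares it with `stats.pearsonr`. The helper name `pearson_r` is made up for this example, and the two short lists are just the first six Guangzhou and Shenzhen values from above, so the numbers it prints are not the full-series result.

```python
import numpy as np
from scipy import stats


def pearson_r(x, y):
    """Pearson correlation: sum(dx*dy) / sqrt(sum(dx^2) * sum(dy^2))."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()  # deviations from the mean
    dy = y - y.mean()
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))


# First six data points of gz and sz (a short sample, not the full series)
gz_sample = [101.2, 100.6, 99.5, 101, 99.8, 100.1]
sz_sample = [100.6, 102.6, 100.6, 100.5, 100.3, 100]

# pearsonr returns the coefficient and a two-sided p-value
r, p_value = stats.pearsonr(gz_sample, sz_sample)
manual_r = pearson_r(gz_sample, sz_sample)

print("scipy r:", r)
print("manual r:", manual_r)
print("p-value:", p_value)
```

The second number returned by `pearsonr` is the p-value: the smaller it is, the less likely it is that a correlation this strong would appear by chance in uncorrelated data.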

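What?! The correlation between the Guangzhou and Shenzhen second-hand housing price indices is only about 0.47? How do the other first-tier cities compare with Shenzhen?

The awkward part of stats.pearsonr, though, is that it compares only two series at a time; it cannot compare all four first-tier cities pair by pair in a single call. Fortunately, there is a module that can.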
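For comparison, the pairwise version can still be done with scipy by looping over every pair of cities. This is only a sketch: `pairwise_pearson` is a hypothetical helper name, and the sample dictionary holds just the first six values of each city from the data above rather than the full series.

```python
from itertools import combinations

from scipy import stats


def pairwise_pearson(series):
    """Take a dict of name -> equal-length list and return {(a, b): r}."""
    result = {}
    # combinations(..., 2) yields each unordered pair of cities exactly once
    for (name_a, a), (name_b, b) in combinations(series.items(), 2):
        r, _p_value = stats.pearsonr(a, b)
        result[(name_a, name_b)] = float(r)
    return result


# First six data points per city (a short sample, not the full series)
sample = {
    "bj": [100.3, 100.4, 99.9, 100.1, 99.8, 99.9],
    "gz": [101.2, 100.6, 99.5, 101, 99.8, 100.1],
    "sh": [100.5, 100.4, 100.4, 100.6, 100.2, 100.2],
    "sz": [100.6, 102.6, 100.6, 100.5, 100.3, 100],
}

pairs = pairwise_pearson(sample)
for (a, b), r in pairs.items():
    print(f"{a}-{b}: {r:.3f}")
```

Four cities give six pairs, so the loop works, but it is more code than the single pandas call.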

3.2 Pairwise correlation in a single call with pandas

First, import pandas:

import pandas as pd

Create a DataFrame to hold the four series:

df = pd.DataFrame()
df['Beijing'] = bj
df['Shanghai'] = sh
df['Guangzhou'] = gz
df['Shenzhen'] = sz
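The same table could also be built in one call from a dict, with a monthly date index so that each row is labelled with its month. This is a sketch under assumptions: the names `df_sample` and `dates` and the two four-value lists (the first four Beijing and Shenzhen points) are there only to keep the example self-contained.

```python
import pandas as pd

# First four data points of bj and sz (a short sample, not the full series)
bj_sample = [100.3, 100.4, 99.9, 100.1]
sz_sample = [100.6, 102.6, 100.6, 100.5]

# "MS" means month start, so the index is 2011-01-01, 2011-02-01, ...
dates = pd.date_range(start="2011-01-01", periods=len(bj_sample), freq="MS")

# Each dict key becomes a column; all lists must have the same length
df_sample = pd.DataFrame(
    {"Beijing": bj_sample, "Shenzhen": sz_sample},
    index=dates,
)

print(df_sample)
```

A date index is not needed for the correlation itself, but it makes it easier to slice by period or plot the series later.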

Finally, compute the correlations:

print(df.corr())

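Let's look at the result:

Wow, Shenzhen's second-hand housing prices really do stand apart from the rest. And judging from the chart below, Shenzhen's second-hand housing prices have indeed diverged from Beijing's.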
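A single coefficient over the whole period cannot show when two cities move apart. One possible way to look at that, sketched here with a rolling window, is to compute the correlation over a sliding range of months. The window length of 6 is an arbitrary choice for this example, and the data are only the first twelve Beijing and Shenzhen values from above, so the output says nothing about the recent divergence itself.

```python
import pandas as pd

# First twelve data points of bj and sz (a short sample, not the full series)
bj_sample = [100.3, 100.4, 99.9, 100.1, 99.8, 99.9,
             100.1, 100, 99.6, 99.5, 99.3, 99.2]
sz_sample = [100.6, 102.6, 100.6, 100.5, 100.3, 100,
             99.5, 100, 99.8, 100, 99.2, 99.6]

df_sample = pd.DataFrame({"Beijing": bj_sample, "Shenzhen": sz_sample})

window = 6  # arbitrary window length in months, chosen for illustration

# Pearson correlation over each sliding window of `window` rows;
# the first window - 1 rows are NaN because the window is not yet full
rolling_r = df_sample["Beijing"].rolling(window).corr(df_sample["Shenzhen"])

print(rolling_r)
```

On the full series, a drop in the rolling value toward zero or below would mark the months in which the two indices stopped moving together.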

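Personally, I think this divergence is related to a series of recent policies and to the situation in Hong Kong, but under the current severe financial conditions it will not last long.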

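That's the end of this article. If you liked today's Python tutorial, please keep following us. If it helped you, please give it a like below. If you have any questions, leave a comment below and we will answer patiently!


Python實用寶典 (pythondict.com)
More than just a handbook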