Python Visualization: Seaborn (Part 1)

Published by Kesci (科賽網) on 2017-10-10

Original URL: https://juejin.im/post/59dc953c51882551dd30fd3f


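Descriptive statistics are indispensable in data analysis and data mining. For example, we often need to look at how each quantitative variable is distributed; a good visualization of these distributions lays the groundwork for later modeling.

Seaborn is a powerful library for this purpose and produces high-quality statistical graphics. Here we use the Lianjia (鏈家) second-hand housing dataset, published on Kesci (科賽網), to show how to do distribution visualization with Seaborn.

Note: all the code in this article can be reproduced with K-Lab, Kesci's online collaborative data analysis tool. You can log in to Kesci and practice Seaborn visualization on other datasets.

Visualizing the distribution of quantitative variables has two main goals:

Exploring how a variable is distributed on its own, that is, visualizing univariate distributions;
Exploring whether two variables are related in how they are distributed, that is, visualizing bivariate distributions.

Seaborn's plotting functions follow the same workflow.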

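univariate distributions visualization:
distplot --- plots the distribution of a single variable
kdeplot --- fits a kernel density estimate to the distribution of one variable or of a pair of variables
rugplot --- draws each data point as a small tick (stick) along the axis
bivariate distributions visualization:
jointplot --- plots the joint distribution of two variables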
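To make the term "kernel density estimate" concrete, here is a minimal pure-Python sketch of a Gaussian KDE. It is not Seaborn's implementation: the function name `gaussian_kde`, the fixed bandwidth and the sample values are all assumptions chosen for illustration, whereas kdeplot picks a bandwidth automatically.

```python
import math

def gaussian_kde(samples, bandwidth):
    """Return a function that estimates the density of `samples` at a point x."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        # Sum one Gaussian bump per observation, centred on that observation.
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density

# Hypothetical floor areas in square metres (not taken from the dataset).
areas = [45.0, 52.0, 60.0, 61.0, 75.0, 88.0, 90.0, 120.0]
density = gaussian_kde(areas, bandwidth=10.0)  # bandwidth chosen by hand

# Evaluate the curve on a grid, as a plotting function would before drawing it.
grid = [float(x) for x in range(0, 201)]
curve = [density(x) for x in grid]
```

A smaller bandwidth gives a bumpier curve that follows individual points; a larger one gives a smoother curve that may hide structure.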

Reading the data

    import pandas as pd

    # the CSV file is GBK-encoded
    sh = pd.read_csv('sh.csv', encoding='gbk')

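To avoid bugs when decoding Chinese text, the column headers are replaced with English names (Dist, Area, Price and Tprice are the ones used below).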
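The original code for this step is not shown. One way it could be done is with pandas `DataFrame.rename`, sketched below. The Chinese header names and the sample rows are hypothetical, since the real headers of sh.csv are not given in the article; only the English names match those used in the rest of the text.

```python
import pandas as pd

# Hypothetical Chinese headers and rows; the real ones in sh.csv may differ.
raw = pd.DataFrame({
    "行政區": ["Xuhui", "Pudong"],
    "面積": [60.0, 88.0],
    "單價": [50000, 45000],
    "總價": [300, 396],
})

# Map each original header to the English name used later in the article.
header_map = {"行政區": "Dist", "面積": "Area", "單價": "Price", "總價": "Tprice"}
sh_demo = raw.rename(columns=header_map)
```

`rename` returns a new DataFrame and leaves columns that are missing from the mapping unchanged, so a partial mapping is safe.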

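Importing the plotting packages

    import warnings
    warnings.filterwarnings("ignore")  # silence library warnings in the output

    import seaborn as sns
    import matplotlib.pyplot as plt
    # Jupyter/IPython magic: render figures inline in the notebook
    %matplotlib inline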

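A first look at single-variable visualization

In this dataset the quantitative variables are the floor area Area, the unit price per square meter Price, and the total price Tprice.

Let's first look at the distribution of the total price Tprice in each administrative district of Shanghai, using distplot. Note that by default distplot also draws the fitted kernel density estimate curve.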

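    dist = sh.Dist.unique()
    plt.figure(1, figsize=(16, 30))
    with sns.axes_style("ticks"):
        # one panel per district, laid out on a 6x3 grid
        for i in range(17):
            temp = sh[sh.Dist == dist[i]]
            plt.subplot(6, 3, i + 1)
            plt.title(dist[i])
            # note: distplot is deprecated since seaborn 0.11 (histplot/displot replace it)
            sns.distplot(temp.Tprice)
            plt.xlabel(' ')
    plt.show()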

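Of course, we can also turn off the kernel density curve and look at the histogram alone. distplot exposes the kde and rug parameters for this: kde toggles the kernel density estimate, and rug toggles the rug plot (ticks on the axis marking where each data point lies).

We take the data for Xuhui District (徐彙區) on its own, set kde and rug, and obtain the histogram below.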

    temp = sh[sh.Dist == 'Xuhui']
    plt.figure(1, figsize=(6, 6))
    plt.title('Xuhui')
    # kde=False hides the density curve; rug=True marks each observation on the x-axis
    sns.distplot(temp.Tprice, kde=False, bins=20, rug=True)
    plt.xlabel(' ')
    plt.show()
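With kde=False, what remains is a plain histogram: the range of the data is cut into equal-width bins (20 in the call above) and the values in each bin are counted. The sketch below shows that counting step in pure Python. The helper `histogram` and the sample prices are illustrative assumptions, not Seaborn's own code.

```python
def histogram(values, bins):
    """Count how many values fall into each of `bins` equal-width intervals."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are equal; cannot build equal-width bins")
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # The maximum value belongs to the last bin rather than to a new one.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(bins + 1)]
    return counts, edges

# Hypothetical total prices (not taken from the dataset).
prices = [180, 220, 250, 260, 300, 310, 330, 400, 520, 980]
counts, edges = histogram(prices, bins=4)
```

Here one expensive listing stretches the range, so most values pile into the first bin. The same effect is why the article later filters out extreme prices before plotting.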

In Seaborn we can also call kdeplot and rugplot directly.

Now let's examine how the floor area Area is distributed in the Xuhui data.

    from scipy import stats

    plt.figure(1, figsize=(12, 6))
    with sns.axes_style("ticks"):
        # left panel: KDE curve with a rug plot drawn on the same axes
        plt.subplot(1, 2, 1)
        sns.kdeplot(temp.Area, shade=True)
        sns.rugplot(temp.Area)
        plt.title('Xuhui --- Area Distribution')
        # right panel: histogram with a fitted gamma distribution
        plt.subplot(1, 2, 2)
        plt.title('Xuhui - Area Distribution fits with gamma distribution')
        sns.distplot(temp.Area, kde=False, fit=stats.gamma)
    plt.show()

Left: kdeplot and rugplot called separately and overlaid, which shows how flexibly Seaborn plots can be composed
Right: the fit parameter of distplot set so that a gamma distribution is fitted to the data
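To give a feel for what fitting a gamma distribution means, the sketch below estimates the shape and scale with the method of moments. This is a deliberately simpler stand-in: `stats.gamma.fit` in SciPy uses maximum likelihood and also estimates a location parameter, so its numbers will generally differ. The function names and sample areas are assumptions made for illustration.

```python
import math
import statistics

def fit_gamma_moments(samples):
    """Method-of-moments estimate of gamma shape and scale (location fixed at 0)."""
    mean = statistics.mean(samples)
    var = statistics.pvariance(samples)
    # For a gamma distribution: mean = shape * scale, variance = shape * scale**2.
    shape = mean * mean / var
    scale = var / mean
    return shape, scale

def gamma_pdf(x, shape, scale):
    """Density of the gamma distribution, computed in log space for stability."""
    if x <= 0:
        return 0.0
    log_pdf = ((shape - 1.0) * math.log(x) - x / scale
               - math.lgamma(shape) - shape * math.log(scale))
    return math.exp(log_pdf)

# Hypothetical floor areas in square metres (not taken from the dataset).
areas = [45.0, 52.0, 60.0, 61.0, 75.0, 88.0, 90.0, 120.0]
shape, scale = fit_gamma_moments(areas)
```

Evaluating `gamma_pdf` over a grid of areas gives the smooth curve that is drawn over the histogram in the right-hand panel.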

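Visualizing two variables (pairs)

Having visualized the distributions of single quantitative variables, we now look at whether two variables are related in their joint distribution.

Seaborn provides the jointplot function for this. Below we analyze floor area (Area) against total price (Tprice) over the whole dataset.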

Drawing a scatterplot

    sns.jointplot(x='Area', y='Tprice', data=sh)
    plt.show()

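The data points with a total price below 1000W (10 million) and an area below 200 square meters are tightly clustered. We set a filter to pull this subset out and redraw the scatterplot.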

    test = sh[(sh.Tprice < 1000) & (sh.Area < 200)]
    with sns.axes_style("white"):
        sns.jointplot(x='Area', y='Tprice', data=test)
    plt.show()

When the dataset is large, a hexbin plot shows more clearly where the data are concentrated, as in the figure below.

    with sns.axes_style("white"):
        sns.jointplot(x='Tprice', y='Area', data=test, kind='hex')
    plt.show()
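The idea behind a hexbin plot is two-dimensional binning: the plane is divided into cells, the points in each cell are counted, and the count decides the cell's color. The sketch below uses rectangular cells instead of hexagons to keep the arithmetic short, so it illustrates the principle rather than reproducing the hexbin algorithm. The helper `bin_2d`, the cell sizes and the sample points are assumptions.

```python
from collections import Counter

def bin_2d(xs, ys, cell_w, cell_h):
    """Count points per rectangular cell; keys are (column, row) cell indices."""
    counts = Counter()
    for x, y in zip(xs, ys):
        counts[(int(x // cell_w), int(y // cell_h))] += 1
    return counts

# Hypothetical (Area, Tprice) pairs, not taken from the dataset.
area = [48, 52, 55, 58, 61, 90, 95, 150]
tprice = [210, 230, 260, 240, 300, 480, 510, 900]

counts = bin_2d(area, tprice, cell_w=20, cell_h=100)

# The cell holding the most points is where the data are most concentrated.
densest_cell, n_points = counts.most_common(1)[0]
```

With many thousands of overlapping points a scatterplot turns into a solid blob, while per-cell counts still show which regions are dense.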

Of course, we can also use kernel density estimation to visualize the distribution.

    with sns.axes_style("white"):
        sns.jointplot(x='Area', y='Tprice', data=test, kind='kde')
    plt.show()

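Summary

Seaborn's strength is that very little code visualizes a great deal, and its API is flexible enough to cover most of what you might want to draw.

This short article does not explore or interpret the dataset itself in much depth. To dig deeper, log in to Kesci and open the "Datasets" page.

This article was originally written by 保一雄, a data analyst at Kesci (科賽網).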
