Python Visualization: Seaborn (Part 2)

Published by Kesci (科賽網) on 2017-10-25

Original URL: https://juejin.im/post/59f093b751882578c17ed99c

This article was originally written by 保一雄, data analyst at Kesci.

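Seaborn is an excellent visualization library. It is especially useful for high-dimensional data, because it lets us draw descriptive statistical plots with very little code and makes it easier to spot the characteristics of, and relationships among, the variables.

Following the previous article, Python Visualization: Seaborn (Part 1), which covered distribution visualization with Seaborn, this article covers categorical visualization with Seaborn, including Stripplot & Swarmplot, Boxplot & Violinplot, Barplot & Pointplot, and the more abstract Factorplot.

Here we use the public Iris dataset on Kesci for the demonstrations.

All of the complete source code in this article can be reproduced with K-Lab, an online collaborative data analysis tool. It supports mainstream languages such as Python and R and comes with more than 90% of the libraries related to data analysis and data mining, including Seaborn, Pandas, and Numpy, already deployed, so that data practitioners can focus on the analysis itself and work more efficiently.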

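Iris dataset: a commonly used dataset for classification experiments, collected and compiled by Fisher in 1936. It is a multivariate dataset containing 150 samples in 3 classes, with 50 samples per class and 4 attributes per sample. The four attributes, sepal length (sepal_length), sepal width (sepal_width), petal length (petal_length), and petal width (petal_width), can be used to predict which of the three species (Setosa, Versicolour, Virginica) an iris flower belongs to.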

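Importing libraries

import warnings
warnings.filterwarnings("ignore")
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
# Assumption: the original does not show how `iris` is loaded; seaborn's bundled copy of the dataset is used here
iris = sns.load_dataset('iris')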

Stripplot

In essence, a Stripplot draws a scatter plot of a quantitative variable in the dataset, grouped by category.

We visualize the sepal length of the different species in the Iris dataset with a Stripplot.

plt.figure(1, figsize=(12, 6))
plt.subplot(1, 2, 1)
# jitter=False gives the plain strip plot; newer seaborn releases jitter by default
sns.stripplot(x='species', y='sepal_length', data=iris, jitter=False)  # stripplot
plt.title('Stripplot of sepal length of Iris species')
with sns.axes_style("whitegrid"):  # temporarily sets the style; without it, the default style ('darkgrid') is used
    plt.subplot(1, 2, 2)
    plt.title('Stripplot of sepal length of Iris species')
    sns.stripplot(x='species', y='sepal_length', data=iris, jitter=True)  # jitterplot
plt.show()

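The left-hand plot above is the scatter plot drawn by Stripplot in the default style. In many cases the points in a Stripplot overlap, which makes it hard to see how they are distributed. A simple remedy is to draw a jitter plot on top of the Stripplot: the position of each point is randomly nudged along the categorical axis only, which reveals the distribution.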
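The idea behind jitter can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not seaborn's implementation: the function name `jitter_positions`, the `width` parameter, and the sample values are all made up for the example.

```python
import random

def jitter_positions(category_index, values, width=0.1, seed=0):
    """Return (x, y) pairs: x is the category position plus uniform noise,
    y is the measured value, left untouched."""
    rng = random.Random(seed)
    return [(category_index + rng.uniform(-width, width), v) for v in values]

# Made-up measurements with repeated values, which would overlap without jitter.
values = [5.0, 5.0, 5.0, 5.1, 5.1]
points = jitter_positions(0, values)

for x, y in points:
    print(f"x={x:+.3f}  y={y}")
```

Only the categorical coordinate is perturbed, so the quantitative values read off the other axis stay exact.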

Swarmplot

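Another way to deal with overlapping points in a Stripplot is to draw a Swarmplot. In essence, it uses an algorithm to spread the points that would otherwise coincide along the categorical axis. We visualize the petal length and petal width of the different species in the Iris dataset with Swarmplots.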

plt.figure(1, figsize=(12, 6))
plt.subplot(1, 2, 1)
sns.swarmplot(x='species', y='petal_length', data=iris)
with sns.axes_style("ticks"):  # this time the ticks style is used
    plt.subplot(1, 2, 2)
    sns.swarmplot(x='species', y='petal_width', data=iris)
plt.show()
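To make the "spreading" idea concrete, here is a deliberately simple greedy layout in plain Python. It is a sketch of the general technique, not the algorithm seaborn actually uses; `simple_swarm`, `diameter`, and the sample values are assumptions made for illustration.

```python
import math

def simple_swarm(values, diameter=0.1):
    """Greedy layout: keep each value on the value axis and shift it along the
    categorical axis by the smallest offset that avoids earlier points."""
    placed = []  # (offset, value) pairs already positioned
    for v in sorted(values):
        step = 0
        while True:
            # Try offset 0 first, then +/- d/2, +/- d, ... moving outwards.
            if step == 0:
                candidates = [0.0]
            else:
                candidates = [step * diameter / 2, -step * diameter / 2]
            free = [
                c for c in candidates
                if all(math.hypot(c - px, v - py) >= diameter for px, py in placed)
            ]
            if free:
                placed.append((free[0], v))
                break
            step += 1
    return placed

values = [1.4, 1.4, 1.4, 1.5, 1.5, 1.3]  # made-up measurements
layout = simple_swarm(values)
for offset, value in layout:
    print(f"offset={offset:+.2f}  value={value}")
```

Unlike jitter, the offsets here are deterministic and no two markers end up closer than one marker diameter.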

Boxplot

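A box plot shows six kinds of data points. The data are sorted from largest to smallest, and the upper whisker, upper quartile Q3, median, lower quartile Q1, lower whisker, and outliers are computed. Below we draw box plots of the four variables sepal_length, sepal_width, petal_length, and petal_width in the Iris dataset.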

var = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width']
axes_style = ['ticks', 'white', 'whitegrid', 'dark']
fig = plt.figure(1, figsize=(12, 12))
for i in range(4):
    with sns.axes_style(axes_style[i]):  # show every style other than the default darkgrid
        plt.subplot(2, 2, i + 1)
        sns.boxplot(x='species', y=var[i], data=iris)
plt.show()
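The quantities a box plot draws can be computed by hand. The sketch below uses only the standard library and assumes the common convention that whiskers reach the most extreme data points within 1.5 times the interquartile range of the box; the function name `box_stats` and the sample values are made up for illustration.

```python
import statistics

def box_stats(values, whis=1.5):
    """Quartiles, whisker ends, and outliers under the 1.5 * IQR convention."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - whis * iqr, q3 + whis * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "lower_whisker": min(inside),
        "q1": q1,
        "median": median,
        "q3": q3,
        "upper_whisker": max(inside),
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }

sample = [4.3, 4.8, 5.0, 5.0, 5.1, 5.2, 5.4, 5.8, 7.9]  # made-up values
stats = box_stats(sample)
for name, value in stats.items():
    print(f"{name}: {value}")
```

Points beyond the fences are reported separately, which is why they appear as individual markers outside the whiskers.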

Violinplot

A Violinplot combines a box plot with a kernel density plot and gives a better picture of the shape of the data's distribution.

context = ['notebook', 'paper', 'talk', 'poster']
axes_style = ['ticks', 'white', 'whitegrid', 'dark']
plt.figure(1, figsize=(12, 12))
for i in range(4):
    with sns.axes_style(axes_style[i]):  # set axes_style
        sns.set_context(context[i])  # set the context style; the default is notebook, the others are paper, talk, and poster
        plt.subplot(2, 2, i + 1)
        plt.title(str(var[i]) + ' in Iris species')
        sns.violinplot(x='species', y=var[i], data=iris)
plt.show()

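The Violinplot uses a kernel density estimate to better describe the distribution of a quantitative variable.

A Swarmplot can also be combined with a Boxplot or Violinplot to describe quantitative variables. With the Iris dataset this looks as follows: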
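A kernel density estimate can be sketched as an average of Gaussian bumps, one centred on each observation. The example below is a minimal illustration in plain Python: the bandwidth is fixed by hand (an assumption; plotting libraries normally choose it automatically), and `gaussian_kde` and the sample values are made up for the example.

```python
import math

def gaussian_kde(samples, bandwidth):
    """Return a function that estimates the density as an average of Gaussians."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density

samples = [1.3, 1.4, 1.4, 1.5, 1.5, 1.5, 1.6, 1.7]  # made-up values
density = gaussian_kde(samples, bandwidth=0.1)

# Evaluate on a grid from 1.0 to 2.0; a violin's half-width at each value
# is proportional to the estimated density there.
grid = [1.0 + 0.01 * i for i in range(101)]
area = sum(density(x) * 0.01 for x in grid)  # Riemann sum over the grid
print(f"density at 1.5: {density(1.5):.3f}, area under curve: {area:.3f}")
```

The area under the curve stays close to 1, as expected of a density, and the curve is highest where the observations cluster.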

context = ['notebook', 'paper', 'talk', 'poster']
axes_style = ['ticks', 'white', 'whitegrid', 'dark']
plt.figure(1, figsize=(12, 12))
for i in range(4):
    with sns.axes_style(axes_style[i]):  # set axes_style
        sns.set_context(context[i])  # set context
        plt.subplot(2, 2, i + 1)
        plt.title(str(var[i]) + ' in Iris species')
        sns.swarmplot(x='species', y=var[i], data=iris, color="w", alpha=.5)
        # swarmplot + violinplot on even panels, swarmplot + boxplot on odd panels
        if i % 2 == 0:
            sns.violinplot(x='species', y=var[i], data=iris, inner=None)
        else:
            sns.boxplot(x='species', y=var[i], data=iris)
plt.show()

Barplot

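A Barplot shows the mean of a quantitative variable within each category, and uses bootstrapping to compute a confidence interval for the estimate, drawn as an error bar. With the Iris dataset this looks as follows: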

plt.figure(1, figsize=(12, 12))
for i in range(4):
    with sns.axes_style(axes_style[i]):  # set axes_style
        sns.set_context(context[i])  # set the context style; the default is notebook, the others are paper, talk, and poster
        plt.subplot(2, 2, i + 1)
        plt.title(str(var[i]) + ' in Iris species')
        sns.barplot(x='species', y=var[i], data=iris)
plt.show()
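The bootstrapped error bar can be reproduced in spirit with the standard library: resample the observations with replacement many times, compute the mean of each resample, and take percentiles of those means. This is a hedged sketch of the percentile bootstrap, not seaborn's code; `bootstrap_ci`, `n_boot`, `ci`, `seed`, and the sample values are assumptions for illustration.

```python
import random
import statistics

def bootstrap_ci(values, n_boot=1000, ci=95, seed=0):
    """Percentile bootstrap confidence interval for the mean."""
    rng = random.Random(seed)
    means = sorted(
        statistics.fmean(rng.choices(values, k=len(values)))
        for _ in range(n_boot)
    )
    lo_idx = int(n_boot * (100 - ci) / 200)      # e.g. index 25 of 1000
    hi_idx = int(n_boot * (100 + ci) / 200) - 1  # e.g. index 974 of 1000
    return means[lo_idx], means[hi_idx]

values = [5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.6, 5.0, 4.4, 4.9]  # made-up values
mean = statistics.fmean(values)
low, high = bootstrap_ci(values)
print(f"mean={mean:.2f}, 95% CI=({low:.2f}, {high:.2f})")
```

The bar height corresponds to `mean`, and the error bar spans `low` to `high`.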

Countplot

To find out how many observations fall into each category, use a Countplot, which amounts to counting observations. With the Iris dataset this looks as follows:

plt.figure(figsize=(5, 5))
sns.countplot(y="species", data=iris)  # y='species' lays the countplot out horizontally
plt.title('Iris species count')
plt.show()
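What a count plot tallies can be checked without any plotting. The sketch below rebuilds the species column from the class sizes stated in the dataset description (3 classes of 50 samples each) and counts the labels; the text-bar rendering is purely illustrative.

```python
from collections import Counter

# Label column reconstructed from the dataset description: 3 classes x 50 samples.
species = ["Setosa"] * 50 + ["Versicolour"] * 50 + ["Virginica"] * 50
counts = Counter(species)

for name, n in counts.items():
    # One '#' per 5 observations, as a crude horizontal bar.
    print(f"{name:<12} {'#' * (n // 5)} {n}")
```

Because the classes are balanced, all three bars have the same length.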

Pointplot

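A Pointplot is a horizontal extension of the Barplot. On one hand, it shows the same content as a Barplot using a point estimate and a confidence interval. On the other hand, when each main category is further divided into sub-categories, a Pointplot makes it easier to see how the different sub-categories relate across the main categories. For example: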

plt.figure(1, figsize=(12, 12))
for i in range(4):
    with sns.axes_style(axes_style[i]):  # set axes_style
        sns.set_context(context[i])  # set the context style; the default is notebook, the others are paper, talk, and poster
        plt.subplot(2, 2, i + 1)
        plt.title(str(var[i]) + ' in Iris species')
        sns.pointplot(x='species', y=var[i], data=iris)
plt.show()
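The Iris data has no sub-categories, so the sub-category case is easiest to see on a toy example. The sketch below computes the point estimate (the mean) for every combination of main category and sub-category, which is what each connected line in a point plot traces when a sub-category is mapped to `hue`. The records, category names, and variable names are all made up for illustration.

```python
from collections import defaultdict
from statistics import fmean

# Made-up records: (main category, sub-category, measurement).
records = [
    ("A", "x", 1.0), ("A", "x", 1.2), ("A", "y", 2.0), ("A", "y", 2.2),
    ("B", "x", 1.5), ("B", "x", 1.7), ("B", "y", 1.9), ("B", "y", 2.1),
]

# Collect the measurements for each (sub-category, main category) pair.
groups = defaultdict(list)
for cat, sub, val in records:
    groups[(sub, cat)].append(val)

# One line per sub-category: its mean in each main category.
lines = defaultdict(dict)
for (sub, cat), vals in groups.items():
    lines[sub][cat] = fmean(vals)

for sub, series in lines.items():
    print(sub, {cat: round(m, 2) for cat, m in series.items()})
```

Reading along one line shows how a sub-category changes from one main category to the next; here "x" rises from A to B while "y" falls slightly.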

Factorplot

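Factorplot can be seen as the essence of categorical visualization in Seaborn; the plots covered above can all be regarded as specific cases of it (seaborn 0.9 renamed factorplot to catplot). We can use PairGrid to visualize the numerical features of multiple categories with the same kind of plot.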

sns.set(style="ticks")
g = sns.PairGrid(iris,
                 x_vars=['sepal_length', 'sepal_width', 'petal_length', 'petal_width'],
                 y_vars=['species'],
                 aspect=0.75, height=4)  # spacing and figure size; `height` was called `size` before seaborn 0.9
g.map(sns.violinplot, palette='pastel')
plt.show()

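In this dataset, the quantitative variables are mainly the housing area (Area), the price per square meter (Price), and the total price of the home (Tprice).

Kesci (kesci.com) is an online community that brings together data talent and industry problems. Its flagship K-Lab online collaborative data analysis platform offers data practitioners a new experience for both learning and work.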
