【Python Data Mining Course】12. Comparison Chart Analysis with Pandas and Matplotlib Combined with SQL Queries

Posted by Eastmount on 2017-03-20

Original URL: https://blog.csdn.net/eastmount/article/details/64127445

I. Comparing Four Bar Charts in One Figure

The database contains the URL, author, title, summary, date, view count and comment count of each blog post.
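You can try the monthly-count query used throughout this article without a MySQL server. Below is a minimal sketch that uses Python's built-in sqlite3 module and an in-memory table. It swaps SQLite in for MySQL. SQLite has no MONTH() or DATE_FORMAT(), so strftime() does that work instead. The table name csdn_blog and the FBTime column come from the article. The other column names and all of the rows are made up for illustration.

```python
import sqlite3

# In-memory SQLite stand-in for the article's MySQL table (made-up subset of columns)
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("""CREATE TABLE csdn_blog (
                   URL TEXT, Author TEXT, Title TEXT,
                   FBTime TEXT, ReadNum INTEGER, CommentNum INTEGER)""")

# Hypothetical sample rows; FBTime uses SQLite's 'YYYY-MM-DD HH:MM:SS' text format
rows = [
    ("u1", "a", "t1", "2014-01-05 10:00:00", 10, 0),
    ("u2", "a", "t2", "2014-01-20 09:30:00", 25, 1),
    ("u3", "b", "t3", "2014-03-02 21:15:00", 7, 2),
    ("u4", "b", "t4", "2015-03-08 08:00:00", 3, 0),
]
cur.executemany("INSERT INTO csdn_blog VALUES (?, ?, ?, ?, ?, ?)", rows)

# SQLite counterpart of the MySQL query
#   select MONTH(FBTime) as mm, count(*) as cnt from csdn_blog
#   where DATE_FORMAT(FBTime,'%Y')='2014' group by mm;
# The year is passed as a bound parameter instead of being pasted into the SQL text.
sql = """SELECT CAST(strftime('%m', FBTime) AS INTEGER) AS mm, COUNT(*) AS cnt
         FROM csdn_blog
         WHERE strftime('%Y', FBTime) = ?
         GROUP BY mm
         ORDER BY mm"""
cur.execute(sql, ("2014",))
result = cur.fetchall()
print(result)  # [(1, 2), (3, 1)]
conn.close()
```

February does not appear in the result. A GROUP BY query returns no row for a month without posts, but the plotting code in this article assumes all 12 months are present.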

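The result is a figure with four panels in a 2×2 grid. The core code for drawing several plots in one figure is:
p1 = plt.subplot(221)
plt.bar(ind, num1, width, color='b', label='sum num')
plt.sca(p1)

plt.subplot(221) creates the first cell of a grid with 2 rows and 2 columns (222, 223 and 224 are the other three cells) and makes it the current axes, so the following plt.bar call draws into it. plt.sca(p1) then sets p1 as the current axes once more.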
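The four panels in the complete code below differ only in year, color and grid position, so a loop could draw them instead. The following is a minimal sketch that assumes the monthly counts have already been fetched into a dict. The counts are made-up placeholders, not the article's data. `plot_month_panel` is a hypothetical helper, and the object-oriented `plt.subplots` API replaces the repeated `plt.subplot`/`plt.sca` calls.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend; remove this line to show a window with plt.show()
import matplotlib.pyplot as plt
import numpy as np

# Made-up monthly post counts (placeholders, NOT the article's data)
counts = {
    "2014": [5, 3, 8, 6, 7, 4, 2, 9, 6, 5, 8, 7],
    "2015": [6, 4, 7, 8, 5, 6, 3, 8, 7, 6, 9, 5],
    "2016": [7, 5, 6, 9, 8, 7, 4, 6, 8, 7, 6, 8],
    "All Year": [20, 14, 25, 27, 22, 19, 11, 28, 24, 20, 27, 23],
}
months = list(range(1, 13))

def plot_month_panel(ax, months, nums, color, title, width=0.35):
    """Hypothetical helper: one bar panel with a value label above each bar."""
    ind = np.arange(len(nums))
    ax.bar(ind, nums, width, color=color)        # bars centered on ind
    ax.set_xticks(ind)
    ax.set_xticklabels([str(m) for m in months], rotation=40)
    for i, v in enumerate(nums):
        ax.text(i, v, str(v), ha="center", va="bottom", rotation=45)
    ax.set_title(title + " Number-12Month")

# 2x2 grid: axes[0, 0] plays the role of subplot(221), axes[1, 1] of subplot(224)
fig, axes = plt.subplots(2, 2)
panels = [("2014", "b"), ("2015", "r"), ("2016", "g"), ("All Year", "y")]
for ax, (title, color) in zip(axes.flat, panels):
    plot_month_panel(ax, months, counts[title], color, title)
fig.tight_layout()
fig.savefig("four_panels.png", dpi=100)
```

Passing each `ax` explicitly avoids relying on pyplot's "current axes" state, which is what `plt.sca` manages in the original.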

The complete code is as follows:

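# coding=utf-8
'''
' This script fetches data from MySQL and computes simple statistics;
' the statistics themselves are computed with SQL queries.
'''

import matplotlib.pyplot as plt
import matplotlib
import pandas as pd
import numpy as np
import MySQLdb   # on Python 3 this module comes from the mysqlclient package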
# Bar charts of the monthly post counts, computed with SQL queries
try:
    conn = MySQLdb.connect(host='localhost',user='root',
                         passwd='123456',port=3306, db='test01')
    cur = conn.cursor() # database cursor

    # Prevents the error: UnicodeEncodeError: 'latin-1' codec can't encode character
    conn.set_character_set('utf8')
    cur.execute('SET NAMES utf8;')
    cur.execute('SET CHARACTER SET utf8;')
    cur.execute('SET character_set_connection=utf8;')
    

    #################################################
    # 2014
    #################################################
    sql = '''select MONTH(FBTime) as mm, count(*) as cnt from csdn_blog
            where DATE_FORMAT(FBTime,'%Y')='2014' group by mm;'''
    cur.execute(sql)
    result = cur.fetchall() # fetch all rows into result
    hour1 = [n[0] for n in result]   # months
    print(hour1)
    num1 = [n[1] for n in result]    # number of posts per month
    print(num1)

    ind = np.arange(len(num1))  # bar positions 0-11 (one per returned month)
    width=0.35
    p1 = plt.subplot(221)
    plt.bar(ind, num1, width, color='b', label='sum num')
    # x-axis labels; bars are centered on ind (the default since matplotlib 2.0)
    plt.xticks(ind, hour1, rotation=40) # rotate 40 degrees
    for i in range(len(num1)):   # value label above each bar, rotated 45 degrees
        plt.text(i, num1[i], str(num1[i]),
                 ha='center', va='bottom', rotation=45)
    plt.title('2014 Number-12Month')
    plt.sca(p1)


    #################################################
    # 2015
    #################################################
    sql = '''select MONTH(FBTime) as mm, count(*) as cnt from csdn_blog
            where DATE_FORMAT(FBTime,'%Y')='2015' group by mm;'''
    cur.execute(sql)
    result = cur.fetchall()
    hour1 = [n[0] for n in result]
    print(hour1)
    num1 = [n[1] for n in result]
    print(num1)

    ind = np.arange(len(num1))  # bar positions 0-11 (one per returned month)
    width=0.35
    p2 = plt.subplot(222)
    plt.bar(ind, num1, width, color='r', label='sum num')
    # x-axis labels; bars are centered on ind (the default since matplotlib 2.0)
    plt.xticks(ind, hour1, rotation=40) # rotate 40 degrees
    for i in range(len(num1)):   # value label above each bar, rotated 45 degrees
        plt.text(i, num1[i], str(num1[i]),
                 ha='center', va='bottom', rotation=45)
    plt.title('2015 Number-12Month')
    plt.sca(p2)


    #################################################
    # 2016
    #################################################
    sql = '''select MONTH(FBTime) as mm, count(*) as cnt from csdn_blog
            where DATE_FORMAT(FBTime,'%Y')='2016' group by mm;'''
    cur.execute(sql)
    result = cur.fetchall()
    hour1 = [n[0] for n in result]
    print(hour1)
    num1 = [n[1] for n in result]
    print(num1)

    ind = np.arange(len(num1))  # bar positions 0-11 (one per returned month)
    width=0.35
    p3 = plt.subplot(223)
    plt.bar(ind, num1, width, color='g', label='sum num')
    # x-axis labels; bars are centered on ind (the default since matplotlib 2.0)
    plt.xticks(ind, hour1, rotation=40) # rotate 40 degrees
    for i in range(len(num1)):   # value label above each bar, rotated 45 degrees
        plt.text(i, num1[i], str(num1[i]),
                 ha='center', va='bottom', rotation=45)
    plt.title('2016 Number-12Month')
    plt.sca(p3)

    
    #################################################
    # All years combined
    #################################################
    sql = '''select MONTH(FBTime) as mm, count(*) as cnt from csdn_blog group by mm;'''
    cur.execute(sql)
    result = cur.fetchall()
    hour1 = [n[0] for n in result]
    print(hour1)
    num1 = [n[1] for n in result]
    print(num1)

    ind = np.arange(len(num1))  # bar positions 0-11 (one per returned month)
    width=0.35
    p4 = plt.subplot(224)
    plt.bar(ind, num1, width, color='y', label='sum num')
    # x-axis labels; bars are centered on ind (the default since matplotlib 2.0)
    plt.xticks(ind, hour1, rotation=40) # rotate 40 degrees
    for i in range(len(num1)):   # value label above each bar, rotated 45 degrees
        plt.text(i, num1[i], str(num1[i]),
                 ha='center', va='bottom', rotation=45)
    plt.title('All Year Number-12Month')
    plt.sca(p4)

    plt.savefig('ttt.png',dpi=400)    
    plt.show()

# exception handling
except MySQLdb.Error as e:
    print("Mysql Error %d: %s" % (e.args[0], e.args[1]))
finally:
    cur.close()
    conn.commit()  
    conn.close()
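The chart code above assumes that each query returns all 12 months. A month with no posts simply has no row, so `plt.bar` would then receive fewer values than expected. One way to guard against this is to count per month in pandas and reindex to months 1 to 12, filling gaps with zeros. The sketch below does this on a made-up DataFrame that stands in for the `csdn_blog` table. `monthly_counts` is a hypothetical helper, not part of the original code.

```python
import pandas as pd

# Made-up stand-in for the csdn_blog table; only the FBTime column is needed here
df = pd.DataFrame({"FBTime": pd.to_datetime([
    "2014-01-05 10:00", "2014-01-20 09:30", "2014-03-02 21:15",
    "2015-03-08 08:00", "2015-12-31 23:59",
])})

def monthly_counts(df, year):
    """Hypothetical helper, a pandas counterpart of
    select MONTH(FBTime) as mm, count(*) as cnt from csdn_blog
    where DATE_FORMAT(FBTime,'%Y')=<year> group by mm;
    except that months without posts are kept with a count of 0."""
    times = df["FBTime"]
    in_year = times[times.dt.year == year]
    counts = in_year.groupby(in_year.dt.month).size()
    return counts.reindex(range(1, 13), fill_value=0)

c2014 = monthly_counts(df, 2014)
hour1 = c2014.index.tolist()   # months 1-12
num1 = c2014.tolist()          # always 12 counts, zeros included
print(num1)  # [2, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
```

The same zero-filling could be applied to the lists returned by MySQL before plotting, for example by building a dict from `result` and looking up each month 1 to 12.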

II. Area Plot Comparison

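The core code for the area plot is:
data = np.array([num1, num2, num3, num4])
d = data.T # transpose: 4x12 -> 12x4
df = DataFrame(d, index=hour1, columns=['All','2014', '2015', '2016'])
df.plot(kind='area', alpha=0.2) # alpha sets the fill transparency
plt.savefig('csdn.png',dpi=400)
plt.show()

np.array first combines num1 to num4 into a 4×12 array with one row per series. That array is then transposed into a 12×4 array with one row per month for plotting. index sets the x-axis values (the months), and columns names each column, that is, each series. kind='area' draws an area plot. Other options include 'bar' (vertical bar chart), 'barh' (horizontal bar chart), 'scatter' (scatter plot) and 'pie' (pie chart).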
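You can check the shape change without a database. The small sketch below uses made-up three-month lists (the article uses 12 months). It shows that `np.array([...])` stacks the lists as rows, so `.T` is what turns each series into a column. Building the DataFrame from a dict of columns gives the same table without a transpose.

```python
import numpy as np
import pandas as pd

# Made-up counts for 3 months instead of 12, to keep the example short
hour1 = [1, 2, 3]
num1 = [10, 12, 9]   # All
num2 = [3, 4, 2]     # 2014
num3 = [4, 3, 3]     # 2015
num4 = [2, 5, 4]     # 2016

data = np.array([num1, num2, num3, num4])
print(data.shape)    # (4, 3): one ROW per series
d = data.T
print(d.shape)       # (3, 4): one row per month, one COLUMN per series
df = pd.DataFrame(d, index=hour1, columns=['All', '2014', '2015', '2016'])

# Alternative construction from a dict of columns, no transpose needed
df2 = pd.DataFrame({'All': num1, '2014': num2, '2015': num3, '2016': num4},
                   index=hour1)
pd.testing.assert_frame_equal(df, df2, check_dtype=False)  # raises if the tables differ
print(df)
```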

The chart splits the data into layered bands, and the overall trends of the series are similar.
The complete code is as follows:

# coding=utf-8
'''
' This script fetches data from MySQL and computes simple statistics;
' the statistics are computed with SQL queries. By: Eastmount CSDN
'''

import matplotlib.pyplot as plt
import matplotlib
import pandas as pd
import numpy as np
import MySQLdb
from pandas import DataFrame

try:
    conn = MySQLdb.connect(host='localhost',user='root',
                         passwd='123456',port=3306, db='test01')
    cur = conn.cursor() # database cursor

    # Prevents the error: UnicodeEncodeError: 'latin-1' codec can't encode character
    conn.set_character_set('utf8')
    cur.execute('SET NAMES utf8;')
    cur.execute('SET CHARACTER SET utf8;')
    cur.execute('SET character_set_connection=utf8;')

    # Posts across all years
    sql = '''select MONTH(FBTime) as mm, count(*) as cnt from csdn_blog
             group by mm;'''
    cur.execute(sql)
    result = cur.fetchall()        # fetch all rows into result
    hour1 = [n[0] for n in result]
    print(hour1)
    num1 = [n[1] for n in result]
    print(num1)

    # Posts in 2014
    sql = '''select MONTH(FBTime) as mm, count(*) as cnt from csdn_blog
             where DATE_FORMAT(FBTime,'%Y')='2014' group by mm;'''
    cur.execute(sql)
    result = cur.fetchall()
    num2 = [n[1] for n in result]
    print(num2)

    # Posts in 2015
    sql = '''select MONTH(FBTime) as mm, count(*) as cnt from csdn_blog
             where DATE_FORMAT(FBTime,'%Y')='2015' group by mm;'''
    cur.execute(sql)
    result = cur.fetchall()
    num3 = [n[1] for n in result]
    print(num3)

    # Posts in 2016
    sql = '''select MONTH(FBTime) as mm, count(*) as cnt from csdn_blog
             where DATE_FORMAT(FBTime,'%Y')='2016' group by mm;'''
    cur.execute(sql)
    result = cur.fetchall()
    num4 = [n[1] for n in result]
    print(num4)

    # Key step: combine the data into a [12,4] array
    data = np.array([num1, num2, num3, num4])   # 4x12: one row per series
    print(data)
    d = data.T # transpose to 12x4: one row per month
    print(d)
    df = DataFrame(d, index=hour1, columns=['All','2014', '2015', '2016'])
    df.plot(kind='area', alpha=0.2) # alpha sets the fill transparency
    plt.title('Area Plot Blog-Month')
    plt.savefig('csdn.png',dpi=400)
    plt.show()

# exception handling
except MySQLdb.Error as e:
    print("Mysql Error %d: %s" % (e.args[0], e.args[1]))
finally:
    cur.close()
    conn.commit()  
    conn.close()
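Pandas stacks area plots by default (`stacked=True`), so each band sits on top of the previous ones and the top edge is the running total of all columns. The 'All' column already includes the 2014 to 2016 posts, so the top edge of the stacked chart counts those posts twice. Overlapping, unstacked areas may therefore be easier to compare. The sketch below draws both versions from made-up numbers, which are placeholders and not the article's data.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend for this sketch
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up monthly counts for three months (placeholders, not the article's data)
df = pd.DataFrame({'All': [10, 12, 9], '2014': [3, 4, 2],
                   '2015': [4, 3, 3], '2016': [2, 5, 4]}, index=[1, 2, 3])

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
df.plot(kind='area', alpha=0.2, ax=ax1, title='stacked=True (default)')
df.plot(kind='area', alpha=0.2, stacked=False, ax=ax2, title='stacked=False')
fig.savefig('area_compare.png', dpi=100)

# Stacked: the last line drawn is the running total of all four columns.
# Unstacked: each line is just that column's own values.
print(np.allclose(ax1.get_lines()[-1].get_ydata(), df.sum(axis=1)))
print(np.allclose(ax2.get_lines()[-1].get_ydata(), df['2016']))
```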

III. Getting the Day of the Week with MySQL

The following MySQL statement gets the day of the week for a date:

select  now(), case dayofweek(now())
	when 1 then 'Sunday'
	when 2 then 'Monday'
	when 3 then 'Tuesday'
	when 4 then 'Wednesday'
	when 5 then 'Thursday'
	when 6 then 'Friday'
	when 7 then 'Saturday' end as 'week'
from dual;
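MySQL's DAYOFWEEK() numbers the days from 1 = Sunday to 7 = Saturday. Python's `date.weekday()` uses a different scheme, from 0 = Monday to 6 = Sunday. When you cross-check the SQL counts in Python, a small conversion keeps the two numberings aligned. The sketch below uses a hypothetical helper `mysql_dayofweek`, built on `isoweekday()`, where Monday = 1 and Sunday = 7.

```python
from datetime import date

# MySQL DAYOFWEEK() number -> weekday name (same order as the CASE statement)
WEEK_NAMES = {1: 'Sunday', 2: 'Monday', 3: 'Tuesday', 4: 'Wednesday',
              5: 'Thursday', 6: 'Friday', 7: 'Saturday'}

def mysql_dayofweek(d):
    """Hypothetical helper mirroring MySQL DAYOFWEEK(): 1 = Sunday ... 7 = Saturday.
    isoweekday() gives Monday = 1 ... Sunday = 7, so shift Sunday to the front."""
    return d.isoweekday() % 7 + 1

# 2017-03-20, the date of this post, was a Monday
d = date(2017, 3, 20)
print(mysql_dayofweek(d), WEEK_NAMES[mysql_dayofweek(d)])  # 2 Monday
```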

Running it returns the current date and time together with the name of its weekday.

The corresponding Python code, which counts all blog posts by day of the week, is as follows:

# coding=utf-8
'''
' This script fetches data from MySQL and computes simple statistics;
' the statistics are computed with SQL queries. By: Eastmount CSDN
'''

import matplotlib.pyplot as plt
import matplotlib
import pandas as pd
import numpy as np
import MySQLdb
from pandas import DataFrame

try:
    conn = MySQLdb.connect(host='localhost',user='root',
                         passwd='123456',port=3306, db='test01')
    cur = conn.cursor() # database cursor

    # Prevents the error: UnicodeEncodeError: 'latin-1' codec can't encode character
    conn.set_character_set('utf8')
    cur.execute('SET NAMES utf8;')
    cur.execute('SET CHARACTER SET utf8;')
    cur.execute('SET character_set_connection=utf8;')
    sql = '''select
            COUNT(case dayofweek(FBTime)  when 1 then 1 end) AS 'Sunday',
            COUNT(case dayofweek(FBTime)  when 2 then 1 end) AS 'Monday',
            COUNT(case dayofweek(FBTime)  when 3 then 1 end) AS 'Tuesday',
            COUNT(case dayofweek(FBTime)  when 4 then 1 end) AS 'Wednesday',
            COUNT(case dayofweek(FBTime)  when 5 then 1 end) AS 'Thursday',
            COUNT(case dayofweek(FBTime)  when 6 then 1 end) AS 'Friday',
            COUNT(case dayofweek(FBTime)  when 7 then 1 end) AS 'Saturday'
            from csdn_blog;
          '''
    cur.execute(sql)
    result = cur.fetchall()
    print(result)
    # sample output (printed by Python 2, hence the L suffix):
    # ((31704L, 43081L, 42670L, 43550L, 41270L, 39164L, 29931L),)
    name = ['Sunday','Monday','Tuesday','Wednesday','Thursday','Friday','Saturday']
    # convert to a NumPy array of shape (1, 7)
    data = np.array(result)
    print(data)
    d = data.T # transpose to shape (7, 1)
    print(d)

    matplotlib.style.use('ggplot')
    df=DataFrame(d, index=name,columns=['Nums'])
    df.plot(kind='bar')
    plt.title('All Year Blog-Week')    
    plt.xlabel('Week')
    plt.ylabel('The number of blog')
    plt.savefig('01csdn.png',dpi=400)
    plt.show()

# exception handling
except MySQLdb.Error as e:
    print("Mysql Error %d: %s" % (e.args[0], e.args[1]))
finally:
    cur.close()
    conn.commit()  
    conn.close()
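The query returns one row with seven columns, so `cur.fetchall()` yields a tuple that holds a single 7-tuple. `np.array` turns it into a 1×7 array, and `.T` turns that into the 7×1 column that `DataFrame(..., columns=['Nums'])` expects. The short check below uses the counts printed in the code above. No database is needed, because the fetchall result is written as a literal tuple.

```python
import numpy as np
from pandas import DataFrame

# Literal copy of the printed fetchall() result (Python 3 shows no 'L' suffix)
result = ((31704, 43081, 42670, 43550, 41270, 39164, 29931),)
name = ['Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday']

data = np.array(result)
print(data.shape)   # (1, 7): one row, seven weekday columns
d = data.T
print(d.shape)      # (7, 1): one row per weekday
df = DataFrame(d, index=name, columns=['Nums'])
print(df.loc['Wednesday', 'Nums'])  # 43550
```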

The result is a bar chart with one bar per weekday.

IV. Weekday Data: Bar Chart and Line Chart Comparison

Next, the weekday distributions of three years (2008, 2011 and 2016) are retrieved and compared.

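The core code is below. Note how the three 7×1 arrays d1, d2 and d3 (one per year, after transposing) are copied into a single [7][3] two-dimensional array.
data = np.zeros((7, 3))   # empty [7][3] array to fill
i = 0
while i<7:
    data[i][0] = d1[i][0]   # d1[i] is a 1-element row; [0] takes the number
    data[i][1] = d2[i][0]
    data[i][2] = d3[i][0]
    i = i + 1
matplotlib.style.use('ggplot')
# data is a [7,3] array: index = weekday names, columns = years
df=DataFrame(data, index=name, columns=['2008','2011','2016'])
df.plot(kind='bar')
plt.show()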
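The element-by-element loop works. After the transpose, however, d1, d2 and d3 are each 7×1 columns, so NumPy can also join them side by side in a single call. The sketch below uses made-up 7×1 arrays in place of the query results and confirms that both approaches produce the same table.

```python
import numpy as np

# Made-up weekday counts as 7x1 columns, standing in for d1, d2, d3
d1 = np.array([[1], [2], [3], [4], [5], [6], [7]])
d2 = d1 * 10
d3 = d1 * 100

# Loop version, as in the article (d1[i][0] takes the scalar out of each row)
data_loop = np.zeros((7, 3))
for i in range(7):
    data_loop[i][0] = d1[i][0]
    data_loop[i][1] = d2[i][0]
    data_loop[i][2] = d3[i][0]

# One-call version: stack the three columns horizontally
data = np.hstack([d1, d2, d3])
print(data.shape)                        # (7, 3)
print(np.array_equal(data, data_loop))   # True
```

`np.column_stack` would work the same way here. It also accepts plain 1-D arrays and treats each one as a column.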

The complete code is:

# coding=utf-8
'''
' This script fetches data from MySQL and computes simple statistics;
' the statistics are computed with SQL queries. By: Eastmount CSDN 楊秀璋 (Yang Xiuzhang)
'''

import matplotlib.pyplot as plt
import matplotlib
import pandas as pd
import numpy as np
import MySQLdb
from pandas import DataFrame

try:
    conn = MySQLdb.connect(host='localhost',user='root',
                         passwd='123456',port=3306, db='test01')
    cur = conn.cursor() # database cursor

    # Prevents the error: UnicodeEncodeError: 'latin-1' codec can't encode character
    conn.set_character_set('utf8')
    cur.execute('SET NAMES utf8;')
    cur.execute('SET CHARACTER SET utf8;')
    cur.execute('SET character_set_connection=utf8;')
    sql = '''select
            COUNT(case dayofweek(FBTime)  when 1 then 1 end) AS 'Sunday',
            COUNT(case dayofweek(FBTime)  when 2 then 1 end) AS 'Monday',
            COUNT(case dayofweek(FBTime)  when 3 then 1 end) AS 'Tuesday',
            COUNT(case dayofweek(FBTime)  when 4 then 1 end) AS 'Wednesday',
            COUNT(case dayofweek(FBTime)  when 5 then 1 end) AS 'Thursday',
            COUNT(case dayofweek(FBTime)  when 6 then 1 end) AS 'Friday',
            COUNT(case dayofweek(FBTime)  when 7 then 1 end) AS 'Saturday'
            from csdn_blog where DATE_FORMAT(FBTime,'%Y')='2008';
          '''
    cur.execute(sql)
    result1 = cur.fetchall()
    print(result1)
    name = ['Sunday','Monday','Tuesday','Wednesday','Thursday','Friday','Saturday']
    data = np.array(result1)
    d1 = data.T # transpose to 7x1
    print(d1)


    sql = '''select
            COUNT(case dayofweek(FBTime)  when 1 then 1 end) AS 'Sunday',
            COUNT(case dayofweek(FBTime)  when 2 then 1 end) AS 'Monday',
            COUNT(case dayofweek(FBTime)  when 3 then 1 end) AS 'Tuesday',
            COUNT(case dayofweek(FBTime)  when 4 then 1 end) AS 'Wednesday',
            COUNT(case dayofweek(FBTime)  when 5 then 1 end) AS 'Thursday',
            COUNT(case dayofweek(FBTime)  when 6 then 1 end) AS 'Friday',
            COUNT(case dayofweek(FBTime)  when 7 then 1 end) AS 'Saturday'
            from csdn_blog where DATE_FORMAT(FBTime,'%Y')='2011';
          '''
    cur.execute(sql)
    result2 = cur.fetchall()
    data = np.array(result2)
    d2 = data.T # transpose to 7x1
    print(d2)


    sql = '''select
            COUNT(case dayofweek(FBTime)  when 1 then 1 end) AS 'Sunday',
            COUNT(case dayofweek(FBTime)  when 2 then 1 end) AS 'Monday',
            COUNT(case dayofweek(FBTime)  when 3 then 1 end) AS 'Tuesday',
            COUNT(case dayofweek(FBTime)  when 4 then 1 end) AS 'Wednesday',
            COUNT(case dayofweek(FBTime)  when 5 then 1 end) AS 'Thursday',
            COUNT(case dayofweek(FBTime)  when 6 then 1 end) AS 'Friday',
            COUNT(case dayofweek(FBTime)  when 7 then 1 end) AS 'Saturday'
            from csdn_blog where DATE_FORMAT(FBTime,'%Y')='2016';
          '''
    cur.execute(sql)
    result3 = cur.fetchall()
    data = np.array(result3)
    print(type(result3), type(data))
    d3 = data.T # transpose to 7x1
    print(d3)


    # The SQL queries give three 7x1 arrays; copy them in a loop into one [7][3] 2-D array
    data = np.zeros((7, 3))
    i = 0
    while i<7:
        data[i][0] = d1[i][0]   # d1[i] is a 1-element row; [0] takes the number
        data[i][1] = d2[i][0]
        data[i][2] = d3[i][0]
        i = i + 1

    print(data)
    print(type(data))

    # plotting
    matplotlib.style.use('ggplot')
    # data is a [7,3] array: index = weekday names, columns = years
    df=DataFrame(data, index=name, columns=['2008','2011','2016'])
    df.plot(kind='bar')   
    plt.title('Comparison Chart Blog-Week')    
    plt.xlabel('Week')
    plt.ylabel('The number of blog')
    plt.savefig('03csdn.png', dpi=400)
    plt.show()



# exception handling
except MySQLdb.Error as e:
    print("Mysql Error %d: %s" % (e.args[0], e.args[1]))
finally:
    cur.close()
    conn.commit()  
    conn.close()

Changing "df.plot(kind='bar')" in the code to "df.plot()" draws a line chart instead.
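To see both versions at once, you can give each call its own axes through the `ax` argument of `DataFrame.plot`. The sketch below uses made-up weekday counts in place of the three query results. The numbers are placeholders, not the article's data.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend for this sketch
import matplotlib.pyplot as plt
import numpy as np
from pandas import DataFrame

name = ['Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday']
# Made-up weekday counts for three years (rows = weekdays, columns = years)
data = np.array([[3, 30, 20], [5, 45, 28], [6, 44, 27], [6, 46, 29],
                 [5, 43, 26], [4, 40, 25], [3, 31, 19]])
df = DataFrame(data, index=name, columns=['2008', '2011', '2016'])

fig, (ax_bar, ax_line) = plt.subplots(1, 2, figsize=(12, 4))
df.plot(kind='bar', ax=ax_bar, title='Bar chart')
df.plot(ax=ax_line, title='Line chart')   # the default kind is 'line'
fig.tight_layout()
fig.savefig('week_bar_line.png', dpi=100)
```

The bar version suits comparing years within a single weekday. The line version makes the weekly shape of each year easier to follow.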

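This concludes the walkthrough of visual analysis with Pandas, Matplotlib and NumPy combined with MySQL, including the more advanced comparison charts. Later articles will combine the SQL database with word clouds (WordCloud), color maps, power-law plots and other analyses.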

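I hope this article helps you, especially if you do data analysis on top of a database. As I always say, if you happen to need this knowledge right now, you will find it very helpful. Otherwise it may just seem like fun, and that is what online notes are for. If there are shortcomings or mistakes in the article, please bear with me~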

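Some things may be about to happen soon, and I need to face them calmly. I really love teaching and earnestly teaching my students, but I also feel there is some truth in "教優則仕" (those who excel at teaching should take up office)! I will be myself and give everything I can to every one of my students. I also truly feel for 綠么, but with her by my side I feel the two of us can overcome anything. 心安娜美 (at peace)~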

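Recommended articles on visualization:
  [Repost] Drawing simple charts with Python - 初雪之音 (highly recommended)
  Python for Data Analysis: Plotting and Visualization (8) (highly recommended)
  Drawing good-looking distribution plots with Seaborn (Python) (highly recommended)
  10-minute Python charting | Seaborn basics (1): distplot and kdeplot
  Python data visualization (matplotlib and pandas plotting: scatter, bar, line and box plots)
  Python numpy tutorial (3): transpose, products and universal functions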

(By: Eastmount, 2017-03-20, 7 p.m. http://blog.csdn.net/eastmount/ )
