在模仿中精進資料分析與視覺化01——顆粒物濃度時空變化趨勢（Mann–Kendall Test）

江流石發表於2021-06-12

原文網址 : https://www.cnblogs.com/djweasy/p/14871947.html

本文是在模仿中精進資料分析與視覺化系列的第一期——顆粒物濃度時空變化趨勢（Mann–Kendall Test），主要目的是參考其他作品模仿學習進而提高資料分析與視覺化的能力，如果有問題和建議，歡迎在評論區指出。若有其他想要看的作品，也歡迎在評論區留言並給出相關資訊。
所用資料和程式碼的下載地址如下：

連結：https://pan.baidu.com/s/1IixHE9aPf1u9qFkdAdHQaA
提取碼：hmq2
複製這段內容後開啟百度網盤手機App，操作更方便哦

簡介

本次要模仿的作品來自論文Investigating the Impacts of Urbanization on PM_2.5 Pollution in the Yangtze River Delta of China: A Spatial Panel Data Approach，研究區域為上海、安徽、浙江和江蘇，所用資料為 2002–2017該區域PM_2.5濃度柵格資料，資料來源於 Dalhousie University Atmospheric Composition Analysis Group開發的年均PM_2.5資料集V4.CH.03，空間解析度為0.01°×0.1°（原論文采用資料的空間解析度為1km×1km，但我在該網站上找不到，可能是不提供下載了）。

資料下載和處理

資料下載格式為.asc，使用arcpy將其轉為.tif格式，所用程式碼如下。

# -*- coding: utf-8 -*-

import arcpy
import os

inpath = "./ASCII" #待轉換的柵格的儲存路徑，會轉換該路徑下的所有柵格
outpath = "./TIF" #輸出柵格的路徑，最好是空路徑
filetype = "FLOAT"

print "Starting Convert!"
for filename in os.listdir(inpath):
    if filename.endswith(".asc"):
        filepath = os.path.join(inpath, filename)
        outfilepath = os.path.join(outpath, filename.replace(".asc", ".tif"))
        arcpy.ASCIIToRaster_conversion(filepath, outfilepath, filetype)
print "Convert Over!"

Mann–Kendall趨勢分析

Mann–Kendall趨勢分析的具體計算方法這裡不再贅述，原文作者採用R語言的trend package計算的，本文采用python的pymannkendall計算，github專案地址為https://github.com/mmhs013/pyMannKendall。
原文的趨勢分析包括兩部分，一部分是計算slope值，slope值為正，則表明具有上升的趨勢，反之亦然；另一部分是計算p值，p值越小趨勢越顯著，0.01<p<0.05說明趨勢顯著，p<0.01說明趨勢非常顯著。二者分別採用pymannkendall的sens_slope和original_test函式計算，pymannkendall的簡單用法介紹如下。

A quick example of pyMannKendall usage is given below. Several more examples are provided here.

import numpy as np
import pymannkendall as mk

# Data generation for analysis
data = np.random.rand(360,1)

result = mk.original_test(data)
print(result)

Output are like this:

Mann_Kendall_Test(trend='no trend', h=False, p=0.9507221701045581, z=0.06179991635055463, Tau=0.0021974620860414733, s=142.0, var_s=5205500.0, slope=1.0353584906597959e-05, intercept=0.5232692553379981)

Whereas, the output is a named tuple, so you can call by name for specific result:

print(result.slope)

or, you can directly unpack your results like this:

trend, h, p, z, Tau, s, var_s, slope, intercept = mk.original_test(data)

計算並儲存結果

這裡依然使用arcpy作為分析計算的工具，所用程式碼如下。
pymannkendall較為臃腫，計算速度很慢(全部計算用了十幾分鍾)，並且暫不支援numba加速，有需要大量計算的可根據其原始碼重新編寫函式，實現numba加速，如本文的get_slope函式，在使用numba加速後計算pvalues僅需4秒，使用pymannkendall的sens_test則需要幾分鐘的時間。

# -*- coding: utf-8 -*-

import arcpy
import os
from glob import glob
import numpy as np
import pymannkendall as mk

inpath = r"./TIF" #.tif檔案的儲存路徑
p_path = r"./pvalues.tif" #p-values的輸出路徑
slope_path = r"./slopes.tif" #slopes的輸出路徑
trend_path = r"./trends.tif" #原圖左圖中不同的趨勢
border_path = r"./Shapefiles/border.shp" #研究區域

# 獲取2002-2017年的柵格資料的路徑
def get_raster_paths(inpath):
    paths = []
    for year in range(2002, 2018):
        year_path = glob(os.path.join(inpath, "*"+str(year)+"*.tif"))
        if year_path:
            paths.append(year_path[0])
        else:
            print "can't find raster of {} year!".format(year)
    return paths

# 裁剪柵格，並將結果轉為numpy陣列
def clip_raster_to_array(paths, border):
    out_image = arcpy.sa.ExtractByMask(paths[0], border)
    # 掩膜提取
    x_cell_size, y_cell_size = out_image.meanCellWidth, out_image.meanCellHeight #x,y方向的像元大小
    ExtentXmin, ExtentYmin = out_image.extent.XMin, out_image.extent.YMin #取x,y座標最小值
    lowerLeft = arcpy.Point(ExtentXmin, ExtentYmin) #取得資料起始點範圍
    noDataValue = out_image.noDataValue #取得資料的noData值
    out_image = arcpy.RasterToNumPyArray(out_image) #將柵格轉為numpy陣列
    out_image[out_image==noDataValue] = np.NAN #將陣列中的noData值設為nan
    arrays = np.full(shape=(len(paths), out_image.shape[0], out_image.shape[1]),
                     fill_value=np.NAN, dtype=out_image.dtype)
    arrays[0] = out_image
    
    for i in range(1, len(paths)):
        out_image = arcpy.sa.ExtractByMask(paths[i], border)
        out_image = arcpy.RasterToNumPyArray(out_image)
        out_image[out_image==noDataValue] = np.NAN
        arrays[i] = out_image
    
    return arrays, (lowerLeft, x_cell_size, y_cell_size, noDataValue)

def array_to_raster(path, data, rasterInfo):
    new_raster = arcpy.NumPyArrayToRaster(data, *rasterInfo) #陣列轉柵格
    new_raster.save(path) #儲存柵格

# 計算slope值
def get_slope(x):
    if np.isnan(x).any():
        return np.NAN
    idx = 0
    n = len(x)
    d = np.ones(int(n*(n-1)/2))

    for i in range(n-1):
        j = np.arange(i+1,n)
        d[idx : idx + len(j)] = (x[j] - x[i]) / (j - i)
        idx = idx + len(j)
        
    return np.median(d)

# 計算p值
def get_pvalue(x):
    if np.isnan(x).any():
        return np.NAN
    result = mk.original_test(x)
    return result.p

paths = get_raster_paths(inpath)
arrays, rasterinfo = clip_raster_to_array(paths, border_path)
print "clip raster to array over!"

slopes = np.apply_along_axis(get_slope, 0, arrays)
print "calculate p-value over!"
pvalues = np.apply_along_axis(get_pvalue, 0, arrays)
print "calculate slope over!"

#計算有顯著和非常顯著趨勢的區域
trends = np.full(shape=slopes.shape, fill_value=np.NaN)
trends[~np.isnan(slopes)] = 0 #不顯著的區域設為0
trends[(slopes>0) & ((0.01<pvalues) & (pvalues<0.05))] = 1 #比較顯著增加的區域設為1
trends[(slopes>0) & (pvalues<0.01)] = 2 #顯著增加的區域設為2
trends[(slopes<0) & ((0.01<pvalues) & (pvalues<0.05))] = 3 #比較顯著減少的區域設為3
trends[(slopes<0) & (pvalues<0.01)] = 4 #顯著減少的區域設為4

# 儲存柵格
array_to_raster(p_path, pvalues, rasterinfo)
array_to_raster(slope_path, slopes, rasterinfo)
array_to_raster(trend_path, trends, rasterinfo)
print "save rasters over!"

結果繪圖

由於QGIS軟體開啟和一些相關操作的速度都要比ArcGIS快的多，而且QGIS內建的取色器的功能也方便繪圖時設定顏色，因此本文使用QGIS繪製結果圖，如下圖所示。

(在模仿中精進資料視覺化01) 全國38城居住自由指數視覺化
2020-09-03
視覺化
（在模仿中精進資料視覺化03）OD資料的特殊視覺化方式
2020-10-18
視覺化
（在模仿中精進資料視覺化09）近期基金漲幅排行視覺化
2021-06-27
視覺化
(在模仿中精進資料視覺化02) 溫室氣體排放來源視覺化
2020-09-14
視覺化
（在模仿中精進資料視覺化04）舊金山街道樹木分佈視覺化
2020-11-22
視覺化
（在模仿中精進資料視覺化07）星球研究所大壩分佈視覺化
2021-01-27
視覺化
（在模仿中精進資料視覺化05）疫情期間市值增長top25公司
2020-11-29
視覺化
（在模仿中精進資料視覺化06）常見抽象地圖的製作方法
2021-01-03
視覺化抽象地圖
資料視覺化專案---客源分析趨勢圖
2020-10-29
視覺化
資料分析-以最小顆粒的方式進行
2020-09-10
Geopandas——從“視覺化”到“字母化”的空間資料分析
2018-12-07
視覺化
中國大學排名資料分析與視覺化
2024-05-29
視覺化
python資料分析與視覺化基礎
2024-08-02
Python視覺化
業務顆粒化思考
2020-08-19
資料視覺化--實驗五：高維非空間資料視覺化
2021-01-01
視覺化
資料視覺化在展廳中可以體現的優勢
2022-11-25
視覺化
[資料分析與視覺化] Python繪製資料地圖2-GeoPandas地圖視覺化
2023-04-09
視覺化Python地圖
詳談資料視覺化的現狀及發展趨勢
2019-03-05
視覺化
視覺化資料分析軟體
2021-11-30
視覺化
大資料視覺化優勢在哪
2022-06-29
大資料視覺化
從零開始學習時空資料視覺化（序）
2019-02-25
視覺化
資料視覺化能否代替資料分析
2021-12-01
視覺化
地理空間資料分析與視覺化：洞察地理現象的智慧之眼
2024-06-04
視覺化
視覺化中的資料
2019-02-27
視覺化
python對資料集進行清洗與視覺化
2021-03-25
Python視覺化
python資料分析與視覺化【思維導圖】
2023-01-09
Python視覺化
（資料科學學習手札64）在jupyter notebook中利用kepler.gl進行空間資料視覺化
2019-07-06
資料科學視覺化
併發優化 - 降低鎖顆粒
2018-12-06
優化
併發優化 – 降低鎖顆粒
2019-02-28
優化
使用 Python 進行資料視覺化
2024-07-26
Python視覺化
大資料視覺化有哪些優勢
2022-05-19
大資料視覺化
高效資料通道支撐生產情況實時分析與視覺化
2022-11-03
視覺化
Mann-Kendall演算法用於金融品種長週期趨勢判斷和變點檢測，以及策略思路
2019-07-24
演算法
軟體SaaS化大勢所趨資訊化演進背景下潛在市場空間巨大
2019-06-27
在進行大資料視覺化分析時，到底要如何進行呢？
2022-09-19
大資料視覺化
詳解Python 中視覺化資料分析工作流程
2024-05-10
Python視覺化
互動式資料視覺化的優勢
2022-03-31
視覺化
大資料視覺化有哪些優勢呢？
2022-06-27
大資料視覺化

在模仿中精進資料分析與視覺化01——顆粒物濃度時空變化趨勢（Mann–Kendall Test）

簡介

資料下載和處理

Mann–Kendall趨勢分析

計算並儲存結果

結果繪圖

相關文章