Visualising in R how life expectancy in London's boroughs differs from the national average (heat map)

Posted by Hotomoderato_Han on 2020-10-30

Original URL: https://blog.csdn.net/weixin_38955499/article/details/109377412

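Reading the CSV file

Use read_csv() to read the file directly from the web, treating the text placeholder "n/a" in the numeric columns as missing.

Differences between read.csv( ) and read_csv( )

read.csv( )	read_csv( )
The default CSV reader in base R	The reader provided by the readr package
Fine for small files	Better suited to larger CSV files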

library(tidyverse)  # loads readr, dplyr, tidyr and stringr

LondonData <- read_csv("https://files.datapress.com/london/dataset/ward-profiles-and-atlas/2015-09-24T14:21:24/ward-profiles-excel-version.csv",
                       locale = locale(encoding = "UTF-8"),
                       na = "n/a")

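encoding = "UTF-8": in UTF-8 a single character can take more than one byte. readr already assumes UTF-8 unless told otherwise, so spelling it out mainly documents the assumption. If accented characters come out garbled, the file is probably stored in another encoding such as latin1 and this argument needs to change. na = "n/a" tells read_csv() to treat the text "n/a" as a missing value, which is what lets the numeric columns be read as numbers.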
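
To see what the na argument does, here is a minimal sketch on a made-up three-row sample (the rows and values are hypothetical, not taken from the real ward profiles file). It assumes readr 2.0 or later, where literal CSV text can be wrapped in I().

```r
library(readr)

# Hypothetical sample standing in for the ward profiles file.
sample_csv <- I(paste0(
  "Ward name,New code,Population - 2015\n",
  "City of London,E09000001,8100\n",
  "Abbey,E05000026,n/a\n",
  "Alibon,E05000027,10600\n"
))

# "n/a" is declared as a missing-value marker, so the column parses as numbers.
with_na <- read_csv(sample_csv,
                    locale = locale(encoding = "UTF-8"),
                    na = "n/a",
                    show_col_types = FALSE)

# No missing-value markers declared: "n/a" is ordinary text and the column stays character.
without_na <- read_csv(sample_csv, na = character(), show_col_types = FALSE)

class(with_na$`Population - 2015`)
class(without_na$`Population - 2015`)
```

The first class() call should report a numeric column and the second a character column, which is the situation the na argument is there to avoid.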

Checking that the data was read in correctly

Method 1

Use class()

class(LondonData)

Output

## [1] "spec_tbl_df" "tbl_df"      "tbl"         "data.frame"

Method 2

Use summarise_all() from dplyr together with pivot_longer() from tidyr to list every variable alongside its class

Datatypelist <- LondonData %>% 
  summarise_all(class) %>%
  pivot_longer(everything(), 
               names_to="All_variables", 
               values_to="Variable_class")

Datatypelist

Output

# A tibble: 67 x 2
   All_variables                      Variable_class
   <chr>                              <chr>         
 1 Ward name                          character     
 2 Old code                           character     
 3 New code                           character     
 4 Population - 2015                  numeric       
 5 Children aged 0-15 - 2015          numeric       
 6 Working-age (16-64) - 2015         numeric       
 7 Older people aged 65+ - 2015       numeric       
 8 % All Children aged 0-15 - 2015    numeric       
 9 % All Working-age (16-64) - 2015   numeric       
10 % All Older people aged 65+ - 2015 numeric       
# … with 57 more rows
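
The same pattern can be tried on a tiny table to see how it works: summarise_all(class) collapses the data to one row holding each column's class, and pivot_longer() turns that wide row into one row per variable. The toy tibble below is made up for illustration.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data with two text columns and one numeric column.
toy <- tibble(`Ward name` = c("Abbey", "Alibon"),
              `New code` = c("E05000026", "E05000027"),
              `Population - 2015` = c(14750, 10600))

toy_types <- toy %>%
  summarise_all(class) %>%              # one row: the class of each column
  pivot_longer(everything(),            # reshape to one row per variable
               names_to = "All_variables",
               values_to = "Variable_class")

toy_types
```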

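Filtering the data

All the numeric columns have now been read in as numbers. With the data in R, the next step is to select a smaller subset that covers only the London boroughs. Borough codes start with E09, while the rest of the file (the wards) uses codes starting with E05, so filter() can pick out the subset we need (much like select * from…where… in SQL).
There is one complication: the New code column holds character strings rather than integers, so a numeric comparison will not work. Instead, combine str_detect() (from the stringr string-handling package) with filter().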

LondonBoroughs<- LondonData %>% 
  filter(str_detect(`New code`, "^E09"))

Check the result

LondonBoroughs$`Ward name`
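
A small sketch of the same filter on made-up rows shows what the pattern "^E09" does: the ^ anchors the match to the start of the string, so only codes that begin with E09 are kept.

```r
library(dplyr)
library(stringr)

# Hypothetical rows: two borough codes (E09...) and one ward code (E05...).
codes <- tibble(name = c("City of London", "Barking and Dagenham", "Abbey"),
                `New code` = c("E09000001", "E09000002", "E05000026"))

boroughs <- codes %>%
  filter(str_detect(`New code`, "^E09"))  # keep codes starting with E09

boroughs$name
```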

Extracting the data

Handling anomalous data

This reveals another problem: City of London appears in two rows, so keep only the unique rows with distinct( ):

LondonBoroughs<-LondonBoroughs %>%
  distinct()

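That fixes it!

Because we are now summarising by borough rather than by ward, the column name Ward name is misleading, so it is good practice to rename the columns and keep the naming consistent.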

library(janitor)

LondonBoroughs <- LondonBoroughs %>%
  dplyr::rename(Borough=`Ward name`)%>%
  clean_names()
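
As a rough illustration of what rename() followed by clean_names() does to the headers, here is a sketch on a single made-up row. clean_names() converts names to lower-case snake_case, which is why later code can refer to columns such as new_code without backticks.

```r
library(dplyr)
library(janitor)

# Hypothetical one-row table with the awkward original-style headers.
raw <- tibble(`Ward name` = "City of London",
              `New code` = "E09000001",
              `Population - 2015` = 8100)

tidy <- raw %>%
  rename(Borough = `Ward name`) %>%  # give the column a meaningful name first
  clean_names()                      # then standardise all names to snake_case

names(tidy)
```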

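Processing the data

Calculate:
a. the average life expectancy
b. a normalised value for each borough, based on a
Use mutate( ) to add new variables derived from existing ones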

Life_expectancy <- LondonBoroughs %>% 
  # average of female and male life expectancy
  mutate(averagelifeexpectancy= (female_life_expectancy_2009_13 +
                                   male_life_expectancy_2009_13)/2)%>%
  # normalise against the mean across all boroughs
  mutate(normalisedlifeepectancy= averagelifeexpectancy /
           mean(averagelifeexpectancy))%>%
  # keep only the columns we need
  select(new_code,
         borough,
         averagelifeexpectancy, 
         normalisedlifeepectancy)%>%
  # sort in descending order
  arrange(desc(normalisedlifeepectancy))
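
To make the arithmetic concrete, here is the same two-step calculation on three made-up boroughs (the names and figures are hypothetical). Dividing each average by the mean of all averages gives values centred on 1, so anything above 1 is above the group mean.

```r
library(dplyr)

# Hypothetical life expectancies for three boroughs.
toy <- tibble(borough = c("A", "B", "C"),
              female = c(86, 84, 82),
              male = c(82, 80, 78))

result <- toy %>%
  mutate(avg = (female + male) / 2) %>%  # 84, 82, 80
  mutate(norm = avg / mean(avg)) %>%     # mean(avg) is 82
  arrange(desc(norm))

result
```

Borough B sits exactly on the mean, so its normalised value is 1, and the normalised values themselves average to 1.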

c. Use case_when( ) to compare each borough's life expectancy with the UK average of 81.16

Life_expectancy2 <- Life_expectancy %>%
  mutate(UKcompare = case_when(averagelifeexpectancy>81.16 ~ "above UK average",
                               TRUE ~ "below UK average"))
Life_expectancy2
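
case_when() checks its conditions in order and uses the first one that is TRUE, with TRUE ~ ... acting as the catch-all. A short sketch on made-up values shows one consequence of using a strict > here: a borough sitting exactly on 81.16 falls through to the "below" label.

```r
library(dplyr)

# Hypothetical values: one above, one exactly on, one below the UK average.
toy <- tibble(borough = c("A", "B", "C"),
              averagelifeexpectancy = c(83.2, 81.16, 79.9))

labelled <- toy %>%
  mutate(UKcompare = case_when(averagelifeexpectancy > 81.16 ~ "above UK average",
                               TRUE ~ "below UK average"))

labelled$UKcompare
```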

d. Calculate the difference between the two

Life_expectancy2_group <- Life_expectancy2 %>%
  mutate(UKdiff = averagelifeexpectancy-81.16) %>%
  group_by(UKcompare)%>%
  summarise(range=max(UKdiff)-min(UKdiff), count=n(), Average=mean(UKdiff))
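
The grouped summary is easy to check by hand on a few made-up differences. This sketch uses four hypothetical boroughs, two on each side of the average.

```r
library(dplyr)

# Hypothetical differences from the UK average, already labelled.
toy <- tibble(
  UKcompare = c("above UK average", "above UK average",
                "below UK average", "below UK average"),
  UKdiff = c(2, 1, -1, -3)
)

toy_summary <- toy %>%
  group_by(UKcompare) %>%
  summarise(range = max(UKdiff) - min(UKdiff),  # spread within each group
            count = n(),                        # boroughs per group
            Average = mean(UKdiff))             # mean difference per group

toy_summary
```

For the "above" group this gives a range of 1 and an average of 1.5; for the "below" group a range of 2 and an average of -2.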

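e. Summarise the boroughs by that difference
1) Round the UKdiff column to 0 decimal places (without adding a new column).
2) Use case_when() to find the boroughs whose average life expectancy is 81 or above (note that the code uses a threshold of 81 here rather than 81.16), and build a new column that contains the text "equal or above UK average by" followed by the number of years in UKdiff. str_c() joins two or more vector elements into a single character vector, and sep sets the separator placed between them.
3) Group by the UKcompare column.
4) Count the boroughs in each group.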

Life_expectancy3 <- Life_expectancy %>%
  mutate(UKdiff = averagelifeexpectancy-81.16)%>%
  mutate(across(where(is.numeric), round, 3))%>%
  mutate(across(UKdiff, round, 0))%>%
  mutate(UKcompare = case_when(averagelifeexpectancy >= 81 ~ 
                                 str_c("equal or above UK average by",
                                       UKdiff, 
                                       "years", 
                                       sep=" "), 
                               TRUE ~ str_c("below UK average by",
                                            UKdiff,
                                            "years",
                                            sep=" ")))%>%
  group_by(UKcompare)%>%
  summarise(count=n())
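
The label-building step can be looked at on its own. In this sketch (the three differences are made up) str_c() recycles the fixed text against each element of the numeric vector, converting the numbers to text and putting a space between the pieces.

```r
library(stringr)

# Hypothetical differences, rounded to whole years as in step 1.
UKdiff <- round(c(1.84, 0.2, -2.61), 0)  # 2, 0, -3

labels <- str_c("equal or above UK average by", UKdiff, "years", sep = " ")

labels
```

The result is one label per element, for example "equal or above UK average by 2 years" for the first value.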

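Heat map visualisation

Installing maptools and the other mapping packages

maptools, rgeos and rgdal were retired from CRAN in 2023, so those install calls may fail on a current R installation. The mapping code in this post only calls functions from sf, tmap and tmaptools.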

install.packages("maptools")
install.packages(c("classInt", "tmap"))

# might also need these ones
install.packages(c("RColorBrewer", "sp", "rgeos", 
                   "tmaptools", "sf", "downloader", "rgdal", 
                   "geojsonio"))

Read the GeoJSON file directly from the open data portal

library(sf)    # st_read(), st_bbox()
library(tmap)  # qtm(), tm_shape() and friends

# this will take a few minutes
EW <- st_read("https://opendata.arcgis.com/datasets/8edafbe3276d4b56aec60991cbddda50_2.geojson")

Alternatively, download the shapefile and read it from a local folder

# shapefile in local folder
EW <- st_read(here::here("prac2_data",
                        "Local_Authority_Districts__December_2015__Boundaries-shp",
                        "Local_Authority_Districts__December_2015__Boundaries.shp"))

Select the London boroughs and plot them

LondonMap<- EW %>%
  filter(str_detect(lad15cd, "^E09"))

#plot it using the qtm function
qtm(LondonMap)

Before making the map, some attribute data has to be joined to it with merge(), but first the column names need cleaning again with janitor.

LondonData <- clean_names(LondonData)

# EW was read in directly from the web
BoroughDataMap <- EW %>%
  clean_names()%>%
  # the . stands for the data piped in from the previous step
  filter(str_detect(lad15cd, "^E09"))%>%
  merge(.,
        LondonData, 
        by.x="lad15cd", 
        by.y="new_code",
        no.dups = TRUE)%>%
  distinct(.,lad15cd, 
           .keep_all = TRUE)

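distinct() here keeps the rows that are unique on the code alone, while .keep_all = TRUE retains all the other variables. With .keep_all = FALSE (the default), all the other variables would be dropped.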
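
Here is a minimal sketch of the merge-then-distinct pattern. Plain data frames stand in for the sf object so it runs without any spatial data, and all rows and column values are made up. merge() does an inner join by default, so areas with no match drop out and a code that appears twice in the attribute table produces two joined rows.

```r
library(dplyr)

# Hypothetical boundary table (stand-in for the sf object) and attribute table.
areas <- data.frame(lad15cd = c("E09000001", "E09000002", "E06000001"),
                    lad15nm = c("City of London", "Barking and Dagenham", "Hartlepool"))
profiles <- data.frame(new_code = c("E09000001", "E09000001", "E09000002"),
                       rate = c(1.5, 1.5, 9.1),
                       source_row = 1:3)

# Join on differently named key columns; the result keeps the name from by.x.
joined <- merge(areas, profiles, by.x = "lad15cd", by.y = "new_code")

by_code   <- distinct(joined, lad15cd, .keep_all = TRUE)  # one row per code, all columns
code_only <- distinct(joined, lad15cd)                    # one row per code, code column only

nrow(joined)
dim(by_code)
dim(code_only)
```

The join yields three rows (City of London twice), and both distinct() calls bring that down to two; only the .keep_all = TRUE version still carries the other columns.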

Use qtm( ) to quickly create a choropleth map

tmap_mode("plot")

qtm(BoroughDataMap, 
    fill = "rate_of_job_seekers_allowance_jsa_claimants_2015")

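Not pretty. Really not pretty! Let's add a basemap.
Use the read_osm() function from tmaptools to fetch a basemap from OpenStreetMap (OSM).
The st_bbox() function from sf creates a bounding box around London, which defines the extent of the basemap image to fetch.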

tmaplondon <- BoroughDataMap %>%
  st_bbox(.) %>% 
  tmaptools::read_osm(., type = "osm", zoom = NULL)
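
To see what st_bbox() hands on to read_osm(), here is a small sketch that needs no download. The two points are hypothetical coordinates roughly at opposite corners of Greater London, used only to have some geometry to measure.

```r
library(sf)

# Two made-up points (longitude, latitude) in WGS84.
pts <- st_sfc(st_point(c(-0.51, 51.28)),
              st_point(c(0.33, 51.69)),
              crs = 4326)

bb <- st_bbox(pts)  # smallest rectangle containing all geometries

bb
```

The result is a named set of four numbers (xmin, ymin, xmax, ymax) plus the coordinate reference system, which is enough to describe the rectangle of map tiles to request.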

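Plot with tmap: add the basemap, then the London boroughs, the attribute to map, the classification style used for the colour breaks, the transparency (alpha), a compass, a scale bar and a legend.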

tmap_mode("plot")

tm_shape(tmaplondon)+
tm_rgb()+
tm_shape(BoroughDataMap) + 
tm_polygons("rate_of_job_seekers_allowance_jsa_claimants_2015", 
        style = "jenks",
        palette = "YlOrBr",
        midpoint = NA,
        title = "Rate per 1,000 people",
        alpha = 0.5) + 
  tm_compass(position = c("left", "bottom"),type = "arrow") + 
  tm_scale_bar(position = c("left", "bottom")) +
  tm_layout(title = "Job seekers' Allowance Claimants", legend.position = c("right", "bottom"))
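
The style = "jenks" setting decides where the colour classes are cut. As far as I understand, tmap hands this work to the classInt package installed earlier, so the breaks can be inspected directly. This sketch uses made-up rates that fall into three obvious clusters.

```r
library(classInt)

# Hypothetical claimant rates forming three clusters.
rates <- c(1, 2, 3, 10, 11, 12, 30, 31, 32)

# Natural breaks (Jenks): classes chosen to minimise variation within each class.
jenks <- classIntervals(rates, n = 3, style = "jenks")

jenks$brks
```

Three classes need four break points, with the first and last at the minimum and maximum of the data.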

Finally! Merge the life expectancy data with the spatial data EW to create Life_expectancy4map, then map the merged data with tmap.

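# Life_expectancy4 is not defined earlier in this post; it is assumed here
# to be Life_expectancy plus a rounded UKdiff column, as in step e above
Life_expectancy4 <- Life_expectancy %>%
  mutate(UKdiff = averagelifeexpectancy-81.16)%>%
  mutate(across(UKdiff, round, 0))

Life_expectancy4map <- EW %>%
  merge(.,
        Life_expectancy4, 
        by.x="lad15cd", 
        by.y="new_code",
        no.dups = TRUE)%>%
  distinct(.,lad15cd, 
           .keep_all = TRUE)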

tmap_mode("plot")
tm_shape(tmaplondon)+
  tm_rgb()+
  tm_shape(Life_expectancy4map) + 
  tm_polygons("UKdiff", 
              style="pretty",
              palette="Blues",
              midpoint=NA,
              title="Number of years",
              alpha = 0.5) + 
  tm_compass(position = c("left", "bottom"),type = "arrow") + 
  tm_scale_bar(position = c("left", "bottom")) +
  tm_layout(title = "Difference in life expectancy", legend.position = c("right", "bottom"))

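And that's it!

This was my first time drawing a heat map in R. The packages for both data processing and map drawing feel fairly complete, so there is no need to write your own functions, but I still need more practice with R.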
