Tutorials 使用視窗功能分析資訊

眼淚成詩hocc發表於2016-01-17

本教程中簡要介紹了分析鑽1.2,即ANSI相容sql解析和視窗函式。鑽支援以下SQL視窗函式:

  • PATRIRION BY和OVER條款

  • 各種聚合視窗函式,Sum,Max,Min,Count,Avg

  • 分析函式,如First_Value Last_Value、Lead、Lag,NTile,Row_Number和Rank

視窗功能多種多樣。你可以減少連線、子查詢和需要編寫顯式遊標。視窗函式解決各種用例以最少的編碼工作。

本教程建立在之前的教程,分析Yelp學術資料集 和 分析高度動態資料集,並使用相同的Yelp資料集。

開始

  1. 要開始,請下載 Yelp (業務審查)。

  2. 安裝和啟動鑽。

  3. 在Drill中列出可用的模式列表。

SHOW schemas;
+---------------------+
|     SCHEMA_NAME     |
+---------------------+
| INFORMATION_SCHEMA  |
| cp.default          |
| dfs.default         |
| dfs.root            |
| dfs.tmp             |
| dfs.yelp            |
| sys                 |
+---------------------+

7 rows selected (1.755 seconds)
  1. 切換到使用Yelp的工作區資料載入。

    USE dfs.yelp;

    +——-+—————————————+
    | ok | summary |
    +——-+—————————————+
    | true | Default schema changed to [dfs.yelp] |
    +——-+—————————————+

    1 row selected (0.129 seconds)

  2. 開始探索的一個資料集在Yelp資料集——業務資訊。

    SELECT * FROM business.json LIMIT 1;

    +————————+—————————————————–+—————————————————————————————————————————————————————————————————————————————————–+——+——————————–+———+————–+——————-+————-+——-+——-+———–+—————————————————————————————————————————————————–+———-+—————+
    | business_id | full_address | hours | open | categories | city | review_count | name | longitude | state | stars | latitude | attributes | type | neighborhoods |
    +————————+————–+——+——————————-+—————————————————————————————————————————————————————————————————————————————————–+——+——————————–+———+————–+——————-+————-+——-+——-+———–+—————————————————————————————————————————————————–+———-+—————+
    | vcNAWiLM4dR7D2nwwJ7nCA | 4840 E Indian School Rd Ste 101 Phoenix, AZ 85018 | {“Tuesday”:{“close”:”17:00″,”open”:”08:00″},”Friday”:{“close”:”17:00″,”open”:”08:00″},”Monday”:{“close”:”17:00″,”open”:”08:00″},”Wednesday”:{“close”:”17:00″,”open”:”08:00″},”Thursday”:{“close”:”17:00″,”open”:”08:00″},”Sunday”:{},”Saturday”:{}} | true | [“Doctors”,”Health & Medical”] | Phoenix | 7 | Eric Goldberg, MD | -111.983758 | AZ | 3.5 | 33.499313 | {“By Appointment Only”:true,”Good Ambience”:{},”Parking”:{},”Music”:{},”Hair Types Specialized In”:{},”Payment Types”:{},”Dietary Restrictions”:{}} | business | [] |
    +————-+————–+——-+——+————+——+————–+—————————————————————————————————————————————————————————————————————————————————–+——+——————————–+———+————–+——————-+————-+——-+——-+———–+—————————————————————————————————————————————————–+———-+—————+
    1 row selected (0.514 seconds)

使用視窗功能簡單的查詢

  1. 得到最高Yelp企業基於評論在每個城市數量和業務的行號。

    SELECT name, city, review_count, row_number()
    OVER (PARTITION BY city ORDER BY review_count DESC) AS rownum
    FROM business.json LIMIT 15;

    +—————————————-+————+—————+———+
    | name | city | review_count | rownum |
    +—————————————-+————+—————+———+
    | Cupz N` Crepes | Ahwatukee | 124 | 1 |
    | My Wine Cellar | Ahwatukee | 98 | 2 |
    | Kathy`s Alterations | Ahwatukee | 12 | 3 |
    | McDonald`s | Ahwatukee | 7 | 4 |
    | U-Haul | Ahwatukee | 5 | 5 |
    | Hi-Health | Ahwatukee | 4 | 6 |
    | Healthy and Clean Living Environments | Ahwatukee | 4 | 7 |
    | Active Kids Pediatrics | Ahwatukee | 4 | 8 |
    | Roberto`s Authentic Mexican Food | Anthem | 117 | 1 |
    | Q to U BBQ | Anthem | 74 | 2 |
    | Outlets At Anthem | Anthem | 64 | 3 |
    | Dara Thai | Anthem | 56 | 4 |
    | Cafe Provence | Anthem | 53 | 5 |
    | Shanghai Club | Anthem | 50 | 6 |
    | Two Brothers Kitchen | Anthem | 43 | 7 |
    +—————————————-+————+—————+———+
    15 rows selected (0.67 seconds)

  2. 檢查評論數量的每個業務相比評論的平均數量在所有業務。

    SELECT name, city,review_count,
    Avg(review_count) OVER (PARTITION BY City) AS city_reviews_avg
    FROM business.json LIMIT 15;

    +—————————————-+————+—————+———————+
    | name | city | review_count | city_reviews_avg |
    +—————————————-+————+—————+———————+
    | Hi-Health | Ahwatukee | 4 | 32.25 |
    | My Wine Cellar | Ahwatukee | 98 | 32.25 |
    | U-Haul | Ahwatukee | 5 | 32.25 |
    | Cupz N` Crepes | Ahwatukee | 124 | 32.25 |
    | McDonald`s | Ahwatukee | 7 | 32.25 |
    | Kathy`s Alterations | Ahwatukee | 12 | 32.25 |
    | Healthy and Clean Living Environments | Ahwatukee | 4 | 32.25 |
    | Active Kids Pediatrics | Ahwatukee | 4 | 32.25 |
    | Anthem Community Center | Anthem | 4 | 14.492063492063492 |
    | Scrapbooks To Remember | Anthem | 4 | 14.492063492063492 |
    | Hungry Howie`s Pizza | Anthem | 7 | 14.492063492063492 |
    | Pinata Nueva | Anthem | 3 | 14.492063492063492 |
    | Starbucks Coffee Company | Anthem | 13 | 14.492063492063492 |
    | Pizza Hut | Anthem | 6 | 14.492063492063492 |
    | Rays Pizza | Anthem | 19 | 14.492063492063492 |
    +—————————————-+————+—————+———————+
    15 rows selected (0.395 seconds)

  3. 看看為每個業務貢獻的評論數量對所有企業在城市的總數。

SELECT name, city,review_count,
Sum(review_count) OVER (PARTITION BY City) AS city_reviews_sum
FROM business.jsonlimit 15;

name city review_count city_reviews_sum
+—————————————-+————+—————+——————-+
Hi-Health Ahwatukee 4 258
My Wine Cellar Ahwatukee 98 258
U-Haul Ahwatukee 5 258
Cupz N` Crepes Ahwatukee 124 258
McDonald`s Ahwatukee 7 258
Kathy`s Alterations Ahwatukee 12 258
Healthy and Clean Living Environments Ahwatukee 4 258
Active Kids Pediatrics Ahwatukee 4 258
Anthem Community Center Anthem 4 913
Scrapbooks To Remember Anthem 4 913
Hungry Howie`s Pizza Anthem 7 913
Pinata Nueva Anthem 3 913
Starbucks Coffee Company Anthem 13 913
Pizza Hut Anthem 6 913
Rays Pizza Anthem 19 913

15 rows selected (0.543 seconds)

對複雜的查詢使用視窗函式

  1. 排名前十的城市名單和他們的排名最高的企業數量的評論。使用鑽視窗等功能等級,dense_rank這些查詢。

    WITH X
    AS
    (SELECT name, city, review_count,
    RANK()
    OVER (PARTITION BY city
    ORDER BY review_count DESC) AS review_rank
    FROM business.json)
    SELECT X.name, X.city, X.review_count
    FROM X
    WHERE X.review_rank =1 ORDER BY review_count DESC LIMIT 10;

    +——————————————-+————-+—————+
    | name | city | review_count |
    +——————————————-+————-+—————+
    | Mon Ami Gabi | Las Vegas | 4084 |
    | Studio B | Henderson | 1336 |
    | Phoenix Sky Harbor International Airport | Phoenix | 1325 |
    | Four Peaks Brewing Co | Tempe | 1110 |
    | The Mission | Scottsdale | 783 |
    | Joe`s Farm Grill | Gilbert | 770 |
    | The Old Fashioned | Madison | 619 |
    | Cornish Pasty Company | Mesa | 578 |
    | SanTan Brewing Company | Chandler | 469 |
    | Yard House | Glendale | 321 |
    +——————————————-+————-+—————+
    10 rows selected (0.49 seconds)

  2. 比較評論的數量為每個業務在城市頂部和底部評論數。

    SELECT name, city, review_count,
    FIRST_VALUE(review_count)
    OVER(PARTITION BY city ORDER BY review_count DESC) AS top_review_count,
    LAST_VALUE(review_count)
    OVER(PARTITION BY city ORDER BY review_count DESC) AS bottom_review_count
    FROM business.json limit 15;

    +—————————————-+————+—————+——————-+———————-+
    | name | city | review_count | top_review_count | bottom_review_count |
    +—————————————-+————+—————+——————-+———————-+
    | My Wine Cellar | Ahwatukee | 98 | 124 | 12 |
    | McDonald`s | Ahwatukee | 7 | 124 | 12 |
    | U-Haul | Ahwatukee | 5 | 124 | 12 |
    | Hi-Health | Ahwatukee | 4 | 124 | 12 |
    | Healthy and Clean Living Environments | Ahwatukee | 4 | 124 | 12 |
    | Active Kids Pediatrics | Ahwatukee | 4 | 124 | 12 |
    | Cupz N` Crepes | Ahwatukee | 124 | 124 | 12 |
    | Kathy`s Alterations | Ahwatukee | 12 | 124 | 12 |
    | Q to U BBQ | Anthem | 74 | 117 | 117 |
    | Dara Thai | Anthem | 56 | 117 | 117 |
    | Cafe Provence | Anthem | 53 | 117 | 117 |
    | Shanghai Club | Anthem | 50 | 117 | 117 |
    | Two Brothers Kitchen | Anthem | 43 | 117 | 117 |
    | The Tennessee Grill | Anthem | 32 | 117 | 117 |
    | Dollyrockers Boutique and Salon | Anthem | 30 | 117 | 117 |
    +—————————————-+————+—————+——————-+———————-+
    15 rows selected (0.516 seconds)

  3. 比較的評論評論的數量之前和之後的業務。

    SELECT city, review_count, name,
    LAG(review_count, 1) OVER(PARTITION BY city ORDER BY review_count DESC)
    AS preceding_count,
    LEAD(review_count, 1) OVER(PARTITION BY city ORDER BY review_count DESC)
    AS following_count
    FROM business.json limit 15;

    +————+—————+—————————————-+——————+——————+
    | city | review_count | name | preceding_count | following_count |
    +————+—————+—————————————-+——————+——————+
    | Ahwatukee | 124 | Cupz N` Crepes | null | 98 |
    | Ahwatukee | 98 | My Wine Cellar | 124 | 12 |
    | Ahwatukee | 12 | Kathy`s Alterations | 98 | 7 |
    | Ahwatukee | 7 | McDonald`s | 12 | 5 |
    | Ahwatukee | 5 | U-Haul | 7 | 4 |
    | Ahwatukee | 4 | Hi-Health | 5 | 4 |
    | Ahwatukee | 4 | Healthy and Clean Living Environments | 4 | 4 |
    | Ahwatukee | 4 | Active Kids Pediatrics | 4 | null |
    | Anthem | 117 | Roberto`s Authentic Mexican Food | null | 74 |
    | Anthem | 74 | Q to U BBQ | 117 | 64 |
    | Anthem | 64 | Outlets At Anthem | 74 | 56 |
    | Anthem | 56 | Dara Thai | 64 | 53 |
    | Anthem | 53 | Cafe Provence | 56 | 50 |
    | Anthem | 50 | Shanghai Club | 53 | 43 |
    | Anthem | 43 | Two Brothers Kitchen | 50 | 32 |
    +————+—————+—————————————-+——————+——————+
    15 rows selected (0.518 seconds)

←分析社交媒體 安裝鑽→

相關文章