Hands-On with HiveQL
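Hive offers a SQL-like experience. Syntactically it resembles MySQL, and functionally it resembles SQL-92, but note that in both cases the resemblance is only approximate.
For the full HiveQL syntax, see: https://cwiki.apache.org/confluence/display/Hive/LanguageManual
Let's first try using Hive to redo the task we previously did with Pig on access_log: counting the number of hits per IP.
Create the table as follows: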
hive> create table access_log(
> ip string,
> other string
> )
> row format delimited fields terminated by ' '
> stored as textfile;
OK
Time taken: 0.106 seconds
Loading data also uses the LOAD DATA command. Here is the file to be processed:
hive> load data local inpath '/data/software/access_log.txt' overwrite into table access_log;
Copying data from file:/data/software/access_log.txt
Copying file: file:/data/software/access_log.txt
Loading data to table default.access_log
Moved to trash: hdfs://hdnode1:9000/user/hive/warehouse/access_log
OK
Time taken: 0.753 seconds
Query the loaded data, fetching the first 20 rows. Nice, there is even a LIMIT clause, how convenient:
hive> select * from access_log limit 20;
OK
220.181.108.151 -
208.115.113.82 -
220.181.94.221 -
112.97.24.243 -
112.97.24.243 -
112.97.24.243 -
112.97.24.243 -
112.97.24.243 -
112.97.24.243 -
112.97.24.243 -
112.97.24.243 -
112.97.24.243 -
112.97.24.243 -
112.97.24.243 -
112.97.24.243 -
112.97.24.243 -
112.97.24.243 -
112.97.24.243 -
112.97.24.243 -
112.97.24.243 -
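Time taken: 0.156 seconds
With SQL the statistics are easy: a GROUP BY is all it takes. However, the data set is fairly large this time, so printing the GROUP BY result to the screen is not a great idea. Instead we store the result in another table, using the following SQL statement: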
hive> create table access_result as select ip,count(1) ct from access_log group by ip;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapred.reduce.tasks=<number>
Starting Job = job_201304220923_0007, Tracking URL = http://hdnode1:50030/jobdetails.jsp?jobid=job_201304220923_0007
Kill Command = /usr/local/hadoop-0.20.2/bin/../bin/hadoop job -Dmapred.job.tracker=hdnode1:9001 -kill job_201304220923_0007
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2013-05-02 17:02:12,037 Stage-1 map = 0%, reduce = 0%
2013-05-02 17:02:18,082 Stage-1 map = 100%, reduce = 0%
2013-05-02 17:02:27,128 Stage-1 map = 100%, reduce = 100%
Ended Job = job_201304220923_0007
Moving data to: hdfs://hdnode1:9000/user/hive/warehouse/access_result
[Warning] could not update stats.
476 Rows loaded to hdfs://hdnode1:9000/tmp/hive-grid/hive_2013-05-02_17-02-05_557_8882025499109535399/-ext-10000
MapReduce Jobs Launched:
Job 0: Map: 1 Reduce: 1 HDFS Read: 7118627 HDFS Write: 8051 SUCESS
Total MapReduce CPU Time Spent: 0 msec
OK
Time taken: 25.529 seconds
Query the data (since sorting is required, this is another MapReduce job):
hive> select * from access_result order by ct desc limit 10;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapred.reduce.tasks=<number>
Starting Job = job_201304220923_0009, Tracking URL = http://hdnode1:50030/jobdetails.jsp?jobid=job_201304220923_0009
Kill Command = /usr/local/hadoop-0.20.2/bin/../bin/hadoop job -Dmapred.job.tracker=hdnode1:9001 -kill job_201304220923_0009
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2013-05-02 17:04:45,208 Stage-1 map = 0%, reduce = 0%
2013-05-02 17:04:48,220 Stage-1 map = 100%, reduce = 0%
2013-05-02 17:04:57,260 Stage-1 map = 100%, reduce = 100%
Ended Job = job_201304220923_0009
MapReduce Jobs Launched:
Job 0: Map: 1 Reduce: 1 HDFS Read: 8051 HDFS Write: 191 SUCESS
Total MapReduce CPU Time Spent: 0 msec
OK
218.20.24.203 4597
221.194.180.166 4576
119.146.220.12 1850
117.136.31.144 1647
121.28.95.48 1597
113.109.183.126 1596
182.48.112.2 870
120.84.24.200 773
61.144.125.162 750
27.115.124.75 470
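Time taken: 20.608 seconds
See? As the saying goes, you only recognize quality when you compare. No wonder Pig is on the decline: next to Hive, the Pig way of doing things can simply be ignored.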
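For readers who want to see what the two queries above actually compute, here is a rough Python sketch of the same logic. It is not how Hive executes anything (Hive compiles the queries into MapReduce jobs); it only mirrors the result. The helper names (`parse_line`, `count_hits`, `top_ips`) and the sample log lines are made up for illustration, and the handling of extra or missing fields is my assumption about how a space-delimited text table maps a line onto the two declared columns.

```python
from collections import Counter


def parse_line(line):
    """Split one log line on single spaces, like the access_log table declaration."""
    fields = line.rstrip("\n").split(" ")
    ip = fields[0]
    # Assumption: fields beyond the two declared columns are dropped,
    # and a missing second field becomes None (NULL in the table).
    other = fields[1] if len(fields) > 1 else None
    return ip, other


def count_hits(lines):
    """Rough equivalent of: select ip, count(1) ct from access_log group by ip"""
    return Counter(parse_line(line)[0] for line in lines)


def top_ips(counts, n=10):
    """Rough equivalent of: select * from access_result order by ct desc limit n"""
    return counts.most_common(n)


# Hypothetical sample lines, not taken from the real access_log.txt.
sample = [
    '112.97.24.243 - - [02/May/2013:17:00:01 +0800] "GET / HTTP/1.1" 200 512',
    '220.181.108.151 - - [02/May/2013:17:00:02 +0800] "GET /a HTTP/1.1" 200 128',
    '112.97.24.243 - - [02/May/2013:17:00:03 +0800] "GET /b HTTP/1.1" 200 256',
    '208.115.113.82 - - [02/May/2013:17:00:04 +0800] "GET /c HTTP/1.1" 404 0',
    '112.97.24.243 - - [02/May/2013:17:00:05 +0800] "GET /d HTTP/1.1" 200 64',
    '220.181.108.151 - - [02/May/2013:17:00:06 +0800] "GET /e HTTP/1.1" 200 32',
]

if __name__ == "__main__":
    for ip, ct in top_ips(count_hits(sample)):
        print(ip, ct)
```

On the sample lines, 112.97.24.243 comes out on top with 3 hits. Everything here fits in memory on one machine, which is exactly the limit Hive removes by spreading the same grouping and sorting across the cluster.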
Source: ITPUB Blog, link: http://blog.itpub.net/7607759/viewspace-761358/. If you repost this article, please credit the source; otherwise legal action may be pursued.