Reference: http://www.cnblogs.com/xd502djj/archive/2013/12/11/3470074.html

Reference statement:
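-- Enable dynamic partitioning.
set hive.exec.dynamic.partition=true;
-- nonstrict: every partition column may be dynamic (no static partition required).
set hive.exec.dynamic.partition.mode=nonstrict;
-- Raise the limit on dynamic partitions a single mapper/reducer node may create.
set hive.exec.max.dynamic.partitions.pernode=100000;
-- Raise the limit on HDFS files created by all tasks of the job.
set hive.exec.max.created.files=500000;
-- Number of reduce tasks for the job.
set mapred.reduce.tasks=3000;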
INSERT OVERWRITE TABLE sany_online_hive_wj.ecc_wj partition(st_year,st_month,st_day)
select
a.st_pid,
a.st_loginid,
a.st_ma_serialno,
a.st_checkmark,
a.st_state,
a.st_logintime,
a.st_connecttime,
a.st_updatetime,
a.st_totalwktime,
a.st_rmntime,
a.st_berrorcode,
a.st_werrorcode,
a.st_balmcode,
a.st_walmcode,
a.st_longitude,
a.st_latitude,
a.st_saticunt,
a.st_steppos,
a.st_engv,
a.st_oillev,
a.st_batteryvol,
a.st_floatreserv33,
a.st_floatreserv34,
a.re_en_pid,
a.st_wktime,
a.st_gpssta,
a.st_velocity,
a.st_orientation,
a.st_sgnlq,
a.st_errdealsta,
a.st_cmmctsch,
a.st_altitude,
a.st_uintreserv10,
a.st_uintreserv11,
a.st_uintreserv12,
a.st_uintreserv13,
a.st_uintreserv14,
a.st_uintreserv15,
a.st_uintreserv16,
a.st_uintreserv17,
a.st_uintreserv18,
a.st_uintreserv19,
a.st_uintreserv20,
a.st_uintreserv21,
a.st_uintreserv22,
a.st_uintreserv23,
a.st_uintreserv24,
a.st_uintreserv25,
a.st_uintreserv26,
a.st_uintreserv27,
a.st_uintreserv28,
a.st_uintreserv29,
a.st_uintreserv30,
a.st_uintreserv31,
a.st_uintreserv32,
a.st_floatreserv13,
a.st_floatreserv14,
a.st_floatreserv15,
a.st_floatreserv16,
a.st_floatreserv17,
a.st_floatreserv18,
a.st_floatreserv19,
a.st_floatreserv20,
a.st_floatreserv21,
a.st_floatreserv22,
a.st_floatreserv23,
a.st_floatreserv24,
a.st_floatreserv25,
a.st_floatreserv26,
a.st_floatreserv27,
a.st_floatreserv28,
a.st_floatreserv29,
a.st_floatreserv30,
a.st_floatreserv31,
a.st_floatreserv32,
substring(trim(a.st_updatetime),1,4) st_year,
substring(trim(a.st_updatetime),6,2) st_month,
substring(trim(a.st_updatetime),9,2) st_day
from sany_online_hive_wj.ecc_wj_distinct as a
where substring(trim(a.st_updatetime),1,7)='${YYYY-MM}'
distribute by st_year,st_month,st_day;

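Before distribute by was added, moving the data from the Hive job's temporary result path into the table's partition paths was extremely slow, taking roughly 30 to 40 minutes. With distribute by added, the job takes about 3 minutes.
For the underlying cause, see: http://blog.csdn.net/xiaolang85/article/details/11767297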
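
The same pattern can be shown on a smaller case. This is a minimal sketch, assuming two hypothetical tables that are not part of the job above: demo_db.events_staging (columns id, payload, event_time) and demo_db.events (columns id, payload, partitioned by dt). A likely explanation for the speed-up is that without distribute by, every task that sees rows for a partition writes its own file into it, so the final move step can handle up to (tasks x partitions) small files. With distribute by on the partition column, all rows of one partition go to one reducer, which writes a single file per partition.

```sql
-- Hypothetical tables and columns, for illustration only.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

INSERT OVERWRITE TABLE demo_db.events PARTITION (dt)
SELECT
  id,
  payload,
  dt            -- the dynamic partition column must come last in the select list
FROM (
  -- Derive the partition value (assumed 'YYYY-MM-DD' prefix of event_time).
  SELECT
    id,
    payload,
    substring(trim(event_time), 1, 10) AS dt
  FROM demo_db.events_staging
) t
-- Send all rows with the same partition value to the same reducer,
-- so each partition directory receives one output file.
DISTRIBUTE BY dt;
```

The number of reducers that actually receive data is bounded by the number of distinct distribute by keys, so a load covering one month by day keeps at most 31 reducers busy regardless of the configured reducer count.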

In addition, where data skew is a concern for a Hive table, it is better to spread the data evenly across the table's files.
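
One way to do this, sketched here under assumptions and not taken from the original job, is to add a bucket expression to the distribute by keys so that each partition is split across a fixed number of reducers. The tables demo_db.events_staging and demo_db.events, the column id, and the bucket count 8 are all hypothetical. hash() and pmod() are Hive built-in functions.

```sql
-- Hypothetical tables and columns, for illustration only.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

INSERT OVERWRITE TABLE demo_db.events PARTITION (dt)
SELECT
  id,
  payload,
  dt
FROM (
  SELECT
    id,
    payload,
    substring(trim(event_time), 1, 10) AS dt
  FROM demo_db.events_staging
) t
-- Each partition is split into at most 8 groups by a deterministic hash of id,
-- so a very large partition is written by up to 8 reducers (8 files)
-- instead of a single overloaded one.
DISTRIBUTE BY dt, pmod(hash(id), 8);
```

A deterministic hash is used here instead of rand() so that a retried task routes the same rows to the same reducer. The split is only as even as the values of id are well spread, and the bucket count trades file count against per-reducer load.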

Using Hive dynamic partitions
