定時將資料匯入到hive的shell指令碼

破棉襖發表於2015-05-12

每日定時匯入hive資料倉儲的自動化指令碼


建立shell指令碼,建立臨時表,裝載資料,轉換到正式的分割槽表中:


  1. #!/bin/sh
  2. # upload logs to hdfs
  3. yesterday=`date --date=\'1 days ago\' +%Y%m%d`
  4. hive -e \"
  5. use stage;
  6. create table tracklog_tmp (
  7. dateday string,
  8. datetime string,
  9. ip string ,
  10. cookieid string,
  11. userid string,
  12. logserverip string,
  13. referer string,
  14. requesturl string,
  15. remark1 string,
  16. remark2 string,
  17. alexaflag string,
  18. ua string,
  19. wirelessflag string
  20. )
  21. ROW FORMAT DELIMITED FIELDS TERMINATED BY \' \';\"
  22. hive -e \"
  23. use stage;
  24. set hive.enforce.bucketing=true;
  25. set hive.exec.compress.output=true;
  26. set mapred.output.compress=true;
  27. set mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;
  28. set io.compression.codecs=org.apache.hadoop.io.compress.GzipCodec;
  29. load data local inpath \'/diskg/logs/tracklog_192.168.1.1/${yesterday}/${yesterday}????.dat\' overwrite into table tracklog_tmp;
  30. insert into table tracklog PARTITION (day=\'${yesterday}\') select * from tracklog_tmp;
  31. load data local inpath \'/diskg/logs/tracklog_192.168.1.2/${yesterday}/${yesterday}????.dat\' overwrite into table tracklog_tmp;
  32. insert into table tracklog PARTITION (day=\'${yesterday}\') select * from tracklog_tmp;
  33. load data local inpath \'/diskg/logs/tracklog_192.168.1.3/${yesterday}/${yesterday}????.dat\' overwrite into table tracklog_tmp;
  34. insert into table tracklog PARTITION (day=\'${yesterday}\') select * from tracklog_tmp;
  35. load data local inpath \'/diskg/logs/trackloguc_192.168.1.1/${yesterday}/${yesterday}????.dat\' overwrite into table tracklog_tmp;
  36. insert into table tracklog PARTITION (day=\'${yesterday}\') select * from tracklog_tmp;
  37. load data local inpath \'/diskg/logs/trackloguc_192.168.1.2/${yesterday}/${yesterday}????.dat\' overwrite into table tracklog_tmp;
  38. insert into table tracklog PARTITION (day=\'${yesterday}\') select * from tracklog_tmp;
  39. load data local inpath \'/diskg/logs/trackloguc_192.168.1.3/${yesterday}/${yesterday}????.dat\' overwrite into table tracklog_tmp;
  40. insert into table tracklog PARTITION (day=\'${yesterday}\') select * from tracklog_tmp;
  41. \"
  42. hive -e \"
  43. use stage;
  44. drop table tracklog_tmp ;\"



在crontab中加入定時任務:


crontab -e 
加入如下程式碼:

#import tracklog
25  07 * * * /opt/bin/hive_opt/import_tracklog.sh


原文:


來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/29754888/viewspace-1649771/,如需轉載,請註明出處,否則將追究法律責任。

相關文章