Hive Basic Syntax and Hive Partitions

Posted by adragon on 2018-09-20

Original URL: https://flycode.co/archives/247892

Hive Summary

Hive Basic Syntax

Hive is very similar to MySQL.
Table creation syntax

  CREATE [EXTERNAL] TABLE [IF NOT EXISTS] table_name 
  [(col_name data_type [COMMENT col_comment], ...)] 
  [COMMENT table_comment] 
  [PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)] 
  [CLUSTERED BY (col_name, col_name, ...) 
  [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS] 
  [ROW FORMAT row_format] 
  [STORED AS file_format] 
  [LOCATION hdfs_path]

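CREATE TABLE creates a table with the given name. If a table with the same name already exists, an exception is thrown; the IF NOT EXISTS option suppresses this exception.
The EXTERNAL keyword creates an external table; a path to the actual data (LOCATION) is specified when the table is created.
LIKE copies the structure of an existing table without copying its data.
COMMENT adds a description to a table or a column.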
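
To make the clauses above concrete, here is a minimal sketch that combines EXTERNAL, COMMENT, ROW FORMAT, STORED AS, LOCATION and LIKE. The table names, column names and HDFS path are hypothetical and chosen only for illustration.

```sql
-- Hypothetical external table; the names and the path are assumptions.
CREATE EXTERNAL TABLE IF NOT EXISTS page_view (
  user_id   INT    COMMENT 'ID of the visiting user',
  page_url  STRING COMMENT 'URL of the visited page'
)
COMMENT 'Example external table'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
LOCATION '/user/hive/external/page_view';

-- LIKE copies only the schema of page_view, not its rows.
CREATE TABLE IF NOT EXISTS page_view_copy LIKE page_view;
```

Running desc page_view; afterwards should list both columns together with their comments.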

Create a table
hive> CREATE TABLE IF NOT EXISTS test1
> (id INT,name STRING);

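Drop a table
drop table test1;
Show the table structure
desc test1;
Rename a table
alter table test1 rename to test2;
Change the table structure (add columns)
alter table test1 add columns(address string ,grade string);
Create a table with the same structure as an existing table
create table test3 like test1;
Load local data
load data local inpath '/home/date/' into table test1;
Note: adding overwrite before into replaces the data already in test1; without it, the loaded data is appended after the existing data.
Load a file from HDFS
First upload the file to the corresponding directory on HDFS
hadoop fs -put /home/.txt /usr/
Then load the data from HDFS
load data inpath '/usr/' into table test1;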
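
The three loading variants can be put side by side as a small sketch. It reuses the paths and the test1 table from the commands in this section; whether those paths exist on a given machine is of course an assumption.

```sql
-- Append: rows from the local path are added to whatever test1 already holds.
LOAD DATA LOCAL INPATH '/home/date/' INTO TABLE test1;

-- Replace: OVERWRITE discards the existing contents of test1 first.
LOAD DATA LOCAL INPATH '/home/date/' OVERWRITE INTO TABLE test1;

-- Without LOCAL the path is read from HDFS, and Hive moves the files
-- into the table's directory instead of copying them.
LOAD DATA INPATH '/usr/' INTO TABLE test1;
```

Because an HDFS load moves the files, the source path is empty afterwards; keep a copy elsewhere if the raw files are still needed.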

Insert data
insert overwrite table test2 select * from test1;
Query data
The syntax is almost the same as in MySQL.

Query a single column
where condition
all and distinct
limit
group by
order by
sort by
distribute by
cluster by
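
The clauses listed above can be sketched against the test1 table (id INT, name STRING) created earlier. The filter values are arbitrary and only serve as an illustration.

```sql
-- A single column, with a WHERE filter and a row limit.
SELECT name FROM test1 WHERE id > 1 LIMIT 10;

-- DISTINCT removes duplicate values; ALL (the default) keeps them.
SELECT DISTINCT name FROM test1;
SELECT ALL name FROM test1;

-- GROUP BY with an aggregate.
SELECT name, count(*) AS cnt FROM test1 GROUP BY name;

-- ORDER BY gives a total order over the whole result.
SELECT id, name FROM test1 ORDER BY id DESC;

-- SORT BY only sorts the rows inside each reducer.
SELECT id, name FROM test1 SORT BY id;

-- DISTRIBUTE BY sends rows with the same name to the same reducer;
-- it is usually paired with SORT BY.
SELECT id, name FROM test1 DISTRIBUTE BY name SORT BY name, id;

-- CLUSTER BY name is shorthand for DISTRIBUTE BY name SORT BY name.
SELECT id, name FROM test1 CLUSTER BY name;
```

The last four clauses are where Hive differs most from MySQL: ORDER BY guarantees a global order, while SORT BY, DISTRIBUTE BY and CLUSTER BY control how rows are spread over and sorted within reducers.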

Hive Partitions

Hive partitions make data easier to manage. Common schemes are partitioning by time and partitioning by business line.

    create table t1(
    id      int
    ,name    string
    ,hobby   array<string>
    ,add     map<String,string>
    )
    partitioned by (pt_d string)

Note that a partition column must not have the same name as a column in the table; otherwise an error is raised:

    FAILED: SemanticException [Error 10035]: Column repeated in partitioning columns
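
As a small sketch of this rule, the commented-out statement below repeats pt_d in both places and is the kind of definition that triggers Error 10035, while the second one declares it only as a partition column. The table names t_bad and t_ok are hypothetical.

```sql
-- Fails: pt_d appears in the column list and in PARTITIONED BY.
-- CREATE TABLE t_bad (id INT, pt_d STRING) PARTITIONED BY (pt_d STRING);

-- Works: pt_d is declared once, as a partition column only.
CREATE TABLE IF NOT EXISTS t_ok (id INT) PARTITIONED BY (pt_d STRING);

-- The partition column can still be selected and filtered like a normal column.
SELECT id, pt_d FROM t_ok WHERE pt_d = '201701';
```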

Data can also be loaded into a specific partition:

load data local inpath '/home/hadoop/Desktop/data' overwrite into table t1 partition ( pt_d = '201701');

Next, load the same data into a different partition:

load data local inpath '/home/hadoop/Desktop/data' overwrite into table t1 partition ( pt_d = '000000');

Query the data: select * from t1;

1   xiaoming    ["book","TV","code"]    {"beijing":"chaoyang","shagnhai":"pudong"}  000000
2   lilei   ["book","code"] {"nanjing":"jiangning","taiwan":"taibei"}   000000
3   lihua   ["music","book"]    {"heilongjiang":"haerbin"}  000000
1   xiaoming    ["book","TV","code"]    {"beijing":"chaoyang","shagnhai":"pudong"}  201701
2   lilei   ["book","code"] {"nanjing":"jiangning","taiwan":"taibei"}   201701
3   lihua   ["music","book"]    {"heilongjiang":"haerbin"}  201701
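
The array and map columns in the output above can be queried element by element. The following is a minimal sketch against t1; the column aliases are made up for the example, and the add column is quoted with backticks as a precaution because ADD is also a Hive keyword.

```sql
-- Index into the array, take its size, and look up one key of the map.
SELECT name,
       hobby[0]         AS first_hobby,
       size(hobby)      AS hobby_count,
       `add`['beijing'] AS beijing_district
FROM t1
WHERE pt_d = '201701';
```

For rows whose map has no beijing key, the lookup returns NULL rather than raising an error.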

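Besides declaring PARTITIONED BY when the table is created, a partition can also be added to an existing partitioned table. The partition spec must give a value for the partition column rather than a type, for example:
alter table t1 add partition (pt_d = '201702');
This creates a partition, and Hive creates the corresponding directory on HDFS.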

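Query the data of a given partition

select * from t1 where pt_d = '000000';

Add a partition (creates a new partition directory)

alter table t1 add partition (pt_d = '333333');

Drop a partition (deletes the corresponding partition files)
Note: for an external table, drop partition does not delete the files on HDFS, and msck repair table table_name can re-register the partitions that still exist on HDFS.

alter table test1 drop partition (pt_d = '20170101');

List partitions

show partitions table_name;

Repair partitions
Repairing partitions re-syncs the partition information found on HDFS into the Hive metastore.

msck repair table table_name;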
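
A rough sketch of when a repair is needed: a partition directory is created directly on HDFS, so the metastore does not know about it until msck repair table runs. The warehouse path below is an assumption (the common default location for a table in the default database), and dfs is the Hive CLI command for running HDFS shell operations.

```sql
-- Create a partition directory behind Hive's back (path is an assumption).
dfs -mkdir -p /user/hive/warehouse/t1/pt_d=201703;

-- The new directory is not registered yet, so it is not listed here.
SHOW PARTITIONS t1;

-- Scan the table directory and register partitions missing from the metastore.
MSCK REPAIR TABLE t1;

-- pt_d=201703 should now appear in the list.
SHOW PARTITIONS t1;
```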

Insert data

insert overwrite table partition_test partition(stat_date='2015-01-18',province='jiangsu') 
select member_id,name from partition_test_input 
where stat_date='2015-01-18' 
and province='jiangsu';
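
The insert above relies on two tables whose definitions are not shown. A minimal sketch of what they might look like follows; the column types are assumptions inferred from the query.

```sql
-- Target table: two data columns plus a two-level partition (assumed types).
CREATE TABLE IF NOT EXISTS partition_test (
  member_id STRING,
  name      STRING
)
PARTITIONED BY (stat_date STRING, province STRING);

-- Source table: the partition values are ordinary columns here (assumed types).
CREATE TABLE IF NOT EXISTS partition_test_input (
  member_id STRING,
  name      STRING,
  stat_date STRING,
  province  STRING
);

-- After the insert, the partition is listed as
-- stat_date=2015-01-18/province=jiangsu
SHOW PARTITIONS partition_test;
```

Note that the select list contains only member_id and name: with a static partition spec, the partition values come from the partition(...) clause, not from the query.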

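Differences between internal and external tables

Differences between a managed (internal) table and an external table in Hive:
1. When data is loaded into an external table, it is not moved into Hive's own warehouse directory; in other words, the data of an external table is not managed by Hive. A managed table is the opposite.
2. When a managed table is dropped, Hive deletes both the table's metadata and its data. When an external table is dropped, Hive deletes only the metadata and leaves the data in place.
So which kind of table should you choose? In most cases there is little difference, so it is largely a matter of preference. As a rule of thumb, if all processing is done by Hive, create a managed table; otherwise use an external table.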
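
The drop behavior can be checked with a small experiment. This is a sketch with hypothetical table names and an assumed HDFS location, not a recommended layout.

```sql
-- Managed table: Hive owns the data under its warehouse directory.
CREATE TABLE IF NOT EXISTS demo_managed (id INT, name STRING);

-- External table: Hive only records metadata pointing at the assumed path.
CREATE EXTERNAL TABLE IF NOT EXISTS demo_external (id INT, name STRING)
LOCATION '/data/demo_external';

-- The Table Type field shows MANAGED_TABLE or EXTERNAL_TABLE.
DESCRIBE FORMATTED demo_managed;
DESCRIBE FORMATTED demo_external;

-- Dropping the managed table removes its metadata and its data files.
DROP TABLE demo_managed;

-- Dropping the external table removes only the metadata;
-- the files under /data/demo_external stay on HDFS.
DROP TABLE demo_external;
```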
