Hive Data Definition (DDL)
Source: http://blog.csdn.net/iquicksandi/article/details/8522691
Databases in Hive
Hive offers no support for row-level inserts, updates, and deletes, and it does not support transactions. Hive adds extensions to provide better performance in the context of Hadoop and to integrate with custom extensions and even external programs.
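With no row-level operations available, data is changed in bulk instead: by loading files into a table, or by overwriting a table or partition with the result of a query. A minimal sketch, using hypothetical table names mytable and staging_mytable:
- hive> LOAD DATA LOCAL INPATH '/tmp/mytable_data' OVERWRITE INTO TABLE mytable;
- hive> INSERT OVERWRITE TABLE mytable SELECT * FROM staging_mytable;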
Create a database
- hive> CREATE DATABASE financials;
Create a database only if it does not already exist
- hive> CREATE DATABASE IF NOT EXISTS financials;
Show the existing databases
- hive> SHOW DATABASES;
- default
- financials
- hive> CREATE DATABASE human_resources;
- hive> SHOW DATABASES;
- default
- financials
- human_resources
List databases matching a pattern
- hive> SHOW DATABASES LIKE 'h.*';
- human_resources
- hive> ...
Create a database with a specified storage location
- hive> CREATE DATABASE financials
- > LOCATION '/my/preferred/directory';
Add a comment when creating a database
- hive> CREATE DATABASE financials
- > COMMENT 'Holds all financial tables';
- hive> DESCRIBE DATABASE financials;
- financials Holds all financial tables
- hdfs://master-server/user/hive/warehouse/financials.db
Create a database with extended properties (DBPROPERTIES)
- hive> CREATE DATABASE financials
- > WITH DBPROPERTIES ('creator' = 'Mark Moneybags', 'date' = '2012-01-02');
- hive> DESCRIBE DATABASE financials;
- financials hdfs://master-server/user/hive/warehouse/financials.db
- hive> DESCRIBE DATABASE EXTENDED financials;
- financials hdfs://master-server/user/hive/warehouse/financials.db
- {date=2012-01-02, creator=Mark Moneybags}
Use a database
- hive> USE financials;
Show the current database in the CLI prompt
- hive> set hive.cli.print.current.db=true;
- hive (financials)> USE default;
- hive (default)> set hive.cli.print.current.db=false;
- hive> ...
Drop a database
- hive> DROP DATABASE IF EXISTS financials;
If the database contains tables, you must drop them before the database can be dropped, or use CASCADE to drop everything at once:
- hive> DROP DATABASE IF EXISTS financials CASCADE;
Alter Database
- hive> ALTER DATABASE financials SET DBPROPERTIES ('edited-by' = 'Joe Dba');
There is no way to delete or “unset” a DBPROPERTY.
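The closest workaround, then, is to overwrite an unwanted property with a new value; the empty string below is only an illustration:
- hive> ALTER DATABASE financials SET DBPROPERTIES ('edited-by' = '');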
Creating Tables
- CREATE TABLE IF NOT EXISTS mydb.employees (
- name STRING COMMENT 'Employee name',
- salary FLOAT COMMENT 'Employee salary',
- subordinates ARRAY<STRING> COMMENT 'Names of subordinates',
- deductions MAP<STRING, FLOAT>
- COMMENT 'Keys are deductions names, values are percentages',
- address STRUCT<street:STRING, city:STRING, state:STRING, zip:INT>
- COMMENT 'Home address')
- COMMENT 'Description of the table'
- LOCATION '/user/hive/warehouse/mydb.db/employees'
- TBLPROPERTIES ('creator'='me', 'created_at'='2012-01-02 10:00:00', ...);
Create a table by copying another table's schema
- CREATE TABLE IF NOT EXISTS mydb.employees2
- LIKE mydb.employees;
Show the tables in a database
- hive> USE mydb;
- hive> SHOW TABLES;
- employees
- table1
- table2
- hive> USE default;
- hive> SHOW TABLES IN mydb;
- employees
Show tables matching a pattern
- hive> USE mydb;
- hive> SHOW TABLES 'empl.*';
- employees
Show extended table information
- hive> DESCRIBE EXTENDED mydb.employees;
- name string Employee name
- salary float Employee salary
- subordinates array<string> Names of subordinates
- deductions map<string,float> Keys are deductions names, values are percentages
- address struct<street:string,city:string,state:string,zip:int> Home address
- Detailed Table Information Table(tableName:employees, dbName:mydb, owner:me,
- ...
- location:hdfs://master-server/user/hive/warehouse/mydb.db/employees,
- parameters:{creator=me, created_at='2012-01-02 10:00:00',
- last_modified_user=me, last_modified_time=1337544510,
- comment:Description of the table, ...}, ...)
Describe a single column
- hive> DESCRIBE mydb.employees.salary;
- salary float Employee salary
External Tables
Dropping an external table deletes only its metadata, not the data:
- CREATE EXTERNAL TABLE IF NOT EXISTS stocks (
- exchange STRING,
- symbol STRING,
- ymd STRING,
- price_open FLOAT,
- price_high FLOAT,
- price_low FLOAT,
- price_close FLOAT,
- volume INT,
- price_adj_close FLOAT)
- ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
- LOCATION '/data/stocks';
Create an external table by copying another table's schema
- CREATE EXTERNAL TABLE IF NOT EXISTS mydb.employees3
- LIKE mydb.employees
- LOCATION '/path/to/data';
Partitioned, Managed Tables
- CREATE TABLE employees (
- name STRING,
- salary FLOAT,
- subordinates ARRAY<STRING>,
- deductions MAP<STRING, FLOAT>,
- address STRUCT<street:STRING, city:STRING, state:STRING, zip:INT>
- )
- PARTITIONED BY (country STRING, state STRING);
However, Hive will now create subdirectories reflecting the partitioning structure. For example:
- ...
- .../employees/country=CA/state=AB
- .../employees/country=CA/state=BC
- ...
- .../employees/country=US/state=AL
- .../employees/country=US/state=AK
- ...
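Because each partition gets its own subdirectory, a query that filters on the partition columns lets Hive read only the matching directories. For example, a sketch like the following would scan only the .../employees/country=US/state=AK subdirectory:
- hive> SELECT * FROM employees
- > WHERE country = 'US' AND state = 'AK';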
A recommended safety measure is to put Hive in “strict” mode, which rejects any query against a partitioned table that lacks a partition filter in the WHERE clause:
- hive> set hive.mapred.mode=strict;
- hive> SELECT e.name, e.salary FROM employees e LIMIT 100;
- FAILED: Error in semantic analysis: No partition predicate found for
- Alias "e" Table "employees"
- hive> set hive.mapred.mode=nonstrict;
- hive> SELECT e.name, e.salary FROM employees e LIMIT 100;
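In strict mode, the same query is accepted as soon as it includes a partition predicate, for example:
- hive> set hive.mapred.mode=strict;
- hive> SELECT e.name, e.salary FROM employees e
- > WHERE e.country = 'US' AND e.state = 'CA' LIMIT 100;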
Show the existing partitions
- hive> SHOW PARTITIONS employees;
- ...
- country=CA/state=AB
- country=CA/state=BC
- ...
- country=US/state=AL
- country=US/state=AK
Show partitions for a given partition key
- hive> SHOW PARTITIONS employees PARTITION(country='US');
- country=US/state=AL
- country=US/state=AK
- ...
- hive> SHOW PARTITIONS employees PARTITION(country='US', state='AK');
- country=US/state=AK
Show the partition keys with DESCRIBE
- hive> DESCRIBE EXTENDED employees;
- name string,
- salary float,
- ...
- address struct<...>,
- country string,
- state string
- Detailed Table Information...
- partitionKeys:[FieldSchema(name:country, type:string, comment:null),
- FieldSchema(name:state, type:string, comment:null)],
- ...
Load data from files into a partition
- LOAD DATA LOCAL INPATH '${env:HOME}/california-employees'
- INTO TABLE employees
- PARTITION (country = 'US', state = 'CA');
External Partitioned Tables
1. First, create the external table schema:
- CREATE EXTERNAL TABLE IF NOT EXISTS log_messages (
- hms INT,
- severity STRING,
- server STRING,
- process_id INT,
- message STRING)
- PARTITIONED BY (year INT, month INT, day INT)
- ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
2. Add a partition to the external table, pointing at a specific location:
- ALTER TABLE log_messages ADD PARTITION(year = 2012, month = 1, day = 2)
- LOCATION 'hdfs://master_server/data/log_messages/2012/01/02';
3. To move a partition's data to another filesystem (for example, S3):
•Copy the data for the partition being moved to S3. For example, you can use the hadoop distcp command:
- hadoop distcp /data/log_messages/2011/12/02 s3n://ourbucket/logs/2011/12/02
•Alter the table to point the partition to the S3 location:
- ALTER TABLE log_messages PARTITION(year = 2011, month = 12, day = 2)
- SET LOCATION 's3n://ourbucket/logs/2011/12/02';
•Remove the HDFS copy of the partition using the hadoop fs -rmr command:
- hadoop fs -rmr /data/log_messages/2011/12/02
Show the table's partition information
- hive> SHOW PARTITIONS log_messages;
- ...
- year=2011/month=12/day=31
- year=2012/month=1/day=1
- year=2012/month=1/day=2
- hive> DESCRIBE EXTENDED log_messages;
- ...
- message string,
- year int,
- month int,
- day int
- Detailed Table Information...
- partitionKeys:[FieldSchema(name:year, type:int, comment:null),
- FieldSchema(name:month, type:int, comment:null),
- FieldSchema(name:day, type:int, comment:null)],
- ...
- hive> DESCRIBE EXTENDED log_messages PARTITION (year=2012, month=1, day=2);
- ...
- location:hdfs://master_server/data/log_messages/2012/01/02,
- ...
Customizing Table Storage Formats
- CREATE TABLE employees (
- name STRING,
- salary FLOAT,
- subordinates ARRAY<STRING>,
- deductions MAP<STRING, FLOAT>,
- address STRUCT<street:STRING, city:STRING, state:STRING, zip:INT>
- )
- ROW FORMAT DELIMITED
- FIELDS TERMINATED BY '\001'
- COLLECTION ITEMS TERMINATED BY '\002'
- MAP KEYS TERMINATED BY '\003'
- LINES TERMINATED BY '\n'
- STORED AS TEXTFILE;
Dropping Tables
- DROP TABLE IF EXISTS employees;
For external tables, the metadata is deleted but the data is not.
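A quick way to see this from the Hive CLI is to drop the external stocks table created earlier and list its data directory, which is left untouched (a sketch, assuming the /data/stocks location used above):
- hive> DROP TABLE IF EXISTS stocks;
- hive> dfs -ls /data/stocks;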
Alter Table
ALTER TABLE modifies table metadata only. The data for the table is
untouched. It’s up to you to ensure that any modifications are consistent
with the actual data.
Renaming a Table
- ALTER TABLE log_messages RENAME TO logmsgs;
Adding, Modifying, and Dropping a Table Partition
- ALTER TABLE log_messages ADD IF NOT EXISTS
- PARTITION (year = 2011, month = 1, day = 1) LOCATION '/logs/2011/01/01'
- PARTITION (year = 2011, month = 1, day = 2) LOCATION '/logs/2011/01/02'
- PARTITION (year = 2011, month = 1, day = 3) LOCATION '/logs/2011/01/03';
- ALTER TABLE log_messages PARTITION(year = 2011, month = 12, day = 2)
- SET LOCATION 's3n://ourbucket/logs/2011/12/02';
- ALTER TABLE log_messages DROP IF EXISTS PARTITION(year = 2011, month = 12, day = 2);
Changing Columns
- ALTER TABLE log_messages
- CHANGE COLUMN hms hours_minutes_seconds INT
- COMMENT 'The hours, minutes, and seconds part of the timestamp'
- AFTER severity;
Adding Columns
- ALTER TABLE log_messages ADD COLUMNS (
- app_name STRING COMMENT 'Application name',
- session_id BIGINT COMMENT 'The current session id');
Deleting or Replacing Columns
- ALTER TABLE log_messages REPLACE COLUMNS (
- hours_mins_secs INT COMMENT 'hour, minute, seconds from timestamp',
- severity STRING COMMENT 'The message severity',
- message STRING COMMENT 'The rest of the message');
This statement effectively renames the original hms column and removes the server and
process_id columns from the original schema definition. As for all ALTER statements,
only the table metadata is changed.
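After the REPLACE COLUMNS statement above, DESCRIBE should report only the new columns plus the partition keys, roughly as follows (the exact output format varies by Hive version):
- hive> DESCRIBE log_messages;
- hours_mins_secs int hour, minute, seconds from timestamp
- severity string The message severity
- message string The rest of the message
- year int
- month int
- day int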
Alter Table Properties
- ALTER TABLE log_messages SET TBLPROPERTIES (
- 'notes' = 'The process id is no longer captured; this column is always NULL');
Alter Storage Properties
- ALTER TABLE log_messages
- PARTITION(year = 2012, month = 1, day = 1)
- SET FILEFORMAT SEQUENCEFILE;
You can specify a new SerDe along with SerDe properties, or change the properties for the existing SerDe. The following example specifies that a table will use a Java class named com.example.JSONSerDe to process a file of JSON-encoded records:
- ALTER TABLE table_using_JSON_storage
- SET SERDE 'com.example.JSONSerDe'
- WITH SERDEPROPERTIES (
- 'prop1' = 'value1',
- 'prop2' = 'value2');