Hive Data Definition (DDL)
Source: http://blog.csdn.net/iquicksandi/article/details/8522691
Databases in Hive
Hive offers no support for row-level inserts, updates, and deletes, and it does not support transactions. Hive adds extensions to provide better performance in the context of Hadoop and to integrate with custom extensions and even external programs.
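With no row-level operations available, data is changed in bulk instead: by loading files into a table, or by overwriting a table or partition with the result of a query. A minimal sketch, using hypothetical table names mytable and staging_mytable:
- hive> LOAD DATA LOCAL INPATH '/tmp/mytable_data' OVERWRITE INTO TABLE mytable;
- hive> INSERT OVERWRITE TABLE mytable SELECT * FROM staging_mytable;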
Create a database
- hive> CREATE DATABASE financials;
Create a database only if it does not already exist
- hive> CREATE DATABASE IF NOT EXISTS financials;
Show the existing databases
- hive> SHOW DATABASES;
- default
- financials
- hive> CREATE DATABASE human_resources;
- hive> SHOW DATABASES;
- default
- financials
- human_resources
List databases matching a pattern
- hive> SHOW DATABASES LIKE 'h.*';
- human_resources
- hive> ...
Create a database with a specified storage location
- hive> CREATE DATABASE financials
- > LOCATION '/my/preferred/directory';
Add a comment when creating a database
- hive> CREATE DATABASE financials
- > COMMENT 'Holds all financial tables';
- hive> DESCRIBE DATABASE financials;
- financials Holds all financial tables
- hdfs://master-server/user/hive/warehouse/financials.db
Create a database with extended properties (DBPROPERTIES)
- hive> CREATE DATABASE financials
- > WITH DBPROPERTIES ('creator' = 'Mark Moneybags', 'date' = '2012-01-02');
- hive> DESCRIBE DATABASE financials;
- financials hdfs://master-server/user/hive/warehouse/financials.db
- hive> DESCRIBE DATABASE EXTENDED financials;
- financials hdfs://master-server/user/hive/warehouse/financials.db
- {date=2012-01-02, creator=Mark Moneybags}
Use a database
- hive> USE financials;
Show the current database in the CLI prompt
- hive> set hive.cli.print.current.db=true;
- hive (financials)> USE default;
- hive (default)> set hive.cli.print.current.db=false;
- hive> ...
Drop a database
- hive> DROP DATABASE IF EXISTS financials;
If the database contains tables, you must drop them before the database can be dropped, or use CASCADE to drop everything at once:
- hive> DROP DATABASE IF EXISTS financials CASCADE;
Alter Database
- hive> ALTER DATABASE financials SET DBPROPERTIES ('edited-by' = 'Joe Dba');
There is no way to delete or “unset” a DBPROPERTY.
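The closest workaround, then, is to overwrite an unwanted property with a new value; the empty string below is only an illustration:
- hive> ALTER DATABASE financials SET DBPROPERTIES ('edited-by' = '');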
Creating Tables
- CREATE TABLE IF NOT EXISTS mydb.employees (
- name STRING COMMENT 'Employee name',
- salary FLOAT COMMENT 'Employee salary',
- subordinates ARRAY<STRING> COMMENT 'Names of subordinates',
- deductions MAP<STRING, FLOAT>
- COMMENT 'Keys are deductions names, values are percentages',
- address STRUCT<street:STRING, city:STRING, state:STRING, zip:INT>
- COMMENT 'Home address')
- COMMENT 'Description of the table'
- LOCATION '/user/hive/warehouse/mydb.db/employees'
- TBLPROPERTIES ('creator'='me', 'created_at'='2012-01-02 10:00:00', ...);
Create a table by copying another table's schema
- CREATE TABLE IF NOT EXISTS mydb.employees2
- LIKE mydb.employees;
Show the tables in a database
- hive> USE mydb;
- hive> SHOW TABLES;
- employees
- table1
- table2
- hive> USE default;
- hive> SHOW TABLES IN mydb;
- employees
Show tables matching a pattern
- hive> USE mydb;
- hive> SHOW TABLES 'empl.*';
- employees
Show extended table information
- hive> DESCRIBE EXTENDED mydb.employees;
- name string Employee name
- salary float Employee salary
- subordinates array<string> Names of subordinates
- deductions map<string,float> Keys are deductions names, values are percentages
- address struct<street:string,city:string,state:string,zip:int> Home address
- Detailed Table Information Table(tableName:employees, dbName:mydb, owner:me,
- ...
- location:hdfs://master-server/user/hive/warehouse/mydb.db/employees,
- parameters:{creator=me, created_at='2012-01-02 10:00:00',
- last_modified_user=me, last_modified_time=1337544510,
- comment:Description of the table, ...}, ...)
Describe a single column
- hive> DESCRIBE mydb.employees.salary;
- salary float Employee salary
External Tables
Dropping an external table deletes only its metadata, not the data:
- CREATE EXTERNAL TABLE IF NOT EXISTS stocks (
- exchange STRING,
- symbol STRING,
- ymd STRING,
- price_open FLOAT,
- price_high FLOAT,
- price_low FLOAT,
- price_close FLOAT,
- volume INT,
- price_adj_close FLOAT)
- ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
- LOCATION '/data/stocks';
Create an external table by copying another table's schema
- CREATE EXTERNAL TABLE IF NOT EXISTS mydb.employees3
- LIKE mydb.employees
- LOCATION '/path/to/data';
Partitioned, Managed Tables
- CREATE TABLE employees (
- name STRING,
- salary FLOAT,
- subordinates ARRAY<STRING>,
- deductions MAP<STRING, FLOAT>,
- address STRUCT<street:STRING, city:STRING, state:STRING, zip:INT>
- )
- PARTITIONED BY (country STRING, state STRING);
However, Hive will now create subdirectories reflecting the partitioning structure. For example:
- ...
- .../employees/country=CA/state=AB
- .../employees/country=CA/state=BC
- ...
- .../employees/country=US/state=AL
- .../employees/country=US/state=AK
- ...
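Because each partition gets its own subdirectory, a query that filters on the partition columns lets Hive read only the matching directories. For example, a sketch like the following would scan only the .../employees/country=US/state=AK subdirectory:
- hive> SELECT * FROM employees
- > WHERE country = 'US' AND state = 'AK';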
A recommended safety measure is to put Hive in “strict” mode, which rejects any query against a partitioned table that lacks a partition filter in the WHERE clause:
- hive> set hive.mapred.mode=strict;
- hive> SELECT e.name, e.salary FROM employees e LIMIT 100;
- FAILED: Error in semantic analysis: No partition predicate found for
- Alias "e" Table "employees"
- hive> set hive.mapred.mode=nonstrict;
- hive> SELECT e.name, e.salary FROM employees e LIMIT 100;
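In strict mode, the same query is accepted as soon as it includes a partition predicate, for example:
- hive> set hive.mapred.mode=strict;
- hive> SELECT e.name, e.salary FROM employees e
- > WHERE e.country = 'US' AND e.state = 'CA' LIMIT 100;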
Show the existing partitions
- hive> SHOW PARTITIONS employees;
- ...
- country=CA/state=AB
- country=CA/state=BC
- ...
- country=US/state=AL
- country=US/state=AK
Show partitions for a given partition key
- hive> SHOW PARTITIONS employees PARTITION(country='US');
- country=US/state=AL
- country=US/state=AK
- ...
- hive> SHOW PARTITIONS employees PARTITION(country='US', state='AK');
- country=US/state=AK
Show the partition keys with DESCRIBE
- hive> DESCRIBE EXTENDED employees;
- name string,
- salary float,
- ...
- address struct<...>,
- country string,
- state string
- Detailed Table Information...
- partitionKeys:[FieldSchema(name:country, type:string, comment:null),
- FieldSchema(name:state, type:string, comment:null)],
- ...
Load data from files into a partition
- LOAD DATA LOCAL INPATH '${env:HOME}/california-employees'
- INTO TABLE employees
- PARTITION (country = 'US', state = 'CA');
External Partitioned Tables
1. First, create the external table schema:
- CREATE EXTERNAL TABLE IF NOT EXISTS log_messages (
- hms INT,
- severity STRING,
- server STRING,
- process_id INT,
- message STRING)
- PARTITIONED BY (year INT, month INT, day INT)
- ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
2. Add a partition to the external table, pointing at a specific location:
- ALTER TABLE log_messages ADD PARTITION(year = 2012, month = 1, day = 2)
- LOCATION 'hdfs://master_server/data/log_messages/2012/01/02';
3. To move a partition's data to another filesystem (for example, S3):
•Copy the data for the partition being moved to S3. For example, you can use the hadoop distcp command:
- hadoop distcp /data/log_messages/2011/12/02 s3n://ourbucket/logs/2011/12/02
•Alter the table to point the partition to the S3 location:
- ALTER TABLE log_messages PARTITION(year = 2011, month = 12, day = 2)
- SET LOCATION 's3n://ourbucket/logs/2011/12/02';
•Remove the HDFS copy of the partition using the hadoop fs -rmr command:
- hadoop fs -rmr /data/log_messages/2011/12/02
Show the table's partition information
- hive> SHOW PARTITIONS log_messages;
- ...
- year=2011/month=12/day=31
- year=2012/month=1/day=1
- year=2012/month=1/day=2
- hive> DESCRIBE EXTENDED log_messages;
- ...
- message string,
- year int,
- month int,
- day int
- Detailed Table Information...
- partitionKeys:[FieldSchema(name:year, type:int, comment:null),
- FieldSchema(name:month, type:int, comment:null),
- FieldSchema(name:day, type:int, comment:null)],
- ...
- hive> DESCRIBE EXTENDED log_messages PARTITION (year=2012, month=1, day=2);
- ...
- location:hdfs://master_server/data/log_messages/2012/01/02,
- ...
Customizing Table Storage Formats
- CREATE TABLE employees (
- name STRING,
- salary FLOAT,
- subordinates ARRAY<STRING>,
- deductions MAP<STRING, FLOAT>,
- address STRUCT<street:STRING, city:STRING, state:STRING, zip:INT>
- )
- ROW FORMAT DELIMITED
- FIELDS TERMINATED BY '\001'
- COLLECTION ITEMS TERMINATED BY '\002'
- MAP KEYS TERMINATED BY '\003'
- LINES TERMINATED BY '\n'
- STORED AS TEXTFILE;
Dropping Tables
- DROP TABLE IF EXISTS employees;
For external tables, the metadata is deleted but the data is not.
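A quick way to see this from the Hive CLI is to drop the external stocks table created earlier and list its data directory, which is left untouched (a sketch, assuming the /data/stocks location used above):
- hive> DROP TABLE IF EXISTS stocks;
- hive> dfs -ls /data/stocks;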
Alter Table
ALTER TABLE modifies table metadata only. The data for the table is
untouched. It’s up to you to ensure that any modifications are consistent
with the actual data.
Renaming a Table
- ALTER TABLE log_messages RENAME TO logmsgs;
Adding, Modifying, and Dropping a Table Partition
- ALTER TABLE log_messages ADD IF NOT EXISTS
- PARTITION (year = 2011, month = 1, day = 1) LOCATION '/logs/2011/01/01'
- PARTITION (year = 2011, month = 1, day = 2) LOCATION '/logs/2011/01/02'
- PARTITION (year = 2011, month = 1, day = 3) LOCATION '/logs/2011/01/03';
- ALTER TABLE log_messages PARTITION(year = 2011, month = 12, day = 2)
- SET LOCATION 's3n://ourbucket/logs/2011/12/02';
- ALTER TABLE log_messages DROP IF EXISTS PARTITION(year = 2011, month = 12, day = 2);
Changing Columns
- ALTER TABLE log_messages
- CHANGE COLUMN hms hours_minutes_seconds INT
- COMMENT 'The hours, minutes, and seconds part of the timestamp'
- AFTER severity;
Adding Columns
- ALTER TABLE log_messages ADD COLUMNS (
- app_name STRING COMMENT 'Application name',
- session_id BIGINT COMMENT 'The current session id');
Deleting or Replacing Columns
- ALTER TABLE log_messages REPLACE COLUMNS (
- hours_mins_secs INT COMMENT 'hour, minute, seconds from timestamp',
- severity STRING COMMENT 'The message severity',
- message STRING COMMENT 'The rest of the message');
This statement effectively renames the original hms column and removes the server and
process_id columns from the original schema definition. As for all ALTER statements,
only the table metadata is changed.
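After the REPLACE COLUMNS statement above, DESCRIBE should report only the new columns plus the partition keys, roughly as follows (the exact output format varies by Hive version):
- hive> DESCRIBE log_messages;
- hours_mins_secs int hour, minute, seconds from timestamp
- severity string The message severity
- message string The rest of the message
- year int
- month int
- day int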
Alter Table Properties
- ALTER TABLE log_messages SET TBLPROPERTIES (
- 'notes' = 'The process id is no longer captured; this column is always NULL');
Alter Storage Properties
- ALTER TABLE log_messages
- PARTITION(year = 2012, month = 1, day = 1)
- SET FILEFORMAT SEQUENCEFILE;
You can specify a new SerDe along with SerDe properties, or change the properties for the existing SerDe. The following example specifies that a table will use a Java class named com.example.JSONSerDe to process a file of JSON-encoded records:
- ALTER TABLE table_using_JSON_storage
- SET SERDE 'com.example.JSONSerDe'
- WITH SERDEPROPERTIES (
- 'prop1' = 'value1',
- 'prop2' = 'value2');