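Data Warehouse Modeling Tools, Part 1: Learning Hive, Day 4

Posted by shmil on 2024-07-18

Original URL: https://www.cnblogs.com/shmil/p/18308434

Basic Hive Operations

1.3 Hive Table Operations (continued from yesterday)

1.3.2 Showing Tables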

show tables;
show tables like 'u*';
desc t_person;
desc formatted students; -- more detailed output

1.3.3 Loading Data

1. Use `hdfs dfs -put '<local data>' '<HDFS directory of the Hive table>'`
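A minimal sketch of method 1, run from the Hive shell with its built-in dfs command. It assumes the existing students table is stored under /hive/warehouse/students; that path is an assumption based on the /hive/warehouse prefix used later in these notes, so check the Location reported by desc formatted before copying anything.

```sql
-- Find the table's actual HDFS directory (see the Location field)
desc formatted students;

-- Copy a local file straight into that directory (target path is assumed)
dfs -put /usr/local/soft/data/students.txt /hive/warehouse/students/;

-- Hive reads whatever files are in the table directory, so the rows show up at once
select * from students limit 5;
```

No load statement is needed here because Hive only maps the table onto the files it finds in the directory.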

2. Use load data

The following commands must be run in the Hive shell.

create table IF NOT EXISTS students2
(
    id bigint,
    name string,
    age int,
    gender string,
    clazz string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

-- Move /input1/students.txt on HDFS into the HDFS directory of the students table. Note that this is a **move**, not a copy.
load data inpath '/input1/students.txt' into table students;

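Note: loading data from HDFS into Hive is a cut operation. Once the file has been loaded into Hive, it no longer exists at its original HDFS path.

-- Truncate the table
truncate table students;
-- With the local keyword, a file on the local Linux filesystem is uploaded to the HDFS directory of the Hive table. **The original file is not deleted: it is copied, not moved.**
load data local inpath '/usr/local/soft/data/students.txt' into table students;
-- overwrite replaces the existing data on load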
load data local inpath '/usr/local/soft/data/students.txt' overwrite into table students;

3. create table xxx as <SQL statement>
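As a hedged illustration of method 3, the sketch below creates and fills a table in one step from a query on students. The table name students_ctas and the filter are hypothetical, chosen only for the example.

```sql
-- The new table's columns and types are derived from the select list
create table students_ctas as
select id, name, clazz
from students
where age >= 20;

-- Check the resulting structure
desc students_ctas;
```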

4. insert into table xxxx <SQL statement> (without as)

create table IF NOT EXISTS students3
(
    id bigint,
    name string,
    age int,
    gender string,
    clazz string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';


-- Insert the data of the students table into students2. This is a copy, not a move: the data in students is not lost.
insert into table students2 select * from students;

-- Overwriting insert: replace into with overwrite
insert overwrite table students2 select * from students;

1.3.4 Modifying Columns

View the table structure

desc students2;

Add a column

alter table students2 add columns (education string);

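After the column is added, querying the table shows NULL for every row in the new column, because the existing data files contain nothing that maps to it.
When new rows are inserted, they are written to separate files in HDFS from the original data, but both still belong to the same table.
Note that inserting new rows does not affect the earlier data: where HDFS holds no value for the column, nothing is mapped to it and the result is NULL.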
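The behaviour described above can be sketched as follows. The inserted row is made-up sample data, and insert ... values assumes Hive 0.14 or later.

```sql
-- Old rows: education was never written to their files, so it reads as NULL
select id, name, education from students2 limit 3;

-- Hypothetical new row that supplies all six columns
insert into table students2
values (1500109999, 'test_user', 20, 'male', 'class_test', 'bachelor');

-- Only the newly inserted row has a value in the new column
select id, name, education from students2 where education is not null;
```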

View the table structure

desc students2;

Rename a column

alter table students2 change education educationnew string;

1.3.5 Dropping a Table

drop table students2;

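1.4 Hive Internal and External Tables

Interview question: what is the difference between internal and external tables? How do you create an external table? External tables are what you use at work.

1.4.1 Hive Internal Tables

When a table is created, a directory for it is created on HDFS under the database the table belongs to.

If the table location is set to an existing path, the table can use the data already in that directory directly.

When you load data, the data files are placed in the table's directory.

Once data has been loaded, it cannot be modified.

Queries read the data in those files, and all of that data ultimately lives on HDFS.

When the table is dropped, its directory is deleted, and the data along with it.

Tables are created as internal tables by default.

-- Internal table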
create table students_internal
(
    id bigint,
    name string,
    age int,
    gender string,
    clazz string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/input2';

hive> dfs -put /usr/local/soft/data/students.txt /input2/;

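1.4.2 Hive External Tables

About external tables

Because an external table loads data from an HDFS path specified elsewhere, Hive does not consider itself the sole owner of that data.

When the Hive table is dropped, the data remains on HDFS and is not deleted.

Dropping an external table removes only the table mapped in Hive and its metadata stored in MySQL.

-- External table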
create external table students_external
(
    id bigint,
    name string,
    age int,
    gender string,
    clazz string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/hive_test/input3';

hive> dfs -put /usr/local/soft/data/students.txt /hive_test/input3/;
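One way to confirm which kind of table was created is to look at the Table Type field in the desc formatted output. This is a small sketch against the two tables defined above.

```sql
-- Expect "Table Type:" to show MANAGED_TABLE for the internal table
desc formatted students_internal;

-- Expect "Table Type:" to show EXTERNAL_TABLE for the external table
desc formatted students_external;
```

The Location field in the same output shows the HDFS directory each table is mapped to.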

Drop the tables to test this:

hive> drop table students_internal;
Moved: 'hdfs://master:9000/input2' to trash at: hdfs://master:9000/user/root/.Trash/Current
OK
Time taken: 0.474 seconds
hive> drop table students_external;
OK
Time taken: 0.09 seconds
hive>
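The result of the two drops can be checked from the Hive shell. This is a sketch under the assumption that HDFS trash is enabled and the user is root, as the log above suggests; the trash subpath is inferred from that log line rather than taken from the original notes.

```sql
-- Internal table: the directory was moved to the trash, so this should report that the path does not exist
dfs -ls /input2;

-- The moved directory should now sit under the current user's trash (inferred path)
dfs -ls /user/root/.Trash/Current/input2;

-- External table: the LOCATION directory is left in place
dfs -ls /hive_test/input3;
```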

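In companies, external tables are generally used more often: the data may need to be used by several programs, and an external table avoids deleting it by accident. External tables are usually combined with location.

External tables can also map data from other data sources into Hive, for example HBase, ElasticSearch, and so on.

The original intent behind external tables is to decouple a table's metadata from its data.

Worked example: create the dept, emp, and salgrade tables and load data into each.

Create the directories for the data files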

hdfs dfs -mkdir -p /bigdata/hive_test1/dept
hdfs dfs -mkdir -p /bigdata/hive_test1/emp
hdfs dfs -mkdir -p /bigdata/hive_test1/salgrade

Create the dept table

CREATE EXTERNAL TABLE IF NOT EXISTS dept (
  DEPTNO int,
  DNAME string,
  LOC string
) row format delimited fields terminated by ','
location '/hive_test/dept';

10,ACCOUNTING,NEW YORK
20,RESEARCH,DALLAS
30,SALES,CHICAGO
40,OPERATIONS,BOSTON

Create the emp table

CREATE EXTERNAL TABLE IF NOT EXISTS emp (
   EMPNO int,
   ENAME string,
   JOB string,
   MGR int,
   HIREDATE date,
   SAL int,
   COMM int,
   DEPTNO int
 ) row format delimited fields terminated by ','
 location '/hive_test/emp';
 
7369,SMITH,CLERK,7902,1980-12-17,800,null,20
7499,ALLEN,SALESMAN,7698,1981-02-20,1600,300,30
7521,WARD,SALESMAN,7698,1981-02-22,1250,500,30
7566,JONES,MANAGER,7839,1981-04-02,2975,null,20
7654,MARTIN,SALESMAN,7698,1981-09-28,1250,1400,30
7698,BLAKE,MANAGER,7839,1981-05-01,2850,null,30
7782,CLARK,MANAGER,7839,1981-06-09,2450,null,10
7788,SCOTT,ANALYST,7566,1987-07-13,3000,null,20
7839,KING,PRESIDENT,null,1981-11-17,5000,null,10
7844,TURNER,SALESMAN,7698,1981-09-08,1500,0,30
7876,ADAMS,CLERK,7788,1987-07-13,1100,null,20
7900,JAMES,CLERK,7698,1981-12-03,950,null,30
7902,FORD,ANALYST,7566,1981-12-03,3000,null,20
7934,MILLER,CLERK,7782,1982-01-23,1300,null,10

Create the salgrade table

CREATE EXTERNAL TABLE IF NOT EXISTS salgrade (
  GRADE int,
  LOSAL int,
  HISAL int
) row format delimited fields terminated by ','
location '/hive_test/salgrade';

1,700,1200
2,1201,1400
3,1401,2000
4,2001,3000
5,3001,9999
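A minimal sketch of the loading step for the three tables. It assumes the sample rows above were saved as dept.txt, emp.txt, and salgrade.txt under /usr/local/soft/data; those file names are hypothetical and do not come from the original notes.

```sql
-- Copy each local file into the table's LOCATION directory (file names assumed)
load data local inpath '/usr/local/soft/data/dept.txt' into table dept;
load data local inpath '/usr/local/soft/data/emp.txt' into table emp;
load data local inpath '/usr/local/soft/data/salgrade.txt' into table salgrade;

-- Quick check that the data lines up: each employee with their department
select e.ename, e.job, d.dname, d.loc
from emp e
join dept d on e.deptno = d.deptno;

-- The literal text "null" cannot be parsed as an int, so it is read as NULL
select ename from emp where comm is null;
```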

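1.5 Exporting Data from Hive

Backing up table data

Save query results to the local filesystem

# Create the directory for the exported data
mkdir -p /usr/local/soft/bigdata

-- Export the query results (written to Node01)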
insert overwrite local directory '/usr/local/soft/bigdata/person_data' select * from t_person;
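To inspect what was written, the Hive CLI can run local shell commands with a leading "!". This is a sketch only: the file name 000000_0 is a typical output name and is assumed here, so list the directory first.

```sql
-- List the exported files on the local filesystem
!ls /usr/local/soft/bigdata/person_data;

-- cat -A makes the default ^A (\001) field separator visible (file name assumed)
!cat -A /usr/local/soft/bigdata/person_data/000000_0;
```

Without a ROW FORMAT clause the fields are separated by the non-printing ^A character, which is why the columns look run together in a plain cat.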

Write data to the local filesystem in a specified format

# Create the directory for the exported data
mkdir -p /usr/local/soft/bigdata

-- Export the query results
insert overwrite local directory '/usr/local/soft/bigdata/hive_test1/person' 
ROW FORMAT DELIMITED fields terminated by ',' 
collection items terminated by '-' 
map keys terminated by ':' 
lines terminated by '\n' 
select * from t_person;

insert overwrite local directory '/usr/local/soft/bigdata/hive_test1/stu' 
ROW FORMAT DELIMITED fields terminated by ','  
lines terminated by '\n' 
select clazz,count(1) as count from students group by clazz;

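Write query results to HDFS

When the result of a SQL query is written to HDFS, the result files are placed directly in the specified directory.

# Create the directory for the exported data
hdfs dfs -mkdir -p /bigdata/hive_test1/copy

-- Export the query results
insert overwrite directory '/bigdata/copy2' ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' select * from students;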
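The exported files can be checked from the Hive shell with the built-in dfs command. This is a small sketch against the /bigdata/copy2 directory used in the statement above.

```sql
-- List the result files written by the export
dfs -ls /bigdata/copy2;

-- Print their contents; rows should be comma-separated as requested
dfs -cat /bigdata/copy2/*;
```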

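Copy the table's directory directly with HDFS commands

# Create the directory for the copied data
hdfs dfs -mkdir -p /bigdata/person

# Copy the files to another directory with an HDFS command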
hdfs dfs -cp /hive/warehouse/t_person/*  /bigdata/person

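Backing up the table structure and data together
Export data to HDFS

When data is exported to HDFS, the target directory receives a metadata file named _metadata and a directory named data that holds the actual data. The data files sit inside that directory.

# Create the directory for the exported data
hdfs dfs -mkdir -p /bigdata/copy

-- Export the table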
export table t_person to '/bigdata/copy';

Drop the table

drop table t_person;

Restore the table structure and data

import from '/bigdata/copy';

Note: if the clocks are out of sync, import and export will fail.
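As a hedged variation on the restore step, import can also target a different table name, which is handy for checking a backup without touching the original. The sketch assumes the export was written to /bigdata/copy as in the export statement above; the name t_person_restored is hypothetical.

```sql
-- The export directory should contain _metadata and the data directory
dfs -ls /bigdata/copy;

-- Restore under a new name, leaving any existing t_person untouched
import table t_person_restored from '/bigdata/copy';

-- Spot-check the restored rows
select * from t_person_restored limit 5;
```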

