Comparing equi-join performance: Hive vs. Oracle

Posted by YallonKing on 2012-03-31
/*
Test summary:
Measure equi-join performance on 100,000 rows in Oracle and on 65,535 rows in Hive, then compare the results.
In Hive, the test covers both row-oriented and column-oriented (RCFile) storage layouts.
Test environment:
Oracle: physical server + Oracle Enterprise Linux 6.0 + Oracle 11g R2
Hive: ordinary desktop PC + VMware 8.0 + Oracle Enterprise Linux 6.0 + Hadoop 0.20.2 + Hive 0.7.1
*/

-- Appendix: the detailed test procedure is as follows:
--------------------------Operations in Oracle--------------------------
-- Create test table 1
SQL> create table hive1 (id number,name varchar2(20));
Table created.
SQL> begin
  2  for i in 1..100000
  3  loop
  4  insert into hive1 values(i,'hive1');
  5  commit;
  6  end loop;
  7  end;
  8   /
PL/SQL procedure successfully completed.
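Incidentally, the loop above commits once per inserted row. As a sketch only (not part of the original test), the same load with a single commit after the loop avoids 100,000 separate log-sync waits and is typically much faster for a bulk insert of this size:

```sql
-- Sketch: identical data load, but commit once at the end instead of per row.
begin
  for i in 1..100000 loop
    insert into hive1 values (i, 'hive1');
  end loop;
  commit;  -- single commit after the loop finishes
end;
/
```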
-- Create test table 2
SQL> create table hive2 (id number,name varchar2(20));
Table created.
Elapsed: 00:00:00.05
SQL> begin
  2  for i in 1..100000
  3  loop
  4  insert into hive2 values(i,'hive2');
  5  commit;
  6  end loop;
  7  end;
  8  /
PL/SQL procedure successfully completed.
-- Test equi-join performance on the two tables
SQL> select a.*,b.name as name2 from hive1 a,hive2 b where a.id=b.id;
100000 rows selected.
Elapsed: 00:00:00.98
Execution Plan
----------------------------------------------------------
Plan hash value: 2251506592

------------------------------------------------------------------------------------
| Id  | Operation          | Name  | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |       |   102K|  5002K|       |   456   (1)| 00:00:06 |
|*  1 |  HASH JOIN         |       |   102K|  5002K|  3704K|   456   (1)| 00:00:06 |
|   2 |   TABLE ACCESS FULL| HIVE2 |   102K|  2501K|       |    45   (3)| 00:00:01 |
|   3 |   TABLE ACCESS FULL| HIVE1 |   105K|  2587K|       |    45   (3)| 00:00:01 |
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("A"."ID"="B"."ID")
Note
-----
   - dynamic sampling used for this statement (level=2)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       7149  consistent gets
          0  physical reads
          0  redo size
    1836054  bytes sent via SQL*Net to client
      73742  bytes received via SQL*Net from client
       6668  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
     100000  rows processed
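The Note section above shows the optimizer fell back on dynamic sampling because neither table had statistics (hence the approximate 102K/105K row estimates). A natural follow-up step, not performed in the original test, would be to gather statistics with the standard DBMS_STATS package so the cardinality estimates become exact:

```sql
-- Sketch: gather optimizer statistics on both test tables so subsequent
-- plans no longer need dynamic sampling.
exec dbms_stats.gather_table_stats(user, 'HIVE1');
exec dbms_stats.gather_table_stats(user, 'HIVE2');
```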
--------------------------Operations in Hive--------------------------
-- Create row-storage test table test_hive2 [test table test_hive3 is built the same way]
hive> create table test_hive2 (id int,id2 int,name string)
    > row format delimited
    > fields terminated by '\t';
OK
Time taken: 0.055 seconds
hive> load data local inpath '/root/tmp001.txt'
    > overwrite into table test_hive2;
Copying data from file:/root/tmp001.txt
Copying file: file:/root/tmp001.txt
Loading data to table default.test_hive2
Deleted file:/user/hive/warehouse/test_hive2
OK
Time taken: 0.342 seconds
-- Test full-table-scan performance
hive> select count(*) from test_hive2;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
Execution log at: /tmp/root/root_20120331160606_9f242fd6-e321-4413-92a7-b4d92207f14e.log
Job running in-process (local Hadoop)
2012-03-31 16:06:10,931 null map = 100%,  reduce = 100%
Ended Job = job_local_0001
OK
65535
Time taken: 2.36 seconds
-- Test equi-join performance of the two row-storage tables
hive> select test_hive2.*,test_hive3.name from test_hive2 join test_hive3 on ( test_hive2.id=test_hive3.id);
Time taken: 27.749 seconds
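One tuning option worth noting, though not tried in the original test: for tables this small, Hive's MAPJOIN hint loads the smaller table into memory on the map side and skips the shuffle/reduce phase entirely, which usually cuts join time sharply. A hedged sketch against the same tables:

```sql
-- Sketch: map-side join; test_hive3 is small enough to fit in memory,
-- so the reduce stage of the join can be avoided.
select /*+ MAPJOIN(test_hive3) */ test_hive2.*, test_hive3.name
from test_hive2 join test_hive3
on (test_hive2.id = test_hive3.id);
```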
-- The column-storage two-table equi-join test follows
[root@oraking ~]# hive
Hive history file=/tmp/root/hive_job_log_root_201203311553_471153629.txt
-- Create column-storage test table test_hive
hive> create table test_hive (id int,id2 int,name string)
    > row format serde 'org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe'
    > STORED AS RCFILE;
OK
Time taken: 2.944 seconds
-- Insert data
hive> insert overwrite table test_hive
    > select id,id2,name               
    > from test_hive2;
Total MapReduce jobs = 2
Launching Job 1 out of 2
Number of reduce tasks is set to 0 since there's no reduce operator
Execution log at: /tmp/root/root_20120331164444_44ac1bdb-f603-45e2-b50a-4681e69d330f.log
Job running in-process (local Hadoop)
2012-03-31 16:44:31,709 null map = 100%,  reduce = 0%
Ended Job = job_local_0001
Ended Job = 171641075, job is filtered out (removed at runtime).
Moving data to: file:/tmp/hive-root/hive_2012-03-31_16-44-28_048_8584030323247636747/-ext-10000
Loading data to table default.test_hive
Deleted file:/user/hive/warehouse/test_hive
Table default.test_hive stats: [num_partitions: 0, num_files: 1, num_rows: 65535, total_size: 961048]
OK
Time taken: 3.946 seconds
-- Create column-storage test table test_hive4
hive> create table test_hive4 (id int,id2 int,name string)
    > row format serde 'org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe'
    > STORED AS RCFILE;
OK
Time taken: 0.019 seconds
-- Insert data
hive> insert overwrite table test_hive4
    > select id,id2,name
    > from test_hive2;
Total MapReduce jobs = 2
Launching Job 1 out of 2
Number of reduce tasks is set to 0 since there's no reduce operator
Execution log at: /tmp/root/root_20120331165555_13476cce-d845-4277-9505-38e6441e8e77.log
Job running in-process (local Hadoop)
2012-03-31 16:55:11,786 null map = 100%,  reduce = 0%
Ended Job = job_local_0001
Ended Job = -1613004717, job is filtered out (removed at runtime).
Moving data to: file:/tmp/hive-root/hive_2012-03-31_16-55-09_157_510206187744689585/-ext-10000
Loading data to table default.test_hive4
Deleted file:/user/hive/warehouse/test_hive4
Table default.test_hive4 stats: [num_partitions: 0, num_files: 1, num_rows: 65535, total_size: 961048]
OK
Time taken: 2.863 seconds
-- Test column-storage full-table-scan performance
hive> select count(*) from test_hive;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
Execution log at: /tmp/root/root_20120331164848_c82b81f6-ee21-48fa-bbb8-c5a52b5c01bb.log
Job running in-process (local Hadoop)
2012-03-31 16:48:45,111 null map = 100%,  reduce = 100%
Ended Job = job_local_0001
OK
65535
Time taken: 2.357 seconds
hive> select count(*) from test_hive4;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
Execution log at: /tmp/root/root_20120331165555_0e89d023-1969-4272-8ab5-a68c48db7073.log
Job running in-process (local Hadoop)
2012-03-31 16:55:45,199 null map = 100%,  reduce = 100%
Ended Job = job_local_0001
OK
65535
Time taken: 2.377 seconds
-- Test equi-join performance of the two column-storage tables
hive> select test_hive.*,test_hive4.name from test_hive join test_hive4 on (test_hive.id=test_hive4.id);   
Time taken: 28.306 seconds
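Column storage mainly pays off when a query reads only a few columns; `select test_hive.*` forces the RCFile reader to materialize every column, so the columnar layout gains little here. As a hedged illustration (not part of the original test), projecting only the join key and one name column would let the columnar reader skip the remaining columns:

```sql
-- Sketch: narrow projection so RCFile only has to decode two column groups.
select test_hive.id, test_hive4.name
from test_hive join test_hive4
on (test_hive.id = test_hive4.id);
```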
/*
Summary: overall, Oracle was clearly faster than Hive in this test, regardless of whether Hive used row or column storage.
Note: the Oracle test joined two tables of 100,000 rows each, while the Hive test joined two tables of 65,535 rows each.
Preliminary analysis:
1. Hive ran inside a virtual machine on an ordinary desktop PC while Oracle ran on a physical server, so the underlying hardware alone may account for much of the gap;
2. Hive's execution model adds overhead of its own, since every query is compiled into map and reduce stages before a result set is produced;
3. The author's limited experience with the newer technology may have left tuning opportunities unused;
4. Other uncertain factors.
*/
 

From the ITPUB blog. Link: http://blog.itpub.net/26143577/viewspace-720093/. Please credit the source when republishing; unauthorized reproduction may incur legal liability.