partition timestamp(0) not use partition

miguelmin發表於2009-05-28

Tearadata前輩的問題,記下來:

The definition statement is as follows:

CREATE MULTISET TABLE PD_DATA_Z.NET_SETT_TICKET_DAILY
(
Calling_Nbr VARCHAR(21) NOT NULL,
Start_Time TIMESTAMP(0) NOT NULL,
...
)
PRIMARY INDEX XIE1NET_SETT_TICKET_DAILY ( Calling_Nbr )
PARTITION BY RANGE_N(CAST((Start_Time ) AS DATE FORMAT 'YYYYMMDD') BETWEEN DATE '1990-01-01' AND DATE '2030-12-31' EACH INTERVAL '1' DAY );

[@more@]

Case 1:
explain select *
FROM PD_DATA_Z.NET_SETT_TICKET_DAILY
WHERE Start_Time = '2009-03-20 00:00:00';

*** Help information returned. 19 rows.
*** Total elapsed time was 1 second.

Explanation
---------------------------------------------------------------------------
1) First, we lock a distinct PD_DATA_Z."pseudo table" for read on a
RowHash to prevent global deadlock for
PD_DATA_Z.NET_SETT_TICKET_DAILY.
2) Next, we lock PD_DATA_Z.NET_SETT_TICKET_DAILY for read.
3) We do an all-AMPs RETRIEVE step from a single partition of
PD_DATA_Z.NET_SETT_TICKET_DAILY with a condition of (
"PD_DATA_Z.NET_SETT_TICKET_DAILY.Start_Time = TIMESTAMP
'2009-03-20 00:00:00'") with a residual condition of (
"PD_DATA_Z.NET_SETT_TICKET_DAILY.Start_Time = TIMESTAMP
'2009-03-20 00:00:00'") into Spool 1 (all_amps), which is built
locally on the AMPs. The result spool file will not be cached in
memory. The size of Spool 1 is estimated with low confidence to
be 1,127,057,976 rows. The estimated time for this step is 55
minutes and 24 seconds.
4) Finally, we send out an END TRANSACTION step to all AMPs involved
in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of
statement 1. The total estimated time is 55 minutes and 24
seconds.

Case 2:
explain select *
FROM PD_DATA_Z.NET_SETT_TICKET_DAILY
WHERE Start_Time <= '2009-03-20 00:00:00';

*** Help information returned. 17 rows.
*** Total elapsed time was 1 second.

Explanation
---------------------------------------------------------------------------
1) First, we lock a distinct PD_DATA_Z."pseudo table" for read on a
RowHash to prevent global deadlock for
PD_DATA_Z.NET_SETT_TICKET_DAILY.
2) Next, we lock PD_DATA_Z.NET_SETT_TICKET_DAILY for read.
3) We do an all-AMPs RETRIEVE step from
PD_DATA_Z.NET_SETT_TICKET_DAILY by way of an all-rows scan with a
condition of ("PD_DATA_Z.NET_SETT_TICKET_DAILY.Start_Time <=
TIMESTAMP '2009-03-20 00:00:00'") into Spool 1 (group_amps), which
is built locally on the AMPs. The input table will not be cached
in memory, but it is eligible for synchronized scanning. The
result spool file will not be cached in memory. The size of Spool
1 is estimated with no confidence to be 375,685,992 rows. The
estimated time for this step is 25 minutes and 8 seconds.
4) Finally, we send out an END TRANSACTION step to all AMPs involved
in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of
statement 1. The total estimated time is 25 minutes and 8 seconds.

Case 3:
explain select *
FROM PD_DATA_Z.NET_SETT_TICKET_DAILY
WHERE cast(Start_Time as date format 'yyyymmdd') = '20090320';

*** Help information returned. 18 rows.
*** Total elapsed time was 1 second.

Explanation
---------------------------------------------------------------------------
1) First, we lock a distinct PD_DATA_Z."pseudo table" for read on a
RowHash to prevent global deadlock for
PD_DATA_Z.NET_SETT_TICKET_DAILY.
2) Next, we lock PD_DATA_Z.NET_SETT_TICKET_DAILY for read.
3) We do an all-AMPs RETRIEVE step from
PD_DATA_Z.NET_SETT_TICKET_DAILY by way of an all-rows scan with a
condition of ("(CAST((PD_DATA_Z.NET_SETT_TICKET_DAILY.Start_Time)
AS DATE))= DATE '2009-03-20'") into Spool 1 (group_amps), which is
built locally on the AMPs. The input table will not be cached in
memory, but it is eligible for synchronized scanning. The result
spool file will not be cached in memory. The size of Spool 1 is
estimated with no confidence to be 112,705,798 rows. The
estimated time for this step is 12 minutes and 12 seconds.
4) Finally, we send out an END TRANSACTION step to all AMPs involved
in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of
statement 1. The total estimated time is 12 minutes and 12
seconds.


Case 4:
explain select *
FROM PD_DATA_Z.NET_SETT_TICKET_DAILY
WHERE cast(Start_Time as date format 'yyyymmdd') <= '20090320';

*** Help information returned. 17 rows.
*** Total elapsed time was 1 second.

Explanation
-------------------------------------------------------------------------
1) First, we lock a distinct PD_DATA_Z."pseudo table" for read on a
RowHash to prevent global deadlock for
PD_DATA_Z.NET_SETT_TICKET_DAILY.
2) Next, we lock PD_DATA_Z.NET_SETT_TICKET_DAILY for read.
3) We do an all-AMPs RETRIEVE step from
PD_DATA_Z.NET_SETT_TICKET_DAILY by way of an all-rows scan with a
condition of ("(CAST((PD_DATA_Z.NET_SETT_TICKET_DAILY.Start_Time)
AS DATE))<= DATE '2009-03-20'") into Spool 1 (group_amps), which
is built locally on the AMPs. The input table will not be cached
in memory, but it is eligible for synchronized scanning. The
result spool file will not be cached in memory. The size of Spool
1 is estimated with no confidence to be 375,685,992 rows. The
estimated time for this step is 25 minutes and 8 seconds.
4) Finally, we send out an END TRANSACTION step to all AMPs involved
in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of
statement 1. The total estimated time is 25 minutes and 8 seconds.

All these query should use partition elimination, but only Case 1 did.
------------------------------------------------------------------------------------
NTA 1589 describes this issue. This is due to users having potential different timezone differentials into the same data and potentially not selecting data correctly by spanning partitions.

來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/16723161/viewspace-1022548/,如需轉載,請註明出處,否則將追究法律責任。

相關文章