Oracle: Batch-Deleting Duplicate Rows from a Table Without a Primary Key

Posted by DBA_H on 2020-02-20

1. Requirements

About the TEST table:

  • It is a partitioned table, partitioned by month
  • It has no primary key or unique index
  • It has four columns: COL1, COL2, COL3, INSERTTIME
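
For readers who want to try the steps on a sandbox, a table matching this description might look like the sketch below. The column data types and the partitions other than P201903 are assumptions made for illustration; the original post does not give the DDL.

```sql
-- Hypothetical definition: column types and the neighbouring
-- partitions are assumptions, only P201903 is named in the post.
CREATE TABLE TEST (
    COL1       VARCHAR2(50),
    COL2       VARCHAR2(50),
    COL3       VARCHAR2(50),
    INSERTTIME DATE
)
PARTITION BY RANGE (INSERTTIME) (
    PARTITION P201902 VALUES LESS THAN (DATE '2019-03-01'),
    PARTITION P201903 VALUES LESS THAN (DATE '2019-04-01'),
    PARTITION P201904 VALUES LESS THAN (DATE '2019-05-01')
);
```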

The task is to delete the duplicate rows that exist for 31 March 2019.

2. Solution

2.1 Count the distinct records (the row count expected after de-duplication)

SELECT COUNT(1) FROM (
    SELECT COL1,COL2,COL3,INSERTTIME FROM TEST PARTITION(P201903) A 
        WHERE INSERTTIME >= DATE'2019-03-31' AND INSERTTIME < DATE'2019-04-01'
        GROUP BY COL1,COL2,COL3,INSERTTIME
);
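
Before going further it can help to see how the copies are distributed, since that decides whether one delete pass will be enough. The query below is a sketch over the same partition and date range; the column aliases are illustrative names, not part of the original post.

```sql
-- DISTINCT_GROUPS: rows that should remain after de-duplication
-- SURPLUS_ROWS:    rows that must go in total
-- MAX_COPIES:      largest number of copies of any single row
SELECT COUNT(*)            AS DISTINCT_GROUPS,
       SUM(CNT)            AS TOTAL_ROWS,
       SUM(CNT) - COUNT(*) AS SURPLUS_ROWS,
       MAX(CNT)            AS MAX_COPIES
  FROM (
        SELECT COUNT(*) AS CNT
          FROM TEST PARTITION(P201903)
         WHERE INSERTTIME >= DATE'2019-03-31' AND INSERTTIME < DATE'2019-04-01'
         GROUP BY COL1,COL2,COL3,INSERTTIME
       );
```

If MAX_COPIES is 2 or less, deleting one row per duplicated group removes every surplus row in a single pass.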

2.2 Extract the data to be processed

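Because the source table TEST (aliased as A) is very large, the rows to be processed are copied into a separate new table, together with their original ROWIDs.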

CREATE TABLE TEST_TMP NOLOGGING AS
SELECT /*+ PARALLEL(A 8) */ A.*,A.ROWID ROWID_OLD FROM TEST PARTITION(P201903) A 
    WHERE INSERTTIME >= DATE'2019-03-31' AND INSERTTIME < DATE'2019-04-01';

2.3 Identify the rows to delete

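In theory, if every row for that day exists exactly twice, the number of rows to delete equals the number of rows to keep. More generally, as long as no group has more than two copies, the number of rows to keep should match the count from 2.1. Only one row per duplicated group (the one with the smallest ROWID) is selected, so groups with three or more copies need additional passes.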

-- Number of rows to delete
SELECT COUNT(1) FROM TEST PARTITION(P201903) A WHERE ROWID IN (
    SELECT MIN(ROWID_OLD) ROWID_OLD FROM TEST_TMP 
    WHERE INSERTTIME >= DATE'2019-03-31' AND INSERTTIME < DATE'2019-04-01' 
    GROUP BY COL1,COL2,COL3,INSERTTIME 
    HAVING COUNT(1) > 1)
AND INSERTTIME >= DATE'2019-03-31' AND INSERTTIME < DATE'2019-04-01';
-- Number of rows to keep
SELECT COUNT(1) FROM TEST PARTITION(P201903) A WHERE ROWID NOT IN (
    SELECT MIN(ROWID_OLD) ROWID_OLD FROM TEST_TMP 
    WHERE INSERTTIME >= DATE'2019-03-31' AND INSERTTIME < DATE'2019-04-01' 
    GROUP BY COL1,COL2,COL3,INSERTTIME 
    HAVING COUNT(1) > 1)
AND INSERTTIME >= DATE'2019-03-31' AND INSERTTIME < DATE'2019-04-01';
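
The queries above pick a single ROWID per duplicated group. As a possible alternative (not the method used in the original post), the analytic function ROW_NUMBER() can mark every copy after the first, which would cover groups with three or more copies in one pass. A minimal sketch against TEST_TMP, where RN is an illustrative alias:

```sql
-- Count every surplus copy: within each group of identical rows,
-- the row with the smallest ROWID_OLD gets RN = 1 and is kept,
-- all later rows (RN > 1) are candidates for deletion.
SELECT COUNT(*) AS SURPLUS_ROWS
  FROM (
        SELECT ROWID_OLD,
               ROW_NUMBER() OVER (
                   PARTITION BY COL1,COL2,COL3,INSERTTIME
                   ORDER BY ROWID_OLD
               ) AS RN
          FROM TEST_TMP
         WHERE INSERTTIME >= DATE'2019-03-31' AND INSERTTIME < DATE'2019-04-01'
       )
 WHERE RN > 1;
```

Note that this variant keeps the row with the smallest ROWID, whereas the MIN(ROWID_OLD) approach deletes it; for fully identical rows the choice makes no difference to the resulting data.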

2.4 Delete the duplicates with batched commits

DECLARE
    TYPE ROWID_LIST IS TABLE OF UROWID INDEX BY BINARY_INTEGER;
    ROWID_INFOS ROWID_LIST;
    -- One ROWID (the smallest) from every group that has more than one copy
    CURSOR C_ROWIDS IS
        SELECT MIN(ROWID_OLD) ROWID_OLD
          FROM TEST_TMP
         WHERE INSERTTIME >= DATE'2019-03-31' AND INSERTTIME < DATE'2019-04-01'
         GROUP BY COL1,COL2,COL3,INSERTTIME
        HAVING COUNT(1) > 1;
BEGIN
    OPEN C_ROWIDS;
    LOOP
        -- The LIMIT value is the number of rows per batch (per commit); adjust it as needed
        FETCH C_ROWIDS BULK COLLECT INTO ROWID_INFOS LIMIT 10000;
        -- The DELETE below is the statement actually executed for each batch
        FORALL I IN 1..ROWID_INFOS.COUNT
            DELETE FROM TEST WHERE ROWID = ROWID_INFOS(I);
        COMMIT;
        -- A batch smaller than the LIMIT means the cursor is exhausted
        EXIT WHEN ROWID_INFOS.COUNT < 10000;
    END LOOP;
    CLOSE C_ROWIDS;
END;
/

2.5 Confirm that no duplicates remain

SELECT * FROM (
    SELECT COL1,COL2,COL3,INSERTTIME FROM TEST PARTITION(P201903) A 
        WHERE INSERTTIME >= DATE'2019-03-31' AND INSERTTIME < DATE'2019-04-01'
        GROUP BY COL1,COL2,COL3,INSERTTIME
        HAVING COUNT(1)>1
);
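
Once the check above returns no rows, the working table is no longer needed. A possible cleanup step, which is an addition and not part of the original post:

```sql
-- Run only after the verification query returns no rows.
-- PURGE skips the recycle bin, so the space is released immediately.
DROP TABLE TEST_TMP PURGE;
```

If the check still returns rows (for example because some groups had three or more copies), TEST_TMP would need to be rebuilt from TEST before another pass, since its stored ROWIDs would otherwise still include rows that have already been deleted.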

Source: "ITPUB Blog", link: http://blog.itpub.net/69923980/viewspace-2676483/. Please credit the source when reposting; otherwise legal liability will be pursued.
