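In real-world applications you will quite often need to delete records that are duplicated on certain columns. I am writing down the approaches I can think of here, and I hope the experts will add more.
1. The implementation is as follows: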
Table Create Table
------------ ------------------------------
users_groups CREATE TABLE `users_groups` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`uid` int(11) NOT NULL,
`gid` int(11) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=15 DEFAULT CHARSET=utf8
Contents of users_groups.txt:
1,11,502
2,107,502
3,100,503
4,110,501
5,112,501
6,104,502
7,100,502
8,100,501
9,102,501
10,104,502
11,100,502
12,100,501
13,102,501
14,110,501
mysql> load data infile 'c:\users_groups.txt' into table users_groups fields
terminated by ',' lines terminated by '\n';
Query OK, 14 rows affected (0.05 sec)
Records: 14 Deleted: 0 Skipped: 0 Warnings: 0
mysql> select * from users_groups;
query result(14 records)
id | uid | gid |
1 | 11 | 502 |
2 | 107 | 502 |
3 | 100 | 503 |
4 | 110 | 501 |
5 | 112 | 501 |
6 | 104 | 502 |
7 | 100 | 502 |
8 | 100 | 501 |
9 | 102 | 501 |
10 | 104 | 502 |
11 | 100 | 502 |
12 | 100 | 501 |
13 | 102 | 501 |
14 | 110 | 501 |
14 rows in set (0.00 sec)
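Revised following a fellow reader's suggestion: copy one row per uid into a temporary table, empty the original table, then copy the rows back.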
mysql> create temporary table tmp_wrap select * from users_groups group by uid having count(1) >= 1;
Query OK, 7 rows affected (0.11 sec)
Records: 7 Duplicates: 0 Warnings: 0
mysql> truncate table users_groups;
Query OK, 14 rows affected (0.03 sec)
mysql> insert into users_groups select * from tmp_wrap;
Query OK, 7 rows affected (0.03 sec)
Records: 7 Duplicates: 0 Warnings: 0
mysql> select * from users_groups;
query result(7 records)
id | uid | gid |
1 | 11 | 502 |
2 | 107 | 502 |
3 | 100 | 503 |
4 | 110 | 501 |
5 | 112 | 501 |
6 | 104 | 502 |
9 | 102 | 501 |
mysql> drop table tmp_wrap;
Query OK, 0 rows affected (0.05 sec)
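One caveat with the session above: MySQL does not guarantee which row of each group `select * ... group by uid` returns, and it rejects the statement outright when the ONLY_FULL_GROUP_BY SQL mode is enabled. Here is a minimal sketch of the same temporary-table round trip that names the surviving row explicitly with MIN(id). It makes two assumptions that are mine, not the original session's: SQLite (through Python's built-in sqlite3 module) stands in for MySQL so the example can be run anywhere, and the helper name dedupe_via_temp_table is hypothetical.

```python
import sqlite3

# Sample rows (id, uid, gid) copied from users_groups.txt.
ROWS = [
    (1, 11, 502), (2, 107, 502), (3, 100, 503), (4, 110, 501),
    (5, 112, 501), (6, 104, 502), (7, 100, 502), (8, 100, 501),
    (9, 102, 501), (10, 104, 502), (11, 100, 502), (12, 100, 501),
    (13, 102, 501), (14, 110, 501),
]


def dedupe_via_temp_table(rows):
    """Keep the lowest-id row per uid by round-tripping through a temp table."""
    conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL (assumption)
    conn.execute(
        "CREATE TABLE users_groups ("
        "id INTEGER PRIMARY KEY, uid INTEGER NOT NULL, gid INTEGER NOT NULL)"
    )
    conn.executemany("INSERT INTO users_groups VALUES (?, ?, ?)", rows)
    # Pick the survivor explicitly with MIN(id) instead of SELECT * ... GROUP BY.
    conn.execute(
        "CREATE TEMPORARY TABLE tmp_wrap AS "
        "SELECT * FROM users_groups "
        "WHERE id IN (SELECT MIN(id) FROM users_groups GROUP BY uid)"
    )
    conn.execute("DELETE FROM users_groups")  # SQLite has no TRUNCATE TABLE
    conn.execute("INSERT INTO users_groups SELECT * FROM tmp_wrap")
    conn.execute("DROP TABLE tmp_wrap")
    conn.commit()
    result = conn.execute(
        "SELECT id, uid, gid FROM users_groups ORDER BY id"
    ).fetchall()
    conn.close()
    return result


kept = dedupe_via_temp_table(ROWS)
for row in kept:
    print(row)
```

In MySQL itself the same idea would keep `truncate table` and only change the SELECT that fills tmp_wrap.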
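2. There is also a much more concise approach.
Find the duplicated rows and delete all of them except the one with the smallest id.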
delete a from users_groups as a,
(
select uid, min(id) as min_id from users_groups group by uid having count(1) > 1
) as b
where a.uid = b.uid and a.id > b.min_id;
(7 row(s)affected)
(0 ms taken)
query result(7 records)
id | uid | gid |
1 | 11 | 502 |
2 | 107 | 502 |
3 | 100 | 503 |
4 | 110 | 501 |
5 | 112 | 501 |
6 | 104 | 502 |
9 | 102 | 501 |
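The rule "keep the smallest id per uid, delete the rest" can also be checked outside MySQL. The sketch below is only an illustration under my own assumptions: it uses SQLite through Python's sqlite3 module rather than MySQL, and the function name delete_duplicates is hypothetical. SQLite has no multi-table DELETE, so the rule is written as a NOT IN subquery. MySQL would refuse this exact form with error 1093, because the subquery reads the table being deleted from, which is why a derived table is used there.

```python
import sqlite3

# Sample rows (id, uid, gid) copied from users_groups.txt.
ROWS = [
    (1, 11, 502), (2, 107, 502), (3, 100, 503), (4, 110, 501),
    (5, 112, 501), (6, 104, 502), (7, 100, 502), (8, 100, 501),
    (9, 102, 501), (10, 104, 502), (11, 100, 502), (12, 100, 501),
    (13, 102, 501), (14, 110, 501),
]


def delete_duplicates(conn):
    """Delete every row whose id is not the smallest id for its uid.

    Returns the number of rows deleted.
    """
    cur = conn.execute(
        "DELETE FROM users_groups "
        "WHERE id NOT IN (SELECT MIN(id) FROM users_groups GROUP BY uid)"
    )
    conn.commit()
    return cur.rowcount


conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL (assumption)
conn.execute(
    "CREATE TABLE users_groups ("
    "id INTEGER PRIMARY KEY, uid INTEGER NOT NULL, gid INTEGER NOT NULL)"
)
conn.executemany("INSERT INTO users_groups VALUES (?, ?, ?)", ROWS)

deleted = delete_duplicates(conn)
remaining_ids = [
    row[0] for row in conn.execute("SELECT id FROM users_groups ORDER BY id")
]
print(deleted, remaining_ids)
```

Running it a second time deletes nothing, since every uid then has exactly one row.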
3. Now let's compare the efficiency of the two approaches.
Run the following SQL statements:
create index f_uid on users_groups(uid);
explain select * from users_groups group by uid having count(1) > 1 union all
select * from users_groups group by uid having count(1) = 1;
explain select * from users_groups as a,
(
select *,min(id) from users_groups group by uid having count(1) > 1
) as b
where a.uid = b.uid and a.id > b.id;
query result(3 records)
id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
1 | PRIMARY | users_groups | index | (NULL) | f_uid | 4 | (NULL) | 14 | |
2 | UNION | users_groups | index | (NULL) | f_uid | 4 | (NULL) | 14 | |
(NULL) | UNION RESULT | <union1,2> | ALL | (NULL) | (NULL) | (NULL) | (NULL) | (NULL) |
query result(3 records)
id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
1 | PRIMARY | <derived2> | ALL | (NULL) | (NULL) | (NULL) | (NULL) | 4 | |
1 | PRIMARY | a | ref | PRIMARY,f_uid | f_uid | 4 | b.uid | 1 | Using where |
2 | DERIVED | users_groups | index | (NULL) | f_uid | 4 | (NULL) | 14 |
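Clearly, the second statement scans fewer rows than the first: the first plan walks the f_uid index twice (an estimated 14 rows for each branch of the UNION), while the second walks it once to build the derived table and then joins that table's 4 rows back through ref lookups on f_uid.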
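The 4 rows that EXPLAIN reports for <derived2> appear to match the number of uid values that occur more than once in the sample data. A small sketch to cross-check that count follows. As assumptions of mine, it runs on SQLite via Python's sqlite3 module instead of MySQL, and the helper name duplicated_uids is hypothetical.

```python
import sqlite3

# Sample rows (id, uid, gid) copied from users_groups.txt.
ROWS = [
    (1, 11, 502), (2, 107, 502), (3, 100, 503), (4, 110, 501),
    (5, 112, 501), (6, 104, 502), (7, 100, 502), (8, 100, 501),
    (9, 102, 501), (10, 104, 502), (11, 100, 502), (12, 100, 501),
    (13, 102, 501), (14, 110, 501),
]


def duplicated_uids(rows):
    """Return (uid, occurrences) for every uid that appears more than once."""
    conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL (assumption)
    conn.execute(
        "CREATE TABLE users_groups ("
        "id INTEGER PRIMARY KEY, uid INTEGER NOT NULL, gid INTEGER NOT NULL)"
    )
    conn.executemany("INSERT INTO users_groups VALUES (?, ?, ?)", rows)
    result = conn.execute(
        "SELECT uid, COUNT(*) FROM users_groups "
        "GROUP BY uid HAVING COUNT(*) > 1 ORDER BY uid"
    ).fetchall()
    conn.close()
    return result


dups = duplicated_uids(ROWS)
# Rows a de-duplication pass has to remove: all but one row per duplicated uid.
surplus = sum(count - 1 for _, count in dups)
print(dups, surplus)
```

The surplus figure is the number of rows a de-duplication pass should delete, so it can be compared against the affected-row count reported by the DELETE.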