表名和表结构数据:users
id | unionid |
---|---|
1 | 111111 |
2 | 222222 |
3 | 222222 |
4 | 111111 |
5 | 111111 |
6 | 111111 |
7 | 111111 |
8 | 222222 |
9 | 111111 |
10 | 111111 |
11 | 111111 |
12 | 333333 |
13 | 111111 |
方案一
删除 unionid
字段重复的数据,并保留 ID 最小的
DELETE FROM users
WHERE id NOT IN (
SELECT u.min_id
FROM (SELECT MIN(id) AS min_id FROM users GROUP BY unionid) AS u
);
删除后的结果
id | unionid |
---|---|
1 | 111111 |
2 | 222222 |
12 | 333333 |
实际测试说明:本机测试
表中数据条数:417936
重复的 unionid
条数:2188
重复的 unionid
条数总数:19179(14796x1 + 4x2 + 3x5 + 2180x2)
最终删除数据:16991
执行时间:1′12″6
方案二
表名和表结构数据:users
id | unionid |
---|---|
1 | 444444 |
2 | 222222 |
12 | 333333 |
14 | 111111 |
15 | 111111 |
16 | 333333 |
17 | 444444 |
执行删除命令
DELETE u1
FROM
users AS u1,users AS u2
WHERE u1.unionid=u2.unionid AND u1.id > u2.id;
删除后的数据
id | unionid |
---|---|
1 | 444444 |
2 | 222222 |
12 | 333333 |
14 | 111111 |
1、找出想保留数据的id,这一步是最关键的一步
# id 是自增的情况,很简单
SELECT MIN(id) AS min_id FROM users GROUP BY unionid;
# id 不是自增,如 UUID 时,就需要根据插入时间来排序了,别告诉我你这个也没有
SELECT min(created_at) time,openid FROM channel GROUP BY openid;
2、把其他的数据删除
附录:其他操作,随便说一下
- 查询重复数据(多字段)
select * from miniapp_user_activity_infos_temp a where (a.openid,a.map_id,a.created_at) IN
(
select openid,map_id,created_at from miniapp_user_activity_infos_temp
group by openid,map_id,created_at having count(*) > 1
)
- 查出数据(保留ID最小的一条)
select * from channel where id IN (select min(id) from channel GROUP BY openid);
- 看有多少条数据
select count(1) from channel where id In (select min(id) from channel GROUP BY openid);
附录:https://www.jb51.net/article/116677.htm