SQL 語法 SELECT Duplicate Row (Column)

如何找到重覆的資料 (Duplicate Row , Column or Record)

測試環境為 CentOS 8 x86_64 (虛擬機)

先建立一個測試用資料庫 (testdb) , 與 tables (employee) 格式為 K1 int(11) – auto_increment & PRIMARY KEY , Name char(20), Dept char(20), jobTitle char(20) ,email char(30) 以及 Salary int(11).

[root@localhost ~]# mysql -u root -p
Enter password: 
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 9
Server version: 10.3.11-MariaDB MariaDB Server

Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

將資料庫字元編碼 ( Character Sets ) 設定為 utf8 (8-bit Unicode Transformation Format) 與 文字排序 ( Collations ) 設定為utf8_general_ci .

MariaDB [(none)]> CREATE DATABASE testdb DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci;
Query OK, 1 row affected (0.001 sec)

MariaDB [(none)]> USE testdb;
Database changed
MariaDB [testdb]> CREATE TABLE employee (K1 int(11) NOT NULL auto_increment, Name char(20) NOT NULL, Dept char(20) NOT NULL, JobTitle char(20) NOT NULL, Salary int(11) NOT NULL , email char(30) NOT NULL, PRIMARY KEY (K1)) ;
Query OK, 0 rows affected (0.010 sec)

MariaDB [testdb]> DESCRIBE employee; 
+----------+----------+------+-----+---------+----------------+
| Field    | Type     | Null | Key | Default | Extra          |
+----------+----------+------+-----+---------+----------------+
| K1       | int(11)  | NO   | PRI | NULL    | auto_increment |
| Name     | char(20) | NO   |     | NULL    |                |
| Dept     | char(20) | NO   |     | NULL    |                |
| JobTitle | char(20) | NO   |     | NULL    |                |
| Salary   | int(11)  | NO   |     | NULL    |                |
| email    | char(30) | NO   |     | NULL    |                |
+----------+----------+------+-----+---------+----------------+
6 rows in set (0.004 sec)

透過 INSERT 新增加了五筆資料.

MariaDB [testdb]> INSERT INTO employee (Name , Dept , JobTitle , Salary , email) VALUES ('Ben' , 'Testing' , 'Engineer' , '45000' , 'ben@benjr.tw') , ('Afa' , 'Power' , 'Engineer' , '48000' , 'Afa@benjr.tw') , ('Boss' ,'Testing' , 'Manager' , '75000' , 'Boss@benjr.tw') , ('Ben' , 'Testing' , 'Engineer' , '45000' , 'ben@benjr.tw') ,  ('Ben' , 'PM' , 'Engineer' , '85000' , 'ben10@benjr.tw');
Query OK, 5 rows affected (0.002 sec)
Records: 5  Duplicates: 0  Warnings: 0

MariaDB [testdb]> SELECT * FROM employee;
+----+------+---------+----------+--------+----------------+
| K1 | Name | Dept    | JobTitle | Salary | email          |
+----+------+---------+----------+--------+----------------+
|  1 | Ben  | Testing | Engineer |  45000 | ben@benjr.tw   |
|  2 | Afa  | Power   | Engineer |  48000 | Afa@benjr.tw   |
|  3 | Boss | Testing | Manager  |  75000 | Boss@benjr.tw  |
|  4 | Ben  | Testing | Engineer |  45000 | ben@benjr.tw   |
|  5 | Ben  | PM      | Engineer |  85000 | ben10@benjr.tw |
+----+------+---------+----------+--------+----------------+
5 rows in set (0.001 sec)

COUNT() 函數

COUNT 函數會回傳 SELECT 的筆數,通常搭配 GROUP BY 使用,這樣就可以查出哪些資料是重覆的.
下面會依據 Name 來查詢哪些資料是重覆的.

MariaDB [testdb]> SELECT Name , email, COUNT(Name) FROM employee GROUP BY Name HAVING COUNT(name) > 1;
+------+--------------+-------------+
| Name | email        | COUNT(Name) |
+------+--------------+-------------+
| Ben  | ben@benjr.tw |           3 |
+------+--------------+-------------+
1 row in set (0.001 sec)

下面除了 Name 之外還考慮了 email 是否一樣重覆了.

MariaDB [testdb]> SELECT Name , email, COUNT(Name) FROM employee GROUP BY Name , email HAVING COUNT(name) > 1 AND COUNT(email) > 1;
+------+--------------+-------------+
| Name | email        | COUNT(Name) |
+------+--------------+-------------+
| Ben  | ben@benjr.tw |           2 |
+------+--------------+-------------+
1 row in set (0.001 sec)

雖然知道哪一個 Name 與 email 是重覆的,我們還需要知道他們的 K1 (Primary Key) 值是多少?需搭配 GROUP_CONCAT 函數,可以把同 Group 的欄位文字整合在一起.

MariaDB [testdb]> SELECT GROUP_CONCAT(K1) , Name , email, COUNT(Name) FROM employee GROUP BY Name , email HAVING COUNT(name) > 1 AND COUNT(email) > 1;
+------------------+------+--------------+-------------+
| GROUP_CONCAT(K1) | Name | email        | COUNT(Name) |
+------------------+------+--------------+-------------+
| 1,4              | Ben  | ben@benjr.tw |           2 |
+------------------+------+--------------+-------------+
1 row in set (0.001 sec)

SELECT INNER JOIN (SELECT)

這個方式是先搜尋特定有重覆的欄位,再透過 Inner Join 顯示這些有重覆的的資料.

MariaDB [testdb]> SELECT K1 , employee.Name , email FROM employee INNER JOIN (SELECT Name FROM employee GROUP BY Name HAVING COUNT(name) > 1) dup ON employee.Name = dup.Name;
+----+------+----------------+
| K1 | Name | email          |
+----+------+----------------+
|  1 | Ben  | ben@benjr.tw   |
|  4 | Ben  | ben@benjr.tw   |
|  5 | Ben  | ben10@benjr.tw |
+----+------+----------------+
3 rows in set (0.001 sec)

下面除了 Name 之外還考慮了 email 是否一樣重覆了.

MariaDB [testdb]> SELECT K1 , employee.Name , employee.email FROM employee INNER JOIN (SELECT Name , email FROM employee GROUP BY Name , email HAVING COUNT(name) > 1 AND COUNT(email) > 1 ) dup ON employee.Name = dup.Name AND employee.email = dup.email;
+----+------+--------------+
| K1 | Name | email        |
+----+------+--------------+
|  1 | Ben  | ben@benjr.tw |
|  4 | Ben  | ben@benjr.tw |
+----+------+--------------+
2 rows in set (0.001 sec)

SELECT WHERE IN (SELECT)

這個方式是先搜尋特定有重覆的欄位,再透過 Where 顯示這些有重覆的的資料.

MariaDB [testdb]> SELECT K1 , Name , email FROM employee WHERE Name IN ( SELECT Name FROM employee GROUP BY Name HAVING count(Name) > 1 ) ;
+----+------+----------------+
| K1 | Name | email          |
+----+------+----------------+
|  1 | Ben  | ben@benjr.tw   |
|  4 | Ben  | ben@benjr.tw   |
|  5 | Ben  | ben10@benjr.tw |
+----+------+----------------+
3 rows in set (0.001 sec)

因為 WHERE IN 的限制,只能針對單一欄位做搜尋 (多欄位時會顯示錯誤訊息 – ERROR 1241 (21000): Operand should contain 1 column(s)).

沒有解決問題,試試搜尋本站其他內容

發佈留言

發佈留言必須填寫的電子郵件地址不會公開。 必填欄位標示為 *

這個網站採用 Akismet 服務減少垃圾留言。進一步瞭解 Akismet 如何處理網站訪客的留言資料