如何找到重覆的資料 (Duplicate Row , Column or Record)
測試環境為 CentOS 8 x86_64 (虛擬機)
先建立一個測試用資料庫 (testdb) , 與 tables (employee) 格式為 K1 int(11) – auto_increment & PRIMARY KEY , Name char(20), Dept char(20), jobTitle char(20) ,email char(30) 以及 Salary int(11).
[root@localhost ~]# mysql -u root -p Enter password: Welcome to the MariaDB monitor. Commands end with ; or \g. Your MariaDB connection id is 9 Server version: 10.3.11-MariaDB MariaDB Server Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others. Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
將資料庫字元編碼 ( Character Sets ) 設定為 utf8 (8-bit Unicode Transformation Format) 與 文字排序 ( Collations ) 設定為utf8_general_ci .
MariaDB [(none)]> CREATE DATABASE testdb DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci; Query OK, 1 row affected (0.001 sec) MariaDB [(none)]> USE testdb; Database changed MariaDB [testdb]> CREATE TABLE employee (K1 int(11) NOT NULL auto_increment, Name char(20) NOT NULL, Dept char(20) NOT NULL, JobTitle char(20) NOT NULL, Salary int(11) NOT NULL , email char(30) NOT NULL, PRIMARY KEY (K1)) ; Query OK, 0 rows affected (0.010 sec) MariaDB [testdb]> DESCRIBE employee; +----------+----------+------+-----+---------+----------------+ | Field | Type | Null | Key | Default | Extra | +----------+----------+------+-----+---------+----------------+ | K1 | int(11) | NO | PRI | NULL | auto_increment | | Name | char(20) | NO | | NULL | | | Dept | char(20) | NO | | NULL | | | JobTitle | char(20) | NO | | NULL | | | Salary | int(11) | NO | | NULL | | | email | char(30) | NO | | NULL | | +----------+----------+------+-----+---------+----------------+ 6 rows in set (0.004 sec)
透過 INSERT 新增加了五筆資料.
MariaDB [testdb]> INSERT INTO employee (Name , Dept , JobTitle , Salary , email) VALUES ('Ben' , 'Testing' , 'Engineer' , '45000' , 'ben@benjr.tw') , ('Afa' , 'Power' , 'Engineer' , '48000' , 'Afa@benjr.tw') , ('Boss' ,'Testing' , 'Manager' , '75000' , 'Boss@benjr.tw') , ('Ben' , 'Testing' , 'Engineer' , '45000' , 'ben@benjr.tw') , ('Ben' , 'PM' , 'Engineer' , '85000' , 'ben10@benjr.tw'); Query OK, 5 rows affected (0.002 sec) Records: 5 Duplicates: 0 Warnings: 0 MariaDB [testdb]> SELECT * FROM employee; +----+------+---------+----------+--------+----------------+ | K1 | Name | Dept | JobTitle | Salary | email | +----+------+---------+----------+--------+----------------+ | 1 | Ben | Testing | Engineer | 45000 | ben@benjr.tw | | 2 | Afa | Power | Engineer | 48000 | Afa@benjr.tw | | 3 | Boss | Testing | Manager | 75000 | Boss@benjr.tw | | 4 | Ben | Testing | Engineer | 45000 | ben@benjr.tw | | 5 | Ben | PM | Engineer | 85000 | ben10@benjr.tw | +----+------+---------+----------+--------+----------------+ 5 rows in set (0.001 sec)
COUNT() 函數
COUNT 函數會回傳 SELECT 的筆數,通常搭配 GROUP BY 使用,這樣就可以查出哪些資料是重覆的.
下面會依據 Name 來查詢哪些資料是重覆的.
MariaDB [testdb]> SELECT Name , email, COUNT(Name) FROM employee GROUP BY Name HAVING COUNT(name) > 1; +------+--------------+-------------+ | Name | email | COUNT(Name) | +------+--------------+-------------+ | Ben | ben@benjr.tw | 3 | +------+--------------+-------------+ 1 row in set (0.001 sec)
下面除了 Name 之外還考慮了 email 是否一樣重覆了.
MariaDB [testdb]> SELECT Name , email, COUNT(Name) FROM employee GROUP BY Name , email HAVING COUNT(name) > 1 AND COUNT(email) > 1; +------+--------------+-------------+ | Name | email | COUNT(Name) | +------+--------------+-------------+ | Ben | ben@benjr.tw | 2 | +------+--------------+-------------+ 1 row in set (0.001 sec)
雖然知道哪一個 Name 與 email 是重覆的,我們還需要知道他們的 K1 (Primary Key) 值是多少?需搭配 GROUP_CONCAT 函數,可以把同 Group 的欄位文字整合在一起.
MariaDB [testdb]> SELECT GROUP_CONCAT(K1) , Name , email, COUNT(Name) FROM employee GROUP BY Name , email HAVING COUNT(name) > 1 AND COUNT(email) > 1; +------------------+------+--------------+-------------+ | GROUP_CONCAT(K1) | Name | email | COUNT(Name) | +------------------+------+--------------+-------------+ | 1,4 | Ben | ben@benjr.tw | 2 | +------------------+------+--------------+-------------+ 1 row in set (0.001 sec)
SELECT INNER JOIN (SELECT)
這個方式是先搜尋特定有重覆的欄位,再透過 Inner Join 顯示這些有重覆的的資料.
MariaDB [testdb]> SELECT K1 , employee.Name , email FROM employee INNER JOIN (SELECT Name FROM employee GROUP BY Name HAVING COUNT(name) > 1) dup ON employee.Name = dup.Name; +----+------+----------------+ | K1 | Name | email | +----+------+----------------+ | 1 | Ben | ben@benjr.tw | | 4 | Ben | ben@benjr.tw | | 5 | Ben | ben10@benjr.tw | +----+------+----------------+ 3 rows in set (0.001 sec)
下面除了 Name 之外還考慮了 email 是否一樣重覆了.
MariaDB [testdb]> SELECT K1 , employee.Name , employee.email FROM employee INNER JOIN (SELECT Name , email FROM employee GROUP BY Name , email HAVING COUNT(name) > 1 AND COUNT(email) > 1 ) dup ON employee.Name = dup.Name AND employee.email = dup.email; +----+------+--------------+ | K1 | Name | email | +----+------+--------------+ | 1 | Ben | ben@benjr.tw | | 4 | Ben | ben@benjr.tw | +----+------+--------------+ 2 rows in set (0.001 sec)
SELECT WHERE IN (SELECT)
這個方式是先搜尋特定有重覆的欄位,再透過 Where 顯示這些有重覆的的資料.
MariaDB [testdb]> SELECT K1 , Name , email FROM employee WHERE Name IN ( SELECT Name FROM employee GROUP BY Name HAVING count(Name) > 1 ) ; +----+------+----------------+ | K1 | Name | email | +----+------+----------------+ | 1 | Ben | ben@benjr.tw | | 4 | Ben | ben@benjr.tw | | 5 | Ben | ben10@benjr.tw | +----+------+----------------+ 3 rows in set (0.001 sec)
因為 WHERE IN 的限制,只能針對單一欄位做搜尋 (多欄位時會顯示錯誤訊息 – ERROR 1241 (21000): Operand should contain 1 column(s)).