This is a read-only copy of the MariaDB Knowledgebase generated on 2024-11-16. For the latest, interactive version please visit https://mariadb.com/kb/.

Identifying and removing duplicates

I have a table that contains duplicates of some rows on all fields except the transaction ID. The table is quite large (1/2 gig) and there are several columns in each row (32).

I am using the following coding to identify and remove the duplicates:-

ALTER TABLE jobtransactionactuallabour ADD COLUMN duplicate TINYINT(1)NOT NULL DEFAULT '0';

UPDATE jobtransactionactuallabour SET duplicate=if(@SI_JobYearNumber=SI_JobYearNumber and @I_JobNumber=I_JobNumber and @DT_ActualDate=DT_ActualDate and @I_Hours1=I_Hours1 and @C_OperationCode=C_OperationCode and @C_EmployeeCode=C_EmployeeCode,1,0), SI_JobYearNumber=(@SI_JobYearNumber:=SI_JobYearNumber), I_JobNumber=(@I_JobNumber:=I_JobNumber), DT_ActualDate=(@DT_ActualDate:=DT_ActualDate), I_Hours1=(@I_Hours1:=I_Hours1), C_OperationCode=(@C_OperationCode:=C_OperationCode), C_EmployeeCode=(@C_EmployeeCode:=C_EmployeeCode) ORDER BY SI_JobYearNumber, I_JobNumber, DT_ActualDate, I_Hours1, C_OperationCode, C_EmployeeCode;

DELETE FROM jobtransactionactuallabour WHERE duplicate=1;

ALTER TABLE jobtransactionactuallabour DROP COLUMN duplicate;

On first running this doesn't work but sometimes, on subsequent runs, it achieves the correct result.

It would appear that the ORDER BY clause in the UPDATE doesn't always get the data in the correct order to identity duplicates.

I am at a loss to identify what it is that is causing the variable results and how to change my coding to achieve a consistent result each time.

Is there anyone that can shed some light on this problem?

Answer Answered by Ian Gilfillan in this comment.

It appears the values in all of your ORDER BY fields will be the same for more than one row, so your example does not specify a determinate order.

But since you have a unique ID, there's a much easier way to do this, without adding a new fields, or updating anything. See Removing Duplicates.

Content reproduced on this site is the property of its respective owners, and this content is not reviewed in advance by MariaDB. The views, information and opinions expressed by this content do not necessarily represent those of MariaDB or any other party.