Deleting Duplicate Rows in SQL Server Based on a Single Column
Posted: October 7th, 2010 | Author: Bret Kuhns | Filed under: Technology | Tags: microsoft, SQL, SQL Server | No Comments »An import from a client’s product list resulted in some incorrectly duplicated products. Normally this would be easy to sort out, except that the duplicates differed by a category column, so the good ol’ DISTINCT keyword was of no help. This would also have been easy to do if working with MySQL by using a GROUP BY clause to select a row by distinct instances of a particular column. SQL Server, however, is another story. As I found, if you try the GROUP BY trick, SQL Server will whine.
Server: Msg 8120, Level 16, State 1, Line 1
Column ‘ItemNumber’ is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
I found some reasoning for this online, which is all well and good, except that MySQL does this like a champ and makes life easier as a result. So off I went to find a solution for SQL Server… I eventually stumbled on a page from the SQL Authority, by Pinal Dave (a very humble guy, judging by the title of his site). The gem, however, wasn’t the post itself but a comment by Madhivanan further down the page. Thanks to Madhivanan, I came up with the following solution.
DELETE t FROM(select (row_number() over (partition by ItemNumberorder by ItemID)) as cnt, * from Item) as tWHERE t.cnt > 1
Leave a Reply