How to delete duplicate records in a Snowflake Table
Often there are dupes in a table which needs to be cleaned up. Snowflake doesn't have ROWID to delete duplicate records. You cannot use the ROW_NUMBER() function to delete rows in Snowflake.
To remove the duplicates, Find out if the entire "row" (all columns) is a duplicate or just a few columns?
1. How to remove the full row duplicate record in Snowflake table:
If all columns, then the only solution is to do a SELECT DISTINCT from the table into a new table (and then rename/swap table names)
Step-1:
create table mytable_copy as select distinct * from mytable;
Step-2:
drop table mytable;
alter table mytable_copy rename to mytable;
2. How to remove duplicate record based on KEY field in Snowflake table:
In some instances, there are duplicate records based on the KEY column and not full row dupes. In this case, you can safely remove the unwanted record alone using the below method. In this example, we are keeping the latest record for each KEY field and removing the oldest record.
KEY | COLX | COLY | COLZ |
1 | A | B | 2018-10-01 |
1 | A | B | 2018-10-02 |
2 | A | B | 2018-10-01 |
2 | A | B | 2018-10-02 |
DELETE FROM MY_TABLE T
USING (
SELECT KEY
,COLZ
,ROW_NUMBER() OVER (PARTITION BY KEY ORDER BY COLZ DESC) AS RANK_IN_KEY
FROM MY_TABLE T
) X
WHERE X.RANK_IN_KEY <> 1
AND T.KEY = X.KEY
AND T.COLZ = X.COLZ;
You can use this code as a base and tinker around a little to fit your needs! What are your favorite code snippets, Show us using a new article!
Atori
posted on 29 Oct 18Enjoy great content like this and a lot more !
Signup for a free account to write a post / comment / upvote posts. Its simple and takes less than 5 seconds
Post Comment