How to DELETE duplicate records from BigQuery table
#1 If the table has multiple records for each Id / Key and you want to keep only the latest record
For example, your table has duplicate records and you want to keep one record and get rid of the duplicate records, You can use the below query to cleanup the table
Note in this example, Id is the Key and for each Id there has to different loadTime for each record for this method to work
DELETE FROM `yourproject.yourdataset.duplicates`
WHERE STRUCT(id, loadTime) NOT IN (
SELECT AS STRUCT id, MAX(loadTime) loadTime
FROM `yourproject.yourdataset.duplicates`
GROUP BY id)
#2 How to remove absolute full row duplicates from Big query table
The easiest way is to re-create the whole table in place using DISTINCT. You need to use the same parameters (PARTITION BY) for the table.
CREATE OR REPLACE TABLE `transactions.testdata`
PARTITION BY date
AS SELECT DISTINCT * FROM `transactions.testdata`;
For unpartitioned table use the below query:
CREATE OR REPLACE TABLE `transactions.testdata`
AS SELECT DISTINCT * FROM `transactions.testdata`;
DataFreak
posted on 28 Aug 18Enjoy great content like this and a lot more !
Signup for a free account to write a post / comment / upvote posts. Its simple and takes less than 5 seconds
Post Comment