Friday, 31 July 2015

T-SQL - Deduplicate large table

Sorry if this has already been asked. I see a lot of similar questions, but none exactly like this one.
I am trying to deduplicate a large set of records (about 500 million):

Sample data:

CUST_ID  PROD_TYPE  VALUE  DATE
------------------------------------
1        1          Y      5/1/2015 *
1        2          N      5/1/2015 *
1        1          N      5/2/2015 *
1        2          N      5/2/2015 
1        1          Y      5/3/2015 *
1        2          Y      5/3/2015 *
1        1          Y      5/6/2015 
1        2          N      5/6/2015 *

For each CUST_ID and PROD_TYPE combination, ordered by DATE, I need to keep the first record as well as any record whose VALUE differs from the previous record's VALUE (the rows marked with an asterisk). There can sometimes be gaps between the dates. There are around 5 million unique CUST_IDs.
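
One way the rule above could be expressed is with the LAG() window function (SQL Server 2012 or later). This is only a sketch under assumptions: the table names dbo.CUST_PROD_HISTORY and dbo.CUST_PROD_HISTORY_DEDUP are hypothetical, VALUE is assumed to be NOT NULL, and there is assumed to be at most one row per CUST_ID, PROD_TYPE and DATE. It writes the kept rows to a new table rather than deleting in place, which may be easier to manage at this volume.

```sql
-- Hypothetical table names; column names follow the sample data.
-- Assumes [VALUE] is NOT NULL, so a NULL PREV_VALUE can only mean
-- "no earlier row exists for this CUST_ID / PROD_TYPE".
WITH ordered AS (
    SELECT CUST_ID,
           PROD_TYPE,
           [VALUE],
           [DATE],
           -- VALUE of the previous row (by DATE) within the same group
           LAG([VALUE]) OVER (PARTITION BY CUST_ID, PROD_TYPE
                              ORDER BY [DATE]) AS PREV_VALUE
    FROM dbo.CUST_PROD_HISTORY
)
SELECT CUST_ID, PROD_TYPE, [VALUE], [DATE]
INTO dbo.CUST_PROD_HISTORY_DEDUP      -- creates the new table
FROM ordered
WHERE PREV_VALUE IS NULL              -- first record of the group
   OR PREV_VALUE <> [VALUE];          -- VALUE changed since the previous record
```

Against the sample data this should keep exactly the six rows marked with an asterisk. Gaps between dates do not matter, because LAG() looks at the previous row in the ordering, not the previous calendar day. An index on (CUST_ID, PROD_TYPE, DATE) that includes VALUE may let the engine avoid a large sort, though that would need to be confirmed against the actual execution plan.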

Any help would be greatly appreciated.
