Zero copy cloning is one of the cool and innovative features of Snowflake. In this tutorial you will learn:
Cloning, also referred to as “zero-copy cloning” creates a copy of a database, schema or table. A snapshot of data present in the source object is taken when the clone is created, and is made available to the cloned object. The cloned object is writable, and is independent of the clone source. That is, changes made to either the source object or the clone object are not part of the other
Here are some of the benefits of zero copy cloning:
You normally have to wait hours, days, maybe weeks to spin up a copy of your production data warehouse so you could have a test or a development environment? And have to pay extra for the test or development environment to able to hold all the replicated data? Or maybe you have some new data scientists that just want to play around with the data but they really need their own copy?
Cloning is very fast (as in it takes barely a minute to clone a 2TB database!), it allows you to have multiple copies of your data without the additional cost of storage usually associated with replicating data
Zero Copy cloning creates a copy of the object without having to duplicate the underlying storage. When a clone is created of a table, the clone utilizes no data storage because it shares all the existing micro-partitions of the original table at the time it was cloned; however, rows can then be added, deleted, or updated in the clone independently from the original table. Each change to the clone results in new micro-partitions that are owned exclusively by the clone and are protected through CDP
To clone a entire production database and make it available for development:
CREATE DATABASE Dev CLONE Prod;
To clone a schema
CREATE SCHEMA Dev.DataSchema1 CLONE Prod.DataSchema1;
To clone a single table:
CREATE TABLE C CLONE Dev.public.C;
Here are some of the frequently asked questions:
No. the clone is its very own copy, any changes made to the clone are local to the clone
No, when the clone is created it is a point in time view of the data. If you want a clone copy which receives updates from the source you could use Snowflake Data Sharing
There is no additional storage cost for the zero copy cloning as all the data is shared from the source version. The only additional cost would be for any changed data blocks which are local to the cloned copy
Time travel allows you to query tables as they were at a point in time in the past, up to 90 days with Enterprise edition. You can think of cloning as a point in time snapshot. You can clone from any point in time version within the 90 day retention period. Any clones which exist beyond the retention period are retained which means clones can be used as a form of backup
Yes, the cloned table will still have the original data
Snowflake uses immutable blocks in the object store. When you delete something, it isn't really deleted. Instead, the Snowflake Global Services layer just records that the blocks are no longer relevant for that table. If table 1 had 0 days of retention, those blocks would have eventually been flushed after the data had been deleted. However, if you had cloned table 1 prior to deleting the data, the Global Services layer would remember those blocks were still being referenced and they wouldn't be flushed