
Shazeb Fahmi

04/27/2023, 5:40 PM
We are moving 1 TB of data from one storage account to another (both mounted in Azure Databricks) using dbutils.fs.mv(), which is slow. Moving the same data manually with Microsoft Azure Storage Explorer is roughly 100x faster. Does anyone have suggestions for an API that can perform this data movement at that higher transfer speed?
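For context, a minimal sketch of the slow approach described above; the /mnt/source and /mnt/target mount points are hypothetical, and dbutils is the utility object provided by the Databricks notebook runtime:

```python
# Moves files through the driver node, so throughput is limited
# no matter how large the cluster is.
# /mnt/source and /mnt/target are hypothetical mount points.
dbutils.fs.mv("/mnt/source/dataset", "/mnt/target/dataset", True)  # True = recurse
```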

Martin

04/27/2023, 5:49 PM
https://learn.microsoft.com/en-us/azure/storage/common/storage-use-azcopy-v10 This is what Azure Storage Explorer uses under the hood.
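If you want to drive AzCopy directly from a notebook or script, a rough sketch follows; the account names, container, path, and <SAS> tokens are placeholders, and it assumes the azcopy binary from the link above is on PATH:

```python
import subprocess

# Hypothetical source/destination URLs: replace the accounts,
# container, path, and <SAS> tokens with real values.
src = "https://srcaccount.blob.core.windows.net/container/path?<SAS>"
dst = "https://dstaccount.blob.core.windows.net/container/path?<SAS>"

# Server-to-server copy; --recursive walks the whole virtual directory.
subprocess.run(["azcopy", "copy", src, dst, "--recursive"], check=True)
```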

Gerhard Brueckl

04/27/2023, 6:20 PM
I would probably try Azure Data Factory - it scales pretty well for these scenarios
If you are moving Delta Lake tables, you should also have a look at DEEP CLONE; we copied 100 TB of data in about 3 hours this way
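A minimal sketch of a deep clone, assuming a Databricks notebook (spark in scope) and hypothetical database, table, and path names:

```python
# DEEP CLONE copies the data files and the Delta log in parallel
# across the cluster; the names and abfss path here are hypothetical.
spark.sql("""
    CREATE OR REPLACE TABLE target_db.my_table
    DEEP CLONE source_db.my_table
    LOCATION 'abfss://container@dstaccount.dfs.core.windows.net/my_table'
""")
```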

GapyNi

05/16/2023, 8:04 PM
Hi all, on this topic: as far as I understand, if it is within the same storage account I can just copy everything (including the _checkpoint folder) to a new folder and it should work - I have tested it, and it seems the streaming checkpoint does not store any absolute paths. I would just need to re-register the Hive tables to point to the new location. Regards, Gapy
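A sketch of that re-registration step, with hypothetical table names and abfss paths:

```python
# Point the existing metastore entry at the copied location
# (database, table, and path are hypothetical).
spark.sql("""
    ALTER TABLE my_db.my_table
    SET LOCATION 'abfss://container@account.dfs.core.windows.net/new_folder/my_table'
""")

# Or register a new table directly over the copied Delta files.
spark.sql("""
    CREATE TABLE my_db.my_table_copy
    USING DELTA
    LOCATION 'abfss://container@account.dfs.core.windows.net/new_folder/my_table'
""")
```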