Anmol Rastogi
05/05/2023, 7:55 AM
"""
full_data_load.py
~~~~~~~~~~~~~~~~~
This Python module loads data from RDS into data lake storage (i.e. S3 on AWS).
"""
import os
import deltalake
env = 'dev'
region = 'ap-south-1'
os.environ["ENV"] = env
os.environ["REGION"] = region
bucket = 'analytics-pipeline-ap-south-1'
def main():
    data_read = deltalake.table.DeltaTable(
        f's3://{bucket}/DeltaLake_convert/Anmol',
        storage_options={'region': region},
    )
    print(data_read)

# entry point for the ELT application
if __name__ == '__main__':
    main()
While doing this I am getting the following error:

deltalake.PyDeltaTableError: Not a Delta table: No snapshot or version 0 found, perhaps s3://{bucket}/DeltaLake_convert/Anmol is an empty dir?

But if you look at the image below, the _delta_log directory and the parquet files do exist there.
Am I doing something wrong?
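One way to see what delta-rs sees at that path is to list the `_delta_log` prefix and check for the version-0 commit file. A minimal sketch of that check follows; the function and names here are illustrative (not from the thread), and the commented boto3 call assumes AWS credentials are already configured:

```python
import re

def has_version_zero_commit(keys):
    # The first commit of a Delta table is a 20-digit, zero-padded JSON file
    # inside _delta_log; without it, delta-rs reports "No snapshot or version 0".
    pattern = re.compile(r"_delta_log/0{20}\.json$")
    return any(pattern.search(key) for key in keys)

# The keys could come from S3 like this (boto3, credentials assumed):
#   s3 = boto3.client("s3", region_name="ap-south-1")
#   resp = s3.list_objects_v2(
#       Bucket="analytics-pipeline-ap-south-1",
#       Prefix="DeltaLake_convert/Anmol/_delta_log/",
#   )
#   keys = [obj["Key"] for obj in resp.get("Contents", [])]

print(has_version_zero_commit(
    ["DeltaLake_convert/Anmol/_delta_log/00000000000000000000.json"]
))  # a proper first commit: table is readable
print(has_version_zero_commit(
    ["DeltaLake_convert/Anmol/_delta_log/00000000000000000000.json.tmp"]
))  # only a .tmp file: the commit never landed
```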
Note: I am not looking for something with Spark. 🙂

Gerhard Brueckl
05/05/2023, 8:32 AM
00000000.json file in there?

Anmol Rastogi
05/05/2023, 9:47 AM

Gerhard Brueckl
05/05/2023, 9:53 AM

Anmol Rastogi
05/05/2023, 9:55 AM

Gerhard Brueckl
05/05/2023, 10:05 AM

Will Jones
05/05/2023, 2:52 PM
If there's a .json.tmp in the _delta_log folder, that means the transaction failed to commit. I notice in your snippet you have a try: … except: that captures all exceptions. That's probably hiding the error that's causing it to fail.
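Will's point about the bare `except:` can be sketched like this. The `risky_write` function below is a hypothetical stand-in for the user's conversion step (which only appears in a screenshot, not in the transcript); the contrast is between swallowing every exception and catching narrowly so the real failure surfaces:

```python
def risky_write():
    # Hypothetical stand-in for the write/convert step that fails partway
    # through, e.g. a permission error while committing to _delta_log.
    raise PermissionError("access denied while committing to _delta_log")

def convert_with_bare_except():
    try:
        risky_write()
    except:        # bare except: the real error vanishes
        pass       # caller sees "success"; S3 is left with only a .json.tmp

def convert_with_narrow_except():
    try:
        risky_write()
    except PermissionError as exc:
        # Surface the failure instead of hiding it.
        print(f"commit failed: {exc}")
        raise

convert_with_bare_except()  # returns silently despite the failed write
```

With the narrow handler the `PermissionError` propagates to the caller, so a half-written table is noticed immediately rather than discovered later as "Not a Delta table".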