Luis F Takahashi

03/06/2023, 10:57 AM
Hi everyone, I created an EMR cluster following this guide: https://docs.aws.amazon.com/emr/latest/ReleaseGuide/Deltausing-cluster.html and now I'm trying to add a new step with Airflow, but the step fails with:
from delta.tables import *
ModuleNotFoundError: No module named 'delta'
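The traceback comes from the Python interpreter, so a first check is whether the interpreter running the driver can find a `delta` module at all. If `EMR_DEPLOY_MODE` is `cluster`, that driver runs in a YARN container on a worker node, not on the primary node. The sketch below uses only the standard library; the helper `describe_module` is hypothetical, and `delta-spark` is the pip distribution that provides the `delta` import name.

```python
import importlib.metadata
import importlib.util
from typing import Optional


def describe_module(module_name: str, dist_name: Optional[str] = None) -> str:
    """Report whether `module_name` is importable and, if a pip
    distribution name is given, which installed version provides it."""
    spec = importlib.util.find_spec(module_name)
    if spec is None:
        return f"{module_name}: not importable"
    if dist_name is None:
        return f"{module_name}: importable from {spec.origin}"
    try:
        version = importlib.metadata.version(dist_name)
    except importlib.metadata.PackageNotFoundError:
        # Importable (e.g. from a zip on --py-files) but not pip-installed.
        version = "unknown version"
    return f"{module_name}: importable ({dist_name} {version})"


if __name__ == "__main__":
    # 'delta' is the import name; 'delta-spark' is the pip package name.
    print(describe_module("delta", "delta-spark"))
```

Running this as a tiny step with the same `spark-submit` settings shows whether the package is visible where the job actually executes.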
Here is how I added a new step:
# Inside the method that builds the EMR steps
custom_args = [
    "spark-submit", "--master", "yarn",
    # command-runner.jar passes each element as one argument (no shell),
    # so the --conf values must not carry extra embedded quotes
    "--conf", "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension",
    "--conf", "spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog",
    "--deploy-mode", EMR_DEPLOY_MODE,
    "--py-files", f"s3://youse-emr-assets-{self.environment}/python-libraries/src/youse-datapipeline.zip",
    self.execute_file,
    f"--PATHS={path_list}",
    f"--ENVIRONMENT={self.environment}",
]
custom_args.extend(self.step_args)
add_step = {
    "Name": self.step_name,
    "ActionOnFailure": "CONTINUE",
    "HadoopJarStep": {
        "Jar": "command-runner.jar",
        "Args": custom_args,
    },
}
SPARK_STEPS.append(add_step)
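For comparison, a minimal sketch of the same step built by a standalone function. It assumes, as in the AWS step examples, that `command-runner.jar` hands each list element to `spark-submit` as a single argument without shell parsing, so values like `'"spark.sql.extensions=..."'` would keep their literal quotes and Spark would not recognise the property. The function name `build_delta_spark_step` and its parameters are hypothetical, and the S3 paths in the test are placeholders.

```python
from typing import Dict, List, Sequence

# The two Delta properties set in the original step.
DELTA_CONFS = {
    "spark.sql.extensions": "io.delta.sql.DeltaSparkSessionExtension",
    "spark.sql.catalog.spark_catalog": "org.apache.spark.sql.delta.catalog.DeltaCatalog",
}


def build_delta_spark_step(
    name: str,
    script: str,
    py_files: str,
    deploy_mode: str = "cluster",
    script_args: Sequence[str] = (),
) -> Dict:
    """Return an EMR step definition that runs `script` with spark-submit."""
    args: List[str] = ["spark-submit", "--master", "yarn", "--deploy-mode", deploy_mode]
    for key, value in DELTA_CONFS.items():
        # One list element per token and no embedded quotes: no shell
        # strips them before spark-submit sees the value.
        args += ["--conf", f"{key}={value}"]
    # Options first, then the application file, then its own arguments.
    args += ["--py-files", py_files, script, *script_args]
    return {
        "Name": name,
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {"Jar": "command-runner.jar", "Args": args},
    }
```

The returned dict has the same shape as `add_step` above, so it can go into the `steps` list of Airflow's `EmrAddStepsOperator` or into boto3's `add_job_flow_steps(JobFlowId=..., Steps=[...])`. Note that this only fixes how the Delta settings reach Spark; the missing Python `delta` module still has to be installed separately.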

Rahul Sharma

03/06/2023, 3:30 PM
pip install delta-spark==2.1.0
Please choose the delta-spark version that matches your cluster's Spark version.
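Which delta-spark release fits depends on the Spark minor version. Below is a minimal lookup sketch. The table is partial and, as an assumption, transcribed from the Delta Lake releases page, so check that page before relying on it. The helper names are hypothetical. Two practical caveats apply. First, if the cluster already ships its own Delta jars, pin the pip package to that same version. Second, the package must be installed for the Python interpreter on every node that can run the driver or executors (for example from a bootstrap action), not only on the node that submits the step.

```python
from typing import Dict, List

# Partial Spark -> Delta Lake table, minor versions only.
# Assumption: transcribed from the Delta Lake releases page; verify there.
SPARK_TO_DELTA: Dict[str, List[str]] = {
    "3.1": ["1.0"],
    "3.2": ["1.1", "1.2", "2.0"],
    "3.3": ["2.1", "2.2", "2.3"],
    "3.4": ["2.4"],
}


def minor_version(version: str) -> str:
    """Reduce '3.3.0-amzn-1' or '2.1.0' to its 'major.minor' prefix."""
    major, minor = version.split(".")[:2]
    return f"{major}.{minor}"


def compatible_delta_series(spark_version: str) -> List[str]:
    """Return the delta-spark minor series listed for this Spark version."""
    key = minor_version(spark_version)
    if key not in SPARK_TO_DELTA:
        raise ValueError(f"no delta-spark entry for Spark {spark_version}")
    return SPARK_TO_DELTA[key]


def is_compatible(delta_version: str, spark_version: str) -> bool:
    """True if the delta-spark release is in a series listed for this Spark."""
    series = SPARK_TO_DELTA.get(minor_version(spark_version), [])
    return minor_version(delta_version) in series


if __name__ == "__main__":
    # The pairing suggested in this thread: delta-spark 2.1.0 on Spark 3.3.
    print(is_compatible("2.1.0", "3.3.0"))
```

Inside the job, `pyspark.__version__` (or `spark.version` on a live session) gives the string to pass as `spark_version`.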