Luis F Takahashi
03/06/2023, 10:57 AM
from delta.tables import *
ModuleNotFoundError: No module named 'delta'
Here is how I added a new step:
custom_args = [
    "spark-submit", "--master", "yarn",
    "--conf", '"spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension"',
    "--conf", '"spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog"',
    "--deploy-mode", EMR_DEPLOY_MODE,
    "--py-files", f"s3://youse-emr-assets-{self.environment}/python-libraries/src/youse-datapipeline.zip",
    self.execute_file,
    f"--PATHS={path_list}",
    f"--ENVIRONMENT={self.environment}",
]
custom_args.extend(self.step_args)
add_step = {
    "Name": self.step_name,
    "ActionOnFailure": "CONTINUE",
    "HadoopJarStep": {
        "Jar": "command-runner.jar",
        "Args": custom_args
    }
}
SPARK_STEPS.append(add_step)
Rahul Sharma
03/06/2023, 3:30 PM