chetkhatri
01/08/2023, 9:30 PM
Jani Sourander
01/12/2023, 1:06 PM
CREATE TEMPORARY FUNCTION increment_one(x DOUBLE) RETURNS DOUBLE RETURN x + 1;
SELECT
typeof(my.nested.data.factor), -- array<array<double>>
TRANSFORM(my.nested.data, x -> transform(x, y -> y.factor + 1)) as this_works,
-- Output: [[2], [2]]
TRANSFORM(my.nested.data, x -> transform(x, y -> abs(y.factor))) as also_works,
-- Output: [[1], [1]]
TRANSFORM(my.nested.data, x -> transform(x, y -> increment_one(y.factor))) as does_not_work
-- Output: AnalysisException
FROM my_dummy_data
The exception thrown is: AnalysisException: Resolved attribute(s) y#4454 missing from in operator !Project [cast(lambda y#4454.factor as double) AS x#4455].; line 7 pos 59
I would expect to be able to call increment_one() the same way as abs(); both should return a double. If I replace y.factor with a literal value such as 5 (thus calling increment_one(5) inside the nested transform), my function works as expected.
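One way to sidestep the lambda-variable problem, purely as a sketch, is to explode the nested arrays first so that increment_one() is applied to an ordinary column rather than to a lambda-bound variable (the id column, the PySpark wrapper, and the aliases below are assumptions, not part of the original question):

spark.sql("""
  SELECT
    id,                                                 -- assumed row key
    outer_pos,
    collect_list(increment_one(y.factor)) AS incremented
  FROM my_dummy_data
  LATERAL VIEW posexplode(my.nested.data) o AS outer_pos, inner_arr
  LATERAL VIEW explode(inner_arr) i AS y
  GROUP BY id, outer_pos
""").show()

Note that collect_list does not guarantee element order, so this only approximates the original nested array layout.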
chetkhatri
01/14/2023, 4:00 PM
Carly Akerly
01/26/2023, 9:24 PM
Zohaa Qamar
01/29/2023, 3:57 AM
# java.lang.OutOfMemoryError: Java heap space
# -XX:OnOutOfMemoryError="kill -9 %p"
# Executing /bin/sh -c "kill -9 27571"...
I have tried various instance types, such as 20-25 core instances of m5.16xlarge and r5.12xlarge. I have also tried playing with Spark configurations such as spark.driver.memory and spark.executor.memory, from 30g to 300g, but nothing helped. The job does no major computation; it is simply a coalesce().write.partitionBy().parquet(). I also tried setting HADOOP_HEAPSIZE in the configuration to 100g and 200g. Please ask if more information is required. Thanks. Here is a screenshot of the EMR executors:
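For reference, a rough PySpark sketch of the write pattern described above (df, the partition column, and the output path are placeholders). Coalescing to a small number of partitions funnels all the data through a few tasks, each of which then has to buffer and write far more data during the partitioned write, which is a common source of executor heap pressure; repartitioning by the partition column is one commonly tried alternative:

# instead of df.coalesce(n).write.partitionBy(...).parquet(...)
(df
 .repartition("event_date")            # assumed partition column
 .write
 .mode("overwrite")
 .partitionBy("event_date")
 .parquet("s3://my-bucket/output/"))   # assumed output path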
Carly Akerly
01/31/2023, 8:47 PM
Mike M
02/08/2023, 3:40 AM
Carly Akerly
02/08/2023, 5:12 PM
sabari dass
02/09/2023, 7:20 PM
SK
02/11/2023, 4:32 AM
Mohan
02/11/2023, 5:26 PM
a.usr = dbutils.secrets.get(scope = "test1", key = "test1-key")
a.pwd = dbutils.secrets.get(scope = "test1", key = "test2-key")

CREATE TEMPORARY VIEW test
USING JDBC
options(driver=${a.td_driver},
url=${a.td_url},
dbtable=${a.td_tbl},
user=${a.usr},
password=${a.pwd}
)
How do I format/handle a.usr and a.pwd, extracted from Azure Key Vault, inside the JDBC options parameters the SQL way? The Python way of parameterizing the credentials works, but I want to do it the SQL way to minimize changes to the existing app. Thanks!
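One possible SQL-oriented sketch, assuming Spark SQL variable substitution is enabled (the default): read the secrets in a small Python cell, publish them as Spark confs, and let ${...} pick them up in the DDL. The ${a.td_*} placeholders are kept from the snippet above, and keep in mind that values placed in the Spark conf can surface in the Spark UI or logs, so this is only an illustration of the substitution mechanism:

usr = dbutils.secrets.get(scope="test1", key="test1-key")
pwd = dbutils.secrets.get(scope="test1", key="test2-key")
spark.conf.set("a.usr", usr)
spark.conf.set("a.pwd", pwd)

spark.sql("""
  CREATE TEMPORARY VIEW test
  USING JDBC
  OPTIONS (
    driver   '${a.td_driver}',
    url      '${a.td_url}',
    dbtable  '${a.td_tbl}',
    user     '${a.usr}',
    password '${a.pwd}'
  )
""")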
sabari dass
02/14/2023, 6:08 PM
Bhopender Yadav
02/17/2023, 7:01 AM
Jerome Myers
02/18/2023, 7:45 PM
Allie Ubisse
02/20/2023, 3:18 PM
spark.hadoop.datanucleus.connectionPoolingType hikari
We got the same error as below.
Attempt 2:
Docs :
• https://learn.microsoft.com/en-us/azure/databricks/release-notes/runtime/10.4ml
• https://learn.microsoft.com/en-us/azure/databricks/release-notes/runtime/10.4#hikaricp-is-now-the-default-hive-metastore-connection-pool
installed packages: x
So 10.4 ML differs from 10.4 in that 10.4 ML is not using Hikari (and is missing the below libraries from the Databricks Libraries UI).
• We installed the following (because this is the standard in the non-ML 10.4 version):
org.datanucleus:datanucleus-api-jdo:4.2.4
org.datanucleus:datanucleus-core:4.1.17
org.datanucleus:datanucleus-rdbms:4.1.19
org.datanucleus:javax.jdo:3.2.0-m3
Configurations that we've tried:
spark.hadoop.datanucleus.connectionPoolingType hikari
spark.hadoop.datanucleus.connectionPoolingType HikariCP
spark.databricks.hive.metastore.client.pool.type hikari
spark.databricks.hive.metastore.client.pool.type HikariCP
All jar files were added and we are getting the following error after restarting.
org.datanucleus.api.jdo.JDOPersistenceManagerFactory.freezeConfiguration(JDOPersistenceManagerFactory.java:775)
... 128 more
Caused by: java.lang.Throwable: Attempt to invoke the "hikari" plugin to create a ConnectionPool gave an error : The connection pool plugin of type "hikari" was not found in the CLASSPATH!
If anyone knows how to resolve this issue, we would appreciate your help. Thanks
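As a quick diagnostic only (not a fix), one can check from a notebook whether HikariCP's classes are visible to the driver JVM at all; note this inspects the driver's default classloader, not the isolated classloader used for the Hive metastore client, so a positive result does not rule out the error above:

# check for HikariCP's main config class on the driver classpath
try:
    spark._jvm.java.lang.Class.forName("com.zaxxer.hikari.HikariConfig")
    print("HikariCP is visible on the driver classpath")
except Exception as err:
    print("HikariCP not found on the driver classpath:", err)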
Lucas Zago
02/23/2023, 11:43 AM
Lennart Skogmo
02/25/2023, 6:39 PM
sabari dass
02/26/2023, 11:05 PM
Wagner Silveira
03/01/2023, 6:50 PM
Carly Akerly
03/01/2023, 9:38 PM
Barak Haver
03/08/2023, 12:32 PM
foreachBatch method, and the other transforms and writes inside the batch function.
Are there side effects/issues/things worth taking into consideration?
Thanks!
Example:
sabari dass
03/08/2023, 7:50 PM
Lennart Skogmo
03/13/2023, 10:04 PM
Omar
03/21/2023, 8:19 PM
Prashant Aggarwal
03/22/2023, 12:18 PM
Martin
03/22/2023, 3:42 PM
TianoKlein
03/27/2023, 12:43 PM
sabari dass
04/03/2023, 3:26 PM
Jani Sourander
04/06/2023, 5:05 AM
Nagendra Darla
04/06/2023, 5:52 AM
SparkSession spark = SparkSession.builder()
.config("fs.s3.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
.config("fs.AbstractFileSystem.s3.impl", "org.apache.hadoop.fs.s3a.S3A")
.config("fs.s3a.aws.credentials.provider", "com.amazonaws.auth.DefaultAWSCredentialsProviderChain")
.config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
.config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")
.config("spark.delta.logStore.s3.impl", "io.delta.storage.S3DynamoDBLogStore")
.config("spark.io.delta.storage.S3DynamoDBLogStore.ddb.tableName", "delta_log")
.config("spark.io.delta.storage.S3DynamoDBLogStore.ddb.region", "us-east-1")
.config("spark.io.delta.storage.S3DynamoDBLogStore.credentials.provider",
"com.amazonaws.auth.DefaultAWSCredentialsProviderChain")
.getOrCreate();