#random

Jani Sourander

01/12/2023, 1:06 PM
I have a problem trying to call a self-made function inside TRANSFORM() in Databricks. Does anyone know a solution?
CREATE TEMPORARY FUNCTION 
increment_one(x DOUBLE) RETURNS DOUBLE RETURN x + 1;

SELECT 
  typeof(my.nested.data.factor), -- array<array<double>>
  TRANSFORM(my.nested.data, x -> transform(x, y -> y.factor + 1)) as this_works,
  -- Output: [[2], [2]]
  TRANSFORM(my.nested.data, x -> transform(x, y -> abs(y.factor))) as also_works,
  -- Output: [[1], [1]]
  TRANSFORM(my.nested.data, x -> transform(x, y -> increment_one(y.factor))) as does_not_work
  -- Output: AnalysisException
FROM my_dummy_data
The exception thrown is:
AnalysisException: Resolved attribute(s) y#4454 missing from  in operator !Project [cast(lambda y#4454.factor as double) AS x#4455].; line 7 pos 59
I would expect to be able to call increment_one() the same way as abs(), and I would expect both to return a double. If I replace y.factor with a literal value such as 5 (thus calling increment_one(5) inside the nested transform), my function works as expected.

ruslan

01/12/2023, 8:36 PM
UDFs in lambda functions are not supported. The error message could have been better. There was an effort to fix this (tracked internally as SC-107365), but this is not prioritized.
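One possible workaround for simple cases is to move the whole nested iteration into a single Scala UDF, so that no UDF call appears inside a lambda. The sketch below is only an illustration under assumptions: the UDF name increment_all is hypothetical, the input is a literal array<array<double>> rather than the my_dummy_data table from the question, and null elements are not handled.

```scala
import org.apache.spark.sql.SparkSession

object NestedIncrement {
  def main(args: Array[String]): Unit = {
    // Local session for illustration; in a Databricks notebook `spark` already exists.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("nested-increment")
      .getOrCreate()

    // Hypothetical UDF: takes the whole array<array<double>> and adds 1 to every element.
    // Assumes no null arrays or null elements are present.
    spark.udf.register(
      "increment_all",
      (xs: Seq[Seq[Double]]) => xs.map(_.map(_ + 1.0))
    )

    // The UDF is called on the outer array directly, not from inside TRANSFORM().
    spark.sql(
      "SELECT increment_all(array(array(1.0D), array(1.0D))) AS incremented"
    ).show(false)

    spark.stop()
  }
}
```

This trades the reusable scalar function for a function that knows the nesting depth, so it does not scale well to many differently shaped columns.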

Jani Sourander

01/13/2023, 5:51 AM
Ok, thank you for the info. It would be a great feature, though! Long SQL scripts can be fairly difficult to read and full of repetitive code. Replacing the whole lambda with a UDF is not an easy task either. I've played around with Scala UDFs, but returning structs is a challenge. The Scala udf overload that takes an explicit return type is deprecated, so using a simple udf(myFunc, colSchema) is not a good idea. One can always return a JSON string and parse that later on, but that does not sound like a best practice. I wonder if simple examples exist anywhere of a Scala UDF that, say, takes array<long> as an input parameter and returns odds and evens in different (named) struct fields: struct<odds:Long, evens:Long>. Returning a Tuple2 works, but the field names will be _1 and _2.

ruslan

01/16/2023, 10:29 PM
From what I understand, it should work for built-in SQL functions. Yes, agreed, it would be very useful. It's currently broken in a couple of places, so it might be a bigger effort to support this. I recommend reaching out to your account team to help prioritize this work.
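On the Scala UDF question earlier in the thread: one common way to get named struct fields is to return a case class instead of a Tuple2, since Spark derives the struct schema from the case class field names. The following is a minimal sketch, not a definitive implementation. The names OddsEvens and split_odds_evens are hypothetical, and it assumes the fields should hold arrays of longs (struct<odds:array<bigint>, evens:array<bigint>>) rather than the single Long values written in the question.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf

// Hypothetical result type; its field names become the struct field names.
// Defined at top level so Spark can derive a schema for it.
case class OddsEvens(odds: Seq[Long], evens: Seq[Long])

object OddsEvensUdf {
  def main(args: Array[String]): Unit = {
    // Local session for illustration; in a Databricks notebook `spark` already exists.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("odds-evens")
      .getOrCreate()

    // Typed udf: the return schema is inferred from OddsEvens,
    // so the deprecated udf(f, dataType) overload is not needed.
    // Assumes the input array is not null.
    val splitOddsEvens = udf { (xs: Seq[Long]) =>
      val (evens, odds) = xs.partition(_ % 2 == 0)
      OddsEvens(odds, evens)
    }
    spark.udf.register("split_odds_evens", splitOddsEvens)

    // Print the schema to inspect the resulting struct field names.
    spark.sql("SELECT split_odds_evens(array(1L, 2L, 3L, 4L)) AS result")
      .printSchema()

    spark.stop()
  }
}
```

The fields can then be addressed by name in SQL, for example result.odds and result.evens.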