Jani Sourander
01/12/2023, 1:06 PM
CREATE TEMPORARY FUNCTION increment_one(x DOUBLE) RETURNS DOUBLE RETURN x + 1;
SELECT
typeof(my.nested.data.factor), -- array<array<double>>
TRANSFORM(my.nested.data, x -> transform(x, y -> y.factor + 1)) as this_works,
-- Output: [[2], [2]]
TRANSFORM(my.nested.data, x -> transform(x, y -> abs(y.factor))) as also_works,
-- Output: [[1], [1]]
TRANSFORM(my.nested.data, x -> transform(x, y -> increment_one(y.factor))) as does_not_work
-- Output: AnalysisException
FROM my_dummy_data
The exception thrown is: AnalysisException: Resolved attribute(s) y#4454 missing from in operator !Project [cast(lambda y#4454.factor as double) AS x#4455].; line 7 pos 59
I would expect to be able to call increment_one() the same way as abs(). I would expect them both to return a double. If I replace y.factor with a literal value such as 5 (thus calling increment_one(5) inside the nested transform), my function works as expected.
ruslan
01/12/2023, 8:36 PM
Jani Sourander
01/13/2023, 5:51 AM
udf method is deprecated, so using a simple udf(myFunc, colSchema) is not a good idea. One can always return a JSON string and parse it later, but that does not sound like a best practice.
I wonder if simple examples exist anywhere of a Scala UDF that, say, takes array<long> as an input parameter and returns odds and evens in different (named) struct fields: struct<odds:Long, evens:Long>. Returning a Tuple2 works, but the field names will be _1 and _2.
ruslan
01/16/2023, 10:29 PM
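For reference, one possible shape of such a UDF is sketched below. It is not taken from this thread. It assumes Spark 3.x on Scala 2.12 and relies on the typed udf overload deriving the return schema from a case class, so the struct fields take the case class field names instead of _1 and _2. The names OddsEvens, countOddsEvens and OddsEvensUdfSketch are hypothetical. Storing counts in the two fields is also an assumption, made only to match the struct<odds:Long, evens:Long> type mentioned above.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, udf}

// Hypothetical case class: its field names become the struct field names.
case class OddsEvens(odds: Long, evens: Long)

object OddsEvensUdfSketch {
  // Pure helper. Assumption: "odds" and "evens" hold element counts,
  // which matches struct<odds:Long, evens:Long>.
  def countOddsEvens(xs: Seq[Long]): OddsEvens = {
    val odds = xs.count(_ % 2 != 0).toLong
    OddsEvens(odds, xs.size.toLong - odds)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("odds-evens-sketch")
      .getOrCreate()
    import spark.implicits._

    // Typed udf: no explicit schema argument, so this is not the
    // deprecated udf(f, dataType) variant.
    val oddsEvens = udf(countOddsEvens _)

    // Small in-memory DataFrame with a single array<bigint> column.
    val df = Seq(Seq(1L, 2L, 3L), Seq(4L, 6L)).toDF("values")

    // The "result" column should be a struct with fields odds and evens.
    df.select(oddsEvens(col("values")).as("result")).printSchema()

    spark.stop()
  }
}
```

If the fields should hold the elements themselves rather than counts, declaring them as Seq[Long] in the case class should give array-typed struct fields in the same way.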