
sabari dass

03/24/2023, 9:30 PM
Hi all, I have a scenario where I need to add key-value pairs (MapType) to existing MapType data using PySpark. Example:

Input:
Day_field | id_field | maptype_field
Day1 | 1 | {k1->v1}
Day2 | 1 | {k2->v2}
Day3 | 1 | {k1->v1_updt}
Day4 | 1 | {k3->v3}

Expected output:
Day4 | 1 | {k3->v3, k2->v2, k1->v1_updt}

Can anyone please help me with this?

Christopher Grant

03/27/2023, 5:10 PM
Please post non-Delta questions to #random next time. It seems like you want to "concatenate" maps over an ordered window in PySpark, keeping the latest value for each key. This is definitely possible, but it is not an out-of-the-box feature. This is just one method; there are others:
(1) Pull the values from the map into separate StructFields.
(2) Use the `last_value` or `first_value` functions (`last` / `first` in `pyspark.sql.functions`) over a window, with the ignoreNulls option set to true. You would need one of these functions per column.
(3) Apply `rank` (or `row_number`) over a window similar to the one in (2), making sure you order in descending order.
(4) Filter for the latest row for each column (where the value from (3) is 1).
(5) Translate the result back into a map.