
sabari dass

03/23/2023, 7:45 PM
Hi all, I have an existing Delta table named XYZ (1M records) with two columns, ‘Name’ (string type) and ‘Comments’ (string type). I now want to change the type of the ‘Comments’ column to an array of MapType in the same table. Is there any way to handle this? Can anyone please show how to do it using PySpark? Thanks!

JosephK (exDatabricks)

03/23/2023, 7:48 PM
There is schema evolution, but that’s not going to change the old Comments column into an array. You’ll have to do a full read, transform, and overwrite with overwriteSchema enabled.

sabari dass

03/23/2023, 7:55 PM
Thanks @JosephK (exDatabricks). Do you have any code snippet for this scenario or any material to explore on it?

JosephK (exDatabricks)

03/23/2023, 8:08 PM
https://delta.io/blog/2023-02-08-delta-lake-schema-evolution/
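from pyspark.sql.functions import col, split

# path: location of the existing Delta table (defined elsewhere)
(spark.read.format("delta").load(path)
 .select("Name",
         # "`" is a delimiter that never occurs in the data
         split(col("Comments"), "`").alias("Comments"))
 .write
 .format("delta")
 .mode("overwrite")
 .option("overwriteSchema", "true")  # allow the column type to change
 .save(path))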
split will create an array; I just chose a delimiter character that wasn’t in my string, so each value ends up as a one-element array.

sabari dass

03/23/2023, 8:24 PM
Thanks, I will look into it.