Trey Yi
03/29/2023, 11:16 AM
Hi team, has anyone found a workaround to mitigate the following error? I know the feature request is already on GitHub, but I wonder if someone has a workaround.
```
Failed to merge decimal types with incompatible scale 4 to 8
```
Omkar
03/29/2023, 3:06 PM
Yes, as mentioned in this comment, Delta Lake currently doesn't support merging different decimal types. As a possible workaround, you can try casting all the `DecimalType` columns to the same precision and scale and then retry the merge. Example: let's say we have two dataframes (one from the delta table and the other for the new data), each with a `score` column:
• delta_table with score = [20.38121234, 80.12341234] --> DecimalType(30,8)
• new_data with score = [30.2455, 90.1234] --> DecimalType(26,4)
Now, when you try to perform a merge operation of `delta_table` with `new_data`, you may encounter the error `Failed to merge decimal types with incompatible scale` since their scales (the number of digits to the right of the decimal point) are 8 and 4 respectively. So the workaround would be to convert all values to a single decimal type, e.g. casting the `score` column to `DecimalType(30,8)`:
```
from pyspark.sql.functions import col
from pyspark.sql.types import DecimalType

df = df.withColumn("score", col("score").cast(DecimalType(30, 8)))
```
And then retry the merge, which should work.
šŸ‘ 2
t

Trey Yi

03/29/2023, 3:48 PM
Hey Omkar, thank you so much for the prompt reply. This is amazing. Can I ask you some follow-up questions?
• There's existing delta data with a `score` column of Decimal(17,4).
• I created a delta table called `test_table` with a schema that contains a `score` column of Decimal(17,8).
• Now I have new delta data with a `score` column of Decimal(17,8).
How could I merge the two datasets together and save to `test_table`?
Omkar
03/29/2023, 3:55 PM
Hey @Trey Yi, it'll be a good idea to convert the `score` column from `test_table` to `DecimalType(17,4)` using Spark's `cast()` function. This will avoid any decimal type conflicts with your existing delta data when you perform the `merge` operation. Regarding how to merge the two delta tables, you can refer to this doc: https://docs.delta.io/latest/delta-update.html#language-python
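A minimal sketch of what that merge might look like in PySpark (the join key `id` and the incoming dataframe name `new_data` are assumptions here, not from the thread):
```
from delta.tables import DeltaTable
from pyspark.sql.functions import col
from pyspark.sql.types import DecimalType

# Cast the incoming data to the target table's decimal type first
new_data = new_data.withColumn("score", col("score").cast(DecimalType(17, 4)))

# Merge into the existing table on an assumed join key `id`
target = DeltaTable.forName(spark, "test_table")
(
    target.alias("t")
    .merge(new_data.alias("s"), "t.id = s.id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute()
)
```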
Trey Yi
03/29/2023, 3:58 PM
Oh, what if I first convert the existing delta data to Decimal(17,8) and then merge them together before writing? Because the final schema I need contains a `score` column with Decimal(17,8).
Omkar
03/29/2023, 4:01 PM
Your thinking is absolutely correct, but there's a slight catch there: `DecimalType(17,8)` holds fewer digits (17-8=9) to the left of the decimal point than `DecimalType(17,4)` (17-4=13), so there's a chance that Spark might throw a casting error.
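A quick way to see the catch (the value is hypothetical; with Spark's default non-ANSI mode, an overflowing decimal cast returns null rather than raising an error):
```
from decimal import Decimal

from pyspark.sql.functions import col
from pyspark.sql.types import DecimalType, StructField, StructType

# 1234567890.1234 has 10 integer digits: fine for Decimal(17,4) (13 integer digits),
# but it overflows Decimal(17,8), which only leaves 17 - 8 = 9 integer digits.
schema = StructType([StructField("score", DecimalType(17, 4))])
df = spark.createDataFrame([(Decimal("1234567890.1234"),)], schema)
df.withColumn("score", col("score").cast(DecimalType(17, 8))).show()
# -> score comes back null for that row
```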
šŸ‘ 1
t

Trey Yi

03/29/2023, 4:03 PM
I see... hmm, I don't see a workaround in this case, since I need Decimal(17,8) at the end.
Omkar
03/29/2023, 4:08 PM
You can check one more thing: try converting both your existing delta table as well as your `test_table` to `DecimalType(38,8)` and then merge them. Since `DecimalType(38,8)` can store a larger number of digits on both the left (30) and the right (8) side of the decimal point, it won't throw any casting errors and you'll also get the data in a single datatype. Try it out maybe!
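A rough sketch of that widening cast on both sides (the dataframe names `existing_df` and `new_data` are placeholders):
```
from pyspark.sql.functions import col
from pyspark.sql.types import DecimalType

# Decimal(38,8) leaves 30 integer digits, so values from both Decimal(17,4)
# and Decimal(17,8) columns fit without overflow.
wide = DecimalType(38, 8)
existing_df = existing_df.withColumn("score", col("score").cast(wide))
new_data = new_data.withColumn("score", col("score").cast(wide))
```
Note that changing the column type of the table already on disk means rewriting it, e.g. an overwrite write with `.option("overwriteSchema", "true")`.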
Trey Yi
03/29/2023, 6:22 PM
Thanks Omkar! I'll definitely try that out :)
• Update: one thing I still need to figure out is that `score` becomes null when changing Decimal(17,4) to Decimal(17,8). I'll have to find out something else.
šŸ‘šŸ¼ 1
6 Views