
Satyam Singh

03/02/2023, 2:07 PM
Is it advisable to write a Databricks notebook's PySpark code in an object-oriented way (using Python classes and methods), embedding all the PySpark code in classes and methods? Will it have performance issues like Python UDFs do? Any input is appreciated. Thanks.

JosephK (exDatabricks)

03/02/2023, 2:19 PM
That’s a great question. If all the code inside a function is Spark code, you won’t run into any trouble. So, for example:
def load_and_count(input_path):
    # Pure DataFrame API calls: the whole chain runs on the cluster
    spark.read.load(input_path).filter("value IS NOT NULL").groupBy("value").count().write.save("/tmp/counts")
won’t give you any problems. I always suggest that you write in any way you like as long as everyone in the organization follows the same style. You don’t want 4 different ways of writing code. Coming up with a style guide can be very useful.
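For illustration, a minimal sketch of the class-based style the question describes (the class, method, and column names are hypothetical); since every method body is pure DataFrame API, Catalyst optimizes it the same as free-standing code:

from pyspark.sql import DataFrame, functions as F

class SalesPipeline:
    # Hypothetical example of grouping reusable DataFrame transformations in a class
    def __init__(self, spark):
        self.spark = spark

    def load(self, path: str) -> DataFrame:
        # Plain DataFrame API: builds a lazy plan, no per-row Python cost
        return self.spark.read.load(path)

    def daily_totals(self, df: DataFrame) -> DataFrame:
        # Aggregation stays entirely in the JVM, unlike a Python UDF
        return df.groupBy("sale_date").agg(F.sum("amount").alias("total"))

Calling these methods only builds the same lazy query plan that free-standing functions would; the class is purely an organizational choice.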

Satyam Singh

03/02/2023, 2:30 PM
Thanks for your reply, completely agree. This question comes from the idea of modularizing notebook code: creating functions for reusable logic.

JosephK (exDatabricks)

03/02/2023, 2:32 PM
Yes, it might be useful if you have 15 date transformations that you always apply. You can do a %run at the top of subsequent notebooks to include those reusable functions.
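As a sketch of that pattern (the notebook path, function, and column names are hypothetical):

# Shared notebook, e.g. ./utils/date_transforms, defining reusable helpers
from pyspark.sql import DataFrame, functions as F

def add_date_parts(df: DataFrame, date_col: str) -> DataFrame:
    # Derive year/month columns once, reuse everywhere
    return df.withColumn("year", F.year(date_col)).withColumn("month", F.month(date_col))

# First cell of any consuming notebook:
# %run ./utils/date_transforms
# df = add_date_parts(spark.table("events"), "event_date")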