sabari dass

02/14/2023, 6:08 PM
Hi all, is there any way to run the ‘dos2unix’ command in a Databricks notebook? My use case: I have a CSV file in an ADLS location that contains some junk characters, and I need to remove them while reading the file with PySpark in my Databricks notebook. I can handle this by running dos2unix manually, but this is supposed to be automated. Can anyone please help with this? Thanks!

Dominique Brezinski

02/15/2023, 9:26 PM
You can use %sh to call shell commands, though dos2unix is not installed on the DBR (Databricks Runtime) images. You could use sed or awk to do the same thing.
👍 1

Kees Duvekot

02/16/2023, 10:04 PM
Or just use Python to achieve the same results:
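A minimal sketch of what the Python route might look like, assuming the only goal is to turn Windows (CRLF) line endings into Unix (LF) ones. The function names and any paths passed to them are hypothetical, not from this thread. The whole file is read into memory, so this suits small to medium files that are reachable through a local-style path.

```python
from pathlib import Path


def dos2unix_bytes(data: bytes) -> bytes:
    """Replace every CRLF pair with LF; lone CR bytes are left untouched."""
    return data.replace(b"\r\n", b"\n")


def dos2unix_file(src: str, dst: str) -> None:
    """Read src fully into memory, normalize line endings, write to dst.

    src and dst are hypothetical paths; how an ADLS file is exposed as a
    local path depends on the workspace setup and is an assumption here.
    """
    Path(dst).write_bytes(dos2unix_bytes(Path(src).read_bytes()))
```

Working on bytes rather than text avoids having to guess the file's encoding, which matters if the "junk chars" turn out to be something other than carriage returns.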
👍 1
But dos2unix appears to replace only line endings.
If that's the only thing you need in order to read the file, you can set the line-ending definition in the CSV reader with the lineSep option. See:
👍 1
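
A rough sketch of the lineSep suggestion, assuming an existing SparkSession is passed in (a Databricks notebook normally provides one as spark). The helper names and the path argument are hypothetical. As far as I know, Spark's CSV reader limits lineSep to a single character and, when the option is left unset, already accepts \r, \r\n and \n while reading, so the sketch rejects longer separators up front.

```python
def csv_line_sep_options(line_sep: str = "\n", header: bool = True) -> dict:
    """Build reader options that set the line separator explicitly.

    Assumption: the CSV source accepts only a one-character lineSep,
    so anything longer (such as "\r\n") is rejected here.
    """
    if len(line_sep) != 1:
        raise ValueError("lineSep is assumed to be exactly one character for CSV")
    return {"lineSep": line_sep, "header": str(header).lower()}


def read_csv_with_line_sep(spark, path: str, line_sep: str = "\n"):
    """Read a CSV file using the given line separator.

    spark is an existing SparkSession; path is a hypothetical location.
    """
    return spark.read.options(**csv_line_sep_options(line_sep)).csv(path)
```

If the file really has CRLF endings, leaving lineSep unset may already be enough; setting it explicitly is mainly useful when the file uses a single unusual separator character.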