Hi, I have two questions.
1. In my project, I need to ingest data for MLOps with Delta Lake. I have JSON-formatted event data and video data in blob storage.
To manage the JSON data, I'm considering storing each event's JSON as a string in a single Parquet cell, keyed by its ID. Then I can load and analyze it with pandas when needed. I would also need to update the event JSON inside the Parquet cell whenever an event changes. A rough sketch of what I mean is below.
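To make the idea concrete, here's a minimal sketch of the layout I have in mind (the file path and column names like `event_id` / `event_json` are just placeholders, not anything from an existing system):

```python
import json
import pandas as pd

# Hypothetical file and column names, purely illustrative.
PARQUET_PATH = "events.parquet"

# One row per event: the raw JSON lives in a single string cell.
df = pd.DataFrame(
    {
        "event_id": ["e1", "e2"],
        "event_json": [
            json.dumps({"type": "click", "ts": 1700000000}),
            json.dumps({"type": "view", "ts": 1700000100}),
        ],
    }
)
df.to_parquet(PARQUET_PATH, index=False)

# Later: load, update one event's JSON by ID, and write the file back out.
df = pd.read_parquet(PARQUET_PATH)
mask = df["event_id"] == "e1"
event = json.loads(df.loc[mask, "event_json"].iloc[0])
event["ts"] = 1700000200  # apply the update to the nested JSON
df.loc[mask, "event_json"] = json.dumps(event)
df.to_parquet(PARQUET_PATH, index=False)
```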
Please let me know your thoughts on my idea. Is it reasonable?
2. Is there any way to update Parquet data with delta-rs (without using Spark, of course)? So far I have been using pandas to merge new and existing records and then overwrite the whole dataset to apply updates, roughly as sketched below. However, given the large volume of event data being generated, it seems impractical to keep handling it that way.
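For reference, my current pandas approach looks roughly like this (paths, column names, and the `upsert_events` helper are placeholders standing in for my actual code):

```python
import pandas as pd

# Hypothetical path/column names mirroring my merge-then-overwrite flow.
PARQUET_PATH = "events.parquet"

def upsert_events(updates: pd.DataFrame) -> None:
    """Merge new/updated events into the existing data, then overwrite the file."""
    existing = pd.read_parquet(PARQUET_PATH)
    # Keep the newest version of each event: rows in `updates` win.
    merged = pd.concat([existing, updates]).drop_duplicates(
        subset="event_id", keep="last"
    )
    # The entire dataset is rewritten even if only a few events changed,
    # which is what becomes impractical at our data volume.
    merged.to_parquet(PARQUET_PATH, index=False)
```

What I'm hoping for is a way to do this kind of upsert directly on a Delta table through delta-rs, so I don't have to read and rewrite everything just to update a few events.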