Ryan Aston
09/04/2023, 5:29 PM
Will Jones
09/04/2023, 6:20 PM
> 1. get the last checkpoint
Do you just want to know the most recent version that has a checkpoint? Or do you want the contents of the checkpoint file?
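If the goal is the first option (the most recent version that has a checkpoint), one approach is to list the `_delta_log` directory and take the highest version among the checkpoint files. The Delta protocol names these `<version>.checkpoint.parquet` (or `<version>.checkpoint.<part>.<parts>.parquet` for multi-part checkpoints), with the version zero-padded to 20 digits; it also defines a `_last_checkpoint` file pointing at the latest checkpoint, which this sketch does not read. The helper names `checkpoint_version` and `latest_checkpoint` are hypothetical, and the sketch works on plain file names rather than calling any delta-rs API.

```rust
/// Hypothetical helper: returns the version encoded in a Delta checkpoint file
/// name, or `None` if the name is not a checkpoint. Handles single-part
/// (`<v>.checkpoint.parquet`) and multi-part
/// (`<v>.checkpoint.<part>.<parts>.parquet`) names.
fn checkpoint_version(file_name: &str) -> Option<i64> {
    // Split "<version>" from "checkpoint....parquet"; names without a dot are skipped.
    let (version, rest) = file_name.split_once('.')?;
    if !rest.starts_with("checkpoint.") || !rest.ends_with(".parquet") {
        return None;
    }
    // "00000000000000000010" parses to 10; non-numeric prefixes yield None.
    version.parse().ok()
}

/// Hypothetical helper: most recent checkpointed version among the
/// `_delta_log` entry names, if any checkpoint exists.
fn latest_checkpoint<'a>(names: impl IntoIterator<Item = &'a str>) -> Option<i64> {
    names.into_iter().filter_map(checkpoint_version).max()
}

fn main() {
    let log = [
        "00000000000000000009.json",
        "00000000000000000010.checkpoint.parquet",
        "00000000000000000010.json",
        "00000000000000000011.json",
        "_last_checkpoint",
    ];
    println!("{:?}", latest_checkpoint(log.iter().copied())); // Some(10)
}
```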
> 2. get the total size of the table at the current loaded version
You can compute this by summing the `size_bytes` column in `add_actions_table()`:
https://docs.rs/deltalake/0.14.0/deltalake/table_state/struct.DeltaTableState.html#method.add_actions_table
> 3. get the size of the parquet file created from the json writer’s flush and commit
I don't think that's part of the API, but it seems like a reasonable request if you want to open a GitHub issue.
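A std-only sketch of the summation for question 2: the nullable `size_bytes` column is modeled here as a slice of `Option<i64>`, where a null entry becomes `None`. With the real crate you would first read that column from the `RecordBatch` returned by `add_actions_table()` and downcast it to an Arrow integer array (assumed here to be 64-bit). `total_table_size` is a hypothetical helper name.

```rust
/// Hypothetical helper: `size_bytes` stands in for the `size_bytes` column of
/// `add_actions_table()`; `None` models a null entry.
fn total_table_size(size_bytes: &[Option<i64>]) -> i64 {
    // Skip nulls and add up the byte sizes of all active data files.
    size_bytes.iter().flatten().sum()
}

fn main() {
    let column: Vec<Option<i64>> = vec![Some(1_024), Some(2_048), None, Some(512)];
    println!("table size: {} bytes", total_table_size(&column)); // 3584
}
```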
Ryan Aston
09/04/2023, 6:35 PM
`buffer_len` function to return the current size of the parquet buffers in bytes, but the value is always returned as zero. I am successfully writing data, so I’m not sure why it never shows any bytes. I’m printing the value of `buffer_len` after a `.write()` but before the `.flush_and_commit()`:
```rust
// write() and flush_and_commit() are provided by the DeltaWriter trait
use deltalake::writer::{DeltaWriter, JsonWriter};

let mut wrtr: JsonWriter = JsonWriter::for_table(&table).unwrap();
wrtr.write(events_json.clone()).await.unwrap();
println!("size of buffers - {}", wrtr.buffer_len()); // always prints '0'
wrtr.flush_and_commit(&mut table).await.unwrap();
```
Will Jones
09/04/2023, 8:35 PM
`buffered_record_batch_count` should be non-zero for you; is it?
Ryan Aston
09/05/2023, 2:17 AM
`buffer_len()` returns 0 and `buffered_record_batch_count()` returns 1. The description for `write()` is “Writes the given values to internal parquet buffers for each represented partition”, and the description for `buffer_len()` is “Returns the current byte length of the in memory buffer. This may be used by the caller to decide when to finalize the file write”, which implies it would be the size of the data waiting to be flushed to disk, but that doesn’t seem to be the case. For now I’m working around it by calling the helper function I created for question 2 just before and just after I flush, and taking the difference.
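A sketch of that before/after workaround, assuming each snapshot maps file path to `size_bytes` (for example, built from the `path` and `size_bytes` columns of `add_actions_table()`). Comparing paths instead of totals gives the bytes of newly added files even when the same commit also removes files; for an append-only JSON writer commit the two approaches should agree. `Snapshot`, `total`, and `bytes_added` are hypothetical names.

```rust
use std::collections::HashMap;

/// Hypothetical snapshot of the active files: path -> size_bytes.
type Snapshot = HashMap<String, i64>;

/// Total size of all active files in a snapshot.
fn total(s: &Snapshot) -> i64 {
    s.values().sum()
}

/// Bytes in files that are active after the commit but were not active before.
/// Unlike `total(after) - total(before)`, this is not reduced by files the
/// same commit removed.
fn bytes_added(before: &Snapshot, after: &Snapshot) -> i64 {
    after
        .iter()
        .filter(|(path, _)| !before.contains_key(*path))
        .map(|(_, size)| size)
        .sum()
}

fn main() {
    // A commit that removed a.parquet and added c.parquet (e.g. an overwrite).
    let before: Snapshot = HashMap::from([
        ("a.parquet".to_string(), 100),
        ("b.parquet".to_string(), 200),
    ]);
    let after: Snapshot = HashMap::from([
        ("b.parquet".to_string(), 200),
        ("c.parquet".to_string(), 300),
    ]);
    println!("net change: {}", total(&after) - total(&before)); // 200
    println!("bytes added: {}", bytes_added(&before, &after)); // 300
}
```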
For 1, I’m working around it by creating my own helper function, inspired by these two functions in the delta-rs source:
• https://github.com/delta-io/delta-rs/blob/787c13a63efa9ada96d303c10c093424215aaa80/rust/src/action/mod.rs#L910
• https://github.com/delta-io/delta-rs/blob/787c13a63efa9ada96d303c10c093424215aaa80/rust/tests/checkpoint_writer.rs#L59
Will Jones
09/07/2023, 9:21 PM
> but is there a way to get only the add actions that occurred as part of a particular version commit?
No, I don't think there is.
> Also, should I be scoping it to active adds only (would not doing so not take into account deletes)?
The table only includes add actions that are active (not any that have been removed).
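The point about active adds follows from how the Delta log is replayed: an `add` makes a file part of the table and a later `remove` takes it out, so a snapshot's add actions already exclude deleted files. The std-only model below (the `Action` enum, `active_files`, and `adds_in_commit` are hypothetical, not delta-rs types) shows that replay, and also what restricting to a single commit would mean: looking only at the actions recorded for that version, which in a real table live in that version's commit file under `_delta_log`.

```rust
use std::collections::HashMap;

/// Simplified, hypothetical model of the two log actions that matter here.
enum Action {
    Add { path: String, size_bytes: i64 },
    Remove { path: String },
}

/// Replays commits in version order: an `Add` makes a file active and a
/// `Remove` drops it, so only files still active at the end are returned.
fn active_files(commits: &[Vec<Action>]) -> HashMap<&str, i64> {
    let mut active = HashMap::new();
    for commit in commits {
        for action in commit {
            match action {
                Action::Add { path, size_bytes } => {
                    active.insert(path.as_str(), *size_bytes);
                }
                Action::Remove { path } => {
                    active.remove(path.as_str());
                }
            }
        }
    }
    active
}

/// Add actions recorded in a single commit (one version), ignoring removes.
fn adds_in_commit(commit: &[Action]) -> Vec<(&str, i64)> {
    commit
        .iter()
        .filter_map(|a| match a {
            Action::Add { path, size_bytes } => Some((path.as_str(), *size_bytes)),
            Action::Remove { .. } => None,
        })
        .collect()
}

fn main() {
    let commits = vec![
        vec![Action::Add { path: "a.parquet".into(), size_bytes: 100 }],
        vec![Action::Add { path: "b.parquet".into(), size_bytes: 200 }],
        vec![
            Action::Remove { path: "a.parquet".into() },
            Action::Add { path: "c.parquet".into(), size_bytes: 50 },
        ],
    ];
    let total: i64 = active_files(&commits).values().sum();
    println!("active bytes: {}", total); // 250
    println!("adds in version 2: {:?}", adds_in_commit(&commits[2])); // [("c.parquet", 50)]
}
```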
Ryan Aston
09/07/2023, 10:25 PM
Will Jones
09/07/2023, 10:54 PM