
Kenny Ma

02/06/2023, 10:34 PM
@Krzysztof Chmielewski Do you have any performance or load benchmarks for the Flink Delta connector that you can share with us? Also, are there docs where I can read more about what the Delta connector does during job submission? I saw in the Flink client log that it reads a large number of rows, and I am trying to understand why it does that in the client.
2023-02-06 14:28:14,286 INFO  io.delta.standalone.internal.DeltaLogImpl                    [] - Loading version 34082 starting from checkpoint 34080.
2023-02-06 14:28:14,305 INFO  io.delta.standalone.internal.SnapshotImpl                    [] - [tableId=6ee75af7-8e09-4d40-b92a-2cb10a844f20] Created snapshot io.delta.standalone.internal.SnapshotImpl@4202bfe8
2023-02-06 14:28:15,756 INFO  org.apache.hadoop.fs.s3a.S3AInputStream                      [] - Switching to Random IO seek policy
2023-02-06 14:28:16,117 INFO  shadedelta.org.apache.parquet.hadoop.InternalParquetRecordReader [] - RecordReader initialized will read a total of 3094160 records.
2023-02-06 14:28:16,117 INFO  shadedelta.org.apache.parquet.hadoop.InternalParquetRecordReader [] - at row 0. reading next block
2023-02-06 14:28:24,143 INFO  org.apache.hadoop.io.compress.CodecPool                      [] - Got brand-new decompressor [.snappy]
2023-02-06 14:28:24,210 INFO  shadedelta.org.apache.parquet.hadoop.InternalParquetRecordReader [] - block read in memory in 8093 ms. row count = 639939
2023-02-06 14:28:28,277 INFO  shadedelta.org.apache.parquet.hadoop.InternalParquetRecordReader [] - Assembled and processed 639939 records from 38 columns in 3861 ms: 165.74437 rec/ms, 6298.286 cell/ms
2023-02-06 14:28:28,278 INFO  shadedelta.org.apache.parquet.hadoop.InternalParquetRecordReader [] - time spent so far 67% reading (8093 ms) and 32% processing (3861 ms)
2023-02-06 14:28:28,278 INFO  shadedelta.org.apache.parquet.hadoop.InternalParquetRecordReader [] - at row 639939. reading next block
2023-02-06 14:28:36,215 INFO  shadedelta.org.apache.parquet.hadoop.InternalParquetRecordReader [] - block read in memory in 7935 ms. row count = 629616
2023-02-06 14:28:39,398 INFO  shadedelta.org.apache.parquet.hadoop.InternalParquetRecordReader [] - Assembled and processed 1269555 records from 38 columns in 7018 ms: 180.89983 rec/ms, 6874.1934 cell/ms
2023-02-06 14:28:39,399 INFO  shadedelta.org.apache.parquet.hadoop.InternalParquetRecordReader [] - time spent so far 69% reading (16028 ms) and 30% processing (7018 ms)
2023-02-06 14:28:39,400 INFO  shadedelta.org.apache.parquet.hadoop.InternalParquetRecordReader [] - at row 1269555. reading next block
2023-02-06 14:28:47,303 INFO  shadedelta.org.apache.parquet.hadoop.InternalParquetRecordReader [] - block read in memory in 7902 ms. row count = 638410
👀 1
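For anyone reading the numbers in the log above, here is a minimal Python sketch that recomputes the throughput and the read/process split from the `InternalParquetRecordReader` lines. The two sample lines are copied from the log with timestamps and logger names trimmed; the helper names `throughput` and `read_share` are made up for illustration and are not part of Flink, Delta Standalone, or Parquet.

```python
import re

# Two "Assembled and processed" lines copied from the log above
# (timestamps and logger names trimmed).
LOG_LINES = [
    "Assembled and processed 639939 records from 38 columns in 3861 ms: "
    "165.74437 rec/ms, 6298.286 cell/ms",
    "Assembled and processed 1269555 records from 38 columns in 7018 ms: "
    "180.89983 rec/ms, 6874.1934 cell/ms",
]

PATTERN = re.compile(
    r"Assembled and processed (\d+) records from (\d+) columns in (\d+) ms"
)


def throughput(line):
    """Return (records per ms, cells per ms) for one log line, or None."""
    match = PATTERN.search(line)
    if match is None:
        return None
    records, columns, millis = map(int, match.groups())
    rec_per_ms = records / millis
    # A "cell" is one column value of one record.
    return rec_per_ms, rec_per_ms * columns


def read_share(read_ms, process_ms):
    """Fraction of the elapsed time spent reading blocks (vs. processing)."""
    return read_ms / (read_ms + process_ms)


if __name__ == "__main__":
    for line in LOG_LINES:
        rec, cell = throughput(line)
        print(f"{rec:.2f} rec/ms, {cell:.2f} cell/ms")
    # Cumulative read and processing times reported after the second block.
    print(f"read share: {read_share(16028, 7018):.1%}")
```

The record counts and processing times in these lines are cumulative, so the second line covers both blocks read so far. In both reports roughly two thirds of the time goes to reading blocks from S3 rather than to decoding them.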