Al Nick Ortiz05/23/2023, 1:29 PM
Al Nick Ortiz05/23/2023, 1:30 PM
Al Nick Ortiz05/23/2023, 1:31 PM
liab05/23/2023, 5:27 PM
GapyNi05/23/2023, 5:45 PM
, where we do
in the next step to merge them into the silver layer. Now the question: as I looked into the metrics, it seems that the process read all the
(from the bronze layer), which does not exactly match the
from the silver layer when performing the
merge command. Do you know what the reason could be? Thanks and regards, GapyNi
Daniel Bariudin05/24/2023, 8:17 AM
For a given root path:
if a sub-path
has a `_delta_log` directory, the function returns True. The documentation provided is: "classmethod *`isDeltaTable`*(sparkSession: pyspark.sql.session.SparkSession, identifier: str) → bool Check if the provided identifier string, in this case a file path, is the root of a Delta table using the given SparkSession." (https://docs.delta.io/latest/api/python/index.html) So the question is: if the root path I provided doesn't contain a Delta table, but a sub-path does contain one and `isDeltaTable` still returns True, is this not a bug? Or am I missing something?
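What the question expects is a strict root-only check. A Delta table keeps its transaction log in a `_delta_log` directory directly under the table root, so that check can be sketched as below (the function name and the pure-filesystem approach are mine; the real `isDeltaTable` resolves the path through the SparkSession):

```python
from pathlib import Path

def is_delta_table_root(path: str) -> bool:
    """Strict root-only check: True only when `path` itself holds a
    _delta_log directory, not when some sub-directory does."""
    return (Path(path) / "_delta_log").is_dir()
```

With this check, a root whose only Delta table lives in a sub-directory returns False, which is the behavior the question argues for.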
Simon Thelin05/24/2023, 9:30 AM
Soukaina05/24/2023, 11:10 AM
ERIC HAMMEL05/24/2023, 12:41 PM
Ainesh Pandey05/24/2023, 4:12 PM
column) as a Delta table in Databricks?
Suraj Malthumkar05/25/2023, 7:22 AM
Also, when I am running this (the code above), it does not read the parquet files, even though the log and checkpoint files point to the registered parquet files. How do I read them using the Delta connector? 2nd way: reading Parquet data (single JVM):
System.out.println("Delta Read");
spark.read().format("delta")
    .load("s3a://delta-laketest/spark_table")
    .show();
The code above was able to read the parquet data files after they were committed to the Delta table via the standalone library. Is this the correct way to read parquet data files using the Delta connector when the files were committed via the standalone library? I would like to understand the difference between the two ways. I am pretty new to Delta, please guide me through this. Thank you for your help! :)
// Read through the Delta Standalone library
DeltaLog log = DeltaLog.forTable(conf, "s3a://delta-laketest/my_table");
CloseableIterator<RowRecord> rowItr = log.snapshot().open(); // update().open();
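Both ways end up at the same parquet files; the difference is who replays the log. `spark.read().format("delta")` lets the Delta Spark connector resolve the snapshot inside a Spark job, while `DeltaLog.forTable(...).snapshot().open()` does it through Delta Standalone in a single JVM without Spark. Conceptually, resolving a snapshot means replaying the add/remove actions in `_delta_log`; a simplified sketch (the function name is mine, and it ignores checkpoints, assuming the full log is present as JSON commits):

```python
import json
from pathlib import Path

def active_parquet_files(table_root: str) -> set:
    """Replay the JSON commits in _delta_log and return the parquet files
    that make up the current snapshot (files added and not later removed)."""
    files = set()
    log_dir = Path(table_root) / "_delta_log"
    for commit in sorted(log_dir.glob("*.json")):
        for line in commit.read_text().splitlines():
            action = json.loads(line)
            if "add" in action:
                files.add(action["add"]["path"])
            elif "remove" in action:
                files.discard(action["remove"]["path"])
    return files
```

Reading the parquet files directly (without this replay) can return stale or removed files, which is why both connectors insist on going through the log.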
bharat chaudhury05/25/2023, 8:17 AM
instead.

Collecting delta-spark==2.3.0
Cache entry deserialization failed, entry ignored
Using cached https://files.pythonhosted.org/packages/34/9e/c06f3b701de4746defc240fe7a2cc973f7bbfaa8fa17d57e045868c16925/delta_spark-2.3.0-py3-none-any.whl
Collecting pyspark<3.4.0,>=3.3.0 (from delta-spark==2.3.0)
Cache entry deserialization failed, entry ignored
Could not find a version that satisfies the requirement pyspark<3.4.0,>=3.3.0 (from delta-spark==2.3.0) (from versions: 2.1.2, 2.1.3, 2.2.0.post0, 2.2.1, 2.2.2, 2.2.3, 2.3.0, 2.3.1, 2.3.2, 2.3.3, 2.3.4, 2.4.0, 2.4.1, 2.4.2, 2.4.3, 2.4.4, 2.4.5, 2.4.6, 2.4.7, 2.4.8, 3.0.0, 3.0.1, 3.0.2, 3.0.3, 3.1.1, 3.1.2, 3.1.3, 3.2.0, 3.2.1, 3.2.2, 3.2.3, 3.2.4)
No matching distribution found for pyspark<3.4.0,>=3.3.0 (from delta-spark==2.3.0)

According to the docs:
pip3 install --user
Alber Tadrous05/25/2023, 4:17 PM
Albert Wong05/25/2023, 6:46 PM
Vigneshraja Palaniraj05/25/2023, 8:56 PM
Kashyap Bhatt05/25/2023, 9:03 PM
? Our use case: we're running some e2e tests (written using
) in a Jenkins pipeline that create some Delta tables and perform merges, etc. The Jenkins agent is a Docker image with
as base (full Dockerfile attached). The problem is that snappy doesn't like the environment, and we get the following error (more complete stack trace attached):
I've tried many workarounds (to make snappy use some other writable tmp folder), but none worked. So I'm hoping to create a Docker image where I can run a simple Python script that creates a Delta table and writes to it. Thank you!
py4j.protocol.Py4JJavaError: An error occurred while calling o162.saveAsTable.
: org.apache.spark.SparkException: Job aborted.
  at org.apache.spark.sql.errors.QueryExecutionErrors$.jobAbortedError(QueryExecutionErrors.scala:638)
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 9.0 failed 1 times, most recent failure: Lost task 0.0 in stage 9.0 (TID 13) (k-ci-fgao-2ffxcdata-10-2franking-report-by-session-type-1-wx95c executor driver): org.apache.spark.SparkException: Task failed while writing rows.
  at org.apache.spark.sql.errors.QueryExecutionErrors$.taskFailedWhileWritingRowsError(QueryExecutionErrors.scala:642)
Caused by: java.lang.IllegalArgumentException
  at java.nio.Buffer.limit(Buffer.java:275)
  at org.xerial.snappy.Snappy.compress(Snappy.java:156)
  at org.apache.parquet.hadoop.codec.SnappyCompressor.compress(SnappyCompressor.java:78)
  ... 19 more
Suraj Malthumkar05/26/2023, 1:52 AM
Adishesh Kishore05/26/2023, 6:12 AM
Is there some way for me to specify the minReader and minWriter versions?
No delta log found for the Delta table at
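On the protocol-version question above: Delta exposes these as the table properties `delta.minReaderVersion` and `delta.minWriterVersion`, settable in `TBLPROPERTIES` at CREATE time or via `ALTER TABLE ... SET TBLPROPERTIES`. The current values live in the `protocol` action of the transaction log; a minimal, checkpoint-ignoring sketch for inspecting them (the function name is mine, and it assumes the full log is present as JSON commits):

```python
import json
from pathlib import Path

def table_protocol(table_root: str):
    """Scan the _delta_log commits and return the table's current
    (minReaderVersion, minWriterVersion) from the latest protocol action."""
    reader, writer = None, None
    log_dir = Path(table_root) / "_delta_log"
    for commit in sorted(log_dir.glob("*.json")):
        for line in commit.read_text().splitlines():
            action = json.loads(line)
            if "protocol" in action:
                reader = action["protocol"]["minReaderVersion"]
                writer = action["protocol"]["minWriterVersion"]
    if reader is None:
        raise ValueError("no protocol action found in the delta log")
    return reader, writer
```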
Bryce Bartmann05/26/2023, 6:47 AM
Roshan Punnoose05/26/2023, 10:56 AM
Sagar Singh Rawal(TECH-BLR)05/26/2023, 2:23 PM
Rosmery Valle Ortiz05/26/2023, 2:37 PM
Afonso de Paula Feliciano05/26/2023, 6:30 PM
Is there a way to write Delta using NullType? I did some searching but haven't found anything so far.
Delta doesn't accept NullType in the schema for streaming writes.
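One common workaround, assuming the null-typed columns carry no data that must stay typeless, is to cast them to a concrete type (for example string) before writing. The helper below is a hypothetical sketch that rewrites a flat Spark-style JSON schema; "null" and "void" are the two spellings Spark uses for NullType in schema JSON, depending on version:

```python
def replace_null_types(schema: dict, fallback: str = "string") -> dict:
    """Return a copy of a flat Spark-style JSON schema in which every
    null-typed field is given `fallback` as its type, so the DataFrame
    can be cast to this schema before writing to Delta."""
    fields = []
    for field in schema.get("fields", []):
        new_field = dict(field)
        if new_field.get("type") in ("null", "void"):
            new_field["type"] = fallback
        fields.append(new_field)
    return {**schema, "fields": fields}
```

On the DataFrame itself the same idea is `df.withColumn(c, col(c).cast("string"))` for each null-typed column `c`.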
Tuan Nguyen05/27/2023, 12:12 AM
. Has anyone had this problem before? Running `SHOW COLUMNS IN table_name` in Athena shows a list of column names as expected.
Ahmad Dorri05/27/2023, 4:39 PM
Suraj Malthumkar05/30/2023, 1:37 AM
Hana BOUACILA05/30/2023, 1:58 PM
ritwik singh05/31/2023, 3:11 AM
Divyansh Jain05/31/2023, 5:40 AM
Suraj Malthumkar05/31/2023, 8:29 AM