Anita A (03/10/2023, 4:49 PM)
DeltaTable.isDeltaTable
I get this error:
object is not callable.
Thanks!
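
For reference, in the delta-spark Python API DeltaTable.isDeltaTable is called on the class itself with the active SparkSession and a path. A minimal sketch of that call shape (the path is a placeholder, and the Delta session configs are spelled out explicitly):

    from delta import configure_spark_with_delta_pip
    from delta.tables import DeltaTable
    from pyspark.sql import SparkSession

    # Build a SparkSession with the Delta extensions enabled.
    builder = (
        SparkSession.builder.appName("is-delta-check")
        .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
        .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    )
    spark = configure_spark_with_delta_pip(builder).getOrCreate()

    # isDeltaTable is a class-level method: pass the SparkSession and a path.
    print(DeltaTable.isDeltaTable(spark, "/tmp/some/table/path"))  # placeholder path

An "object is not callable" TypeError means Python is invoking something that is not callable, so one thing worth double-checking is that DeltaTable really is the class imported from delta.tables and has not been shadowed by an instance or another variable.
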
Christian Pfarr (03/13/2023, 9:54 PM)
AnalysisException: Table or view not found: default.delta_table;
It doesn't matter whether I use "default" as the namespace or my own name.
(Yes, the table data and delta log are there in MinIO, so everything is fine from a storage perspective, and I can read the table by path.)
Could you explain why I can only see my tables during a session, and what I could do to see my tables from other sessions as well?
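
Not an authoritative answer, but the usual pattern here: a table name only survives across sessions if the catalog behind spark_catalog is backed by a persistent metastore (for example a shared Hive metastore); with the default local Derby/in-memory metastore every new session starts with an empty catalog, even though the Delta files in MinIO are intact. A minimal sketch of registering the existing path as an external table by name, assuming such a persistent metastore and with the bucket/path as placeholders:

    from delta import configure_spark_with_delta_pip
    from pyspark.sql import SparkSession

    builder = (
        SparkSession.builder.appName("register-delta-table")
        .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
        .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")
        # Assumption: hive.metastore.uris points at a shared metastore;
        # otherwise the registration only lives in this session.
        .enableHiveSupport()
    )
    spark = configure_spark_with_delta_pip(builder).getOrCreate()

    # External (unmanaged) table: the metastore only stores the name -> location
    # mapping, the data stays where it already is.
    spark.sql("""
        CREATE TABLE IF NOT EXISTS default.delta_table
        USING DELTA
        LOCATION 's3a://my-bucket/path/to/delta_table'  -- placeholder bucket/path
    """)

    spark.table("default.delta_table").show()
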
pravin suryawanshi (03/14/2023, 9:16 PM)
Error in SQL statement: UnityCatalogServiceException: [RequestId= ErrorClass=INVALID_PARAMETER_VALUE.LOCATION_OVERLAP] Input path url 's3://<path>' overlaps with managed storage
while running this command:
GENERATE symlink_format_manifest FOR TABLE delta.`<path-to-delta-table>`
Is there any way to generate manifests and share them into Snowflake if the Delta tables are under Unity Catalog?
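
Not an authoritative answer on the Unity Catalog restriction itself, but for reference the manifest can also be generated through the DeltaTable API against a table referenced by name rather than by raw storage path (the error above is about the path-based reference). Whether that is permitted for a table governed by Unity Catalog depends on the catalog's own rules, so treat this as a sketch with a placeholder table name:

    from delta.tables import DeltaTable
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()  # assumes Delta is already configured

    # Look the table up by name and generate the symlink manifest that the
    # Snowflake external-table integration reads.
    delta_table = DeltaTable.forName(spark, "my_schema.my_delta_table")
    delta_table.generate("symlink_format_manifest")
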
Ajex (03/16/2023, 6:47 AM)
def zorder(spark: SparkSession, opts: AppOptions): Unit = {
  val path = opts.getString("path")
  val partitionCondition = opts.getString("partition_condition")
  val optimizeCols = opts.getString("optimize_cols").split(",").map(_.trim)

  // Compact and Z-order the selected partitions by the configured columns.
  val table = DeltaTable.forPath(spark, path)
  table.optimize()
    .where(partitionCondition)
    .executeZOrderBy(optimizeCols: _*)

  // Shorten the retention windows so vacuum can remove the replaced files right away
  // (the safety check has to be disabled for a retention below the default 7 days).
  spark.conf.set("spark.databricks.delta.retentionDurationCheck.enabled", false)
  spark.sql(s"ALTER TABLE delta.`${path}` SET TBLPROPERTIES('delta.deletedFileRetentionDuration' = 'interval 1 minute')")
  spark.sql(s"ALTER TABLE delta.`${path}` SET TBLPROPERTIES('delta.logRetentionDuration' = 'interval 1 day')")

  Thread.sleep(90000) // sleep 1.5 minutes before vacuum
  table.vacuum()
}
The problem is that the table I need to optimize has a lot of small files, around 400, because a batch job writes every 15 minutes and then partitions by 3 (it has 3 sources). When I try to optimize with Z-ordering, it uses a lot of resources and I still get the following issue:
"Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: ResultStage 15 (run at ForkJoinTask.java:1402) has failed the maximum allowable number of times: 4. Most recent failure reason: org.apache.spark.shuffle.MetadataFetchFailedException: Missing an output location for shuffle 2 partition 0".

Akeel Ahamed (03/16/2023, 2:02 PM)
With mode("overwrite") it works fine.
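
For context, a minimal sketch of the kind of overwrite write being referred to; the dataframe and output path are placeholders:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()  # assumes Delta is already configured

    df = spark.range(10)  # placeholder dataframe
    (
        df.write.format("delta")
          .mode("overwrite")                      # replaces the existing table contents
          .save("s3a://my-bucket/path/to/table")  # placeholder path
    )
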
Morgan (03/17/2023, 3:19 PM)
I use the docker image 'all-spark-notebook', which launches Spark in local mode with PySpark==3.3.2 and delta-spark==2.2.0, plus a MinIO docker image, all wrapped with docker-compose. Is it expected behavior?
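
For reference, a minimal sketch of the kind of local-mode PySpark 3.3.2 + delta-spark 2.2.0 + MinIO session described above; the endpoint, credentials, and bucket names are placeholders, and the hadoop-aws jar with a matching AWS SDK is assumed to be available on the classpath:

    from delta import configure_spark_with_delta_pip
    from pyspark.sql import SparkSession

    builder = (
        SparkSession.builder.appName("delta-minio-local")
        .master("local[*]")
        .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
        .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")
        # MinIO reached through the S3A connector (placeholder endpoint/credentials).
        .config("spark.hadoop.fs.s3a.endpoint", "http://minio:9000")
        .config("spark.hadoop.fs.s3a.access.key", "minioadmin")
        .config("spark.hadoop.fs.s3a.secret.key", "minioadmin")
        .config("spark.hadoop.fs.s3a.path.style.access", "true")
    )
    spark = configure_spark_with_delta_pip(builder).getOrCreate()

    # Quick round trip against a placeholder bucket.
    spark.range(5).write.format("delta").mode("overwrite").save("s3a://my-bucket/smoke-test")
    print(spark.read.format("delta").load("s3a://my-bucket/smoke-test").count())
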