Tejaswini Gorthi
02/28/2023, 2:38 PM
I am using timestamp-micros to sync timestamp columns, but it looks like the library is unable to read the data. I am getting the exception below:
scala.MatchError: LongValue(1677595007976000) (of class shadedelta.com.github.mjakubowski84.parquet4s.LongValue)
But I am able to read the data fine using an Azure Spark pool, and I can also read the parquet files fine. Is there a way to read timestamp-micros using the library?
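A sidestep sketch, assuming the raw long can be handled directly (the MatchError shows parquet4s surfacing the timestamp-micros value as a plain LongValue, i.e. epoch microseconds); the helper name is hypothetical:

from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def micros_to_datetime(micros: int) -> datetime:
    # Epoch microseconds -> timezone-aware datetime, without float rounding.
    return EPOCH + timedelta(microseconds=micros)

# The LongValue from the exception decodes to a plausible timestamp:
print(micros_to_datetime(1677595007976000))  # 2023-02-28 14:36:47.976000+00:00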
Matt Richards
03/01/2023, 3:53 PM

Douglas Pires Martins
03/01/2023, 6:23 PM

Ketan Khairnar
03/02/2023, 9:12 AM
Change Data Feed (CDF) includes the row data along with metadata indicating whether the specified row was inserted, deleted, or updated. This effectively doubles storage for tables with CDF enabled. Is that correct?
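For context, a minimal sketch of enabling and reading CDF (the table name is a placeholder). Per the Delta Lake docs, change data files are written only for updates, deletes, and merges; insert-only commits are served from the ordinary data files, so CDF does not double storage in general:

# Enable CDF on an existing table.
spark.sql("""
    ALTER TABLE my_table
    SET TBLPROPERTIES (delta.enableChangeDataFeed = true)
""")

# Read the recorded changes, including the _change_type metadata column.
changes = (spark.read.format("delta")
           .option("readChangeFeed", "true")
           .option("startingVersion", 0)
           .table("my_table"))
changes.show()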
Satyam Singh
03/02/2023, 2:07 PM

João Pinto
03/03/2023, 3:51 PM

Christina
03/03/2023, 6:47 PM

Lennart Skogmo
03/04/2023, 9:35 AM

Roee Nevo
03/05/2023, 1:16 PM
DROP COLUMN is only supported with v2 tables.
but I can't find anything about that… Can someone please explain to me what is going on and how to handle it?
Thanks!
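This error typically means the ALTER TABLE never reached Delta (the session extensions are missing), and even when it does, Delta's DROP COLUMN requires column mapping (Delta 1.2+). A sketch with a placeholder table name:

from pyspark.sql import SparkSession

# The Delta extensions must be active, otherwise Spark treats the
# table as a "v1" table and rejects DROP COLUMN outright.
spark = (SparkSession.builder
         .config("spark.sql.extensions",
                 "io.delta.sql.DeltaSparkSessionExtension")
         .config("spark.sql.catalog.spark_catalog",
                 "org.apache.spark.sql.delta.catalog.DeltaCatalog")
         .getOrCreate())

# DROP COLUMN additionally requires column mapping to be enabled.
spark.sql("""
    ALTER TABLE my_table SET TBLPROPERTIES (
        'delta.minReaderVersion' = '2',
        'delta.minWriterVersion' = '5',
        'delta.columnMapping.mode' = 'name'
    )
""")
spark.sql("ALTER TABLE my_table DROP COLUMN obsolete_col")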
Phúc Võ Hồng
03/06/2023, 2:19 AM

Luis F Takahashi
03/06/2023, 10:57 AM
from delta.tables import *
ModuleNotFoundError: No module named 'delta'
Here is how I added a new step:
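A common fix, sketched here on the assumption that the job uses pip-installed PySpark: install the delta-spark package and let it wire up the session, as in the Delta Lake quickstart:

# pip install delta-spark   (pick the release matching your Spark version)
from delta import configure_spark_with_delta_pip
from pyspark.sql import SparkSession

builder = (SparkSession.builder
           .appName("delta-example")
           .config("spark.sql.extensions",
                   "io.delta.sql.DeltaSparkSessionExtension")
           .config("spark.sql.catalog.spark_catalog",
                   "org.apache.spark.sql.delta.catalog.DeltaCatalog"))

# Adds the matching Delta jars to spark.jars.packages before starting.
spark = configure_spark_with_delta_pip(builder).getOrCreate()

from delta.tables import DeltaTable  # now resolves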
Artsiom Yudovin
03/07/2023, 12:55 PM

Kashyap Bhatt
03/07/2023, 5:10 PM
between condition (instead of = condition)? Or use something more optimal than between?

Roberto
03/07/2023, 7:21 PM

frederic lebeau
03/08/2023, 7:29 AM

Lucas Zago
03/08/2023, 12:55 PM
(df.write.format("delta")
    .mode("overwrite")
    .partitionBy("year_partition")
    .option("overwriteSchema", "true")
    .saveAsTable(path))
I'm looking to convert a parquet table into Delta before persisting; I do not know if it is a viable option.
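It can also be done without a rewrite: CONVERT TO DELTA adds a transaction log next to the existing parquet files. A sketch with a placeholder path, assuming the table is partitioned by year_partition:

# In-place conversion: the parquet files are kept, only the log is written.
spark.sql("""
    CONVERT TO DELTA parquet.`/path/to/table`
    PARTITIONED BY (year_partition INT)
""")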
Harun
03/08/2023, 6:25 PM
write_deltalake raises deltalake.PyDeltaTableError: Schema error: Invalid data type for Delta Lake: Null if there is a column with all None values.
Is there a way to solve this? E.g. define the schema for the table or define the column type?
import pandas as pd
from deltalake.writer import write_deltalake
data = [['tom', None], ['nick', None], ['juli', None]]
data = pd.DataFrame(data, columns=['Name', 'City'])
write_deltalake("s3a://.....", data, mode='overwrite')
Vladimir Prus
03/08/2023, 7:01 PM
import io.delta.tables._
val deltaTable = DeltaTable.forPath(spark, "/tmp/delta/people-10m")
// Declare the predicate by using a SQL-formatted string.
deltaTable.delete("birthDate < '1955-01-01'")
But since Hive Metastore already has a location for a table, I don’t want to repeat myself, and would rather have
val deltaTable = DeltaTable.fromMetastore("mart.my_table")
Is it possible?
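There is an API for this: DeltaTable.forName resolves a table through the metastore, in both the Scala and Python APIs. A Python sketch:

from delta.tables import DeltaTable

# Look the table up by its metastore name instead of its path.
delta_table = DeltaTable.forName(spark, "mart.my_table")
delta_table.delete("birthDate < '1955-01-01'")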
Lennart Skogmo
03/08/2023, 7:08 PM

Anmol Jain
03/08/2023, 9:38 PM
Cannot have map type columns in DataFrame which calls set operations
To reproduce:
SELECT * FROM
(
  SELECT struct('Spark', map(1, 2)) AS a
  UNION
  SELECT struct('Spark', map(1, 1)) AS a
)
Please do share best practices on handling these cases.
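Spark cannot compare map-typed columns, and UNION's implicit deduplication needs that comparison. A workaround sketch: UNION ALL performs no comparison, so the same query runs:

# UNION ALL skips the set-operation equality check that maps cannot satisfy.
spark.sql("""
    SELECT * FROM
    (
      SELECT struct('Spark', map(1, 2)) AS a
      UNION ALL
      SELECT struct('Spark', map(1, 1)) AS a
    )
""").show(truncate=False)

If true set semantics are needed, one option is to carry the map as an array of key/value structs (map_entries) while deduplicating, since arrays of structs are comparable, and rebuild the map afterwards with map_from_entries.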
Oscar Cassetti
03/09/2023, 12:52 AM
java.lang.ClassCastException: class org.apache.spark.sql.catalyst.plans.logical.DeleteFromTable cannot be cast to class org.apache.spark.sql.delta.commands.DeleteCommand (org.apache.spark.sql.catalyst.plans.logical.DeleteFromTable and org.apache.spark.sql.delta.commands.DeleteCommand are in unnamed module of loader 'app')
The code that causes this exception is something along the lines of
df.write.format('delta').mode('overwrite').option("mergeSchema", "false").option("overwriteSchema", "false")\
.partitionBy("date").option("replaceWhere", "date BETWEEN '2023-01-01' AND '2023-01-02' AND vertical like '%xyz%'").save("s3://")
running on Spark 3.3.1, Hadoop 3.4.4, delta-core_2.12-2.1.1.jar, and delta-storage-2.1.1.jar
Godel Kurt
03/09/2023, 9:29 AM

Mohit Yadav
03/09/2023, 12:13 PM
Predicate references non-partition column 'customerid'. Only the partition columns may be referenced: []
OPTIMIZE works for partition columns only?? I am using an unpartitioned table.
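For reference: the WHERE clause of OPTIMIZE only prunes partitions, which is why it fails with an empty partition-column list on an unpartitioned table. Clustering by a non-partition column such as customerid is what ZORDER BY is for; a sketch with a placeholder table name:

# ZORDER BY clusters data files on arbitrary columns; no WHERE needed.
spark.sql("OPTIMIZE my_table ZORDER BY (customerid)")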
Akshay Ghiya
03/09/2023, 2:05 PM
spark.sql("SELECT * from governor.governor_transactions VERSION AS OF 516")
Getting the following error
mismatched input 'AS' expecting {<EOF>, ';'}(line 1, pos 53)
Using the following properties
spark.sql.extensions='io.delta.sql.DeltaSparkSessionExtension'
spark.sql.catalog.spark_catalog='org.apache.spark.sql.delta.catalog.DeltaCatalog'
Please help.
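Two things worth checking, hedged: the VERSION AS OF SQL syntax only parses on Spark 3.3+ with Delta 2.1+, and the two properties must be set before the session starts. On older versions the DataFrame reader achieves the same time travel (the path below is a placeholder):

# Time travel without the SQL syntax.
df = (spark.read.format("delta")
      .option("versionAsOf", 516)
      .load("/path/to/governor_transactions"))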
Ritesh Malav
03/09/2023, 4:06 PM
We have a datalake of parquet files which we want to convert to deltalake format.
The existing folder structure in our datalake is like this:
user_data
  user_hash=1
    date=2022-01-01
    date=2022-01-02
  user_hash=2
    date=2022-01-01
    date=2022-01-02
  user_hash=3
    date=2022-01-01
    date=2022-01-02
  user_hash=4
    date=2022-01-01
    date=2022-01-02
Can someone suggest how I can do this conversion faster?
I have already tried the following code snippet:
data = spark.read.format("parquet").load("/data-pipeline")
data.write.format("delta").save("/tmp/delta/data-pipeline/")
but it gets rid of the folder structure and creates the deltalake format in a flat structure.
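Two approaches, sketched with illustrative paths (the partition column types are guesses from the folder names): convert in place, which only writes a transaction log and is the fast path, or rewrite with partitionBy so the folder layout is reproduced:

from delta.tables import DeltaTable

# Option 1: in-place conversion; the existing user_hash=/date= files stay put.
DeltaTable.convertToDelta(
    spark,
    "parquet.`/data-pipeline/user_data`",
    "user_hash INT, date DATE",
)

# Option 2: rewrite, reproducing the partition layout explicitly.
data = spark.read.format("parquet").load("/data-pipeline/user_data")
(data.write.format("delta")
     .partitionBy("user_hash", "date")
     .save("/tmp/delta/user_data"))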
Vibhor Gupta
03/09/2023, 5:36 PM

Ajex
03/10/2023, 8:04 AM

Adishesh Kishore
03/10/2023, 9:23 AM

Adishesh Kishore
03/10/2023, 9:24 AM

Martin
03/10/2023, 2:36 PM