Ian

01/20/2023, 6:37 AM
Hi Team, I am using the DeltaTableBuilder API to create a Delta table. It succeeds with exit code 0, but I can't see the Delta table at the location. Does this mean the table only exists within the SparkSession and only shows up once data is written to it? Is this the normal behavior?
from delta.tables import DeltaTable

# Create the table by name, with a schema, partition columns, and an explicit location
(DeltaTable.create(spark).tableName(delta_table_name).addColumns(data_schema)
    .partitionedBy(["partition_1", "partition_2", "partition_3"])
    .location(delta_table_location).execute())
Jon Stockham

01/20/2023, 9:33 AM
Not Team, but when I use the DeltaTableBuilder, it initialises an empty Delta table at the physical location, with commit 0 in the _delta_log folder. So no, I don't believe yours is behaving normally. The only difference between mine and yours is that I don't specify a .tableName(), only a .location().
Ian

01/20/2023, 9:37 AM
So that's like giving .location() a path to a directory. Would you be able to paste some sample code? Thanks!
Jon Stockham

01/20/2023, 9:42 AM
from delta.tables import DeltaTable

# No .tableName() here, just a physical path
(DeltaTable.createOrReplace(spark)
    .addColumns(schema)
    .location("s3://mybucket/path/to/table")
    .execute())
Ian

01/20/2023, 9:44 AM
Hmm, this way it seems to work for me too. When I tried creating with .tableName(), it does create a table, but nothing shows up physically at the location until I write to the table. I tried spark.sql("select * from table_name") on the created table and it does give me the schema.
Jon Stockham

01/20/2023, 9:48 AM
when you write to it, do you use the path or the table name?
Ian

01/20/2023, 9:48 AM
Table name... no, actually, I am giving the path to the Delta table.
Jon Stockham

01/20/2023, 9:51 AM
OK, that's what I guessed. I think DeltaTableBuilder is just ignoring the .location() option when .tableName() is provided and using the default location instead.
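(One way to verify where the table actually ended up, assuming it is registered in the metastore: Delta's DESCRIBE DETAIL reports the location recorded for the table. A minimal sketch, reusing the assumed delta_table_name from above:)

# If the builder ignored .location(), this should print the default
# warehouse path rather than delta_table_location
spark.sql(f"DESCRIBE DETAIL {delta_table_name}").select("location").show(truncate=False)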
Ian

01/20/2023, 9:53 AM
I see, so .location() is the way to go?
Jon Stockham

01/20/2023, 10:23 AM
Yeah, if that works for you. As to why it's behaving like that, I'm not sure, because according to this it should throw an exception:
if (DeltaTableUtils.isValidPath(tableId) && location.nonEmpty
    && tableId.table != location.get) {
  throw DeltaErrors.analysisException(
    s"Creating path-based Delta table with a different location isn't supported. " +
    s"Identifier: $identifier, Location: ${location.get}")
}
Ian

01/20/2023, 10:24 AM
I tried it just now. It does throw an exception if I do not mention .tableName(). With .tableName(), though, the Delta table seems to be present in memory, or I'm not sure how it's handling it, but it only shows up physically at the location once I write data to the table.
Jon Stockham

01/20/2023, 10:29 AM
Yes, the write will create the table at the path provided in the write command if it doesn't exist already.
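(A minimal sketch of such a path-based write, reusing the assumed delta_table_location and a hypothetical DataFrame df:)

# Writing by path creates the Delta table at that location if it doesn't exist yet
df.write.format("delta").mode("append").save(delta_table_location)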
Ian

01/20/2023, 10:32 AM
That's what I thought. So the create API is doing nothing but enforcing a schema.
Jon Stockham

01/20/2023, 10:33 AM
You're also defining the partitioning. I find it useful for that: you can set up your partitioning once, and then you don't have to worry about it in your data write code.
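(A sketch of that pattern, reusing the assumed names from earlier; createIfNotExists is the builder variant that leaves an existing table alone:)

# Partitioning is declared once, at table-creation time...
(DeltaTable.createIfNotExists(spark)
    .addColumns(data_schema)
    .partitionedBy("partition_1")
    .location(delta_table_location)
    .execute())

# ...so later appends don't need to repeat .partitionBy()
df.write.format("delta").mode("append").save(delta_table_location)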
Ian

01/20/2023, 10:35 AM
Yes, we can look at it that way. But then we are also forced to keep the partitioning specified when the table was created, so there goes flexibility.
Jon Stockham

01/20/2023, 10:37 AM
with both methods, you can't change the partition keys without rewriting the table AFAIK
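(For reference, such a full rewrite looks roughly like this. A sketch: new_partition_col is a placeholder, and overwriteSchema is the Delta write option documented for changing a table's partitioning on overwrite:)

# Changing partition keys means rewriting the whole table
(spark.read.format("delta").load(delta_table_location)
    .write.format("delta")
    .mode("overwrite")
    .option("overwriteSchema", "true")  # required when replacing the partitioning
    .partitionBy("new_partition_col")
    .save(delta_table_location))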
Ian

01/20/2023, 10:38 AM
Yes
Thanks for your time Jon 😉
Ryan Zhu

01/20/2023, 4:42 PM
Did you set the required configurations? See https://docs.delta.io/latest/quick-start.html#pyspark-shell
The behavior you mentioned here sounds like the configurations are missing, in which case the table gets created by Spark rather than Delta.
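(For reference, these are the session configurations the linked quick start calls for:)

from pyspark.sql import SparkSession

spark = (SparkSession.builder
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate())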
Ian

01/23/2023, 6:11 AM
Yes, I created a separate environment with the required configurations just to be sure, @Ryan Zhu.