Riccardo Delegà

07/24/2023, 7:42 AM
Hi! Is there a way to define a table schema with StructType instead of a DDL string and mark one of its fields as a generated column? Right now the schema looks like this:
from pyspark.sql.types import StringType, StructField, StructType

StructType(
    [
        StructField(
            "id",
            StringType(),
            metadata={
                "comment": "Comment for this field"
            },
        ),
        # other fields
    ]
)
I was hoping there is a way of adding a key to the metadata dict to define a certain StructField as a Delta Lake generated column. Thanks for your help!

Tom van Bussel

07/24/2023, 7:46 AM
DeltaTableBuilder.addColumns allows passing in a StructType.
Alternatively, if you just want to be able to create a generated column then the DeltaColumnBuilder API can be used, which exposes generatedAlwaysAs and comment. That way you don’t have to manually create the metadata.

Riccardo Delegà

07/24/2023, 7:49 AM
I am feeding this to a Delta Live Tables pipeline, so the schema has to be either a StructType or a DDL string.

Tom van Bussel

07/24/2023, 7:50 AM
DeltaColumnBuilder produces a StructField if you call build.

Riccardo Delegà

07/24/2023, 7:52 AM
I didn't know, I'll look into it! Are you certain it carries over generated column information?

Tom van Bussel

07/24/2023, 7:54 AM
Yes, setting metadata like the comment and the expression for the generated column is the entire goal of this API, so it would be weird if it didn’t…

Riccardo Delegà

07/24/2023, 7:59 AM
I'm checking, and it looks like the DeltaColumnBuilder API isn't exposed to Python, so I would probably have to go through py4j to do this... I was asking whether it carries over because, if calling build simply produces a StructField with some additional metadata, I could try creating that StructField directly without going through delta (which isn't a dependency of the project right now, as everything in that department is handled by DLT).
Ok, I figured out how to solve this. The metadata dict key to use in this case is delta.generationExpression. It's documented here in the protocol definition. Thanks @Tom van Bussel for your help!
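
For anyone landing on this thread later, here is a minimal sketch of what the solution above might look like. Only the delta.generationExpression key comes from the thread. The helper name generated_column_metadata, the event_time and event_date columns, and the CAST expression are hypothetical, chosen for illustration. Whether a given DLT runtime honors the key should be checked against your own pipeline.

```python
from typing import Optional

# Metadata key named in the thread above (from the Delta protocol definition).
GENERATION_EXPRESSION_KEY = "delta.generationExpression"


def generated_column_metadata(expression: str, comment: Optional[str] = None) -> dict:
    """Return StructField metadata that marks a column as generated.

    `expression` is the SQL generation expression, passed as a string.
    This helper is hypothetical; it only assembles a plain dict.
    """
    metadata = {GENERATION_EXPRESSION_KEY: expression}
    if comment is not None:
        metadata["comment"] = comment
    return metadata


def build_schema():
    """Build an example schema; the column names are hypothetical."""
    # Imported here so the helper above can be used without Spark installed.
    from pyspark.sql.types import DateType, StructField, StructType, TimestampType

    return StructType(
        [
            StructField("event_time", TimestampType()),
            StructField(
                "event_date",
                DateType(),
                metadata=generated_column_metadata(
                    "CAST(event_time AS DATE)",
                    comment="Date derived from event_time",
                ),
            ),
        ]
    )
```

The StructType returned by build_schema() could then be passed wherever the pipeline expects a schema, without adding delta as a project dependency.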