Dhruvil Shah
03/24/2023, 12:16 AM
Chandra
03/24/2023, 9:00 AM
Dhruvil Shah
03/26/2023, 1:24 PM
`delta.logRetentionDuration` setting, you can use a combination of Glue Data Catalog settings and custom classifiers. Here are the steps to configure a Glue crawler to handle `delta.logRetentionDuration`:
1. Create a custom classifier to read Delta Lake tables. In the Glue Console, navigate to the "Classifiers" section and create a new classifier. For the classifier type, select "JSON". In the "JSON path" field, enter `$[*].delta.logRetentionDuration`.
2. Add the custom classifier to your Glue crawler. In the "Crawlers" section of the Glue Console, edit your crawler and navigate to the "Classifiers" section. Add the custom classifier you created in step 1 to the crawler's list of classifiers.
3. Set up the Glue Data Catalog to handle `delta.logRetentionDuration`. In the Glue Console, navigate to the "Databases" section and select the database containing your Delta Lake tables. Click on "Edit database" and add a new property with the key `delta.logRetentionDuration` and the desired retention period in the format `n unit` (e.g. `7 days`, `1 week`, `168 hours`).
4. Run the Glue crawler to create or update the table metadata in the Glue Data Catalog. The crawler will read the Delta Lake table's metadata, including the `delta.logRetentionDuration` setting, and store it in the Glue Data Catalog.
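If you would rather script these steps than click through the console, here is a rough sketch using boto3. The classifier, crawler, and database names are placeholders I made up, and I have not run this against a real account, so treat it as a starting point rather than a tested recipe. The request payloads are built separately from the API calls so you can inspect them first.

```python
# Hypothetical names (not from the thread); replace with your own resources.
CLASSIFIER_NAME = "delta-log-retention-classifier"
CRAWLER_NAME = "my-delta-crawler"
DATABASE_NAME = "my_delta_db"


def build_requests(retention="7 days"):
    """Return the keyword arguments for each Glue call, keyed by operation, in step order."""
    return {
        # Step 1: JSON classifier with the JSON path from the steps above.
        "create_classifier": {
            "JsonClassifier": {
                "Name": CLASSIFIER_NAME,
                "JsonPath": "$[*].delta.logRetentionDuration",
            }
        },
        # Step 2: attach the classifier to an existing crawler.
        "update_crawler": {"Name": CRAWLER_NAME, "Classifiers": [CLASSIFIER_NAME]},
        # Step 3: store the retention value as a database parameter.
        # Assumption: update_database may overwrite other database fields,
        # so merge in the existing definition before using this for real.
        "update_database": {
            "Name": DATABASE_NAME,
            "DatabaseInput": {
                "Name": DATABASE_NAME,
                "Parameters": {"delta.logRetentionDuration": retention},
            },
        },
        # Step 4: run the crawler.
        "start_crawler": {"Name": CRAWLER_NAME},
    }


def apply_requests(requests):
    """Send each request to Glue; needs boto3 and AWS credentials."""
    import boto3  # imported here so build_requests works without AWS access

    glue = boto3.client("glue")
    for operation, kwargs in requests.items():
        getattr(glue, operation)(**kwargs)
```

Calling `build_requests("168 hours")` and printing the result lets you check the payloads before passing them to `apply_requests`.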
With these steps, your Glue crawler will be able to read and handle the `delta.logRetentionDuration` setting for your Delta Lake tables. This information will be stored in the Glue Data Catalog, and you can use it to query and analyze your Delta Lake data with other AWS services such as Athena and QuickSight.
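Since step 3 expects the value in an `n unit` form, it can help to sanity-check the string before saving it. The helper below is a small sketch of my own, not Delta Lake's parser: it only understands the hours, days, and weeks units used in the examples above, and the function name is hypothetical.

```python
import re
from datetime import timedelta

# Only the units that appear in the examples above; this is an assumption,
# not the full set of units Delta Lake may accept.
_UNITS = {
    "hour": timedelta(hours=1),
    "day": timedelta(days=1),
    "week": timedelta(weeks=1),
}
# A whole number, whitespace, then a unit name with an optional plural "s".
_PATTERN = re.compile(r"^\s*(\d+)\s+([a-z]+?)s?\s*$", re.IGNORECASE)


def parse_retention(value):
    """Convert a string such as '7 days' into a timedelta, or raise ValueError."""
    match = _PATTERN.match(value)
    if not match:
        raise ValueError(f"expected 'n unit', got {value!r}")
    count = int(match.group(1))
    unit = match.group(2).lower()
    if unit not in _UNITS:
        raise ValueError(f"unsupported unit in {value!r}")
    return count * _UNITS[unit]


# The three example values describe the same period.
assert parse_retention("7 days") == parse_retention("1 week") == parse_retention("168 hours")
```

A value that fails this check (for example a misspelled unit) raises `ValueError` before it ever reaches the database property.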