
Dhruvil Shah

03/24/2023, 12:16 AM
Has anyone created a Glue custom classifier for log retention, and also set up something to run VACUUM automatically?

Chandra

03/24/2023, 9:00 AM
A custom classifier to crawl logs? Can you elaborate on the vacuum part?

Dhruvil Shah

03/26/2023, 1:24 PM
Here are my requirements: I would like to set a log and data retention policy for Delta. Here is what I found from ChatGPT:

To set up a Glue crawler to read Delta Lake tables and handle the `delta.logRetentionDuration` setting, you can use a combination of Glue Data Catalog settings and custom classifiers. Here are the steps to configure a Glue crawler to handle `delta.logRetentionDuration`:

1. Create a custom classifier to read Delta Lake tables. In the Glue Console, navigate to the "Classifiers" section and create a new classifier. For the classifier type, select "JSON". In the "JSON path" field, enter `$[*].delta.logRetentionDuration`.
2. Add the custom classifier to your Glue crawler. In the "Crawlers" section of the Glue Console, edit your crawler and navigate to the "Classifiers" section. Add the custom classifier you created in step 1 to the crawler's list of classifiers.
3. Set up the Glue Data Catalog to handle `delta.logRetentionDuration`. In the Glue Console, navigate to the "Databases" section and select the database containing your Delta Lake tables. Click on "Edit database" and add a new property with the key `delta.logRetentionDuration` and the desired retention period in the format of `n unit` (e.g. `7 days`, `1 week`, `168 hours`).
4. Run the Glue crawler to create or update the table metadata in the Glue Data Catalog. The crawler will read the Delta Lake table's metadata, including the `delta.logRetentionDuration` setting, and store it in the Glue Data Catalog.

With these steps, your Glue crawler will be able to read and handle the `delta.logRetentionDuration` setting for your Delta Lake tables. This information will be stored in the Glue Data Catalog, and you can use it to query and analyze your Delta Lake data with other AWS services such as Athena and QuickSight.
Then I want an automatic job that runs VACUUM to delete the expired files.
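
For what it's worth, retention in Delta is normally set as table properties on the Delta table itself (`delta.logRetentionDuration` for the transaction log, `delta.deletedFileRetentionDuration` for removed data files), and VACUUM only deletes unreferenced data files; expired log entries are cleaned up by Delta when it writes checkpoints. A minimal sketch of a scheduled maintenance step, assuming a Spark session that already has Delta Lake configured (for example inside a Glue or other scheduled Spark job), could look like this. The function names, the table name, and the retention values are hypothetical, chosen only for illustration.

```python
"""Sketch: set Delta retention properties, then VACUUM the table."""


def retention_sql(table: str, log_retention: str, data_retention: str) -> str:
    """Build an ALTER TABLE statement that sets both Delta retention properties."""
    return (
        f"ALTER TABLE {table} SET TBLPROPERTIES ("
        f"'delta.logRetentionDuration' = 'interval {log_retention}', "
        f"'delta.deletedFileRetentionDuration' = 'interval {data_retention}')"
    )


def vacuum_sql(table: str, retain_hours: int) -> str:
    """Build a VACUUM statement with an explicit retention window in hours."""
    if retain_hours < 0:
        raise ValueError("retain_hours must be non-negative")
    return f"VACUUM {table} RETAIN {retain_hours} HOURS"


def run_maintenance(spark, table: str, log_retention: str,
                    data_retention: str, retain_hours: int) -> None:
    """Apply the retention properties, then vacuum the table.

    `spark` is assumed to be a SparkSession with the Delta Lake
    extensions enabled; it only needs a `sql(str)` method here.
    """
    spark.sql(retention_sql(table, log_retention, data_retention))
    spark.sql(vacuum_sql(table, retain_hours))
```

Calling `run_maintenance(spark, "my_db.my_table", "30 days", "7 days", 168)` from a job that runs on a schedule would cover the automatic part. Note that Delta rejects a VACUUM retention shorter than the 7-day default unless its retention safety check is disabled, so 168 hours is the lowest value that works without extra configuration.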