Kashyap Bhatt
03/07/2023, 5:10 PMbetween
condition (instead of =
condition)? Or use something more optimal than between
?Matthew Powers
03/07/2023, 5:16 PMKashyap Bhatt
03/07/2023, 6:10 PMQuentin Ambard
03/07/2023, 9:47 PMDominique Brezinski
03/07/2023, 10:07 PMKashyap Bhatt
03/07/2023, 10:11 PMif you’re on databricks did you try DLT ?@Quentin Ambard, DLT would provide ways to populate facts and dimensions. I am talking about using/querying these tables. I already have many working ways to create/maintain my SCD tables. Besides the point, but DLT is too new to be used in our production system.
Dominique Brezinski
03/08/2023, 7:31 PMselect /*+ RANGE_JOIN(e, 365) */
e.name,
e.region,
year(s.sale_date) as sale_year,
SUM(s.sold_amt) as sale_amt
from sales s
left join employee_scd2 e
on e.name = s.name
and s.sale_date between e.start_date and e.end_date
group by e.name, e.region, year(s.sale_date);
e.start_date
and e.end_date
are DATE, bin size is days, so that says to make a bin for each year. It is well described at https://docs.databricks.com/optimizations/range-join.htmlKashyap Bhatt
03/08/2023, 8:30 PM/*+ RANGE_JOIN(e, 365) */
is the hint.
You could also probably run the aggregation over a window function on the range when you want to do something like the sum.Would appreciate an example of doing this using window function if you have the time. Thank you @Dominique Brezinski.