
Lucas Zago

04/19/2023, 12:17 AM
Hi guys, is there a way to use a subquery in PySpark?
(select sum(case faturado.shkzg
when 'S' then faturado.menge
else - faturado.menge
end)
from sapsr3.ekbe faturado where faturado.ebeln = b.ebeln and b.ebelp = faturado.ebelp and faturado.vgabe = '1' -- entradas
) as qtd_faturada
I didn't find any useful resources. If anyone has a tip, I'd appreciate it, thanks

Jim Hibbard

04/19/2023, 12:25 AM
Hi Lucas, I think this doc will help you! Adding SQL docs here.
🙏 1
Sorry Lucas, read your message too quickly (fumes at the end of the day) and just linked to case statements. Here's a subquery example notebook, ping me if this isn't what you're looking for: notebook link

Lucas Zago

04/19/2023, 12:31 AM
I was trying to write it as a custom function, not really sure if it works:
def qtd_faturada(col1, col2):
    return (sum(when(col1 == "S", col2)
            .otherwise(-(col2))))

Jim Hibbard

04/19/2023, 12:33 AM
Gotcha, and how are you using the function w/ your DataFrame?

Lucas Zago

04/19/2023, 12:35 AM
def with_qtd_faturada(df):
    return df.withColumn("QTD_FATURADA", qtd_faturada(col("SHKZG"), col("MENGE")))
Trying to work with functions as much as possible

Jim Hibbard

04/19/2023, 1:23 AM
import pandas as pd
from pyspark.sql import functions as f


df = spark.createDataFrame(pd.DataFrame([
  {'SHKZG': 'S', 'MENGE': 1},
  {'SHKZG': 'S', 'MENGE': 1},
  {'SHKZG': 'N', 'MENGE': 1},
  {'SHKZG': 'N', 'MENGE': 1},
]))

def with_qtd_faturada(df, col1, col2):
  # create sum of column
  df_with_sum = df.select(f.sum(col1).alias('QTD_FATURADA'))

  # join with original df and flip sign based on col2's value
  return df.join(df_with_sum).withColumn(
    'QTD_FATURADA', f.when(col2=='S', f.col('QTD_FATURADA')).otherwise(-f.col('QTD_FATURADA'))
  )

with_qtd_faturada(df, f.col('MENGE'), f.col('SHKZG')).show()

Lucas Zago

04/19/2023, 1:48 AM
Thanks @Jim Hibbard