Matthew Powers
07/23/2023, 2:38 PM
import pathlib
from datafusion import SessionContext
from deltalake import DeltaTable
table = DeltaTable(f"{pathlib.Path.home()}/data/delta/G1_1e8_1e2_0_0")
ctx = SessionContext()
ctx.create_dataframe(table.to_pyarrow_dataset())
Here’s the error I got: TypeError: argument 'partitions': 'FileSystemDataset' object cannot be converted to 'PyList'
More info in the thread…
batch = pyarrow.RecordBatch.from_arrays(
[pyarrow.array([1, 2, 3]), pyarrow.array([4, 5, 6])],
names=["a", "b"],
)
df = ctx.create_dataframe([[batch]])
Is there any way to go from table.to_pyarrow_dataset() => [[batch]]?
Jordan Fox
07/23/2023, 4:59 PM
register_table(name, table)?
import pathlib
from datafusion import SessionContext
from deltalake import DeltaTable
table = DeltaTable(f"{pathlib.Path.home()}/data/delta/G1_1e8_1e2_0_0")
ctx = SessionContext()
ctx.register_table('my_table', table)
import pathlib
from datafusion import SessionContext
from deltalake import DeltaTable
table = DeltaTable(f"{pathlib.Path.home()}/data/delta/G1_1e8_1e2_0_0")
ctx = SessionContext()
ctx.create_dataframe(table.to_pyarrow_table().to_batches())
import pathlib
from datafusion import SessionContext
from deltalake import DeltaTable
table = DeltaTable(f"{pathlib.Path.home()}/data/delta/G1_1e8_1e2_0_0")
ctx = SessionContext()
ctx.create_dataframe(table.to_pyarrow_dataset().to_batches())
Matthew Powers
07/23/2023, 7:04 PM
ctx.register_dataset("my_dataset", table.to_pyarrow_dataset())
ctx.sql("select * from my_dataset where v2 > 5")
This didn’t work: ctx.register_table("my_table", table.to_pyarrow_table())
This didn’t work either: ctx.create_dataframe(table.to_pyarrow_table().to_batches())
Here’s the notebook if you’d like to take a look: https://github.com/delta-io/delta-examples/blob/master/notebooks/python-deltalake/datafusion-read-delta.ipynb
But one solution is working, so this is great!
Jordan Fox
07/23/2023, 7:12 PM
[table.to_pyarrow_table().to_batches()]
Am mobile, I'll confirm on your notebook.
Matthew Powers
07/24/2023, 12:50 PM
[table.to_pyarrow_table().to_batches()]. Thank you!!
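For anyone reading this thread later, here is a rough sketch of why the extra brackets matter. Judging from the [[batch]] example and the 'partitions' TypeError above, create_dataframe appears to want a list of partitions, where each partition is itself a list of record batches, while to_batches() returns a flat list. The helper name to_partitions and its num_partitions parameter are made up for illustration and are not part of datafusion or deltalake. It uses plain Python only, so the strings in the usage note stand in for real record batches.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def to_partitions(batches: Sequence[T], num_partitions: int = 1) -> List[List[T]]:
    """Group a flat sequence of batches into a list of partitions.

    Hypothetical helper: with num_partitions=1 the result has the same
    shape as writing [batches] by hand, as suggested in the thread.
    """
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    partitions: List[List[T]] = [[] for _ in range(num_partitions)]
    # Deal the batches out round-robin so partitions stay roughly even.
    for i, batch in enumerate(batches):
        partitions[i % num_partitions].append(batch)
    # Drop partitions that received no batches.
    return [p for p in partitions if p]
```

With stand-in values, to_partitions(["b0", "b1", "b2"]) gives one partition holding all three items. Assuming the same ctx and table as above, the equivalent call would be ctx.create_dataframe(to_partitions(table.to_pyarrow_table().to_batches())).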