pantShrey commented on PR #21882:
URL: https://github.com/apache/datafusion/pull/21882#issuecomment-4329695321
@alamb I opened this draft PR to get early feedback on the architecture.
1. The first point concerns the sync read path. I introduced
`open_sync_reader` because `SortMergeJoin` currently has synchronous, blocking
code paths that open files directly by path with `BufReader` instead of
going through the spill abstractions. Converting these paths to fully async
would significantly increase the scope of this PR.
- Does it make sense to keep this escape hatch for now and make
these operators async in a follow-up PR?
2. The second point concerns test failures. I have not modified the
original 64 MB limit in the tests because I wanted guidance first. Currently,
the `repartition` test in `mod.rs` is failing. It seems related to spilling
not being triggered correctly, but I have not yet been able to identify the
root cause.
I might be missing something here, so I would really appreciate your guidance.
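
To make the first point concrete, here is a rough sketch of the escape-hatch idea using only the standard library. It is not the code in this PR: the `SpillFile` type, its field, the `read_all_sync` helper, and the signature shown for `open_sync_reader` are all assumptions chosen for illustration. The intent is only to show a blocking caller getting its reader from the abstraction instead of calling `File::open` on a raw path.

```rust
use std::fs::File;
use std::io::{self, BufReader, Read};
use std::path::PathBuf;

/// Hypothetical stand-in for a spill-file handle (not DataFusion's real type).
pub struct SpillFile {
    path: PathBuf,
}

impl SpillFile {
    pub fn new(path: impl Into<PathBuf>) -> Self {
        Self { path: path.into() }
    }

    /// Assumed shape of the synchronous escape hatch: the abstraction owns
    /// how the file is opened, and the caller only sees a blocking `Read`.
    pub fn open_sync_reader(&self) -> io::Result<Box<dyn Read + Send>> {
        let file = File::open(&self.path)?;
        Ok(Box::new(BufReader::new(file)))
    }
}

/// Hypothetical blocking caller that no longer touches the path directly.
pub fn read_all_sync(spill: &SpillFile) -> io::Result<Vec<u8>> {
    let mut reader = spill.open_sync_reader()?;
    let mut buf = Vec::new();
    reader.read_to_end(&mut buf)?;
    Ok(buf)
}
```

With this shape, the blocking operators would keep working unchanged while the path handling moves behind the abstraction, so a later async conversion would only need to replace the reader type.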
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]