andygrove commented on PR #3845: URL: https://github.com/apache/datafusion-comet/pull/3845#issuecomment-4172527414
> * written blocks are not ordered by partition, am i correct (perhaps documentation about format of data file and index file could be added)

The final file contains data ordered by partition. I will improve the docs.

> * at the moment each written block will have schema definition included, would it be possible to have a "specialised" stream writer which does not write schema (as it is same for all blocks) ?

It should be possible to move to the Arrow IPC Stream approach where schema is written once per partition. I have experimented with this in the past but it is quite a big change.

> * can spill write to result file instead of temporary file ?

Yes, if there is enough memory. We only spill to the temp files if the memory pool rejects a request to try_grow.
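To illustrate what "data ordered by partition" means for the final file, here is a rough sketch. It assumes a Spark-style layout: one data file with each partition's bytes stored contiguously in partition order, plus an index file holding `num_partitions + 1` big-endian 8-byte offsets. The function names are hypothetical and this is not the code in the PR; it only models the layout with in-memory buffers.

```rust
// Sketch only: assumes a Spark-style layout (contiguous partitions plus an
// offset index). The function names here are hypothetical, not Comet APIs.

/// Concatenate per-partition bytes in partition order and build an index of
/// `partitions.len() + 1` big-endian i64 offsets into the data buffer.
fn build_data_and_index(partitions: &[Vec<u8>]) -> (Vec<u8>, Vec<u8>) {
    let mut data = Vec::new();
    let mut index = Vec::with_capacity((partitions.len() + 1) * 8);
    let mut offset: i64 = 0;
    // The first index entry is always offset 0.
    index.extend_from_slice(&offset.to_be_bytes());
    for bytes in partitions {
        data.extend_from_slice(bytes);
        offset += bytes.len() as i64;
        // Each following entry is the end offset of the partition just written.
        index.extend_from_slice(&offset.to_be_bytes());
    }
    (data, index)
}

/// Read the offset stored at position `i` in the index.
fn offset_at(index: &[u8], i: usize) -> usize {
    let mut buf = [0u8; 8];
    buf.copy_from_slice(&index[i * 8..(i + 1) * 8]);
    i64::from_be_bytes(buf) as usize
}

/// Return the bytes of partition `p` by slicing between two adjacent offsets.
fn read_partition<'a>(data: &'a [u8], index: &[u8], p: usize) -> &'a [u8] {
    &data[offset_at(index, p)..offset_at(index, p + 1)]
}
```

With this layout a reader can fetch partition `p` with a single contiguous read, and an empty partition simply shows up as two equal adjacent offsets.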
