andygrove commented on PR #3845:
URL: 
https://github.com/apache/datafusion-comet/pull/3845#issuecomment-4172527414

   > * written blocks are not ordered by partition, am I correct? (Perhaps documentation about the format of the data file and index file could be added.)
   
   The final file contains data ordered by partition. I will improve the docs.
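
As a rough illustration of the "ordered by partition" layout, here is a minimal Rust sketch. It assumes a Spark-style index of cumulative byte offsets stored as 8-byte big-endian integers. That encoding and the function names (`layout_partitions`, `encode_index`, `partition_range`) are assumptions for illustration, not code from this PR.

```rust
use std::ops::Range;

/// Concatenate per-partition buffers in partition order and return the
/// data bytes plus the cumulative offsets an index file would hold.
/// `offsets` has one more entry than there are partitions.
fn layout_partitions(partitions: &[Vec<u8>]) -> (Vec<u8>, Vec<u64>) {
    let mut data = Vec::new();
    let mut offsets = Vec::with_capacity(partitions.len() + 1);
    offsets.push(0u64);
    for p in partitions {
        data.extend_from_slice(p);
        offsets.push(data.len() as u64);
    }
    (data, offsets)
}

/// Encode the offsets as consecutive 8-byte big-endian integers
/// (assumed encoding, see the note above).
fn encode_index(offsets: &[u64]) -> Vec<u8> {
    let mut out = Vec::with_capacity(offsets.len() * 8);
    for o in offsets {
        out.extend_from_slice(&o.to_be_bytes());
    }
    out
}

/// Byte range of one partition inside the data file.
fn partition_range(offsets: &[u64], partition: usize) -> Range<usize> {
    offsets[partition] as usize..offsets[partition + 1] as usize
}
```

With this layout a reader only needs two adjacent offsets to locate a partition, and an empty partition simply has equal start and end offsets.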
   
   > * at the moment each written block includes the schema definition; would it be possible to have a "specialised" stream writer that does not write the schema (as it is the same for all blocks)?
   
   It should be possible to move to the Arrow IPC Stream approach, where the schema is written once per partition. I have experimented with this in the past, but it is quite a big change.
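
To make the size difference concrete, here is a toy Rust sketch of "schema once, then blocks" framing compared with repeating the schema in every block. The 4-byte length prefix, `SchemaOnceWriter`, and `bytes_with_schema_per_block` are hypothetical and only model the idea. This is not the Arrow IPC wire format and not Comet's writer.

```rust
use std::io::{self, Write};

/// Write one length-prefixed frame (4-byte little-endian length + payload).
fn write_frame<W: Write>(w: &mut W, payload: &[u8]) -> io::Result<()> {
    w.write_all(&(payload.len() as u32).to_le_bytes())?;
    w.write_all(payload)
}

/// Toy writer that emits the schema bytes once, then only data blocks.
struct SchemaOnceWriter<W: Write> {
    inner: W,
}

impl<W: Write> SchemaOnceWriter<W> {
    /// The schema frame is written here, exactly once per stream.
    fn try_new(mut inner: W, schema: &[u8]) -> io::Result<Self> {
        write_frame(&mut inner, schema)?;
        Ok(Self { inner })
    }

    /// Each block is written without repeating the schema.
    fn write_block(&mut self, block: &[u8]) -> io::Result<()> {
        write_frame(&mut self.inner, block)
    }

    fn into_inner(self) -> W {
        self.inner
    }
}

/// Total bytes if every block carried its own copy of the schema frame.
fn bytes_with_schema_per_block(schema: &[u8], blocks: &[&[u8]]) -> usize {
    blocks.iter().map(|b| 4 + schema.len() + 4 + b.len()).sum()
}
```

In this model the saving grows with the number of blocks per partition, which is why many small blocks benefit most from a shared schema.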
   
   > * can spill write to the result file instead of a temporary file?
   
   Yes, if there is enough memory. We only spill to temp files if the memory pool rejects a try_grow request.
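
The spill decision described above can be sketched roughly as follows. `ToyPool` and `Partitioner` are hypothetical stand-ins written for this example (a `Vec` plays the role of the temp file); they are not DataFusion's memory pool types or Comet's shuffle writer.

```rust
/// Hypothetical fixed-capacity pool with a `try_grow`-style check.
struct ToyPool {
    limit: usize,
    used: usize,
}

impl ToyPool {
    /// Reserve `bytes`, or reject the request if it would exceed the limit.
    fn try_grow(&mut self, bytes: usize) -> Result<(), String> {
        if self.used + bytes > self.limit {
            return Err(format!("cannot reserve {} bytes", bytes));
        }
        self.used += bytes;
        Ok(())
    }

    fn shrink(&mut self, bytes: usize) {
        self.used -= bytes;
    }
}

struct Partitioner {
    pool: ToyPool,
    buffered: Vec<Vec<u8>>, // kept in memory, written to the result file at the end
    spilled: Vec<Vec<u8>>,  // stands in for data written to temp files
}

impl Partitioner {
    fn insert(&mut self, batch: Vec<u8>) {
        if self.pool.try_grow(batch.len()).is_err() {
            // The pool rejected the request: spill what is buffered and retry.
            self.spill();
            if self.pool.try_grow(batch.len()).is_err() {
                // Still too large for the pool: send this batch to the spill too.
                self.spilled.push(batch);
                return;
            }
        }
        self.buffered.push(batch);
    }

    /// Move all buffered batches to the spill and release their reservation.
    fn spill(&mut self) {
        let freed: usize = self.buffered.iter().map(|b| b.len()).sum();
        self.spilled.append(&mut self.buffered);
        self.pool.shrink(freed);
    }
}
```

If the pool never rejects a request, `spilled` stays empty and everything buffered can go straight to the result file.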
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

