Re: [PR] fix: fix elapsed_compute metric in ParquetSink to report encoding time only [datafusion]

via GitHub Mon, 27 Apr 2026 05:28:56 -0700


fred1268 commented on code in PR #21825:
URL: https://github.com/apache/datafusion/pull/21825#discussion_r3147266706



##########
datafusion/datasource-parquet/src/file_format.rs:
##########
@@ -1599,6 +1650,7 @@ fn spawn_rg_join_and_finalize_task(
 /// on the next row group in parallel. So, parquet serialization is parallelized
 /// across both columns and row_groups, with a theoretical max number of parallel tasks
 /// given by n_columns * num_row_groups.
+#[expect(clippy::too_many_arguments)]
 fn spawn_parquet_parallel_serialization_task(

Review Comment:
   I didn't like this, but I wanted to get the community's thoughts on it. I 
implemented a slightly different version of your proposal: I did not include 
`encoding_time` in the struct, since the struct holds _input parameters_ 
while `encoding_time` is more of an output parameter. So I left it outside 
the struct. Let me know if this is what you had in mind.
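
   For illustration only, the split described above could look roughly like the 
sketch below. Every name in it (`SerializationInputs`, `encode_row_group`, 
`serialize`, and the fields) is hypothetical and not taken from the PR, and a 
plain `std::time::Duration` stands in for whatever metric type the real code 
uses. The idea is that the inputs travel together in one struct, which keeps the 
argument count down, while the timing accumulator stays a separate argument 
because the function writes to it.

```rust
use std::time::{Duration, Instant};

/// Hypothetical bundle of input parameters (illustrative names, not the PR's).
struct SerializationInputs {
    rows: Vec<u64>,
    max_row_group_rows: usize,
}

/// Stand-in for the real encoding work: 8 little-endian bytes per row.
fn encode_row_group(rows: &[u64]) -> Vec<u8> {
    rows.iter().flat_map(|r| r.to_le_bytes()).collect()
}

/// Inputs arrive bundled in the struct; `encoding_time` is kept separate
/// because it is an output: the function only ever adds to it.
fn serialize(inputs: SerializationInputs, encoding_time: &mut Duration) -> Vec<Vec<u8>> {
    let mut encoded = Vec::new();
    // Guard against a zero chunk size, which `chunks` would panic on.
    let chunk_size = inputs.max_row_group_rows.max(1);
    for chunk in inputs.rows.chunks(chunk_size) {
        // Time only the encoding call, not the surrounding bookkeeping.
        let start = Instant::now();
        let bytes = encode_row_group(chunk);
        *encoding_time += start.elapsed();
        encoded.push(bytes);
    }
    encoded
}
```

   A caller would build `SerializationInputs`, pass `&mut encoding_time` alongside 
it, and read the accumulated duration once `serialize` returns.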



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

