gitzwz commented on issue #1479:
URL: 
https://github.com/apache/iceberg-python/issues/1479#issuecomment-2567363410

    The most time-consuming step is this loop:
   ```Python
           for manifest_entry in chain(
               *executor.map(
                   lambda args: _open_manifest(*args),
                   [
                       (
                           self.io,
                           manifest,
                           partition_evaluators[manifest.partition_spec_id],
                           metrics_evaluator,
                       )
                       for manifest in manifests
                       if self._check_sequence_number(min_sequence_number, manifest)
                   ],
               )
           ):
   ...
   ```
   For instance, consider a scenario with 6 manifest files, each containing 
7,000 entries. With **max-workers=32**, the code spawns 6 threads that run 
concurrently, and each one takes approximately 30 seconds to complete. With 
**max-workers=1**, the manifest files are processed sequentially, yet the 
whole scan still finishes in roughly 30 seconds, so the extra threads give 
no measurable speedup.
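
A minimal sketch for reproducing this kind of comparison outside pyiceberg, assuming the per-manifest work is pure-Python and CPU-bound (the timings above suggest this but do not prove it). `fake_open_manifest` and `timed_scan` are hypothetical stand-ins, not pyiceberg functions; only the `chain(*executor.map(...))` pattern mirrors the snippet above.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from itertools import chain


def fake_open_manifest(manifest_id, n_entries):
    """Hypothetical stand-in for _open_manifest: pure-Python work per entry."""
    entries = []
    for i in range(n_entries):
        entries.append((manifest_id, i * i % 7))
    return entries


def timed_scan(max_workers, n_manifests=6, n_entries=7000):
    """Run the same map-and-chain pattern and return (elapsed_seconds, entries)."""
    args = [(m, n_entries) for m in range(n_manifests)]
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map yields results in input order, so the chained
        # output is identical regardless of the worker count.
        entries = list(chain(*executor.map(lambda a: fake_open_manifest(*a), args)))
    return time.perf_counter() - start, entries


if __name__ == "__main__":
    for workers in (1, 32):
        elapsed, entries = timed_scan(workers)
        print(f"max_workers={workers}: {len(entries)} entries in {elapsed:.3f}s")
```

If the two timings come out close on your machine, that is consistent with the threads contending for the interpreter rather than waiting on I/O. Swapping the stand-in for a call that sleeps or reads from storage should show the opposite pattern.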


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


