gitzwz commented on issue #1479: URL: https://github.com/apache/iceberg-python/issues/1479#issuecomment-2567363410
The most time-consuming step is this:

```Python
for manifest_entry in chain(
    *executor.map(
        lambda args: _open_manifest(*args),
        [
            (
                self.io,
                manifest,
                partition_evaluators[manifest.partition_spec_id],
                metrics_evaluator,
            )
            for manifest in manifests
            if self._check_sequence_number(min_sequence_number, manifest)
        ],
    )
):
    ...
```

For instance, consider a scenario with 6 manifest files, each containing 7,000 entries. With **max-workers=32**, the code runs 6 threads concurrently, and each one finishes after approximately 30 seconds. With **max-workers=1**, the code processes the manifest files sequentially, yet the whole scan still finishes in roughly 30 seconds. In other words, the extra threads give no measurable speedup here.
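One possible explanation, which I have not confirmed by profiling, is that the per-manifest work is mostly CPU-bound Python code, so the threads take turns holding the GIL instead of running in parallel. Below is a minimal sketch for checking that pattern in isolation. `fake_open_manifest` and `scan` are hypothetical stand-ins, not PyIceberg functions, and the arithmetic only simulates per-entry decoding cost.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from itertools import chain


def fake_open_manifest(manifest_id: int, n_entries: int) -> list:
    """Hypothetical stand-in for _open_manifest: pure-Python, CPU-bound work."""
    entries = []
    for i in range(n_entries):
        # Simulate the cost of decoding one entry with some arithmetic.
        checksum = sum(j * j for j in range(50))
        entries.append((manifest_id, i, checksum))
    return entries


def scan(max_workers: int, n_manifests: int = 6, n_entries: int = 7_000):
    """Run one simulated scan; return (entry count, elapsed seconds)."""
    args = [(m, n_entries) for m in range(n_manifests)]
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Same shape as the snippet above: map over manifests, then chain.
        entries = list(
            chain(*executor.map(lambda a: fake_open_manifest(*a), args))
        )
    return len(entries), time.perf_counter() - start


if __name__ == "__main__":
    for workers in (1, 32):
        count, elapsed = scan(workers)
        print(f"max_workers={workers}: {count} entries in {elapsed:.2f}s")
```

If the two timings come out close to each other on a standard CPython build, that would be consistent with the GIL explanation. If the real workload were dominated by I/O waits instead, I would expect the 32-worker run to be clearly faster.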