qqchang2nd commented on issue #12251:
URL: https://github.com/apache/iceberg/issues/12251#issuecomment-2662259040

   Let me share our use case at Sensors Data. We use HadoopTables to manage 
Iceberg tables with Impala as our query engine. 
   
   In one of our customer environments, we encountered a query that only 
accessed one day's worth of data but took about 12 seconds in total, with 6 
seconds spent on plan analysis alone. Tracing with Arthas showed that the 
bottleneck was manifest loading. After we enabled manifest caching, plan 
analysis time dropped to under 200ms. For queries like this, where planning 
accounts for about half of the total time, manifest caching can improve 
overall query performance by up to 2x.
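
   As a rough check on the "up to 2x" figure, here is a minimal sketch of the 
arithmetic. It assumes that planning time is the only part of the query that 
changes and treats "under 200ms" as 0.2 seconds; the function name is 
hypothetical and exists only for illustration.

```python
def estimated_speedup(total_s, plan_before_s, plan_after_s):
    """Return (new_total_s, speedup), assuming only planning time changes."""
    # Swap the old planning time for the new one; everything else is held fixed.
    new_total_s = total_s - plan_before_s + plan_after_s
    return new_total_s, total_s / new_total_s


# Numbers from the query described above: 12 s total, 6 s of planning,
# and planning taken as 0.2 s once manifest caching is enabled (an upper bound).
new_total, speedup = estimated_speedup(12.0, 6.0, 0.2)
print(f"{new_total:.1f}s total, {speedup:.2f}x faster")
```

   Under these assumptions the query would finish in roughly 6.2 seconds, 
which is a little under 2x faster end to end.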
   
   At Sensors Data, more than 100 of our customers currently use the 
HadoopTables management approach, and we expect this number to grow into the 
thousands. We have no plans to migrate to other catalog implementations such 
as HadoopCatalog.
   
   The performance improvement from manifest caching is significant for our use 
case because:
   1. We have long-running Impala daemons that repeatedly access these tables
   2. The manifest files remain relatively stable during query execution
   3. The memory overhead of caching is acceptable given the performance 
benefits
   
   Would you be interested in more detailed performance data or our specific 
usage patterns?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

