Re: [I] Add properties support for HadoopTables.load() [iceberg]

via GitHub Fri, 14 Feb 2025 02:46:56 -0800


qqchang2nd commented on issue #12251:
URL: https://github.com/apache/iceberg/issues/12251#issuecomment-2658960764


   You're right that HadoopTables isn't a catalog - it's a lower-level 
implementation for managing Iceberg tables directly on HDFS without a catalog.
   
   Let me explain our use case:
   We run our own internal metadata management platform. During our 
initial evaluation of Iceberg, we chose HadoopTables for table creation and 
management without fully investigating how HadoopTables differs from 
catalogs.
   After running in production for a while, we noticed that some customer 
environments were experiencing slow performance during sql-analyze operations. 
Investigation revealed this was because tables created via HadoopTables don't 
support manifest caching. 
   
   To address this, I modified the Iceberg source code so that 
HadoopTables.load() accepts properties. This lets us enable manifest caching, 
which significantly improves sql-analyze performance.
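
For illustration only, here is a minimal sketch of the kind of properties map such a change would pass in. The key strings are, to my understanding, the manifest-cache keys defined in Iceberg's CatalogProperties; the class name, helper method, and the two-argument load() call shown in the comment are hypothetical and are not part of the released HadoopTables API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ManifestCacheProps {
    // Keys assumed to match Iceberg's CatalogProperties manifest-cache settings.
    public static final String CACHE_ENABLED = "io.manifest.cache-enabled";
    public static final String CACHE_EXPIRATION_MS = "io.manifest.cache.expiration-interval-ms";
    public static final String CACHE_MAX_TOTAL_BYTES = "io.manifest.cache.max-total-bytes";

    // Hypothetical helper: builds the properties that would turn manifest caching on.
    public static Map<String, String> manifestCacheProperties(long expirationMs, long maxTotalBytes) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put(CACHE_ENABLED, "true");
        props.put(CACHE_EXPIRATION_MS, Long.toString(expirationMs));
        props.put(CACHE_MAX_TOTAL_BYTES, Long.toString(maxTotalBytes));
        return props;
    }

    public static void main(String[] args) {
        // Example values only: 60 s expiration, 100 MiB total cache size.
        Map<String, String> props = manifestCacheProperties(60_000L, 100L * 1024 * 1024);

        // With the proposed (hypothetical) overload, the map would be passed on load:
        //   Table table = new HadoopTables(conf).load(location, props);
        props.forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

The sketch deliberately avoids any Iceberg dependency so that it compiles on its own; in a real patch the keys would come from CatalogProperties rather than string literals.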


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


