findepi commented on PR #5837:
URL: https://github.com/apache/iceberg/pull/5837#issuecomment-2307633354
> for instance with Hive that used ORC format and with Impala that wrote Parquet files.

Wouldn't that already be addressed by the preferred file format being a table-level configuration?

> Impala is more performant with Parquet, but there are huge tables in production written in ORC, hence the motivation to move from one format to the other. However, they don't want to do it in one step due to the size of the table.

That absolutely makes sense!
If I have a table with historical data, I can change its default format (e.g. from ORC to Parquet) while having no desire to rewrite the old data.

But then, why would anyone actually care?
Most queries operate on the freshest data, so they will see Parquet files.
Some queries operate on large time windows and will see both ORC and Parquet files.
It is unclear what problem per-format metrics would solve.