Re: [PR] API,Core: Introduce metrics for data files by file format [iceberg]

2025-01-29 Thread via GitHub
github-actions[bot] commented on PR #5837: URL: https://github.com/apache/iceberg/pull/5837#issuecomment-2623222418 This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If y…

2025-01-29 Thread via GitHub
github-actions[bot] closed pull request #5837: API,Core: Introduce metrics for data files by file format URL: https://github.com/apache/iceberg/pull/5837

2025-01-15 Thread via GitHub
github-actions[bot] commented on PR #5837: URL: https://github.com/apache/iceberg/pull/5837#issuecomment-2594191590 This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull…

2024-10-28 Thread via GitHub
Fokko commented on PR #5837: URL: https://github.com/apache/iceberg/pull/5837#issuecomment-2441220940 Hey @gaborkaszab, sorry for not replying earlier, I was out on parental leave. > E.g. Streaming ingest into AVRO for faster writes and then compact into Parquet for faster reads.

2024-10-28 Thread via GitHub
gaborkaszab commented on PR #5837: URL: https://github.com/apache/iceberg/pull/5837#issuecomment-2441133791 @Fokko, regarding the justification for this PR, I recently found another use case that could use this: streaming ingestion using a different file format than the compaction. E.g. Streaming…

2024-10-24 Thread via GitHub
gaborkaszab commented on PR #5837: URL: https://github.com/apache/iceberg/pull/5837#issuecomment-2435424210 Hi @Fokko, @findepi, is there anything I can do to make progress on this PR? The motivation is clear: there is a need for this in a query engine. I think I could also address the…

2024-09-20 Thread via GitHub
gaborkaszab commented on PR #5837: URL: https://github.com/apache/iceberg/pull/5837#issuecomment-2363706681 Thanks for taking a look, @findepi, @Fokko! So far I don't see any reason why this can't be merged. Not as it is now, but probably reverting to the initial version that didn't ha…

2024-08-29 Thread via GitHub
gaborkaszab commented on PR #5837: URL: https://github.com/apache/iceberg/pull/5837#issuecomment-2317080098 Hey @Fokko, thanks for your response and for the explanation! I might be missing some information here, but I checked the snapshot summary in the metadata.json files and…

2024-08-28 Thread via GitHub
Fokko commented on PR #5837: URL: https://github.com/apache/iceberg/pull/5837#issuecomment-2315744223 First of all, sorry for not jumping into this earlier. > 1) extra metrics never hurt. This is unfortunately not true. The metadata JSON very easily grows quite large in bytes, and…

2024-08-27 Thread via GitHub
gaborkaszab commented on PR #5837: URL: https://github.com/apache/iceberg/pull/5837#issuecomment-2312634994 > most queries operate on freshmost data, so they will see Parquet files. In general this is true, but we still see users ending up with tables with mixed file formats and having queri…

2024-08-23 Thread via GitHub
findepi commented on PR #5837: URL: https://github.com/apache/iceberg/pull/5837#issuecomment-2307633354 > for instance with Hive that used ORC format and with Impala that wrote Parquet files. That is likely addressed by the preferred file format being a table-level configuration?

2024-08-22 Thread via GitHub
gaborkaszab commented on PR #5837: URL: https://github.com/apache/iceberg/pull/5837#issuecomment-2304756917 Thanks for taking a look, @findepi! I've seen users doing this. One of the motivations is that they gradually move from one file format to another. What I've seen is that Imp…

2024-08-22 Thread via GitHub
findepi commented on PR #5837: URL: https://github.com/apache/iceberg/pull/5837#issuecomment-2304699180 > new metrics for the number of data files broken down by file format. How common is it to have tables with mixed file formats?
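
The metric quoted above, "the number of data files broken down by file format", amounts to a per-format count over a table's data files. Below is a minimal Python sketch under stated assumptions, not the PR's implementation: `DataFileEntry`, `data_file_counts_by_format`, and the `total-<format>-data-files` key names are all hypothetical, since the thread does not show which property names the PR uses. The values are rendered as strings because Iceberg snapshot summaries are string-to-string maps. Only formats that actually occur get a key, so a single-format table gains just one entry, which bears on Fokko's concern about metadata JSON size.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class DataFileEntry:
    # Hypothetical stand-in for a data file tracked by a table's manifests.
    path: str
    file_format: str  # e.g. "PARQUET", "ORC", "AVRO"


def data_file_counts_by_format(files):
    """Count data files per format and render them as string-valued summary properties.

    Only formats that occur produce a key; an empty table produces no keys at all.
    """
    counts = Counter(f.file_format.upper() for f in files)
    # Hypothetical key naming; sorted so the output order is deterministic.
    return {f"total-{fmt.lower()}-data-files": str(n) for fmt, n in sorted(counts.items())}


# Example: a table migrating from ORC to Parquet, as described in the thread.
files = [
    DataFileEntry("s3://bucket/tbl/data/a.parquet", "PARQUET"),
    DataFileEntry("s3://bucket/tbl/data/b.parquet", "parquet"),
    DataFileEntry("s3://bucket/tbl/data/c.orc", "ORC"),
]
print(data_file_counts_by_format(files))
```

Running it prints `{'total-orc-data-files': '1', 'total-parquet-data-files': '2'}`; format names are upper-cased before counting so that differently cased values land in the same bucket.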

2024-08-21 Thread via GitHub
gaborkaszab commented on PR #5837: URL: https://github.com/apache/iceberg/pull/5837#issuecomment-2301614609 Hey, I saw that the stale label was added to this PR due to inactivity. I removed it since I still intend to get this merged; however, I find it pretty difficult to get someone w…

2024-08-18 Thread via GitHub
github-actions[bot] commented on PR #5837: URL: https://github.com/apache/iceberg/pull/5837#issuecomment-2295450317 This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull…

2024-07-09 Thread via GitHub
gaborkaszab commented on PR #5837: URL: https://github.com/apache/iceberg/pull/5837#issuecomment-2217415364 Let me involve an even wider set of committers here, since this has been open for a while now. Hopefully someone has some spare time to get this going again. Any reviews are appreciat…

2024-06-13 Thread via GitHub
gaborkaszab commented on PR #5837: URL: https://github.com/apache/iceberg/pull/5837#issuecomment-2164746423 Hey @nastra, @rdblue, @danielcweeks, @jbonofre, it's been a while since I worked on this PR, but it is back on my radar now. Would any of you be able to take a look?