Re: [PR] Flink: Custom partitioner for bucket partitions [iceberg]

2024-09-13 Thread via GitHub
stevenzwu commented on PR #7161: URL: https://github.com/apache/iceberg/pull/7161#issuecomment-2350416841 @binshuohu Currently, there is no plan to reapply this change to the main branch. We have a more general range distribution available now (guided by statistics collection): https://ice
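For readers landing here, a minimal sketch of what asking the Flink sink for the range distribution mentioned above could look like, assuming a recent Iceberg release where the sink accepts `DistributionMode.RANGE` and using a made-up Hadoop table location:

```java
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.table.data.RowData;
import org.apache.iceberg.DistributionMode;
import org.apache.iceberg.flink.TableLoader;
import org.apache.iceberg.flink.sink.FlinkSink;

public class RangeDistributionSinkSketch {
  // Ask the Iceberg Flink sink for range distribution instead of wiring a
  // custom bucket partitioner in front of the writers.
  public static void appendWithRangeDistribution(DataStream<RowData> rows) {
    TableLoader tableLoader =
        TableLoader.fromHadoopTable("hdfs://nn:8020/warehouse/db/events"); // hypothetical table location
    FlinkSink.forRowData(rows)
        .tableLoader(tableLoader)
        .distributionMode(DistributionMode.RANGE) // statistics-guided shuffle on recent releases
        .append();
  }
}
```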

Re: [PR] Flink: Custom partitioner for bucket partitions [iceberg]

2024-09-12 Thread via GitHub
binshuohu commented on PR #7161: URL: https://github.com/apache/iceberg/pull/7161#issuecomment-2347381724 @stevenzwu Is there any plan to reapply this change to the main branch? Has there been any follow-up since https://github.com/apache/iceberg/pull/8848?

Re: [PR] Flink: Custom partitioner for bucket partitions [iceberg]

2023-12-18 Thread via GitHub
stevenzwu commented on PR #7161: URL: https://github.com/apache/iceberg/pull/7161#issuecomment-1862079413 @bendevera Here is our presentation: https://www.youtube.com/watch?v=GJplmOO7ULA&t=18s. Here is the design doc: https://docs.google.com/document/d/13N8cMqPi-ZPSKbkXGOBMPOzbv2Fua59j8bIj

Re: [PR] Flink: Custom partitioner for bucket partitions [iceberg]

2023-12-18 Thread via GitHub
bendevera commented on PR #7161: URL: https://github.com/apache/iceberg/pull/7161#issuecomment-1861920740 @stevenzwu thank you for the quick response! Okay, will run some `BucketPartitioner` tests for our use case by copying code manually. Smart shuffling sounds interesting and would
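A rough stand-in for what copying the partitioner by hand might look like, written against plain Flink APIs so it can be tested in isolation; the round-robin offset rotation below is a simplification for illustration, not necessarily the exact logic merged in this PR:

```java
import org.apache.flink.api.common.functions.Partitioner;

public class SimpleBucketPartitioner implements Partitioner<Integer> {
  private final int numBuckets;
  private int nextOffset = 0;

  public SimpleBucketPartitioner(int numBuckets) {
    this.numBuckets = numBuckets;
  }

  @Override
  public int partition(Integer bucketId, int numPartitions) {
    if (numPartitions <= numBuckets) {
      // Fewer writer subtasks than buckets: several buckets share a writer.
      return bucketId % numPartitions;
    }
    // More writers than buckets: rotate each bucket over its share of writers
    // so a single hot bucket is not pinned to one subtask.
    int writersPerBucket = numPartitions / numBuckets;
    int offset = nextOffset++ % writersPerBucket;
    return bucketId * writersPerBucket + offset;
  }
}
```

It would be wired in front of the writers with something like `rows.partitionCustom(new SimpleBucketPartitioner(8), keySelectorThatExtractsTheBucketId)`, where the key selector is whatever computes the bucket id for a row.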

Re: [PR] Flink: Custom partitioner for bucket partitions [iceberg]

2023-12-18 Thread via GitHub
stevenzwu commented on PR #7161: URL: https://github.com/apache/iceberg/pull/7161#issuecomment-1861850713 It was reverted because there are users depending on the previous behavior of keyBy on all partition columns: https://github.com/apache/iceberg/pull/7161#issuecomment-1761169778 We w

Re: [PR] Flink: Custom partitioner for bucket partitions [iceberg]

2023-12-18 Thread via GitHub
bendevera commented on PR #7161: URL: https://github.com/apache/iceberg/pull/7161#issuecomment-1861295520 @stevenzwu I see defaulting to `BucketPartitioner` was reverted here: https://github.com/apache/iceberg/pull/8848. We've found performance issues with `DistributionMode.HASH`, and

Re: [PR] Flink: Custom partitioner for bucket partitions [iceberg]

2023-10-16 Thread via GitHub
chenwyi2 commented on PR #7161: URL: https://github.com/apache/iceberg/pull/7161#issuecomment-1765530967 Under normal conditions, only the data for the current minute will be written. However, if the data is delayed, for example, data for 11:50 has not been written by 11:55, then at 11:56

Re: [PR] Flink: Custom partitioner for bucket partitions [iceberg]

2023-10-15 Thread via GitHub
stevenzwu commented on PR #7161: URL: https://github.com/apache/iceberg/pull/7161#issuecomment-1763706777 Is the partition time an event time or an ingestion/processing time? Or, asking in a different way, how many active minute partitions does the Flink writer job process in every commit cycle? I

Re: [PR] Flink: Custom partitioner for bucket partitions [iceberg]

2023-10-15 Thread via GitHub
chenwyi2 commented on PR #7161: URL: https://github.com/apache/iceberg/pull/7161#issuecomment-1763604926 Yes, I am creating very fine-grained partitions, because I want to query and compute some business metrics between minutes as fast as possible. As for the bucket number, I use a formula: QPS *

Re: [PR] Flink: Custom partitioner for bucket partitions [iceberg]

2023-10-13 Thread via GitHub
stevenzwu commented on PR #7161: URL: https://github.com/apache/iceberg/pull/7161#issuecomment-1762592732 @chenwyi2 Is your point that we shouldn't only consider the bucketing column (as this PR does)? You just want a plain keyBy in this case? That would be a fair point. Do you get balanced
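For comparison, a minimal sketch of the "plain keyBy" alternative being discussed: hash the full (dt, hour, minute, bucket) tuple so all rows of one partition land on a single writer subtask. Field positions and the modulo stand-in for bucket(id) are assumptions for illustration, not Iceberg's real bucket transform:

```java
import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.table.data.RowData;

public class PartitionKeyByExample {
  // Key by the full partition tuple so every row of a given
  // (dt, hour, minute, bucket) partition goes to one writer subtask.
  public static DataStream<RowData> keyByAllPartitionColumns(
      DataStream<RowData> rows, int numBuckets) {
    KeySelector<RowData, String> partitionKey = row ->
        row.getString(0).toString()               // dt (assumed field position)
            + "|" + row.getInt(1)                  // hour
            + "|" + row.getInt(2)                  // minute
            + "|" + (row.getLong(3) % numBuckets); // simple stand-in for bucket(id)
    return rows.keyBy(partitionKey);
  }
}
```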

Re: [PR] Flink: Custom partitioner for bucket partitions [iceberg]

2023-10-13 Thread via GitHub
chenwyi2 commented on PR #7161: URL: https://github.com/apache/iceberg/pull/7161#issuecomment-1761169778 Hi @stevenzwu @kengtin, this PR can create too many small files when partitioning by dt, hour, minute and bucket(id): suppose parallelism is 120 and the bucket number is 8, then 15 writers can wri
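A back-of-the-envelope check of the numbers in this comment; the one-file-per-writer-per-partition assumption is illustrative:

```java
public class SmallFileEstimate {
  public static void main(String[] args) {
    int parallelism = 120; // writer subtasks
    int numBuckets = 8;    // width of bucket(id)

    // A bucket-only partitioner spreads each bucket over parallelism / numBuckets writers.
    int writersPerBucket = parallelism / numBuckets; // 15

    // If every writer that receives rows for a (dt, hour, minute, bucket) partition
    // rolls at least one data file per commit, each minute partition gets:
    int filesPerMinutePartition = writersPerBucket * numBuckets; // 120
    System.out.printf("writers per bucket = %d, files per minute partition >= %d%n",
        writersPerBucket, filesPerMinutePartition);
  }
}
```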