stevenzwu commented on PR #7161:
URL: https://github.com/apache/iceberg/pull/7161#issuecomment-2350416841
@binshuohu Currently, there is no plan to reapply this change to the main
branch. We have a more general range distribution available now (guided by
statistics collection):
https://ice
binshuohu commented on PR #7161:
URL: https://github.com/apache/iceberg/pull/7161#issuecomment-2347381724
@stevenzwu Is there any plan to reapply this change to the main branch? Has
there been any follow up since https://github.com/apache/iceberg/pull/8848 ?
stevenzwu commented on PR #7161:
URL: https://github.com/apache/iceberg/pull/7161#issuecomment-1862079413
@bendevera Here is our presentation:
https://www.youtube.com/watch?v=GJplmOO7ULA&t=18s. Here is the design doc:
https://docs.google.com/document/d/13N8cMqPi-ZPSKbkXGOBMPOzbv2Fua59j8bIj
bendevera commented on PR #7161:
URL: https://github.com/apache/iceberg/pull/7161#issuecomment-1861920740
@stevenzwu thank you for the quick response!
Okay, I will run some `BucketPartitioner` tests for our use case by copying
the code manually. Smart shuffling sounds interesting and would
stevenzwu commented on PR #7161:
URL: https://github.com/apache/iceberg/pull/7161#issuecomment-1861850713
It is reverted because there are users depending on the previous behavior of
keyBy all partition columns.
https://github.com/apache/iceberg/pull/7161#issuecomment-1761169778
We w
bendevera commented on PR #7161:
URL: https://github.com/apache/iceberg/pull/7161#issuecomment-1861295520
@stevenzwu I see defaulting to `BucketPartitioner` was reverted here:
https://github.com/apache/iceberg/pull/8848
We've found performance issues with `DistributionMode.HASH`, and
chenwyi2 commented on PR #7161:
URL: https://github.com/apache/iceberg/pull/7161#issuecomment-1765530967
Under normal conditions, only the data of the current minute will be written.
However, if the data is delayed, for example, at 11:50, the data has not been
written until 11:55, then at 11:56
stevenzwu commented on PR #7161:
URL: https://github.com/apache/iceberg/pull/7161#issuecomment-1763706777
Is the partition time an event time or an ingestion/processing time? Or, asked
in a different way, how many active minutes does the Flink writer job process in
every commit cycle?
I
chenwyi2 commented on PR #7161:
URL: https://github.com/apache/iceberg/pull/7161#issuecomment-1763604926
Yes, I am creating very fine-grained partitions, because I want to query and
compute some business metrics between minutes as fast as possible. As for the bucket
number, I use a formula QPS *
stevenzwu commented on PR #7161:
URL: https://github.com/apache/iceberg/pull/7161#issuecomment-1762592732
@chenwyi2 Is your point that we shouldn't only consider the bucketing column
(as this PR does), and that you just want a plain keyBy in this case? That would be
a fair point. Do you get balanced
chenwyi2 commented on PR #7161:
URL: https://github.com/apache/iceberg/pull/7161#issuecomment-1761169778
Hi @stevenzwu @kengtin, this PR can create too many small files when
partitioning by dt, hour, minute and bucket(id). Suppose parallelism is 120 and the bucket
number is 8, then 15 writes can wri
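The comment above is cut off, but the arithmetic it starts (parallelism 120, 8 buckets, 15) can be sketched as below. This is only an illustration, not Iceberg code: the class and both helpers are hypothetical, and it assumes writers are spread evenly across buckets and that each writer opens its own file in every active time partition.

```java
public class BucketWriterMath {
    // Hypothetical helper (not an Iceberg API): how many writer subtasks share
    // one bucket if writers are spread evenly across buckets.
    static int writersPerBucket(int parallelism, int numBuckets) {
        if (parallelism <= 0 || numBuckets <= 0) {
            throw new IllegalArgumentException("parallelism and numBuckets must be positive");
        }
        // With fewer writers than buckets, each bucket still has one writer.
        return Math.max(1, parallelism / numBuckets);
    }

    // Assumption: every writer assigned to a bucket opens its own data file in
    // each active time partition (for example one dt/hour/minute value).
    static int filesPerTimePartition(int parallelism, int numBuckets) {
        return numBuckets * writersPerBucket(parallelism, numBuckets);
    }

    public static void main(String[] args) {
        int parallelism = 120; // value taken from the comment above
        int numBuckets = 8;    // value taken from the comment above
        System.out.println("writers per bucket: "
            + writersPerBucket(parallelism, numBuckets));
        System.out.println("files per time partition: "
            + filesPerTimePartition(parallelism, numBuckets));
    }
}
```

With the numbers from the comment this gives 15 writers per bucket, so under the stated assumptions a single minute-level partition could receive up to 120 files per commit cycle.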