Re: [PR] Flink: handle rescale properly for range bounds in sketch statistics [iceberg]

2024-06-11 Thread via GitHub
stevenzwu commented on code in PR #10457: URL: https://github.com/apache/iceberg/pull/10457#discussion_r1635624299 ## flink/v1.19/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/AggregatedStatistics.java: ## @@ -35,28 +35,28 @@ class AggregatedStatistics implements Ser

Re: [PR] Flink: handle rescale properly for range bounds in sketch statistics [iceberg]

2024-06-11 Thread via GitHub
stevenzwu commented on code in PR #10457: URL: https://github.com/apache/iceberg/pull/10457#discussion_r1635623085 ## flink/v1.19/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/DataStatisticsOperator.java: ## @@ -103,14 +105,41 @@ public void initializeState(StateInit

Re: [PR] Flink: handle rescale properly for range bounds in sketch statistics [iceberg]

2024-06-11 Thread via GitHub
stevenzwu commented on code in PR #10457: URL: https://github.com/apache/iceberg/pull/10457#discussion_r1635019206 ## flink/v1.19/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/GlobalStatistics.java: ## @@ -0,0 +1,48 @@ +/* + * Licensed to the Apache Software Foundati

Re: [PR] Flink: handle rescale properly for range bounds in sketch statistics [iceberg]

2024-06-11 Thread via GitHub
pvary commented on code in PR #10457: URL: https://github.com/apache/iceberg/pull/10457#discussion_r1634507795 ## flink/v1.19/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/GlobalStatistics.java: ## @@ -0,0 +1,48 @@ +/* + * Licensed to the Apache Software Foundation (

Re: [PR] Flink: handle rescale properly for range bounds in sketch statistics [iceberg]

2024-06-10 Thread via GitHub
stevenzwu commented on PR #10457: URL: https://github.com/apache/iceberg/pull/10457#issuecomment-2159279193 > > > I think we can overlay the hash distribution above the ranges, > > > > > > Not sure if we want that, hash distribution (keyBy) is simple and low overhead. Range distri

Re: [PR] Flink: handle rescale properly for range bounds in sketch statistics [iceberg]

2024-06-10 Thread via GitHub
stevenzwu commented on code in PR #10457: URL: https://github.com/apache/iceberg/pull/10457#discussion_r1633800594 ## flink/v1.19/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/AggregatedStatistics.java: ## @@ -35,28 +35,28 @@ class AggregatedStatistics implements Ser

Re: [PR] Flink: handle rescale properly for range bounds in sketch statistics [iceberg]

2024-06-10 Thread via GitHub
pvary commented on PR #10457: URL: https://github.com/apache/iceberg/pull/10457#issuecomment-2159222857 > > I think we can overlay the hash distribution above the ranges, > > Not sure if we want that, hash distribution (keyBy) is simple and low overhead. Range distribution requires st

Re: [PR] Flink: handle rescale properly for range bounds in sketch statistics [iceberg]

2024-06-10 Thread via GitHub
stevenzwu commented on PR #10457: URL: https://github.com/apache/iceberg/pull/10457#issuecomment-2159001144 > I think we can overlay the hash distribution above the ranges, Not sure if we want that, hash distribution (keyBy) is simple and low overhead. Range distribution requires stat

Re: [PR] Flink: handle rescale properly for range bounds in sketch statistics [iceberg]

2024-06-10 Thread via GitHub
pvary commented on PR #10457: URL: https://github.com/apache/iceberg/pull/10457#issuecomment-2158978912 > > * What happens in CDC cases, where the partition of the record might end up on multiple ranges. Did we make sure that the records with the same id go to the same subtask, so we can cr

Re: [PR] Flink: handle rescale properly for range bounds in sketch statistics [iceberg]

2024-06-10 Thread via GitHub
stevenzwu commented on code in PR #10457: URL: https://github.com/apache/iceberg/pull/10457#discussion_r1633577198 ## flink/v1.19/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/AggregatedStatistics.java: ## @@ -35,28 +35,28 @@ class AggregatedStatistics implements Ser

Re: [PR] Flink: handle rescale properly for range bounds in sketch statistics [iceberg]

2024-06-10 Thread via GitHub
stevenzwu commented on PR #10457: URL: https://github.com/apache/iceberg/pull/10457#issuecomment-2158524375 > * What happens in CDC cases, where the partition of the record might end up on multiple ranges. Did we make sure that the records with the same id go to the same subtask, so we can

Re: [PR] Flink: handle rescale properly for range bounds in sketch statistics [iceberg]

2024-06-10 Thread via GitHub
pvary commented on PR #10457: URL: https://github.com/apache/iceberg/pull/10457#issuecomment-2157707270 @stevenzwu: Unrelated question, but come up when I have reading the PR: - What happens in CDC cases, where the partition of the record might end up on multiple ranges. Did we make sure

Re: [PR] Flink: handle rescale properly for range bounds in sketch statistics [iceberg]

2024-06-10 Thread via GitHub
pvary commented on code in PR #10457: URL: https://github.com/apache/iceberg/pull/10457#discussion_r1632828545 ## flink/v1.19/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/AggregatedStatistics.java: ## @@ -35,28 +35,28 @@ class AggregatedStatistics implements Seriali

Re: [PR] Flink: handle rescale properly for range bounds in sketch statistics [iceberg]

2024-06-10 Thread via GitHub
pvary commented on code in PR #10457: URL: https://github.com/apache/iceberg/pull/10457#discussion_r1632808824 ## flink/v1.19/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/AggregatedStatistics.java: ## @@ -35,28 +35,28 @@ class AggregatedStatistics implements Seriali

Re: [PR] Flink: handle rescale properly for range bounds in sketch statistics [iceberg]

2024-06-06 Thread via GitHub
stevenzwu commented on code in PR #10457: URL: https://github.com/apache/iceberg/pull/10457#discussion_r1630345753 ## flink/v1.19/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/AggregatedStatistics.java: ## @@ -35,28 +35,28 @@ class AggregatedStatistics implements Ser

Re: [PR] Flink: handle rescale properly for range bounds in sketch statistics [iceberg]

2024-06-06 Thread via GitHub
stevenzwu commented on code in PR #10457: URL: https://github.com/apache/iceberg/pull/10457#discussion_r1630469792 ## flink/v1.19/flink/src/test/java/org/apache/iceberg/flink/sink/shuffle/TestAggregatedStatisticsTracker.java: ## @@ -246,7 +254,14 @@ public void receiveCompletedS

Re: [PR] Flink: handle rescale properly for range bounds in sketch statistics [iceberg]

2024-06-06 Thread via GitHub
stevenzwu commented on code in PR #10457: URL: https://github.com/apache/iceberg/pull/10457#discussion_r1630457171 ## flink/v1.19/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/DataStatisticsCoordinator.java: ## @@ -189,28 +198,75 @@ private void handleDataStatisticRe

Re: [PR] Flink: handle rescale properly for range bounds in sketch statistics [iceberg]

2024-06-06 Thread via GitHub
stevenzwu commented on code in PR #10457: URL: https://github.com/apache/iceberg/pull/10457#discussion_r1630345753 ## flink/v1.19/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/AggregatedStatistics.java: ## @@ -35,28 +35,28 @@ class AggregatedStatistics implements Ser

Re: [PR] Flink: handle rescale properly for range bounds in sketch statistics [iceberg]

2024-06-06 Thread via GitHub
stevenzwu commented on code in PR #10457: URL: https://github.com/apache/iceberg/pull/10457#discussion_r1630412274 ## flink/v1.19/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/DataStatisticsOperator.java: ## @@ -103,14 +105,41 @@ public void initializeState(StateInit

Re: [PR] Flink: handle rescale properly for range bounds in sketch statistics [iceberg]

2024-06-06 Thread via GitHub
stevenzwu commented on code in PR #10457: URL: https://github.com/apache/iceberg/pull/10457#discussion_r1630376841 ## flink/v1.19/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/DataStatisticsCoordinator.java: ## @@ -189,28 +198,75 @@ private void handleDataStatisticRe

Re: [PR] Flink: handle rescale properly for range bounds in sketch statistics [iceberg]

2024-06-06 Thread via GitHub
stevenzwu commented on code in PR #10457: URL: https://github.com/apache/iceberg/pull/10457#discussion_r1630358521 ## flink/v1.19/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/DataStatisticsOperator.java: ## @@ -103,14 +105,41 @@ public void initializeState(StateInit

Re: [PR] Flink: handle rescale properly for range bounds in sketch statistics [iceberg]

2024-06-06 Thread via GitHub
stevenzwu commented on code in PR #10457: URL: https://github.com/apache/iceberg/pull/10457#discussion_r1630354301 ## flink/v1.19/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/DataStatisticsOperator.java: ## @@ -139,14 +168,16 @@ public void handleOperatorEvent(Opera

Re: [PR] Flink: handle rescale properly for range bounds in sketch statistics [iceberg]

2024-06-06 Thread via GitHub
stevenzwu commented on code in PR #10457: URL: https://github.com/apache/iceberg/pull/10457#discussion_r1630345753 ## flink/v1.19/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/AggregatedStatistics.java: ## @@ -35,28 +35,28 @@ class AggregatedStatistics implements Ser

Re: [PR] Flink: handle rescale properly for range bounds in sketch statistics [iceberg]

2024-06-06 Thread via GitHub
pvary commented on code in PR #10457: URL: https://github.com/apache/iceberg/pull/10457#discussion_r1630172641 ## flink/v1.19/flink/src/test/java/org/apache/iceberg/flink/sink/shuffle/TestAggregatedStatisticsTracker.java: ## @@ -246,7 +254,14 @@ public void receiveCompletedStati

Re: [PR] Flink: handle rescale properly for range bounds in sketch statistics [iceberg]

2024-06-06 Thread via GitHub
pvary commented on code in PR #10457: URL: https://github.com/apache/iceberg/pull/10457#discussion_r1630169892 ## flink/v1.19/flink/src/test/java/org/apache/iceberg/flink/sink/shuffle/TestAggregatedStatisticsTracker.java: ## @@ -246,7 +254,14 @@ public void receiveCompletedStati

Re: [PR] Flink: handle rescale properly for range bounds in sketch statistics [iceberg]

2024-06-06 Thread via GitHub
pvary commented on code in PR #10457: URL: https://github.com/apache/iceberg/pull/10457#discussion_r1630164955 ## flink/v1.19/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/DataStatisticsOperator.java: ## @@ -103,14 +105,41 @@ public void initializeState(StateInitiali

Re: [PR] Flink: handle rescale properly for range bounds in sketch statistics [iceberg]

2024-06-06 Thread via GitHub
pvary commented on code in PR #10457: URL: https://github.com/apache/iceberg/pull/10457#discussion_r1630153587 ## flink/v1.19/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/DataStatisticsCoordinator.java: ## @@ -189,28 +198,75 @@ private void handleDataStatisticReques

Re: [PR] Flink: handle rescale properly for range bounds in sketch statistics [iceberg]

2024-06-06 Thread via GitHub
pvary commented on code in PR #10457: URL: https://github.com/apache/iceberg/pull/10457#discussion_r1630149920 ## flink/v1.19/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/AggregatedStatistics.java: ## @@ -35,28 +35,28 @@ class AggregatedStatistics implements Seriali

Re: [PR] Flink: handle rescale properly for range bounds in sketch statistics [iceberg]

2024-06-06 Thread via GitHub
pvary commented on code in PR #10457: URL: https://github.com/apache/iceberg/pull/10457#discussion_r1630124473 ## flink/v1.19/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/DataStatisticsOperator.java: ## @@ -139,14 +168,16 @@ public void handleOperatorEvent(OperatorE

Re: [PR] Flink: handle rescale properly for range bounds in sketch statistics [iceberg]

2024-06-06 Thread via GitHub
pvary commented on code in PR #10457: URL: https://github.com/apache/iceberg/pull/10457#discussion_r1630122341 ## flink/v1.19/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/DataStatisticsOperator.java: ## @@ -139,14 +168,16 @@ public void handleOperatorEvent(OperatorE

Re: [PR] Flink: handle rescale properly for range bounds in sketch statistics [iceberg]

2024-06-06 Thread via GitHub
stevenzwu commented on code in PR #10457: URL: https://github.com/apache/iceberg/pull/10457#discussion_r1630015642 ## flink/v1.19/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/DataStatisticsCoordinator.java: ## @@ -71,6 +76,7 @@ class DataStatisticsCoordinator implem