Re: [PR] Optimise index stats collector for no dict [pinot]

via GitHub Wed, 01 Oct 2025 15:31:01 -0700


Jackie-Jiang commented on code in PR #16845:
URL: https://github.com/apache/pinot/pull/16845#discussion_r2396034481



##########
pinot-controller/src/main/java/org/apache/pinot/controller/recommender/data/generator/StringGenerator.java:
##########
@@ -51,8 +50,8 @@ public StringGenerator(Integer cardinality, Double numberOfValuesPerEntry, Integ
     int initValueSize = lengthOfEachString - _counterLength;
     Preconditions.checkState(initValueSize >= 0,
         String.format("Cannot generate %d unique string with length %d", _cardinality, lengthOfEachString));
-    _initialValue = RandomStringUtils.randomAlphabetic(initValueSize);
-    _rand = new Random(System.currentTimeMillis());
+    _rand = new Random(0L);

Review Comment:
   Could you elaborate more on this change? Ideally we want randomization within the test to get more confidence.
   Also, it seems this util is not just for testing purposes, and changing it might have other side effects.
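One common way to keep the randomization the reviewer asks for while still making a failing run reproducible is to choose a fresh seed per run and log it, with an override for replaying a failure. The sketch below is a hypothetical illustration, not the `StringGenerator` code from this PR; the class name `SeededRandomDemo` and the system property `test.random.seed` are assumptions.

```java
import java.util.Arrays;
import java.util.Random;

public class SeededRandomDemo {
  // Hypothetical property name used to replay a failing run with a fixed seed.
  static final String SEED_PROPERTY = "test.random.seed";

  // Returns the override seed if the property is set, otherwise a fresh time-based seed.
  static long chooseSeed() {
    String override = System.getProperty(SEED_PROPERTY);
    return override != null ? Long.parseLong(override) : System.currentTimeMillis();
  }

  // Draws n ints in [0, 1000) from a Random seeded with the given seed.
  static int[] draw(long seed, int n) {
    Random rand = new Random(seed);
    int[] values = new int[n];
    for (int i = 0; i < n; i++) {
      values[i] = rand.nextInt(1000);
    }
    return values;
  }

  public static void main(String[] args) {
    long seed = chooseSeed();
    // Log the seed so a failure can be replayed with -Dtest.random.seed=<seed>.
    System.out.println("Using random seed " + seed);
    int[] first = draw(seed, 5);
    int[] replay = draw(seed, 5);
    System.out.println("Replay matches: " + Arrays.equals(first, replay));
  }
}
```

Each run still exercises different inputs, but the same seed always yields the same sequence, so a flaky failure can be reproduced exactly.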



##########
pinot-segment-local/src/test/java/org/apache/pinot/segment/local/segment/creator/DictionariesTest.java:
##########
@@ -83,12 +83,12 @@ public class DictionariesTest implements PinotBuffersAfterMethodCheckRule {
 
   private static TableConfig _tableConfig;
 
-  @AfterClass
+  @AfterMethod
   public static void cleanup() {
     FileUtils.deleteQuietly(INDEX_DIR);
   }
 
-  @BeforeClass
+  @BeforeMethod

Review Comment:
   Creating a new segment per test could be expensive. Any specific reason to change this?
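A toy sketch of the cost difference behind this question: with class-scoped setup (the `@BeforeClass`/`@AfterClass` pair) the segment is built once and shared by all N tests, while method-scoped setup (`@BeforeMethod`/`@AfterMethod`) rebuilds it N times. `FixtureScopeDemo` and `buildSegment` are hypothetical stand-ins, not TestNG or Pinot code.

```java
import java.util.ArrayList;
import java.util.List;

public class FixtureScopeDemo {
  // Counts how many times the (hypothetically expensive) segment is built.
  static int buildCount = 0;

  // Stand-in for building a test segment; the real DictionariesTest setup is not reproduced here.
  static String buildSegment() {
    buildCount++;
    return "segment-" + buildCount;
  }

  // Class scope: build once, share across all tests.
  static List<String> runWithClassScope(int numTests) {
    List<String> used = new ArrayList<>();
    String shared = buildSegment();
    for (int i = 0; i < numTests; i++) {
      used.add(shared);
    }
    return used;
  }

  // Method scope: rebuild before every test.
  static List<String> runWithMethodScope(int numTests) {
    List<String> used = new ArrayList<>();
    for (int i = 0; i < numTests; i++) {
      used.add(buildSegment());
    }
    return used;
  }

  public static void main(String[] args) {
    buildCount = 0;
    runWithClassScope(10);
    System.out.println("class scope builds: " + buildCount);   // 1
    buildCount = 0;
    runWithMethodScope(10);
    System.out.println("method scope builds: " + buildCount);  // 10
  }
}
```

Method scope buys isolation when tests mutate shared state, at the price of one build per test; class scope amortizes the build but requires the tests to leave the segment untouched.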



##########
pinot-controller/src/test/java/org/apache/pinot/controller/recommender/realtime/provisioning/MemoryEstimatorTest.java:
##########
@@ -50,7 +50,7 @@ public void testSegmentGenerator()
       assertEquals(extract(metadata, "column.colFloatMV.cardinality = (\\d+)"), "250");
       assertEquals(extract(metadata, "column.colString.cardinality = (\\d+)"), "300");
       assertEquals(extract(metadata, "column.colStringMV.cardinality = (\\d+)"), "350");
-      assertEquals(extract(metadata, "column.colBytes.cardinality = (\\d+)"), "400");
+      assertEquals(extract(metadata, "column.colBytes.cardinality = (\\d+)"), "443");

Review Comment:
   This doesn't seem correct. Why is the error rate so high (more than 10%)?
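For reference, the figure the reviewer has in mind is the relative error of the new `colBytes` expectation against the configured cardinality: |443 - 400| / 400 = 43 / 400 = 10.75%. A minimal sketch of that arithmetic; the helper name `relativeError` is hypothetical, and the 10% figure mirrors the reviewer's remark rather than any tolerance defined in `MemoryEstimatorTest`.

```java
public class CardinalityError {
  // Relative error of an observed cardinality against the expected one.
  static double relativeError(long expected, long observed) {
    return Math.abs(observed - expected) / (double) expected;
  }

  public static void main(String[] args) {
    // Values from the colBytes assertion above: configured 400, new expectation 443.
    double err = relativeError(400, 443);
    System.out.printf("relative error = %.4f (above 10%%: %b)%n", err, err > 0.10);
  }
}
```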


