gf2121 commented on PR #12800: URL: https://github.com/apache/lucene/pull/12800#issuecomment-1820687175
Thanks for the feedback @mikemccand!

> Hmm it looks like random got a bit slower in candidate? Flush time ~550 ish ms in baseline and maybe ~650 ish ms in candidate?

Oh! I rechecked, and it turns out I pasted the benchmark results in the wrong places, sorry!

> And it's not quite random right -- it's sort of a sawtooth between 0 and 14? Or am I reading the results backwards?

Thanks for pointing this out; it is indeed not random enough. I changed RANDOM to use `java.util.Random` and renamed the original RANDOM to ROUND. The new random distribution improves a bit more, by 20+%.

<details><summary>Benchmark Detail</summary>

**Baseline**
```
Using order: RANDOM
DWPT 0 [2023-11-21T10:49:03.532762Z; main]: flush time 1138.599958 ms
DWPT 0 [2023-11-21T10:49:05.455861Z; main]: flush time 1059.658875 ms
DWPT 0 [2023-11-21T10:49:07.224901Z; main]: flush time 991.167625 ms
DWPT 0 [2023-11-21T10:49:08.847679Z; main]: flush time 850.281875 ms
DWPT 0 [2023-11-21T10:49:10.468672Z; main]: flush time 848.403625 ms
DWPT 0 [2023-11-21T10:49:12.104744Z; main]: flush time 861.536542 ms
DWPT 0 [2023-11-21T10:49:13.731466Z; main]: flush time 851.316958 ms
DWPT 0 [2023-11-21T10:49:15.350887Z; main]: flush time 847.584917 ms
DWPT 0 [2023-11-21T10:49:16.963837Z; main]: flush time 843.849042 ms
DWPT 0 [2023-11-21T10:49:18.579249Z; main]: flush time 843.092666 ms
Using order: ROUND
DWPT 1 [2023-11-21T10:49:20.051903Z; main]: flush time 648.484542 ms
DWPT 1 [2023-11-21T10:49:21.472366Z; main]: flush time 643.642417 ms
DWPT 1 [2023-11-21T10:49:22.889719Z; main]: flush time 644.994541 ms
DWPT 1 [2023-11-21T10:49:24.307484Z; main]: flush time 642.117958 ms
DWPT 1 [2023-11-21T10:49:25.727815Z; main]: flush time 642.38 ms
DWPT 1 [2023-11-21T10:49:27.143574Z; main]: flush time 639.769875 ms
DWPT 1 [2023-11-21T10:49:28.562387Z; main]: flush time 644.234375 ms
DWPT 1 [2023-11-21T10:49:29.975009Z; main]: flush time 639.01125 ms
DWPT 1 [2023-11-21T10:49:31.396969Z; main]: flush time 643.216 ms
DWPT 1 [2023-11-21T10:49:32.810467Z; main]: flush time 639.049041 ms
Using order: ASC
DWPT 2 [2023-11-21T10:49:34.100537Z; main]: flush time 473.33425 ms
DWPT 2 [2023-11-21T10:49:35.236826Z; main]: flush time 352.816167 ms
DWPT 2 [2023-11-21T10:49:36.312917Z; main]: flush time 293.915917 ms
DWPT 2 [2023-11-21T10:49:37.386792Z; main]: flush time 290.221458 ms
DWPT 2 [2023-11-21T10:49:38.463960Z; main]: flush time 287.046708 ms
DWPT 2 [2023-11-21T10:49:39.537561Z; main]: flush time 287.051709 ms
DWPT 2 [2023-11-21T10:49:40.610809Z; main]: flush time 287.296375 ms
DWPT 2 [2023-11-21T10:49:41.686863Z; main]: flush time 290.536083 ms
DWPT 2 [2023-11-21T10:49:42.751377Z; main]: flush time 289.183375 ms
DWPT 2 [2023-11-21T10:49:43.824249Z; main]: flush time 289.238584 ms
Using order: DESC
DWPT 3 [2023-11-21T10:49:45.039267Z; main]: flush time 394.276959 ms
DWPT 3 [2023-11-21T10:49:46.203835Z; main]: flush time 365.40575 ms
DWPT 3 [2023-11-21T10:49:47.359253Z; main]: flush time 364.55 ms
DWPT 3 [2023-11-21T10:49:48.548749Z; main]: flush time 385.198 ms
DWPT 3 [2023-11-21T10:49:49.715963Z; main]: flush time 366.247083 ms
DWPT 3 [2023-11-21T10:49:50.881628Z; main]: flush time 372.473333 ms
DWPT 3 [2023-11-21T10:49:52.037239Z; main]: flush time 367.666041 ms
DWPT 3 [2023-11-21T10:49:53.192338Z; main]: flush time 364.390834 ms
DWPT 3 [2023-11-21T10:49:54.346795Z; main]: flush time 367.417208 ms
DWPT 3 [2023-11-21T10:49:55.506692Z; main]: flush time 374.948625 ms
```

**Candidate**
```
Using order: RANDOM
DWPT 0 [2023-11-21T10:31:14.638348Z; main]: flush time 926.650958 ms
DWPT 0 [2023-11-21T10:31:16.527778Z; main]: flush time 983.61375 ms
DWPT 0 [2023-11-21T10:31:18.105650Z; main]: flush time 745.283416 ms
DWPT 0 [2023-11-21T10:31:19.545346Z; main]: flush time 614.212208 ms
DWPT 0 [2023-11-21T10:31:20.986866Z; main]: flush time 621.046833 ms
DWPT 0 [2023-11-21T10:31:22.418842Z; main]: flush time 613.169292 ms
DWPT 0 [2023-11-21T10:31:23.843488Z; main]: flush time 608.060375 ms
DWPT 0 [2023-11-21T10:31:25.289972Z; main]: flush time 633.770083 ms
DWPT 0 [2023-11-21T10:31:26.729025Z; main]: flush time 617.815 ms
DWPT 0 [2023-11-21T10:31:28.152042Z; main]: flush time 606.253292 ms
Using order: ROUND
DWPT 1 [2023-11-21T10:31:29.546556Z; main]: flush time 540.889709 ms
DWPT 1 [2023-11-21T10:31:30.891868Z; main]: flush time 534.34825 ms
DWPT 1 [2023-11-21T10:31:32.235487Z; main]: flush time 529.94025 ms
DWPT 1 [2023-11-21T10:31:33.585848Z; main]: flush time 538.600959 ms
DWPT 1 [2023-11-21T10:31:34.926304Z; main]: flush time 535.212458 ms
DWPT 1 [2023-11-21T10:31:36.261841Z; main]: flush time 529.868792 ms
DWPT 1 [2023-11-21T10:31:37.612535Z; main]: flush time 532.926375 ms
DWPT 1 [2023-11-21T10:31:38.950114Z; main]: flush time 531.968 ms
DWPT 1 [2023-11-21T10:31:40.283548Z; main]: flush time 529.449208 ms
DWPT 1 [2023-11-21T10:31:41.621569Z; main]: flush time 531.614458 ms
Using order: ASC
DWPT 2 [2023-11-21T10:31:42.931710Z; main]: flush time 466.0205 ms
DWPT 2 [2023-11-21T10:31:44.110242Z; main]: flush time 361.563833 ms
DWPT 2 [2023-11-21T10:31:45.270395Z; main]: flush time 344.598167 ms
DWPT 2 [2023-11-21T10:31:46.391066Z; main]: flush time 297.298416 ms
DWPT 2 [2023-11-21T10:31:47.508596Z; main]: flush time 292.465833 ms
DWPT 2 [2023-11-21T10:31:48.619912Z; main]: flush time 294.3465 ms
DWPT 2 [2023-11-21T10:31:49.733508Z; main]: flush time 294.211834 ms
DWPT 2 [2023-11-21T10:31:50.844318Z; main]: flush time 292.396292 ms
DWPT 2 [2023-11-21T10:31:51.957632Z; main]: flush time 294.951792 ms
DWPT 2 [2023-11-21T10:31:53.060245Z; main]: flush time 293.81875 ms
Using order: DESC
DWPT 3 [2023-11-21T10:31:54.309892Z; main]: flush time 397.02825 ms
DWPT 3 [2023-11-21T10:31:55.507951Z; main]: flush time 375.452125 ms
DWPT 3 [2023-11-21T10:31:56.705769Z; main]: flush time 379.94275 ms
DWPT 3 [2023-11-21T10:31:57.916353Z; main]: flush time 374.742583 ms
DWPT 3 [2023-11-21T10:31:59.098488Z; main]: flush time 370.185083 ms
DWPT 3 [2023-11-21T10:32:00.286668Z; main]: flush time 373.631208 ms
DWPT 3 [2023-11-21T10:32:01.479051Z; main]: flush time 369.689833 ms
DWPT 3 [2023-11-21T10:32:02.665413Z; main]: flush time 370.781 ms
DWPT 3 [2023-11-21T10:32:03.841312Z; main]: flush time 372.006916 ms
DWPT 3 [2023-11-21T10:32:05.019313Z; main]: flush time 374.449833 ms
```

</details>

<details><summary>Code</summary>

```
import java.io.IOException;
import java.nio.file.Paths;
import java.util.Random;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.LongField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortedNumericSelector;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.PrintStreamInfoStream;

// Wrapper class added so the snippet compiles; the name is arbitrary.
public class FlushBenchmark {

  enum Order {
    RANDOM,
    ROUND,
    ASC,
    DESC;
  }

  public static void main(String[] args) throws IOException {
    // Fixed seed so the RANDOM order is reproducible across runs.
    Random random = new Random(4317849138248L);
    for (Order order : Order.values()) {
      System.out.println("Using order: " + order.name());
      Directory dir = FSDirectory.open(Paths.get("/tmp/a"));
      IndexWriterConfig cfg = new IndexWriterConfig(new StandardAnalyzer());
      cfg.setOpenMode(IndexWriterConfig.OpenMode.CREATE);
      // The info stream prints the "flush time" lines shown in the benchmark detail.
      cfg.setInfoStream(new PrintStreamInfoStream(System.out));
      // Flush by document count only: every 1M docs, so 10 flushes per order.
      cfg.setMaxBufferedDocs(1_000_000);
      cfg.setRAMBufferSizeMB(IndexWriterConfig.DISABLE_AUTO_FLUSH);
      cfg.setIndexSort(
          new Sort(LongField.newSortField("sort_field", false, SortedNumericSelector.Type.MIN)));
      IndexWriter w = new IndexWriter(dir, cfg);

      // One document instance is reused; only the field values change per iteration.
      Document doc = new Document();
      LongField sortField = new LongField("sort_field", 0);
      doc.add(sortField);
      TextField stringField1 = new TextField("string_field", "", Field.Store.NO);
      doc.add(stringField1);
      TextField stringField2 = new TextField("string_field", "", Field.Store.NO);
      doc.add(stringField2);
      TextField stringField3 = new TextField("string_field", "", Field.Store.NO);
      doc.add(stringField3);

      for (int i = 0; i < 10_000_000; ++i) {
        long sortValue =
            switch (order) {
              case RANDOM -> random.nextLong(15); // uniform in [0, 15)
              case ROUND -> i % 15; // sawtooth 0..14
              case ASC -> i;
              case DESC -> -i;
            };
        sortField.setLongValue(sortValue);
        stringField1.setStringValue(Integer.toBinaryString(i % 10));
        stringField2.setStringValue(Integer.toBinaryString(i % 100));
        stringField3.setStringValue(Integer.toBinaryString(i % 1000));
        w.addDocument(doc);
      }
      w.flush();
      w.commit();
      w.close();
    }
  }
}
```

</details>
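
As a rough cross-check of the "20+%" figure, the sketch below averages the RANDOM flush times from the logs above and computes the relative reduction. Skipping the first three flushes as warm-up is an assumption made here, not something the benchmark itself does, and `FlushTimeComparison`, `steadyStateMean`, and `reductionPercent` are hypothetical names used only for illustration.

```java
import java.util.Arrays;

public class FlushTimeComparison {

  // RANDOM-order flush times in ms, copied from the Baseline log above.
  static final double[] BASELINE_RANDOM_MS = {
    1138.599958, 1059.658875, 991.167625, 850.281875, 848.403625,
    861.536542, 851.316958, 847.584917, 843.849042, 843.092666
  };

  // RANDOM-order flush times in ms, copied from the Candidate log above.
  static final double[] CANDIDATE_RANDOM_MS = {
    926.650958, 983.61375, 745.283416, 614.212208, 621.046833,
    613.169292, 608.060375, 633.770083, 617.815, 606.253292
  };

  // Mean of the samples that remain after dropping the first `warmup` entries.
  static double steadyStateMean(double[] flushMs, int warmup) {
    return Arrays.stream(flushMs).skip(warmup).average().orElseThrow();
  }

  // Relative reduction of candidate vs. baseline, in percent.
  static double reductionPercent(double baselineMs, double candidateMs) {
    return 100.0 * (baselineMs - candidateMs) / baselineMs;
  }

  public static void main(String[] args) {
    int warmup = 3; // assumption: treat the first three flushes as JIT warm-up
    double baseline = steadyStateMean(BASELINE_RANDOM_MS, warmup);
    double candidate = steadyStateMean(CANDIDATE_RANDOM_MS, warmup);
    System.out.printf(
        "baseline %.1f ms, candidate %.1f ms, reduction %.1f%%%n",
        baseline, candidate, reductionPercent(baseline, candidate));
  }
}
```

The same two helpers can be applied to the ROUND, ASC, or DESC series to compare those orders in the same way.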