neoremind edited a comment on pull request #91: URL: https://github.com/apache/lucene/pull/91#issuecomment-827678981
I spent some time running real-world benchmarks. As expected, `IndexWriter` is faster than on the main branch: total elapsed time (including adding docs, building the index and merging) decreased by about 20%. If we only consider `flush_time`, the speedup is more obvious: the time cost drops by about 40%.

1) Run [IndexAndSearchOpenStreetMaps1D.java](https://github.com/neoremind/luceneutil/blob/master/src/main/perf/IndexAndSearchOpenStreetMaps1D.java) against the two branches and record the [log](https://github.com/neoremind/luceneutil/tree/master/log/OpenStreetMaps).

_Note: the query stage is commented out, and some of the code is modified to work with the latest Lucene main branch._

main branch:
```
$ egrep "flush time|sec to build index" open-street-maps.log
DWPT 0 [2021-04-27T11:33:04.518908Z; main]: flush time 17284.537739 msec
DWPT 0 [2021-04-27T11:33:37.888449Z; main]: flush time 12039.476885 msec
72.49147722 sec to build index
```

PR branch:
```
$ egrep "flush time|sec to build index" open-street-maps-optimized.log
DWPT 0 [2021-04-28T18:06:57.931560Z; main]: flush time 9483.778536 msec
DWPT 0 [2021-04-28T18:07:26.493593Z; main]: flush time 8145.392875 msec
59.176608435 sec to build index
```

2) Furthermore, I came up with the idea of using TPC-H LINEITEM to verify. I have a 10GB TPC-H dataset and developed a new test case that imports the first 5 INT fields, which is more typical of real-world cases. Run [IndexAndSearchTpcHLineItem.java](https://github.com/neoremind/luceneutil/blob/master/src/main/perf/IndexAndSearchTpcHLineItem.java) against the two branches and record the [log](https://github.com/neoremind/luceneutil/tree/master/log/TPC-H-LINEITEM).

main branch:
```
$ egrep "flush time|sec to build index" tpch-lineitem.log
DWPT 0 [2021-04-27T11:17:25.329006Z; main]: flush time 13850.23328 msec
DWPT 0 [2021-04-27T11:17:50.289370Z; main]: flush time 12228.723665 msec
DWPT 0 [2021-04-27T11:18:15.546002Z; main]: flush time 12537.085005 msec
DWPT 0 [2021-04-27T11:18:40.140413Z; main]: flush time 11819.225223 msec
DWPT 0 [2021-04-27T11:19:04.850989Z; main]: flush time 12004.380921 msec
DWPT 0 [2021-04-27T11:19:29.435183Z; main]: flush time 11850.273453 msec
DWPT 0 [2021-04-27T11:19:54.016951Z; main]: flush time 11882.316067 msec
DWPT 0 [2021-04-27T11:20:18.932727Z; main]: flush time 12223.151464 msec
DWPT 0 [2021-04-27T11:20:43.522117Z; main]: flush time 11871.276323 msec
DWPT 0 [2021-04-27T11:20:52.060300Z; main]: flush time 3422.434221 msec
271.188917715 sec to build index
```

PR branch:
```
$ egrep "flush time|sec to build index" tpch-lineitem-optimized.log
DWPT 0 [2021-04-28T18:09:17.063371Z; main]: flush time 7547.521814 msec
DWPT 0 [2021-04-28T18:09:36.070457Z; main]: flush time 7226.72845 msec
DWPT 0 [2021-04-28T18:09:55.085997Z; main]: flush time 7275.426344 msec
DWPT 0 [2021-04-28T18:10:13.928021Z; main]: flush time 7140.31387 msec
DWPT 0 [2021-04-28T18:10:32.788150Z; main]: flush time 7173.103266 msec
DWPT 0 [2021-04-28T18:10:51.830926Z; main]: flush time 7371.514576 msec
DWPT 0 [2021-04-28T18:11:10.644303Z; main]: flush time 7132.407293 msec
DWPT 0 [2021-04-28T18:11:29.586830Z; main]: flush time 7150.281669 msec
DWPT 0 [2021-04-28T18:11:48.268161Z; main]: flush time 7009.686475 msec
DWPT 0 [2021-04-28T18:11:55.172851Z; main]: flush time 2115.221804 msec
213.240120432 sec to build index
```

For the benchmark commands, please refer to [my document](https://github.com/neoremind/luceneutil/tree/master/command).
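
To make the percentages easier to re-check, here is a minimal sketch of how the reductions can be derived from the `egrep` output above. The class `FlushTimeSummary` and its helper methods are hypothetical names made up for this illustration; they are not part of Lucene or luceneutil. The sketch assumes the log lines have exactly the `flush time <n> msec` and `<n> sec to build index` shapes shown above.

```java
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of Lucene or luceneutil.
class FlushTimeSummary {
  // Patterns for the two kinds of lines selected by the egrep command above.
  private static final Pattern FLUSH = Pattern.compile("flush time ([0-9.]+) msec");
  private static final Pattern BUILD = Pattern.compile("([0-9.]+) sec to build index");

  /** Sums every "flush time ... msec" value found in the given log lines. */
  static double sumFlushMillis(List<String> lines) {
    double total = 0.0;
    for (String line : lines) {
      Matcher m = FLUSH.matcher(line);
      if (m.find()) {
        total += Double.parseDouble(m.group(1));
      }
    }
    return total;
  }

  /** Returns the first "sec to build index" value, or NaN if there is none. */
  static double buildSeconds(List<String> lines) {
    for (String line : lines) {
      Matcher m = BUILD.matcher(line);
      if (m.find()) {
        return Double.parseDouble(m.group(1));
      }
    }
    return Double.NaN;
  }

  /** Percentage by which "after" is smaller than "before". */
  static double percentReduction(double before, double after) {
    return 100.0 * (before - after) / before;
  }

  public static void main(String[] args) {
    // OpenStreetMaps log lines copied from the output above.
    List<String> mainBranch = List.of(
        "DWPT 0 [2021-04-27T11:33:04.518908Z; main]: flush time 17284.537739 msec",
        "DWPT 0 [2021-04-27T11:33:37.888449Z; main]: flush time 12039.476885 msec",
        "72.49147722 sec to build index");
    List<String> prBranch = List.of(
        "DWPT 0 [2021-04-28T18:06:57.931560Z; main]: flush time 9483.778536 msec",
        "DWPT 0 [2021-04-28T18:07:26.493593Z; main]: flush time 8145.392875 msec",
        "59.176608435 sec to build index");

    System.out.printf(Locale.ROOT, "flush time reduction: %.1f%%%n",
        percentReduction(sumFlushMillis(mainBranch), sumFlushMillis(prBranch)));
    System.out.printf(Locale.ROOT, "build time reduction: %.1f%%%n",
        percentReduction(buildSeconds(mainBranch), buildSeconds(prBranch)));
  }
}
```

For the OpenStreetMaps logs this gives a reduction of roughly 40% in summed flush time and roughly 18% in total build time. The same calculation on the TPC-H LINEITEM logs gives roughly 41% and 21%.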

Test environment:
```
CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 32
On-line CPU(s) list: 0-31
Thread(s) per core: 2
Core(s) per socket: 16
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 85
Model name: Intel(R) Xeon(R) Platinum 8163 CPU @ 2.50GHz
Stepping: 4
CPU MHz: 2500.000
BogoMIPS: 5000.00
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 1024K
L3 cache: 33792K
NUMA node0 CPU(s): 0-31

Memory:
$cat /proc/meminfo
MemTotal: 65703704 kB

Disk: SATA
$fdisk -l | grep Disk
Disk /dev/vdb: 35184.4 GB, 35184372088832 bytes, 68719476736 sectors

OS: Linux 4.19.57-15.1.al7.x86_64

JDK:
openjdk version "11.0.11" 2021-04-20 LTS
OpenJDK Runtime Environment 18.9 (build 11.0.11+9-LTS)
OpenJDK 64-Bit Server VM 18.9 (build 11.0.11+9-LTS, mixed mode, sharing)
```

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
