msfroh commented on PR #13054: URL: https://github.com/apache/lucene/pull/13054#issuecomment-1920420714
I did some rough benchmarks using the large synonym file attached to https://issues.apache.org/jira/browse/LUCENE-3233

The benchmark code and input is at https://github.com/msfroh/lucene/commit/174f98a91e8709ee66dc2ed84b5c0b54e1a10635

|Attempt|On-heap load|Off-heap load|Off-heap reload|On-heap process|Off-heap process|Off-heap reload process|
|-------|------------|-------------|---------------|---------------|----------------|-----------------------|
|1|1146.022381|1117.004359|4.099065|569.120851|656.430684|613.475144|
|2|1079.578922|1060.926854|1.761465|456.203168|655.596275|622.534246|
|3|1035.911388|1076.611629|1.750233|579.41094|655.955431|614.788388|
|4|1037.825728|1085.513933|2.074129|696.390519|688.664985|613.266972|
|5|1017.489384|1008.209808|1.717748|485.510526|620.800148|620.708538|
|6|1014.641653|1024.412669|1.740371|483.617261|619.696259|619.910897|
|7|1027.691397|1045.129567|1.727786|670.49456|622.48759|616.303549|
|8|984.005971|1009.265777|1.736832|513.543926|615.448442|613.06279|
|9|1027.841112|1027.057453|1.732985|486.502644|622.535269|620.285635|
|10|981.689573|1074.613506|1.71059|707.810107|613.417977|624.34832|
|11|1026.165712|1065.3181|1.689407|479.610417|621.454353|616.183786|
|12|994.949905|1046.898091|1.730394|498.938696|612.279425|619.965444|
|13|1035.144288|1043.119169|1.739726|472.821155|619.267425|613.029508|
|14|996.056368|1017.663948|1.699742|692.135015|619.725163|620.454352|
|15|1046.605644|1018.287866|1.713526|470.391592|619.723699|612.068366|
|16|1007.579733|1042.062818|1.70251|508.481346|619.481298|619.178419|
|17|1038.166702|1054.039165|1.683814|485.439337|620.901934|616.017789|
|18|1000.900448|1058.492139|1.7267|515.185816|622.204031|627.560895|
|19|1236.416447|1080.877889|1.643654|434.73928|624.825435|625.622426|
|20|997.663619|1038.478411|1.657257|497.232157|623.337627|620.943519|
|**Mean**|1036.617319|1049.699158|1.8518967|535.1789657|628.7116725|618.4854492|
|**Stddev**|59.71799264|28.44516049|0.535792004|86.95026923|19.55324941|4.52695571|

So, it
looks like the time to load synonyms is mostly unchanged (1050ms off-heap versus 1037ms on-heap), and loading "pre-compiled" synonyms is super-duper fast (under 2ms on average). We do seem to take a 17.5% hit on processing time (629ms off-heap versus 535ms on-heap).

I might try profiling to see where that time is being spent. If it's spent doing FST operations, I'll assume it's a cost of doing business. If it's spent loading the output words, which are also off-heap, I might consider moving those (optionally?) back on heap.
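
For anyone who wants to check the 17.5% figure, here is a minimal sketch of the arithmetic, using the "On-heap process" and "Off-heap process" columns from the table above. The class and method names (`ProcessingOverheadSketch`, `mean`, `relativeOverheadPercent`) are made up for illustration and are not part of the linked benchmark code.

```java
import java.util.Arrays;

// Illustrative only: recomputes the column means and the relative
// processing-time overhead from the numbers reported in the table.
final class ProcessingOverheadSketch {

    // "On-heap process" column, attempts 1-20, in milliseconds.
    static final double[] ON_HEAP_PROCESS_MS = {
        569.120851, 456.203168, 579.41094, 696.390519, 485.510526,
        483.617261, 670.49456, 513.543926, 486.502644, 707.810107,
        479.610417, 498.938696, 472.821155, 692.135015, 470.391592,
        508.481346, 485.439337, 515.185816, 434.73928, 497.232157
    };

    // "Off-heap process" column, attempts 1-20, in milliseconds.
    static final double[] OFF_HEAP_PROCESS_MS = {
        656.430684, 655.596275, 655.955431, 688.664985, 620.800148,
        619.696259, 622.48759, 615.448442, 622.535269, 613.417977,
        621.454353, 612.279425, 619.267425, 619.725163, 619.723699,
        619.481298, 620.901934, 622.204031, 624.825435, 623.337627
    };

    // Arithmetic mean of the samples (NaN for an empty array).
    static double mean(double[] samples) {
        return Arrays.stream(samples).average().orElse(Double.NaN);
    }

    // How much slower the candidate is than the baseline, in percent.
    static double relativeOverheadPercent(double baselineMs, double candidateMs) {
        return (candidateMs - baselineMs) / baselineMs * 100.0;
    }

    public static void main(String[] args) {
        double onHeap = mean(ON_HEAP_PROCESS_MS);
        double offHeap = mean(OFF_HEAP_PROCESS_MS);
        System.out.printf(
            "on-heap: %.1f ms, off-heap: %.1f ms, overhead: %.1f%%%n",
            onHeap, offHeap, relativeOverheadPercent(onHeap, offHeap));
    }
}
```

Note that the on-heap processing times have a much larger spread (stddev of about 87ms versus about 20ms off-heap), so the exact percentage should be read as rough, in line with the benchmark itself.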