[ https://issues.apache.org/jira/browse/LUCENE-10297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17456577#comment-17456577 ]
Feng Guo commented on LUCENE-10297: ----------------------------------- Hi [~jpountz] , Would you please help take a look at this issue when you have free time? Thanks! > Speed up medium cardinality fields with readLongs and SIMD > ---------------------------------------------------------- > > Key: LUCENE-10297 > URL: https://issues.apache.org/jira/browse/LUCENE-10297 > Project: Lucene - Core > Issue Type: Improvement > Components: core/codecs > Reporter: Feng Guo > Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > We introduced a bitset optimization for extremly low cardinality fields in > [LUCENE-10233|https://issues.apache.org/jira/browse/LUCENE-10233], but medium > cardinality fields (like 32/128) can rarely trigger this optimization, I'm > trying to find out a way to speed up them. > In [https://github.com/apache/lucene-solr/pull/1538], we made some effort to > use readLELongs to speed up BKD id blocks, but did not get a obvious gain on > this approach. I think the reason could be that we were trying to optimize > the unsorted situation (typically happens for high cardinality fields) and > the bottleneck of queries on high cardinality fields is {{visitDocValues}} > but not {{{}readDocIds{}}}. _(Not sure, i'm doing some more benchmark on > this)_ > However, medium cardinality fields may be tempted for this optimization > because they need to read lots of ids for each term. The basic idea is that > we can compute the delta of the sorted ids and encode/decode them like what > we do in {{{}StoredFieldsInts{}}}. I benchmarked the optimization by mocking > some random longPoint and querying them with {{{}PointInSetQuery{}}}. As > expected, the medium cardinality fields got spped up and high cardinality > fields get even results. > *Benchmark Result* > |doc count|field cardinality|query point|baseline(ms)|candidate(ms)|diff > percentage|baseline(QPS)|candidate(QPS)|diff percentage| > |100000000|32|1|19|16|-15.79%|52.63|62.50|18.75%| > |100000000|32|2|34|14|-58.82%|29.41|71.43|142.86%| > |100000000|32|4|76|22|-71.05%|13.16|45.45|245.45%| > |100000000|32|8|139|42|-69.78%|7.19|23.81|230.95%| > |100000000|32|16|279|82|-70.61%|3.58|12.20|240.24%| > |100000000|128|1|17|11|-35.29%|58.82|90.91|54.55%| > |100000000|128|8|75|23|-69.33%|13.33|43.48|226.09%| > |100000000|128|16|126|25|-80.16%|7.94|40.00|404.00%| > |100000000|128|32|245|50|-79.59%|4.08|20.00|390.00%| > |100000000|128|64|528|97|-81.63%|1.89|10.31|444.33%| > |100000000|1024|1|3|2|-33.33%|333.33|500.00|50.00%| > |100000000|1024|8|13|8|-38.46%|76.92|125.00|62.50%| > |100000000|1024|32|31|19|-38.71%|32.26|52.63|63.16%| > |100000000|1024|128|120|67|-44.17%|8.33|14.93|79.10%| > |100000000|1024|512|480|133|-72.29%|2.08|7.52|260.90%| > |100000000|8192|1|3|3|0.00%|333.33|333.33|0.00%| > |100000000|8192|16|18|15|-16.67%|55.56|66.67|20.00%| > |100000000|8192|64|19|14|-26.32%|52.63|71.43|35.71%| > |100000000|8192|512|69|43|-37.68%|14.49|23.26|60.47%| > |100000000|8192|2048|236|134|-43.22%|4.24|7.46|76.12%| > |100000000|1048576|1|3|2|-33.33%|333.33|500.00|50.00%| > |100000000|1048576|16|18|19|5.56%|55.56|52.63|-5.26%| > |100000000|1048576|64|17|17|0.00%|58.82|58.82|0.00%| > |100000000|1048576|512|34|32|-5.88%|29.41|31.25|6.25%| > |100000000|1048576|2048|89|93|4.49%|11.24|10.75|-4.30%| -- This message was sent by Atlassian Jira (v8.20.1#820001) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org