[ https://issues.apache.org/jira/browse/LUCENE-10297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Feng Guo updated LUCENE-10297: ------------------------------ Summary: Speed up medium cardinality fields with readLongs and SIMD (was: Speed up medium cardinality fields with readLELongs and SIMD) > Speed up medium cardinality fields with readLongs and SIMD > ---------------------------------------------------------- > > Key: LUCENE-10297 > URL: https://issues.apache.org/jira/browse/LUCENE-10297 > Project: Lucene - Core > Issue Type: Improvement > Components: core/codecs > Reporter: Feng Guo > Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > We already have a bitset optimization for low cardinality fields, but the > optimization only works on extremly low cardinality fields (doc count > 1/16 > total doc), for medium cardinality case like 32/128 can rarely get this > optimization. > In [https://github.com/apache/lucene-solr/pull/1538], we made some effort to > use readLELongs to speed up BKD id blocks, but did not get a obvious gain on > this approach. Maybe this is because we are trying to optimize the unsorted > situation (typically happens for high cardinality fields) and the bottleneck > of queries on high cardinality fields is {{visitDocValues}} but not > {{readDocIds}} ? > However, medium cardinality fields may be tempted for this optimization > because they need to read lots of ids for each term. The basic idea is that > we can compute the delta of the sorted ids and encode/decode them like what > we do in {{StoredFieldsInts}}. I benchmarked the optimization by mocking some > random longPoint and querying them with {{PointInSetQuery}}. As expected, the > medium cardinality fields got spped up and high cardinality fields get even > results. > *Benchmark Result* > |doc count|field cardinality|query term count|baseline(ms)|candidate(ms)|diff > percentage| > |100000000|32|1|19|16|-15.79%| > |100000000|32|2|34|14|-58.82%| > |100000000|32|4|76|22|-71.05%| > |100000000|32|8|139|42|-69.78%| > |100000000|32|16|279|82|-70.61%| > |100000000|128|1|17|11|-35.29%| > |100000000|128|8|75|23|-69.33%| > |100000000|128|16|126|25|-80.16%| > |100000000|128|32|245|50|-79.59%| > |100000000|128|64|528|97|-81.63%| > |100000000|1024|1|3|2|-33.33%| > |100000000|1024|8|13|8|-38.46%| > |100000000|1024|32|31|19|-38.71%| > |100000000|1024|128|120|67|-44.17%| > |100000000|1024|512|480|133|-72.29%| > |100000000|8192|1|3|3|0.00%| > |100000000|8192|16|18|15|-16.67%| > |100000000|8192|64|19|14|-26.32%| > |100000000|8192|512|69|43|-37.68%| > |100000000|8192|2048|236|134|-43.22%| > |100000000|1048576|1|3|2|-33.33%| > |100000000|1048576|16|18|19|5.56%| > |100000000|1048576|64|17|17|0.00%| > |100000000|1048576|512|34|32|-5.88%| > |100000000|1048576|2048|89|93|4.49%| > -- This message was sent by Atlassian Jira (v8.20.1#820001) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org