[jira] [Updated] (LUCENE-10297) Speed up medium cardinality fields with readLongs and SIMD

Feng Guo (Jira) Wed, 08 Dec 2021 20:45:06 -0800


     [ 
https://issues.apache.org/jira/browse/LUCENE-10297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Feng Guo updated LUCENE-10297:
------------------------------
    Summary: Speed up medium cardinality fields with readLongs and SIMD  (was: 
Speed up medium cardinality fields with readLELongs and SIMD)

> Speed up medium cardinality fields with readLongs and SIMD
> ----------------------------------------------------------
>
>                 Key: LUCENE-10297
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10297
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/codecs
>            Reporter: Feng Guo
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> We already have a bitset optimization for low cardinality fields, but the 
> optimization only works on extremely low cardinality fields (doc count > 1/16 
> total doc); medium cardinality cases like 32/128 can rarely get this 
> optimization.
> In [https://github.com/apache/lucene-solr/pull/1538], we made some effort to 
> use readLELongs to speed up BKD id blocks, but did not get an obvious gain from 
> this approach. Maybe this is because we were trying to optimize the unsorted 
> situation (which typically happens for high cardinality fields), and the 
> bottleneck of queries on high cardinality fields is {{visitDocValues}} rather 
> than {{readDocIds}}? 
> However, medium cardinality fields may be good candidates for this optimization 
> because they need to read lots of ids for each term. The basic idea is that 
> we can compute the deltas of the sorted ids and encode/decode them the way 
> we do in {{StoredFieldsInts}}. I benchmarked the optimization by mocking up some 
> random longPoint values and querying them with {{PointInSetQuery}}. As expected, 
> the medium cardinality fields got sped up and the high cardinality fields 
> showed roughly unchanged results.
> *Benchmark Result*
> |doc count|field cardinality|query term count|baseline(ms)|candidate(ms)|diff percentage|
> |100000000|32|1|19|16|-15.79%|
> |100000000|32|2|34|14|-58.82%|
> |100000000|32|4|76|22|-71.05%|
> |100000000|32|8|139|42|-69.78%|
> |100000000|32|16|279|82|-70.61%|
> |100000000|128|1|17|11|-35.29%|
> |100000000|128|8|75|23|-69.33%|
> |100000000|128|16|126|25|-80.16%|
> |100000000|128|32|245|50|-79.59%|
> |100000000|128|64|528|97|-81.63%|
> |100000000|1024|1|3|2|-33.33%|
> |100000000|1024|8|13|8|-38.46%|
> |100000000|1024|32|31|19|-38.71%|
> |100000000|1024|128|120|67|-44.17%|
> |100000000|1024|512|480|133|-72.29%|
> |100000000|8192|1|3|3|0.00%|
> |100000000|8192|16|18|15|-16.67%|
> |100000000|8192|64|19|14|-26.32%|
> |100000000|8192|512|69|43|-37.68%|
> |100000000|8192|2048|236|134|-43.22%|
> |100000000|1048576|1|3|2|-33.33%|
> |100000000|1048576|16|18|19|5.56%|
> |100000000|1048576|64|17|17|0.00%|
> |100000000|1048576|512|34|32|-5.88%|
> |100000000|1048576|2048|89|93|4.49%|
>  
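
The delta idea in the description above (take the differences between sorted ids, then pack them at a fixed width so that whole longs can be read at once) can be illustrated with a minimal sketch. This is not the patch from the issue and not Lucene's on-disk format; the class name DeltaIdsSketch, the Encoded holder, and the choice of 8/16/32 bits per delta are assumptions made only for illustration.

```java
import java.util.Arrays;

/** Illustrative only: hypothetical names, not Lucene's actual BKD or StoredFieldsInts code. */
final class DeltaIdsSketch {

  /** Hypothetical holder for the packed deltas. */
  static final class Encoded {
    final int bitsPerValue; // 8, 16 or 32 (assumed widths for this sketch)
    final int count;        // number of ids that were encoded
    final long[] words;     // deltas packed back to back into 64-bit words

    Encoded(int bitsPerValue, int count, long[] words) {
      this.bitsPerValue = bitsPerValue;
      this.count = count;
      this.words = words;
    }
  }

  /** Encodes non-negative ids sorted in ascending order as fixed-width deltas. */
  static Encoded encode(int[] sortedIds) {
    int count = sortedIds.length;
    int[] deltas = new int[count];
    int previous = 0;
    int maxDelta = 0;
    for (int i = 0; i < count; i++) {
      deltas[i] = sortedIds[i] - previous; // non-negative because the input is sorted
      previous = sortedIds[i];
      maxDelta = Math.max(maxDelta, deltas[i]);
    }
    // Pick the narrowest of the assumed widths that can hold every delta.
    int bits = maxDelta <= 0xFF ? 8 : maxDelta <= 0xFFFF ? 16 : 32;
    int perWord = 64 / bits;
    long[] words = new long[(count + perWord - 1) / perWord];
    for (int i = 0; i < count; i++) {
      int shift = (i % perWord) * bits;
      words[i / perWord] |= ((long) deltas[i]) << shift;
    }
    return new Encoded(bits, count, words);
  }

  /** Unpacks the deltas and rebuilds the ids with a running sum. */
  static int[] decode(Encoded e) {
    int perWord = 64 / e.bitsPerValue;
    long mask = (1L << e.bitsPerValue) - 1;
    int[] ids = new int[e.count];
    int previous = 0;
    for (int i = 0; i < e.count; i++) {
      int shift = (i % perWord) * e.bitsPerValue;
      previous += (int) ((e.words[i / perWord] >>> shift) & mask);
      ids[i] = previous;
    }
    return ids;
  }

  public static void main(String[] args) {
    int[] ids = {3, 7, 8, 200, 260, 1000};
    Encoded e = encode(ids);
    System.out.println("bits per delta: " + e.bitsPerValue);
    System.out.println("round trip ok: " + Arrays.equals(ids, decode(e)));
  }
}
```

In this sketch, densely populated terms produce small deltas and therefore narrow widths, which is one plausible reason medium cardinality fields would benefit more than high cardinality ones; the benchmark table above is the only measured evidence.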



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
