[jira] [Comment Edited] (LUCENE-10297) Speed up medium cardinality fields with readLongs and SIMD

Feng Guo (Jira) Sat, 11 Dec 2021 10:39:07 -0800


    [ 
https://issues.apache.org/jira/browse/LUCENE-10297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17456577#comment-17456577
 ]


Feng Guo edited comment on LUCENE-10297 at 12/11/21, 6:38 PM:
--------------------------------------------------------------

Hi [~jpountz], [~ivera] ! Could you help take a look at this issue when you 
have free time? Thanks!


was (Author: gf2121):
Hi [~jpountz], [~ivera] ! Would you please help take a look at this issue when 
you have free time? Thanks!

> Speed up medium cardinality fields with readLongs and SIMD
> ----------------------------------------------------------
>
>                 Key: LUCENE-10297
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10297
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/codecs
>            Reporter: Feng Guo
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> We introduced a bitset optimization for extremly low cardinality fields in 
> [LUCENE-10233|https://issues.apache.org/jira/browse/LUCENE-10233], but medium 
> cardinality fields (like 32/128) can rarely trigger this optimization, I'm 
> trying to find out a way to speed up them.
> In [https://github.com/apache/lucene-solr/pull/1538], we made some effort to 
> use readLELongs to speed up BKD id blocks, but did not get a obvious gain on 
> this approach. I think the reason could probably be that we were trying to 
> optimize the unsorted situation (typically happens for high cardinality 
> fields) and the bottleneck of queries on high cardinality fields is 
> {{visitDocValues}} but not {{{}readDocIds{}}}. _(Not sure, i'm doing some 
> more benchmark on this)_
> However, medium cardinality fields may be tempted for this optimization 
> because they need to read lots of ids for each term. The basic idea is that 
> we can compute the delta of the sorted ids and encode/decode them like what 
> we do in {{{}StoredFieldsInts{}}}. I benchmarked the optimization by mocking 
> some random longPoint and querying them with {{{}PointInSetQuery{}}}. As 
> expected, the medium cardinality fields got spped up and high cardinality 
> fields get even results.
> *Benchmark Result*
> |doc count|field cardinality|query point|baseline(ms)|candidate(ms)|diff 
> percentage|baseline(QPS)|candidate(QPS)|diff percentage|
> |100000000|32|1|19|16|-15.79%|52.63|62.50|18.75%|
> |100000000|32|2|34|14|-58.82%|29.41|71.43|142.86%|
> |100000000|32|4|76|22|-71.05%|13.16|45.45|245.45%|
> |100000000|32|8|139|42|-69.78%|7.19|23.81|230.95%|
> |100000000|32|16|279|82|-70.61%|3.58|12.20|240.24%|
> |100000000|128|1|17|11|-35.29%|58.82|90.91|54.55%|
> |100000000|128|8|75|23|-69.33%|13.33|43.48|226.09%|
> |100000000|128|16|126|25|-80.16%|7.94|40.00|404.00%|
> |100000000|128|32|245|50|-79.59%|4.08|20.00|390.00%|
> |100000000|128|64|528|97|-81.63%|1.89|10.31|444.33%|
> |100000000|1024|1|3|2|-33.33%|333.33|500.00|50.00%|
> |100000000|1024|8|13|8|-38.46%|76.92|125.00|62.50%|
> |100000000|1024|32|31|19|-38.71%|32.26|52.63|63.16%|
> |100000000|1024|128|120|67|-44.17%|8.33|14.93|79.10%|
> |100000000|1024|512|480|133|-72.29%|2.08|7.52|260.90%|
> |100000000|8192|1|3|3|0.00%|333.33|333.33|0.00%|
> |100000000|8192|16|18|15|-16.67%|55.56|66.67|20.00%|
> |100000000|8192|64|19|14|-26.32%|52.63|71.43|35.71%|
> |100000000|8192|512|69|43|-37.68%|14.49|23.26|60.47%|
> |100000000|8192|2048|236|134|-43.22%|4.24|7.46|76.12%|
> |100000000|1048576|1|3|2|-33.33%|333.33|500.00|50.00%|
> |100000000|1048576|16|18|19|5.56%|55.56|52.63|-5.26%|
> |100000000|1048576|64|17|17|0.00%|58.82|58.82|0.00%|
> |100000000|1048576|512|34|32|-5.88%|29.41|31.25|6.25%|
> |100000000|1048576|2048|89|93|4.49%|11.24|10.75|-4.30%|



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[jira] [Comment Edited] (LUCENE-10297) Speed up medium cardinality fields with readLongs and SIMD

Reply via email to