[ 
https://issues.apache.org/jira/browse/LUCENE-10297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Feng Guo updated LUCENE-10297:
------------------------------
    Description: 
Though we already have a bitset optimization for low cardinality fields, but 
the optimization usually only works on extremly low cardinality fields 
(cardinality < 16), for medium cardinality case like 30, 100 can rarely get 
this optimization.

In [https://github.com/apache/lucene-solr/pull/1538], we made some effort to 
use readLELongs to speed up BKD id blocks, but did not get a obvious gain on 
this approach. I think this is because we are trying to optimize the unsorted 
situation, which typically happens for high cardinality fields. Maybe this is 
because the bottleneck of queries on high cardinality fields is usually 
visitDocValues but not readDocIds? I think medium cardinality fields are 
tempted for this optimization.

I benchmark the optimization by mocking some random longPoint and querying them 
with PointInSetQuery. As expected, the medium cardinality fields got spped up 
and high cardinality fields are even.

*BaseLine*
{code:java}
task: index_100000000_doc_32_cardinality_baseline, term count: 1, took: 29
task: index_100000000_doc_32_cardinality_baseline, term count: 2, took: 40
task: index_100000000_doc_32_cardinality_baseline, term count: 4, took: 74
task: index_100000000_doc_32_cardinality_baseline, term count: 8, took: 144
task: index_100000000_doc_32_cardinality_baseline, term count: 16, took: 284
task: index_100000000_doc_128_cardinality_baseline, term count: 1, took: 20
task: index_100000000_doc_128_cardinality_baseline, term count: 8, took: 70
task: index_100000000_doc_128_cardinality_baseline, term count: 16, took: 127
task: index_100000000_doc_128_cardinality_baseline, term count: 32, took: 251
task: index_100000000_doc_128_cardinality_baseline, term count: 64, took: 576
task: index_100000000_doc_8192_cardinality_baseline, term count: 1, took: 2
task: index_100000000_doc_8192_cardinality_baseline, term count: 16, took: 11
task: index_100000000_doc_8192_cardinality_baseline, term count: 64, took: 18
task: index_100000000_doc_8192_cardinality_baseline, term count: 512, took: 88
task: index_100000000_doc_8192_cardinality_baseline, term count: 2048, took: 266
task: index_100000000_doc_1048576_cardinality_baseline, term count: 1, took: 3
task: index_100000000_doc_1048576_cardinality_baseline, term count: 16, took: 11
task: index_100000000_doc_1048576_cardinality_baseline, term count: 64, took: 8
task: index_100000000_doc_1048576_cardinality_baseline, term count: 512, took: 
33
task: index_100000000_doc_1048576_cardinality_baseline, term count: 2048, took: 
97
task: index_100000000_doc_8388608_cardinality_baseline, term count: 1, took: 4
task: index_100000000_doc_8388608_cardinality_baseline, term count: 16, took: 20
task: index_100000000_doc_8388608_cardinality_baseline, term count: 64, took: 31
task: index_100000000_doc_8388608_cardinality_baseline, term count: 512, took: 
70
task: index_100000000_doc_8388608_cardinality_baseline, term count: 2048, took: 
209
{code}
*candidate*
{code:java}
task: index_100000000_doc_32_cardinality_candidate, term count: 1, took: 18
task: index_100000000_doc_32_cardinality_candidate, term count: 2, took: 16
task: index_100000000_doc_32_cardinality_candidate, term count: 4, took: 26
task: index_100000000_doc_32_cardinality_candidate, term count: 8, took: 46
task: index_100000000_doc_32_cardinality_candidate, term count: 16, took: 88
task: index_100000000_doc_128_cardinality_candidate, term count: 1, took: 12
task: index_100000000_doc_128_cardinality_candidate, term count: 8, took: 22
task: index_100000000_doc_128_cardinality_candidate, term count: 16, took: 29
task: index_100000000_doc_128_cardinality_candidate, term count: 32, took: 50
task: index_100000000_doc_128_cardinality_candidate, term count: 64, took: 93
task: index_100000000_doc_8192_cardinality_candidate, term count: 1, took: 2
task: index_100000000_doc_8192_cardinality_candidate, term count: 16, took: 9
task: index_100000000_doc_8192_cardinality_candidate, term count: 64, took: 13
task: index_100000000_doc_8192_cardinality_candidate, term count: 512, took: 42
task: index_100000000_doc_8192_cardinality_candidate, term count: 2048, took: 
129
task: index_100000000_doc_1048576_cardinality_candidate, term count: 1, took: 2
task: index_100000000_doc_1048576_cardinality_candidate, term count: 16, took: 9
task: index_100000000_doc_1048576_cardinality_candidate, term count: 64, took: 9
task: index_100000000_doc_1048576_cardinality_candidate, term count: 512, took: 
32
task: index_100000000_doc_1048576_cardinality_candidate, term count: 2048, 
took: 93
task: index_100000000_doc_8388608_cardinality_candidate, term count: 1, took: 2
task: index_100000000_doc_8388608_cardinality_candidate, term count: 16, took: 
21
task: index_100000000_doc_8388608_cardinality_candidate, term count: 64, took: 
38
task: index_100000000_doc_8388608_cardinality_candidate, term count: 512, took: 
73
task: index_100000000_doc_8388608_cardinality_candidate, term count: 2048, 
took: 204
{code}

  was:
Though we already have a bitset optimization for low cardinality fields, but 
the optimization usually only works on extremly low cardinality fields 
(cardinality < 16), for medium cardinality case like 30, 100 can rarely get 
this optimization.

In [https://github.com/apache/lucene-solr/pull/1538], we made some effort to 
use readLELongs to speed up BKD id blocks, but did not get a obvious gain on 
this approach. I think this is because we are trying to optimize the unsorted 
situation, which typically happens for high cardinality fields. Maybe this is 
because the bottleneck of queries on high cardinality fields is usually 
visitDocValues but not readDocIds? I think medium cardinality fields are 
tempted for this optimization.

The basic idea is that we get deltas for sorted ids and encode them with encode

I benchmark the optimization by mocking some random longPoint and querying them 
with PointInSetQuery. As expected, the medium cardinality fields got spped up 
and high cardinality fields are even.

*BaseLine*
{code:java}
task: index_100000000_doc_32_cardinality_baseline, term count: 1, took: 29
task: index_100000000_doc_32_cardinality_baseline, term count: 2, took: 40
task: index_100000000_doc_32_cardinality_baseline, term count: 4, took: 74
task: index_100000000_doc_32_cardinality_baseline, term count: 8, took: 144
task: index_100000000_doc_32_cardinality_baseline, term count: 16, took: 284
task: index_100000000_doc_128_cardinality_baseline, term count: 1, took: 20
task: index_100000000_doc_128_cardinality_baseline, term count: 8, took: 70
task: index_100000000_doc_128_cardinality_baseline, term count: 16, took: 127
task: index_100000000_doc_128_cardinality_baseline, term count: 32, took: 251
task: index_100000000_doc_128_cardinality_baseline, term count: 64, took: 576
task: index_100000000_doc_8192_cardinality_baseline, term count: 1, took: 2
task: index_100000000_doc_8192_cardinality_baseline, term count: 16, took: 11
task: index_100000000_doc_8192_cardinality_baseline, term count: 64, took: 18
task: index_100000000_doc_8192_cardinality_baseline, term count: 512, took: 88
task: index_100000000_doc_8192_cardinality_baseline, term count: 2048, took: 266
task: index_100000000_doc_1048576_cardinality_baseline, term count: 1, took: 3
task: index_100000000_doc_1048576_cardinality_baseline, term count: 16, took: 11
task: index_100000000_doc_1048576_cardinality_baseline, term count: 64, took: 8
task: index_100000000_doc_1048576_cardinality_baseline, term count: 512, took: 
33
task: index_100000000_doc_1048576_cardinality_baseline, term count: 2048, took: 
97
task: index_100000000_doc_8388608_cardinality_baseline, term count: 1, took: 4
task: index_100000000_doc_8388608_cardinality_baseline, term count: 16, took: 20
task: index_100000000_doc_8388608_cardinality_baseline, term count: 64, took: 31
task: index_100000000_doc_8388608_cardinality_baseline, term count: 512, took: 
70
task: index_100000000_doc_8388608_cardinality_baseline, term count: 2048, took: 
209
{code}
*candidate*
{code:java}
task: index_100000000_doc_32_cardinality_candidate, term count: 1, took: 18
task: index_100000000_doc_32_cardinality_candidate, term count: 2, took: 16
task: index_100000000_doc_32_cardinality_candidate, term count: 4, took: 26
task: index_100000000_doc_32_cardinality_candidate, term count: 8, took: 46
task: index_100000000_doc_32_cardinality_candidate, term count: 16, took: 88
task: index_100000000_doc_128_cardinality_candidate, term count: 1, took: 12
task: index_100000000_doc_128_cardinality_candidate, term count: 8, took: 22
task: index_100000000_doc_128_cardinality_candidate, term count: 16, took: 29
task: index_100000000_doc_128_cardinality_candidate, term count: 32, took: 50
task: index_100000000_doc_128_cardinality_candidate, term count: 64, took: 93
task: index_100000000_doc_8192_cardinality_candidate, term count: 1, took: 2
task: index_100000000_doc_8192_cardinality_candidate, term count: 16, took: 9
task: index_100000000_doc_8192_cardinality_candidate, term count: 64, took: 13
task: index_100000000_doc_8192_cardinality_candidate, term count: 512, took: 42
task: index_100000000_doc_8192_cardinality_candidate, term count: 2048, took: 
129
task: index_100000000_doc_1048576_cardinality_candidate, term count: 1, took: 2
task: index_100000000_doc_1048576_cardinality_candidate, term count: 16, took: 9
task: index_100000000_doc_1048576_cardinality_candidate, term count: 64, took: 9
task: index_100000000_doc_1048576_cardinality_candidate, term count: 512, took: 
32
task: index_100000000_doc_1048576_cardinality_candidate, term count: 2048, 
took: 93
task: index_100000000_doc_8388608_cardinality_candidate, term count: 1, took: 2
task: index_100000000_doc_8388608_cardinality_candidate, term count: 16, took: 
21
task: index_100000000_doc_8388608_cardinality_candidate, term count: 64, took: 
38
task: index_100000000_doc_8388608_cardinality_candidate, term count: 512, took: 
73
task: index_100000000_doc_8388608_cardinality_candidate, term count: 2048, 
took: 204
{code}


> Speed up medium cardinality fields with readLELongs and SIMD
> ------------------------------------------------------------
>
>                 Key: LUCENE-10297
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10297
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/codecs
>            Reporter: Feng Guo
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Though we already have a bitset optimization for low cardinality fields, but 
> the optimization usually only works on extremly low cardinality fields 
> (cardinality < 16), for medium cardinality case like 30, 100 can rarely get 
> this optimization.
> In [https://github.com/apache/lucene-solr/pull/1538], we made some effort to 
> use readLELongs to speed up BKD id blocks, but did not get a obvious gain on 
> this approach. I think this is because we are trying to optimize the unsorted 
> situation, which typically happens for high cardinality fields. Maybe this is 
> because the bottleneck of queries on high cardinality fields is usually 
> visitDocValues but not readDocIds? I think medium cardinality fields are 
> tempted for this optimization.
> I benchmark the optimization by mocking some random longPoint and querying 
> them with PointInSetQuery. As expected, the medium cardinality fields got 
> spped up and high cardinality fields are even.
> *BaseLine*
> {code:java}
> task: index_100000000_doc_32_cardinality_baseline, term count: 1, took: 29
> task: index_100000000_doc_32_cardinality_baseline, term count: 2, took: 40
> task: index_100000000_doc_32_cardinality_baseline, term count: 4, took: 74
> task: index_100000000_doc_32_cardinality_baseline, term count: 8, took: 144
> task: index_100000000_doc_32_cardinality_baseline, term count: 16, took: 284
> task: index_100000000_doc_128_cardinality_baseline, term count: 1, took: 20
> task: index_100000000_doc_128_cardinality_baseline, term count: 8, took: 70
> task: index_100000000_doc_128_cardinality_baseline, term count: 16, took: 127
> task: index_100000000_doc_128_cardinality_baseline, term count: 32, took: 251
> task: index_100000000_doc_128_cardinality_baseline, term count: 64, took: 576
> task: index_100000000_doc_8192_cardinality_baseline, term count: 1, took: 2
> task: index_100000000_doc_8192_cardinality_baseline, term count: 16, took: 11
> task: index_100000000_doc_8192_cardinality_baseline, term count: 64, took: 18
> task: index_100000000_doc_8192_cardinality_baseline, term count: 512, took: 88
> task: index_100000000_doc_8192_cardinality_baseline, term count: 2048, took: 
> 266
> task: index_100000000_doc_1048576_cardinality_baseline, term count: 1, took: 3
> task: index_100000000_doc_1048576_cardinality_baseline, term count: 16, took: 
> 11
> task: index_100000000_doc_1048576_cardinality_baseline, term count: 64, took: 
> 8
> task: index_100000000_doc_1048576_cardinality_baseline, term count: 512, 
> took: 33
> task: index_100000000_doc_1048576_cardinality_baseline, term count: 2048, 
> took: 97
> task: index_100000000_doc_8388608_cardinality_baseline, term count: 1, took: 4
> task: index_100000000_doc_8388608_cardinality_baseline, term count: 16, took: 
> 20
> task: index_100000000_doc_8388608_cardinality_baseline, term count: 64, took: 
> 31
> task: index_100000000_doc_8388608_cardinality_baseline, term count: 512, 
> took: 70
> task: index_100000000_doc_8388608_cardinality_baseline, term count: 2048, 
> took: 209
> {code}
> *candidate*
> {code:java}
> task: index_100000000_doc_32_cardinality_candidate, term count: 1, took: 18
> task: index_100000000_doc_32_cardinality_candidate, term count: 2, took: 16
> task: index_100000000_doc_32_cardinality_candidate, term count: 4, took: 26
> task: index_100000000_doc_32_cardinality_candidate, term count: 8, took: 46
> task: index_100000000_doc_32_cardinality_candidate, term count: 16, took: 88
> task: index_100000000_doc_128_cardinality_candidate, term count: 1, took: 12
> task: index_100000000_doc_128_cardinality_candidate, term count: 8, took: 22
> task: index_100000000_doc_128_cardinality_candidate, term count: 16, took: 29
> task: index_100000000_doc_128_cardinality_candidate, term count: 32, took: 50
> task: index_100000000_doc_128_cardinality_candidate, term count: 64, took: 93
> task: index_100000000_doc_8192_cardinality_candidate, term count: 1, took: 2
> task: index_100000000_doc_8192_cardinality_candidate, term count: 16, took: 9
> task: index_100000000_doc_8192_cardinality_candidate, term count: 64, took: 13
> task: index_100000000_doc_8192_cardinality_candidate, term count: 512, took: 
> 42
> task: index_100000000_doc_8192_cardinality_candidate, term count: 2048, took: 
> 129
> task: index_100000000_doc_1048576_cardinality_candidate, term count: 1, took: 
> 2
> task: index_100000000_doc_1048576_cardinality_candidate, term count: 16, 
> took: 9
> task: index_100000000_doc_1048576_cardinality_candidate, term count: 64, 
> took: 9
> task: index_100000000_doc_1048576_cardinality_candidate, term count: 512, 
> took: 32
> task: index_100000000_doc_1048576_cardinality_candidate, term count: 2048, 
> took: 93
> task: index_100000000_doc_8388608_cardinality_candidate, term count: 1, took: 
> 2
> task: index_100000000_doc_8388608_cardinality_candidate, term count: 16, 
> took: 21
> task: index_100000000_doc_8388608_cardinality_candidate, term count: 64, 
> took: 38
> task: index_100000000_doc_8388608_cardinality_candidate, term count: 512, 
> took: 73
> task: index_100000000_doc_8388608_cardinality_candidate, term count: 2048, 
> took: 204
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Reply via email to