[ https://issues.apache.org/jira/browse/LUCENE-10297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Feng Guo updated LUCENE-10297: ------------------------------ Description: Though we already have a bitset optimization for low cardinality fields, but the optimization usually only works on extremly low cardinality fields (cardinality < 16), for medium cardinality case like 30, 100 can rarely get this optimization. In [https://github.com/apache/lucene-solr/pull/1538], we made some effort to use readLELongs to speed up BKD id blocks, but did not get a obvious gain on this approach. I think this is because we are trying to optimize the unsorted situation, which typically happens for high cardinality fields, and the bottleneck of queries on high cardinality fields is usually visitDocValues but not readDocIds. Medium cardinality fields may be tempted for this optimization :) I benchmarked the optimization by mocking some random longPoint and querying them with PointInSetQuery. As expected, the medium cardinality fields got spped up and high cardinality fields get even results. *BaseLine* {code:java} task: index_100000000_doc_32_cardinality_baseline, term count: 1, took: 29 task: index_100000000_doc_32_cardinality_baseline, term count: 2, took: 40 task: index_100000000_doc_32_cardinality_baseline, term count: 4, took: 74 task: index_100000000_doc_32_cardinality_baseline, term count: 8, took: 144 task: index_100000000_doc_32_cardinality_baseline, term count: 16, took: 284 task: index_100000000_doc_128_cardinality_baseline, term count: 1, took: 20 task: index_100000000_doc_128_cardinality_baseline, term count: 8, took: 70 task: index_100000000_doc_128_cardinality_baseline, term count: 16, took: 127 task: index_100000000_doc_128_cardinality_baseline, term count: 32, took: 251 task: index_100000000_doc_128_cardinality_baseline, term count: 64, took: 576 task: index_100000000_doc_8192_cardinality_baseline, term count: 1, took: 2 task: index_100000000_doc_8192_cardinality_baseline, term count: 16, took: 11 task: index_100000000_doc_8192_cardinality_baseline, term count: 64, took: 18 task: index_100000000_doc_8192_cardinality_baseline, term count: 512, took: 88 task: index_100000000_doc_8192_cardinality_baseline, term count: 2048, took: 266 task: index_100000000_doc_1048576_cardinality_baseline, term count: 1, took: 3 task: index_100000000_doc_1048576_cardinality_baseline, term count: 16, took: 11 task: index_100000000_doc_1048576_cardinality_baseline, term count: 64, took: 8 task: index_100000000_doc_1048576_cardinality_baseline, term count: 512, took: 33 task: index_100000000_doc_1048576_cardinality_baseline, term count: 2048, took: 97 task: index_100000000_doc_8388608_cardinality_baseline, term count: 1, took: 4 task: index_100000000_doc_8388608_cardinality_baseline, term count: 16, took: 20 task: index_100000000_doc_8388608_cardinality_baseline, term count: 64, took: 31 task: index_100000000_doc_8388608_cardinality_baseline, term count: 512, took: 70 task: index_100000000_doc_8388608_cardinality_baseline, term count: 2048, took: 209 {code} *candidate* {code:java} task: index_100000000_doc_32_cardinality_candidate, term count: 1, took: 18 task: index_100000000_doc_32_cardinality_candidate, term count: 2, took: 16 task: index_100000000_doc_32_cardinality_candidate, term count: 4, took: 26 task: index_100000000_doc_32_cardinality_candidate, term count: 8, took: 46 task: index_100000000_doc_32_cardinality_candidate, term count: 16, took: 88 task: index_100000000_doc_128_cardinality_candidate, term count: 1, took: 12 task: index_100000000_doc_128_cardinality_candidate, term count: 8, took: 22 task: index_100000000_doc_128_cardinality_candidate, term count: 16, took: 29 task: index_100000000_doc_128_cardinality_candidate, term count: 32, took: 50 task: index_100000000_doc_128_cardinality_candidate, term count: 64, took: 93 task: index_100000000_doc_8192_cardinality_candidate, term count: 1, took: 2 task: index_100000000_doc_8192_cardinality_candidate, term count: 16, took: 9 task: index_100000000_doc_8192_cardinality_candidate, term count: 64, took: 13 task: index_100000000_doc_8192_cardinality_candidate, term count: 512, took: 42 task: index_100000000_doc_8192_cardinality_candidate, term count: 2048, took: 129 task: index_100000000_doc_1048576_cardinality_candidate, term count: 1, took: 2 task: index_100000000_doc_1048576_cardinality_candidate, term count: 16, took: 9 task: index_100000000_doc_1048576_cardinality_candidate, term count: 64, took: 9 task: index_100000000_doc_1048576_cardinality_candidate, term count: 512, took: 32 task: index_100000000_doc_1048576_cardinality_candidate, term count: 2048, took: 93 task: index_100000000_doc_8388608_cardinality_candidate, term count: 1, took: 2 task: index_100000000_doc_8388608_cardinality_candidate, term count: 16, took: 21 task: index_100000000_doc_8388608_cardinality_candidate, term count: 64, took: 38 task: index_100000000_doc_8388608_cardinality_candidate, term count: 512, took: 73 task: index_100000000_doc_8388608_cardinality_candidate, term count: 2048, took: 204 {code} was: Though we already have a bitset optimization for low cardinality fields, but the optimization usually only works on extremly low cardinality fields (cardinality < 16), for medium cardinality case like 30, 100 can rarely get this optimization. In [https://github.com/apache/lucene-solr/pull/1538], we made some effort to use readLELongs to speed up BKD id blocks, but did not get a obvious gain on this approach. I think this is because we are trying to optimize the unsorted situation, which typically happens for high cardinality fields. Maybe this is because the bottleneck of queries on high cardinality fields is usually visitDocValues but not readDocIds? I think medium cardinality fields are tempted for this optimization. I benchmark the optimization by mocking some random longPoint and querying them with PointInSetQuery. As expected, the medium cardinality fields got spped up and high cardinality fields are even. *BaseLine* {code:java} task: index_100000000_doc_32_cardinality_baseline, term count: 1, took: 29 task: index_100000000_doc_32_cardinality_baseline, term count: 2, took: 40 task: index_100000000_doc_32_cardinality_baseline, term count: 4, took: 74 task: index_100000000_doc_32_cardinality_baseline, term count: 8, took: 144 task: index_100000000_doc_32_cardinality_baseline, term count: 16, took: 284 task: index_100000000_doc_128_cardinality_baseline, term count: 1, took: 20 task: index_100000000_doc_128_cardinality_baseline, term count: 8, took: 70 task: index_100000000_doc_128_cardinality_baseline, term count: 16, took: 127 task: index_100000000_doc_128_cardinality_baseline, term count: 32, took: 251 task: index_100000000_doc_128_cardinality_baseline, term count: 64, took: 576 task: index_100000000_doc_8192_cardinality_baseline, term count: 1, took: 2 task: index_100000000_doc_8192_cardinality_baseline, term count: 16, took: 11 task: index_100000000_doc_8192_cardinality_baseline, term count: 64, took: 18 task: index_100000000_doc_8192_cardinality_baseline, term count: 512, took: 88 task: index_100000000_doc_8192_cardinality_baseline, term count: 2048, took: 266 task: index_100000000_doc_1048576_cardinality_baseline, term count: 1, took: 3 task: index_100000000_doc_1048576_cardinality_baseline, term count: 16, took: 11 task: index_100000000_doc_1048576_cardinality_baseline, term count: 64, took: 8 task: index_100000000_doc_1048576_cardinality_baseline, term count: 512, took: 33 task: index_100000000_doc_1048576_cardinality_baseline, term count: 2048, took: 97 task: index_100000000_doc_8388608_cardinality_baseline, term count: 1, took: 4 task: index_100000000_doc_8388608_cardinality_baseline, term count: 16, took: 20 task: index_100000000_doc_8388608_cardinality_baseline, term count: 64, took: 31 task: index_100000000_doc_8388608_cardinality_baseline, term count: 512, took: 70 task: index_100000000_doc_8388608_cardinality_baseline, term count: 2048, took: 209 {code} *candidate* {code:java} task: index_100000000_doc_32_cardinality_candidate, term count: 1, took: 18 task: index_100000000_doc_32_cardinality_candidate, term count: 2, took: 16 task: index_100000000_doc_32_cardinality_candidate, term count: 4, took: 26 task: index_100000000_doc_32_cardinality_candidate, term count: 8, took: 46 task: index_100000000_doc_32_cardinality_candidate, term count: 16, took: 88 task: index_100000000_doc_128_cardinality_candidate, term count: 1, took: 12 task: index_100000000_doc_128_cardinality_candidate, term count: 8, took: 22 task: index_100000000_doc_128_cardinality_candidate, term count: 16, took: 29 task: index_100000000_doc_128_cardinality_candidate, term count: 32, took: 50 task: index_100000000_doc_128_cardinality_candidate, term count: 64, took: 93 task: index_100000000_doc_8192_cardinality_candidate, term count: 1, took: 2 task: index_100000000_doc_8192_cardinality_candidate, term count: 16, took: 9 task: index_100000000_doc_8192_cardinality_candidate, term count: 64, took: 13 task: index_100000000_doc_8192_cardinality_candidate, term count: 512, took: 42 task: index_100000000_doc_8192_cardinality_candidate, term count: 2048, took: 129 task: index_100000000_doc_1048576_cardinality_candidate, term count: 1, took: 2 task: index_100000000_doc_1048576_cardinality_candidate, term count: 16, took: 9 task: index_100000000_doc_1048576_cardinality_candidate, term count: 64, took: 9 task: index_100000000_doc_1048576_cardinality_candidate, term count: 512, took: 32 task: index_100000000_doc_1048576_cardinality_candidate, term count: 2048, took: 93 task: index_100000000_doc_8388608_cardinality_candidate, term count: 1, took: 2 task: index_100000000_doc_8388608_cardinality_candidate, term count: 16, took: 21 task: index_100000000_doc_8388608_cardinality_candidate, term count: 64, took: 38 task: index_100000000_doc_8388608_cardinality_candidate, term count: 512, took: 73 task: index_100000000_doc_8388608_cardinality_candidate, term count: 2048, took: 204 {code} > Speed up medium cardinality fields with readLELongs and SIMD > ------------------------------------------------------------ > > Key: LUCENE-10297 > URL: https://issues.apache.org/jira/browse/LUCENE-10297 > Project: Lucene - Core > Issue Type: Improvement > Components: core/codecs > Reporter: Feng Guo > Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > Though we already have a bitset optimization for low cardinality fields, but > the optimization usually only works on extremly low cardinality fields > (cardinality < 16), for medium cardinality case like 30, 100 can rarely get > this optimization. > In [https://github.com/apache/lucene-solr/pull/1538], we made some effort to > use readLELongs to speed up BKD id blocks, but did not get a obvious gain on > this approach. I think this is because we are trying to optimize the unsorted > situation, which typically happens for high cardinality fields, and the > bottleneck of queries on high cardinality fields is usually visitDocValues > but not readDocIds. Medium cardinality fields may be tempted for this > optimization :) > I benchmarked the optimization by mocking some random longPoint and querying > them with PointInSetQuery. As expected, the medium cardinality fields got > spped up and high cardinality fields get even results. > *BaseLine* > {code:java} > task: index_100000000_doc_32_cardinality_baseline, term count: 1, took: 29 > task: index_100000000_doc_32_cardinality_baseline, term count: 2, took: 40 > task: index_100000000_doc_32_cardinality_baseline, term count: 4, took: 74 > task: index_100000000_doc_32_cardinality_baseline, term count: 8, took: 144 > task: index_100000000_doc_32_cardinality_baseline, term count: 16, took: 284 > task: index_100000000_doc_128_cardinality_baseline, term count: 1, took: 20 > task: index_100000000_doc_128_cardinality_baseline, term count: 8, took: 70 > task: index_100000000_doc_128_cardinality_baseline, term count: 16, took: 127 > task: index_100000000_doc_128_cardinality_baseline, term count: 32, took: 251 > task: index_100000000_doc_128_cardinality_baseline, term count: 64, took: 576 > task: index_100000000_doc_8192_cardinality_baseline, term count: 1, took: 2 > task: index_100000000_doc_8192_cardinality_baseline, term count: 16, took: 11 > task: index_100000000_doc_8192_cardinality_baseline, term count: 64, took: 18 > task: index_100000000_doc_8192_cardinality_baseline, term count: 512, took: 88 > task: index_100000000_doc_8192_cardinality_baseline, term count: 2048, took: > 266 > task: index_100000000_doc_1048576_cardinality_baseline, term count: 1, took: 3 > task: index_100000000_doc_1048576_cardinality_baseline, term count: 16, took: > 11 > task: index_100000000_doc_1048576_cardinality_baseline, term count: 64, took: > 8 > task: index_100000000_doc_1048576_cardinality_baseline, term count: 512, > took: 33 > task: index_100000000_doc_1048576_cardinality_baseline, term count: 2048, > took: 97 > task: index_100000000_doc_8388608_cardinality_baseline, term count: 1, took: 4 > task: index_100000000_doc_8388608_cardinality_baseline, term count: 16, took: > 20 > task: index_100000000_doc_8388608_cardinality_baseline, term count: 64, took: > 31 > task: index_100000000_doc_8388608_cardinality_baseline, term count: 512, > took: 70 > task: index_100000000_doc_8388608_cardinality_baseline, term count: 2048, > took: 209 > {code} > *candidate* > {code:java} > task: index_100000000_doc_32_cardinality_candidate, term count: 1, took: 18 > task: index_100000000_doc_32_cardinality_candidate, term count: 2, took: 16 > task: index_100000000_doc_32_cardinality_candidate, term count: 4, took: 26 > task: index_100000000_doc_32_cardinality_candidate, term count: 8, took: 46 > task: index_100000000_doc_32_cardinality_candidate, term count: 16, took: 88 > task: index_100000000_doc_128_cardinality_candidate, term count: 1, took: 12 > task: index_100000000_doc_128_cardinality_candidate, term count: 8, took: 22 > task: index_100000000_doc_128_cardinality_candidate, term count: 16, took: 29 > task: index_100000000_doc_128_cardinality_candidate, term count: 32, took: 50 > task: index_100000000_doc_128_cardinality_candidate, term count: 64, took: 93 > task: index_100000000_doc_8192_cardinality_candidate, term count: 1, took: 2 > task: index_100000000_doc_8192_cardinality_candidate, term count: 16, took: 9 > task: index_100000000_doc_8192_cardinality_candidate, term count: 64, took: 13 > task: index_100000000_doc_8192_cardinality_candidate, term count: 512, took: > 42 > task: index_100000000_doc_8192_cardinality_candidate, term count: 2048, took: > 129 > task: index_100000000_doc_1048576_cardinality_candidate, term count: 1, took: > 2 > task: index_100000000_doc_1048576_cardinality_candidate, term count: 16, > took: 9 > task: index_100000000_doc_1048576_cardinality_candidate, term count: 64, > took: 9 > task: index_100000000_doc_1048576_cardinality_candidate, term count: 512, > took: 32 > task: index_100000000_doc_1048576_cardinality_candidate, term count: 2048, > took: 93 > task: index_100000000_doc_8388608_cardinality_candidate, term count: 1, took: > 2 > task: index_100000000_doc_8388608_cardinality_candidate, term count: 16, > took: 21 > task: index_100000000_doc_8388608_cardinality_candidate, term count: 64, > took: 38 > task: index_100000000_doc_8388608_cardinality_candidate, term count: 512, > took: 73 > task: index_100000000_doc_8388608_cardinality_candidate, term count: 2048, > took: 204 > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org