[GitHub] [lucene] LuXugang commented on a diff in pull request #967: LUCENE-10623: Error implementation of docValueCount for SortingSortedSetDocValues
LuXugang commented on code in PR #967:
URL: https://github.com/apache/lucene/pull/967#discussion_r906765458

## lucene/core/src/java/org/apache/lucene/index/SortedSetDocValuesWriter.java: ##
@@ -114,6 +116,7 @@ private void finishCurrentDoc() {
 }
 lastValue = termID;
 }
+maxBitsRequired |= count;

Review Comment:
Thanks for catching this, @jpountz. I saw that we already have a `maxCount`, which is what we needed. Addressed in https://github.com/apache/lucene/pull/967/commits/542c2f9a5fa7dab0a9b3cc84fc777d8988fec3d7

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] LuXugang commented on a diff in pull request #967: LUCENE-10623: Error implementation of docValueCount for SortingSortedSetDocValues
LuXugang commented on code in PR #967:
URL: https://github.com/apache/lucene/pull/967#discussion_r906767186

## lucene/core/src/java/org/apache/lucene/index/SortedSetDocValuesWriter.java: ##
@@ -114,6 +116,7 @@ private void finishCurrentDoc() {
 }
 lastValue = termID;
 }
+maxBitsRequired |= count;

Review Comment:
Thanks for catching this, @jpountz. I saw that we already have a `maxCount`, which is what we wanted. Addressed in https://github.com/apache/lucene/pull/967/commits/542c2f9a5fa7dab0a9b3cc84fc777d8988fec3d7
[GitHub] [lucene] LuXugang commented on a diff in pull request #967: LUCENE-10623: Error implementation of docValueCount for SortingSortedSetDocValues
LuXugang commented on code in PR #967:
URL: https://github.com/apache/lucene/pull/967#discussion_r906769090

## lucene/core/src/java/org/apache/lucene/codecs/lucene90/Lucene90DocValuesConsumer.java: ##
@@ -805,11 +805,9 @@ public int nextDoc() throws IOException {
 int doc = values.nextDoc();
 if (doc != NO_MORE_DOCS) {
 docValueCount = 0;
-for (long ord = values.nextOrd();
-ord != SortedSetDocValues.NO_MORE_ORDS;
-ord = values.nextOrd()) {
+for (int j = 0; j < values.docValueCount(); j++) {
 ords = ArrayUtil.grow(ords, docValueCount + 1);

Review Comment:
Addressed in https://github.com/apache/lucene/pull/967/commits/0c6abf3ebd3b734cddabd26e35fbaa9d64089dff
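The hunk above swaps a sentinel-terminated loop for a loop bounded by `docValueCount()`. The following is only a rough, self-contained sketch of those two iteration styles, not Lucene code: `DocOrds` is a hypothetical stand-in for one document's view of a sorted-set doc values iterator, and the sentinel value `-1` is an assumption of the sketch. It illustrates that the count-bounded loop yields the same ordinals only when `docValueCount()` reports the true number of values, which is presumably why the PR title singles out that method.

```java
import java.util.Arrays;

final class OrdIterationSketch {
  /** Sentinel playing the role of SortedSetDocValues.NO_MORE_ORDS; the value is assumed here. */
  static final long NO_MORE_ORDS = -1L;

  /** Hypothetical stand-in for the ordinals of the current document (not a Lucene class). */
  static final class DocOrds {
    private final long[] ords;
    private int upto = 0;

    DocOrds(long... ords) {
      this.ords = ords;
    }

    /** Number of ordinals the current document has. */
    int docValueCount() {
      return ords.length;
    }

    /** Next ordinal, or NO_MORE_ORDS once the document is exhausted. */
    long nextOrd() {
      return upto < ords.length ? ords[upto++] : NO_MORE_ORDS;
    }
  }

  /** Old style from the removed lines: read until the sentinel appears. */
  static long[] collectBySentinel(DocOrds values) {
    long[] out = new long[0];
    int count = 0;
    for (long ord = values.nextOrd(); ord != NO_MORE_ORDS; ord = values.nextOrd()) {
      out = Arrays.copyOf(out, count + 1);
      out[count++] = ord;
    }
    return out;
  }

  /** New style from the added line: read exactly docValueCount() ordinals. */
  static long[] collectByCount(DocOrds values) {
    int n = values.docValueCount();
    long[] out = new long[n];
    for (int j = 0; j < n; j++) {
      out[j] = values.nextOrd();
    }
    return out;
  }
}
```

Both helpers return the same array for a well-behaved source; if `docValueCount()` were wrong, only the count-bounded variant would be affected.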
[GitHub] [lucene] shaie opened a new pull request, #982: Fix typos and minor refactoring to FacetConfig
shaie opened a new pull request, #982:
URL: https://github.com/apache/lucene/pull/982

### Description (or a Jira issue link if you have one)

Some typo fixes + small refactoring to simplify `FacetConfig` code.
[GitHub] [lucene] gsmiller opened a new pull request, #983: Some refactoring/cleanup of AbstractSortedSetDocValueFacetCounts
gsmiller opened a new pull request, #983:
URL: https://github.com/apache/lucene/pull/983

A little refactoring/cleanup of common functionality in `AbstractSortedSetDocValueFacetCounts`. No functional change.
[GitHub] [lucene] gsmiller opened a new pull request, #984: Switch Float/IntTaxonomyFacets to primitive list data structures in getAllChildren
gsmiller opened a new pull request, #984:
URL: https://github.com/apache/lucene/pull/984

Let's avoid creating some garbage and unnecessary boxing/unboxing.
[GitHub] [lucene] zacharymorn commented on pull request #972: LUCENE-10480: Use BMM scorer for 2 clauses disjunction
zacharymorn commented on PR #972:
URL: https://github.com/apache/lucene/pull/972#issuecomment-1166714692

Hi @jpountz, I've taken some ideas from your bulk scorer implementation and was able to simplify my code as well as boost the performance under the default `SEARCH_NUM_THREADS` [here](https://github.com/apache/lucene/pull/972/commits/cb8ab7485a405e9517049822eef36ae590f2f65b). The benchmark results now look similar, albeit with some variation:

```
Task                        QPS baseline  StdDev  QPS my_modified_version  StdDev  Pct diff  p-value
BrowseDateSSDVFacets            4.38  (35.8%)     4.01  (30.3%)  -8.3% ( -54% -  90%) 0.431
Prefix3                       811.56   (5.6%)   782.08   (7.8%)  -3.6% ( -16% -  10%) 0.091
OrHighMedDayTaxoFacets         11.42   (5.6%)    11.11   (8.0%)  -2.7% ( -15% -  11%) 0.223
IntNRQ                        297.19   (1.5%)   291.62   (5.0%)  -1.9% (  -8% -   4%) 0.107
Wildcard                      269.43   (5.0%)   264.57   (6.4%)  -1.8% ( -12% -  10%) 0.319
BrowseRandomLabelSSDVFacets    20.22   (8.8%)    19.86   (8.4%)  -1.8% ( -17% -  16%) 0.518
HighTermTitleBDVSort          236.73   (8.6%)   232.93   (8.6%)  -1.6% ( -17% -  17%) 0.555
AndHighHighDayTaxoFacets       12.67   (2.9%)    12.48   (4.4%)  -1.5% (  -8% -   5%) 0.186
BrowseMonthTaxoFacets          32.18  (36.3%)    31.72  (38.8%)  -1.4% ( -56% - 115%) 0.904
LowPhrase                    1725.41   (3.3%)  1702.14   (5.3%)  -1.3% (  -9% -   7%) 0.334
MedSloppyPhrase               111.58   (3.2%)   110.16   (3.8%)  -1.3% (  -8% -   5%) 0.250
HighPhrase                    930.18   (2.5%)   919.75   (3.4%)  -1.1% (  -6% -   4%) 0.234
MedTermDayTaxoFacets           46.10   (3.9%)    45.68   (4.8%)  -0.9% (  -9% -   8%) 0.514
TermDTSort                    341.03   (7.2%)   338.23   (8.5%)  -0.8% ( -15% -  15%) 0.740
AndHighMedDayTaxoFacets        39.88   (1.9%)    39.57   (3.1%)  -0.8% (  -5% -   4%) 0.349
HighTermDayOfYearSort         148.85   (7.6%)   147.86   (8.3%)  -0.7% ( -15% -  16%) 0.792
HighTermMonthSort             218.46   (8.6%)   217.06   (9.2%)  -0.6% ( -16% -  18%) 0.819
OrNotHighLow                 2696.50   (5.4%)  2681.95   (5.0%)  -0.5% ( -10% -  10%) 0.743
LowSloppyPhrase                22.79   (2.0%)    22.69   (2.9%)  -0.4% (  -5% -   4%) 0.585
Fuzzy2                        125.08   (2.7%)   124.54   (4.3%)  -0.4% (  -7% -   6%) 0.708
HighSloppyPhrase               21.02   (2.3%)    20.94   (3.0%)  -0.4% (  -5% -   5%) 0.629
OrHighNotMed                 1805.04   (4.7%)  1797.98   (5.8%)  -0.4% ( -10% -  10%) 0.816
BrowseMonthSSDVFacets          29.37  (14.0%)    29.26  (13.4%)  -0.4% ( -24% -  31%) 0.933
MedPhrase                     205.52   (1.7%)   204.78   (3.0%)  -0.4% (  -4% -   4%) 0.643
Fuzzy1                        128.47   (2.8%)   128.05   (4.2%)  -0.3% (  -7% -   6%) 0.772
AndHighLow                   2126.24   (5.3%)  2124.42   (5.6%)  -0.1% ( -10% -  11%) 0.960
Respell                        83.33   (3.2%)    83.33   (4.2%)   0.0% (  -7% -   7%) 0.998
OrHighNotHigh                1415.44   (4.4%)  1419.78   (4.5%)   0.3% (  -8% -   9%) 0.827
OrHighNotLow                 1655.08   (4.4%)  1663.51   (4.7%)   0.5% (  -8% -  10%) 0.725
OrNotHighHigh                1035.89   (3.1%)  1042.85   (4.6%)   0.7% (  -6% -   8%) 0.587
PKLookup                      283.77   (5.1%)   285.92   (4.5%)   0.8% (  -8% -  10%) 0.616
LowTerm                      3616.62   (4.1%)  3655.48   (5.3%)   1.1% (  -8% -  10%) 0.476
HighSpanNear                   15.54   (2.2%)    15.71   (3.5%)   1.1% (  -4% -   7%) 0.241
MedTerm                      2615.07   (4.0%)  2645.27   (4.0%)   1.2% (  -6% -   9%) 0.364
OrNotHighMed                 1759.45   (4.2%)  1779.94   (4.6%)   1.2% (  -7% -  10%) 0.406
LowSpanNear                    66.06   (2.9%)    66.83   (4.3%)   1.2% (  -5% -   8%) 0.316
BrowseDayOfYearSSDVFacets      26.94  (10.7%)    27.30   (9.7%)   1.3% ( -17% -  24%) 0.684
MedIntervalsOrdered            86.40   (5.1%)    87.58   (4.8%)   1.4% (  -8% -  11%) 0.387
AndHigh
```
[GitHub] [lucene] shaie merged pull request #982: Fix typos and minor refactoring to FacetConfig
shaie merged PR #982:
URL: https://github.com/apache/lucene/pull/982
[GitHub] [lucene] shaie opened a new pull request, #985: Fix typos and minor refactoring to FacetConfig (#982)
shaie opened a new pull request, #985:
URL: https://github.com/apache/lucene/pull/985

### Description

Backport `9338909373a` to branch_9x
[GitHub] [lucene] shaie merged pull request #985: Fix typos and minor refactoring to FacetConfig (#982)
shaie merged PR #985:
URL: https://github.com/apache/lucene/pull/985
[GitHub] [lucene] jtibshirani commented on pull request #951: LUCENE-10606: Optimize Prefilter Hit Collection
jtibshirani commented on PR #951:
URL: https://github.com/apache/lucene/pull/951#issuecomment-1166948233

For context, I also reran benchmarks and didn't see any slowdown to the typical case (not backed by a BitSet).
[GitHub] [lucene] jtibshirani merged pull request #951: LUCENE-10606: Optimize Prefilter Hit Collection
jtibshirani merged PR #951:
URL: https://github.com/apache/lucene/pull/951
[jira] [Commented] (LUCENE-10606) Optimize hit collection of prefilter in KnnVectorQuery for BitSet backed queries
[ https://issues.apache.org/jira/browse/LUCENE-10606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17559019#comment-17559019 ]

ASF subversion and git services commented on LUCENE-10606:
--

Commit 03846b468e52126582c09816f7e85e98aee9a405 in lucene's branch refs/heads/main from Kaival Parikh
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=03846b468e5 ]

LUCENE-10606: For KnnVectorQuery, optimize case where filter is backed by BitSetIterator (#951)

Instead of collecting hit-by-hit using a `LeafCollector`, we break down the search by instantiating a weight, creating scorers, and checking the underlying iterator. If it is backed by a `BitSet`, we directly update the reference (as we won't be editing the `Bits`). Else we can create a new `BitSet` from the iterator using `BitSet.of`.

> Optimize hit collection of prefilter in KnnVectorQuery for BitSet backed queries
>
> Key: LUCENE-10606
> URL: https://issues.apache.org/jira/browse/LUCENE-10606
> Project: Lucene - Core
> Issue Type: Improvement
> Components: core/search
> Reporter: Kaival Parikh
> Priority: Minor
> Labels: performance
> Time Spent: 3h 50m
> Remaining Estimate: 0h
>
> While working on this [PR|https://github.com/apache/lucene/pull/932] to add prefilter testing support, we saw that hit collection took a long time for BitSetIterator backed scorers (due to iteration over the entire underlying BitSet, and copying it into an internal one)
> These BitSetIterators can be frequent (as they are used in LRUQueryCache), and bulk collection can be optimized with more knowledge of the underlying iterator

--
This message was sent by Atlassian Jira (v8.20.7#820007)
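The branch described in the commit message (reuse the backing bit set when the iterator exposes one, otherwise build a new one from the iterator) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code merged in #951: `DocIterator` and `BitSetBackedIterator` are hypothetical types invented for the sketch, and `java.util.BitSet` stands in for Lucene's own `BitSet` and `BitSet.of`.

```java
import java.util.BitSet;
import java.util.PrimitiveIterator;
import java.util.stream.IntStream;

/** Hypothetical stand-in for an iterator over matching doc IDs (not Lucene's DocIdSetIterator). */
interface DocIterator {
  PrimitiveIterator.OfInt docs();
}

/** Hypothetical iterator that exposes its backing bit set, in the spirit of BitSetIterator. */
final class BitSetBackedIterator implements DocIterator {
  private final BitSet bits;

  BitSetBackedIterator(BitSet bits) {
    this.bits = bits;
  }

  BitSet bits() {
    return bits;
  }

  @Override
  public PrimitiveIterator.OfInt docs() {
    return bits.stream().iterator();
  }
}

final class PrefilterSketch {
  /** Returns the set of accepted docs, avoiding a copy when the iterator is bit-set backed. */
  static BitSet acceptDocs(DocIterator iterator, int maxDoc) {
    if (iterator instanceof BitSetBackedIterator) {
      // Fast path: share the reference. This is only safe because the caller treats it as read-only.
      return ((BitSetBackedIterator) iterator).bits();
    }
    // Slow path: walk the iterator and materialize a fresh bit set.
    BitSet copy = new BitSet(maxDoc);
    PrimitiveIterator.OfInt docs = iterator.docs();
    while (docs.hasNext()) {
      copy.set(docs.nextInt());
    }
    return copy;
  }

  public static void main(String[] args) {
    BitSet cached = new BitSet(8);
    cached.set(2);
    cached.set(5);
    // Bit-set backed: the very same object comes back.
    System.out.println(acceptDocs(new BitSetBackedIterator(cached), 8) == cached);
    // Generic iterator: a new bit set is built from the doc IDs.
    System.out.println(acceptDocs(() -> IntStream.of(1, 4, 7).iterator(), 8));
  }
}
```

Sharing the reference skips the per-document iteration and copy that the issue description identifies as the slow part; the cost is that the shared bit set must never be mutated by the consumer.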