[GitHub] [lucene] LuXugang commented on a diff in pull request #967: LUCENE-10623: Error implementation of docValueCount for SortingSortedSetDocValues

2022-06-26 Thread GitBox


LuXugang commented on code in PR #967:
URL: https://github.com/apache/lucene/pull/967#discussion_r906765458


##
lucene/core/src/java/org/apache/lucene/index/SortedSetDocValuesWriter.java:
##
@@ -114,6 +116,7 @@ private void finishCurrentDoc() {
   }
   lastValue = termID;
 }
+maxBitsRequired |= count;

Review Comment:
   Thanks for catching this, @jpountz. I saw that we already have a `maxCount`, 
which is what we needed. 
   
   Addressed in 
https://github.com/apache/lucene/pull/967/commits/542c2f9a5fa7dab0a9b3cc84fc777d8988fec3d7.
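
For readers following the hunk above, here is a minimal sketch of how an OR-accumulated count relates to a plain maximum. It is not code from the PR: the class and method names are hypothetical, and it assumes the accumulator is only used to derive a bit width, which the quoted diff does not confirm. Under that assumption both approaches give the same width, so an existing `maxCount` would make the extra field redundant.

```java
// Hypothetical, self-contained illustration; not taken from SortedSetDocValuesWriter.
class MaxCountSketch {
  // Number of bits needed to represent a non-negative value (at least 1).
  static int bitsRequired(long value) {
    return Math.max(1, 64 - Long.numberOfLeadingZeros(value));
  }

  // OR-accumulate per-document value counts, in the style of `maxBitsRequired |= count`.
  static long orOfCounts(int[] counts) {
    long acc = 0;
    for (int c : counts) {
      acc |= c;
    }
    return acc;
  }

  // Track the largest per-document value count, in the style of a `maxCount` field.
  static int maxOfCounts(int[] counts) {
    int max = 0;
    for (int c : counts) {
      max = Math.max(max, c);
    }
    return max;
  }

  public static void main(String[] args) {
    int[] counts = {1, 2, 4};
    long or = orOfCounts(counts);  // 7: not a count that any document actually has
    int max = maxOfCounts(counts); // 4: the real maximum
    // The highest set bit of the OR equals the highest set bit of the maximum,
    // so both values need the same number of bits.
    System.out.println(bitsRequired(or) == bitsRequired(max)); // true
  }
}
```

The OR value is only an upper bound on the counts, so it is suitable for sizing but not for reporting an actual maximum.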



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] LuXugang commented on a diff in pull request #967: LUCENE-10623: Error implementation of docValueCount for SortingSortedSetDocValues

2022-06-26 Thread GitBox


LuXugang commented on code in PR #967:
URL: https://github.com/apache/lucene/pull/967#discussion_r906767186


##
lucene/core/src/java/org/apache/lucene/index/SortedSetDocValuesWriter.java:
##
@@ -114,6 +116,7 @@ private void finishCurrentDoc() {
   }
   lastValue = termID;
 }
+maxBitsRequired |= count;

Review Comment:
   
   Thanks for catching this, @jpountz. I saw that we already have a `maxCount`, 
which is what we wanted.
   
   Addressed in 
https://github.com/apache/lucene/pull/967/commits/542c2f9a5fa7dab0a9b3cc84fc777d8988fec3d7






[GitHub] [lucene] LuXugang commented on a diff in pull request #967: LUCENE-10623: Error implementation of docValueCount for SortingSortedSetDocValues

2022-06-26 Thread GitBox


LuXugang commented on code in PR #967:
URL: https://github.com/apache/lucene/pull/967#discussion_r906769090


##
lucene/core/src/java/org/apache/lucene/codecs/lucene90/Lucene90DocValuesConsumer.java:
##
@@ -805,11 +805,9 @@ public int nextDoc() throws IOException {
 int doc = values.nextDoc();
 if (doc != NO_MORE_DOCS) {
   docValueCount = 0;
-  for (long ord = values.nextOrd();
-  ord != SortedSetDocValues.NO_MORE_ORDS;
-  ord = values.nextOrd()) {
+  for (int j = 0; j < values.docValueCount(); j++) {
 ords = ArrayUtil.grow(ords, docValueCount + 1);

Review Comment:
   Addressed in 
https://github.com/apache/lucene/pull/967/commits/0c6abf3ebd3b734cddabd26e35fbaa9d64089dff.
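
The hunk above replaces a loop that ends on the `NO_MORE_ORDS` sentinel with a loop bounded by `docValueCount()`. The two styles can be sketched on a small in-memory stand-in. `DocOrds` and both collector methods are hypothetical names for illustration and are not the Lucene API.

```java
import java.util.Arrays;

// Hypothetical stand-in for the ordinals of a single document; not the Lucene API.
class OrdIterationSketch {
  static final long NO_MORE_ORDS = -1L;

  static final class DocOrds {
    private final long[] ords;
    private int pos = 0;

    DocOrds(long... ords) {
      this.ords = ords;
    }

    // Number of values for the current document.
    int docValueCount() {
      return ords.length;
    }

    // Next ordinal, or the sentinel once the values are exhausted.
    long nextOrd() {
      return pos < ords.length ? ords[pos++] : NO_MORE_ORDS;
    }
  }

  // Sentinel style: keep reading until NO_MORE_ORDS, growing the buffer as needed.
  static long[] collectUntilSentinel(DocOrds values) {
    long[] out = new long[0];
    int n = 0;
    for (long ord = values.nextOrd(); ord != NO_MORE_ORDS; ord = values.nextOrd()) {
      out = Arrays.copyOf(out, n + 1);
      out[n++] = ord;
    }
    return out;
  }

  // Count style: ask for the count once, size the buffer exactly, read that many values.
  static long[] collectByCount(DocOrds values) {
    int count = values.docValueCount();
    long[] out = new long[count];
    for (int j = 0; j < count; j++) {
      out[j] = values.nextOrd();
    }
    return out;
  }

  public static void main(String[] args) {
    System.out.println(Arrays.toString(collectUntilSentinel(new DocOrds(3, 7, 9)))); // [3, 7, 9]
    System.out.println(Arrays.toString(collectByCount(new DocOrds(3, 7, 9))));       // [3, 7, 9]
  }
}
```

The count style only works if `docValueCount()` is correct, which is the subject of this PR's title.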






[GitHub] [lucene] shaie opened a new pull request, #982: Fix typos and minor refactoring to FacetConfig

2022-06-26 Thread GitBox


shaie opened a new pull request, #982:
URL: https://github.com/apache/lucene/pull/982

   ### Description (or a Jira issue link if you have one)
   
   Some typo fixes plus a small refactoring to simplify the `FacetConfig` code.





[GitHub] [lucene] gsmiller opened a new pull request, #983: Some refactoring/cleanup of AbstractSortedSetDocValueFacetCounts

2022-06-26 Thread GitBox


gsmiller opened a new pull request, #983:
URL: https://github.com/apache/lucene/pull/983

   A little refactoring/cleanup of common functionality in 
`AbstractSortedSetDocValueFacetCounts`. No functional change.





[GitHub] [lucene] gsmiller opened a new pull request, #984: Switch Float/IntTaxonomyFacets to primitive list data structures in getAllChildren

2022-06-26 Thread GitBox


gsmiller opened a new pull request, #984:
URL: https://github.com/apache/lucene/pull/984

   Let's avoid creating some garbage and unnecessary boxing/unboxing.





[GitHub] [lucene] zacharymorn commented on pull request #972: LUCENE-10480: Use BMM scorer for 2 clauses disjunction

2022-06-26 Thread GitBox


zacharymorn commented on PR #972:
URL: https://github.com/apache/lucene/pull/972#issuecomment-1166714692

   Hi @jpountz, I've taken some ideas from your bulk scorer implementation and 
was able to simplify my code and improve performance under the default 
`SEARCH_NUM_THREADS` 
[here](https://github.com/apache/lucene/pull/972/commits/cb8ab7485a405e9517049822eef36ae590f2f65b).
The benchmark results now look similar, though with some variation:
   
   ```
   Task                    QPS baseline   StdDev QPS my_modified_version   StdDev Pct diff               p-value
   BrowseDateSSDVFacets            4.38  (35.8%)                    4.01  (30.3%)  -8.3% ( -54% -   90%) 0.431
   Prefix3                       811.56   (5.6%)                  782.08   (7.8%)  -3.6% ( -16% -   10%) 0.091
   OrHighMedDayTaxoFacets         11.42   (5.6%)                   11.11   (8.0%)  -2.7% ( -15% -   11%) 0.223
   IntNRQ                        297.19   (1.5%)                  291.62   (5.0%)  -1.9% (  -8% -    4%) 0.107
   Wildcard                      269.43   (5.0%)                  264.57   (6.4%)  -1.8% ( -12% -   10%) 0.319
   BrowseRandomLabelSSDVFacets    20.22   (8.8%)                   19.86   (8.4%)  -1.8% ( -17% -   16%) 0.518
   HighTermTitleBDVSort          236.73   (8.6%)                  232.93   (8.6%)  -1.6% ( -17% -   17%) 0.555
   AndHighHighDayTaxoFacets       12.67   (2.9%)                   12.48   (4.4%)  -1.5% (  -8% -    5%) 0.186
   BrowseMonthTaxoFacets          32.18  (36.3%)                   31.72  (38.8%)  -1.4% ( -56% -  115%) 0.904
   LowPhrase                    1725.41   (3.3%)                 1702.14   (5.3%)  -1.3% (  -9% -    7%) 0.334
   MedSloppyPhrase               111.58   (3.2%)                  110.16   (3.8%)  -1.3% (  -8% -    5%) 0.250
   HighPhrase                    930.18   (2.5%)                  919.75   (3.4%)  -1.1% (  -6% -    4%) 0.234
   MedTermDayTaxoFacets           46.10   (3.9%)                   45.68   (4.8%)  -0.9% (  -9% -    8%) 0.514
   TermDTSort                    341.03   (7.2%)                  338.23   (8.5%)  -0.8% ( -15% -   15%) 0.740
   AndHighMedDayTaxoFacets        39.88   (1.9%)                   39.57   (3.1%)  -0.8% (  -5% -    4%) 0.349
   HighTermDayOfYearSort         148.85   (7.6%)                  147.86   (8.3%)  -0.7% ( -15% -   16%) 0.792
   HighTermMonthSort             218.46   (8.6%)                  217.06   (9.2%)  -0.6% ( -16% -   18%) 0.819
   OrNotHighLow                 2696.50   (5.4%)                 2681.95   (5.0%)  -0.5% ( -10% -   10%) 0.743
   LowSloppyPhrase                22.79   (2.0%)                   22.69   (2.9%)  -0.4% (  -5% -    4%) 0.585
   Fuzzy2                        125.08   (2.7%)                  124.54   (4.3%)  -0.4% (  -7% -    6%) 0.708
   HighSloppyPhrase               21.02   (2.3%)                   20.94   (3.0%)  -0.4% (  -5% -    5%) 0.629
   OrHighNotMed                 1805.04   (4.7%)                 1797.98   (5.8%)  -0.4% ( -10% -   10%) 0.816
   BrowseMonthSSDVFacets          29.37  (14.0%)                   29.26  (13.4%)  -0.4% ( -24% -   31%) 0.933
   MedPhrase                     205.52   (1.7%)                  204.78   (3.0%)  -0.4% (  -4% -    4%) 0.643
   Fuzzy1                        128.47   (2.8%)                  128.05   (4.2%)  -0.3% (  -7% -    6%) 0.772
   AndHighLow                   2126.24   (5.3%)                 2124.42   (5.6%)  -0.1% ( -10% -   11%) 0.960
   Respell                        83.33   (3.2%)                   83.33   (4.2%)   0.0% (  -7% -    7%) 0.998
   OrHighNotHigh                1415.44   (4.4%)                 1419.78   (4.5%)   0.3% (  -8% -    9%) 0.827
   OrHighNotLow                 1655.08   (4.4%)                 1663.51   (4.7%)   0.5% (  -8% -   10%) 0.725
   OrNotHighHigh                1035.89   (3.1%)                 1042.85   (4.6%)   0.7% (  -6% -    8%) 0.587
   PKLookup                      283.77   (5.1%)                  285.92   (4.5%)   0.8% (  -8% -   10%) 0.616
   LowTerm                      3616.62   (4.1%)                 3655.48   (5.3%)   1.1% (  -8% -   10%) 0.476
   HighSpanNear                   15.54   (2.2%)                   15.71   (3.5%)   1.1% (  -4% -    7%) 0.241
   MedTerm                      2615.07   (4.0%)                 2645.27   (4.0%)   1.2% (  -6% -    9%) 0.364
   OrNotHighMed                 1759.45   (4.2%)                 1779.94   (4.6%)   1.2% (  -7% -   10%) 0.406
   LowSpanNear                    66.06   (2.9%)                   66.83   (4.3%)   1.2% (  -5% -    8%) 0.316
   BrowseDayOfYearSSDVFacets      26.94  (10.7%)                   27.30   (9.7%)   1.3% ( -17% -   24%) 0.684
   MedIntervalsOrdered            86.40   (5.1%)                   87.58   (4.8%)   1.4% (  -8% -   11%) 0.387
   AndHigh
   ```

[GitHub] [lucene] shaie merged pull request #982: Fix typos and minor refactoring to FacetConfig

2022-06-26 Thread GitBox


shaie merged PR #982:
URL: https://github.com/apache/lucene/pull/982





[GitHub] [lucene] shaie opened a new pull request, #985: Fix typos and minor refactoring to FacetConfig (#982)

2022-06-26 Thread GitBox


shaie opened a new pull request, #985:
URL: https://github.com/apache/lucene/pull/985

   ### Description
   
   Backport `9338909373a` to branch_9x
   





[GitHub] [lucene] shaie merged pull request #985: Fix typos and minor refactoring to FacetConfig (#982)

2022-06-26 Thread GitBox


shaie merged PR #985:
URL: https://github.com/apache/lucene/pull/985





[GitHub] [lucene] jtibshirani commented on pull request #951: LUCENE-10606: Optimize Prefilter Hit Collection

2022-06-26 Thread GitBox


jtibshirani commented on PR #951:
URL: https://github.com/apache/lucene/pull/951#issuecomment-1166948233

   For context, I also reran benchmarks and didn't see any slowdown to the 
typical case (not backed by a BitSet).





[GitHub] [lucene] jtibshirani merged pull request #951: LUCENE-10606: Optimize Prefilter Hit Collection

2022-06-26 Thread GitBox


jtibshirani merged PR #951:
URL: https://github.com/apache/lucene/pull/951





[jira] [Commented] (LUCENE-10606) Optimize hit collection of prefilter in KnnVectorQuery for BitSet backed queries

2022-06-26 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17559019#comment-17559019
 ] 

ASF subversion and git services commented on LUCENE-10606:
--

Commit 03846b468e52126582c09816f7e85e98aee9a405 in lucene's branch 
refs/heads/main from Kaival Parikh
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=03846b468e5 ]

LUCENE-10606: For KnnVectorQuery, optimize case where filter is backed by 
BitSetIterator (#951)

Instead of collecting hit-by-hit using a `LeafCollector`, we break down the
search by instantiating a weight, creating scorers, and checking the underlying
iterator. If it is backed by a `BitSet`, we directly update the reference (as
we won't be editing the `Bits`). Else we can create a new `BitSet` from the
iterator using `BitSet.of`.
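
The branch described in the commit message can be sketched roughly as follows. This is an illustration only: it uses `java.util.BitSet` and hypothetical `DocIterator` / `BitSetBackedIterator` types rather than Lucene's `BitSet`, `BitSetIterator`, or `DocIdSetIterator`, and it does not reproduce the actual `KnnVectorQuery` code.

```java
import java.util.BitSet;
import java.util.PrimitiveIterator;
import java.util.stream.IntStream;

// Hypothetical, self-contained illustration of the "reuse or copy" decision.
class PrefilterBitSetSketch {
  // Stand-in for an iterator over matching doc IDs.
  interface DocIterator {
    PrimitiveIterator.OfInt docs();
  }

  // Stand-in for an iterator that is known to wrap a bit set.
  static final class BitSetBackedIterator implements DocIterator {
    final BitSet bits;

    BitSetBackedIterator(BitSet bits) {
      this.bits = bits;
    }

    @Override
    public PrimitiveIterator.OfInt docs() {
      return bits.stream().iterator();
    }
  }

  static BitSet acceptedDocs(DocIterator iterator) {
    if (iterator instanceof BitSetBackedIterator) {
      // Fast path: share the underlying bit set instead of iterating and copying.
      // This is safe only because the caller treats the result as read-only.
      return ((BitSetBackedIterator) iterator).bits;
    }
    // Slow path: materialize a new bit set from the iterator, doc by doc.
    BitSet copy = new BitSet();
    PrimitiveIterator.OfInt docs = iterator.docs();
    while (docs.hasNext()) {
      copy.set(docs.nextInt());
    }
    return copy;
  }

  public static void main(String[] args) {
    BitSet cached = new BitSet();
    cached.set(2);
    cached.set(8);
    // Same object comes back for the bit-set-backed case.
    System.out.println(acceptedDocs(new BitSetBackedIterator(cached)) == cached); // true
    // Any other iterator is copied into a fresh bit set.
    System.out.println(acceptedDocs(() -> IntStream.of(1, 5, 9).iterator())); // {1, 5, 9}
  }
}
```

Sharing the reference avoids a pass over the whole bit set, which is the cost the issue description below attributes to hit-by-hit collection.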

> Optimize hit collection of prefilter in KnnVectorQuery for BitSet backed 
> queries
> 
>
> Key: LUCENE-10606
> URL: https://issues.apache.org/jira/browse/LUCENE-10606
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/search
>Reporter: Kaival Parikh
>Priority: Minor
>  Labels: performance
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> While working on this [PR|https://github.com/apache/lucene/pull/932] to add 
> prefilter testing support, we saw that hit collection took a long time for 
> BitSetIterator backed scorers (due to iteration over the entire underlying 
> BitSet, and copying it into an internal one)
> These BitSetIterators can be frequent (as they are used in LRUQueryCache), 
> and bulk collection can be optimized with more knowledge of the underlying 
> iterator



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
