[GitHub] [lucene] jpountz merged pull request #11726: Prevent term vectors from exceeding the maximum dictionary size.

2022-09-08 Thread GitBox
jpountz merged PR #11726: URL: https://github.com/apache/lucene/pull/11726 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apa

[GitHub] [lucene] jpountz commented on issue #11702: Multi-Value Support for Binary DocValues [LUCENE-10666]

2022-09-08 Thread GitBox
jpountz commented on issue #11702: URL: https://github.com/apache/lucene/issues/11702#issuecomment-1240622777 The historical objection against multi-value binary support is that it could be easily implemented on top of binary doc values. So multi-value binary support would add API surface a

[GitHub] [lucene] rmuir commented on issue #11702: Multi-Value Support for Binary DocValues [LUCENE-10666]

2022-09-08 Thread GitBox
rmuir commented on issue #11702: URL: https://github.com/apache/lucene/issues/11702#issuecomment-1240646341 The use-case here is also not great, talking about a doc having multiple locations. It's a pet peeve of mine, I don't think we should add a new major docvalues type for such crap :)

[GitHub] [lucene] msokolov commented on a diff in pull request #11756: LUCENE-10577: Remove LeafReader#searchNearestVectorsExhaustively

2022-09-08 Thread GitBox
msokolov commented on code in PR #11756: URL: https://github.com/apache/lucene/pull/11756#discussion_r965934198 ## lucene/core/src/java/org/apache/lucene/search/KnnVectorQuery.java: ## @@ -175,9 +176,42 @@ private TopDocs approximateSearch(LeafReaderContext context, Bits accept

[GitHub] [lucene] msokolov commented on a diff in pull request #11756: LUCENE-10577: Remove LeafReader#searchNearestVectorsExhaustively

2022-09-08 Thread GitBox
msokolov commented on code in PR #11756: URL: https://github.com/apache/lucene/pull/11756#discussion_r965936802 ## lucene/core/src/java/org/apache/lucene/search/VectorScorer.java: ## @@ -0,0 +1,104 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + *

[GitHub] [lucene] jpountz commented on pull request #11741: DRAFT: Experiment with intersecting TermInSetQuery terms up-front to better estimate cost

2022-09-08 Thread GitBox
jpountz commented on PR #11741: URL: https://github.com/apache/lucene/pull/11741#issuecomment-1240710691 I guess that the main downside of this approach is that the terms lookups are the bottleneck of a `TermInSetQuery` when the included terms have low docFreqs. So moving the cost to `score

[GitHub] [lucene] jpountz commented on pull request #11738: Optimize MultiTermQueryConstantScoreWrapper for case when a term matches all docs in a segment.

2022-09-08 Thread GitBox
jpountz commented on PR #11738: URL: https://github.com/apache/lucene/pull/11738#issuecomment-1240714080 It might not be a big win in practice, but it should be enough to compare the `docFreq` with the `docCount` (rather than `maxDoc`) and use this postings whose `docFreq` is equal to `docC

[GitHub] [lucene] jpountz commented on pull request #11731: 11730 removed unused fst load mode

2022-09-08 Thread GitBox
jpountz commented on PR #11731: URL: https://github.com/apache/lucene/pull/11731#issuecomment-1240744318 Changing the default for the completion postings format might be controversial. Maybe we should close this PR since we're still using the ON_HEAP mode and have a separate discussion abou

[GitHub] [lucene] gsmiller commented on a diff in pull request #11738: Optimize MultiTermQueryConstantScoreWrapper for case when a term matches all docs in a segment.

2022-09-08 Thread GitBox
gsmiller commented on code in PR #11738: URL: https://github.com/apache/lucene/pull/11738#discussion_r966070400 ## lucene/core/src/java/org/apache/lucene/search/MultiTermQueryConstantScoreWrapper.java: ## @@ -165,9 +143,46 @@ private WeightOrDocIdSet rewrite(LeafReaderContext c

[GitHub] [lucene] rmuir commented on a diff in pull request #11738: Optimize MultiTermQueryConstantScoreWrapper for case when a term matches all docs in a segment.

2022-09-08 Thread GitBox
rmuir commented on code in PR #11738: URL: https://github.com/apache/lucene/pull/11738#discussion_r966085670 ## lucene/core/src/java/org/apache/lucene/search/MultiTermQueryConstantScoreWrapper.java: ## @@ -165,9 +143,46 @@ private WeightOrDocIdSet rewrite(LeafReaderContext cont

[GitHub] [lucene] gsmiller commented on pull request #11738: Optimize MultiTermQueryConstantScoreWrapper for case when a term matches all docs in a segment.

2022-09-08 Thread GitBox
gsmiller commented on PR #11738: URL: https://github.com/apache/lucene/pull/11738#issuecomment-1240856191 @jpountz: > It might not be a big win in practice, but it should be enough to compare the docFreq with the docCount (rather than maxDoc) and use this postings whose docFreq is eq

[GitHub] [lucene] rmuir commented on pull request #11738: Optimize MultiTermQueryConstantScoreWrapper for case when a term matches all docs in a segment.

2022-09-08 Thread GitBox
rmuir commented on PR #11738: URL: https://github.com/apache/lucene/pull/11738#issuecomment-1240897004 @gsmiller I think the question is, is it worth adding all those extra conditionals? I don't think the `DocIdSet#all` will really be that much faster in practice (I'm not even sure how ofte

[GitHub] [lucene] jpountz commented on pull request #1068: LUCENE-10674: Update subiterators when BitSetConjDISI exhausts

2022-09-08 Thread GitBox
jpountz commented on PR #1068: URL: https://github.com/apache/lucene/pull/1068#issuecomment-1240949394 The fact that it should be legal for `ConjunctionDISI` to return `NO_MORE_DOCS` when the lead iterator advances to `NO_MORE_DOCS` without advancing other iterators makes me wonder if this

[GitHub] [lucene] jpountz commented on a diff in pull request #1068: LUCENE-10674: Update subiterators when BitSetConjDISI exhausts

2022-09-08 Thread GitBox
jpountz commented on code in PR #1068: URL: https://github.com/apache/lucene/pull/1068#discussion_r966192242 ## lucene/core/src/java/org/apache/lucene/search/ConjunctionDISI.java: ## @@ -281,6 +281,12 @@ private int doNext(int doc) throws IOException { advanceLead:

[GitHub] [lucene] gsmiller commented on pull request #11741: DRAFT: Experiment with intersecting TermInSetQuery terms up-front to better estimate cost

2022-09-08 Thread GitBox
gsmiller commented on PR #11741: URL: https://github.com/apache/lucene/pull/11741#issuecomment-1241017654 @jpountz thanks for the feedback! If we assume a scenario where we have a `TermInSetQuery` over very selective terms (low docFreqs for each), we'd want to use the index query unless the

[GitHub] [lucene] rmuir commented on pull request #11757: Fix TestIndexWriterOnDiskFull.testAddDocumentOnDiskFull to handle IllegalStateException from startCommit()

2022-09-08 Thread GitBox
rmuir commented on PR #11757: URL: https://github.com/apache/lucene/pull/11757#issuecomment-1241023176 Thank you for reviewing @vigyasharma.

[GitHub] [lucene] rmuir closed issue #11755: TestIndexWriterOnDiskFull.testAddDocumentOnDiskFull failure

2022-09-08 Thread GitBox
rmuir closed issue #11755: TestIndexWriterOnDiskFull.testAddDocumentOnDiskFull failure URL: https://github.com/apache/lucene/issues/11755

[GitHub] [lucene] rmuir merged pull request #11757: Fix TestIndexWriterOnDiskFull.testAddDocumentOnDiskFull to handle IllegalStateException from startCommit()

2022-09-08 Thread GitBox
rmuir merged PR #11757: URL: https://github.com/apache/lucene/pull/11757

[GitHub] [lucene] navneet1v commented on pull request #11753: Added interface to relate a LatLonShape with another shape represented as Component2D

2022-09-08 Thread GitBox
navneet1v commented on PR #11753: URL: https://github.com/apache/lucene/pull/11753#issuecomment-1241043924 The function is publicly exposed but the class LatLonShape cannot be created publicly. Hence we need a way to create it. Please let me know what could be the right way to do it.

[GitHub] [lucene] mayya-sharipova commented on a diff in pull request #11743: LUCENE-10592 Better estimate memory for HNSW graph

2022-09-08 Thread GitBox
mayya-sharipova commented on code in PR #11743: URL: https://github.com/apache/lucene/pull/11743#discussion_r966264433 ## lucene/core/src/test/org/apache/lucene/util/TestRamUsageEstimator.java: ## @@ -222,6 +229,28 @@ public void testPrintValues() { System.out.println("LONG

[GitHub] [lucene] mayya-sharipova commented on a diff in pull request #11743: LUCENE-10592 Better estimate memory for HNSW graph

2022-09-08 Thread GitBox
mayya-sharipova commented on code in PR #11743: URL: https://github.com/apache/lucene/pull/11743#discussion_r966265177 ## lucene/core/src/test/org/apache/lucene/util/hnsw/TestHnswGraph.java: ## @@ -74,12 +74,8 @@ public void setup() { similarityFunction = VectorSim

[GitHub] [lucene] mayya-sharipova commented on a diff in pull request #11743: LUCENE-10592 Better estimate memory for HNSW graph

2022-09-08 Thread GitBox
mayya-sharipova commented on code in PR #11743: URL: https://github.com/apache/lucene/pull/11743#discussion_r966265508 ## lucene/core/src/java/org/apache/lucene/util/hnsw/NeighborArray.java: ## @@ -46,7 +46,7 @@ public NeighborArray(int maxSize, boolean descOrder) { * nodes.

[GitHub] [lucene] gsmiller commented on pull request #11738: Optimize MultiTermQueryConstantScoreWrapper for case when a term matches all docs in a segment.

2022-09-08 Thread GitBox
gsmiller commented on PR #11738: URL: https://github.com/apache/lucene/pull/11738#issuecomment-1241061002 @rmuir that's a fair point. I'll put up another iteration shortly that tries to address this feedback. Hopefully it will converge on something that makes sense to everyone :)

[GitHub] [lucene] jmazanec15 commented on a diff in pull request #1068: LUCENE-10674: Update subiterators when BitSetConjDISI exhausts

2022-09-08 Thread GitBox
jmazanec15 commented on code in PR #1068: URL: https://github.com/apache/lucene/pull/1068#discussion_r966298807 ## lucene/core/src/java/org/apache/lucene/search/ConjunctionDISI.java: ## @@ -281,6 +281,12 @@ private int doNext(int doc) throws IOException { advanceLead:

[GitHub] [lucene] jmazanec15 commented on pull request #1068: LUCENE-10674: Update subiterators when BitSetConjDISI exhausts

2022-09-08 Thread GitBox
jmazanec15 commented on PR #1068: URL: https://github.com/apache/lucene/pull/1068#issuecomment-1241094561 > The fact that it should be legal for ConjunctionDISI to return NO_MORE_DOCS when the lead iterator advances to NO_MORE_DOCS without advancing other iterators makes me wonder if this c

[GitHub] [lucene] jtibshirani commented on a diff in pull request #11756: LUCENE-10577: Remove LeafReader#searchNearestVectorsExhaustively

2022-09-08 Thread GitBox
jtibshirani commented on code in PR #11756: URL: https://github.com/apache/lucene/pull/11756#discussion_r966310331 ## lucene/core/src/java/org/apache/lucene/search/VectorScorer.java: ## @@ -0,0 +1,104 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more

[GitHub] [lucene] jtibshirani commented on pull request #11756: LUCENE-10577: Remove LeafReader#searchNearestVectorsExhaustively

2022-09-08 Thread GitBox
jtibshirani commented on PR #11756: URL: https://github.com/apache/lucene/pull/11756#issuecomment-1241095296 Thanks for the reviews, I'll merge and backport to 9.4.

[GitHub] [lucene] rmuir commented on a diff in pull request #11738: Optimize MultiTermQueryConstantScoreWrapper for case when a term matches all docs in a segment.

2022-09-08 Thread GitBox
rmuir commented on code in PR #11738: URL: https://github.com/apache/lucene/pull/11738#discussion_r966326978 ## lucene/core/src/java/org/apache/lucene/search/MultiTermQueryConstantScoreWrapper.java: ## @@ -179,8 +189,29 @@ private WeightOrDocIdSet rewrite(LeafReaderContext cont

[GitHub] [lucene] jtibshirani merged pull request #11756: LUCENE-10577: Remove LeafReader#searchNearestVectorsExhaustively

2022-09-08 Thread GitBox
jtibshirani merged PR #11756: URL: https://github.com/apache/lucene/pull/11756

[GitHub] [lucene] gsmiller commented on a diff in pull request #11738: Optimize MultiTermQueryConstantScoreWrapper for case when a term matches all docs in a segment.

2022-09-08 Thread GitBox
gsmiller commented on code in PR #11738: URL: https://github.com/apache/lucene/pull/11738#discussion_r966335478 ## lucene/core/src/java/org/apache/lucene/search/MultiTermQueryConstantScoreWrapper.java: ## @@ -179,8 +189,29 @@ private WeightOrDocIdSet rewrite(LeafReaderContext c

[GitHub] [lucene] gsmiller commented on a diff in pull request #11738: Optimize MultiTermQueryConstantScoreWrapper for case when a term matches all docs in a segment.

2022-09-08 Thread GitBox
gsmiller commented on code in PR #11738: URL: https://github.com/apache/lucene/pull/11738#discussion_r966336768 ## lucene/core/src/java/org/apache/lucene/search/MultiTermQueryConstantScoreWrapper.java: ## @@ -165,9 +143,46 @@ private WeightOrDocIdSet rewrite(LeafReaderContext c

[GitHub] [lucene] gsmiller merged pull request #1035: LUCENE-10652: Add a top-n range faceting example to RangeFacetsExample

2022-09-08 Thread GitBox
gsmiller merged PR #1035: URL: https://github.com/apache/lucene/pull/1035

[GitHub] [lucene] gsmiller commented on pull request #1035: LUCENE-10652: Add a top-n range faceting example to RangeFacetsExample

2022-09-08 Thread GitBox
gsmiller commented on PR #1035: URL: https://github.com/apache/lucene/pull/1035#issuecomment-1241128652 LGTM. Thanks @Yuti-G! Merged.

[GitHub] [lucene] rmuir commented on a diff in pull request #11738: Optimize MultiTermQueryConstantScoreWrapper for case when a term matches all docs in a segment.

2022-09-08 Thread GitBox
rmuir commented on code in PR #11738: URL: https://github.com/apache/lucene/pull/11738#discussion_r966338760 ## lucene/core/src/java/org/apache/lucene/search/MultiTermQueryConstantScoreWrapper.java: ## @@ -179,8 +189,29 @@ private WeightOrDocIdSet rewrite(LeafReaderContext cont

[GitHub] [lucene] Yuti-G commented on pull request #1035: LUCENE-10652: Add a top-n range faceting example to RangeFacetsExample

2022-09-08 Thread GitBox
Yuti-G commented on PR #1035: URL: https://github.com/apache/lucene/pull/1035#issuecomment-1241182558 Thank you so much @gsmiller!

[GitHub] [lucene] gsmiller commented on a diff in pull request #11738: Optimize MultiTermQueryConstantScoreWrapper for case when a term matches all docs in a segment.

2022-09-08 Thread GitBox
gsmiller commented on code in PR #11738: URL: https://github.com/apache/lucene/pull/11738#discussion_r966403309 ## lucene/core/src/java/org/apache/lucene/search/MultiTermQueryConstantScoreWrapper.java: ## @@ -179,8 +189,29 @@ private WeightOrDocIdSet rewrite(LeafReaderContext c

[GitHub] [lucene] mayya-sharipova merged pull request #11743: LUCENE-10592 Better estimate memory for HNSW graph

2022-09-08 Thread GitBox
mayya-sharipova merged PR #11743: URL: https://github.com/apache/lucene/pull/11743

[GitHub] [lucene] jtibshirani opened a new issue, #11758: Follow-up refactors to 8-bit quantization change

2022-09-08 Thread GitBox
jtibshirani opened a new issue, #11758: URL: https://github.com/apache/lucene/issues/11758 ### Description This issue tracks ideas for refactors as a follow-up to #11613, where we added support for 8-bit vector values: * We extended `KnnVectorsWriter` to be generic, accepting both

[GitHub] [lucene] jmazanec15 commented on a diff in pull request #1068: LUCENE-10674: Update subiterators when BitSetConjDISI exhausts

2022-09-08 Thread GitBox
jmazanec15 commented on code in PR #1068: URL: https://github.com/apache/lucene/pull/1068#discussion_r966532126 ## lucene/core/src/java/org/apache/lucene/search/ConjunctionDISI.java: ## @@ -281,6 +281,12 @@ private int doNext(int doc) throws IOException { advanceLead: