[GitHub] [lucene] zhaih commented on pull request #12555: Fix: Lucene90DocValuesProducer.TermsDict.seekCeil doesn't always position bytes correctly (#12167)

2023-09-14 Thread via GitHub
zhaih commented on PR #12555: URL: https://github.com/apache/lucene/pull/12555#issuecomment-1720704189 Actually I just tried it myself and this will always reproduce the error: ``` actual.seekExact(0); actual.seekCeil(new BytesRef("")); for (int i = 0; i <

[GitHub] [lucene] zhaih commented on a diff in pull request #12555: Fix: Lucene90DocValuesProducer.TermsDict.seekCeil doesn't always position bytes correctly (#12167)

2023-09-14 Thread via GitHub
zhaih commented on code in PR #12555: URL: https://github.com/apache/lucene/pull/12555#discussion_r1326538550 ## lucene/core/src/java/org/apache/lucene/codecs/lucene90/Lucene90DocValuesProducer.java: ## @@ -1205,7 +1205,15 @@ public SeekStatus seekCeil(BytesRef text) throws IOE

[GitHub] [lucene] jimczi commented on pull request #12551: Introduce dynamic segment efSearch to Knn{Byte|Float}VectorQuery

2023-09-14 Thread via GitHub
jimczi commented on PR #12551: URL: https://github.com/apache/lucene/pull/12551#issuecomment-1720078533 Adding some charts together to compare how effective it is to use a dynamic efSearch. The first chart shows how well different efSearch values work on one segment, on multiple segm

[GitHub] [lucene] benwtrent commented on pull request #12551: Introduce dynamic segment efSearch to Knn{Byte|Float}VectorQuery

2023-09-14 Thread via GitHub
benwtrent commented on PR #12551: URL: https://github.com/apache/lucene/pull/12551#issuecomment-1720048714 @jimczi I like this idea at first glance, but I have one major concern. What about data that is indexed according to a specific order? Two tests to verify how this behaves would

[GitHub] [lucene] epotyom commented on pull request #12555: Fix: Lucene90DocValuesProducer.TermsDict.seekCeil doesn't always position bytes correctly (#12167)

2023-09-14 Thread via GitHub
epotyom commented on PR #12555: URL: https://github.com/apache/lucene/pull/12555#issuecomment-1719935323 Extended existing nightly random tests to catch the issue most of the time. Would that be enough or do we need a test that catches it every single time? -- This is an automated message

[GitHub] [lucene] Tony-X commented on pull request #12552: Make FSTPostingsFormat load FSTs off-heap

2023-09-14 Thread via GitHub
Tony-X commented on PR #12552: URL: https://github.com/apache/lucene/pull/12552#issuecomment-1719878383 @mikemccand hey Mike, I did not make a new Codec for this. IIRC, `FSTPostingsFormat` will be exercised by the RandomCodec. Also there is `TestFSTPostingsFormat extends BasePostingsFormatT

[GitHub] [lucene] jpountz commented on pull request #12489: Add support for recursive graph bisection.

2023-09-14 Thread via GitHub
jpountz commented on PR #12489: URL: https://github.com/apache/lucene/pull/12489#issuecomment-1719763923 Since it's fairly unintrusive to other functionality, I felt free to merge.

[GitHub] [lucene] jpountz commented on pull request #12489: Add support for recursive graph bisection.

2023-09-14 Thread via GitHub
jpountz commented on PR #12489: URL: https://github.com/apache/lucene/pull/12489#issuecomment-1719763914 Since it's fairly unintrusive to other functionality, I felt free to merge.

[GitHub] [lucene] jpountz merged pull request #12489: Add support for recursive graph bisection.

2023-09-14 Thread via GitHub
jpountz merged PR #12489: URL: https://github.com/apache/lucene/pull/12489

[GitHub] [lucene] jpountz closed issue #12492: Allow FilteredDocIdSetIterator.match(doc) to throw IOException

2023-09-14 Thread via GitHub
jpountz closed issue #12492: Allow FilteredDocIdSetIterator.match(doc) to throw IOException URL: https://github.com/apache/lucene/issues/12492

[GitHub] [lucene] jpountz merged pull request #12554: Allow FilteredDocIdSetIterator.match(doc) to throw IOException

2023-09-14 Thread via GitHub
jpountz merged PR #12554: URL: https://github.com/apache/lucene/pull/12554

[GitHub] [lucene] epotyom opened a new pull request, #12555: Fix: Lucene90DocValuesProducer.TermsDict.seekCeil doesn't always position bytes correctly (#12167)

2023-09-14 Thread via GitHub
epotyom opened a new pull request, #12555: URL: https://github.com/apache/lucene/pull/12555 Fix: Lucene90DocValuesProducer.TermsDict.seekCeil doesn't always position bytes correctly (#12167) TermsDict `ord` and `bytes` can be out of sync after a call to seekCeil which caused test fai

[GitHub] [lucene] jimczi commented on pull request #12551: Introduce dynamic segment efSearch to Knn{Byte|Float}VectorQuery

2023-09-14 Thread via GitHub
jimczi commented on PR #12551: URL: https://github.com/apache/lucene/pull/12551#issuecomment-1719529457 I made some adjustments to the formula to consider the logarithmic complexity of the greedy search. I conducted tests on two datasets: 1. The standard SIFT dataset, which has 128 d

[GitHub] [lucene] jpountz commented on pull request #12554: Allow FilteredDocIdSetIterator.match(doc) to throw IOException

2023-09-14 Thread via GitHub
jpountz commented on PR #12554: URL: https://github.com/apache/lucene/pull/12554#issuecomment-1719334101 Looks great, can you add a CHANGES entry under "Lucene 9.8.0" / "API Changes"?

[GitHub] [lucene] mikemccand commented on pull request #12552: Make FSTPostingsFormat load FSTs off-heap

2023-09-14 Thread via GitHub
mikemccand commented on PR #12552: URL: https://github.com/apache/lucene/pull/12552#issuecomment-1719297920 @Tony-X have you tried passing all Lucene unit tests using this Codec? I think you can add `-Dtests.codec=...` to force all tests to use it.

[GitHub] [lucene] mikemccand commented on a diff in pull request #12552: Make FSTPostingsFormat load FSTs off-heap

2023-09-14 Thread via GitHub
mikemccand commented on code in PR #12552: URL: https://github.com/apache/lucene/pull/12552#discussion_r1325827523 ## lucene/codecs/src/java/org/apache/lucene/codecs/memory/FSTTermsReader.java: ## @@ -191,7 +193,9 @@ final class TermsReader extends Terms { this.sumTotalTe

[GitHub] [lucene] gokaai opened a new pull request, #12554: Allow FilteredDocIdSetIterator.match(doc) to throw IOException

2023-09-14 Thread via GitHub
gokaai opened a new pull request, #12554: URL: https://github.com/apache/lucene/pull/12554 ### Description Allows `org.apache.lucene.search.FilteredDocIdSetIterator#match(doc)` to throw an IOException so that users don't have to explicitly catch it Closes #12492

[GitHub] [lucene] jpountz commented on pull request #12526: Speed up disjunctions by computing estimations of the score of the k-th top hit up-front.

2023-09-14 Thread via GitHub
jpountz commented on PR #12526: URL: https://github.com/apache/lucene/pull/12526#issuecomment-1718893926 FYI there was an interesting observation on another benchmark that took advantage of recursive graph bisection: https://jpountz.github.io/lucene-9.7-vs-9.8/. One query (`the incredibles`