date:20230914

[GitHub] [lucene] zhaih commented on pull request #12555: Fix: Lucene90DocValuesProducer.TermsDict.seekCeil doesn't always position bytes correctly (#12167)

2023-09-14 Thread via GitHub

zhaih commented on PR #12555: URL: https://github.com/apache/lucene/pull/12555#issuecomment-1720704189 Actually I just tried it myself and this will always reproduce the error: ``` actual.seekExact(0); actual.seekCeil(new BytesRef("")); for (int i = 0; i <

[GitHub] [lucene] zhaih commented on a diff in pull request #12555: Fix: Lucene90DocValuesProducer.TermsDict.seekCeil doesn't always position bytes correctly (#12167)

2023-09-14 Thread via GitHub

zhaih commented on code in PR #12555: URL: https://github.com/apache/lucene/pull/12555#discussion_r1326538550 ## lucene/core/src/java/org/apache/lucene/codecs/lucene90/Lucene90DocValuesProducer.java: ## @@ -1205,7 +1205,15 @@ public SeekStatus seekCeil(BytesRef text) throws IOE

[GitHub] [lucene] jimczi commented on pull request #12551: Introduce dynamic segment efSearch to Knn{Byte|Float}VectorQuery

2023-09-14 Thread via GitHub

jimczi commented on PR #12551: URL: https://github.com/apache/lucene/pull/12551#issuecomment-1720078533 Adding some charts together to compare how effective it is to use a dynamic efSearch. The first chart shows how well different efSearch values work on one segment, on multiple segm

[GitHub] [lucene] benwtrent commented on pull request #12551: Introduce dynamic segment efSearch to Knn{Byte|Float}VectorQuery

2023-09-14 Thread via GitHub

benwtrent commented on PR #12551: URL: https://github.com/apache/lucene/pull/12551#issuecomment-1720048714 @jimczi I like this idea at first glance, but I have one major concern. What about data that is indexed according to a specific order? Two tests to verify how this behaves would

[GitHub] [lucene] epotyom commented on pull request #12555: Fix: Lucene90DocValuesProducer.TermsDict.seekCeil doesn't always position bytes correctly (#12167)

2023-09-14 Thread via GitHub

epotyom commented on PR #12555: URL: https://github.com/apache/lucene/pull/12555#issuecomment-1719935323 Extended existing nightly random tests to catch the issue most of the time. Would that be enough or do we need a test that catches it every single time? -- This is an automated message

[GitHub] [lucene] Tony-X commented on pull request #12552: Make FSTPostingsFormat load FSTs off-heap

2023-09-14 Thread via GitHub

Tony-X commented on PR #12552: URL: https://github.com/apache/lucene/pull/12552#issuecomment-1719878383 @mikemccand hey Mike, I did not make a new Codec for this. IIRC, `FSTPostingsFormat` will be exercised by the RandomCodec. Also there is `TestFSTPostingsFormat extends BasePostingsFormatT

[GitHub] [lucene] jpountz commented on pull request #12489: Add support for recursive graph bisection.

2023-09-14 Thread via GitHub

jpountz commented on PR #12489: URL: https://github.com/apache/lucene/pull/12489#issuecomment-1719763923 Since it's fairly unintrusive to other functionality, I felt free to merge.

[GitHub] [lucene] jpountz commented on pull request #12489: Add support for recursive graph bisection.

2023-09-14 Thread via GitHub

jpountz commented on PR #12489: URL: https://github.com/apache/lucene/pull/12489#issuecomment-1719763914 Since it's fairly unintrusive to other functionality, I felt free to merge.

[GitHub] [lucene] jpountz merged pull request #12489: Add support for recursive graph bisection.

2023-09-14 Thread via GitHub

jpountz merged PR #12489: URL: https://github.com/apache/lucene/pull/12489

[GitHub] [lucene] jpountz closed issue #12492: Allow FilteredDocIdSetIterator.match(doc) to throw IOException

2023-09-14 Thread via GitHub

jpountz closed issue #12492: Allow FilteredDocIdSetIterator.match(doc) to throw IOException URL: https://github.com/apache/lucene/issues/12492

[GitHub] [lucene] jpountz merged pull request #12554: Allow FilteredDocIdSetIterator.match(doc) to throw IOException

2023-09-14 Thread via GitHub

jpountz merged PR #12554: URL: https://github.com/apache/lucene/pull/12554

[GitHub] [lucene] epotyom opened a new pull request, #12555: Fix: Lucene90DocValuesProducer.TermsDict.seekCeil doesn't always position bytes correctly (#12167)

2023-09-14 Thread via GitHub

epotyom opened a new pull request, #12555: URL: https://github.com/apache/lucene/pull/12555 Fix: Lucene90DocValuesProducer.TermsDict.seekCeil doesn't always position bytes correctly (#12167) TermsDict `ord` and `bytes` can be out of sync after a call to seekCeil which caused test fai
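The invariant this PR description refers to can be illustrated with a toy model. The following is a minimal sketch, not the Lucene90DocValuesProducer code: `TermsDictSketch`, `inSync`, and `demo` are hypothetical names, terms are plain `String`s compared by `String.compareTo` rather than `BytesRef`s, and the dictionary is an in-memory sorted array. It only shows what it means for the ordinal and the term text to agree after every seek.

```java
import java.util.Arrays;

// Toy model only: hypothetical names, not the Lucene implementation.
class TermsDictSketch {
    enum SeekStatus { FOUND, NOT_FOUND, END }

    private final String[] sortedTerms; // assumed sorted ascending
    private int ord = -1;               // current ordinal, -1 when unpositioned
    private String term;                // term text that should belong to ord

    TermsDictSketch(String... sortedTerms) {
        this.sortedTerms = sortedTerms.clone();
    }

    void seekExact(int targetOrd) {
        position(targetOrd);
    }

    // Moves to the smallest term >= text, updating ord and term together.
    SeekStatus seekCeil(String text) {
        int idx = Arrays.binarySearch(sortedTerms, text);
        if (idx >= 0) {
            position(idx);
            return SeekStatus.FOUND;
        }
        int ceil = -idx - 1; // insertion point reported by binarySearch
        if (ceil == sortedTerms.length) {
            ord = sortedTerms.length;
            term = null;
            return SeekStatus.END;
        }
        position(ceil);
        return SeekStatus.NOT_FOUND;
    }

    // Single place where the position changes, so ord and term cannot drift apart.
    private void position(int newOrd) {
        ord = newOrd;
        term = sortedTerms[newOrd];
    }

    boolean inSync() {
        return ord >= 0 && ord < sortedTerms.length && sortedTerms[ord].equals(term);
    }

    int ord() { return ord; }

    String term() { return term; }

    // Same call sequence as the reproduction quoted in this thread:
    // seekExact(0) followed by seekCeil on the empty term.
    static boolean demo() {
        TermsDictSketch dict = new TermsDictSketch("a", "b", "d");
        dict.seekExact(0);
        SeekStatus status = dict.seekCeil("");
        return status == SeekStatus.NOT_FOUND && dict.ord() == 0 && dict.inSync();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Routing every position change through one private method is a design choice of this sketch; whether the actual fix takes that shape is not stated in the preview above.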

[GitHub] [lucene] jimczi commented on pull request #12551: Introduce dynamic segment efSearch to Knn{Byte|Float}VectorQuery

2023-09-14 Thread via GitHub

jimczi commented on PR #12551: URL: https://github.com/apache/lucene/pull/12551#issuecomment-1719529457 I made some adjustments to the formula to consider the logarithmic complexity of the greedy search. I conducted tests on two datasets: 1. The standard SIFT dataset, which has 128 d

[GitHub] [lucene] jpountz commented on pull request #12554: Allow FilteredDocIdSetIterator.match(doc) to throw IOException

2023-09-14 Thread via GitHub

jpountz commented on PR #12554: URL: https://github.com/apache/lucene/pull/12554#issuecomment-1719334101 Looks great, can you add a CHANGES entry under "Lucene 9.8.0" / "API Changes"?

[GitHub] [lucene] mikemccand commented on pull request #12552: Make FSTPostingsFormat load FSTs off-heap

2023-09-14 Thread via GitHub

mikemccand commented on PR #12552: URL: https://github.com/apache/lucene/pull/12552#issuecomment-1719297920 @Tony-X have you tried passing all Lucene unit tests using this Codec? I think you can add `-Dtests.codec=...` to force all tests to use it.

[GitHub] [lucene] mikemccand commented on a diff in pull request #12552: Make FSTPostingsFormat load FSTs off-heap

2023-09-14 Thread via GitHub

mikemccand commented on code in PR #12552: URL: https://github.com/apache/lucene/pull/12552#discussion_r1325827523 ## lucene/codecs/src/java/org/apache/lucene/codecs/memory/FSTTermsReader.java: ## @@ -191,7 +193,9 @@ final class TermsReader extends Terms { this.sumTotalTe

[GitHub] [lucene] gokaai opened a new pull request, #12554: Allow FilteredDocIdSetIterator.match(doc) to throw IOException

2023-09-14 Thread via GitHub

gokaai opened a new pull request, #12554: URL: https://github.com/apache/lucene/pull/12554 ### Description Allows `org.apache.lucene.search.FilteredDocIdSetIterator#match(doc)` to throw an IOException so that users don't have to explicitly catch it. Closes #12492
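The effect of this signature change can be sketched without Lucene on the classpath. The example below is an assumption-laden illustration, not the Lucene class: `FilteredIteratorSketch`, `MatchSketch`, `readFlag`, and `countMatches` are hypothetical names, and the iterator walks a plain `int[]`. It shows that once the abstract `match` method declares `throws IOException`, a subclass can call I/O-backed code directly instead of wrapping it in a try/catch.

```java
import java.io.IOException;

// Toy model only: hypothetical names, not org.apache.lucene.search.FilteredDocIdSetIterator.
class MatchSketch {
    // Hypothetical I/O-backed predicate; a real one might read from an index file.
    static boolean readFlag(int doc) throws IOException {
        if (doc < 0) {
            throw new IOException("negative doc id: " + doc);
        }
        return doc % 2 == 0;
    }

    static int countMatches(int[] docs) throws IOException {
        FilteredIteratorSketch it = new FilteredIteratorSketch(docs) {
            @Override
            protected boolean match(int doc) throws IOException {
                // No try/catch needed: the checked exception propagates to the caller.
                return readFlag(doc);
            }
        };
        int count = 0;
        while (it.nextDoc() != FilteredIteratorSketch.NO_MORE_DOCS) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(countMatches(new int[] {0, 1, 2, 3, 4})); // prints 3
    }
}

abstract class FilteredIteratorSketch {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    private final int[] docs;
    private int pos = -1;

    FilteredIteratorSketch(int[] docs) {
        this.docs = docs;
    }

    // Declaring the checked exception here is the point of the sketch.
    protected abstract boolean match(int doc) throws IOException;

    // Advances to the next doc accepted by match, or NO_MORE_DOCS when exhausted.
    int nextDoc() throws IOException {
        while (++pos < docs.length) {
            if (match(docs[pos])) {
                return docs[pos];
            }
        }
        return NO_MORE_DOCS;
    }
}
```

If `match` did not declare the exception, the anonymous subclass would have to catch `IOException` and rethrow it as an unchecked exception, which is the boilerplate the PR description says users no longer need.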

[GitHub] [lucene] jpountz commented on pull request #12526: Speed up disjunctions by computing estimations of the score of the k-th top hit up-front.

2023-09-14 Thread via GitHub

jpountz commented on PR #12526: URL: https://github.com/apache/lucene/pull/12526#issuecomment-1718893926 FYI there was an interesting observation on another benchmark that took advantage of recursive graph bisection: https://jpountz.github.io/lucene-9.7-vs-9.8/. One query (`the incredibles`

18 matches
