[GitHub] [lucene] stefanvodita opened a new pull request, #11780: GH#11601: Add ability to compute reader states after refresh
stefanvodita opened a new pull request, #11780: URL: https://github.com/apache/lucene/pull/11780 This PR is only meant as a starting point for a conversation and not to be merged as is. It is a proposal to let users retrieve new `SortedSetDocValuesReaderState` objects after a refresh, in a single method call to the `ReferenceManager`. The new reader states will have updated ordinal maps. Problems this solves: 1. Getting reader states that correspond to the current `IndexReader`. We ensure this by using the `refreshLock`. 2. Avoiding a circular dependency from the core module on the facets module by using the `StateCalculator` functional interface. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
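A minimal sketch of the idea in this proposal, assuming nothing about the actual patch: the reopen and the state computation happen under a single lock, and the state logic is passed in through a functional interface so the holder never references facet code. `RefreshingHolder`, `Snapshot`, `refreshAndCompute` and the `StateCalculator` signature below are hypothetical stand-ins, not Lucene's `ReferenceManager` API.

```java
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.UnaryOperator;

// Hypothetical callback; the PR's StateCalculator may have a different signature.
@FunctionalInterface
interface StateCalculator<R, S> {
  S calculate(R reader);
}

// Hypothetical stand-in for a reference manager; NOT Lucene's ReferenceManager.
final class RefreshingHolder<R, S> {

  // A reader paired with the state computed from exactly that reader.
  static final class Snapshot<R, S> {
    final R reader;
    final S state;

    Snapshot(R reader, S state) {
      this.reader = reader;
      this.state = state;
    }
  }

  private final ReentrantLock refreshLock = new ReentrantLock();
  private final StateCalculator<R, S> calculator;
  private volatile Snapshot<R, S> current;

  RefreshingHolder(R initial, StateCalculator<R, S> calculator) {
    this.calculator = calculator;
    this.current = new Snapshot<>(initial, calculator.calculate(initial));
  }

  Snapshot<R, S> current() {
    return current;
  }

  // Reopens the reader and computes its state while holding the refresh lock,
  // so the returned state always belongs to the returned reader.
  Snapshot<R, S> refreshAndCompute(UnaryOperator<R> reopen) {
    refreshLock.lock();
    try {
      R next = reopen.apply(current.reader);
      Snapshot<R, S> snapshot = new Snapshot<>(next, calculator.calculate(next));
      current = snapshot;
      return snapshot;
    } finally {
      refreshLock.unlock();
    }
  }
}
```

Because the caller supplies the calculator, the holder compiles without any dependency on the code that knows how to build the state, which is the decoupling the description refers to.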
[GitHub] [lucene] msokolov opened a new pull request, #11781: Diversity check bugfix
msokolov opened a new pull request, #11781: URL: https://github.com/apache/lucene/pull/11781 I was looking into the changes in recall we have been observing in various test cases. Thanks @jtibshirani for pointing out we should not see any change at all! I found a couple of related bugs in the diversity-checking code that I had introduced when refactoring as part of splitting into float[]- and byte[]- oriented versions. I found some gaps in our test coverage and added tests. One of these demonstrates the bug; the other is just filling in another case. With this fix I was able to measure the identical recall comparing against test runs with 9.3
[GitHub] [lucene] msokolov opened a new issue, #11782: Fix bugs in HNSW diversity check introduced in LUCENE-10577
msokolov opened a new issue, #11782: URL: https://github.com/apache/lucene/issues/11782

### Description
We observed changes in recall that can be traced to these diversity checks done while indexing.

### Version and environment details
_No response_
[GitHub] [lucene] msokolov commented on issue #11782: Fix bugs in HNSW diversity check introduced in LUCENE-10577
msokolov commented on issue #11782: URL: https://github.com/apache/lucene/issues/11782#issuecomment-1250373809 https://github.com/apache/lucene/pull/11781
[GitHub] [lucene] msokolov commented on pull request #591: LUCENE-10365 Wizard changes contributed from Solr
msokolov commented on PR #591: URL: https://github.com/apache/lucene/pull/591#issuecomment-1250375265 ah thanks Jan! Since I went through all the GPG setup I feel I should use it! But ... probably will backport
[GitHub] [lucene] msokolov opened a new issue, #11783: Make NeighborArray fixed size
msokolov opened a new issue, #11783: URL: https://github.com/apache/lucene/issues/11783

### Description
Today we allow NeighborArray to grow dynamically, but we always allocate it at its full size and then grow it by 1: when it is full-sized, we store a new neighbor at index *size* and then remove one of the neighbors based on a diversity criterion. Instead, we might want to figure out how to do more efficient surgery on the array. But in the meantime, we could oversize the arrays by 1 from the start and then never change their size.
[GitHub] [lucene] msokolov opened a new pull request, #11784: NeighborArray is now fixed size
msokolov opened a new pull request, #11784: URL: https://github.com/apache/lucene/pull/11784 Remove the ability for `NeighborArray` to grow. Always allocates it one larger than requested, to reserve space for a temporary neighbor.
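A rough sketch of the "oversize by one, never grow" layout described here. `FixedNeighbors` is a hypothetical simplified class, not Lucene's `NeighborArray`, and the eviction rule is swapped for illustration: it drops the lowest-scoring entry rather than applying the diversity criterion mentioned in the issue.

```java
import java.util.Arrays;

// Hypothetical simplified neighbor list; NOT Lucene's NeighborArray.
final class FixedNeighbors {
  private final int maxSize;
  private final int[] nodes;
  private final float[] scores;
  private int size;

  FixedNeighbors(int maxSize) {
    this.maxSize = maxSize;
    // One spare slot holds the temporary neighbor, so the arrays never grow.
    this.nodes = new int[maxSize + 1];
    this.scores = new float[maxSize + 1];
  }

  // Stores the new neighbor at index `size`; if that overflows maxSize,
  // one entry is evicted so the list shrinks back to maxSize.
  void add(int node, float score) {
    nodes[size] = node;
    scores[size] = score;
    size++;
    if (size > maxSize) {
      removeLowestScore();
    }
  }

  // Illustrative eviction only: removes the entry with the lowest score.
  private void removeLowestScore() {
    int worst = 0;
    for (int i = 1; i < size; i++) {
      if (scores[i] < scores[worst]) {
        worst = i;
      }
    }
    int tail = size - worst - 1;
    System.arraycopy(nodes, worst + 1, nodes, worst, tail);
    System.arraycopy(scores, worst + 1, scores, worst, tail);
    size--;
  }

  int size() {
    return size;
  }

  int[] nodes() {
    return Arrays.copyOf(nodes, size);
  }
}
```

The spare slot means `add` can always write first and decide what to evict afterwards, without a bounds check or reallocation on the hot path.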
[GitHub] [lucene] msokolov commented on pull request #11784: NeighborArray is now fixed size
msokolov commented on PR #11784: URL: https://github.com/apache/lucene/pull/11784#issuecomment-1250379348 Fixes https://github.com/apache/lucene/issues/11783
[GitHub] [lucene] msokolov commented on issue #11769: TestKnnVectorQuery.testScoreEuclidean fails
msokolov commented on issue #11769: URL: https://github.com/apache/lucene/issues/11769#issuecomment-1250380506 I believe https://github.com/apache/lucene/commit/e69c48b8d941f44bb73f0594b5f72947efe80948 should fix. I guess LuceneTestCase.newIndexWriterConfig can randomly commit, or result in unusual configs anyway. I don't think we want to forceMerge in this case because some of the test users of that method rely on document order.
[GitHub] [lucene] msokolov commented on a diff in pull request #11781: Diversity check bugfix
msokolov commented on code in PR #11781: URL: https://github.com/apache/lucene/pull/11781#discussion_r973782739

## lucene/core/src/test/org/apache/lucene/util/hnsw/TestHnswGraph.java: ##
@@ -555,6 +556,78 @@ public void testDiversity() throws IOException {
     assertLevel0Neighbors(builder.hnsw, 5, 1, 4);
   }
 
+  public void testDiversityFallback() throws IOException {
+    vectorEncoding = randomVectorEncoding();
+    similarityFunction = VectorSimilarityFunction.EUCLIDEAN;
+    // Some test cases can't be exercised in two dimensions;
+    // in particular if a new neighbor displaces an existing neighbor
+    // by being closer to the target, yet none of the existing neighbors is closer to the new vector
+    // than to the target -- ie they all remain diverse, so we simply drop the farthest one.
+    float[][] values = {
+      {0, 0, 0},
+      {0, 1, 0},
+      {0, 0, 2},
+      {1, 0, 0},
+      {0, 0.4f, 0}

Review Comment:
hm, I guess this works for bytes too, but we should probably multiply everything here to make it non-fractional
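To illustrate the review comment: multiplying every coordinate by the same factor keeps all "closer than" relationships under Euclidean distance, so the test geometry would survive a move to whole numbers. The sketch below assumes a factor of 5 (the smallest integer that makes 0.4 whole); that factor and the `ScaledVectors` helper are illustrative choices, not taken from the PR.

```java
// Hypothetical helper for illustration; not part of the Lucene test.
final class ScaledVectors {
  // Values quoted in the diff under review.
  static final float[][] ORIGINAL = {
    {0, 0, 0}, {0, 1, 0}, {0, 0, 2}, {1, 0, 0}, {0, 0.4f, 0}
  };

  // Assumed scale factor: 0.4 * 5 = 2, so every coordinate becomes whole.
  static final int SCALE = 5;

  static float[][] scaled() {
    float[][] out = new float[ORIGINAL.length][];
    for (int i = 0; i < ORIGINAL.length; i++) {
      out[i] = new float[ORIGINAL[i].length];
      for (int j = 0; j < ORIGINAL[i].length; j++) {
        // Round to remove float noise from the multiplication.
        out[i][j] = Math.round(ORIGINAL[i][j] * SCALE);
      }
    }
    return out;
  }

  static float squaredDistance(float[] a, float[] b) {
    float sum = 0;
    for (int i = 0; i < a.length; i++) {
      float d = a[i] - b[i];
      sum += d * d;
    }
    return sum;
  }

  // True if "is vector i closer to j than to k?" has the same answer
  // in both vector sets for every triple (i, j, k).
  static boolean sameOrdering(float[][] a, float[][] b) {
    for (int i = 0; i < a.length; i++) {
      for (int j = 0; j < a.length; j++) {
        for (int k = 0; k < a.length; k++) {
          int left = Integer.signum(
              Float.compare(squaredDistance(a[i], a[j]), squaredDistance(a[i], a[k])));
          int right = Integer.signum(
              Float.compare(squaredDistance(b[i], b[j]), squaredDistance(b[i], b[k])));
          if (left != right) {
            return false;
          }
        }
      }
    }
    return true;
  }
}
```

With this factor the last vector becomes {0, 2, 0} and the others {0, 5, 0}, {0, 0, 10} and {5, 0, 0}, all of which fit in a byte.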