[GitHub] [lucene] rmuir commented on pull request #12191: Increase KnnByteVectorField limit on dimensions to 2048
rmuir commented on PR #12191:
URL: https://github.com/apache/lucene/pull/12191#issuecomment-1484133375

> As for performance issues, this is why I am only suggesting the increase for byte encoded vectors as their size & performance improvements are just as reasonable at 2048 as float is at 1024.

reasonable? I'm not sure this word can be applied here. How long does it take for me to index 50 million documents?

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at: us...@infra.apache.org

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] rmuir commented on pull request #12191: Increase KnnByteVectorField limit on dimensions to 2048
rmuir commented on PR #12191:
URL: https://github.com/apache/lucene/pull/12191#issuecomment-1484135459

> @rmuir Ah, I thought your main concern was performance.

I have multiple concerns:
* HNSW doesn't scale at all (time, memory space) and there seems to be no plan to look into alternatives
* HNSW is especially horribly slow with higher dimensions
* I fear we are slowly locking Lucene permanently into this horrible HNSW and it may already be too late. Can a codec implementing another algorithm even be added at this point?

I'm -1 to increasing this dimension count. I don't care how many PRs get opened to do it. I will be the bad guy.
[GitHub] [lucene] zacharymorn commented on a diff in pull request #12194: [GITHUB-11915] [Discussion Only] Make Lucene smarter about long runs of matches via new API on DISI
zacharymorn commented on code in PR #12194:
URL: https://github.com/apache/lucene/pull/12194#discussion_r1148629792

## lucene/core/src/java/org/apache/lucene/search/DocIdSetIterator.java: ##
@@ -82,6 +82,11 @@ public int advance(int target) throws IOException {
 return doc;
 }
+ @Override

Review Comment:
Good catch. I have overridden `range` and also `empty` now.

## lucene/core/src/java/org/apache/lucene/codecs/lucene90/Lucene90PostingsReader.java: ##
@@ -479,6 +481,36 @@ private void refillDocs() throws IOException {
 assert docBuffer[BLOCK_SIZE] == NO_MORE_DOCS;
 }
+@Override
+public int peekNextNonMatchingDocID() throws IOException {

Review Comment:
Yup indeed! I was planning to do that after enhancing assertions / `CheckIndex` for existing changes. I have added that to all 5 of them now.
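
For readers following the review without the full diff, here is a minimal, standalone sketch of the idea behind `peekNextNonMatchingDocID()` on a range-style iterator. It does not use Lucene classes. The class `RangeIterator`, the helper `countMatches`, and the exact contract (the method returns the first doc ID after the current run of consecutive matches) are assumptions made for illustration, not the code in the PR.

```java
public class RunSkippingSketch {
  // Sentinel for an exhausted iterator, mirroring the usual DISI convention.
  static final int NO_MORE_DOCS = Integer.MAX_VALUE;

  /** Hypothetical iterator that matches every doc ID in [minDoc, maxDoc). */
  static final class RangeIterator {
    private final int minDoc;
    private final int maxDoc;
    private int doc = -1; // not positioned yet

    RangeIterator(int minDoc, int maxDoc) {
      if (minDoc < 0 || minDoc >= maxDoc) {
        throw new IllegalArgumentException("need 0 <= minDoc < maxDoc");
      }
      this.minDoc = minDoc;
      this.maxDoc = maxDoc;
    }

    int docID() {
      return doc;
    }

    int nextDoc() {
      return advance(doc + 1);
    }

    int advance(int target) {
      int candidate = Math.max(target, minDoc);
      doc = candidate < maxDoc ? candidate : NO_MORE_DOCS;
      return doc;
    }

    /**
     * Assumed contract: every doc ID from docID() up to (but excluding) the
     * returned value matches. For a range iterator that is simply maxDoc.
     */
    int peekNextNonMatchingDocID() {
      if (doc == -1) {
        throw new IllegalStateException("iterator is not positioned");
      }
      return doc == NO_MORE_DOCS ? NO_MORE_DOCS : maxDoc;
    }
  }

  /** Counts matches one run at a time instead of one document at a time. */
  static long countMatches(RangeIterator it) {
    long count = 0;
    int doc = it.nextDoc();
    while (doc != NO_MORE_DOCS) {
      int runEnd = it.peekNextNonMatchingDocID();
      count += runEnd - doc;     // the whole run [doc, runEnd) matches
      doc = it.advance(runEnd);  // jump past the run in one step
    }
    return count;
  }

  public static void main(String[] args) {
    System.out.println(countMatches(new RangeIterator(5, 10))); // prints 5
  }
}
```

The point of the sketch is that a consumer can process a long run of matches with a single `advance` call rather than one `nextDoc` call per document.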