[GitHub] [lucene] rmuir commented on pull request #12191: Increase KnnByteVectorField limit on dimensions to 2048

2023-03-26 Thread via GitHub


rmuir commented on PR #12191:
URL: https://github.com/apache/lucene/pull/12191#issuecomment-1484133375

   > As for performance issues, this is why I am only suggesting the increase for byte encoded vectors as their size & performance improvements are just as reasonable at 2048 as float is at 1024.
   
   Reasonable? I'm not sure this word can be applied here. How long does it take for me to index 50 million documents?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] rmuir commented on pull request #12191: Increase KnnByteVectorField limit on dimensions to 2048

2023-03-26 Thread via GitHub


rmuir commented on PR #12191:
URL: https://github.com/apache/lucene/pull/12191#issuecomment-1484135459

   > @rmuir Ah, I thought your main concern was performance.
   
   I have multiple concerns:
   * HNSW doesn't scale at all (time, memory space) and there seems to be no plan to look into alternatives
   * HNSW is especially, horribly slow with higher dimensions
   * I fear we are slowly locking Lucene permanently into this horrible HNSW, and it may already be too late: can a codec implementing another algorithm even be added at this point?
   
   I'm -1 to increasing this dimension count. I don't care how many PRs get opened to do it. I will be the bad guy.





[GitHub] [lucene] zacharymorn commented on a diff in pull request #12194: [GITHUB-11915] [Discussion Only] Make Lucene smarter about long runs of matches via new API on DISI

2023-03-26 Thread via GitHub


zacharymorn commented on code in PR #12194:
URL: https://github.com/apache/lucene/pull/12194#discussion_r1148629792


##
lucene/core/src/java/org/apache/lucene/search/DocIdSetIterator.java:
##
@@ -82,6 +82,11 @@ public int advance(int target) throws IOException {
 return doc;
   }
 
+  @Override

Review Comment:
   Good catch. I have overridden `range` and also `empty` now.



##
lucene/core/src/java/org/apache/lucene/codecs/lucene90/Lucene90PostingsReader.java:
##
@@ -479,6 +481,36 @@ private void refillDocs() throws IOException {
   assert docBuffer[BLOCK_SIZE] == NO_MORE_DOCS;
 }
 
+@Override
+public int peekNextNonMatchingDocID() throws IOException {

Review Comment:
   Yup indeed! I was planning to do that after enhancing assertions / 
`CheckIndex` for existing changes. I have added that to all 5 of them now.


