msokolov commented on issue #13565:
URL: https://github.com/apache/lucene/issues/13565#issuecomment-2293683592

    I made this tool; while testing it I ran into some unexpected wrinkles 
relating to our vector format.  I created a new index from an existing one, 
with a new docid order, using:
   
       writer.addIndexes(SortingCodecReader.wrap((CodecReader) ctx.reader(), docMap, null));
   
   But this recreates the HNSW graph, doing unnecessary work when all we 
really want to do is *renumber* it. The new graph it creates also ignores the 
parameters that were used to create the original graph, substituting defaults 
such as M=16.  It makes me wonder if we ought to write M to the index as 
per-field metadata so it can be reused by tools such as this that may not have 
access to a schema, or in general when merging the field in the future.
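
   To make "renumber" concrete, here is a minimal sketch in plain Java of what 
remapping a graph under a docid permutation could look like. It assumes a 
single-level graph held as `int[][]` adjacency lists and a hypothetical 
`oldToNew` array; `GraphRenumber` and `renumber` are illustrative names, not 
Lucene APIs, and a real HNSW graph has multiple levels and its own on-disk 
encoding.

```java
import java.util.Arrays;

// Illustrative only: not Lucene code. Renumbers a single-level neighbor graph
// under a docid permutation without recomputing any vector distances.
final class GraphRenumber {

  // neighbors[oldNode] lists the old ids of that node's neighbors.
  // oldToNew[oldNode] is the new id assigned to oldNode (assumed to be a permutation).
  static int[][] renumber(int[][] neighbors, int[] oldToNew) {
    if (neighbors.length != oldToNew.length) {
      throw new IllegalArgumentException("graph size and doc map size differ");
    }
    int[][] result = new int[neighbors.length][];
    for (int oldNode = 0; oldNode < neighbors.length; oldNode++) {
      int[] mapped = new int[neighbors[oldNode].length];
      for (int i = 0; i < mapped.length; i++) {
        mapped[i] = oldToNew[neighbors[oldNode][i]];
      }
      // Sorted so the output is deterministic; the edge set itself is unchanged.
      Arrays.sort(mapped);
      result[oldToNew[oldNode]] = mapped;
    }
    return result;
  }

  public static void main(String[] args) {
    int[][] graph = {{1, 2}, {0}, {0, 1}};
    int[] oldToNew = {2, 0, 1}; // old 0 -> new 2, old 1 -> new 0, old 2 -> new 1
    System.out.println(Arrays.deepToString(renumber(graph, oldToNew)));
    // prints [[2], [0, 2], [0, 1]]
  }
}
```

   Each node keeps exactly the neighbors it had, so the connection count per 
node (and hence whatever M the original graph was built with) carries over 
without needing to know M at all.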
   
   I guess for my preliminary tests I can simply hack the 
`DEFAULT_MAX_CONNECTIONS` to fit my test data, but I'd like to hear folks' 
opinions on how we should address this.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

