msokolov commented on issue #13565: URL: https://github.com/apache/lucene/issues/13565#issuecomment-2293683592
I made this tool; while testing it I ran into some unexpected wrinkles relating to our vector format. I created a new index from an existing one, with a new docid order, by calling:

`writer.addIndexes(SortingCodecReader.wrap((CodecReader) ctx.reader(), docMap, null));`

But this rebuilds the HNSW graph from scratch, which is unnecessary work when all we really want is to *renumber* it. Worse, the new graph ignores the parameters used to build the original graph and substitutes defaults such as M=16.

It makes me wonder whether we ought to write M to the index as per-field metadata, so it can be reused by tools like this one that may not have access to a schema, and more generally when merging the field in the future. For my preliminary tests I can simply hack `DEFAULT_MAX_CONNECTIONS` to fit my test data, but I'd like to hear folks' opinions on how we should address this.
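To make the "renumber instead of rebuild" idea concrete, here is a minimal sketch. It is not Lucene's `HnswGraph` API. It assumes one vector per document (so graph ordinals equal docids), a single graph level stored as `int[][]` adjacency lists, and a plain `int[] oldToNew` array standing in for the `docMap` passed to `SortingCodecReader.wrap`. The class and method names are hypothetical.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch: carry an already-built graph over to a new doc order by
 * relabeling nodes, instead of re-running HNSW construction. Assumes ordinals == docids
 * and a single level; this is not Lucene's HnswGraph API.
 */
final class HnswRenumberSketch {

  /**
   * @param neighbors neighbors[oldOrd] holds the neighbor ordinals of node oldOrd (old numbering)
   * @param oldToNew  oldToNew[oldOrd] = newOrd; assumed to be a permutation of 0..n-1
   * @return adjacency lists indexed by new ordinal, containing new ordinals
   */
  static int[][] renumber(int[][] neighbors, int[] oldToNew) {
    int n = neighbors.length;
    if (oldToNew.length != n) {
      throw new IllegalArgumentException(
          "doc map size " + oldToNew.length + " != graph size " + n);
    }
    int[][] result = new int[n][];
    for (int oldOrd = 0; oldOrd < n; oldOrd++) {
      int[] src = neighbors[oldOrd];
      int[] dst = new int[src.length];
      for (int i = 0; i < src.length; i++) {
        dst[i] = oldToNew[src[i]]; // same edge, relabeled endpoint
      }
      // Keep each list sorted by id, assuming the on-disk format wants sorted ids
      // (for example, for delta encoding).
      Arrays.sort(dst);
      result[oldToNew[oldOrd]] = dst;
    }
    return result;
  }
}
```

Every edge is preserved, and no distance computations are needed, which is the work a full rebuild repeats. A real implementation would also have to remap the entry point (`oldToNew[entryNode]`) and the node lists of each upper level, and would need extra care if some documents have no vector, so ordinals and docids differ.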
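The per-field metadata proposal could look roughly like the sketch below: record the M actually used when a field's graph is first written, then have a merge, or an offline tool with no schema, read it back and fall back to the default only when nothing was recorded. Lucene's `FieldInfo` already carries per-field string attributes, which is one plausible home for this. Here a `Map<String, String>` stands in for those attributes, and the attribute key and class name are hypothetical.

```java
import java.util.Map;

/**
 * Hypothetical sketch: persist the graph-construction parameter M per field and
 * reuse it at merge time. The attribute key and class are illustrative only.
 */
final class GraphParamsSketch {
  static final String MAX_CONN_KEY = "hnsw.maxConn"; // hypothetical attribute key
  static final int DEFAULT_MAX_CONNECTIONS = 16;     // the default mentioned above (M=16)

  /** At write/flush time: remember the M actually used to build this field's graph. */
  static void recordMaxConn(Map<String, String> fieldAttributes, int maxConn) {
    if (maxConn <= 0) {
      throw new IllegalArgumentException("maxConn must be > 0: " + maxConn);
    }
    fieldAttributes.put(MAX_CONN_KEY, Integer.toString(maxConn));
  }

  /** At merge time: reuse the recorded M instead of silently substituting the default. */
  static int maxConnForMerge(Map<String, String> fieldAttributes) {
    String value = fieldAttributes.get(MAX_CONN_KEY);
    return value == null ? DEFAULT_MAX_CONNECTIONS : Integer.parseInt(value);
  }
}
```

The fallback keeps older indexes, which have no such attribute, readable with today's behavior. How to reconcile segments that were built with different M values during a merge is left open here.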