jimczi commented on PR #12169: URL: https://github.com/apache/lucene/pull/12169#issuecomment-1521454510
> Adding a `clone()` / `copy()` method to `HnswGraph` seems like a fairly unobtrusive way to support this. Javadoc definitely needs to be clearer.

I looked at this and I am a bit reluctant to add this method since `OnHeapHnswGraph` is mutable. A copy method on a mutable graph would be confusing and could lead to the same issue we have in this PR. In its current form, I'd say `HnswGraph` is an internal class that shouldn't be used outside of the indexing primitives.

I am also curious to hear what the expectation is for this filter. I checked some resources created with word2vec or GloVe, and they come with large vocabularies. With a single thread, building a graph of 1M vectors with hundreds of dimensions would take several minutes (about 16 minutes at 1000 docs/s). Rebuilding this graph on every application restart seems like overkill. Could we instead build the resource (the HNSW graph) offline and just load it when the filter starts?

> is it worth reverting or we can simply fix it/commit a workaround as soon as we find it in the next few days?

I don't have a strong feeling here, but I am not sure we should add this filter to the common analysis module. As it stands it feels more experimental, so the sandbox module might be more appropriate?
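For reference, the single-threaded build-time estimate in this comment works out as follows, assuming 1M vectors and a sustained indexing rate of 1000 vectors per second (the rate is the comment's own rough figure, not a measured benchmark):

$$
t = \frac{10^{6}\ \text{vectors}}{10^{3}\ \text{vectors/s}} = 10^{3}\ \text{s} \approx 16.7\ \text{min}
$$

This cost would be paid again on every restart if the graph is rebuilt in memory rather than loaded from a prebuilt resource.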