[ https://issues.apache.org/jira/browse/LUCENE-10471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17577970#comment-17577970 ]
Michael Sokolov commented on LUCENE-10471:
------------------------------------------

> Maybe I do not understand the code base of Lucene well enough, but wouldn't
> it be possible to have a default limit of 1024 or 2028 and that one can set a
> different limit programmable on the IndexWriter/Reader/Searcher?

I think the idea is to protect ourselves from accidental booboos; this could eventually get exposed in some shared configuration file, and then if somebody passes MAX_INT it could lead to allocating huge buffers somewhere and taking down a service shared by many people/groups? Hypothetical, but it's basically following the principle that we should be strict to help stop people shooting themselves and others in the feet.

We may also want to preserve our ability to introduce optimizations that rely on some limits to the size, which would become difficult if usage of larger sizes became entrenched. (We can't so easily take it back once it's out there).

Having said that I still feel a 16K limit, while allowing for models that are beyond reasonable, wouldn't cause any of these sort of issues, so that's the number I'm advocating.

> Increase the number of dims for KNN vectors to 2048
> ---------------------------------------------------
>
>                 Key: LUCENE-10471
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10471
>             Project: Lucene - Core
>          Issue Type: Wish
>            Reporter: Mayya Sharipova
>            Priority: Trivial
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> The current maximum allowed number of dimensions is equal to 1024. But we see
> in practice a couple well-known models that produce vectors with > 1024
> dimensions (e.g
> [mobilenet_v2|https://tfhub.dev/google/imagenet/mobilenet_v2_035_224/feature_vector/1]
> uses 1280d vectors, OpenAI / GPT-3 Babbage uses 2048d vectors). Increasing
> max dims to `2048` will satisfy these use cases.
> I am wondering if anybody has strong objections against this.
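For illustration only, the guard rail discussed in the comment (a modest default, a caller-settable limit, and a hard ceiling that rejects values such as MAX_INT) might be sketched as below. The class, method, and constant names are hypothetical and are not part of Lucene's API. The 1024 default comes from the issue description, and the 16K ceiling (taken here as 16 * 1024) is the figure advocated in the comment.

```java
// Hypothetical sketch; none of these names exist in Lucene.
final class VectorDimensionLimit {
    /** Default limit, matching the current maximum of 1024 stated in the issue. */
    static final int DEFAULT_MAX_DIMS = 1024;

    /** Hard ceiling; assumes "16K" means 16 * 1024 dimensions. */
    static final int HARD_MAX_DIMS = 16 * 1024;

    private VectorDimensionLimit() {}

    /**
     * Validates a caller-supplied limit (for example, one read from a shared
     * configuration file) and returns it unchanged if it is acceptable.
     */
    static int checkConfiguredLimit(int requested) {
        if (requested < 1 || requested > HARD_MAX_DIMS) {
            throw new IllegalArgumentException(
                "max dimensions must be in [1, " + HARD_MAX_DIMS + "], got " + requested);
        }
        return requested;
    }

    /** Rejects a vector whose length exceeds the limit in effect. */
    static void checkVector(float[] vector, int limit) {
        if (vector.length > limit) {
            throw new IllegalArgumentException(
                "vector has " + vector.length + " dimensions, limit is " + limit);
        }
    }
}
```

Under this sketch, a configured limit of 2048 would be accepted, while Integer.MAX_VALUE would fail fast with an IllegalArgumentException before any buffer is allocated.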