[ https://issues.apache.org/jira/browse/LUCENE-10471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17534018#comment-17534018 ]
Julie Tibshirani commented on LUCENE-10471:
-------------------------------------------

I also don't have an objection to increasing it a bit. But along the same lines as Robert's point, it'd be good to think about our decision making process -- otherwise we'd be tempted to continuously increase it. I've already heard users requesting 12288 dims (to handle OpenAI DaVinci embeddings).

Two possible approaches I could see:

1. We do more research on the literature and decide on a reasonable max dimension. If a user wants to go beyond that, they should reconsider the model or perform dimensionality reduction. This would encourage users to think through their embedding strategy to optimize for performance. The improvements can be significant, since search time scales with vector dimensionality.
2. Or we take a flexible approach where we bump the limit to a high upper bound. This upper bound would be based on how much memory usage is reasonable for one vector (similar to the max term size?)

I feel a bit better about approach 2 because I'm not confident I could come up with a statement about a "reasonable max dimension", especially given the fast-moving research.

> Increase the number of dims for KNN vectors to 2048
> ---------------------------------------------------
>
>                 Key: LUCENE-10471
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10471
>             Project: Lucene - Core
>          Issue Type: Wish
>            Reporter: Mayya Sharipova
>            Priority: Trivial
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> The current maximum allowed number of dimensions is equal to 1024. But we see
> in practice a couple well-known models that produce vectors with > 1024
> dimensions (e.g
> [mobilenet_v2|https://tfhub.dev/google/imagenet/mobilenet_v2_035_224/feature_vector/1]
> uses 1280d vectors, OpenAI / GPT-3 Babbage uses 2048d vectors). Increasing
> max dims to `2048` will satisfy these use cases.
> I am wondering if anybody has strong objections against this.
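
To put approach 2 from the comment above in rough numbers, here is a minimal sketch of the per-vector memory arithmetic. It is only an illustration under stated assumptions: each dimension is taken to be a 4-byte float, and the byte budget borrows the max term length (32766 bytes, which I believe is the value of IndexWriter.MAX_TERM_LENGTH) purely as an example bound. The class and method names are hypothetical and not part of Lucene.

```java
// Sketch only: VectorMemorySketch and its methods are hypothetical, not Lucene API.
final class VectorMemorySketch {
    // Assumption: every dimension is stored as a 4-byte float.
    static final int BYTES_PER_DIM = Float.BYTES;

    // Assumption: reuse the max term length (believed to be 32766 bytes)
    // as an illustrative per-vector byte budget.
    static final int ASSUMED_BYTE_BUDGET = 32766;

    // Raw storage needed for one vector with the given number of dimensions.
    static long bytesPerVector(int dims) {
        return (long) dims * BYTES_PER_DIM;
    }

    // Largest dimension count whose raw storage fits in the given budget.
    static int maxDimsForBudget(int budgetBytes) {
        return budgetBytes / BYTES_PER_DIM;
    }

    public static void main(String[] args) {
        // Dimension counts mentioned in the issue and in the comment.
        int[] dimsMentioned = {1024, 1280, 2048, 12288};
        for (int dims : dimsMentioned) {
            System.out.printf("%5d dims -> %6d bytes per vector%n",
                    dims, bytesPerVector(dims));
        }
        System.out.println("max dims under the assumed budget: "
                + maxDimsForBudget(ASSUMED_BYTE_BUDGET));
    }
}
```

Under these assumptions a 2048d vector takes 8192 bytes and a 12288d vector takes 49152 bytes, so a budget tied to the max term size would admit the former but not the latter. A different budget or element width changes the bound accordingly.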
--
This message was sent by Atlassian Jira
(v8.20.7#820007)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org