ctlgcustodio commented on PR #874: URL: https://github.com/apache/lucene/pull/874#issuecomment-1131711152
> My concerns are on the JIRA issue, I don't want them to be forgotten. https://issues.apache.org/jira/browse/LUCENE-10471
>
> I don't know how we can say "we will not recommend further increase". What happens when the latest trendy dataset comes out with 4096 dimensions?
>
> I want to understand why so many dimensions are really needed for search purposes. What is the concrete benefit in terms of quality, because we know what the performance hit is going to be.

I understand that, in general, the more features an embedding vector has, the more detail the model captures, so you get a more refined result. However, while Lucene does not support more than 1024 dimensions, you can, if possible, reduce the dimensionality with a weighted average and evaluate the result. In my case a fixed average worked fine with the ELMo model, as described in Section 3, "Alternative Weighting Schemes", of https://arxiv.org/pdf/1904.02954.pdf.

[Another option](https://github.com/lior-k/fast-elasticsearch-vector-scoring): if I'm not mistaken, this repository is capable of supporting vectors larger than 1024 dimensions.
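For anyone who wants to try the same workaround, here is a minimal sketch of the fixed-average reduction, assuming the ELMo layer activations for a sentence are available as a NumPy array of shape `(num_layers, num_tokens, 1024)`; the function name and the mean-pooling over tokens are illustrative choices, not taken from the paper or this PR:

```python
# Minimal sketch (illustrative, not from this PR): collapse ELMo output to a
# single 1024-dim vector via a fixed (unweighted) average, so it fits within
# Lucene's current dimension limit.
import numpy as np

def fixed_average_embedding(elmo_layers: np.ndarray) -> np.ndarray:
    """Average over layers, then mean-pool over tokens -> (1024,) vector."""
    per_token = elmo_layers.mean(axis=0)   # (num_tokens, 1024): fixed layer average
    return per_token.mean(axis=0)          # (1024,): sentence-level embedding

# Example with dummy data: 3 layers, 7 tokens, 1024 dimensions.
sentence_vector = fixed_average_embedding(np.random.rand(3, 7, 1024))
assert sentence_vector.shape == (1024,)   # stays within the 1024-dim limit
```

The point of the fixed average is that you keep the 1024-dim width of a single ELMo layer instead of concatenating layers into a wider vector, which is what would push you past the limit.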