ctlgcustodio commented on PR #874: URL: https://github.com/apache/lucene/pull/874#issuecomment-1131711152
> My concerns are on the JIRA issue, I don't want them to be forgotten. https://issues.apache.org/jira/browse/LUCENE-10471
>
> I don't know how we can say "we will not recommend further increase". What happens when the latest trendy dataset comes out with 4096 dimensions?
>
> I want to understand why so many dimensions are really needed for search purposes. What is the concrete benefit in terms of quality, because we know what the performance hit is going to be.

I understand that, in general, the more features an embedding vector has, the more detail the model captures, so you get a more refined result. However, while Lucene does not support more than 1024 dimensions, you can, if possible, reduce the dimensionality with a weighted average and evaluate the result. In my case a fixed average worked fine with the ELMo model, as described in Section 3, "Alternative Weighting Schemes", of https://arxiv.org/pdf/1904.02954.pdf.

[Another option](https://github.com/lior-k/fast-elasticsearch-vector-scoring): if I'm not mistaken, this repository is capable of supporting vectors larger than 1024 dimensions.
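For anyone who wants to try the same workaround, here is a minimal sketch of the fixed-average reduction, assuming the ELMo layer activations for a sentence are available as a NumPy array of shape `(num_layers, num_tokens, 1024)`; the function name and the mean-pooling over tokens are illustrative choices, not taken from the paper or this PR:

```python
# Minimal sketch (illustrative, not from this PR): collapse ELMo output to a
# single 1024-dim vector via a fixed (unweighted) average, so it fits within
# Lucene's current dimension limit.
import numpy as np

def fixed_average_embedding(elmo_layers: np.ndarray) -> np.ndarray:
    """Average over layers, then mean-pool over tokens -> (1024,) vector."""
    per_token = elmo_layers.mean(axis=0)   # (num_tokens, 1024): fixed layer average
    return per_token.mean(axis=0)          # (1024,): sentence-level embedding

# Example with dummy data: 3 layers, 7 tokens, 1024 dimensions.
sentence_vector = fixed_average_embedding(np.random.rand(3, 7, 1024))
assert sentence_vector.shape == (1024,)   # stays within the 1024-dim limit
```

The point of the fixed average is that you keep the 1024-dim width of a single ELMo layer instead of concatenating layers into a wider vector, which is what would push you past the limit.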