veqtor commented on PR #874:
URL: https://github.com/apache/lucene/pull/874#issuecomment-1517620551

   > willing to take actions that go against science because vendors have told 
them it is right
   
   If, as you say, an entire document, regardless of its length, content and 
so on, can be represented by a vector of 768 floats, why is it that GPT-4, 
which internally represents each token with a vector of more than 8192 floats, still 
inaccurately recalls information about entities?
   
   Do you see the flaw in your reasoning here?
   
   If the real issue is the use of HNSW, which isn't suitable for this, 
rather than high-dimensionality embeddings having no value, then the solution isn't to 
withhold the feature, but to switch to a technology better suited to the type 
of applications people use Lucene for: search over large 
amounts of data.
   
   If you need this functionality, you have no reason to use anything other 
than FAISS.
   HNSW works OK, but only for around 500 or so embeddings at most; beyond that it 
becomes too slow.
   With FAISS you can hierarchically partition the vector space, and all 
calculations are done efficiently.
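   For reference, a minimal sketch of that partitioning idea using the Python 
`faiss` package and an IVF (inverted-file) index; the data, dimensionality and 
parameters here are made up for illustration, not taken from this PR:

   ```python
   # Sketch: IVF-style partitioning in FAISS. The vector space is split into
   # `nlist` coarse cells via k-means; each query only scans the `nprobe`
   # nearest cells instead of comparing against every indexed vector.
   import numpy as np
   import faiss  # pip install faiss-cpu

   d = 768        # embedding dimensionality (illustrative)
   nlist = 1024   # number of coarse partitions

   xb = np.random.random((100_000, d)).astype("float32")  # vectors to index
   xq = np.random.random((10, d)).astype("float32")       # query vectors

   quantizer = faiss.IndexFlatL2(d)  # assigns vectors to their nearest cell
   index = faiss.IndexIVFFlat(quantizer, d, nlist, faiss.METRIC_L2)

   index.train(xb)   # learn the partition centroids
   index.add(xb)     # place each vector into its cell

   index.nprobe = 16                     # cells visited per query (speed/recall trade-off)
   distances, ids = index.search(xq, 10) # top-10 neighbours for each query
   ```

   Raising `nprobe` trades speed for recall, which is the kind of tuning knob 
an HNSW graph doesn't give you in the same form.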
   
   If bringing in FAISS is too drastic, then its implementation should be 
studied and integrated instead.
   
   Fast, efficient vector functionality is a must; if Lucene doesn't support 
this, then it, and anything that builds on it, is doomed.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
