jasperjiaguo commented on issue #10919:
URL: https://github.com/apache/pinot/issues/10919#issuecomment-1593447601

   Recommendation systems and Large Language Model (LLM) applications often use 
high-dimensional vector spaces to represent complex data such as user profiles 
or linguistic patterns. Similarity-based vector indexing and search, a crucial 
element of these systems, identifies 'close' vectors in this space, where 
closeness signifies high similarity. Closeness is commonly measured by the 
cosine similarity or the Euclidean distance between vectors.
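
   As a rough illustration of the two measures mentioned above, here is a 
minimal pure-Python sketch. The function names are hypothetical and chosen only 
for this example; a real system would more likely use a vectorized library.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between a and b (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Assumes neither vector is all zeros; otherwise this divides by zero.
    return dot / (norm_a * norm_b)


def euclidean_distance(a, b):
    """Straight-line distance between a and b (0.0 means identical)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```

   Note the opposite orientation of the two measures: a higher cosine 
similarity means 'closer', while a lower Euclidean distance means 'closer'.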
   
   For instance, (1) in recommendation systems, items similar to a user's past 
interests are identified and suggested. (2) In LLM applications, instead of 
submitting a customer’s prompt directly to the model, the question is first 
routed to the vector database (which can be considered the memory of the LLM), 
which retrieves the top 10 or 15 most relevant documents for that query. Those 
supporting documents are then bundled with the user’s original question and 
submitted as the knowledge-context prompt to the LLM, which returns a more 
relevant answer. 
(https://mlops.community/combine-and-query-multiple-documents-with-llm/, 
https://gpt-index.readthedocs.io/en/latest/examples/vector_stores/MilvusIndexDemo.html)
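
   The retrieve-then-prompt flow in (2) could be sketched roughly as follows. 
This is a brute-force illustration, not how any particular vector database 
works: `retrieve_top_k`, `build_prompt`, and the prompt layout are all 
assumptions made for the example, and the embeddings are taken as given.

```python
import heapq
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve_top_k(query_vec, doc_vecs, k):
    """Return the ids of the k documents most similar to the query.

    doc_vecs maps a document id to its (precomputed) embedding vector.
    Every document is scored, so the cost grows linearly with the corpus.
    """
    scored = [(cosine_similarity(query_vec, vec), doc_id)
              for doc_id, vec in doc_vecs.items()]
    return [doc_id for _, doc_id in heapq.nlargest(k, scored)]


def build_prompt(question, documents):
    """Bundle retrieved document texts with the original question."""
    context = "\n".join(f"- {text}" for text in documents)
    return f"Context:\n{context}\n\nQuestion: {question}"
```

   The resulting string would then be sent to the LLM in place of the bare 
question.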
   
   However, given the potentially vast number of vectors, an exact search for 
the most similar ones can be computationally expensive. Therefore, Approximate 
Nearest Neighbor (ANN) search, as implemented in libraries such as FAISS, 
Annoy, or ScaNN, is employed to speed this up, trading a small amount of 
accuracy for much faster lookups in high-dimensional spaces.
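
   To give a feel for the approximate approach, below is a toy sketch of one 
classic ANN idea, random-hyperplane locality-sensitive hashing. It is not how 
FAISS, Annoy, or ScaNN are implemented (they use more sophisticated 
structures); the class name and parameters are hypothetical and serve only to 
show why a query can skip most of the data.

```python
import math
import random
from collections import defaultdict


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


class RandomHyperplaneIndex:
    """Toy LSH index: vectors on the same side of every plane share a bucket."""

    def __init__(self, dim, num_planes=8, seed=0):
        rng = random.Random(seed)
        self.planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                       for _ in range(num_planes)]
        self.buckets = defaultdict(list)

    def _signature(self, vec):
        # One boolean per plane: which side of the plane the vector lies on.
        return tuple(sum(p * x for p, x in zip(plane, vec)) >= 0
                     for plane in self.planes)

    def add(self, item_id, vec):
        self.buckets[self._signature(vec)].append((item_id, vec))

    def query(self, vec, k=1):
        # Only the query's own bucket is scanned, so true neighbors that
        # fell into a different bucket are missed; that is the "approximate".
        candidates = self.buckets.get(self._signature(vec), [])
        ranked = sorted(candidates,
                        key=lambda c: cosine_similarity(vec, c[1]),
                        reverse=True)
        return [item_id for item_id, _ in ranked[:k]]
```

   More planes give smaller buckets and faster queries but a higher chance of 
missing a true neighbor; practical systems typically combine several such 
tables or use other index structures to recover recall.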
   
   https://github.com/facebookresearch/faiss
   
   
https://www.datanami.com/2023/03/27/vector-databases-emerge-to-fill-critical-role-in-ai/
   
   https://github.com/linkedin/venice#read-compute


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

