jasperjiaguo commented on issue #10919: URL: https://github.com/apache/pinot/issues/10919#issuecomment-1593447601
Recommendation systems and Large Language Model (LLM) applications often represent complex data, such as user profiles or linguistic patterns, as vectors in a high-dimensional space. Similarity-based vector indexing and search, a crucial element of these systems, finds vectors that are 'close' in this space, meaning highly similar. Closeness is typically measured by the cosine similarity or the Euclidean distance between vectors. For example:

(1) In recommendation systems, items similar to a user's past interests are identified and suggested.
(2) In LLM applications, the customer's prompt is not submitted directly to the model. Instead, the question is first routed to the vector database (which can be thought of as the LLM's memory), and the database retrieves the top 10 or 15 most relevant documents for that query. These supporting documents are then bundled with the user's original question and submitted together to the LLM as a knowledge-context prompt, so the LLM can return a more relevant answer. (https://mlops.community/combine-and-query-multiple-documents-with-llm/, https://gpt-index.readthedocs.io/en/latest/examples/vector_stores/MilvusIndexDemo.html)

However, the number of vectors can be vast, so finding the most similar ones exactly can be computationally expensive. Approximate Nearest Neighbor (ANN) libraries such as FAISS, Annoy, or ScaNN are therefore used to speed this up. They quickly find approximate nearest vectors in high-dimensional spaces.

https://github.com/facebookresearch/faiss
https://www.datanami.com/2023/03/27/vector-databases-emerge-to-fill-critical-role-in-ai/
https://github.com/linkedin/venice#read-compute
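To make the cost argument concrete, here is a minimal sketch of exact (brute-force) top-k search using the two measures mentioned above. It assumes vectors are plain Python lists held in an in-memory dict. The names `top_k`, `corpus`, and the `"cosine"`/`"euclidean"` metric strings are hypothetical illustrations, not a Pinot, FAISS, or Milvus API. Every query scores all N stored vectors, which costs O(N * d) work for dimension d. This linear scan is the step ANN libraries are designed to avoid.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def _check_dims(a: Vector, b: Vector) -> None:
    # zip() would silently truncate mismatched vectors, so fail loudly instead.
    if len(a) != len(b):
        raise ValueError(f"dimension mismatch: {len(a)} != {len(b)}")


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between a and b: 1.0 = same direction, -1.0 = opposite."""
    _check_dims(a, b)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention for this sketch: a zero vector matches nothing
    return dot / (norm_a * norm_b)


def euclidean_distance(a: Vector, b: Vector) -> float:
    """Straight-line distance between a and b: 0.0 = identical."""
    _check_dims(a, b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k(query: Vector, corpus: Dict[str, Vector], k: int,
          metric: str = "cosine") -> List[Tuple[str, float]]:
    """Hypothetical exact k-nearest-neighbor search; scores every vector (O(N * d)).

    Returns (doc_id, score) pairs, closest first.
    """
    if metric == "cosine":
        scored = [(cosine_similarity(query, v), doc_id) for doc_id, v in corpus.items()]
        best = heapq.nlargest(k, scored)   # higher similarity = closer
    elif metric == "euclidean":
        scored = [(euclidean_distance(query, v), doc_id) for doc_id, v in corpus.items()]
        best = heapq.nsmallest(k, scored)  # smaller distance = closer
    else:
        raise ValueError(f"unsupported metric: {metric!r}")
    return [(doc_id, score) for score, doc_id in best]


# Usage: a toy 2-dimensional corpus (made-up vectors for illustration only).
corpus = {"doc-a": [1.0, 0.0], "doc-b": [0.7, 0.7], "doc-c": [0.0, 1.0], "doc-d": [-1.0, 0.0]}
print(top_k([1.0, 0.1], corpus, k=2))  # closest first: doc-a, then doc-b
```

An ANN index replaces this full scan with a structure, for example graph-based or cluster-based, that inspects only a fraction of the vectors. In exchange, it may occasionally miss a true nearest neighbor.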