xiangfu0 commented on issue #10919:
URL: https://github.com/apache/pinot/issues/10919#issuecomment-1593813274

   Here are some thoughts from my side.
   High-level principles:
   - CPU-based solution
   - kNN search has to be a distributed solution
   - The minimal search space is a single segment (10-100MM rows/points)
   - Pluggable index structure along with its search algorithm
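To make the distributed part concrete: if each segment is searched independently, the broker only needs every segment's local top-k, because any hit in the global top-k must also be in the top-k of its own segment. Below is a minimal Python sketch of that scatter-gather merge under stated assumptions. The names (`search_segment`, `merge_segment_results`) are hypothetical, the per-segment search is brute-force L2 standing in for whatever pluggable index a segment has, and nothing here reflects Pinot's actual APIs.

```python
import heapq
import math
from typing import List, Sequence, Tuple

# A hit is (distance, doc_id); tuples sort by distance first, then doc_id.
Hit = Tuple[float, int]
Segment = List[Tuple[int, Sequence[float]]]  # (doc_id, vector) pairs


def search_segment(segment: Segment, query: Sequence[float], k: int) -> List[Hit]:
    """Local top-k inside one segment (brute-force L2 stand-in for a real index)."""
    return heapq.nsmallest(k, ((math.dist(vec, query), doc_id) for doc_id, vec in segment))


def merge_segment_results(per_segment: List[List[Hit]], k: int) -> List[Hit]:
    """Broker-side merge: the global top-k is drawn from the union of local top-k lists."""
    return heapq.nsmallest(k, (hit for hits in per_segment for hit in hits))


if __name__ == "__main__":
    segments = [
        [(0, [0.0, 0.0]), (1, [5.0, 5.0])],
        [(2, [1.0, 0.0]), (3, [9.0, 9.0])],
    ]
    query = [0.0, 0.0]
    local = [search_segment(s, query, k=2) for s in segments]
    print(merge_segment_results(local, k=2))  # [(0.0, 0), (1.0, 2)]
```

Each segment ships at most k hits, so broker-side merge cost depends on k times the number of segments, not on segment size.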
   
   Since the number of documents in one segment is usually < 10MM, I think any of the current **billion-scale** approaches is sufficient for us.
   
   In terms of implementation, take SPTAG (https://github.com/microsoft/SPTAG) as an example; the paper is https://arxiv.org/pdf/2111.08566.pdf
   
   During the index build phase, we need to build an SPTAG index on a per-segment basis, using hierarchical balanced clustering to generate a set of regions (centroids).
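A rough sketch of how per-segment regions could be produced. This is a simplified stand-in, not SPTAG's clustering: it recursively splits the points with a balanced 2-means (every split is forced into two halves of equal size) until each region holds at most `max_leaf_size` points, then uses the leaf means as centroids. The `centroid_ratio` knob and all function names are hypothetical; capping leaves at `floor(1 / centroid_ratio)` points guarantees at least `centroid_ratio * n` regions for `0 < centroid_ratio <= 1`.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def mean(points: Sequence[Vector]) -> Vector:
    """Component-wise mean of a non-empty list of vectors."""
    dim = len(points[0])
    return [sum(p[i] for p in points) / len(points) for i in range(dim)]


def balanced_split(points: List[Vector], rng: random.Random, iters: int = 10):
    """Balanced 2-means: refine two centroids, but always cut the data into equal halves."""
    c0, c1 = rng.sample(points, 2)
    half = len(points) // 2
    left, right = points[:half], points[half:]
    for _ in range(iters):
        # Sort by how much closer each point is to c0 than to c1 and cut in the middle,
        # so the two halves differ in size by at most one point.
        ordered = sorted(points, key=lambda p: math.dist(p, c0) - math.dist(p, c1))
        left, right = ordered[:half], ordered[half:]
        c0, c1 = mean(left), mean(right)
    return left, right


def hierarchical_balanced_clustering(points: List[Vector], max_leaf_size: int,
                                     rng: random.Random) -> List[Vector]:
    """Split recursively until each region has <= max_leaf_size points; return leaf centroids."""
    if len(points) <= max_leaf_size:
        return [mean(points)]
    left, right = balanced_split(points, rng)
    return (hierarchical_balanced_clustering(left, max_leaf_size, rng)
            + hierarchical_balanced_clustering(right, max_leaf_size, rng))


def build_regions(points: List[Vector], centroid_ratio: float = 0.16, seed: int = 0) -> List[Vector]:
    """Hypothetical knob: a leaf cap of floor(1/ratio) yields >= ratio * n regions."""
    max_leaf_size = max(1, math.floor(1 / centroid_ratio))
    return hierarchical_balanced_clustering(points, max_leaf_size, random.Random(seed))
```

Binary splitting keeps the sketch short. The point of "balanced" is that equal-sized regions keep posting lists, and therefore per-region scan cost, roughly uniform.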
   We can configure the following two parameters:
   - Number of regions, i.e. the percentage of total points that become centroids. According to the paper, 16% is best for search performance and memory usage.
   - Replicas, i.e. the number of close clusters each vector is assigned to. A larger number means better recall, but search requires more resources and has longer latency. According to the paper, 8 gives the best balance between performance and latency. An RNG (relative neighborhood graph) rule is needed to avoid highly similar posting lists for close regions.
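The replica assignment with RNG-style pruning could look roughly like the sketch below. It assumes centroids already exist, walks them from nearest to farthest for each vector, and skips a centroid when an already-chosen centroid is closer to it than the vector is (the usual RNG condition), so nearby regions do not end up with near-identical posting lists. This is one common formulation of the rule; SPTAG's exact condition may differ, and the function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def assign_with_replicas(vec: Vector, centroids: List[Vector], replicas: int = 8) -> List[int]:
    """Pick up to `replicas` centroids for one vector, nearest first, with RNG pruning."""
    order = sorted(range(len(centroids)), key=lambda i: math.dist(vec, centroids[i]))
    chosen: List[int] = []
    for i in order:
        if len(chosen) == replicas:
            break
        d_vec = math.dist(vec, centroids[i])
        # RNG rule: keep centroid i only if the vector is no farther from it than any
        # already-chosen centroid is; otherwise region i is "covered" by a chosen one
        # and its posting list would largely duplicate that region's list.
        if all(d_vec <= math.dist(centroids[j], centroids[i]) for j in chosen):
            chosen.append(i)
    return chosen


def build_posting_lists(vectors: List[Vector], centroids: List[Vector],
                        replicas: int = 8) -> Dict[int, List[int]]:
    """Map each region id to the doc ids stored in its posting list."""
    postings: Dict[int, List[int]] = {i: [] for i in range(len(centroids))}
    for doc_id, vec in enumerate(vectors):
        for region in assign_with_replicas(vec, centroids, replicas):
            postings[region].append(doc_id)
    return postings
```

`replicas` is only an upper bound here: the RNG check can leave a vector in fewer regions when the candidates beyond the nearest one are redundant.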
   
   During the query phase, the kNN search functionality should be able to configure:
   - k (required): how many results to fetch.
   - t (optional): a percentage used to include more regions in the search, based on their distance relative to the closest centroid. This increases the recall rate while still keeping resource usage low.
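On the query side, one way to read `t` is as a relative distance bound: probe every region whose centroid is within `(1 + t/100)` times the distance to the closest centroid, then rank the union of those posting lists exactly and keep the top `k`. The sketch below assumes that interpretation, Euclidean distance, and in-memory posting lists; with `t = 0` it probes only the closest region(s). All names are hypothetical.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def select_regions(query: Vector, centroids: List[Vector], t: float = 0.0) -> List[int]:
    """Regions whose centroid is within (1 + t/100) x the closest-centroid distance."""
    dists = [math.dist(query, c) for c in centroids]
    bound = min(dists) * (1.0 + t / 100.0)
    return [i for i, d in enumerate(dists) if d <= bound]


def knn_search(query: Vector, k: int, centroids: List[Vector],
               postings: Dict[int, List[int]], vectors: List[Vector],
               t: float = 0.0) -> List[Tuple[float, int]]:
    """Probe the selected regions, dedupe replicated docs, rank exactly, keep top k."""
    candidates = set()
    for region in select_regions(query, centroids, t):
        candidates.update(postings[region])  # replicas: a doc may sit in several lists
    return heapq.nsmallest(k, ((math.dist(query, vectors[d]), d) for d in candidates))
```

Because the bound is relative to the closest centroid, a query that lands clearly inside one region probes few lists, while a query near region boundaries automatically fans out to more of them.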


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

