nitirajrathore commented on issue #12627: URL: https://github.com/apache/lucene/issues/12627#issuecomment-1814530858
> meaning that we can get same recall for a smaller max-conn value now.

I ran some tests with max-conn = 16 and max-conn = 8, and it seems that with [my proposal](https://github.com/apache/lucene/pull/12783/commits), even max-conn = 8 is better than max-conn = 16 on mainline. I will also add more stats.

I diverged from the `main` branch at commit `f64bb19697708bfd91e05ff4314976c991f60cbc` (15 Oct) and haven't merged back since. I think this is `Lucene > 9.7.0`, but I will confirm.

---

`main` (commit `f64bb19697708bfd91e05ff4314976c991f60cbc`) with max-conn = 16

|recall |avgCpuTime |numDocs |fanout |maxConn |beamWidth |totalVisited |reindexTimeMsec |selectivity |prefilter|
|---|---|---|---|---|---|---|---|---|---|
|0.451 |17.22 |1000000 |0 |16 |100 |10 |406166 |1.00 |post-filter|

candidate with max-conn = 16 (percentages are relative to `main`)

|recall |avgCpuTime |numDocs |fanout |maxConn |beamWidth |totalVisited |reindexTimeMsec |selectivity |prefilter|
|---|---|---|---|---|---|---|---|---|---|
|0.595 (+32%) |24.19 (+40%) |1000000 |0 |16 |100 |10 |581090 (+43%) |1.00 |post-filter|

candidate with max-conn = 8 (percentages are relative to `main`)

|recall |avgCpuTime |numDocs |fanout |maxConn |beamWidth |totalVisited |reindexTimeMsec |selectivity |prefilter|
|---|---|---|---|---|---|---|---|---|---|
|0.465 (+3%) |16.35 (-5%) |1000000 |0 |8 |100 |10 |325321 (-20%) |1.00 |post-filter|

---

Interesting fact: a simple implementation that uses 2 for loops to find the common neighbours works better than using `HashSet<Integer>` or `IntIntHashMap()`. I think the major contributor to indexing time is the increased number of connections. As shown above, the candidate's indexing time drops sharply when max-conn is reduced (581090 ms at max-conn = 16 vs 325321 ms at max-conn = 8), while recall and search avgCpuTime are maintained or slightly improved relative to `main`.

I will update with more scripts, info/stats, and some code improvements next. I am also thinking I should do a 1-1 comparison of the 3 heuristics mentioned in the paper, measuring the level of disconnectedness in each approach.
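To make the remark about 2 for loops versus `HashSet<Integer>` concrete, here is a minimal sketch of the two approaches. It is not the code from the PR: the class `CommonNeighbours`, both method names, and the array-plus-size representation of a neighbour list are assumptions made for illustration, as is the premise that node ids are unique within a list. One plausible reason the loop version wins (not stated in the comment) is that neighbour lists are bounded by max-conn, so the quadratic scan touches few elements and avoids boxing and allocation.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper, not part of Lucene: two ways to count shared neighbours. */
final class CommonNeighbours {

  /** Nested-loop scan over the first aSize / bSize entries; quadratic but allocation-free. */
  static int countNestedLoops(int[] a, int aSize, int[] b, int bSize) {
    int common = 0;
    for (int i = 0; i < aSize; i++) {
      for (int j = 0; j < bSize; j++) {
        if (a[i] == b[j]) {
          common++;
          break; // assumes node ids are unique within a neighbour list
        }
      }
    }
    return common;
  }

  /** Set-based variant; linear expected time, but boxes every id into an Integer. */
  static int countWithHashSet(int[] a, int aSize, int[] b, int bSize) {
    Set<Integer> seen = new HashSet<>();
    for (int i = 0; i < aSize; i++) {
      seen.add(a[i]);
    }
    int common = 0;
    for (int j = 0; j < bSize; j++) {
      if (seen.contains(b[j])) {
        common++;
      }
    }
    return common;
  }
}
```

Both methods return the same count when ids are unique within each list, so either can be swapped in when benchmarking; which one is faster for a given max-conn is something to measure rather than assume.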
-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org