[GitHub] [lucene] jbellis opened a new pull request, #12248: use HashMap instead of TreeMap for OnHeapHnswGraph neighbors

via GitHub Thu, 27 Apr 2023 09:43:52 -0700


jbellis opened a new pull request, #12248:
URL: https://github.com/apache/lucene/pull/12248


   Switches OnHeapHnswGraph from representing a single node's neighbors as a 
TreeMap, to a HashMap, and updates its callers to no longer assume that 
NodeIterator results are ordered by ordinal.
   
   This means that looking up neighbors goes from an O(log N) operation to an 
O(1) operation for all upper levels.  Only callers that need ordered neighbors 
need to pay the cost of sorting.
   
   On my machine, building the graph and running the queries from the SIFT 
dataset at http://corpus-texmex.irisa.fr/ sees about a 4% speedup.  These are 
dimension 128 vectors; the relative importance of looking up neighbors vs 
computing similarities will vary inversely with dimensionality.
   
   (First I did three runs for each codebase, but there was enough variance 
that I then ran five more.)
   
   [TreeMap]
   Run 0 took 708.876145328 seconds
   Run 1 took 754.700354185 seconds
   Run 2 took 710.236167725 seconds
   Run 0 took 717.61476478 seconds
   Run 1 took 742.494343683 seconds
   Run 2 took 736.983143255 seconds
   Run 3 took 719.305541186 seconds
   Run 4 took 722.831859596 seconds
   
   [HashMap]
   Run 0 took 724.875610053 seconds
   Run 1 took 682.65666933 seconds
   Run 2 took 682.655977613 seconds
   Run 0 took 716.165341298 seconds
   Run 1 took 684.314657618 seconds
   Run 2 took 686.456432263 seconds
   Run 3 took 717.067567184 seconds
   Run 4 took 702.009706983 seconds
   
   The timing harness is here: 
https://github.com/jbellis/hnswdemo/tree/lucene-bench


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[GitHub] [lucene] jbellis opened a new pull request, #12248: use HashMap instead of TreeMap for OnHeapHnswGraph neighbors

Reply via email to