jbellis opened a new pull request, #12248: URL: https://github.com/apache/lucene/pull/12248
Switches OnHeapHnswGraph from representing a single node's neighbors as a TreeMap, to a HashMap, and updates its callers to no longer assume that NodeIterator results are ordered by ordinal. This means that looking up neighbors goes from an O(log N) operation to an O(1) operation for all upper levels. Only callers that need ordered neighbors need to pay the cost of sorting. On my machine, building the graph and running the queries from the SIFT dataset at http://corpus-texmex.irisa.fr/ sees about a 4% speedup. These are dimension 128 vectors; the relative importance of looking up neighbors vs computing similarities will vary inversely with dimensionality. (First I did three runs for each codebase, but there was enough variance that I then ran five more.) [TreeMap] Run 0 took 708.876145328 seconds Run 1 took 754.700354185 seconds Run 2 took 710.236167725 seconds Run 0 took 717.61476478 seconds Run 1 took 742.494343683 seconds Run 2 took 736.983143255 seconds Run 3 took 719.305541186 seconds Run 4 took 722.831859596 seconds [HashMap] Run 0 took 724.875610053 seconds Run 1 took 682.65666933 seconds Run 2 took 682.655977613 seconds Run 0 took 716.165341298 seconds Run 1 took 684.314657618 seconds Run 2 took 686.456432263 seconds Run 3 took 717.067567184 seconds Run 4 took 702.009706983 seconds The timing harness is here: https://github.com/jbellis/hnswdemo/tree/lucene-bench -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org