On Wed, May 27, 2020 at 7:34 PM Jed Brown <j...@jedbrown.org> wrote:

> Mark Adams <mfad...@lbl.gov> writes:
>
> > Nvidia's Nsight with 2D Q3 and bs=10. (attached).
>
> Thanks; this is basically the same as a CPU -- the cost is searching the
> sorted rows for the next entry. I've long thought we should optimize
> the implementations to fast-path when the next column index in the
> sparse matrix equals the next index in the provided block. It'd just
> take a good CPU test to demonstrate that payoff.
>
So you first check whether the next index is the one in the set passed in,
and otherwise fall back on the search? Good idea.

   Matt

--
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/
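
For concreteness, here is a minimal sketch of the fast path discussed above. It is not taken from any existing implementation: the names add_block_to_row and find_col are hypothetical, and it assumes both the row's column indices and the block's column indices are sorted and strictly increasing, and that every block column is already in the row's nonzero pattern.

```c
/* Binary search for col in the sorted range idx[lo..hi).
   Returns its position, or -1 if col is not present. */
static int find_col(const int *idx, int lo, int hi, int col)
{
  const int end = hi;
  while (lo < hi) {
    int mid = lo + (hi - lo) / 2;
    if (idx[mid] < col) lo = mid + 1;
    else hi = mid;
  }
  return (lo < end && idx[lo] == col) ? lo : -1;
}

/* Add vals[k] to the entry of one sparse row at column cols[k].
   row_cols/row_vals hold the nrow stored entries of the row.
   Assumption: row_cols and cols are both strictly increasing.
   Returns the number of fast-path hits, or -1 if some column in
   cols is not in the row's nonzero pattern. */
int add_block_to_row(const int *row_cols, double *row_vals, int nrow,
                     const int *cols, const double *vals, int n)
{
  int pos = 0;  /* cursor: next candidate entry in the row */
  int hits = 0; /* how often the fast path avoided a search */
  for (int k = 0; k < n; k++) {
    if (pos < nrow && row_cols[pos] == cols[k]) {
      hits++; /* fast path: next row entry is the one we want */
    } else {
      /* slow path: search only the part of the row past the cursor */
      pos = find_col(row_cols, pos, nrow, cols[k]);
      if (pos < 0) return -1;
    }
    row_vals[pos] += vals[k];
    pos++;
  }
  return hits;
}
```

Returning the hit count is only there so that a CPU test can report how often the search is skipped; for a block whose columns are contiguous in the row, every entry after the first should take the fast path.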