On Wed, May 27, 2020 at 7:37 PM Matthew Knepley <knep...@gmail.com> wrote:
> On Wed, May 27, 2020 at 7:34 PM Jed Brown <j...@jedbrown.org> wrote: > >> Mark Adams <mfad...@lbl.gov> writes: >> >> > Nvidias's NSight with 2D Q3 and bs=10. (attached). >> >> Thanks; this is basically the same as a CPU -- the cost is searching the >> sorted rows for the next entry. I've long thought we should optimize >> the implementations to fast-path when the next column index in the >> sparse matrix equals the next index in the provided block. It'd just >> take a good CPU test to demonstrate that payoff. >> > > So you first check whether the next index is the one in the set passed in, > and otherwise > fall back on the search? Good idea. > The existing code seems to have *a line (low=i+1) *that seems to be trying to exploit consecutive indices but it is not quite right, I don't think, This is the existing code fragment (this code has been cloned many times and there are several instances of this kernel). I've *added code* that I think might make this do the right thing. if (col <= lastcol) low = 0; else high = nrow; lastcol = col; while (high-low > 5) { t = (low+high)/2; *if (rp[low] == col) high = low+1;else * if (rp[t] > col) high = t; else low = t; } for (i=low; i<high; i++) { if (rp[i] > col) break; *// delete this check if you don't add new columns* if (rp[i] == col) { ap[i] += value; * low = i + 1;* goto noinsert; } } I'll experiment with this. Thanks, Mark > > Matt > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > <http://www.cse.buffalo.edu/~knepley/> >