Mark Adams <mfad...@lbl.gov> writes:

> Nvidia's Nsight with 2D Q3 and bs=10. (attached).

Thanks; this is basically the same as on a CPU -- the cost is in
searching the sorted rows for the next entry.  I've long thought we
should optimize the implementations with a fast path for the case where
the next column index in the sparse matrix equals the next index in the
provided block.  It'd just take a good CPU test to demonstrate the
payoff.
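
To make the idea concrete, here is a rough sketch of what such a fast
path could look like for a single row.  It is not the PETSc
implementation; add_row_values and all of its parameters are
hypothetical names, and it assumes the row's column indices are sorted
ascending and that entries absent from the row are simply skipped.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not a PETSc API.  Adds vals[0..n) into one sparse
   row whose column indices rowcols[0..nrow) are sorted ascending.
   Returns the number of entries placed; *nfast counts how many of them
   were found by the fast path rather than by binary search. */
static int add_row_values(const int *rowcols, double *rowvals, int nrow,
                          const int *cols, const double *vals, int n,
                          int *nfast)
{
  int cursor = 0; /* slot just after the previous match */
  int placed = 0;

  assert(nrow >= 0 && n >= 0 && nfast != NULL);
  *nfast = 0;
  for (int k = 0; k < n; k++) {
    int pos = -1;

    if (cursor < nrow && rowcols[cursor] == cols[k]) {
      /* Fast path: the next stored column is the next requested one. */
      pos = cursor;
      (*nfast)++;
    } else {
      /* Fallback: binary search over the whole sorted row. */
      int lo = 0, hi = nrow;
      while (lo < hi) {
        int mid = lo + (hi - lo) / 2;
        if (rowcols[mid] < cols[k]) lo = mid + 1;
        else hi = mid;
      }
      if (lo < nrow && rowcols[lo] == cols[k]) pos = lo;
    }
    if (pos < 0) continue; /* column not present in this row: skip */
    rowvals[pos] += vals[k];
    cursor = pos + 1;
    placed++;
  }
  return placed;
}
```

With a contiguous block of columns, only the first entry of each row
would need the search; the rest would hit the fast path.  Comparing
*nfast against the number of entries placed in a CPU test would be one
way to see how often that happens in practice.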
