Karl Lin <karl.lin...@gmail.com> writes:

> Thanks for the reply. However, if the matrix is huge, like 13.5TB in our
> case, it will take significant amount of time to loop over insertion twice.
> Any other time and resource saving options? Thank you very much.

Where do the matrix entries come from? Counting nonzeros should run at near STREAM bandwidth, which is 200-300 GB/s for modern 2-socket compute nodes. How many nodes do you need for the memory capacity? On 100 nodes, that preallocation counting pass should take less than a second.
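
For reference, the arithmetic behind that estimate can be sketched in a few lines of Python. This is only a back-of-the-envelope check: it assumes the counting pass streams roughly as many bytes as the matrix occupies (13.5 TB, decimal units), that the work is perfectly balanced across nodes, and that each node sustains the quoted STREAM rate. The function name counting_pass_seconds is made up for illustration.

```python
def counting_pass_seconds(matrix_bytes, nodes, stream_bytes_per_s):
    """Time to stream the data once, split evenly across nodes (idealized)."""
    return matrix_bytes / (nodes * stream_bytes_per_s)


TB = 1e12  # decimal terabyte (assumption)
GB = 1e9   # decimal gigabyte (assumption)

# 13.5 TB spread over 100 nodes at the slow and fast ends of the range.
slow = counting_pass_seconds(13.5 * TB, 100, 200 * GB)
fast = counting_pass_seconds(13.5 * TB, 100, 300 * GB)

print(f"counting pass: {fast:.3f} s to {slow:.3f} s")
```

Under those assumptions both ends of the range come out below one second. If the entries are expensive to generate rather than read from memory, the pass is bound by that cost instead, so this bound would not apply.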