You're clearly doing almost all your allocation *not* using PetscMalloc (so not in a Vec or Mat). If you're managing your own mesh, you may be allocating a global amount of data on each rank instead of strictly using scalable (i.e., always partitioned) data structures.
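One quick way to see that split is to print, per rank, what has actually gone through PetscMalloc next to the total process size. A minimal sketch (illustrative only, not taken from the attached code; PetscCall() assumes a reasonably recent PETSc):

  /* Sketch: compare memory obtained through PetscMalloc with the total
     process size, per rank. Anything allocated outside PETSc (application
     arrays, third-party libraries) shows up only in the second number. */
  #include <petscsys.h>

  int main(int argc, char **argv)
  {
    PetscLogDouble malloced, process;
    PetscMPIInt    rank;

    PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));
    PetscCallMPI(MPI_Comm_rank(PETSC_COMM_WORLD, &rank));

    /* ... application setup: options, mesh, your own allocations ... */

    PetscCall(PetscMallocGetCurrentUsage(&malloced)); /* bytes currently allocated via PetscMalloc */
    PetscCall(PetscMemoryGetCurrentUsage(&process));  /* total process size, including libraries and non-PETSc heap */
    PetscCall(PetscSynchronizedPrintf(PETSC_COMM_WORLD,
              "[%d] PetscMalloc'ed %.0f bytes, process %.0f bytes\n", rank, malloced, process));
    PetscCall(PetscSynchronizedFlush(PETSC_COMM_WORLD, PETSC_STDOUT));

    PetscCall(PetscFinalize());
    return 0;
  }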
My favorite tool for understanding memory use is heaptrack: https://github.com/KDE/heaptrack

David Scott <d.sc...@epcc.ed.ac.uk> writes:

> OK.
>
> I had started to wonder if that was the case. I'll do some further
> investigation.
>
> Thanks,
>
> David
>
> On 22/11/2024 22:10, Matthew Knepley wrote:
>> On Fri, Nov 22, 2024 at 12:57 PM David Scott <d.sc...@epcc.ed.ac.uk> wrote:
>>
>> Matt,
>>
>> Thanks for the quick response.
>>
>> Yes, 1) is trivially true.
>>
>> With regard to 2), from the SLURM output:
>> [0] Maximum memory PetscMalloc()ed 29552 maximum size of entire process 4312375296
>> [1] Maximum memory PetscMalloc()ed 29552 maximum size of entire process 4311990272
>> Yes, only 29KB was malloced, but the total figure was 4GB per process.
>>
>> Looking at
>> mem0 = 16420864.000000000
>> mem0 = 16117760.000000000
>> mem1 = 4311490560.0000000
>> mem1 = 4311826432.0000000
>> mem2 = 4311490560.0000000
>> mem2 = 4311826432.0000000
>> mem0 is written after PetscInitialize.
>> mem1 is written roughly halfway through the options being read.
>> mem2 is written on completion of the options being read.
>>
>> The code does very little other than read configuration options.
>> Why is so much memory used?
>>
>> This is not due to options processing, as that would fall under PETSc
>> malloc allocations. I believe we are measuring this using RSS, which
>> includes the binary, all shared libraries which are paged in, and
>> stack/heap allocations. I think you are seeing the shared libraries
>> come in. You might be able to see all the libraries that come in using
>> strace.
>>
>> Thanks,
>>
>>   Matt
>>
>> I do not understand what is going on, and I may have expressed myself
>> badly, but I do have a problem, as I certainly cannot use anywhere near
>> 128 processes on a node with 128 GB of RAM before I get an OOM error.
>> (The code runs successfully on 32 processes but not 64.)
>>
>> Regards,
>>
>> David
>>
>> On 22/11/2024 16:53, Matthew Knepley wrote:
>>> On Fri, Nov 22, 2024 at 11:36 AM David Scott <d.sc...@epcc.ed.ac.uk> wrote:
>>>
>>> Hello,
>>>
>>> I am using the options mechanism of PETSc to configure my CFD code. I
>>> have introduced options describing the size of the domain etc. I have
>>> noticed that this consumes a lot of memory. I have found that the
>>> amount of memory used scales linearly with the number of MPI processes
>>> used. This restricts the number of MPI processes that I can use.
>>>
>>> There are two statements:
>>>
>>> 1) The memory scales linearly with P
>>>
>>> 2) This uses a lot of memory
>>>
>>> Let's deal with 1) first. This seems to be trivially true. If I want
>>> every process to have access to a given option value, that option
>>> value must be in the memory of every process. The only alternative
>>> would be to communicate with some process in order to get values. Few
>>> codes seem to be willing to make this tradeoff, and we do not offer it.
>>>
>>> Now 2). Looking at the source, for each option we store a
>>> PetscOptionItem, which I count as having size 37 bytes (12
>>> pointers/ints and a char). However, there is data behind every
>>> pointer, like the name, help text, and available values (sometimes);
>>> I could see it being as large as 4K. Suppose it is. If I had 256
>>> options, that would be 1M. Is this a large amount of memory?
>>>
>>> The way I read the SLURM output, 29K was malloced. Is this a large
>>> amount of memory?
>>>
>>> I am trying to get an idea of the scale.
>>>
>>> Thanks,
>>>
>>>   Matt
>>>
>>> Is there anything that I can do about this, or do I need to configure
>>> my code in a different way?
>>>
>>> I have attached some code extracted from my application which
>>> demonstrates this, along with the output from running it on 2 MPI
>>> processes.
>>>
>>> Best wishes,
>>>
>>> David Scott
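For the mem0/mem1/mem2 numbers quoted above, the test code is attached to the thread rather than shown here, so the following is only a sketch of that style of instrumentation, assuming those numbers came from PetscMemoryGetCurrentUsage(); the option name -domain_lx is a made-up stand-in for the real domain options. Recording PetscMallocGetCurrentUsage() at the same points shows whether a jump in process size is matched by PETSc heap growth (option storage) or not (shared libraries being paged in, as Matt suggests). Assumes a reasonably recent PETSc (PetscCall, PETSC_SUCCESS).

  /* Sketch of mem0/mem1/mem2-style instrumentation around options reading,
     reporting both the process size and the PetscMalloc'ed total. */
  #include <petscsys.h>

  static PetscErrorCode ReportMemory(const char stage[])
  {
    PetscLogDouble process, malloced;

    PetscFunctionBeginUser;
    PetscCall(PetscMemoryGetCurrentUsage(&process));  /* total process size */
    PetscCall(PetscMallocGetCurrentUsage(&malloced)); /* bytes via PetscMalloc */
    PetscCall(PetscSynchronizedPrintf(PETSC_COMM_WORLD,
              "%s: process %.0f bytes, PetscMalloc'ed %.0f bytes\n", stage, process, malloced));
    PetscCall(PetscSynchronizedFlush(PETSC_COMM_WORLD, PETSC_STDOUT));
    PetscFunctionReturn(PETSC_SUCCESS);
  }

  int main(int argc, char **argv)
  {
    PetscReal Lx = 1.0; /* hypothetical domain-size option, stands in for the real ones */

    PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));
    PetscCall(ReportMemory("mem0 (after PetscInitialize)"));

    PetscCall(PetscOptionsGetReal(NULL, NULL, "-domain_lx", &Lx, NULL));
    /* ... first half of the option queries ... */
    PetscCall(ReportMemory("mem1 (part way through reading options)"));

    /* ... remaining option queries ... */
    PetscCall(ReportMemory("mem2 (after reading options)"));

    PetscCall(PetscFinalize());
    return 0;
  }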