Yes, I do set it outside of R, in the shell:

R_MAX_VSIZE=100Gb SRC_DATANAME=G1_1e9_2e0_0_0 /usr/bin/time -v Rscript datatable/groupby-datatable.R
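For comparison, a limit enforced by the shell rather than by R could be sketched as below. This is only an illustration under assumptions: run_capped is a hypothetical helper name, the shell's ulimit builtin is assumed to support -v (virtual memory in kB, as in bash and dash on Linux), and the 100 GB figure simply mirrors the R_MAX_VSIZE value above. Such a limit applies to the whole address space of the process, so it also covers memory obtained through malloc, but a failure under it may not surface as a clean R error.

```shell
# Hypothetical helper: run a command with its virtual address space capped.
# Usage: run_capped LIMIT_KB COMMAND [ARGS...]
run_capped() {
    limit_kb=$1
    shift
    # The subshell keeps the limit from leaking into the calling shell.
    ( ulimit -v "$limit_kb" && "$@" )
}

# Example with the command from above (100 GB = 100 * 1024 * 1024 kB):
# run_capped 104857600 env SRC_DATANAME=G1_1e9_2e0_0_0 \
#     /usr/bin/time -v Rscript datatable/groupby-datatable.R
```

The limit is set inside a subshell on purpose, so the interactive shell and any later jobs keep their original limits.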
I think it might be related to allocations made with malloc rather than
R_alloc. Allocations made with malloc are probably not capped by this
environment variable. If so, I will have to limit memory at the OS/shell
level, as you mentioned before.

Best

On Tue, Dec 1, 2020 at 6:54 PM <luke-tier...@uiowa.edu> wrote:
>
> The fact that your max resident size isn't affected looks odd. Are
> you setting the environment variable outside R? When I run
>
> env R_MAX_VSIZE=16Gb /usr/bin/time bin/Rscript jg.R 1e9 2e0 0 0
>
> (your code in jg.R). I get a quick failure with 11785524maxresident)k
>
> Best,
>
> luke
>
> On Tue, 1 Dec 2020, Jan Gorecki wrote:
>
> > Thank you Luke,
> >
> > I tried your suggestion about R_MAX_VSIZE but I am not able to get the
> > error you are getting.
> > I tried recent R devel as I have seen you made a change to GC there.
> > My machine is 128GB, free -h reports 125GB available. I tried to set
> > 128, 125 and 100. In all cases the result is "Command terminated by
> > signal 9". Each took around 6-6.5h.
> > Details below, if it tells you anything how could I optimize it (or
> > raise an exception early) please do let me know.
> >
> > R 4.0.3
> >
> > unset R_MAX_VSIZE
> > User time (seconds): 40447.92
> > System time (seconds): 4034.37
> > Percent of CPU this job got: 201%
> > Elapsed (wall clock) time (h:mm:ss or m:ss): 6:07:59
> > Maximum resident set size (kbytes): 127261184
> > Major (requiring I/O) page faults: 72441
> > Minor (reclaiming a frame) page faults: 3315491751
> > Voluntary context switches: 381446
> > Involuntary context switches: 529554
> > File system inputs: 108339200
> > File system outputs: 120
> >
> > R-devel 2020-11-27 r79522
> >
> > unset R_MAX_VSIZE
> > User time (seconds): 40713.52
> > System time (seconds): 4039.52
> > Percent of CPU this job got: 198%
> > Elapsed (wall clock) time (h:mm:ss or m:ss): 6:15:52
> > Maximum resident set size (kbytes): 127254796
> > Major (requiring I/O) page faults: 72810
> > Minor (reclaiming a frame) page faults: 3433589848
> > Voluntary context switches: 384363
> > Involuntary context switches: 609024
> > File system inputs: 108467064
> > File system outputs: 112
> >
> > R_MAX_VSIZE=128Gb
> > User time (seconds): 40411.13
> > System time (seconds): 4227.99
> > Percent of CPU this job got: 198%
> > Elapsed (wall clock) time (h:mm:ss or m:ss): 6:14:01
> > Maximum resident set size (kbytes): 127249316
> > Major (requiring I/O) page faults: 88500
> > Minor (reclaiming a frame) page faults: 3544520527
> > Voluntary context switches: 384117
> > Involuntary context switches: 545397
> > File system inputs: 111675896
> > File system outputs: 120
> >
> > R_MAX_VSIZE=125Gb
> > User time (seconds): 40246.83
> > System time (seconds): 4042.76
> > Percent of CPU this job got: 201%
> > Elapsed (wall clock) time (h:mm:ss or m:ss): 6:06:56
> > Maximum resident set size (kbytes): 127254200
> > Major (requiring I/O) page faults: 63867
> > Minor (reclaiming a frame) page faults: 3449493803
> > Voluntary context switches: 370753
> > Involuntary context switches: 614607
> > File system inputs: 106322880
> > File system outputs: 112
> >
> > R_MAX_VSIZE=100Gb
> > User time (seconds): 41837.10
> > System time (seconds): 3979.57
> > Percent of CPU this job got: 192%
> > Elapsed (wall clock) time (h:mm:ss or m:ss): 6:36:34
> > Maximum resident set size (kbytes): 127256940
> > Major (requiring I/O) page faults: 66829
> > Minor (reclaiming a frame) page faults: 3357778594
> > Voluntary context switches: 391149
> > Involuntary context switches: 646410
> > File system inputs: 106605648
> > File system outputs: 120
> >
> > On Fri, Nov 27, 2020 at 10:18 PM <luke-tier...@uiowa.edu> wrote:
> >>
> >> On Thu, 26 Nov 2020, Jan Gorecki wrote:
> >>
> >>> Thank you Luke for looking into it. Your knowledge of gc is definitely
> >>> helpful here. I put comments inline below.
> >>>
> >>> Best,
> >>> Jan
> >>>
> >>> On Wed, Nov 25, 2020 at 10:38 PM <luke-tier...@uiowa.edu> wrote:
> >>>>
> >>>> On Tue, 24 Nov 2020, Jan Gorecki wrote:
> >>>>
> >>>>> As for other calls to system. I avoid calling system. In the past I
> >>>>> had some (to get memory stats from OS), but they were failing with
> >>>>> exactly the same issue. So yes, if I would add call to system before
> >>>>> calling quit, I believe it would fail with the same error.
> >>>>> At the same time I think (although I am not sure) that new allocations
> >>>>> made in R are working fine. So R seems to reserve some memory and can
> >>>>> continue to operate, while external call like system will fail. Maybe
> >>>>> it is like this by design, don't know.
> >>>>
> >>>> Thanks for the report on quit(). We're exploring how to make the
> >>>> cleanup on exit more robust to low memory situations like these.
> >>>>
> >>>>>
> >>>>> Aside from this problem that is easy to report due to the warning
> >>>>> message, I think that gc() is choking at the same time. I tried to
> >>>>> make reproducible example for that, multiple times but couldn't, let
> >>>>> me try one more time.
> >>>>> It happens to manifest when there is 4e8+ unique characters/factors in
> >>>>> an R session. I am able to reproduce it using data.table and dplyr
> >>>>> (0.8.4 because 1.0.0+ fails even sooner), but using base R is not easy
> >>>>> because of the size. I described briefly problem in:
> >>>>> https://github.com/h2oai/db-benchmark/issues/110
> >>>>
> >>>> Because of the design of R's character vectors, with each element
> >>>> allocated separately, R is never going to be great at handling huge
> >>>> numbers of distinct strings. But it can do an adequate job given
> >>>> enough memory to work with.
> >>>>
> >>>> When I run your GitHub issue example on a machine with around 500 Gb
> >>>> of RAM it seems to run OK; /usr/bin/time reports
> >>>>
> >>>> 2706.89user 161.89system 37:10.65elapsed 128%CPU (0avgtext+0avgdata 92180796maxresident)k
> >>>> 0inputs+103450552outputs (0major+38716351minor)pagefaults 0swaps
> >>>>
> >>>> So the memory footprint is quite large. Using gc.time() it looks like
> >>>> about 1/3 of the time is in GC. Not ideal, and maybe could be improved
> >>>> on a bit, but probably not by much. The GC is basically doing an
> >>>> adequate job, given enough RAM.
> >>>
> >>> Agree, 1/3 is a lot but still acceptable. So this strictly is not
> >>> something that requires intervention.
> >>> PS. I wasn't aware of gc.time(), it may be worth linking it from
> >>> SeeAlso in gc() manual.
> >>>
> >>>>
> >>>> If you run this example on a system without enough RAM, or with other
> >>>> programs competing for RAM, you are likely to end up fighting with
> >>>> your OS/hardware's virtual memory system. When I try to run it on a
> >>>> 16Gb system it churns for an hour or so before getting killed, and
> >>>> /usr/bin/time reports a huge number of page faults:
> >>>>
> >>>> 312523816inputs+0outputs (24761285major+25762068minor)pagefaults 0swaps
> >>>>
> >>>> You are probably experiencing something similar.
> >>>
> >>> Yes, this is exactly what I am experiencing.
> >>> The machine is a bare metal machine of 128GB mem, csv size 50GB,
> >>> data.frame size 74GB.
> >>> In my case it churns for ~3h before it gets killed with SIGINT from
> >>> the parent R process which uses 3h as a timeout for this script.
> >>> This is something I would like to be addressed because gc time is far
> >>> bigger than actual computation time. This is not really acceptable, I
> >>> would prefer to raise an exception instead.
> >>>
> >>>>
> >>>> There may be opportunities for more tuning of the GC to better handle
> >>>> running this close to memory limits, but I doubt the payoff would be
> >>>> worth the effort.
> >>>
> >>> If you don't have plans/time to work on that anytime soon, then I can
> >>> fill bugzilla for this problem so it won't get lost in the mailing
> >>> list.
> >>
> >> I'm not convinced anything useful can be done that would work well for
> >> your application without working badly for others.
> >>
> >> If you want to drive this close to your memory limits you are probably
> >> going to have to take responsibility for some tuning at your end. One
> >> option in ?Memory you might try is the R_MAX_VSIZE environment
> >> variable. On my 16Gb machine with R_MAX_VSIZE=16Gb your example fails
> >> very quickly with
> >>
> >> Error: vector memory exhausted (limit reached?)
> >>
> >> rather than churning for an hour trying to make things work. Setting
> >> memory and/or virtual memory limits in your shell is another option.
> >>
> >> Best,
> >>
> >> luke
> >>
> >>>
> >>>
> >>>>
> >>>> Best,
> >>>>
> >>>> luke
> >>>>
> >>>>> It would help if gcinfo() could take FALSE/TRUE/2L where 2L will print
> >>>>> even more information about gc, like how much time the each gc()
> >>>>> process took, how many objects it has to check on each level.
> >>>>>
> >>>>> Best regards,
> >>>>> Jan
> >>>>>
> >>>>>
> >>>>>
> >>>>> On Tue, Nov 24, 2020 at 1:05 PM Tomas Kalibera
> >>>>> <tomas.kalib...@gmail.com> wrote:
> >>>>>>
> >>>>>> On 11/24/20 11:27 AM, Jan Gorecki wrote:
> >>>>>>> Thanks Bill for checking that.
> >>>>>>> It was my impression that warnings are raised from some internal
> >>>>>>> system calls made when quitting R. At that point I don't have much
> >>>>>>> control over checking the return status of those.
> >>>>>>> Your suggestion looks good to me.
> >>>>>>>
> >>>>>>> Tomas, do you think this could help? could this be implemented?
> >>>>>>
> >>>>>> I think this is a good suggestion. Deleting files on Unix was changed
> >>>>>> from system("rm") to doing that in C, and deleting the session
> >>>>>> directory should follow.
> >>>>>>
> >>>>>> It might also help diagnosing your problem, but I don't think it would
> >>>>>> solve it. If the diagnostics in R works fine and the OS was so
> >>>>>> hopelessly out of memory that it couldn't run any more external
> >>>>>> processes, then really this is not a problem of R, but of having
> >>>>>> exhausted the resources. And it would be a coincidence that just this
> >>>>>> particular call to "system" at the end of the session did not work.
> >>>>>> Anything else could break as well close to the end of the script. This
> >>>>>> seems the most likely explanation to me.
> >>>>>>
> >>>>>> Do you get this warning repeatedly, reproducibly at least in slightly
> >>>>>> different scripts at the very end, with this warning always from
> >>>>>> quit()? So that the "call" part of the warning message has
> >>>>>> .Internal(quit) like in the case you posted? Would adding another
> >>>>>> call to "system" before the call to "q()" work - with checking the
> >>>>>> return value? If it is always only the last call to "system" in
> >>>>>> "q()", then it is suspicious, perhaps an indication that some
> >>>>>> diagnostics in R is not correct. In that case, a reproducible
> >>>>>> example would be the key - so either if you could diagnose on your
> >>>>>> end what is the problem, or create a reproducible example that
> >>>>>> someone else can use to reproduce and debug.
> >>>>>>
> >>>>>> Best
> >>>>>> Tomas
> >>>>>>
> >>>>>>>
> >>>>>>> On Mon, Nov 23, 2020 at 7:10 PM Bill Dunlap
> >>>>>>> <williamwdun...@gmail.com> wrote:
> >>>>>>>> The call to system() probably is an internal call used to delete the
> >>>>>>>> session's tempdir(). This sort of failure means that a potentially
> >>>>>>>> large amount of disk space is not being recovered when R is done.
> >>>>>>>> Perhaps R_CleanTempDir() could call R_unlink() instead of having a
> >>>>>>>> subprocess call 'rm -rf ...'. Then it could also issue a specific
> >>>>>>>> warning if it was impossible to delete all of tempdir(). (That
> >>>>>>>> should be very rare.)
> >>>>>>>>
> >>>>>>>>> q("no")
> >>>>>>>> Breakpoint 1, R_system (command=command@entry=0x7fffffffa1e0 "rm -Rf /tmp/RtmppoKPXb") at sysutils.c:311
> >>>>>>>> 311 {
> >>>>>>>> (gdb) where
> >>>>>>>> #0 R_system (command=command@entry=0x7fffffffa1e0 "rm -Rf /tmp/RtmppoKPXb") at sysutils.c:311
> >>>>>>>> #1 0x00005555557c30ec in R_CleanTempDir () at sys-std.c:1178
> >>>>>>>> #2 0x00005555557c31d7 in Rstd_CleanUp (saveact=<optimized out>, status=0, runLast=<optimized out>) at sys-std.c:1243
> >>>>>>>> #3 0x00005555557c593d in R_CleanUp (saveact=saveact@entry=SA_NOSAVE, status=status@entry=0, runLast=<optimized out>) at system.c:87
> >>>>>>>> #4 0x00005555556cc85e in do_quit (call=<optimized out>, op=<optimized out>, args=0x555557813f90, rho=<optimized out>) at main.c:1393
> >>>>>>>>
> >>>>>>>> -Bill
> >>>>>>>>
> >>>>>>>> On Mon, Nov 23, 2020 at 3:15 AM Tomas Kalibera
> >>>>>>>> <tomas.kalib...@gmail.com> wrote:
> >>>>>>>>> On 11/21/20 6:51 PM, Jan Gorecki wrote:
> >>>>>>>>>> Dear R-developers,
> >>>>>>>>>>
> >>>>>>>>>> Some of the more fat scripts (50+ GB mem used by R) that I am
> >>>>>>>>>> running, when they finish they do quit with q("no", status=0)
> >>>>>>>>>> Quite often it happens that there is an extra stderr output
> >>>>>>>>>> produced at the very end which looks like this:
> >>>>>>>>>>
> >>>>>>>>>> Warning message:
> >>>>>>>>>> In .Internal(quit(save, status, runLast)) :
> >>>>>>>>>> system call failed: Cannot allocate memory
> >>>>>>>>>>
> >>>>>>>>>> Is there any way to avoid this kind of warnings? I am using stderr
> >>>>>>>>>> output for detecting failures in scripts and this warning is a
> >>>>>>>>>> false positive of a failure.
> >>>>>>>>>>
> >>>>>>>>>> Maybe quit function could wait little bit longer trying to allocate
> >>>>>>>>>> before it raises this warning?
> >>>>>>>>> If you see this warning, some call to system() or system2() or
> >>>>>>>>> similar, which executes an external program, failed to even run a
> >>>>>>>>> shell to run that external program, because there was not enough
> >>>>>>>>> memory. You should be able to find out where it happens by
> >>>>>>>>> checking the exit status of system().
> >>>>>>>>>
> >>>>>>>>> Tomas
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>> Best regards,
> >>>>>>>>>> Jan Gorecki
> >>>>>>>>>>
> >>>>>>>>>> ______________________________________________
> >>>>>>>>>> R-devel@r-project.org mailing list
> >>>>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-devel
> >>>>>>>>> ______________________________________________
> >>>>>>>>> R-devel@r-project.org mailing list
> >>>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-devel
> >>>>>>
> >>>>>>
> >>>>>
> >>>>> ______________________________________________
> >>>>> R-devel@r-project.org mailing list
> >>>>> https://stat.ethz.ch/mailman/listinfo/r-devel
> >>>>>
> >>>>
> >>>> --
> >>>> Luke Tierney
> >>>> Ralph E. Wareham Professor of Mathematical Sciences
> >>>> University of Iowa                  Phone: 319-335-3386
> >>>> Department of Statistics and        Fax:   319-335-3017
> >>>> Actuarial Science
> >>>> 241 Schaeffer Hall                  email: luke-tier...@uiowa.edu
> >>>> Iowa City, IA 52242                 WWW:   http://www.stat.uiowa.edu
> >>>
> >>
> >> --
> >> Luke Tierney
> >> Ralph E. Wareham Professor of Mathematical Sciences
> >> University of Iowa                  Phone: 319-335-3386
> >> Department of Statistics and        Fax:   319-335-3017
> >> Actuarial Science
> >> 241 Schaeffer Hall                  email: luke-tier...@uiowa.edu
> >> Iowa City, IA 52242                 WWW:   http://www.stat.uiowa.edu
> >
>
> --
> Luke Tierney
> Ralph E. Wareham Professor of Mathematical Sciences
> University of Iowa                  Phone: 319-335-3386
> Department of Statistics and        Fax:   319-335-3017
> Actuarial Science
> 241 Schaeffer Hall                  email: luke-tier...@uiowa.edu
> Iowa City, IA 52242                 WWW:   http://www.stat.uiowa.edu

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel