>>>>> Peter Haverty <haverty.pe...@gene.com>
>>>>>     on Mon, 19 Jan 2015 08:50:08 -0800 writes:
> Hi All, This is a very important issue.  It would be very
> sad to leave most users unaware of a free speedup of this
> size.  These options don't appear in the R --help
> output.  They really should be added there.

Indeed, I found that myself and added them there about 24 hours ago.
((I think they were accidentally dropped a while ago.))

> If the garbage collector is working very hard, might it
> emit a note about better settings for these variables?
> It's not really my place to comment on design philosophy,
> but if there is a configure option for small-memory
> machines, I would assume that would be sufficient for the
> folks that are not on fairly current hardware.

There are quite a few more issues with this, notably how the growth
*steps* are done.  That has been somewhat experimental and for that
reason is _currently_ quite configurable via R_GC_* environment
variables; see the code in src/main/memory.c.

This is currently being discussed "privately" within the R core.
I'm somewhat confident that R 3.2.0 in April will have changes.

And -- coming back to the beginning -- at least the "R-devel" version
now shows

  R --help | grep -e min-.size
      --min-nsize=N   Set min number of fixed size obj's ("cons cells") to N
      --min-vsize=N   Set vector heap minimum to N bytes; '4M' = 4 MegaB

-- Martin Maechler, ETH Zurich

> On Sat, Jan 17, 2015 at 11:40 PM, Nathan Kurz <n...@verse.com> wrote:
>> On Thu, Jan 15, 2015 at 3:55 PM, Michael Lawrence
>> <lawrence.mich...@gene.com> wrote:
>> > Just wanted to start a discussion on whether R could ship with more
>> > appropriate GC parameters.
>>
>> I've been doing a number of similar measurements, and have come to the
>> same conclusion.  R is currently very conservative about memory usage,
>> and this leads to unnecessarily poor performance on certain problems.
>> Changing the defaults to sizes that are more appropriate for modern
>> machines can often produce a 2x speedup.
>>
>> On Sat, Jan 17, 2015 at 8:39 AM, <luke-tier...@uiowa.edu> wrote:
>> > Martin Morgan discussed this a year or so ago and as I recall bumped
>> > up these values to the current defaults.  I don't recall details about
>> > why we didn't go higher -- maybe Martin does.
>>
>> I just checked, and it doesn't seem that any of the relevant values
>> have been increased in the last ten years.  Do you have a link to the
>> discussion you recall, so we can see why the changes weren't made?
>>
>> > I suspect the main concern would be with small memory machines in
>> > student labs and less developed countries.
>>
>> While a reasonable concern, I'm doubtful there are many machines for
>> which the current numbers are optimal.  The current minimum size
>> increases for the node and vector heaps are 40KB and 80KB respectively.
>> These grow as the heap grows (min + .05 * heap), but that still means
>> we do many more expensive garbage collections while growing than we
>> need to.  Paradoxically, the SMALL_MEMORY compile option (which is
>> suggested for computers with up to 32MB of RAM) has slightly larger
>> minimum increments, at 50KB and 100KB.
>>
>> I think we'd get significant benefit for most users by being less
>> conservative about memory consumption.  The exact sizes should be
>> discussed, but with RAM costing about $10/GB it doesn't seem
>> unreasonable to assume most machines running R have multiple GB
>> installed, and those that don't will quite likely be running an OS
>> that needs a custom compiled binary anyway.
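As a back-of-the-envelope check of that "min + .05 * heap" growth rule,
here is a minimal standalone C sketch (not R source; the constants are
taken from the memory.c excerpt quoted further down) counting how many
grow-triggering collections it takes to move the vector heap from its
6 MB default to 1 GB:

    /* Standalone sketch: count heap-growth steps under
       new_size = size + (IncrMin + IncrFrac * size),
       using the non-SMALL_MEMORY constants from memory.c. */
    #include <stdio.h>

    int main(void)
    {
        double vsize = 6291456.0;          /* R_VSIZE default, ~6 MB */
        const double incr_frac = 0.05;     /* R_VGrowIncrFrac */
        const double incr_min  = 80000.0;  /* R_VGrowIncrMin */
        int steps = 0;

        while (vsize < 1024.0 * 1024.0 * 1024.0) {
            vsize += incr_min + incr_frac * vsize;
            steps++;
        }
        printf("%d growth steps to reach 1 GB\n", steps);
        return 0;
    }

On these numbers it takes roughly a hundred heap-growing collections
before a 1 GB working set fits, which is the cost being described above.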
>>
>> I could be way off, but a 10MB start with 1MB minimum increments for
>> SMALL_MEMORY, a 100MB start with 10MB increments for NORMAL_MEMORY,
>> and a 1GB start with 100MB increments for LARGE_MEMORY might be a
>> reasonable spread.
>>
>> Or one could go even larger, noting that on most systems,
>> overcommitted memory is not a problem until it is used.  Until we
>> write to it, it doesn't actually use physical RAM, just virtual
>> address space.  Or we could stay small, but make it possible to
>> programmatically increase the granularity from within R.
>>
>> For ease of reference, here are the relevant sections of code:
>>
>> https://github.com/wch/r-source/blob/master/src/include/Defn.h#L217
>> (ripley last authored on Jan 26, 2000 / pd last authored on May 8, 1999)
>> 217 #ifndef R_NSIZE
>> 218 #define R_NSIZE 350000L
>> 219 #endif
>> 220 #ifndef R_VSIZE
>> 221 #define R_VSIZE 6291456L
>> 222 #endif
>>
>> https://github.com/wch/r-source/blob/master/src/main/startup.c#L169
>> (ripley last authored on Jun 9, 2004)
>> 157 Rp->vsize = R_VSIZE;
>> 158 Rp->nsize = R_NSIZE;
>> 166 #define Max_Nsize 50000000 /* about 1.4Gb 32-bit, 2.8Gb 64-bit */
>> 167 #define Max_Vsize R_SIZE_T_MAX /* unlimited */
>> 169 #define Min_Nsize 220000
>> 170 #define Min_Vsize (1*Mega)
>>
>> https://github.com/wch/r-source/blob/master/src/main/memory.c#L335
>> (luke last authored on Nov 1, 2000)
>> 335 #ifdef SMALL_MEMORY
>> 336 /* On machines with only 32M of memory (or on a classic Mac OS port)
>> 337    it might be a good idea to use settings like these that are more
>> 338    aggressive at keeping memory usage down. */
>> 339 static double R_NGrowIncrFrac = 0.0, R_NShrinkIncrFrac = 0.2;
>> 340 static int R_NGrowIncrMin = 50000, R_NShrinkIncrMin = 0;
>> 341 static double R_VGrowIncrFrac = 0.0, R_VShrinkIncrFrac = 0.2;
>> 342 static int R_VGrowIncrMin = 100000, R_VShrinkIncrMin = 0;
>> 343 #else
>> 344 static double R_NGrowIncrFrac = 0.05, R_NShrinkIncrFrac = 0.2;
>> 345 static int R_NGrowIncrMin = 40000, R_NShrinkIncrMin = 0;
>> 346 static double R_VGrowIncrFrac = 0.05, R_VShrinkIncrFrac = 0.2;
>> 347 static int R_VGrowIncrMin = 80000, R_VShrinkIncrMin = 0;
>> 348 #endif
>>
>> static void AdjustHeapSize(R_size_t size_needed)
>> {
>>     R_size_t R_MinNFree = (R_size_t)(orig_R_NSize * R_MinFreeFrac);
>>     R_size_t R_MinVFree = (R_size_t)(orig_R_VSize * R_MinFreeFrac);
>>     R_size_t NNeeded = R_NodesInUse + R_MinNFree;
>>     R_size_t VNeeded = R_SmallVallocSize + R_LargeVallocSize +
>>         size_needed + R_MinVFree;
>>     double node_occup = ((double) NNeeded) / R_NSize;
>>     double vect_occup = ((double) VNeeded) / R_VSize;
>>
>>     if (node_occup > R_NGrowFrac) {
>>         R_size_t change = (R_size_t)(R_NGrowIncrMin + R_NGrowIncrFrac * R_NSize);
>>         if (R_MaxNSize >= R_NSize + change)
>>             R_NSize += change;
>>     }
>>     else if (node_occup < R_NShrinkFrac) {
>>         R_NSize -= (R_NShrinkIncrMin + R_NShrinkIncrFrac * R_NSize);
>>         if (R_NSize < NNeeded)
>>             R_NSize = (NNeeded < R_MaxNSize) ? NNeeded : R_MaxNSize;
>>         if (R_NSize < orig_R_NSize)
>>             R_NSize = orig_R_NSize;
>>     }
>>
>>     if (vect_occup > 1.0 && VNeeded < R_MaxVSize)
>>         R_VSize = VNeeded;
>>     if (vect_occup > R_VGrowFrac) {
>>         R_size_t change = (R_size_t)(R_VGrowIncrMin + R_VGrowIncrFrac * R_VSize);
>>         if (R_MaxVSize - R_VSize >= change)
>>             R_VSize += change;
>>     }
>>     else if (vect_occup < R_VShrinkFrac) {
>>         R_VSize -= R_VShrinkIncrMin + R_VShrinkIncrFrac * R_VSize;
>>         if (R_VSize < VNeeded)
>>             R_VSize = VNeeded;
>>         if (R_VSize < orig_R_VSize)
>>             R_VSize = orig_R_VSize;
>>     }
>>
>>     DEBUG_ADJUST_HEAP_PRINT(node_occup, vect_occup);
>> }
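The control flow above may be easier to see in isolation: the heap only
changes size when occupancy leaves the band between the shrink and grow
fractions.  A simplified standalone sketch follows; the 0.70/0.30 band
values are my reading of the R_VGrowFrac/R_VShrinkFrac defaults in
memory.c and should be double-checked there:

    /* Simplified model of the grow/shrink band in AdjustHeapSize().
       Band fractions (0.70/0.30) are assumed defaults; the increment
       constants come from the non-SMALL_MEMORY branch quoted above. */
    #include <stdio.h>

    static double adjust(double size, double needed)
    {
        const double grow_frac = 0.70,  shrink_frac = 0.30;
        const double grow_incr_frac = 0.05, shrink_incr_frac = 0.2;
        const double grow_incr_min = 80000, shrink_incr_min = 0;
        double occup = needed / size;

        if (occup > grow_frac)
            size += grow_incr_min + grow_incr_frac * size;
        else if (occup < shrink_frac)
            size -= shrink_incr_min + shrink_incr_frac * size;
        return size;  /* unchanged while inside the band */
    }

    int main(void)
    {
        printf("%.0f\n", adjust(6291456, 5000000));  /* ~79%: grows   */
        printf("%.0f\n", adjust(6291456, 3000000));  /* ~48%: stays   */
        printf("%.0f\n", adjust(6291456, 1000000));  /* ~16%: shrinks */
        return 0;
    }

Note the asymmetry in the quoted constants: growth steps are 5% of the
heap plus a small minimum, while shrink steps are 20%, so the heap is
quick to give memory back and slow to take more.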
>> Rp->nsize is overridden at startup by the environment variable R_NSIZE
>> if Min_Nsize <= $R_NSIZE <= Max_Nsize.  Rp->vsize is overridden at
>> startup by the environment variable R_VSIZE if Min_Vsize <= $R_VSIZE
>> <= Max_Vsize.  These are then used to set the global variables R_NSize
>> and R_VSize, with R_SetMaxVSize(Rp->max_vsize).
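For completeness, here is a minimal sketch of that startup override and
clamping (illustrative only: the real parsing in startup.c also accepts
size suffixes such as '4M', which this skips), using the Min_Nsize and
Max_Nsize values from the excerpt above:

    #include <stdio.h>
    #include <stdlib.h>

    #define Min_Nsize 220000
    #define Max_Nsize 50000000

    /* Sketch: override the node-heap start size from $R_NSIZE,
       accepting the value only if it lies in [Min_Nsize, Max_Nsize]. */
    static long nsize_from_env(long dflt)
    {
        const char *s = getenv("R_NSIZE");
        if (s != NULL) {
            long n = atol(s);
            if (n >= Min_Nsize && n <= Max_Nsize)
                return n;
        }
        return dflt;
    }

    int main(void)
    {
        /* 350000L is the compiled-in R_NSIZE default from Defn.h. */
        printf("nsize = %ld\n", nsize_from_env(350000L));
        return 0;
    }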