Hi All,

This is a very important issue. It would be very sad to leave most users
unaware of a free speedup of this size. These options don't appear in the
R --help output; they really should be added there. Additionally, if the
garbage collector is working very hard, might it emit a note suggesting
better settings for these variables?
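For instance, it is already easy to see how often the collector runs under
the defaults (a minimal sketch using the existing gcinfo() facility; the
workload here is invented purely for illustration):

    gcinfo(TRUE)               # print a line at every garbage collection
    x <- vector("list", 200)
    for (i in seq_along(x))
        x[[i]] <- numeric(1e5) # ~160MB total; triggers many collections
    gcinfo(FALSE)
    gc()                       # summary of Ncells/Vcells usage and triggers

A note from R itself when that chatter becomes excessive would go a long
way toward making users aware of these settings.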
It's not really my place to comment on design philosophy, but if there is
a configure option for small-memory machines, I would assume that would be
sufficient for the folks who are not on fairly current hardware.

Regards,
Pete

____________________
Peter M. Haverty, Ph.D.
Genentech, Inc.
phave...@gene.com

On Sat, Jan 17, 2015 at 11:40 PM, Nathan Kurz <n...@verse.com> wrote:
> On Thu, Jan 15, 2015 at 3:55 PM, Michael Lawrence
> <lawrence.mich...@gene.com> wrote:
> > Just wanted to start a discussion on whether R could ship with more
> > appropriate GC parameters.
>
> I've been doing a number of similar measurements, and have come to the
> same conclusion. R is currently very conservative about memory usage,
> and this leads to unnecessarily poor performance on certain problems.
> Changing the defaults to sizes that are more appropriate for modern
> machines can often produce a 2x speedup.
>
> On Sat, Jan 17, 2015 at 8:39 AM, <luke-tier...@uiowa.edu> wrote:
> > Martin Morgan discussed this a year or so ago and as I recall bumped
> > up these values to the current defaults. I don't recall details about
> > why we didn't go higher -- maybe Martin does.
>
> I just checked, and it doesn't seem that any of the relevant values
> have been increased in the last ten years. Do you have a link to the
> discussion you recall, so we can see why the changes weren't made?
>
> > I suspect the main concern would be with small memory machines in
> > student labs and less developed countries.
>
> While a reasonable concern, I'm doubtful there are many machines for
> which the current numbers are optimal. The current minimum size
> increases for the node and vector heaps are 40KB and 80KB respectively.
> These grow as the heap grows (min + 0.05 * heap), but we still do many
> more expensive garbage collections while growing than we need to.
> Paradoxically, the SMALL_MEMORY compile option (which is suggested for
> computers with up to 32MB of RAM) has slightly larger minimum
> increments, at 50KB and 100KB.
>
> I think we'd get a significant benefit for most users by being less
> conservative about memory consumption. The exact sizes should be
> discussed, but with RAM costing about $10/GB it doesn't seem
> unreasonable to assume most machines running R have multiple GB
> installed, and those that don't will quite likely be running an OS
> that needs a custom-compiled binary anyway.
>
> I could be way off, but as a starting point: a 10MB start with 1MB
> minimum increments for SMALL_MEMORY, a 100MB start with 10MB increments
> for NORMAL_MEMORY, and a 1GB start with 100MB increments for
> LARGE_MEMORY might be a reasonable spread.
>
> Or one could go even larger, noting that on most systems overcommitted
> memory is not a problem until it is used: until we write to it, it
> doesn't actually consume physical RAM, just virtual address space. Or
> we could stay small, but make it possible to programmatically increase
> the granularity from within R.
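>
> To put the current growth schedule in perspective, here's a
> back-of-the-envelope sketch of the arithmetic (my own illustration; it
> assumes the simple model heap <- heap + min_incr + 0.05 * heap at each
> growth step, and that VSIZE is measured in bytes):
>
>     gcs_to_reach <- function(target, start, min_incr, frac = 0.05) {
>         n <- 0
>         heap <- start
>         while (heap < target) {
>             heap <- heap + min_incr + frac * heap
>             n <- n + 1
>         }
>         n
>     }
>     ## vector heap, from the 6MB default to 1GB with the 80KB minimum:
>     gcs_to_reach(1024^3, 6291456, 80000)        # about 100 growth steps
>     ## with a 100MB start and 10MB minimum increments instead:
>     gcs_to_reach(1024^3, 100 * 2^20, 10 * 2^20) # about 30 growth steps
>
> Roughly speaking, each growth step corresponds to a full collection,
> which is where the slowdown on large problems comes from.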
>
> For ease of reference, here are the relevant sections of code:
>
> https://github.com/wch/r-source/blob/master/src/include/Defn.h#L217
> (ripley last authored on Jan 26, 2000 / pd last authored on May 8, 1999)
> 217 #ifndef R_NSIZE
> 218 #define R_NSIZE 350000L
> 219 #endif
> 220 #ifndef R_VSIZE
> 221 #define R_VSIZE 6291456L
> 222 #endif
>
> https://github.com/wch/r-source/blob/master/src/main/startup.c#L169
> (ripley last authored on Jun 9, 2004)
> 157 Rp->vsize = R_VSIZE;
> 158 Rp->nsize = R_NSIZE;
> 166 #define Max_Nsize 50000000 /* about 1.4Gb 32-bit, 2.8Gb 64-bit */
> 167 #define Max_Vsize R_SIZE_T_MAX /* unlimited */
> 169 #define Min_Nsize 220000
> 170 #define Min_Vsize (1*Mega)
>
> https://github.com/wch/r-source/blob/master/src/main/memory.c#L335
> (luke last authored on Nov 1, 2000)
> 335 #ifdef SMALL_MEMORY
> 336 /* On machines with only 32M of memory (or on a classic Mac OS port)
> 337    it might be a good idea to use settings like these that are more
> 338    aggressive at keeping memory usage down. */
> 339 static double R_NGrowIncrFrac = 0.0, R_NShrinkIncrFrac = 0.2;
> 340 static int R_NGrowIncrMin = 50000, R_NShrinkIncrMin = 0;
> 341 static double R_VGrowIncrFrac = 0.0, R_VShrinkIncrFrac = 0.2;
> 342 static int R_VGrowIncrMin = 100000, R_VShrinkIncrMin = 0;
> 343 #else
> 344 static double R_NGrowIncrFrac = 0.05, R_NShrinkIncrFrac = 0.2;
> 345 static int R_NGrowIncrMin = 40000, R_NShrinkIncrMin = 0;
> 346 static double R_VGrowIncrFrac = 0.05, R_VShrinkIncrFrac = 0.2;
> 347 static int R_VGrowIncrMin = 80000, R_VShrinkIncrMin = 0;
> 348 #endif
>
> static void AdjustHeapSize(R_size_t size_needed)
> {
>     R_size_t R_MinNFree = (R_size_t)(orig_R_NSize * R_MinFreeFrac);
>     R_size_t R_MinVFree = (R_size_t)(orig_R_VSize * R_MinFreeFrac);
>     R_size_t NNeeded = R_NodesInUse + R_MinNFree;
>     R_size_t VNeeded = R_SmallVallocSize + R_LargeVallocSize +
>         size_needed + R_MinVFree;
>     double node_occup = ((double) NNeeded) / R_NSize;
>     double vect_occup = ((double) VNeeded) / R_VSize;
>
>     if (node_occup > R_NGrowFrac) {
>         R_size_t change = (R_size_t)(R_NGrowIncrMin + R_NGrowIncrFrac * R_NSize);
>         if (R_MaxNSize >= R_NSize + change)
>             R_NSize += change;
>     }
>     else if (node_occup < R_NShrinkFrac) {
>         R_NSize -= (R_NShrinkIncrMin + R_NShrinkIncrFrac * R_NSize);
>         if (R_NSize < NNeeded)
>             R_NSize = (NNeeded < R_MaxNSize) ? NNeeded : R_MaxNSize;
>         if (R_NSize < orig_R_NSize)
>             R_NSize = orig_R_NSize;
>     }
>
>     if (vect_occup > 1.0 && VNeeded < R_MaxVSize)
>         R_VSize = VNeeded;
>     if (vect_occup > R_VGrowFrac) {
>         R_size_t change = (R_size_t)(R_VGrowIncrMin + R_VGrowIncrFrac * R_VSize);
>         if (R_MaxVSize - R_VSize >= change)
>             R_VSize += change;
>     }
>     else if (vect_occup < R_VShrinkFrac) {
>         R_VSize -= R_VShrinkIncrMin + R_VShrinkIncrFrac * R_VSize;
>         if (R_VSize < VNeeded)
>             R_VSize = VNeeded;
>         if (R_VSize < orig_R_VSize)
>             R_VSize = orig_R_VSize;
>     }
>
>     DEBUG_ADJUST_HEAP_PRINT(node_occup, vect_occup);
> }
>
> Rp->nsize is overridden at startup by the environment variable R_NSIZE
> if Min_Nsize <= $R_NSIZE <= Max_Nsize, and Rp->vsize by R_VSIZE if
> Min_Vsize <= $R_VSIZE <= Max_Vsize. These are then used to set the
> global variables R_NSize and R_VSize, with the maximum set via
> R_SetMaxVSize(Rp->max_vsize).
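>
> For anyone who wants to experiment without recompiling, here is a
> sketch of how to raise the startup values per-session using the
> R_NSIZE/R_VSIZE environment variables above (?Memory also documents
> --min-nsize and --min-vsize command-line equivalents; the "20M"/"1G"
> suffix syntax below is my assumption, so check ?Memory for the exact
> accepted forms):
>
>     ## from a shell, start R with larger initial heaps, e.g.:
>     ##   R_NSIZE=20M R_VSIZE=1G R --no-save
>     ## then, inside R, confirm what the session picked up:
>     Sys.getenv(c("R_NSIZE", "R_VSIZE"))
>     gc()  # reports Ncells/Vcells in use and the current gc trigger sizes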
______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel