On Thu, Jan 15, 2015 at 3:55 PM, Michael Lawrence
<lawrence.mich...@gene.com> wrote:
> Just wanted to start a discussion on whether R could ship with more
> appropriate GC parameters.

I've been doing a number of similar measurements, and have come to the
same conclusion. R is currently very conservative about memory usage,
and this leads to unnecessarily poor performance on certain problems.
Changing the defaults to sizes that are more appropriate for modern
machines can often produce a 2x speedup.

On Sat, Jan 17, 2015 at 8:39 AM, <luke-tier...@uiowa.edu> wrote:
> Martin Morgan discussed this a year or so ago and as I recall bumped
> up these values to the current defaults. I don't recall details about
> why we didn't go higher -- maybe Martin does.

I just checked, and it doesn't seem that any of the relevant values
have been increased in the last ten years. Do you have a link to the
discussion you recall, so we can see why the changes weren't made?

> I suspect the main concern would be with small memory machines in
> student labs and less developed countries.

While that's a reasonable concern, I'm doubtful there are many machines
for which the current numbers are optimal. The current minimum size
increases for the node and vector heaps are 40KB and 80KB respectively.
The increment grows with the heap (min + 0.05 * heap), but we still do
many more expensive garbage collections while growing than we need to.
Paradoxically, the SMALL_MEMORY compile option (which is suggested for
computers with up to 32MB of RAM) uses slightly larger minimum
increments of 50KB and 100KB (though with no proportional component, so
its heaps grow by a fixed step).
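To put a number on that growth schedule, here is a quick standalone
sketch (mine, not R source) that just iterates the default growth rule
from the current starting size. The step count depends only on the
ratio of increment to heap size; the 8-byte-Vcell assumption is used
only to convert the targets to cells:

/* gc_growth.c -- iterate the NORMAL_MEMORY vector-heap growth rule
   (change = R_VGrowIncrMin + R_VGrowIncrFrac * R_VSize) from the
   default R_VSIZE and count the steps, i.e. the full collections
   spent just growing the heap. Assumes 8-byte Vcells. */
#include <stdio.h>

int main(void)
{
    const double grow_incr_min  = 80000;  /* R_VGrowIncrMin  */
    const double grow_incr_frac = 0.05;   /* R_VGrowIncrFrac */
    const double start = 6291456;         /* R_VSIZE default */
    const double targets_gb[] = { 1, 8 };

    for (int i = 0; i < 2; i++) {
        double target = targets_gb[i] * 1024 * 1024 * 1024 / 8;
        double size = start;
        int steps = 0;
        while (size < target) {
            size += grow_incr_min + grow_incr_frac * size;
            steps++;
        }
        printf("%gGB vector heap: %d growth steps\n",
               targets_gb[i], steps);
    }
    return 0;  /* prints 59 and 101 with these numbers */
}

Every one of those steps is a full collection, so a workload whose live
data simply ramps up to a few GB pays for dozens of collections that a
larger start or increment would avoid.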
I think we'd get a significant benefit for most users by being less
conservative about memory consumption. The exact sizes should be
discussed, but with RAM costing about $10/GB it doesn't seem
unreasonable to assume that most machines running R have multiple GB
installed, and that those that don't will quite likely be running an
OS that needs a custom compiled binary anyway.

I could be way off, but a 10MB start with 1MB minimum increments for
SMALL_MEMORY, a 100MB start with 10MB increments for NORMAL_MEMORY, and
a 1GB start with 100MB increments for a new LARGE_MEMORY option might
be a reasonable spread.
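To make the middle option concrete, here is an untested sketch of what
the NORMAL_MEMORY constants might look like. The conversions are mine,
assuming 8-byte Vcells and roughly 56-byte nodes on 64-bit, so the node
counts in particular are placeholders:

/* Untested sketch of the proposal above, not a patch. */

/* Defn.h: ~100MB starting sizes */
#ifndef R_NSIZE
#define R_NSIZE 2000000L       /* nodes; ~100MB at ~56 bytes each */
#endif
#ifndef R_VSIZE
#define R_VSIZE 13107200L      /* Vcells; 100MB at 8 bytes each */
#endif

/* memory.c: ~10MB minimum growth increments */
static double R_NGrowIncrFrac = 0.05, R_NShrinkIncrFrac = 0.2;
static int R_NGrowIncrMin = 200000, R_NShrinkIncrMin = 0;
static double R_VGrowIncrFrac = 0.05, R_VShrinkIncrFrac = 0.2;
static int R_VGrowIncrMin = 1310720, R_VShrinkIncrMin = 0;

The SMALL_MEMORY and LARGE_MEMORY variants would scale the same
constants down and up by a factor of ten.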
Or one could go even larger, noting that on most systems overcommitted
memory is not a problem until it is actually used: until we write to
it, it consumes no physical RAM, just virtual address space.

Or we could stay small, but make it possible to programmatically
increase the granularity from within R.

For ease of reference, here are the relevant sections of code:

https://github.com/wch/r-source/blob/master/src/include/Defn.h#L217
(ripley last authored on Jan 26, 2000 / pd last authored on May 8, 1999)

217 #ifndef R_NSIZE
218 #define R_NSIZE 350000L
219 #endif
220 #ifndef R_VSIZE
221 #define R_VSIZE 6291456L
222 #endif

https://github.com/wch/r-source/blob/master/src/main/startup.c#L169
(ripley last authored on Jun 9, 2004)

157     Rp->vsize = R_VSIZE;
158     Rp->nsize = R_NSIZE;

166 #define Max_Nsize 50000000 /* about 1.4Gb 32-bit, 2.8Gb 64-bit */
167 #define Max_Vsize R_SIZE_T_MAX /* unlimited */

169 #define Min_Nsize 220000
170 #define Min_Vsize (1*Mega)

Rp->nsize is overridden at startup by the environment variable R_NSIZE
if Min_Nsize <= $R_NSIZE <= Max_Nsize, and Rp->vsize by R_VSIZE if
Min_Vsize <= $R_VSIZE <= Max_Vsize. These values are then used to set
the global variables R_NSize and R_VSize; the vector heap's ceiling is
set separately via R_SetMaxVSize(Rp->max_vsize).

https://github.com/wch/r-source/blob/master/src/main/memory.c#L335
(luke last authored on Nov 1, 2000)

335 #ifdef SMALL_MEMORY
336 /* On machines with only 32M of memory (or on a classic Mac OS port)
337    it might be a good idea to use settings like these that are more
338    aggressive at keeping memory usage down. */
339 static double R_NGrowIncrFrac = 0.0, R_NShrinkIncrFrac = 0.2;
340 static int R_NGrowIncrMin = 50000, R_NShrinkIncrMin = 0;
341 static double R_VGrowIncrFrac = 0.0, R_VShrinkIncrFrac = 0.2;
342 static int R_VGrowIncrMin = 100000, R_VShrinkIncrMin = 0;
343 #else
344 static double R_NGrowIncrFrac = 0.05, R_NShrinkIncrFrac = 0.2;
345 static int R_NGrowIncrMin = 40000, R_NShrinkIncrMin = 0;
346 static double R_VGrowIncrFrac = 0.05, R_VShrinkIncrFrac = 0.2;
347 static int R_VGrowIncrMin = 80000, R_VShrinkIncrMin = 0;
348 #endif

static void AdjustHeapSize(R_size_t size_needed)
{
    R_size_t R_MinNFree = (R_size_t)(orig_R_NSize * R_MinFreeFrac);
    R_size_t R_MinVFree = (R_size_t)(orig_R_VSize * R_MinFreeFrac);
    R_size_t NNeeded = R_NodesInUse + R_MinNFree;
    R_size_t VNeeded = R_SmallVallocSize + R_LargeVallocSize
        + size_needed + R_MinVFree;
    double node_occup = ((double) NNeeded) / R_NSize;
    double vect_occup = ((double) VNeeded) / R_VSize;

    if (node_occup > R_NGrowFrac) {
        R_size_t change =
            (R_size_t)(R_NGrowIncrMin + R_NGrowIncrFrac * R_NSize);
        if (R_MaxNSize >= R_NSize + change)
            R_NSize += change;
    }
    else if (node_occup < R_NShrinkFrac) {
        R_NSize -= (R_NShrinkIncrMin + R_NShrinkIncrFrac * R_NSize);
        if (R_NSize < NNeeded)
            R_NSize = (NNeeded < R_MaxNSize) ? NNeeded : R_MaxNSize;
        if (R_NSize < orig_R_NSize)
            R_NSize = orig_R_NSize;
    }

    if (vect_occup > 1.0 && VNeeded < R_MaxVSize)
        R_VSize = VNeeded;
    if (vect_occup > R_VGrowFrac) {
        R_size_t change =
            (R_size_t)(R_VGrowIncrMin + R_VGrowIncrFrac * R_VSize);
        if (R_MaxVSize - R_VSize >= change)
            R_VSize += change;
    }
    else if (vect_occup < R_VShrinkFrac) {
        R_VSize -= R_VShrinkIncrMin + R_VShrinkIncrFrac * R_VSize;
        if (R_VSize < VNeeded)
            R_VSize = VNeeded;
        if (R_VSize < orig_R_VSize)
            R_VSize = orig_R_VSize;
    }

    DEBUG_ADJUST_HEAP_PRINT(node_occup, vect_occup);
}
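To see how gentle that rule is per step, here is a one-step trace
(again mine, not R source) through the grow branches with the current
64-bit defaults, assuming both occupancies are over their grow
thresholds:

/* One growth step under the current NORMAL_MEMORY defaults (sketch). */
#include <stdio.h>

int main(void)
{
    double R_NSize = 350000;   /* R_NSIZE default, nodes  */
    double R_VSize = 6291456;  /* R_VSIZE default, Vcells */

    double n_change = 40000 + 0.05 * R_NSize;  /* R_NGrowIncrMin + frac */
    double v_change = 80000 + 0.05 * R_VSize;  /* R_VGrowIncrMin + frac */

    printf("nodes:  +%.0f (%.1f%% of the heap)\n",
           n_change, 100 * n_change / R_NSize);  /* +57500, 16.4% */
    printf("Vcells: +%.0f (%.1f%% of the heap)\n",
           v_change, 100 * v_change / R_VSize);  /* +394573, 6.3% */
    return 0;
}

Even when the collector does decide to grow, it buys itself only a few
percent of headroom at a time, which is why the ramp-up in the earlier
sketch takes so many collections.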