I have made the change to the threshold calculation (R_VSize instead of R_NSize for the vector heap) in R-patched and R-devel. Seems to have negligible impact on the standard tests and VR scripts.
Best,

luke

On Thu, 9 Nov 2006, Vladimir Dergachev wrote:

> On Thursday 09 November 2006 12:21 pm, Luke Tierney wrote:
>> On Wed, 8 Nov 2006, Vladimir Dergachev wrote:
>>> On Wednesday 08 November 2006 12:56 pm, Luke Tierney wrote:
>>>> On Mon, 6 Nov 2006, Vladimir Dergachev wrote:
>>>
>>> Hi Luke,
>>>
>>> Yes, I gladly concede the point that for a heuristic algorithm the
>>> notion of what is a "bug" is murky (besides crashes, etc., which are not
>>> what I am talking about).
>>>
>>> Here is why I called this a bug:
>>>
>>> 1. My understanding is that each time gc() needs to increase memory
>>> it performs a full garbage collection run. Right?
>>
>> The allocation process does not call gc before every call to malloc.
>> It only calls gc if the allocation would cross a threshold level.
>> Those threshold levels are adjusted in an effort to compromise between
>> keeping the memory footprint low and not calling gc too often. The code
>> you quote below is part of this adjustment process. If this process
>> is working properly, then as memory use grows there will initially be
>> more gc activity and then less as the thresholds adjust.
>
> Well, I was seeing it call gc for every large vector. This probably happens
> only for vectors larger than R_VGrowIncrFrac * R_NSize. On my system R_NSize
> is never more than 1e6, so this would explain the problems when using 1e6
> (and larger) vectors.
>
>>
>>> 2. This is not a problem with small memory sizes, as they imply
>>> (presumably) a small number of objects.
>>>
>>> 3. However, if one wants to allocate many objects (say columns in a
>>> data frame or just vectors), this results in a large penalty.
>>>
>>> Example 1: This simulates allocation of a data.frame with some character
>>> columns which are assumed to be factors. On my system the first
>>> assignment is nearly instantaneous, while subsequent assignments take
>>> slightly less than 0.1 seconds each.
>>
>> I'm not sure these are quite doing what you intend. You define Chars
>> but don't use it. Also, system.time by default calls gc() before
>> doing the evaluation. Giving FALSE as the second argument may give you
>> a more realistic picture.
>
> The Chars are defined to create lots of ncells and make gc() run time more
> realistic. It also mimics having a data.frame with a few factor columns.
>
> As for system.time - thank you, I missed that!
> Setting gcFirst=FALSE makes the first example twice as fast and makes all
> the allocations in the second example faster.
>
> I guess that extra call to gc() caused R_VSize to shrink too fast.
>
>>> I looked more carefully at your code in src/main/memory.c, function
>>> AdjustHeapSize:
>>>
>>>     R_VSize = VNeeded;
>>>     if (vect_occup > R_VGrowFrac) {
>>>         R_size_t change = R_VGrowIncrMin + R_VGrowIncrFrac * R_NSize;
>>>         if (R_MaxVSize - R_VSize >= change)
>>>             R_VSize += change;
>>>     }
>>>
>>> Could it be that R_NSize should be R_VSize? This would explain why I see
>>> a problem when R_VSize >> R_NSize.
>>
>> That does indeed look like a bug and that R_NSize should be R_VSize --
>> well spotted, thanks. I will need to experiment with this a bit more
>> to see if it can safely be changed. It will increase the memory
>> footprint a bit. Probably not by enough to matter, but if it does we
>> may need to adjust some of the tuning constants.
>>
>
> Is there anything I can help you with? Is there a script that runs
> through common usage patterns?
>
> Thank you!
>
> Vladimir Dergachev
>
>
>> Best,
>>
>> luke
>>
>
>

-- 
Luke Tierney
Chair, Statistics and Actuarial Science
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone:             319-335-3386
Department of Statistics and        Fax:               319-335-3017
   Actuarial Science
241 Schaeffer Hall                  email:      [EMAIL PROTECTED]
Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel