Hi Luke, As proposed by you, I have just added this to the wishlist in Bugzilla: https://bugs.r-project.org/show_bug.cgi?id=18233
Best, Andreas 2021-11-04 17:04 GMT+01:00 luke-tier...@uiowa.edu: > Can you please submit this as a wishlist item to bugzilla? it is > easier to keep track of there. You could also submit your threads > based suggestion there, again to keep it easier to keep track of and > possibly get back to in the future. > > I will have a look at your approach when I get a chance, but I am > exploring a different approach to avoid scanning old generations that > may be simpler. > > Best, > > luke > > On Wed, 3 Nov 2021, Andreas Kersting wrote: > >> Hi, >> >> In https://stat.ethz.ch/pipermail/r-devel/2021-October/081147.html I >> proposed to speed up the CHARSXP cache maintenance during GC using >> threading. This was rejected by Luke in >> https://stat.ethz.ch/pipermail/r-devel/2021-October/081172.html. >> >> Here I want to propose an alternative approach to significantly speed up >> CHARSXP cache maintenance during partial GCs. A patch which passes `make >> check-devel` is attached. Compared to R devel (revision 81110) I get the >> following performance improvements on my system: >> >> Elapsed time for five non-full gc in a session after >> >> x <- as.character(runif(5e7))[] >> gc(full = TRUE) >> >> +20sec -> ~1sec. >> >> >> This patch introduces (theoretical) overheads to mkCharLenCE() and full GCs. >> However, I did not measure dramatic differences: >> >> y <- "old_CHARSXP" >> >> after >> >> x <- "old_CHARSXP"; gc(); gc() >> >> takes a median 32 nanoseconds with and without the patch. >> >> >> gc(full = TRUE) >> >> in a new session takes a median 16 milliseconds with and 14 without the >> patch. >> >> >> The basic idea is to maintain the CHARSXP cache using subtables in >> R_StringHash, one for each of the (NUM_GC_GENERATIONS := NUM_OLD_GENERATIONS >> + 1) GC generations. New CHARSXPs are added by mkCharLenCE() to the subtable >> of the youngest generation. After a partial GC, only the chains anchored at >> the subtables of the youngest (num_old_gens_to_collect + 1) generations need >> to be searched for and cleaned of unmarked nodes. Afterwards, these chains >> need to be merged into those of the respective next generation, if any. This >> approach relies on the fact that an object/CHARSXP can never become younger >> again. It is OK though if an object/CHARSXP "skips" a GC generation. >> >> R_StringHash, which is now of length (NUM_GC_GENERATIONS * char_hash_size), >> is structured such that the chains for the same hashcode but for different >> generations are anchored at slots of R_StringHash which are next to each >> other in memory. This is because we often need to access two or more (i.e. >> currently all three) of them for one operation and this avoids cache misses. >> >> HASHPRI, i.e. the number of occupied primary slots, is computed and stored >> as NUM_GC_GENERATIONS times the number of slots which are occupied in at >> least one of the subtables. This is done because in mkCharLenCE() we need to >> iterate through one or more chains if and only if there is a chain for the >> particular hashcode in at least one subtable. >> >> I tried to keep the patch as minimal as possible. In particular, I did not >> add long vector support to R_StringHash. I rather reduced the max value of >> char_hash_size from 2^30 to 2^29, assuming that NUM_OLD_GENERATIONS is (not >> larger than) 2. I also did not yet adjust do_show_cache() and >> do_write_cache(), but I could do so if the patch is accepted. >> >> Thanks for your consideration and feedback. >> >> Regards, >> Andreas >> >> >> P.S. I had a hard time to get the indentation right in the patch due the mix >> of tabs and spaces. Sorry, if I screwed this up. > > -- > Luke Tierney > Ralph E. Wareham Professor of Mathematical Sciences > University of Iowa Phone: 319-335-3386 > Department of Statistics and Fax: 319-335-3017 > Actuarial Science > 241 Schaeffer Hall email: luke-tier...@uiowa.edu > Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu > ______________________________________________ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel