Hi,
In https://stat.ethz.ch/pipermail/r-devel/2021-October/081147.html I proposed
to speed up the CHARSXP cache maintenance during GC using threading. This was
rejected by Luke in
https://stat.ethz.ch/pipermail/r-devel/2021-October/081172.html.
Here I want to propose an alternative approach to significantly speed up
CHARSXP cache maintenance during partial GCs. A patch which passes `make
check-devel` is attached. Compared to R devel (revision 81110) I get the
following performance improvements on my system:
Elapsed time for five non-full gc in a session after
x <- as.character(runif(5e7))[]
gc(full = TRUE)
+20sec -> ~1sec.
This patch introduces (theoretical) overheads to mkCharLenCE() and full GCs.
However, I did not measure dramatic differences:
y <- "old_CHARSXP"
after
x <- "old_CHARSXP"; gc(); gc()
takes a median 32 nanoseconds with and without the patch.
gc(full = TRUE)
in a new session takes a median 16 milliseconds with and 14 without the patch.
The basic idea is to maintain the CHARSXP cache using subtables in
R_StringHash, one for each of the (NUM_GC_GENERATIONS := NUM_OLD_GENERATIONS +
1) GC generations. New CHARSXPs are added by mkCharLenCE() to the subtable of
the youngest generation. After a partial GC, only the chains anchored at the
subtables of the youngest (num_old_gens_to_collect + 1) generations need to be
searched for and cleaned of unmarked nodes. Afterwards, these chains need to be
merged into those of the respective next generation, if any. This approach
relies on the fact that an object/CHARSXP can never become younger again. It is
OK though if an object/CHARSXP "skips" a GC generation.
R_StringHash, which is now of length (NUM_GC_GENERATIONS * char_hash_size), is
structured such that the chains for the same hashcode but for different
generations are anchored at slots of R_StringHash which are next to each other
in memory. This is because we often need to access two or more (i.e. currently
all three) of them for one operation and this avoids cache misses.
HASHPRI, i.e. the number of occupied primary slots, is computed and stored as
NUM_GC_GENERATIONS times the number of slots which are occupied in at least one
of the subtables. This is done because in mkCharLenCE() we need to iterate
through one or more chains if and only if there is a chain for the particular
hashcode in at least one subtable.
I tried to keep the patch as minimal as possible. In particular, I did not add
long vector support to R_StringHash. I rather reduced the max value of
char_hash_size from 2^30 to 2^29, assuming that NUM_OLD_GENERATIONS is (not
larger than) 2. I also did not yet adjust do_show_cache() and do_write_cache(),
but I could do so if the patch is accepted.
Thanks for your consideration and feedback.
Regards,
Andreas
P.S. I had a hard time to get the indentation right in the patch due the mix of
tabs and spaces. Sorry, if I screwed this up.
Index: src/include/Defn.h
===================================================================
--- src/include/Defn.h (revision 81110)
+++ src/include/Defn.h (working copy)
@@ -94,6 +94,7 @@
*/
#define NAMED_BITS 16
+#define NUM_GC_GENERATIONS 3
/* Flags */
Index: src/main/envir.c
===================================================================
--- src/main/envir.c (revision 81110)
+++ src/main/envir.c (working copy)
@@ -4079,7 +4079,7 @@
void attribute_hidden InitStringHash()
{
- R_StringHash = R_NewHashTable(char_hash_size);
+ R_StringHash = R_NewHashTable(char_hash_size * NUM_GC_GENERATIONS);
}
/* #define DEBUG_GLOBAL_STRING_HASH 1 */
@@ -4089,7 +4089,7 @@
{
SEXP old_table = R_StringHash;
SEXP new_table, chain, new_chain, val, next;
- unsigned int counter, new_hashcode, newmask;
+ unsigned int counter, new_slot, newmask;
#ifdef DEBUG_GLOBAL_STRING_HASH
unsigned int oldsize = HASHSIZE(R_StringHash);
unsigned int oldpri = HASHPRI(R_StringHash);
@@ -4103,27 +4103,38 @@
/* When using the ATTRIB fields to maintain the chains the chain
moving is destructive and does not involve allocation. This is
therefore the only point where GC can occur. */
- new_table = R_NewHashTable(newsize);
+ new_table = R_NewHashTable(newsize * NUM_GC_GENERATIONS);
newmask = newsize - 1;
- /* transfer chains from old table to new table */
- for (counter = 0; counter < LENGTH(old_table); counter++) {
- chain = VECTOR_ELT(old_table, counter);
- while (!ISNULL(chain)) {
- val = CXHEAD(chain);
- next = CXTAIL(chain);
- new_hashcode = char_hash(CHAR(val), LENGTH(val)) & newmask;
- new_chain = VECTOR_ELT(new_table, new_hashcode);
- /* If using a primary slot then increase HASHPRI */
- if (ISNULL(new_chain))
- SET_HASHPRI(new_table, HASHPRI(new_table) + 1);
- /* move the current chain link to the new chain */
- /* this is a destructive modification */
- new_chain = SET_CXTAIL(val, new_chain);
- SET_VECTOR_ELT(new_table, new_hashcode, new_chain);
- chain = next;
+ /* transfer chains from old table to new table */
+ for (int gen = 0; gen < NUM_GC_GENERATIONS; gen++) {
+ for (counter = gen; counter < LENGTH(old_table); counter = counter + NUM_GC_GENERATIONS) {
+ chain = VECTOR_ELT(old_table, counter);
+ while (!ISNULL(chain)) {
+ val = CXHEAD(chain);
+ next = CXTAIL(chain);
+ new_slot = (char_hash(CHAR(val), LENGTH(val)) & newmask) * NUM_GC_GENERATIONS + gen;
+ new_chain = VECTOR_ELT(new_table, new_slot);
+ /* move the current chain link to the new chain */
+ /* this is a destructive modification */
+ new_chain = SET_CXTAIL(val, new_chain);
+ SET_VECTOR_ELT(new_table, new_slot, new_chain);
+ chain = next;
+ }
}
+ }
+
+ /* compute and set HASHPRI */
+ int nc = 0;
+ for (int i = 0; i < LENGTH(new_table); i = i + NUM_GC_GENERATIONS) {
+ int null_cntr = 0;
+ for (int gen = 0; gen < NUM_GC_GENERATIONS; gen++) {
+ null_cntr += VECTOR_ELT(new_table, i + gen) == R_NilValue;
+ }
+ if (null_cntr != NUM_GC_GENERATIONS) nc++;
}
+ SET_HASHPRI(new_table, nc * NUM_GC_GENERATIONS);
+
R_StringHash = new_table;
char_hash_size = newsize;
char_hash_mask = newmask;
@@ -4189,21 +4200,27 @@
default: need_enc = 0;
}
- hashcode = char_hash(name, len) & char_hash_mask;
+ hashcode = (char_hash(name, len) & char_hash_mask) * NUM_GC_GENERATIONS;
/* Search for a cached value */
cval = R_NilValue;
- chain = VECTOR_ELT(R_StringHash, hashcode);
- for (; !ISNULL(chain) ; chain = CXTAIL(chain)) {
- SEXP val = CXHEAD(chain);
- if (TYPEOF(val) != CHARSXP) break; /* sanity check */
- if (need_enc == (ENC_KNOWN(val) | IS_BYTES(val)) &&
- LENGTH(val) == len && /* quick pretest */
- (!len || (memcmp(CHAR(val), name, len) == 0))) { // called with len = 0
- cval = val;
- break;
+ int null_cntr = 0;
+ for (int gen = 0; gen < NUM_GC_GENERATIONS; gen++) {
+ chain = VECTOR_ELT(R_StringHash, hashcode + gen);
+ if (ISNULL(chain)) null_cntr++;
+ for (; !ISNULL(chain) ; chain = CXTAIL(chain)) {
+ SEXP val = CXHEAD(chain);
+ if (TYPEOF(val) != CHARSXP) break; /* sanity check */
+ if (need_enc == (ENC_KNOWN(val) | IS_BYTES(val)) &&
+ LENGTH(val) == len && /* quick pretest */
+ (!len || (memcmp(CHAR(val), name, len) == 0))) { // called with len = 0
+ cval = val;
+ break;
+ }
+ }
+ if (cval != R_NilValue) break;
}
- }
+
if (cval == R_NilValue) {
/* no cached value; need to allocate one and add to the cache */
PROTECT(cval = allocCharsxp(len));
@@ -4227,8 +4244,8 @@
SET_CACHED(cval); /* Mark it */
/* add the new value to the cache */
chain = VECTOR_ELT(R_StringHash, hashcode);
- if (ISNULL(chain))
- SET_HASHPRI(R_StringHash, HASHPRI(R_StringHash) + 1);
+ if (null_cntr == NUM_GC_GENERATIONS)
+ SET_HASHPRI(R_StringHash, HASHPRI(R_StringHash) + NUM_GC_GENERATIONS);
/* this is a destructive modification */
chain = SET_CXTAIL(cval, chain);
SET_VECTOR_ELT(R_StringHash, hashcode, chain);
@@ -4237,9 +4254,14 @@
protected.
Maximum possible power of two is 2^30 for a VECSXP.
FIXME: this has changed with long vectors.
+ We have an actual limit of 2^29 since we need to store 3
+ subtables - one for each gc generation - in the VECSXP.
*/
+#if NUM_GC_GENERATIONS > 3
+# error if the total number of gc generations is larger than 3, the resizing of R_StringHash must be adjusted
+#endif
if (R_HashSizeCheck(R_StringHash)
- && char_hash_size < 1073741824 /* 2^30 */)
+ && char_hash_size < 536870912 /* 2^29 */)
R_StringHash_resize(char_hash_size * 2);
UNPROTECT(1);
Index: src/main/memory.c
===================================================================
--- src/main/memory.c (revision 81110)
+++ src/main/memory.c (working copy)
@@ -527,6 +527,10 @@
# error number of old generations must be 1 or 2
#endif
+#if NUM_OLD_GENERATIONS + 1 != NUM_GC_GENERATIONS
+# error number of old generations must be 1 less than the total number of generations
+#endif
+
#define NODE_GENERATION(s) ((s)->sxpinfo.gcgen)
#define SET_NODE_GENERATION(s,g) ((s)->sxpinfo.gcgen=(g))
@@ -1852,27 +1856,44 @@
if (R_StringHash != NULL) /* in case of GC during initialization */
{
SEXP t;
+ for (gen = num_old_gens_to_collect; gen >= 0; gen--) {
+ for (i = gen; i < LENGTH(R_StringHash); i = i + NUM_GC_GENERATIONS) {
+ s = VECTOR_ELT(R_StringHash, i);
+ t = R_NilValue;
+ while (s != R_NilValue) {
+ if (! NODE_IS_MARKED(CXHEAD(s))) { /* remove unused CHARSXP and cons cell */
+ if (t == R_NilValue) /* head of list */
+ VECTOR_ELT(R_StringHash, i) = CXTAIL(s);
+ else
+ CXTAIL(t) = CXTAIL(s);
+ s = CXTAIL(s);
+ continue;
+ }
+ FORWARD_NODE(s);
+ FORWARD_NODE(CXHEAD(s));
+ t = s;
+ s = CXTAIL(s);
+ }
+ /* merge current chain into that of the next generation by putting
+ the young in front of the old; t is the end of the current chain */
+ if (gen != NUM_OLD_GENERATIONS && VECTOR_ELT(R_StringHash, i) != R_NilValue) {
+ CXTAIL(t) = VECTOR_ELT(R_StringHash, i + 1);
+ VECTOR_ELT(R_StringHash, i + 1) = VECTOR_ELT(R_StringHash, i);
+ VECTOR_ELT(R_StringHash, i) = R_NilValue;
+ }
+ }
+ }
+
+ /* compute and set HASHPRI */
int nc = 0;
- for (i = 0; i < LENGTH(R_StringHash); i++) {
- s = VECTOR_ELT(R_StringHash, i);
- t = R_NilValue;
- while (s != R_NilValue) {
- if (! NODE_IS_MARKED(CXHEAD(s))) { /* remove unused CHARSXP and cons cell */
- if (t == R_NilValue) /* head of list */
- VECTOR_ELT(R_StringHash, i) = CXTAIL(s);
- else
- CXTAIL(t) = CXTAIL(s);
- s = CXTAIL(s);
- continue;
+ for (i = 0; i < LENGTH(R_StringHash); i = i + NUM_GC_GENERATIONS) {
+ int null_cntr = 0;
+ for (gen = 0; gen < NUM_GC_GENERATIONS; gen++) {
+ null_cntr += VECTOR_ELT(R_StringHash, i + gen) == R_NilValue;
}
- FORWARD_NODE(s);
- FORWARD_NODE(CXHEAD(s));
- t = s;
- s = CXTAIL(s);
- }
- if(VECTOR_ELT(R_StringHash, i) != R_NilValue) nc++;
+ if (null_cntr != NUM_GC_GENERATIONS) nc++;
}
- SET_TRUELENGTH(R_StringHash, nc); /* SET_HASHPRI, really */
+ SET_TRUELENGTH(R_StringHash, nc * NUM_GC_GENERATIONS); /* SET_HASHPRI, really */
}
/* chains are known to be marked so don't need to scan again */
FORWARD_AND_PROCESS_ONE_NODE(R_StringHash, VECSXP);
______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel