Hi,

In https://stat.ethz.ch/pipermail/r-devel/2021-October/081147.html I proposed 
to speed up the CHARSXP cache maintenance during GC using threading. This was 
rejected by Luke in 
https://stat.ethz.ch/pipermail/r-devel/2021-October/081172.html.

Here I want to propose an alternative approach to significantly speed up 
CHARSXP cache maintenance during partial GCs. A patch which passes `make 
check-devel` is attached. Compared to R devel (revision 81110) I get the 
following performance improvements on my system:

Elapsed time for five non-full gc in a session after

x <- as.character(runif(5e7))[]
gc(full = TRUE)

+20sec -> ~1sec.


This patch introduces (theoretical) overheads to mkCharLenCE() and full GCs. 
However, I did not measure dramatic differences:

y <- "old_CHARSXP" 

after

x <- "old_CHARSXP"; gc(); gc()

takes a median 32 nanoseconds with and without the patch.


gc(full = TRUE)

in a new session takes a median 16 milliseconds with and 14 without the patch.


The basic idea is to maintain the CHARSXP cache using subtables in 
R_StringHash, one for each of the (NUM_GC_GENERATIONS := NUM_OLD_GENERATIONS + 
1) GC generations. New CHARSXPs are added by mkCharLenCE() to the subtable of 
the youngest generation. After a partial GC, only the chains anchored at the 
subtables of the youngest (num_old_gens_to_collect + 1) generations need to be 
searched for and cleaned of unmarked nodes. Afterwards, these chains need to be 
merged into those of the respective next generation, if any. This approach 
relies on the fact that an object/CHARSXP can never become younger again. It is 
OK though if an object/CHARSXP "skips" a GC generation.

R_StringHash, which is now of length (NUM_GC_GENERATIONS * char_hash_size), is 
structured such that the chains for the same hashcode but for different 
generations are anchored at slots of R_StringHash which are next to each other 
in memory. This is because we often need to access two or more (i.e. currently 
all three) of them for one operation and this avoids cache misses.

HASHPRI, i.e. the number of occupied primary slots, is computed and stored as 
NUM_GC_GENERATIONS times the number of slots which are occupied in at least one 
of the subtables. This is done because in mkCharLenCE() we need to iterate 
through one or more chains if and only if there is a chain for the particular 
hashcode in at least one subtable.

I tried to keep the patch as minimal as possible. In particular, I did not add 
long vector support to R_StringHash. I rather reduced the max value of 
char_hash_size from 2^30 to 2^29, assuming that NUM_OLD_GENERATIONS is (not 
larger than) 2. I also did not yet adjust do_show_cache() and do_write_cache(), 
but I could do so if the patch is accepted.

Thanks for your consideration and feedback.

Regards,
Andreas


P.S. I had a hard time to get the indentation right in the patch due the mix of 
tabs and spaces. Sorry, if I screwed this up.
Index: src/include/Defn.h
===================================================================
--- src/include/Defn.h	(revision 81110)
+++ src/include/Defn.h	(working copy)
@@ -94,6 +94,7 @@
  */
 
 #define NAMED_BITS 16
+#define NUM_GC_GENERATIONS 3
 
 /* Flags */
 
Index: src/main/envir.c
===================================================================
--- src/main/envir.c	(revision 81110)
+++ src/main/envir.c	(working copy)
@@ -4079,7 +4079,7 @@
 
 void attribute_hidden InitStringHash()
 {
-    R_StringHash = R_NewHashTable(char_hash_size);
+    R_StringHash = R_NewHashTable(char_hash_size * NUM_GC_GENERATIONS);
 }
 
 /* #define DEBUG_GLOBAL_STRING_HASH 1 */
@@ -4089,7 +4089,7 @@
 {
     SEXP old_table = R_StringHash;
     SEXP new_table, chain, new_chain, val, next;
-    unsigned int counter, new_hashcode, newmask;
+    unsigned int counter, new_slot, newmask;
 #ifdef DEBUG_GLOBAL_STRING_HASH
     unsigned int oldsize = HASHSIZE(R_StringHash);
     unsigned int oldpri = HASHPRI(R_StringHash);
@@ -4103,27 +4103,38 @@
     /* When using the ATTRIB fields to maintain the chains the chain
        moving is destructive and does not involve allocation.  This is
        therefore the only point where GC can occur. */
-    new_table = R_NewHashTable(newsize);
+    new_table = R_NewHashTable(newsize * NUM_GC_GENERATIONS);
     newmask = newsize - 1;
 
-    /* transfer chains from old table to new table */
-    for (counter = 0; counter < LENGTH(old_table); counter++) {
-	chain = VECTOR_ELT(old_table, counter);
-	while (!ISNULL(chain)) {
-	    val = CXHEAD(chain);
-	    next = CXTAIL(chain);
-	    new_hashcode = char_hash(CHAR(val), LENGTH(val)) & newmask;
-	    new_chain = VECTOR_ELT(new_table, new_hashcode);
-	    /* If using a primary slot then increase HASHPRI */
-	    if (ISNULL(new_chain))
-		SET_HASHPRI(new_table, HASHPRI(new_table) + 1);
-	    /* move the current chain link to the new chain */
-	    /* this is a destructive modification */
-	    new_chain = SET_CXTAIL(val, new_chain);
-	    SET_VECTOR_ELT(new_table, new_hashcode, new_chain);
-	    chain = next;
+	/* transfer chains from old table to new table */
+	for (int gen = 0; gen < NUM_GC_GENERATIONS; gen++) {
+	for (counter = gen; counter < LENGTH(old_table); counter = counter + NUM_GC_GENERATIONS) {
+		chain = VECTOR_ELT(old_table, counter);
+		while (!ISNULL(chain)) {
+		    val = CXHEAD(chain);
+		    next = CXTAIL(chain);
+		    new_slot = (char_hash(CHAR(val), LENGTH(val)) & newmask) * NUM_GC_GENERATIONS + gen;
+		    new_chain = VECTOR_ELT(new_table, new_slot);
+		    /* move the current chain link to the new chain */
+		    /* this is a destructive modification */
+		    new_chain = SET_CXTAIL(val, new_chain);
+		    SET_VECTOR_ELT(new_table, new_slot, new_chain);
+		    chain = next;
+		}
 	}
+	}
+
+    /* compute and set HASHPRI */
+    int nc = 0;
+    for (int i = 0; i < LENGTH(new_table); i = i + NUM_GC_GENERATIONS) {
+          int null_cntr = 0;
+          for (int gen = 0; gen < NUM_GC_GENERATIONS; gen++) {
+                null_cntr += VECTOR_ELT(new_table, i + gen) == R_NilValue;
+          }
+          if (null_cntr != NUM_GC_GENERATIONS) nc++;
     }
+    SET_HASHPRI(new_table, nc * NUM_GC_GENERATIONS);
+
     R_StringHash = new_table;
     char_hash_size = newsize;
     char_hash_mask = newmask;
@@ -4189,21 +4200,27 @@
     default: need_enc = 0;
     }
 
-    hashcode = char_hash(name, len) & char_hash_mask;
+    hashcode = (char_hash(name, len) & char_hash_mask) * NUM_GC_GENERATIONS;
 
     /* Search for a cached value */
     cval = R_NilValue;
-    chain = VECTOR_ELT(R_StringHash, hashcode);
-    for (; !ISNULL(chain) ; chain = CXTAIL(chain)) {
-	SEXP val = CXHEAD(chain);
-	if (TYPEOF(val) != CHARSXP) break; /* sanity check */
-	if (need_enc == (ENC_KNOWN(val) | IS_BYTES(val)) &&
-	    LENGTH(val) == len &&  /* quick pretest */
-	    (!len || (memcmp(CHAR(val), name, len) == 0))) { // called with len = 0
-	    cval = val;
-	    break;
+    int null_cntr = 0;
+	for (int gen = 0; gen < NUM_GC_GENERATIONS; gen++) {
+		chain = VECTOR_ELT(R_StringHash, hashcode + gen);
+		if (ISNULL(chain)) null_cntr++;
+		for (; !ISNULL(chain) ; chain = CXTAIL(chain)) {
+			SEXP val = CXHEAD(chain);
+			if (TYPEOF(val) != CHARSXP) break; /* sanity check */
+			if (need_enc == (ENC_KNOWN(val) | IS_BYTES(val)) &&
+			    LENGTH(val) == len &&  /* quick pretest */
+			    (!len || (memcmp(CHAR(val), name, len) == 0))) { // called with len = 0
+			    cval = val;
+			    break;
+			}
+		}
+		if (cval != R_NilValue) break;
 	}
-    }
+
     if (cval == R_NilValue) {
 	/* no cached value; need to allocate one and add to the cache */
 	PROTECT(cval = allocCharsxp(len));
@@ -4227,8 +4244,8 @@
 	SET_CACHED(cval);  /* Mark it */
 	/* add the new value to the cache */
 	chain = VECTOR_ELT(R_StringHash, hashcode);
-	if (ISNULL(chain))
-	    SET_HASHPRI(R_StringHash, HASHPRI(R_StringHash) + 1);
+	if (null_cntr == NUM_GC_GENERATIONS)
+	    SET_HASHPRI(R_StringHash, HASHPRI(R_StringHash) + NUM_GC_GENERATIONS);
 	/* this is a destructive modification */
 	chain = SET_CXTAIL(cval, chain);
 	SET_VECTOR_ELT(R_StringHash, hashcode, chain);
@@ -4237,9 +4254,14 @@
 	   protected.
 	   Maximum possible power of two is 2^30 for a VECSXP.
 	   FIXME: this has changed with long vectors.
+	   We have an actual limit of 2^29 since we need to store 3
+	   subtables - one for each gc generation - in the VECSXP.
 	*/
+#if NUM_GC_GENERATIONS > 3
+# error if the total number of gc generations is larger than 3, the resizing of R_StringHash must be adjusted
+#endif
 	if (R_HashSizeCheck(R_StringHash)
-	    && char_hash_size < 1073741824 /* 2^30 */)
+	    && char_hash_size < 536870912 /* 2^29 */)
 	    R_StringHash_resize(char_hash_size * 2);
 
 	UNPROTECT(1);
Index: src/main/memory.c
===================================================================
--- src/main/memory.c	(revision 81110)
+++ src/main/memory.c	(working copy)
@@ -527,6 +527,10 @@
 # error number of old generations must be 1 or 2
 #endif
 
+#if NUM_OLD_GENERATIONS + 1 != NUM_GC_GENERATIONS
+# error number of old generations must be 1 less than the total number of generations
+#endif
+
 #define NODE_GENERATION(s) ((s)->sxpinfo.gcgen)
 #define SET_NODE_GENERATION(s,g) ((s)->sxpinfo.gcgen=(g))
 
@@ -1852,27 +1856,44 @@
     if (R_StringHash != NULL) /* in case of GC during initialization */
     {
 	SEXP t;
+	for (gen = num_old_gens_to_collect; gen >= 0; gen--) {
+	for (i = gen; i < LENGTH(R_StringHash); i = i + NUM_GC_GENERATIONS) {
+		s = VECTOR_ELT(R_StringHash, i);
+		t = R_NilValue;
+		while (s != R_NilValue) {
+			if (! NODE_IS_MARKED(CXHEAD(s))) { /* remove unused CHARSXP and cons cell */
+				if (t == R_NilValue) /* head of list */
+					VECTOR_ELT(R_StringHash, i) = CXTAIL(s);
+				else
+					CXTAIL(t) = CXTAIL(s);
+				s = CXTAIL(s);
+				continue;
+			}
+			FORWARD_NODE(s);
+			FORWARD_NODE(CXHEAD(s));
+			t = s;
+			s = CXTAIL(s);
+		}
+		/* merge current chain into that of the next generation by putting
+		   the young in front of the old; t is the end of the current chain */
+		if (gen != NUM_OLD_GENERATIONS && VECTOR_ELT(R_StringHash, i) != R_NilValue) {
+			CXTAIL(t) = VECTOR_ELT(R_StringHash, i + 1);
+			VECTOR_ELT(R_StringHash, i + 1) = VECTOR_ELT(R_StringHash, i);
+			VECTOR_ELT(R_StringHash, i) = R_NilValue;
+		}
+	}
+	}
+
+	/* compute and set HASHPRI */
 	int nc = 0;
-	for (i = 0; i < LENGTH(R_StringHash); i++) {
-	    s = VECTOR_ELT(R_StringHash, i);
-	    t = R_NilValue;
-	    while (s != R_NilValue) {
-		if (! NODE_IS_MARKED(CXHEAD(s))) { /* remove unused CHARSXP and cons cell */
-		    if (t == R_NilValue) /* head of list */
-			VECTOR_ELT(R_StringHash, i) = CXTAIL(s);
-		    else
-			CXTAIL(t) = CXTAIL(s);
-		    s = CXTAIL(s);
-		    continue;
+	for (i = 0; i < LENGTH(R_StringHash); i = i + NUM_GC_GENERATIONS) {
+		int null_cntr = 0;
+		for (gen = 0; gen < NUM_GC_GENERATIONS; gen++) {
+			null_cntr += VECTOR_ELT(R_StringHash, i + gen) == R_NilValue;
 		}
-		FORWARD_NODE(s);
-		FORWARD_NODE(CXHEAD(s));
-		t = s;
-		s = CXTAIL(s);
-	    }
-	    if(VECTOR_ELT(R_StringHash, i) != R_NilValue) nc++;
+		if (null_cntr != NUM_GC_GENERATIONS) nc++;
 	}
-	SET_TRUELENGTH(R_StringHash, nc); /* SET_HASHPRI, really */
+	SET_TRUELENGTH(R_StringHash, nc * NUM_GC_GENERATIONS); /* SET_HASHPRI, really */
     }
     /* chains are known to be marked so don't need to scan again */
     FORWARD_AND_PROCESS_ONE_NODE(R_StringHash, VECSXP);
______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Reply via email to