[Rd] "bug" and patch: quadratic running time for strsplit(..., fixed=TRUE) (PR#9902)
Full_Name: John Brzustowski
Version: R-devel-trunk, R-2.4.0
OS: linux, gcc 4.0.3
Submission from: (NULL) (206.248.157.184)

This isn't a bug, but an easily-remedied performance issue.

SYMPTOM

> for (i in 1000 * (1:20)) {
    y <- paste(rep("asdf", times=i), collapse=" ")
    t <- system.time(strsplit(y, " ", fixed=TRUE))
    cat(sprintf("i=%5d time=%5d msec\n",i, round(1000*t[1])))
  }

i= 1000 time=    2 msec
i= 2000 time=    9 msec
i= 3000 time=   20 msec
i= 4000 time=   34 msec
i= 5000 time=   57 msec
i= 6000 time=   77 msec
i= 7000 time=  107 msec
i= 8000 time=  136 msec
i= 9000 time=  177 msec
i=10000 time=  230 msec
i=11000 time=  275 msec
i=12000 time=  308 msec
i=13000 time=  371 msec
i=14000 time=  446 msec
i=15000 time=  544 msec
i=16000 time=  639 msec
i=17000 time=  726 msec
i=18000 time=  864 msec
i=19000 time=  944 msec
i=20000 time= 1106 msec

DIAGNOSIS

strsplit() uses strlen() in the bounds check clause of a for(;;) statement, which forces a full scan of the source string for each character in the source string. Unlike R's LENGTH() macro, strlen for C strings is an expensive operation, and in this case (at least), gcc 4.0.3's -O2 level optimizer is not able to recognize the call as a loop invariant, despite the declaration "const char *buf".

REMEDIED BEHAVIOUR

i= 1000 time=    0 msec
i= 2000 time=    1 msec
i= 3000 time=    1 msec
i= 4000 time=    0 msec
i= 5000 time=    1 msec
i= 6000 time=    1 msec
i= 7000 time=    1 msec
i= 8000 time=    2 msec
i= 9000 time=    2 msec
i=10000 time=    2 msec
i=11000 time=    2 msec
i=12000 time=    2 msec
i=13000 time=    2 msec
i=14000 time=    2 msec
i=15000 time=    4 msec
i=16000 time=    3 msec
i=17000 time=    3 msec
i=18000 time=    4 msec
i=19000 time=    3 msec
i=20000 time=    4 msec

RELATED ISSUES

A simple search turns up other instances of this usage in R's source. For completeness, I'm submitting patches for all of them, but have not tested whether they in fact cause a detectable performance problem. In the case of modules/X11/dataentry.c, the patch also fixes a presumably ineffectual "bug".
$ grep -nR "for *([^;]*;[^;]*strlen *(" *
main/rlocale.c:137: for (i = 0; i < strlen(lc_str) && i < sizeof(lc_str); i++)
main/printutils.c:486: for(j = 0; j < strlen(buf); j++) *q++ = buf[j];
main/sysutils.c:493:for(j = 0; j < strlen(sub); j++) *outbuf++ = sub[j];
modules/X11/rotated.c:608: for(i=0; i 1 && strncmp(bufp, split, slen))) continue; ntok++;
@@ -480,7 +481,7 @@
     /* This is UTF-8 safe since it compares whole strings,
        but it would be more efficient to skip along by chars. */
-    for(; bufp - buf < strlen(buf); bufp++) {
+    for(; bufp < ebuf; bufp++) {
         if((slen == 1 && *bufp != *split) ||
            (slen > 1 && strncmp(bufp, split, slen))) continue;
         if(slen) {
Index: src/main/rlocale.c
===
--- src/main/rlocale.c (revision 42792)
+++ src/main/rlocale.c (working copy)
@@ -127,14 +127,14 @@
 int Ri18n_wcwidth(wchar_t c)
 {
     char lc_str[128];
-    unsigned int i;
+    unsigned int i, j;
     static char *lc_cache = "";
     static int lc = 0;
     if (0 != strcmp(setlocale(LC_CTYPE, NULL), lc_cache)) {
         strncpy(lc_str, setlocale(LC_CTYPE, NULL), sizeof(lc_str));
-        for (i = 0; i < strlen(lc_str) && i < sizeof(lc_str); i++)
+        for (i = 0, j = strlen(lc_str); i < j && i < sizeof(lc_str); i++)
             lc_str[i] = toupper(lc_str[i]);
         for (i = 0; i < (sizeof(cjk_locale_name)/sizeof(cjk_locale_name_t)); i++) {
Index: src/main/printutils.c
===
--- src/main/printutils.c (revision 42792)
+++ src/main/printutils.c (working copy)
@@ -483,7 +483,8 @@
     else
 #endif
         snprintf(buf, 11, "\\u%04x", k);
-    for(j = 0; j < strlen(buf); j++) *q++ = buf[j];
+    memcpy(q, buf, j = strlen(buf));
+    q += j;
     p += res;
     }
     i += (res - 1);
Index: src/main/sysutils.c
===
--- src/main/sysutils.c (revision 42792)
+++ src/main/sysutils.c (working copy)
@@ -490,8 +490,9 @@
         R_AllocStringBuffer(2*cbuff.bufsize, &cbuff);
         goto top_of_loop;
     }
-    for(j = 0; j < strlen(sub); j++) *outbuf++ = sub[j];
-    outb -= strlen(sub);
+    memcpy(outbuf, sub, j = strlen(sub));
+    outbuf += j;
+    outb -= j;
     }
     inbuf++; inb--;
     goto next_char;
Index: src/modules/
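The loop shape described under DIAGNOSIS can be illustrated with a minimal C sketch. This is not R's source: the function names count_sep_rescan and count_sep_hoisted are hypothetical, and both handle only a single-character separator. The first keeps strlen() in the loop condition, as the unpatched for statement does. The second computes an end pointer once before the loop, in the spirit of the patched condition "bufp < ebuf" (how the patch itself initialises ebuf is not shown in the surviving text, so the initialisation here is an assumption).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not from R's source: counts occurrences of the
 * single character sep in buf. strlen() sits in the loop condition, so
 * unless the compiler hoists the call, every iteration rescans the
 * string and the loop is O(n^2) overall. */
size_t count_sep_rescan(const char *buf, char sep)
{
    const char *bufp;
    size_t ntok = 0;

    assert(buf != NULL);
    for (bufp = buf; (size_t)(bufp - buf) < strlen(buf); bufp++)
        if (*bufp == sep)
            ntok++;
    return ntok;
}

/* Same result, but the end pointer is computed once before the loop
 * (assumed to match what the patch calls ebuf), so the scan is O(n). */
size_t count_sep_hoisted(const char *buf, char sep)
{
    const char *bufp, *ebuf;
    size_t ntok = 0;

    assert(buf != NULL);
    ebuf = buf + strlen(buf);   /* one full scan, done up front */
    for (bufp = buf; bufp < ebuf; bufp++)
        if (*bufp == sep)
            ntok++;
    return ntok;
}
```

Both functions return the same count for any input; only the running time differs as the string grows. Whether the first version is actually quadratic depends on the compiler and optimisation level, as the report notes for gcc 4.0.3 at -O2.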
Re: [Rd] orthographic mistake in ?citation (PR#9901)
On 07/09/2007 3:31 AM, [EMAIL PROTECTED] wrote:
> from ?citation
>
> Details:
>
> The R core development team and the very active community of
> package authors have invested a lot of time and effort in creating
> R as it is today. Please give credit where credit is due and cite
> R and R packages when you use them for data anlysis.

Fixed, thanks. BTW, no need to report spelling errors to the bug list: reporting them on R-devel is simpler.

Duncan Murdoch

> ^^^
>
> --> change "anlysis" to "analysis".
>
> Bye
>
> --please do not edit the information below--
>
> Version:
> platform = i486-pc-linux-gnu
> arch = i486
> os = linux-gnu
> system = i486, linux-gnu
> status =
> major = 2
> minor = 5.1
> year = 2007
> month = 06
> day = 27
> svn rev = 42083
> language = R
> version.string = R version 2.5.1 (2007-06-27)
>
> Locale:
> LC_CTYPE=it_IT.UTF-8;LC_NUMERIC=C;LC_TIME=it_IT.UTF-8;LC_COLLATE=it_IT.UTF-8;LC_MONETARY=it_IT.UTF-8;LC_MESSAGES=it_IT.UTF-8;LC_PAPER=it_IT.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=it_IT.UTF-8;LC_IDENTIFICATION=C
>
> Search Path:
> .GlobalEnv, package:datasets, package:rcompgen, package:grDevices,
> package:graphics, package:utils, package:stats, package:methods, Autoloads,
> package:base

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] orthographic mistake in ?citation (PR#9901)
from ?citation

Details:

The R core development team and the very active community of
package authors have invested a lot of time and effort in creating
R as it is today. Please give credit where credit is due and cite
R and R packages when you use them for data anlysis.
                                            ^^^

--> change "anlysis" to "analysis".

Bye

--please do not edit the information below--

Version:
platform = i486-pc-linux-gnu
arch = i486
os = linux-gnu
system = i486, linux-gnu
status =
major = 2
minor = 5.1
year = 2007
month = 06
day = 27
svn rev = 42083
language = R
version.string = R version 2.5.1 (2007-06-27)

Locale:
LC_CTYPE=it_IT.UTF-8;LC_NUMERIC=C;LC_TIME=it_IT.UTF-8;LC_COLLATE=it_IT.UTF-8;LC_MONETARY=it_IT.UTF-8;LC_MESSAGES=it_IT.UTF-8;LC_PAPER=it_IT.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=it_IT.UTF-8;LC_IDENTIFICATION=C

Search Path:
.GlobalEnv, package:datasets, package:rcompgen, package:grDevices,
package:graphics, package:utils, package:stats, package:methods, Autoloads,
package:base
[Rd] Rcmd install on Vista
Has anyone successfully used Rcmd install or Rcmd check on Windows Vista? I have successfully been running R itself, just not Rcmd install and Rcmd check.

Rcmd check fails: the Ryacas.Rcheck directory it creates is read-only, and I have to reset the permissions on it before I can delete it.

With Rcmd install I get the output below, which looks like a permissions problem; the same errors, of course, also show up in Rcmd check. I had installed R into C:\Program Files\R\R-2.6.0 and my library location is in C:/Users/... as seen below. I found I had to set tmpdir and R_LIBS appropriately for Rcmd check and Rcmd install to get even this far.

--- Making package Ryacas
adding build stamp to DESCRIPTION
installing R files
installing demos
installing inst files
find: C:/Users/ggroth/Documents/R/win-library/2.6/Ryacas/doc: Permission denied
find: C:/Users/ggroth/Documents/R/win-library/2.6/Ryacas/yacdir: Permission denied
...
make[2]: *** [C:/Users/ggroth/Documents/R/win-library/2.6/Ryacas/inst] Error 1
make[1]: *** [all] Error 2
make: *** [pkg-Ryacas] Error 2
*** installation of Ryacas failed ***
Removing 'C:/Users/ggroth/Documents/R/win-library/2.6/Ryacas'
Can't read C:/Users/ggroth/Documents/R/win-library/2.6/Ryacas/doc: Invalid argument at C:\PROGRA~1\R\R-26~1.0/bin/install line 434
Can't remove directory C:/Users/ggroth/Documents/R/win-library/2.6/Ryacas/doc: Directory not empty at C:\PROGRA~1\R\R-26~1.0/bin/install line 434
... and additional similar lines ...