[Rd] RFC: getifexists() {was [Bug 16065] "exists" ...}
In November, we had a "bug repository conversation" with Peter Hagerty and myself:

  https://bugs.r-project.org/bugzilla/show_bug.cgi?id=16065

where the bug report title started with

 --->> "exists" is a bottleneck for dispatch and package loading, ...

Peter proposed an extra simplified and hence faster version of exists(),
and I commented

> --- Comment #2 from Martin Maechler ---
> I'm very grateful that you've started exploring the bottlenecks of loading
> packages with many S4 classes (and methods)...
> and I hope we can make real progress there rather sooner than later.
>
> OTOH, your `summaryRprof()` in your vignette indicates that exists() may use
> up to 10% of the time spent in library(reportingTools), and your speedup
> proposals of exists() may go up to ca 30% which is good and well worth
> considering, but still we can only expect 2-3% speedup for package loading
> which unfortunately is not much.
>
> Still I agree it is worth looking at exists() as you did ... and
> consider providing a fast simplified version of it in addition to current
> exists() [I think].
>
> BTW, as we talk about enhancements here, maybe consider a further possibility:
> My subjective guess is that probably more than half of exists() uses are of the
> form
>
>    if(exists(name, where, ...)) {
>       get(name, where, ...)
>       ..
>    } else {
>       NULL / error() / .. or similar
>    }
>
> i.e. many exists() calls when returning TRUE are immediately followed by the
> corresponding get() call which repeats quite a bit of the lookup that exists()
> has done.
>
> Instead, I'd imagine a function, say getifexists(name, ...) that does both at
> once in the "exists is TRUE" case but in a way we can easily keep the if(.) ..
> else clause above.  One already existing approach would use
>
>    if(!inherits(tryCatch(xx <- get(name, where, ...), error=function(e)e), "error")) {
>
>      ... (( work with xx )) ...
>
>    } else {
>       NULL / error() / .. or similar
>    }
>
> but of course our C implementation would be more efficient and use more concise
> syntax {which should not look like error handling}.  Follow ups to this idea
> should really go to R-devel (the mailing list).

and now I follow up here myself:

I found that 'getifexists()' is actually very simple to implement,
I have already tested it a bit, but not yet committed to R-devel
(the "R trunk" aka "master branch") because I'd like to get
public comments {RFC := Request For Comments}.

My version of the help file {for both exists() and getifexists()}
rendered in text is

-- help(getifexists) ---
Is an Object Defined?

Description:

     Look for an R object of the given name and possibly return it

Usage:

     exists(x, where = -1, envir = , frame, mode = "any",
            inherits = TRUE)

     getifexists(x, where = -1, envir = as.environment(where),
                 mode = "any", inherits = TRUE, value.if.not = NULL)

Arguments:

       x: a variable name (given as a character string).

   where: where to look for the object (see the details section); if
          omitted, the function will search as if the name of the
          object appeared unquoted in an expression.

   envir: an alternative way to specify an environment to look in, but
          it is usually simpler to just use the ‘where’ argument.

   frame: a frame in the calling list.  Equivalent to giving ‘where’ as
          ‘sys.frame(frame)’.

    mode: the mode or type of object sought: see the ‘Details’ section.

inherits: should the enclosing frames of the environment be searched?

value.if.not: the return value of ‘getifexists(x, *)’ when ‘x’ does not
          exist.

Details:

     The ‘where’ argument can specify the environment in which to look
     for the object in any of several ways: as an integer (the position
     in the ‘search’ list); as the character string name of an element
     in the search list; or as an ‘environment’ (including using
     ‘sys.frame’ to access the currently active function calls).  The
     ‘envir’ argument is an alternative way to specify an environment,
     but is primarily there for back compatibility.
     This function looks to see if the name ‘x’ has a value bound to it
     in the specified environment.  If ‘inherits’ is ‘TRUE’ and a value
     is not found for ‘x’ in the specified environment, the enclosing
     frames of the environment are searched until the name ‘x’ is
     encountered.  See ‘environment’ and the ‘R Language Definition’
     manual for details about the structure of environments and their
     enclosures.

     *Warning:* ‘inherits = TRUE’ is the default behaviour for R but
     not for S.

     If ‘mode’ is specified then only objects of that type are sought.
     The ‘mode’ may specify one of the collections ‘"numeric"’ and ‘"
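For readers who want to try the proposed semantics before any C implementation exists, here is a rough pure-R sketch. It assumes only base R: mget() with its ifnotfound argument does the lookup once, and a freshly created environment serves as a private "not found" marker. The names getifexists_sketch() and get_via_trycatch() are hypothetical, and for simplicity the sketch takes envir directly instead of also handling the where and frame forms described in the help page.

```r
## Hypothetical pure-R sketch of the proposed getifexists() semantics.
## mget() performs a single lookup; a freshly created environment is used
## as a private "not found" marker, since no existing binding can be
## identical() to it.
getifexists_sketch <- function(x, envir = parent.frame(), mode = "any",
                               inherits = TRUE, value.if.not = NULL) {
  not_found <- new.env()
  val <- mget(x, envir = envir, mode = mode,
              ifnotfound = list(not_found), inherits = inherits)[[1L]]
  if (identical(val, not_found)) value.if.not else val
}

## The existing idiom from the quoted bug comment: call get() and treat an
## error condition as "not found".  (A stored object that is itself an
## error condition would be misread as missing.)
get_via_trycatch <- function(x, envir = parent.frame(), mode = "any",
                             inherits = TRUE, value.if.not = NULL) {
  res <- tryCatch(get(x, envir = envir, mode = mode, inherits = inherits),
                  error = function(e) e)
  if (base::inherits(res, "error")) value.if.not else res
}
```

With the default value.if.not = NULL, a binding whose value is NULL and a missing name give the same result; passing a distinctive value.if.not avoids that.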
Re: [Rd] RFC: getifexists() {was [Bug 16065] "exists" ...}
On 08/01/2015 4:16 AM, Martin Maechler wrote:
> [...]
> I found that 'getifexists()' is actually very simple to implement,
> I have already tested it a bit, but not yet committed to R-devel
> (the "R trunk" aka "master branch") because I'd like to get
> public comments {RFC := Request For Comments}.
>

I don't like the name -- I'd prefer getIfExists.  As Baath (2012, R
Journal) pointed out, R names are very inconsistent in naming
conventions, but lowerCamelCase is the most common choice.  Second most
common is period.separated, so an argument could be made for
get.if.exists, but there's still the possibility of confusion with S3
methods, and users of other languages where "." is an operator find it
a little strange.  If you don't like lowerCamelCase (and a lot of people
don't), then I think underscore_separated is the next best choice, so I
would use get_if_exists.

Another possibility is to make no new name at all, and just add an
optional parameter to get() (which if present acts as your value.if.not
parameter, and if not present keeps the current "object not found"
error).

Duncan Murdoch

> My version of the help file {for both exists() and getifexists()}
> rendered in text is
>
> -- help(getifexists) ---
> Is an Object Defined?
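Duncan's alternative, an optional argument to get() with no default, can be sketched in R with missing(). The wrapper name get_opt() is hypothetical, and when the argument is supplied it still performs two lookups (exists() then get()), so it only illustrates the interface, not the speed-up that motivated the RFC.

```r
## Hypothetical wrapper: an optional 'value.if.not' argument with no default.
## If it is not supplied, behave like get() and signal "object not found";
## if it is supplied, return it instead of failing.
get_opt <- function(x, envir = parent.frame(), mode = "any",
                    inherits = TRUE, value.if.not) {
  if (missing(value.if.not))
    return(get(x, envir = envir, mode = mode, inherits = inherits))
  ## Two lookups here: this shows the interface only, not the speed-up.
  if (exists(x, envir = envir, mode = mode, inherits = inherits))
    get(x, envir = envir, mode = mode, inherits = inherits)
  else
    value.if.not
}
```

Because value.if.not has no default, every existing call that omits it keeps today's error behaviour unchanged.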
Re: [Rd] gsub with perl=TRUE results in 'this version of PCRE is not compiled with Unicode property support' in R-devel
Why are you reporting that your PCRE library does not have something
which the R-admin manual says it should preferably have?  To wit,
footnote 37 says

'and not PCRE2, which started at version 10.0.  PCRE must be built with
UTF-8 support (not the default) and support for Unicode properties is
assumed by some R packages.  Neither are tested by configure.  JIT
support is desirable.'

That certainly does not fail on my Linux, Windows and OS X builds of
R-devel.  (Issues about pre-built binaries, if that is what you used,
should be reported to their maintainers, not here.)

And the help does say in ?regex

      In UTF-8 mode, some Unicode properties may be supported via
      ‘\p{xx}’ and ‘\P{xx}’ which match characters with and without
      property ‘xx’ respectively.

Note the 'may'.

On 07/01/2015 23:25, Dan Tenenbaum wrote:
> The following code:
>
> res <- gsub("(*UCP)\\b(i)\\b", "", "nhgrimelanomaclass", perl = TRUE)
>
> results in:
>
> Error in gsub(sprintf("(*UCP)\\b(%s)\\b", "i"), "", "nhgrimelanomaclass",  :
>   invalid regular expression '(*UCP)\b(i)\b'
> In addition: Warning message:
> In gsub(sprintf("(*UCP)\\b(%s)\\b", "i"), "", "nhgrimelanomaclass",  :
>   PCRE pattern compilation error
>     'this version of PCRE is not compiled with Unicode property support'
>     at '(*UCP)\b(i)\b'
>
> on
>
> R Under development (unstable) (2015-01-01 r67290)
> Platform: x86_64-apple-darwin13.4.0 (64-bit)
> Running under: OS X 10.9.5 (Mavericks)
>
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> And also on the same version of R-devel on Snow Leopard, Windows, and Linux.
>
> But it does not produce an error on
>
> R version 3.1.2 (2014-10-31)
> Platform: x86_64-apple-darwin13.4.0 (64-bit)
>
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> Dan
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

-- 
Brian D. Ripley,                  rip...@stats.ox.ac.uk
Emeritus Professor of Applied Statistics, University of Oxford
1 South Parks Road, Oxford OX1 3TG, UK
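A quick way to check whether the PCRE library a given R build uses accepts the (*UCP) verb from Dan's report is to try compiling such a pattern and catch the failure. This is only a sketch: pcre_has_ucp() is a hypothetical helper, not a base R function, and it probes nothing beyond (*UCP), so it is no substitute for running the regression tests after building R.

```r
## Hypothetical helper, not part of base R: TRUE if the PCRE library used by
## this R accepts the (*UCP) verb, FALSE if compiling such a pattern fails
## (as in the report, where a warning is followed by an error).
pcre_has_ucp <- function() {
  res <- tryCatch(
    suppressWarnings(gsub("(*UCP)\\bi\\b", "", "i x", perl = TRUE)),
    error = function(e) NULL
  )
  !is.null(res)
}

pcre_has_ucp()
```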
[Rd] On base::rank
Have a look at the following, taken from base::rank:

    ...
    if (!is.na(na.last) && any(nas)) {
        yy <- integer(length(x))             # <~
        storage.mode(yy) <- storage.mode(y)  # <
        yy <- NA
        NAkeep <- (na.last == "keep")
        if (NAkeep || na.last) {
            yy[!nas] <- y
            if (!NAkeep)
                yy[nas] <- (length(y) + 1L):length(yy)
        }
    ...

Alternatively, look at lines 36 and 37 here:
https://github.com/wch/r-source/blob/fbf5cdf29d923395b537a9893f46af1aa75e38f3/src/library/base/R/rank.R#L36

There seems to be no need for those lines, IIUC. Is that right?
'yy' is replaced with NA in the very next line.

Best,
Arun.
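To see that the two marked lines have no effect, the fragment can be copied into two hypothetical helper functions, one with and one without those lines, and compared on a small input for the na.last = TRUE case. The helper names are illustrative only and are not part of base R.

```r
## Hypothetical helpers copying the quoted fragment of base::rank() for the
## na.last = TRUE case; 'nas' is is.na(x), so length(nas) == length(x).
fill_na_ranks_old <- function(y, nas) {
  yy <- integer(length(nas))           # allocated ...
  storage.mode(yy) <- storage.mode(y)  # ... retyped ...
  yy <- NA                             # ... and immediately replaced
  yy[!nas] <- y                        # logical index extends yy to length(nas)
  yy[nas] <- (length(y) + 1L):length(yy)
  yy
}

fill_na_ranks_new <- function(y, nas) {
  yy <- NA
  yy[!nas] <- y
  yy[nas] <- (length(y) + 1L):length(yy)
  yy
}

x <- c(3, NA, 1, NA, 2)
nas <- is.na(x)
y <- rank(x[!nas])
identical(fill_na_ranks_old(y, nas), fill_na_ranks_new(y, nas))  # TRUE
fill_na_ranks_new(y, nas)                                        # 3 4 1 5 2
```

Because `yy <- NA` rebinds yy to a fresh length-one logical, the earlier allocation and storage.mode() change are simply discarded; the later sub-assignments extend yy to the length of x and coerce it to the type of y.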
Re: [Rd] New version of Rtools for Windows
Very timely, as this is how I got into the problem I posted about
earlier; maybe some of the problems I ran into will mean more to you
and the experts on this thread, Dr. Murdoch. For reference, I run
Windows 7 64bit, and I am trying to build a 64 bit version of R-3.1.2.

As we discussed offline, Dr. Murdoch, I've been trying to build R using
more recent tools than the GCC 4.6.3 prerelease. Ruben Van Boxem
(rubenvb) told me he is no longer developing his own builds of GCC, but
is focusing on MSYS2 and the mingw64 personal builds. So, similar to
what Jeroen said, I first installed MSYS2, whose initial installation on
Windows is not so simple[1]. After the initial install, the following
packages need to be manually installed: make, tar, zip, unzip, zlib,
and rsync. I also installed base-devel, which is way more than
necessary, but there may be packages in there which are necessary.

I originally installed the most up-to-date version of GCC (4.9.2)[2],
and I did pick the -seh version: since I install (almost) all packages
from source (the one exception being nloptr for now), the exception
handling should be consistent, and it is supposed to be up to ~15%
faster[3]. The initial build crashed with the following error:

gcc -std=gnu99 -m64 -I../../include -I. -DHAVE_CONFIG_H -O3 -Wall -pedantic -mtune=core2 -c xmalloc.c -o xmalloc.o
ar crs libtre.a regcomp.o regerror.o regexec.o tre-ast.o tre-compile.o tre-match-approx.o tre-match-backtrack.o tre-match-parallel.o tre-mem.o tre-parse.o tre-stack.o xmalloc.o
gcc -std=gnu99 -m64 -O3 -Wall -pedantic -mtune=core2 -c compat.c -o compat.o
compat.c:65:5: error: redefinition of 'snprintf'
 int snprintf(char *buffer, size_t max, const char *format, ...)
In file included from compat.c:3:0:
F:/MinGW64/x86_64-w64-mingw32/include/stdio.h:553:5: note: previous definition of 'snprintf' was here
 int snprintf (char * __restrict__ __stream, size_t __n, const char * __restrict__ __format, ...)
     ^
compat.c:75:5: error: redefinition of 'vsnprintf'
 int vsnprintf(char *buffer, size_t bufferSize, const char *format, va_list args)
     ^
In file included from compat.c:3:0:
F:/MinGW64/x86_64-w64-mingw32/include/stdio.h:543:7: note: previous definition of 'vsnprintf' was here
 int vsnprintf (char * __restrict__ __stream, size_t __n, const char * __restrict__ __format, va_list __local_argv)
     ^
../../gnuwin32/MkRules:218: recipe for target 'compat.o' failed
make[4]: *** [compat.o] Error 1
Makefile:120: recipe for target 'rlibs' failed
make[3]: *** [rlibs] Error 1
Makefile:179: recipe for target '../../bin/x64/R.dll' failed
make[2]: *** [../../bin/x64/R.dll] Error 2
Makefile:104: recipe for target 'rbuild' failed
make[1]: *** [rbuild] Error 2
Makefile:14: recipe for target 'all' failed
make: *** [all] Error 2

After doing some checking (for example see [4]), I asked Duncan about
the problem, and he suggested moving the #ifndef _W64 in compat.c up
above the offending lines (65-75). That did not work, so I figured
(mistakenly, it seems from the other thread) that if those functions are
already included from stdio, I can just delete them from compat.c. The
specific lines are:

int snprintf(char *buffer, size_t max, const char *format, ...)
{
    int res;
    va_list(ap);
    va_start(ap, format);
    res = trio_vsnprintf(buffer, max, format, ap);
    va_end(ap);
    return res;
}

int vsnprintf(char *buffer, size_t bufferSize, const char *format, va_list args)
{
    return trio_vsnprintf(buffer, bufferSize, format, args);
}

Continuing the build using 4.9.2 crashed again at the following point:

gcc -std=gnu99 -m64 -I../include -I. -I../extra -DHAVE_CONFIG_H -DR_DLL_BUILD -O3 -Wall -pedantic -mtune=core2 -c malloc.c -o malloc.o
windres -F pe-x86-64 -I../include -i dllversion.rc -o dllversion.o
gcc -std=gnu99 -m64 -shared -s -mwindows -o R.dll R.def console.o dynload.o editor.o embeddedR.o extra.o opt.o pager.o preferences.o psignal.o rhome.o rt_complete.o rui.o run.o shext.o sys-win32.o system.o dos_wglob.o malloc.o ../main/libmain.a ../appl/libappl.a ../nmath/libnmath.a getline/gl.a ../extra/xdr/libxdr.a ../extra/pcre/libpcre.a ../extra/bzip2/libbz2.a ../extra/intl/libintl.a ../extra/trio/libtrio.a ../extra/tzone/libtz.a ../extra/tre/libtre.a ../extra/xz/liblzma.a dllversion.o -fopenmp -L. -lgfortran -lRblas -L../../bin/x64 -lRzlib -lRgraphapp -lRiconv -lcomctl32 -lversion
collect2.exe: error: ld returned 5 exit status
Makefile:150: recipe for target 'R.dll' failed
make[3]: *** [R.dll] Error 1
Makefile:179: recipe for target '../../bin/x64/R.dll' failed
make[2]: *** [../../bin/x64/R.dll] Error 2
Makefile:104: recipe for target 'rbuild' failed
make[1]: *** [rbuild] Error 2
Makefile:14: recipe for target 'all' failed
make: *** [all] Error 2

As all those files existed in their correct places, the only reason I
could think of that this would fail here is that GCC version 4.9 did
make some changes to enhance link-time optimization [5], and probably
something isn't com
Re: [Rd] New version of Rtools for Windows
On 2015-01-08 14:18, Avraham Adler wrote:
> [...]
Re: [Rd] RFC: getifexists() {was [Bug 16065] "exists" ...}
Adding an optional argument to get (and mget) like

   val <- get(name, where, ..., value.if.not.found=NULL )     (*)

would be useful for many.  HOWEVER, it is possible that there could be
some confusion here: (*) can give a NULL because either x exists and
has value NULL, or because x doesn't exist.  If that matters, the user
would need to be careful about specifying a value.if.not.found that
cannot be confused with a valid value of x.

To avoid this difficulty, perhaps we want both: have Martin's
getifexists( ) return a list with two values:
   - a boolean variable 'found'   # = value returned by exists( )
   - a variable 'value'

Then implement get( ) as:

get <- function(x, ..., value.if.not.found) {
  if (missing(value.if.not.found)) {
    a <- getifexists(x, ...)
    if (!a$found) stop("object '", x, "' not found")
  } else {
    a <- getifexists(x, ..., value.if.not = value.if.not.found)
  }
  return(a$value)
}

Note that value.if.not.found has no default value in the above.  It
behaves exactly like the current get() if value.if.not.found is not
specified, and if it is specified, it would be faster in the common
situation mentioned below:

   if (exists(x, ...)) { get(x, ...) }

John

P.S. If you like dromedaries, call it valueIfNotFound ...

..
John P. Nolan
Math/Stat Department
227 Gray Hall, American University
4400 Massachusetts Avenue, NW
Washington, DC 20016-8050

jpno...@american.edu
voice: 202.885.3140
web: academic2.american.edu/~jpnolan
..

-"R-devel" wrote: -
To: Martin Maechler , R-devel@r-project.org
From: Duncan Murdoch
Sent by: "R-devel"
Date: 01/08/2015 06:39AM
Subject: Re: [Rd] RFC: getifexists() {was [Bug 16065] "exists" ...}

On 08/01/2015 4:16 AM, Martin Maechler wrote:
> [...]
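John's found/value design can be sketched in pure R as follows. lookup() and get_nolan() are hypothetical names; like any R-level stand-in, this assumes mget() with a private sentinel object for the single lookup, whereas the proposal itself would do the lookup in C. The check on a binding to NULL shows how the 'found' flag removes the ambiguity described above.

```r
## Hypothetical sketch of the proposal: a lookup returning both a 'found'
## flag and the 'value', and a get()-like wrapper built on top of it.
## Pure R stand-in: mget() does one lookup, a fresh environment marks
## "not found" (no existing binding can be identical() to it).
lookup <- function(x, envir = parent.frame(), mode = "any", inherits = TRUE) {
  not_found <- new.env()
  val <- mget(x, envir = envir, mode = mode,
              ifnotfound = list(not_found), inherits = inherits)[[1L]]
  found <- !identical(val, not_found)
  list(found = found, value = if (found) val else NULL)
}

## Without 'value.if.not.found' this errors like get(); with it, the
## supplied value is returned for a missing name.
get_nolan <- function(x, envir = parent.frame(), ..., value.if.not.found) {
  a <- lookup(x, envir = envir, ...)
  if (a$found) return(a$value)
  if (missing(value.if.not.found))
    stop(gettextf("object '%s' not found", x))
  value.if.not.found
}
```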
Re: [Rd] On base::rank
> Arunkumar Srinivasan
>     on Thu, 8 Jan 2015 13:46:57 +0100 writes:

    > Have a look at the following, taken from base::rank:
    > ...
    > if (!is.na(na.last) && any(nas)) {
    >     yy <- integer(length(x))             # <~
    >     storage.mode(yy) <- storage.mode(y)  # <
    >     yy <- NA
    >     NAkeep <- (na.last == "keep")
    >     if (NAkeep || na.last) {
    >         yy[!nas] <- y
    >         if (!NAkeep)
    >             yy[nas] <- (length(y) + 1L):length(yy)
    >     }
    > ...

    > Alternatively, look at lines 36 and 37 here:
    > https://github.com/wch/r-source/blob/fbf5cdf29d923395b537a9893f46af1aa75e38f3/src/library/base/R/rank.R#L36

    > There seems to be no need for those lines, IIUC. Is that right?
    > 'yy' is replaced with NA in the very next line.

Indeed.  Interesting that nobody has noticed till now,
even though that part has been world readable since at least 2008-08-25.

Note that the R source code is at
   http://svn.r-project.org/R/
and the file in question at
   http://svn.r-project.org/R/trunk/src/library/base/R/rank.R

where you can already see the new code
(given that 'x' was no longer needed, there's no need for 'xx').

Martin Maechler,
ETH Zurich
Re: [Rd] On base::rank
> Indeed.  Interesting that nobody has noticed till now,
> even though that part has been world readable since at least 2008-08-25.

That was what made me a bit unsure :-).

> Note that the R source code is at
>    http://svn.r-project.org/R/
> and the file in question at
>    http://svn.r-project.org/R/trunk/src/library/base/R/rank.R

Okay, thanks.

> where you can already see the new code
> (given that 'x' was no longer needed, there's no need for 'xx').

Great! Thanks again.

> Martin Maechler,
> ETH Zurich

Best,
Arun.
Re: [Rd] RFC: getifexists() {was [Bug 16065] "exists" ...}
On 08/01/2015 9:03 AM, John Nolan wrote:
> Adding an optional argument to get (and mget) like
>
>    val <- get(name, where, ..., value.if.not.found=NULL )     (*)

That would be a bad idea, as it would change behaviour of existing uses
of get().  What I suggested would not give a default.  If the arg was
missing, we'd get the old behaviour; if the arg was present, we'd use it.

I'm not sure this is preferable to the separate function
implementation.  This makes the documentation and implementation of
get() more complicated, and it would probably be slower for everyone.

Duncan Murdoch

> would be useful for many.  HOWEVER, it is possible that there could be
> some confusion here: (*) can give a NULL because either x exists and
> has value NULL, or because x doesn't exist.
> [...]
> -"R-devel" wrote: -
> To: Martin Maechler, R-devel@r-project.org
> From: Duncan Murdoch
> Sent by: "R-devel"
> Date: 01/08/2015 06:39AM
> Subject: Re: [Rd] RFC: getifexists() {was [Bug 16065] "exists" ...}
>
> On 08/01/2015 4:16 AM, Martin Maechler wrote:
> > [...]
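John's two-part return value can be prototyped in plain R to see how the calling code reads. This is a sketch under stated assumptions, not the C implementation being discussed: getifexists2() and get_or() are hypothetical names, and the single lookup is delegated to mget()'s existing ifnotfound argument, with a fresh environment as a sentinel so that an object bound to NULL is still reported as found.

```r
## Hypothetical sketch (not base R): getifexists2() returns
## list(found = <logical>, value = <object or NULL>) after ONE lookup,
## using mget()'s 'ifnotfound' argument with a private sentinel.
getifexists2 <- function(x, envir = parent.frame(), mode = "any",
                         inherits = TRUE) {
  absent <- new.env()   # a fresh environment is identical() only to itself
  val <- mget(x, envir = envir, mode = mode, inherits = inherits,
              ifnotfound = list(absent))[[1L]]
  if (identical(val, absent))
    list(found = FALSE, value = NULL)
  else
    list(found = TRUE, value = val)   # 'value' may legitimately be NULL
}

## get()-like wrapper in the spirit of John's proposal: with
## 'value.if.not.found' missing it signals an error like get();
## otherwise it returns that value for an unbound name.
get_or <- function(x, envir = parent.frame(), mode = "any", inherits = TRUE,
                   value.if.not.found) {
  a <- getifexists2(x, envir = envir, mode = mode, inherits = inherits)
  if (a$found) return(a$value)
  if (missing(value.if.not.found))
    stop(gettextf("object '%s' not found", x))
  value.if.not.found
}
```

For example, with `e <- new.env(); assign("a", NULL, envir = e)`, `getifexists2("a", envir = e)$found` is TRUE even though the value is NULL. Allocating a sentinel on every call is only acceptable in a prototype; a C version would test for the internal unbound marker instead.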
Re: [Rd] gsub with perl=TRUE results in 'this version of PCRE is not compiled with Unicode property support' in R-devel
Dan,

for OS X, there is a new pcre library posted at http://r.research.att.com/libs/ with a date stamp of Dec 28. This fixes this problem. You can test for this by running make check post compilation. It'll bang out with a failure if this is not in order. (And I know that all of this is described in R-admin.)

It would be helpful (time saving) if a message were posted to r-sig-mac whenever a new (version of a) library is added to http://r.research.att.com/libs/. I know it is adding more work to the helpful people who are doing all the heavy lifting.

Kasper

On Thu, Jan 8, 2015 at 7:06 AM, Prof Brian Ripley wrote:

> Why are you reporting that your PCRE library does not have something which
> the R-admin manual says it should preferably have? To wit, footnote 37 says
>
> 'and not PCRE2, which started at version 10.0. PCRE must be built with
> UTF-8 support (not the default) and support for Unicode properties is
> assumed by some R packages. Neither are tested by configure. JIT support is
> desirable.'
>
> That certainly does not fail on my Linux, Windows and OS X builds of
> R-devel. (Issues about pre-built binaries, if that is what you used,
> should be reported to their maintainers, not here.)
>
> And the help does say in ?regex
>
>   In UTF-8 mode, some Unicode properties may be supported via
>   ‘\p{xx}’ and ‘\P{xx}’ which match characters with and without
>   property ‘xx’ respectively.
>
> Note the 'may'.
>
> On 07/01/2015 23:25, Dan Tenenbaum wrote:
>
>> The following code:
>>
>> res <- gsub("(*UCP)\\b(i)\\b",
>> "", "nhgrimelanomaclass", perl = TRUE)
>>
>> results in:
>>
>> Error in gsub(sprintf("(*UCP)\\b(%s)\\b", "i"), "",
>> "nhgrimelanomaclass", :
>>   invalid regular expression '(*UCP)\b(i)\b'
>> In addition: Warning message:
>> In gsub(sprintf("(*UCP)\\b(%s)\\b", "i"), "", "nhgrimelanomaclass", :
>>   PCRE pattern compilation error
>>   'this version of PCRE is not compiled with Unicode property
>> support'
>>   at '(*UCP)\b(i)\b'
>>
>> on
>>
>> R Under development (unstable) (2015-01-01 r67290)
>> Platform: x86_64-apple-darwin13.4.0 (64-bit)
>> Running under: OS X 10.9.5 (Mavericks)
>>
>> locale:
>> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>>
>> attached base packages:
>> [1] stats graphics grDevices utils datasets methods base
>>
>> And also on the same version of R-devel on Snow Leopard, Windows, and
>> Linux. But it does not produce an error on
>>
>> R version 3.1.2 (2014-10-31)
>> Platform: x86_64-apple-darwin13.4.0 (64-bit)
>>
>> locale:
>> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>>
>> attached base packages:
>> [1] stats graphics grDevices utils datasets methods base
>>
>> Dan
>>
>> __
>> R-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>
> --
> Brian D. Ripley, rip...@stats.ox.ac.uk
> Emeritus Professor of Applied Statistics, University of Oxford
> 1 South Parks Road, Oxford OX1 3TG, UK
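Besides running make check after compiling, a quick probe from inside a running R session is to try the (*UCP) pattern from Dan's report and see whether PCRE accepts it. This is a sketch only: pcre_has_ucp() is a hypothetical helper, and it probes this one pattern, not every PCRE feature that R-admin asks for.

```r
## Hypothetical helper: does this R's PCRE accept the (*UCP) pattern from
## Dan's report?  Returns TRUE/FALSE instead of signalling an error.
pcre_has_ucp <- function() {
  tryCatch({
    gsub("(*UCP)\\b(i)\\b", "", "nhgrimelanomaclass", perl = TRUE)
    TRUE
  },
  ## an unsupported build warns about the PCRE compilation error and then
  ## errors; treat either condition as "not supported"
  warning = function(w) FALSE,
  error   = function(e) FALSE)
}

pcre_has_ucp()
```

Note that any warning raised while compiling this pattern is treated as "no", so a FALSE result is worth following up with make check as described above.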
Re: [Rd] RFC: getifexists() {was [Bug 16065] "exists" ...}
If we do add an argument to get(), then it should be named consistently with the ifnotfound argument of mget(). As mentioned, the possibility of a NULL value is problematic. One solution is a sentinel value that indicates an unbound value (like R_UnboundValue).

But another idea (and one pretty similar to John's) is to follow the SYMSXP design at the C level, where there is a structure that points to the name and a value. We already have SYMSXPs at the R level of course (name objects), but they do not provide access to the value, which is typically R_UnboundValue. But this does not even need to be implemented with SYMSXP. The design would allow something like:

binding <- getBinding("x", env)
if (hasValue(binding)) {
  x <- value(binding) # throws an error if none
  message(name(binding), " has value ", x)
}

That, I think, is a bit verbose but readable and could be made fast. And I think binding objects would be useful in other ways, as they are essentially a "named object", for example when iterating over an environment.

Michael

On Thu, Jan 8, 2015 at 6:03 AM, John Nolan wrote:
> Adding an optional argument to get (and mget) like
>
> val <- get(name, where, ..., value.if.not.found=NULL )   (*)
>
> [...]
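Michael's binding interface can be mocked up at the R level to see how it reads. This is only a sketch under stated assumptions: getBinding(), hasValue(), bindingValue() and bindingName() are hypothetical names (value() and name() are renamed to avoid generic-sounding globals), and because the mock is built on exists() and get() it does extra lookups, so it illustrates the interface, not the speed Michael has in mind. hasValue() re-checks the environment on every call, which is one possible policy for a variable that is rm()'d after the binding was obtained.

```r
## Hypothetical R-level mock of binding objects (not base R functions).
## A binding records the name and the environment that defines it.
getBinding <- function(name, env = parent.frame(), inherits = TRUE) {
  e <- env
  repeat {
    if (exists(name, envir = e, inherits = FALSE)) break
    if (!inherits || identical(e, emptyenv())) { e <- NULL; break }
    e <- parent.env(e)            # walk enclosing frames, like get()
  }
  structure(list(name = name, env = e), class = "binding")
}

## Re-check at use time: the variable may have been removed since.
hasValue <- function(b)
  !is.null(b$env) && exists(b$name, envir = b$env, inherits = FALSE)

bindingValue <- function(b) {
  if (!hasValue(b)) stop("binding '", b$name, "' has no value")
  get(b$name, envir = b$env, inherits = FALSE)
}

bindingName <- function(b) b$name

## Usage
env <- new.env()
assign("x", 42, envir = env)
b <- getBinding("x", env)
if (hasValue(b)) message(bindingName(b), " has value ", bindingValue(b))
rm("x", envir = env)
hasValue(b)   # FALSE: the binding re-checks its environment
```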
Re: [Rd] RFC: getifexists() {was [Bug 16065] "exists" ...}
> Adding an optional argument to get (and mget) like
>   val <- get(name, where, ..., value.if.not.found=NULL )   (*)
> would be useful for many. HOWEVER, it is possible that there could be
> some confusion here: (*) can give a NULL because either x exists and
> has value NULL, or because x doesn't exist. If that matters, the user
> would need to be careful about specifying a value.if.not.found that cannot
> be confused with a valid value of x.

Exactly -- well, of course: that problem { NULL can be the legit value of what you want to get() } was the only reason to have a 'value.if.not' argument at all.

Note that this is not about a universal replacement of the

   if(exists(..)) { .. get(..) }

idiom, but rather a replacement of these in the cases where speed matters very much, which is e.g. in the low-level support code for S4 method dispatch.

'value.if.not.found': Note that CRAN checks require all arguments to be written in full length. Even though we have auto completion in ESS, RStudio or other good R IDEs, I very much like to keep function calls somewhat compact. And yes, as you mention the dromedaries aka 2-hump camels: getIfExist is already horrible to my taste (and "_" is not S-like; yes, that's all very much a matter of taste, and yes, I'm from the 20th century).

> To avoid this difficulty, perhaps we want both: have Martin's getifexists( )
> return a list with two values:
>    - a boolean variable 'found'   # = value returned by exists( )
>    - a variable 'value'
> Then implement get( ) as:
> get <- function(x, ..., value.if.not.found) {
>   if (missing(value.if.not.found)) {
>     a <- getifexists(x, ...)
>     if (!a$found) stop("object '", x, "' not found")
>   } else {
>     a <- getifexists(x, ..., value.if.not.found = value.if.not.found)
>   }
>   return(a$value)
> }

Interesting... Note that the above get() implementation would just be "conceptual", as all of this is also quite a bit about speed, and we do the different cases in C anyway [via 'op' code].

> Note that value.if.not.found has no default value in the above.
> It behaves exactly like current get does if value.if.not.found
> is not specified, and if it is specified, it would be faster
> in the common situation mentioned below:
>   if(exists(x,...)) { get(x,...) }

Good...

Let's talk about your getifexists(), as I argue we'd keep get() exactly as it is now anyway if we use a new 3rd function (I keep calling it 'getifexists()' for now):

I think in that case, getifexists() would not even *need* an argument 'value.if.not' (or 'value.if.not.found'); it rather would return a list(found = *, value = *) in any case.

Alternatively, it could return structure(, value = *)

In the first case, our main use case would be

   if((r <- getifexists(x, *))$found) {
      ## work with r$value
   }

in the 2nd case {structure}:

   if((r <- getifexists(x, *))) {
      ## work with attr(r,"value")
   }

I think that (both cases) would still be a bit slower (for the above most important use case), but probably not much, and it would look slightly more readable than my

   if (!is.null(r <- getifexists(x, *))) {
      ## work with r
   }

After all of this, I think I'd still somewhat prefer my original proposal, but not strongly -- I had originally also thought of returning the two parts explicitly, but then tended to prefer the version that behaved exactly like get() in the case the object is found.

... Nice interesting ideas! ... let the proposals and considerations flow ...

Martin

> John
> P.S. if you like dromedaries call it valueIfNotFound ...

:-) ;-) I don't .. as I said above, I already strongly dislike more than one hump.
[ Each capital is one keystroke ("Shift") more, and each "_" is two keystrokes more on most keyboards..., and I do like identifiers that I can also quickly pronounce on the phone or in teaching .. ]
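Two of the return conventions Martin compares can be prototyped in R to see how each if() idiom reads. Everything below is hypothetical: gie_value() mimics his original proposal (return the object as get() would, else value.if.not, default NULL), gie_attr() mimics the structure() variant (TRUE/FALSE with the object in a "value" attribute), and both delegate the single lookup to mget()'s ifnotfound argument with a private sentinel, whereas the real proposal would be written in C.

```r
## Hypothetical prototypes (not base R) of two return conventions.
## Each performs a single lookup via mget(ifnotfound = ).
.absent <- new.env()   # private sentinel; identical() only to itself

lookup1 <- function(x, envir, mode, inherits)
  mget(x, envir = envir, mode = mode, inherits = inherits,
       ifnotfound = list(.absent))[[1L]]

## Original proposal: like get() when found, else 'value.if.not'
gie_value <- function(x, envir = parent.frame(), mode = "any",
                      inherits = TRUE, value.if.not = NULL) {
  v <- lookup1(x, envir, mode, inherits)
  if (identical(v, .absent)) value.if.not else v
}

## structure() variant: TRUE/FALSE, with the object in attr(, "value")
gie_attr <- function(x, envir = parent.frame(), mode = "any",
                     inherits = TRUE) {
  v <- lookup1(x, envir, mode, inherits)
  if (identical(v, .absent)) FALSE else structure(TRUE, value = v)
}

## The two idioms from the message
e <- new.env()
assign("a", 1:3, envir = e)
if (!is.null(r <- gie_value("a", envir = e))) print(r)
if ((r <- gie_attr("a", envir = e))) print(attr(r, "value"))
```

The trade-off shows up with an object bound to NULL: gie_value() with the default value.if.not cannot tell it apart from a missing one, while gie_attr() still returns TRUE (its "value" attribute is simply absent, since attributes cannot hold NULL).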
[Rd] unloadNamespace
In the documentation, the closest thing I see to an explanation of this is that ?detach says "Unloading some namespaces has undesirable side effects".

Can anyone explain why unloading tseries will load zoo? I don't think this behavior is specific to tseries; it's just an example. I realize one would not usually unload something that is not loaded, but I would expect it to do nothing or give an error. I only discovered this when trying to clean up to debug another problem.

R version 3.1.2 (2014-10-31) -- "Pumpkin Helmet"
and
R Under development (unstable) (2015-01-02 r67308) -- "Unsuffered Consequences"
...
Type 'q()' to quit R.

> loadedNamespaces()
[1] "base" "datasets" "graphics" "grDevices" "methods" "stats"
[7] "utils"
> unloadNamespace("tseries") # loads zoo ?
> loadedNamespaces()
[1] "base" "datasets" "graphics" "grDevices" "grid" "lattice"
[7] "methods" "quadprog" "stats" "utils" "zoo"
>

Somewhat related, is there an easy way to get back to a "clean" state for loaded and attached things, as if R had just been started? I'm trying to do this in a vignette, so it is not easy to stop and restart R.

Paul
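For the vignette use case, one hedged workaround is to call unloadNamespace() only on namespaces that are actually loaded, and to unload whatever was loaded after a recorded baseline. unload_if_loaded() and restore_namespaces() are hypothetical helpers, and this is not a full reset: it does not undo options(), registered S4 classes and methods, or other side effects of loading a package.

```r
## Hypothetical helpers (not part of base R).
## unload_if_loaded(): only unload a namespace that is already loaded,
## so the call cannot load anything as a side effect.
unload_if_loaded <- function(pkg) {
  if (!pkg %in% loadedNamespaces()) return(invisible(FALSE))
  ok <- tryCatch({ unloadNamespace(pkg); TRUE },
                 error = function(e) FALSE)  # e.g. still imported elsewhere
  invisible(ok)
}

## restore_namespaces(): try to unload everything loaded since 'baseline'
## was recorded.  A namespace imported by another can only go after its
## importer, so keep retrying while progress is being made.
restore_namespaces <- function(baseline) {
  repeat {
    extra <- setdiff(loadedNamespaces(), baseline)
    if (length(extra) == 0L) break
    done <- vapply(extra, unload_if_loaded, logical(1))
    if (!any(done)) break              # give up on what cannot be unloaded
  }
  invisible(setdiff(loadedNamespaces(), baseline))  # whatever is left over
}

## Usage in a vignette:
## baseline <- loadedNamespaces()
## ... library(tseries); ...
## leftover <- restore_namespaces(baseline)
```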
Re: [Rd] New version of Rtools for Windows
The r.dll crash is easy - you need to be using gcc-ar for ar, and gcc-ranlib for ranlib. I also posted a patch to fix the check failure for stack probing, as lto optimizes away the stack probing code, as it should.

Yes, the lto build's speed gain is very impressive.

--
On Thu, Jan 8, 2015 2:01 PM GMT Henric Winell wrote:

>On 2015-01-08 14:18, Avraham Adler wrote:
>
>> Very timely, as this is how I got into the problem I posted about
>> earlier; maybe some of the problems I ran into will mean more to
>> you and the experts on this thread, Dr. Murdoch. For reference, I run
>> Windows 7 64bit, and I am trying to build a 64 bit version of R-3.1.2.
>>
>> As we discussed offline, Dr. Murdoch, I've been trying to build R
>> using more recent tools than GCC 4.6.3 prerelease. Ruben Von Boxen
>> (rubenvb) told me he is no longer developing his own builds of GCC,
>> but is focusing on MSYS2 and the mingw64 personal builds. So, similar
>> to what Jeroen said, I first installed MSYS2, whose initial
>> installation on Windows is not so simple[1]. After the initial
>> install, the following packages need to be manually installed: make,
>> tar, zip, unzip, zlib, and rsync. I also installed base-devel, which
>> is way more than necessary, but there may be packages in there which
>> are necessary.
>>
>> I originally installed the most up-to-date version of GCC (4.9.2)[2],
>> and I did pick the -seh version, as since I install (almost) all
>> packages from source (the one exception being nloptr for now), the
>> exception handling should be consistent and it is supposed to be up to
>> ~15% faster[3].
>>
>> The initial build crashed with the following error:
>>
>> gcc -std=gnu99 -m64 -I../../include -I. -DHAVE_CONFIG_H -O3 -Wall
>> -pedantic -mtune=core2 -c xmalloc.c -o xmalloc.o
>> ar crs libtre.a regcomp.o regerror.o regexec.o tre-ast.o tre-compile.o
>> tre-match-approx.o tre-match-backtrack.o tre-match-parallel.o
>> tre-mem.o tre-parse.o tre-stack.o xmalloc.o
>> gcc -std=gnu99 -m64 -O3 -Wall -pedantic -mtune=core2 -c compat.c -o
>> compat.o
>> compat.c:65:5: error: redefinition of 'snprintf'
>> int snprintf(char *buffer, size_t max, const char *format, ...)
>> ^
>> In file included from compat.c:3:0:
>> F:/MinGW64/x86_64-w64-mingw32/include/stdio.h:553:5: note: previous
>> definition of 'snprintf' was here
>> int snprintf (char * __restrict__ __stream, size_t __n, const char *
>> __restrict__ __format, ...)
>> ^
>> compat.c:75:5: error: redefinition of 'vsnprintf'
>> int vsnprintf(char *buffer, size_t bufferSize, const char *format,
>> va_list args)
>> ^
>> In file included from compat.c:3:0:
>> F:/MinGW64/x86_64-w64-mingw32/include/stdio.h:543:7: note: previous
>> definition of 'vsnprintf' was here
>> int vsnprintf (char * __restrict__ __stream, size_t __n, const char
>> * __restrict__ __format, va_list __local_argv)
>> ^
>> ../../gnuwin32/MkRules:218: recipe for target 'compat.o' failed
>> make[4]: *** [compat.o] Error 1
>> Makefile:120: recipe for target 'rlibs' failed
>> make[3]: *** [rlibs] Error 1
>> Makefile:179: recipe for target '../../bin/x64/R.dll' failed
>> make[2]: *** [../../bin/x64/R.dll] Error 2
>> Makefile:104: recipe for target 'rbuild' failed
>> make[1]: *** [rbuild] Error 2
>> Makefile:14: recipe for target 'all' failed
>> make: *** [all] Error 2
>>
>> After doing some checking (for example see [4]), I asked Duncan about
>> the problem, and he suggested moving the #ifndef _W64 in compat.c up
>> above the offending lines (65-75). That did not work, so I figured
>> (it seems mistakenly, from the other thread) that if those functions
>> are included from stdio already, I can just delete them from compat.c.
>> The specific lines are:
>>
>> int snprintf(char *buffer, size_t max, const char *format, ...)
>> {
>>     int res;
>>     va_list(ap);
>>     va_start(ap, format);
>>     res = trio_vsnprintf(buffer, max, format, ap);
>>     va_end(ap);
>>     return res;
>> }
>>
>> int vsnprintf(char *buffer, size_t bufferSize, const char *format, va_list
>> args)
>> {
>>     return trio_vsnprintf(buffer, bufferSize, format, args);
>> }
>>
>> Continuing the build using 4.9.2 crashed again at the following point:
>>
>> gcc -std=gnu99 -m64 -I../include -I. -I../extra -DHAVE_CONFIG_H
>> -DR_DLL_BUILD -O3 -Wall -pedantic -mtune=core2 -c malloc.c -o
>> malloc.o
>> windres -F pe-x86-64 -I../include -i dllversion.rc -o dllversion.o
>> gcc -std=gnu99 -m64 -shared -s -mwindows -o R.dll R.def console.o
>> dynload.o editor.o embeddedR.o extra.o opt.o pager.o preferences.o
>> psignal.o rhome.o rt_complete.o rui.o run.o shext.o sys-win32.o
>> system.o dos_wglob.o malloc.o ../main/libmain.a ../appl/libappl.a
>> ../nmath/libnmath.a getline/gl.a ../extra/xdr/libxdr.a
>> ../extra/pcre/libpcre.a ../extra/bzip2/libbz2.a
>> ../extra/intl/libintl.a ../extra/trio/libtrio.a ../extra/tzone/libtz.a
>> ../extra/tre/libtre.a ../extra/xz/liblzma.a dllversion.
Re: [Rd] unloadNamespace
Paul,

My switchr package (https://github.com/gmbecker/switchr) has the flushSession function, which does what you want and seems to work (on my test machine at least).

I haven't tested it under a recent R-devel, or with that specific package, however; I will soon, as the overarching model of switchr relies on this working. If you do try it before me with that package, please let me know whether it works or not.

~G

On Thu, Jan 8, 2015 at 7:45 AM, Paul Gilbert wrote:
> [...]

--
Gabriel Becker, PhD
Alumnus
Statistics Department
University of California, Davis
Re: [Rd] New version of Rtools for Windows
On Thu, Jan 8, 2015 at 10:48 AM, Hin-Tak Leung wrote:
>
> The r.dll crash is easy - you need to be using gcc-ar for ar, and gcc-ranlib
> for ranlib. I also posted a patch to fix the check failure for stack probing,
> as lto optimizes away the stack probing code, as it should.
>
> yes, lto build's speed gain is very impressive.
>

I apologize for my ignorance, but how would I do that? I tried by changing the following in src/gnuwin32/MkRules.local:

# prefix for 64-bit: path or x86_64-w64-mingw32-
BINPREF64 = x86_64-w64-mingw32-gcc-

I added the gcc- as the suffix there, but I guess that is insufficient, as I still get the following error using 4.9.2:

windres -F pe-x86-64 -I../include -i dllversion.rc -o dllversion.o
gcc -std=gnu99 -m64 -shared -s -mwindows -o R.dll R.def console.o
dynload.o editor.o embeddedR.o extra.o opt.o pager.o preferences.o
psignal.o rhome.o rt_complete.o rui.o run.o shext.o sys-win32.o
system.o dos_wglob.o malloc.o ../main/libmain.a ../appl/libappl.a
../nmath/libnmath.a getline/gl.a ../extra/xdr/libxdr.a
../extra/pcre/libpcre.a ../extra/bzip2/libbz2.a
../extra/intl/libintl.a ../extra/trio/libtrio.a ../extra/tzone/libtz.a
../extra/tre/libtre.a ../extra/xz/liblzma.a dllversion.o -fopenmp -L.
-lgfortran -lRblas -L../../bin/x64 -lRzlib -lRgraphapp -lRiconv
-lcomctl32 -lversion
collect2.exe: error: ld returned 5 exit status
Makefile:150: recipe for target 'R.dll' failed
make[3]: *** [R.dll] Error 1
Makefile:179: recipe for target '../../bin/x64/R.dll' failed
make[2]: *** [../../bin/x64/R.dll] Error 2
Makefile:104: recipe for target 'rbuild' failed
make[1]: *** [rbuild] Error 2
Makefile:14: recipe for target 'all' failed
make: *** [all] Error 2

I still had to delete those lines in compat.c, so this build, were it to have completed, would still be subject to the non-conformance of scientific notation printing that was discussed earlier.

Hin-tak, any suggestions for this error (and the compat.c for that matter) that you, or any reader of this list, may have would be greatly appreciated.

Thank you!

Avi
Re: [Rd] New version of Rtools for Windows
Oh, I forgot to mention that besides setting AR, RANLIB and the stack probing fix, you also need a very up-to-date binutils; 2.25 was out in December. Even with that, if your linker's default is not what you are compiling for (i.e. a multiarch toolchain), you need to set GNUTARGET also, i.e. -m32/-m64 is not enough. Some fix to autodetect non-default targets went in after Christmas, before the new year, but I am not brave enough to try that on a daily basis yet (I only tested it and reported it, then reverted the change - how gcc invokes the linker is rather complicated and it is not easy to have two binutils installed...) - setting GNUTARGET seems safer :-). Whether you need that depends on whether you are compiling for your toolchain's default target architecture.

AR, RANLIB and GNUTARGET are all environment variables - you set them the usual way. The stack probing fix is for passing "make check", when you finish make.

--
On Thu, Jan 8, 2015 6:14 PM GMT Avraham Adler wrote:

>I apologize for my ignorance, but how would I do that? I tried by
>changing the following in src/gnuwin32/MkRules.local:
>[...]
Re: [Rd] RFC: getifexists() {was [Bug 16065] "exists" ...}
On Thu, 8 Jan 2015, Michael Lawrence wrote: If we do add an argument to get(), then it should be named consistently with the ifnotfound argument of mget(). As mentioned, the possibility of a NULL value is problematic. One solution is a sentinel value that indicates an unbound value (like R_UnboundValue). A null default is fine -- it's a default; if it isn't right for a particular case you can provide something else. But another idea (and one pretty similar to John's) is to follow the SYMSXP design at the C level, where there is a structure that points to the name and a value. We already have SYMSXPs at the R level of course (name objects) but they do not provide access to the value, which is typically R_UnboundValue. But this does not even need to be implemented with SYMSXP. The design would allow something like: binding <- getBinding("x", env) if (hasValue(binding)) { x <- value(binding) # throws an error if none message(name(binding), "has value", x) } That I think it is a bit verbose but readable and could be made fast. And I think binding objects would be useful in other ways, as they are essentially a "named object". For example, when iterating over an environment. This would need a lot more thought. Directly exposing the internals is definitely not something we want to do as we may well want to change that design. But there are lots of other corner issues that would have to be thought through before going forward, such as what happens if an rm occurs between obtaining a binding object and doing something with it. Serialization would also need thinking through. This doesn't seem like a worthwhile place to spend our efforts to me. Adding getIfExists, or .get, or get0, or whatever seems fine. Adding an argument to get() with missing giving current behavior may be OK too. Rewriting exists and get as .Primitives may be sufficient though. 
Best,

luke

> Michael
>
> On Thu, Jan 8, 2015 at 6:03 AM, John Nolan wrote:
>
>> Adding an optional argument to get (and mget) like
>>
>>   val <- get(name, where, ..., value.if.not.found=NULL )   (*)
>>
>> would be useful for many. HOWEVER, it is possible that there could be
>> some confusion here: (*) can give a NULL because either x exists and
>> has value NULL, or because x doesn't exist. If that matters, the user
>> would need to be careful about specifying a value.if.not.found that
>> cannot be confused with a valid value of x.
>>
>> To avoid this difficulty, perhaps we want both: have Martin's
>> getifexists() return a list with two values:
>>   - a boolean variable 'found'  # = value returned by exists()
>>   - a variable 'value'
>>
>> Then implement get() as:
>>
>>   get <- function(x, ..., value.if.not.found) {
>>     if (missing(value.if.not.found)) {
>>       a <- getifexists(x, ...)
>>       if (!a$found) stop("object '", x, "' not found")
>>     } else {
>>       a <- getifexists(x, ..., value.if.not = value.if.not.found)
>>     }
>>     return(a$value)
>>   }
>>
>> Note that value.if.not.found has no default value in the above.
>> It behaves exactly like the current get() does if value.if.not.found
>> is not specified, and if it is specified, it would be faster
>> in the common situation mentioned below:
>>   if (exists(x, ...)) { get(x, ...) }
>>
>> John
>>
>> P.S. if you like dromedaries call it valueIfNotFound ...
>>
>> ..
>> John P. Nolan
>> Math/Stat Department
>> 227 Gray Hall, American University
>> 4400 Massachusetts Avenue, NW
>> Washington, DC 20016-8050
>>
>> jpno...@american.edu   voice: 202.885.3140
>> web: academic2.american.edu/~jpnolan
>> ..
>>
>> -"R-devel" wrote: -
>> To: Martin Maechler, R-devel@r-project.org
>> From: Duncan Murdoch
>> Sent by: "R-devel"
>> Date: 01/08/2015 06:39AM
>> Subject: Re: [Rd] RFC: getifexists() {was [Bug 16065] "exists" ...}
>>
>> On 08/01/2015 4:16 AM, Martin Maechler wrote:
>> > In November, we had a "bug repository conversation"
>> > with Peter Haverty and myself:
>> >
>> > https://bugs.r-project.org/bugzilla/show_bug.cgi?id=16065
>> >
>> > where the bug report title started with
>> >
>> > --->> "exists" is a bottleneck for dispatch and package loading, ...
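John's "found + value" design can be prototyped in plain R to check the interface. This is a sketch under assumptions, not the proposed C implementation: `getifexists` here follows John's list-returning variant (Martin's version returns the value directly), `get2` is a hypothetical name chosen to avoid masking base::get(), and because it still calls exists() and then get(), it does two lookups and gives none of the intended speedup.

```r
# Hypothetical getifexists() returning both a 'found' flag and the value,
# so an object whose value is NULL cannot be confused with a missing one.
# Plain-R sketch: exists() + get() means two lookups.
getifexists <- function(x, envir = parent.frame(), mode = "any",
                        inherits = TRUE, value.if.not = NULL) {
  if (exists(x, envir = envir, mode = mode, inherits = inherits))
    list(found = TRUE,
         value = get(x, envir = envir, mode = mode, inherits = inherits))
  else
    list(found = FALSE, value = value.if.not)
}

# Hypothetical get()-like wrapper: with value.if.not.found missing it
# signals an error for a missing object, as get() does; otherwise it
# returns the supplied default.
get2 <- function(x, envir = parent.frame(), mode = "any", inherits = TRUE,
                 value.if.not.found) {
  has.default <- !missing(value.if.not.found)
  a <- getifexists(x, envir = envir, mode = mode, inherits = inherits,
                   value.if.not = if (has.default) value.if.not.found)
  if (!a$found && !has.default)
    stop("object '", x, "' not found")
  a$value
}

e <- new.env()
assign("n", NULL, envir = e)
getifexists("n", e, inherits = FALSE)$found              # TRUE, value is NULL
get2("zz", e, inherits = FALSE, value.if.not.found = 0)  # 0
```

Keeping `value.if.not.found` without a default, as John suggests, is what lets `missing()` preserve the current error behaviour.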
[Rd] Testing R packages on Solaris Studio
I have set up a Solaris server to test packages before submitting to CRAN, in order to catch problems that might not reveal themselves on Fedora, Debian, OS X or Windows. The machine runs a Solaris 11.2 VM with Solaris Studio 12.3. I was able to compile current r-devel using the suggested environment variables from "R Installation and Administration" and:

./configure --prefix=/opt/R-devel --with-blas='-library=sunperf' --with-lapack

All works great (fast too), except that some CRAN packages with C++ code won't build. The compiler itself works, and most packages (including e.g. MCMCpack) build OK. However, packages like Rcpp and RJSONIO fail with the errors shown here: https://gist.github.com/jeroenooms/f1b6a172320a32f59c82.

I tried installing with GNU make, but that does not seem to be the problem:

configure.vars = "MAKE=/opt/csw/bin/gmake"

I am aware that I can work around it by compiling with gcc instead of Solaris Studio, but I would specifically like to replicate the setup from CRAN. Which additional args/vars/dependencies do I need to make Rcpp and RJSONIO build as they do on the CRAN Solaris server?

> sessionInfo()
R Under development (unstable) (2015-01-07 r67351)
Platform: i386-pc-solaris2.11 (32-bit)
Running under: Solaris 11

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] tcltk_3.2.0 tools_3.2.0

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] RFC: getifexists() {was [Bug 16065] "exists" ...}
On Thu, Jan 8, 2015 at 6:36 AM, Duncan Murdoch wrote:
>> val <- get(name, where, ..., value.if.not.found=NULL )   (*)
>
> That would be a bad idea, as it would change behaviour of existing uses of
> get().

Another approach would be if the "not found" behavior consists of a callback,
e.g. an expression or function:

  get(name, where, ..., not.found=stop("object ", name, " not found"))

This would cover the case of not.found=NULL, but also allows for writing code
with syntax similar to tryCatch:

  obj <- get("foo", not.found = someDefaultValue())

Not sure what this would do to performance though.
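Because R passes arguments as promises (lazy evaluation), a `not.found` argument already behaves like the callback described above: the expression is only evaluated if the lookup fails. A minimal sketch, assuming a hypothetical wrapper `getOr` (not an existing function) built on exists() and get(), so it shows the semantics rather than any performance gain:

```r
# Hypothetical wrapper: return the object `x` if it exists, otherwise
# evaluate and return `not.found`. Since `not.found` is a promise, it is
# forced only on the "not found" branch, so stop() or an expensive
# default costs nothing when the object exists.
getOr <- function(x, envir = parent.frame(), not.found = NULL,
                  mode = "any", inherits = TRUE) {
  if (exists(x, envir = envir, mode = mode, inherits = inherits))
    get(x, envir = envir, mode = mode, inherits = inherits)
  else
    not.found
}

e <- new.env()
assign("foo", 42, envir = e)
getOr("foo", envir = e, not.found = stop("never evaluated"))  # 42
getOr("no_such_object_xyz", envir = e, inherits = FALSE,
      not.found = "default")                                  # "default"
```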
Re: [Rd] setequal: better readability, reduced memory footprint, and minor speedup
If you look at the definition of %in%, you'll find that it is implemented using match, so if we did as you suggest, I give it about three days before someone suggests to inline the function call... Readability of source code is not usually our prime concern.

The && idea does have some merit, though.

Apropos, why is there no setcontains()?

-pd

> On 06 Jan 2015, at 22:02 , Hervé Pagès wrote:
>
> Hi,
>
> Current implementation:
>
>   setequal <- function (x, y)
>   {
>     x <- as.vector(x)
>     y <- as.vector(y)
>     all(c(match(x, y, 0L) > 0L, match(y, x, 0L) > 0L))
>   }
>
> First what about replacing 'match(x, y, 0L) > 0L' and 'match(y, x, 0L) > 0L'
> with 'x %in% y' and 'y %in% x', respectively. They're strictly
> equivalent but the latter form is a lot more readable than the former
> (isn't this the "raison d'être" of %in%?):
>
>   setequal <- function (x, y)
>   {
>     x <- as.vector(x)
>     y <- as.vector(y)
>     all(c(x %in% y, y %in% x))
>   }
>
> Furthermore, replacing 'all(c(x %in% y, y %in% x))' with
> 'all(x %in% y) && all(y %in% x)' improves readability even more and,
> more importantly, reduces memory footprint significantly on big vectors
> (e.g. by 15% on integer vectors with 15M elements):
>
>   setequal <- function (x, y)
>   {
>     x <- as.vector(x)
>     y <- as.vector(y)
>     all(x %in% y) && all(y %in% x)
>   }
>
> It also seems to speed up things a little bit (not in a significant
> way though).
>
> Cheers,
> H.
>
> --
> Hervé Pagès
>
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B514
> P.O. Box 19024
> Seattle, WA 98109-1024
>
> E-mail: hpa...@fredhutch.org
> Phone: (206) 667-5791
> Fax: (206) 667-1319

--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com
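The `&&` version short-circuits: when `all(x %in% y)` is already FALSE, the second `%in%` is never computed, and no combined logical vector `c(...)` is allocated. The sketch below pairs it with one possible answer to the setcontains() question. `setequal2` and `setcontains` are hypothetical names, and the argument order of `setcontains(x, y)` ("does `x` contain every element of `y`?") is an assumption.

```r
# Hervé's proposed body, under a hypothetical name so base::setequal()
# is not masked.
setequal2 <- function(x, y) {
  x <- as.vector(x)
  y <- as.vector(y)
  # && skips the second scan when the first one already fails
  all(x %in% y) && all(y %in% x)
}

# Hypothetical setcontains(): TRUE if every element of y occurs in x.
setcontains <- function(x, y) all(as.vector(y) %in% as.vector(x))

setequal2(c(1, 2, 2, 3), c(3, 1, 2))  # TRUE: duplicates do not matter
setequal2(1:3, 1:4)                   # FALSE
setcontains(1:10, c(2, 5))            # TRUE
setcontains(1:3, 4)                   # FALSE
```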
Re: [Rd] RFC: getifexists() {was [Bug 16065] "exists" ...}
For what it's worth, I think we would need a new function if the default behavior changes. Since we already have "get" and "mget", maybe "cget" for "conditional get"? "if get", "safe get", ... I like the idea of keeping the original "not found" behavior if the "if.not.found" arg is missing. However, it will be important to keep the number of arguments down. (I noticed that Martin's example lacks a "frame" argument.) I've heard rumors that there are plans to reduce the function call overhead, so perhaps this matters less now. I like Luke's idea of making exists/get/etc. .Primitives. I think that will be necessary in order to go fast. For my two cents, I also think get/assign should just be synonyms for the "[[" .Primitive. That could actually simplify things a bit. One might add "inherits=FALSE" and "if.not.found" arguments to the environment "[[" code, for example. Regards, Pete Pete Peter M. Haverty, Ph.D. Genentech, Inc. phave...@gene.com
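For reference when weighing the `[[` suggestion: on environments, `[[` already looks only in the given environment (no search of enclosing frames) and returns NULL for an unbound name instead of signalling an error. The snippet below just demonstrates that existing behaviour; the `inherits` and `if.not.found` arguments for `[[` mentioned above remain a proposal.

```r
e <- new.env()
e[["a"]] <- 1        # assignment via [[ works on environments

e[["a"]]             # 1
e[["missing"]]       # NULL: no error, but also indistinguishable from a NULL value

# The closest get() equivalent searches only `e`, but signals an error:
res <- tryCatch(get("missing", envir = e, inherits = FALSE),
                error = function(cond) "error")
res                  # "error"
```

Note that returning NULL reproduces the same NULL-versus-unbound ambiguity discussed earlier in the thread.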
Re: [Rd] RFC: getifexists() {was [Bug 16065] "exists" ...}
Michael's idea has an interesting bonus that he and I discussed earlier. It would be very convenient to have a container of key/value pairs. I imagine many people often write this:

  x <- mapply(names(x), x, FUN = function(k, v) {
    # work with key and value
  })

especially ex-Perl people accustomed to

  while ( ($key, $value) = each(%some_hash) ) { }

Perhaps there is room for additional discussion of using lists of SYMSXPs in this manner. (If SYMSXPs are not that safe, perhaps a looping construct for named vectors that gave the illusion of iterating over a list of two-tuples.)

Pete

Peter M. Haverty, Ph.D.
Genentech, Inc.
phave...@gene.com
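Until something like binding objects exists, key/value iteration can be written with existing base functions. This is a sketch, not the proposed construct: Map() is mapply() with `SIMPLIFY = FALSE`, so the function receives each name and value in turn, and eapply() applies a function to every value in an environment, returning a named list whose order is not guaranteed (hence the sort).

```r
# Key/value iteration over a named list: Map(f, keys, values).
x <- list(a = 1, b = "two", c = 3:5)
descr <- Map(function(k, v) paste0(k, ": ", class(v)), names(x), x)
unlist(descr)  # named character vector: "a: numeric", "b: character", "c: integer"

# The same over an environment: eapply() returns a named list, so the
# names are the keys. Sort them because environment order is unspecified.
e <- new.env()
assign("p", 1, envir = e)
assign("q", "z", envir = e)
lens <- eapply(e, length)
lens[sort(names(lens))]
```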
Re: [Rd] RFC: getifexists() {was [Bug 16065] "exists" ...}
On Thu, Jan 8, 2015 at 11:57 AM, wrote: > On Thu, 8 Jan 2015, Michael Lawrence wrote: > > If we do add an argument to get(), then it should be named consistently >> with the ifnotfound argument of mget(). As mentioned, the possibility of a >> NULL value is problematic. One solution is a sentinel value that indicates >> an unbound value (like R_UnboundValue). >> > > A null default is fine -- it's a default; if it isn't right for a > particular case you can provide something else. > > >> But another idea (and one pretty similar to John's) is to follow the >> SYMSXP >> design at the C level, where there is a structure that points to the name >> and a value. We already have SYMSXPs at the R level of course (name >> objects) but they do not provide access to the value, which is typically >> R_UnboundValue. But this does not even need to be implemented with SYMSXP. >> The design would allow something like: >> >> binding <- getBinding("x", env) >> if (hasValue(binding)) { >> x <- value(binding) # throws an error if none >> message(name(binding), "has value", x) >> } >> >> That I think it is a bit verbose but readable and could be made fast. And >> I >> think binding objects would be useful in other ways, as they are >> essentially a "named object". For example, when iterating over an >> environment. >> > > This would need a lot more thought. Directly exposing the internals is > definitely not something we want to do as we may well want to change > that design. But there are lots of other corner issues that would have > to be thought through before going forward, such as what happens if an > rm occurs between obtaining a binding object and doing something with > it. Serialization would also need thinking through. This doesn't seem > like a worthwhile place to spend our efforts to me. > > Just wanted to be clear that I was not suggesting to expose any internals. We could implement the behavior using SYMSXP, or not. Nor would the binding need to be mutable. 
The binding would be considered independent of the environment from which it was retrieved. As Pete has mentioned, it could be a useful abstraction to have in general.
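To make the "immutable, independent binding" idea concrete, here is a minimal sketch built from plain R lists. Everything here is hypothetical: getBinding(), hasValue(), value() and name() follow the names in Michael's example and do not exist in base R. The assumption is that a binding is a snapshot taken at lookup time, which is one possible answer to Luke's question about an rm() happening after the binding is obtained.

```r
# Hypothetical binding snapshot: records the name and, if bound, the value
# at lookup time. It is independent of later changes to the environment.
getBinding <- function(name, env = parent.frame(), inherits = TRUE) {
  found <- exists(name, envir = env, inherits = inherits)
  val <- if (found) get(name, envir = env, inherits = inherits) else NULL
  structure(list(name = name, found = found, value = val), class = "binding")
}
hasValue <- function(binding) binding$found
value <- function(binding) {
  # throws an error if there is no value, as in Michael's example
  if (!binding$found) stop("no value bound to '", binding$name, "'")
  binding$value
}
name <- function(binding) binding$name

e <- new.env()
assign("x", 10, envir = e)
b <- getBinding("x", e, inherits = FALSE)
if (hasValue(b)) message(name(b), " has value ", value(b))
rm("x", envir = e)
value(b)  # still 10: the snapshot is unaffected by the later rm()
```

A snapshot avoids the rm() question but copies semantics rather than sharing them; a binding that tracked the live environment would need the extra design work Luke describes.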
[Rd] latex warning
Dear all, I am getting an R CMD check warning about the PDF manual. I am having a hard time finding out what is wrong, here is the log of the Rd2pdf call. The full check (and other) log is at https://api.travis-ci.org/jobs/46373922/log.txt?deansi=true if anybody is interested, and the package itself is here: https://github.com/metacran/r-builder/tree/bintex/rbuildertest Thanks, Best, Gabor +cat ./rbuildertest.Rcheck/Rdlatex.log Hmm ... looks like a package This is pdfTeX, Version 3.14159265-2.6-1.40.15 (TeX Live 2014) (preloaded format=pdflatex) restricted \write18 enabled. kpathsea: Running mktexfmt pdflatex.fmt fmtutil: running `pdftex -ini -jobname=pdflatex -progname=pdflatex -translate-file=cp227.tcx *pdflatex.ini' ... This is pdfTeX, Version 3.14159265-2.6-1.40.15 (TeX Live 2014) (INITEX) restricted \write18 enabled. (/home/travis/R-bin/texlive/texmf-dist/web2c/cp227.tcx) entering extended mode (/home/travis/R-bin/texlive/texmf-dist/tex/latex/latexconfig/pdflatex.ini (/home/travis/R-bin/texlive/texmf-config/tex/generic/config/pdftexconfig.tex) (/home/travis/R-bin/texlive/texmf-dist/tex/latex/base/latex.ltx (/home/travis/R-bin/texlive/texmf-dist/tex/latex/base/texsys.cfg) ./texsys.aux found \@currdir set to: ./. Assuming \openin and \input have the same search path. Defining UNIX/DOS style filename parser. 
catcodes, registers, compatibility for TeX 2, parameters, LaTeX2e <2014/05/01> hacks, control, par, spacing, files, font encodings, lengths, Local config file fonttext.cfg used (/home/travis/R-bin/texlive/texmf-dist/tex/latex/base/fonttext.cfg (/home/travis/R-bin/texlive/texmf-dist/tex/latex/base/fonttext.ltx === Don't modify this file, use a .cfg file instead === (/home/travis/R-bin/texlive/texmf-dist/tex/latex/base/omlenc.def) (/home/travis/R-bin/texlive/texmf-dist/tex/latex/base/t1enc.def) (/home/travis/R-bin/texlive/texmf-dist/tex/latex/base/ot1enc.def) (/home/travis/R-bin/texlive/texmf-dist/tex/latex/base/omsenc.def) (/home/travis/R-bin/texlive/texmf-dist/tex/latex/base/t1cmr.fd) (/home/travis/R-bin/texlive/texmf-dist/tex/latex/base/ot1cmr.fd) (/home/travis/R-bin/texlive/texmf-dist/tex/latex/base/ot1cmss.fd) (/home/travis/R-bin/texlive/texmf-dist/tex/latex/base/ot1cmtt.fd))) Local config file fontmath.cfg used (/home/travis/R-bin/texlive/texmf-dist/tex/latex/base/fontmath.cfg (/home/travis/R-bin/texlive/texmf-dist/tex/latex/base/fontmath.ltx === Don't modify this file, use a .cfg file instead === (/home/travis/R-bin/texlive/texmf-dist/tex/latex/base/omlcmm.fd) (/home/travis/R-bin/texlive/texmf-dist/tex/latex/base/omscmsy.fd) (/home/travis/R-bin/texlive/texmf-dist/tex/latex/base/omxcmex.fd) (/home/travis/R-bin/texlive/texmf-dist/tex/latex/base/ucmr.fd))) Local config file preload.cfg used = (/home/travis/R-bin/texlive/texmf-dist/tex/latex/base/preload.cfg (/home/travis/R-bin/texlive/texmf-dist/tex/latex/base/preload.ltx)) page nos., x-ref, environments, center, verbatim, math definitions, boxes, title, sectioning, contents, floats, footnotes, index, bibliography, output, === Local configuration file hyphen.cfg used === (/home/travis/R-bin/texlive/texmf-dist/tex/generic/babel/hyphen.cfg (/home/travis/R-bin/texlive/texmf-dist/tex/generic/babel/switch.def) (/home/travis/R-bin/texlive/texmf-dist/tex/generic/hyphen/hyphen.tex) 
(/home/travis/R-bin/texlive/texmf-dist/tex/generic/hyphen/dumyhyph.tex) (/home/travis/R-bin/texlive/texmf-dist/tex/generic/hyphen/zerohyph.tex)) = Applying patch file ltpatch.ltx = (/home/travis/R-bin/texlive/texmf-dist/tex/latex/base/ltpatch.ltx) ) ) Beginning to dump on file pdflatex.fmt (preloaded format=pdflatex 2015.1.8) 4976 strings of total length 68991 45099 memory locations dumped; current usage is 144&43215 3320 multiletter control sequences \font\nullfont=nullfont \font\OMX/cmex/m/n/10=cmex10 \font\tenln=line10 \font\tenlnw=linew10 \font\tencirc=lcircle10 \font\tencircw=lcirclew10 \font\OT1/cmr/m/n/5=cmr5 \font\OT1/cmr/m/n/7=cmr7 \font\OT1/cmr/m/n/10=cmr10 \font\OML/cmm/m/it/5=cmmi5 \font\OML/cmm/m/it/7=cmmi7 \font\OML/cmm/m/it/10=cmmi10 \font\OMS/cmsy/m/n/5=cmsy5 \font\OMS/cmsy/m/n/7=cmsy7 \font\OMS/cmsy/m/n/10=cmsy10 3633 words of font info for 14 preloaded fonts 14 hyphenation exceptions Hyphenation trie of length 6081 has 183 ops out of 35111 2 for language 1 181 for language 0 0 words of pdfTeX memory 0 indirect objects No pages of output. Transcript written on pdflatex.log. fmtutil: /home/travis/.texlive2014/texmf-var/web2c/pdftex/pdflatex.fmt installed. fmtutil: No errors, exiting successfully. entering extended mode (./Rd2.tex LaTeX2e <2014/05/01> Babel <3.9l> and hyphe
Re: [Rd] setequal: better readability, reduced memory footprint, and minor speedup
> why is there no setcontains()?

Several packages define is.subset(), which I am assuming is what you are proposing, but with its arguments reversed. E.g., package:algstat has

  is.subset <- function(x, y) all(x %in% y)
  containsQ <- function(y, x) all(x %in% y)

and package:rje has essentially the same is.subset. package:arulesSequences and package:arules have an S4 generic called is.subset, which is entirely different (it is not a predicate, but returns a matrix).

Bill Dunlap
TIBCO Software
wdunlap tibco.com
Re: [Rd] setequal: better readability, reduced memory footprint, and minor speedup
How about unique them both and compare the lengths? It's less work, especially allocation. Pete Peter M. Haverty, Ph.D. Genentech, Inc. phave...@gene.com
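Comparing only the two unique counts is not sufficient by itself: `1:2` and `3:4` both have two distinct values but are different sets. One way to complete the idea, sketched under that assumption, is to also check that the union adds no new elements. If both vectors have n distinct values and their union also has n, each set equals the union. `setequal_unique` is a hypothetical name, and this version still allocates the unique vectors, so it addresses correctness rather than memory.

```r
# Set equality via distinct counts: same number of distinct elements AND
# the union contributes nothing new.
setequal_unique <- function(x, y) {
  ux <- unique(as.vector(x))
  uy <- unique(as.vector(y))
  length(ux) == length(uy) && length(unique(c(ux, uy))) == length(ux)
}

setequal_unique(c(1, 1, 2), c(2, 1))  # TRUE
setequal_unique(1:2, 3:4)             # FALSE, although both have 2 distinct values
```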
Re: [Rd] setequal: better readability, reduced memory footprint, and minor speedup
Currently unique() does duplicated() internally and then extracts. One could make a countUnique that simply counts, rather than allocating the logical return value of duplicated(). But so much of the cost is in the hash operation that it probably won't help much, though that might depend on the sizes of things: the more unique elements, the better it would perform.
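The relationship between unique() and duplicated() described above can be checked at the R level. In this minimal sketch, count_unique() is a hypothetical helper, not an existing R function; written in R it still allocates the logical vector returned by duplicated(), so it only illustrates what a C-level countUnique would compute, not the allocation saving itself.

```r
# unique() keeps the first occurrence of each value, so for a plain
# (unnamed) vector it agrees with subsetting by !duplicated().
x <- c(5L, 3L, 5L, 1L, 3L, 3L)
stopifnot(identical(unique(x), x[!duplicated(x)]))

# Hypothetical count_unique(): counts distinct values. This R version
# still allocates the logical vector from duplicated(); avoiding that
# allocation would require counting inside the C hashing code.
count_unique <- function(x) sum(!duplicated(x))

n <- count_unique(x)
print(n)

# Used for an early exit in set comparison: different counts mean the
# sets differ, while equal counts are necessary but not sufficient.
y <- c(1L, 5L, 3L)
print(count_unique(x) == count_unique(y))
```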
Re: [Rd] setequal: better readability, reduced memory footprint, and minor speedup
Try this out. It looks like a 2X speedup for some cases and a wash in others. "unique" does two allocations, but skipping the "> 0L" allocation could make up for it.

library(microbenchmark)
library(RUnit)

x = sample.int(1e4, 1e5, TRUE)
y = sample.int(1e4, 1e5, TRUE)

set_equal <- function(x, y) {
  xu = .Internal(unique(x, FALSE, FALSE, NA))
  yu = .Internal(unique(y, FALSE, FALSE, NA))
  if (length(xu) != length(yu)) {
    return(FALSE)
  }
  return( all(match(xu, yu, 0L) > 0L) )
}

set_equal2 <- function(x, y) {
  xu = .Internal(unique(x, FALSE, FALSE, NA))
  yu = .Internal(unique(y, FALSE, FALSE, NA))
  if (length(xu) != length(yu)) {
    return(FALSE)
  }
  return( !anyNA(match(xu, yu)) )
}

microbenchmark(
  a = setequal(x, y),
  b = set_equal(x, y),
  c = set_equal2(x, y)
)
checkIdentical(setequal(x, y), set_equal(x, y))
checkIdentical(setequal(x, y), set_equal2(x, y))

x = y
microbenchmark(
  a = setequal(x, y),
  b = set_equal(x, y),
  c = set_equal2(x, y)
)
checkIdentical(setequal(x, y), set_equal(x, y))
checkIdentical(setequal(x, y), set_equal2(x, y))

Sorry, I'm probably over-posting today.

Regards,
Re: [Rd] setequal: better readability, reduced memory footprint, and minor speedup
I was thinking something like:

setequal <- function(x, y) {
  xu = unique(x)
  yu = unique(y)
  if (length(xu) != length(yu)) {
    return(FALSE)
  }
  return( all( match( xu, yu, 0L ) > 0L ) )
}

This lets you fail early for cheap (skipping the allocation from the "> 0L"s). Whether or not this goes fast depends a lot on the uniqueness of x and y and whether or not you want to optimize for the TRUE or FALSE case. You'd do much better to make some real hashes in C and compare the keys, but it's probably not worth the complexity.

Pete
Re: [Rd] setequal: better readability, reduced memory footprint, and minor speedup
On 01/08/2015 01:30 PM, peter dalgaard wrote:
> If you look at the definition of %in%, you'll find that it is implemented
> using match, so if we did as you suggest, I give it about three days before
> someone suggests to inline the function call...

But you wouldn't bet money on that, right? Because you know you would lose.

> Readability of source code is not usually our prime concern.

Don't sacrifice readability if you do not have a good reason for it. What's your reason here? Are you seriously suggesting that inlining makes a significant difference? As Michael pointed out, the expensive operation here is the hashing. But sadly some people like inlining and want to use it everywhere: it's easy and they feel good about it, even if it hurts readability and maintainability. If you use x %in% y instead of the inlined version, then the day someone changes the implementation of x %in% y to something faster, or fixes a bug in it, your code will automatically benefit; with the inlined version it won't. More simply put: good readability generally leads to better code.

> The && idea does have some merit, though.
>
> Apropos, why is there no setcontains()?

Wait... shouldn't everybody use all(match(x, y, nomatch = 0L) > 0L) ?

H.

> -pd
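Peter's question about setcontains() suggests a natural decomposition: set equality is containment in both directions. Base R has no setcontains(); the function below is a hypothetical sketch, and its argument order (does x contain y?) is an assumption. Hervé's one-liner all(match(x, y, nomatch = 0L) > 0L) tests the other direction (is x contained in y?), so the two agree once the arguments are swapped.

```r
# Hypothetical setcontains(): TRUE when every element of y is also in x.
# Not a base R function; the argument order is an assumption.
setcontains <- function(x, y) {
  x <- as.vector(x)
  y <- as.vector(y)
  all(y %in% x)
}

# Set equality as containment in both directions; && skips the second
# check whenever the first one already returns FALSE.
setequal_via_contains <- function(x, y) {
  setcontains(x, y) && setcontains(y, x)
}

print(setcontains(1:5, c(2L, 4L)))
print(setcontains(1:5, c(2L, 6L)))
print(setequal_via_contains(c(1, 2, 2), c(2, 1)))

# Hervé's match() form with the arguments swapped tests the same thing.
print(all(match(c(2L, 4L), 1:5, nomatch = 0L) > 0L))
```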
Re: [Rd] New version of Rtools for Windows
Regarding the redefinition error, I've asked on StackOverflow for advice [1], but I have noticed the following; perhaps someone here can understand what changed between the stdio.h of 4.6.3 and the stdio.h of 4.8.4.

In GCC 4.8.4, the section of stdio.h which is referenced in the errors is the following:

#if !defined (__USE_MINGW_ANSI_STDIO) || __USE_MINGW_ANSI_STDIO == 0
/* this is here to deal with software defining
 * vsnprintf as _vsnprintf, eg. libxml2.  */
#pragma push_macro("snprintf")
#pragma push_macro("vsnprintf")
# undef snprintf
# undef vsnprintf
  int __cdecl __ms_vsnprintf(char * __restrict__ d,size_t n,const char * __restrict__ format,va_list arg)
    __MINGW_ATTRIB_DEPRECATED_MSVC2005 __MINGW_ATTRIB_DEPRECATED_SEC_WARN;

  __mingw_ovr
  __MINGW_ATTRIB_NONNULL(3)
  int vsnprintf (char * __restrict__ __stream, size_t __n, const char * __restrict__ __format, va_list __local_argv)
  {
    return __ms_vsnprintf (__stream, __n, __format, __local_argv);
  }

  int __cdecl __ms_snprintf(char * __restrict__ s, size_t n, const char * __restrict__ format, ...);

#ifndef __NO_ISOCEXT
__mingw_ovr
__MINGW_ATTRIB_NONNULL(3)
int snprintf (char * __restrict__ __stream, size_t __n, const char * __restrict__ __format, ...)
{
  register int __retval;
  __builtin_va_list __local_argv;
  __builtin_va_start( __local_argv, __format );
  __retval = __ms_vsnprintf (__stream, __n, __format, __local_argv);
  __builtin_va_end( __local_argv );
  return __retval;
}
#endif /* !__NO_ISOCEXT */

#pragma pop_macro ("vsnprintf")
#pragma pop_macro ("snprintf")
#endif

The corresponding section in 4.6.3, as found in the Rtools for Windows installation, is:

#if !defined (__USE_MINGW_ANSI_STDIO) || __USE_MINGW_ANSI_STDIO == 0
/* this is here to deal with software defining
 * vsnprintf as _vsnprintf, eg. libxml2.  */
#pragma push_macro("snprintf")
#pragma push_macro("vsnprintf")
# undef snprintf
# undef vsnprintf
  int __cdecl vsnprintf(char * __restrict__ d,size_t n,const char * __restrict__ format,va_list arg)
    __MINGW_ATTRIB_DEPRECATED_MSVC2005 __MINGW_ATTRIB_DEPRECATED_SEC_WARN;
#ifndef __NO_ISOCEXT
  int __cdecl snprintf(char * __restrict__ s, size_t n, const char * __restrict__ format, ...);
#ifndef __CRT__NO_INLINE
  __CRT_INLINE int __cdecl vsnprintf(char * __restrict__ d,size_t n,const char * __restrict__ format,va_list arg)
  {
    return _vsnprintf (d, n, format, arg);
  }
#endif /* !__CRT__NO_INLINE */
#endif /* !__NO_ISOCEXT */
#pragma pop_macro ("vsnprintf")
#pragma pop_macro ("snprintf")
#endif

The latter does not have a direct redefinition of the two functions. I still don't know why the #undef calls do not work [1].

Thank you,

Avi

[1] https://stackoverflow.com/questions/27853225/is-there-a-way-to-include-stdio-h-but-ignore-some-of-the-functions-therein

On Thu, Jan 8, 2015 at 2:27 PM, Hin-Tak Leung wrote:

> Oh, I forgot to mention that besides setting AR, RANLIB and the stack probing
> fix, you also need a very up to date binutils. 2.25 was out in December. Even
> with that, if your linker's default is not what you are compiling for (i.e. a
> multiarch toolchain), you need to set GNUTARGET also, i.e. -m32/-m64 is not
> enough. Some fix to autodetect non-default targets went in after Christmas
> before the new year, but I am not brave enough to try that on a daily basis
> yet (only tested it and reported it, then reverted the change - how gcc
> invokes the linker is rather complicated and it is not easy to have two
> binutils installed...) - setting GNUTARGET seems safer :-).
> Whether you need that depends on whether you are compiling for your
> toolchain's default target architecture.
>
> AR, RANLIB, GNUTARGET are all environment variables - you set them the usual
> way. The stack probing fix is for passing "make check", when you finish make.
>
> --
> On Thu, Jan 8, 2015 6:14 PM GMT Avraham Adler wrote:
>
>>On Thu, Jan 8, 2015 at 10:48 AM, Hin-Tak Leung
>> wrote:
>>>
>>> The r.dll crash is easy - you need to be using gcc-ar for ar, and
>>> gcc-ranlib for ranlib. I also posted a patch to fix the check failure for
>>> stack probing, as lto optimizes away the stack probing code, as it should.
>>>
>>> yes, lto build's speed gain is very impressive.
>>>
>>
>>
>>I apologize for my ignorance, but how would I do that? I tried by
>>changing the following in src/gnuwin32/MkRules.local:
>>
>># prefix for 64-bit: path or x86_64-w64-mingw32-
>>BINPREF64 = x86_64-w64-mingw32-gcc-
>>
>>I added the gcc- as the suffix there, but I guess that is insufficient
>>as I still get the following error using 4.9.2:
>>
>>windres -F pe-x86-64 -I../include -i dllversion.rc -o dllversion.o
>>gcc -std=gnu99 -m64 -shared -s -mwindows -o R.dll R.def console.o
>>dynload.o editor.o embeddedR.o extra.o opt.o pager.o preferences.o
>>psignal.o rhome.o rt_complete.o rui.o run.o shext.o sys-win32.o
>>system.o dos_wglob.o malloc.o ../main/libmain.a ../appl/libappl.a
>>../nmath/libnmath.a getline/gl.a ../