Re: [Rd] Attributes of 1st argument in ...
Hi Daniel,

On 02.07.2010, at 23:26, Daniel Murphy wrote:
> I am trying to get an attribute of the first argument in a call to a
> function whose formal arguments consist of dots only and do something, e.g.,
> call 'cbind', based on the attribute
> f <- function(...) {get first attribute; maybe or maybe not call 'cbind'}
>
> I thought of (ignoring "deparse.level" for the moment)
>
> f <- function(...) {x <- attr(list(...)[[1L]], "foo"); if (x=="bar")
> cbind(...) else x}

What about using the somewhat obscure ..1 syntax? This version runs quite a bit faster for me:

  g <- function(...) {
    x <- attr(..1, "foo")
    if (x == "bar")
      cbind(...)
    else
      x
  }

But it will be hard to quantify how this pans out for you unless we know how many arguments there are and what their sizes and types are.

Cheers,
Olaf

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
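[Editor's note: a small sketch contrasting the two approaches. The matrix `m` and the comparison are illustrative additions, not from the thread; the likely reason for the speed difference is that `list(...)` materializes and evaluates every argument, while `..1` touches only the first.]

```r
f <- function(...) {
  x <- attr(list(...)[[1L]], "foo")  # builds a list of (and evaluates) all dots
  if (x == "bar") cbind(...) else x
}

g <- function(...) {
  x <- attr(..1, "foo")              # evaluates only the first argument
  if (x == "bar") cbind(...) else x
}

m <- matrix(1, 100, 100)
attr(m, "foo") <- "bar"

# Both versions produce the same result; g merely avoids the intermediate list.
identical(f(m, m, m), g(m, m, m))  # TRUE
```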
Re: [Rd] transpose of complex matrices in R
Hi,

On 30.07.2010, at 11:35, Robin Hankin wrote:
> 3. Try to define a t.complex() function:
> t.complex <- function(x){t(Conj(x))}
> (also fails because of recursion)

Try this version:

  t.complex <- function(x) {
    xx <- Conj(x)
    .Internal(t.default(xx))
  }

You get infinite recursion in your example because t(Conj(x)) dispatches again on the (complex) result of Conj(x). I'm not sure if the use of .Internal in user code is sanctioned, but it does work for me.

Cheers,
Olaf
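[Editor's note: a variant that avoids `.Internal` by calling the default method directly — `t.default` is accessible from base, so this sidesteps S3 dispatch the same way. The example matrix is an illustrative addition.]

```r
# Conjugate transpose for complex matrices without recursive dispatch:
# calling t.default() directly skips the t() generic, so t.complex()
# is never re-entered.
t.complex <- function(x) t.default(Conj(x))

z <- matrix(c(1 + 2i, 3 - 1i, 0 + 1i, 2 + 0i), nrow = 2)
t(z)  # dispatches to t.complex() and returns the conjugate transpose
```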
[Rd] Surprising behavior of Negate()
Dear R-developers,

I find the current behavior of Negate() somewhat confusing. It does not match the passed function 'f' until the returned function is called for the first time. The following (contrived) example shows what this can do:

  f <- function(x) is.integer(x)
  not_f <- Negate(f)
  f <- function(x) is.character(x)

  ## Both should, in my mind, return TRUE:
  not_f(1) == !is.integer(1)
  not_f(1L) == !is.integer(1L)

I propose to change Negate() in the following way:

  ## Easy 'fix':
  Negate <- function(f) {
    f <- match.fun(f)
    function(...) !f(...)
  }

This matches 'f' when Negate() is called and not the first time the return value is used. If the current behavior is desired, maybe a note could be added to the documentation to clarify this peculiarity.

Cheers,
Olaf Mersmann

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
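[Editor's note: the root cause is lazy evaluation of the `f` argument; the same fix can be written with `force()`. The two helper names below are illustrative, not part of the proposal.]

```r
# Lazy version: 'f' is a promise that is not looked up until the
# returned closure runs for the first time.
negate_lazy <- function(f) function(...) !f(...)

# Eager version: force() evaluates 'f' at construction time.
negate_eager <- function(f) { force(f); function(...) !f(...) }

f <- function(x) is.integer(x)
lazy  <- negate_lazy(f)
eager <- negate_eager(f)
f <- function(x) is.character(x)  # rebind f after the closures were made

lazy(1L)   # TRUE:  picks up the rebound f, i.e. !is.character(1L)
eager(1L)  # FALSE: uses the original f, i.e. !is.integer(1L)
```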
[Rd] Speed improvement for Find() and Position()
Dear R-developers,

both Find() and Position() are (as the documentation mentions) currently not optimized in any way. I have rewritten both functions more efficiently by replacing the sapply() with a for() loop that terminates early once a match is found. Here is a patch against the current Subversion HEAD:

http://www.statistik.tu-dortmund.de/~olafm/temp/fp.patch

and here are some numbers to show that this change is worthwhile:

% cat fp_bench.R
set.seed(42)
pred <- function(z) z == 1
for (n in c(10^(2:4))) {
  x <- sample(1:n, 2*n, replace=TRUE)
  tf <- system.time(replicate(1000L, Find(pred, x)))
  message(sprintf("Find:     n=%5i user=%6.3f system=%6.3f", 2*n, tf[1], tf[2]))
  tp <- system.time(replicate(1000L, Position(pred, x)))
  message(sprintf("Position: n=%5i user=%6.3f system=%6.3f", 2*n, tp[1], tp[2]))
}

## Unpatched R:
% Rscript fp_bench.R
Find:     n=  200 user= 0.491 system= 0.015
Position: n=  200 user= 0.477 system= 0.014
Find:     n= 2000 user= 4.450 system= 0.083
Position: n= 2000 user= 4.507 system= 0.094
Find:     n=20000 user=63.435 system= 1.497
Position: n=20000 user=63.130 system= 1.328

## Patched R:
% ./bin/Rscript fp_bench.R
Find:     n=  200 user= 0.101 system= 0.013
Position: n=  200 user= 0.085 system= 0.003
Find:     n= 2000 user= 0.781 system= 0.002
Position: n= 2000 user= 0.809 system= 0.012
Find:     n=20000 user=20.537 system= 0.394
Position: n=20000 user=20.502 system= 0.404

Cheers,
Olaf Mersmann
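[Editor's note: a sketch of the kind of early-terminating rewrite described above — this is one plausible reading of the approach, not necessarily the exact code in the linked patch. The function names are hypothetical.]

```r
# Early-terminating variants: stop scanning as soon as the predicate
# matches, instead of applying it to every element via sapply().
find2 <- function(f, x, nomatch = NULL) {
  f <- match.fun(f)
  for (e in x)
    if (isTRUE(f(e))) return(e)
  nomatch
}

position2 <- function(f, x, nomatch = NA_integer_) {
  f <- match.fun(f)
  for (i in seq_along(x))
    if (isTRUE(f(x[[i]]))) return(i)
  nomatch
}

find2(function(z) z > 3, 1:10)      # 4
position2(function(z) z > 3, 1:10)  # 4
```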
Re: [Rd] tabulate() does not check for input bounds
Dear Simone,

On 04.10.2010, at 01:01, Simone Giannerini wrote:
> it looks like that tabulate() does not check for the bounds of the input.
> Reproducible example:
>
>> b <- 1:2
>> tabulate(b[1:100])
> [1] 1 1

This looks perfectly reasonable. Consider the result of

> b <- 1:2
> b[1:100]
  [1]  1  2 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
 [26] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
 [51] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
 [76] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

and check the help page for tabulate (especially how NA values in 'bin' are handled). What was your expected result?

Cheers,
Olaf
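[Editor's note: a small illustration of the documented behavior — NAs and out-of-range values in `bin` are silently ignored by `tabulate()`. The example vectors are illustrative additions.]

```r
# tabulate() counts occurrences of the values 1..nbins; NAs fall in no bin.
b <- c(1L, 2L, NA, NA, 2L)
tabulate(b)             # 1 2  -- one 1, two 2s, the NAs are dropped
tabulate(b, nbins = 4)  # 1 2 0 0

# Out-of-range values are likewise ignored, not an error:
tabulate(c(1L, 5L), nbins = 2)  # 1 0
```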
Re: [Rd] list comprehension to create an arbitrary-sized list with arbitrary names/values
Hi,

On 13.10.2010, at 21:26, Steve Kim wrote:
> mydict = dict([(keyfun(x), valfun(x)) for x in mylist])
>
> to create a dictionary with whatever keys and values we want from an
> input list of arbitrary size. In R, I want to similarly create a list
> with names/values that are generated by some keyfun and valfun
> (assuming that keyfun is guaranteed to return something suitable as a
> name). How can I do this?

Try something like this:

  mydict <- lapply(mylist, valfun)
  names(mydict) <- sapply(mylist, keyfun)

or

  mydict <- structure(lapply(mylist, valfun), names=sapply(mylist, keyfun))

Cheers,
Olaf
[Rd] Possible bug in R parser
Dear R developers,

a recent typo led me to discover that R is happy to accept

> 20x2
[1] 20

as input. This appears to be related to the parsing of hexadecimal constants, since there must be a zero before the 'x' (i.e. 2x2 or 02x02 gives the expected error). All this is under R 2.12.1 on both OS X and Linux. Is this expected behavior?

Cheers,
Olaf Mersmann
Re: [Rd] function call overhead
Dear Hadley, dear list,

On Wed, Feb 16, 2011 at 9:53 PM, Hadley Wickham wrote:
> I wondered about this statement too but:
>
>> system.time(replicate(1e4, base::print))
>   user  system elapsed
>  0.539   0.001   0.541
>> system.time(replicate(1e4, print))
>   user  system elapsed
>  0.013   0.000   0.012

These timings are skewed. Because I too have wondered about this in the past, I recently published the microbenchmark package, which tries hard to accurately measure the time it takes to evaluate some expression(s). Using this package I get:

> library("microbenchmark")
> res <- microbenchmark(print, base::print, times=1000)
> res
Unit: nanoeconds ## I've fixed the typo, but not pushed to CRAN
         expr   min    lq  median    uq     max
        print    57    65    68.0    69   48389
  base::print 41763 43357 44278.5 48403 4749851

A better way to look at this is by converting to evaluations per second:

> print(res, unit="eps")
Unit: evaluations per second
         expr         min          lq      median          uq        max
        print 17543859.65 15384615.38 14705882.35 14492753.62 20665.8538
  base::print    23944.64    23064.33    22584.32    20659.88   210.5329

Resolving ~23,000 names per second versus ~15 million is quite a dramatic difference in my world. The timings obtained by

> system.time(replicate(1e4, base::print))
       User      System verstrichen
      0.475       0.006       0.483
> system.time(replicate(1e4, print))
       User      System verstrichen
      0.011       0.001       0.014

are skewed by the overhead of replicate() in this case, because the execution time of the expression under test is so short.

Cheers,
Olaf Mersmann
Re: [Rd] model.matrix memory problem (PR#13838)
Hi,

Excerpts from Torsten.Hothorn's message of Thu Jul 16 17:20:10 +0200 2009:
> `model.matrix' might kill R with a segfault (on an ill-posed problem, but
> anyway):
>
> mydf <- as.data.frame(sapply(1:40, function(i) gl(2, 100)))
> f <- as.formula(paste("~ - 1 + ", paste(names(mydf), collapse = ":"), sep = ""))
> X <- model.matrix(f, data = mydf)
>
> *** caught segfault ***
> address 0x18, cause 'memory not mapped'
> Segmentation fault

I've taken a look at this. The problem lies in lines 1784-1798 of src/main/model.c. What happens is that 'k' overflows (signed int). That means k is 0 after the loop and nc is set to 0, so the allocated model matrix 'x' is too small, which results in the observed segfault. I can provide a patch which checks for the overflow and throws an error, if that is the desired behaviour.

Greetings,
Olaf Mersmann
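[Editor's note: a back-of-the-envelope check of why 'k' overflows — the requested 40-way interaction of 2-level factors needs one column per level combination, which does not fit in a signed 32-bit integer.]

```r
# Columns required by the full interaction of 40 factors with 2 levels each:
ncols <- 2^40
ncols                           # 1099511627776
.Machine$integer.max            # 2147483647, the largest signed 32-bit int
ncols > .Machine$integer.max    # TRUE: the column count overflows 'int'
```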
Re: [Rd] Improvement of [dpq]wilcox functions
Hi Ivo,

Excerpts from Ivo Ugrina's message of Thu Jul 09 17:05:27 +0200 2009:
> I believe I have significantly improved [dpq]wilcox
> functions by implementing Harding's algorithm:
> Harding, E.F. (1984): An Efficient, Minimal-storage Procedure
> for Calculating the Mann-Whitney U, Generalized U and Similar
> Distributions, App. Statist., 33, 1-6

I've looked at your code and it is indeed quite a bit faster. Sadly, in some cases it produces inaccurate results. See for example:

R 2.9.1:
> sum(dwilcox(1:(500*100), 500, 100))
[1] 1

R 2.9.1 + your patch:
> sum(dwilcox(1:(500*100), 500, 100))
[1] 1.001377
> sum(dwilcox(1:(500*200), 500, 200))
[1] -132443.2
> sum(dwilcox(1:(1000*200), 1000, 200))
[1] 3.412391e+13

The last two examples run out of memory on my machine in an unpatched R. I think if you can sort out the numerical issues, your patch would be interesting, since it is indeed quite a bit faster than the current implementation.

Cheers,
Olaf Mersmann
Re: [Rd] Using svSocket with data.table
Hi Matthew,

Excerpts from Matthew Dowle's message of Sat Jul 25 09:07:44 +0200 2009:
> So I'm looking to do the same as the demo, but with a binary socket. Does
> anyone have any ideas? I've looked a bit at Rserve, bigmemory, biocep, nws
> but although all those packages are great, I didn't find anything that
> worked in exactly this way i.e. i) R to R ii) CLI non-blocking and iii) no
> need to startup R in a special way

Don't be fooled: R does not handle multiple requests in parallel internally. I also suspect that, depending on what you do on the CLI, this will interact badly with svSocket. As far as binary transfer of R objects goes, you are probably looking for serialize() and unserialize(). I am not sure these are guaranteed to work across different versions of R and different word sizes; see the Warnings section of the serialize manual page.

Cheers,
Olaf
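[Editor's note: a minimal round-trip sketch of serialize()/unserialize() through a raw vector; the same mechanism applies when writing to a binary socket connection instead of NULL. The example object is an illustrative addition.]

```r
# Serialize an R object to a raw vector, then restore it.
obj <- list(x = 1:10, y = letters[1:3])
bytes <- serialize(obj, connection = NULL)  # raw vector in R's binary format
restored <- unserialize(bytes)
identical(obj, restored)  # TRUE
```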
[Rd] Small documentation fix for [.data.frame
Hello,

in the manual page for [.data.frame it reads:

  ... There is a method for replacement which checks \code{value} for the
  corrupt number of row, and replicates it if necessary. ...

This should probably read:

  ... There is a method for replacement which checks \code{value} for the
  correct number of rows, and replicates it if necessary. ...

A trivial patch changing this can be found here:

http://www.statistik.tu-dortmund.de/~olafm/temp/edf_doc.patch

Cheers,
Olaf Mersmann
[Rd] Fix for incorrect use of restrict in xz third party code
Hello,

the included XZ Utils source code contains an incorrect use of the restrict keyword, which leads to data corruption under certain circumstances. For a short discussion of the problem see:

http://sourceforge.net/projects/lzmautils/forums/forum/708858/topic/3306733

This was fixed in the XZ Utils git repository in commit

  commit 49cfc8d392cf535f8dd10233225b1fc726fec9ef
  Author: Lasse Collin
  Date:   Tue Sep 15 21:07:23 2009 +0300

      Fix incorrect use of "restrict".

Since then there has not been a proper release of XZ Utils, so I have applied said patch to the sources included in R and added a note to the R_changes file in the src/extra/xz/ directory detailing the changes. This 'bug' is only triggered if the Intel C compiler or gcc 4.4 is used to compile R and the included liblzma is used instead of a system-wide one, so it might not be worth the trouble of patching the sources instead of waiting for a new release. If anyone wants to apply a fix, I have prepared a patch with all the changes, which can be found here:

http://www.statistik.tu-dortmund.de/~olafm/temp/xz_restrict.patch

Cheers,
Olaf Mersmann
[Rd] Bug in memDecompress()
Dear R developers,

I have discovered a bug in the implementation of lzma decompression in memDecompress(). It is only triggered if the uncompressed size of the content is more than three times as large as the compressed content. Here is a simple example to reproduce it:

  n <- 200
  char <- paste(replicate(n, "1234567890"), collapse="")
  char.comp <- memCompress(char, type="xz")
  char.dec <- memDecompress(char.comp, type="xz", asChar=TRUE)
  nchar(char.dec) == nchar(char)

  raw <- serialize(char, connection=NULL)
  raw.comp <- memCompress(raw, type="xz")
  raw.dec <- memDecompress(raw.comp, type="xz")
  length(raw.dec) == length(raw)
  char.uns <- unserialize(raw.dec)

The root cause seems to be that lzma_code() will return LZMA_OK even if it could not decompress the whole content; in this case strm.avail_in will be greater than zero. The following patch changes the respective if statements:

http://www.statistik.tu-dortmund.de/~olafm/temp/memdecompress.patch

It also contains a small fix from the xz upstream for an uninitialized field in lzma_stream.

Cheers,
Olaf
[Rd] DTrace probes for R
I've integrated some DTrace [1] probes into R, namely a probe which fires on function entry and return and one which fires before/after a garbage collection. Is there any interest in merging something like this into R-devel? If yes, I'd like to discuss which probes and what data would be useful and interesting from a developer's standpoint.

Greetings from Dortmund,
Olaf Mersmann

[1] http://www.sun.com/bigadmin/content/dtrace/
[Rd] Patch to fix small bug in do_External and do_dotcall
I've stumbled upon a small bug/inconsistency in do_External and do_dotcall. Here's an example:

% LC_ALL=C R --vanilla < symname-bug.R
R version 2.8.0 (2008-10-20)
*snip*
> options(error=expression(0))
> ## Call 'R_GD_nullDevice' with incorrect parameter count:
> .Call("R_GD_nullDevice", 1)
Error in .Call("R_GD_nullDevice", 1) :
  Incorrect number of arguments (1), expecting 0 for R_GD_nullDevice
>
> ## Same call made via a NativeSymbolInfo object:
> sym <- getDLLRegisteredRoutines("grDevices")$.Call[["R_GD_nullDevice"]]
> .Call(sym$address, 1)
Error: 'getEncChar' must be called on a CHARSXP

The error stems from the fact that both do_External and do_dotcall expect CAR(args) to be a string, while it might be a NativeSymbolInfo object. checkValidSymbolId() already handles this, so the fix is to use the symbol name returned from resolveNativeRoutine(). After applying the attached patch (against R-trunk revision 47348) the output looks like this:

% LC_ALL=C bin/R --vanilla < symname-bug.R
R version 2.9.0 Under development (unstable) (2008-12-26 r47348)
*snip*
> options(error=expression(0))
> ## Call 'R_GD_nullDevice' with incorrect parameter count:
> .Call("R_GD_nullDevice", 1)
Error in .Call("R_GD_nullDevice", 1) :
  Incorrect number of arguments (1), expecting 0 for R_GD_nullDevice
>
> ## Same call made via a NativeSymbolInfo object:
> sym <- getDLLRegisteredRoutines("grDevices")$.Call[["R_GD_nullDevice"]]
> .Call(sym$address, 1)
Error in .Call(sym$address, 1) :
  Incorrect number of arguments (1), expecting 0 for R_GD_nullDevice

Greetings from Dortmund,
Olaf
Re: [Rd] Patch to fix small bug in do_External and do_dotcall
Excerpts from Prof Brian Ripley's message of Sat Dec 27 06:59:24 +0100 2008:
> Thank you, but can we see the patch please (no attachment arrived)?

I've posted them online:

http://www.statistik.tu-dortmund.de/~olafm/files/symname-bug.R
http://www.statistik.tu-dortmund.de/~olafm/files/symname-bug.patch

Sorry for the inconvenience; I'm not sure why the attachments got lost.

Greetings from Dortmund,
Olaf
Re: [Rd] Patch to fix small bug in do_External and do_dotcall
Excerpts from Prof Brian Ripley's message of Sun Dec 28 15:03:28 +0100 2008:
> Thank you. You do realize that your example is not passing a
> NativeSymbolInfo object, don't you? I believe the intention is that you
> pass 'sym'.

Yes, I had originally used sym and later changed it to sym$address to verify that the fix also works for case b) described in the comment above checkValidSymbolId(), namely passing in a pointer to a function. I'm not sure if this is currently a 'supported' method of calling a function, since it is only mentioned in the man page for getNativeSymbolInfo() and not in the .Call() man page.

> I'll incorporate the patch once I have worked out an accurate description
> of what it does

The behavior before the patch is to assume that the head of args is a string. When it is anything else, the call to translateChar() when deriving the function name for the error message fails. Instead of dealing with each possible type, the patch reuses the function name that was returned by resolveNativeRoutine(), which in turn calls checkValidSymbolId() (all defined in dotcode.c). If CAR(args) is a string, checkValidSymbolId() simply returns and resolveNativeRoutine() copies the name into buf. If CAR(args) is a NativeSymbolInfo object, checkValidSymbolId() calls itself again with the second element of the NativeSymbolInfo object (its address member). Lastly, if CAR(args) is an EXTPTRSXP (the type of the address member of a NativeSymbolInfo object), checkValidSymbolId() extracts the symbol name from the EXTPTRSXP if it is a registered symbol.

This is the only loophole: if I were to pass an address to .Call() or .External() which was a "native symbol" but not a "registered native symbol", the buffer holding the function name would never be filled. I'm not sure how to deal with this corner case. One option would be to copy some (un)descriptive name like '' into the buffer. If this is acceptable, I can add it and post a revised patch.
Greetings from Dortmund,
Olaf Mersmann
Re: [Rd] Logical Error? (PR#13516)
Excerpts from camey's message of Tue Feb 10 15:55:04 +0100 2009:
> Using the commands below I expected that the answer is TRUE, but it is FALSE!
>
> P_exposicao=.9
> (1-P_exposicao)==.1

Look at the difference of the two values; on my computer it is much smaller than .Machine$double.eps. This is not a bug; it is due to the limited precision of floating point numbers.

Sincerely,
Olaf Mersmann
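[Editor's note: a quick check of the actual difference and the usual remedy — compare with a tolerance via all.equal() rather than with ==.]

```r
P_exposicao <- 0.9
(1 - P_exposicao) == 0.1         # FALSE: exact binary comparison
(1 - P_exposicao) - 0.1          # tiny non-zero difference, about -2.8e-17
.Machine$double.eps              # 2.220446e-16

# Tolerance-based comparison is the idiomatic fix:
isTRUE(all.equal(1 - P_exposicao, 0.1))  # TRUE
```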