[Rd] Postscript can be very slow in R 2.13-0
Viewing certain postscript files under R 2.13-0 is very slow, at least using GSview on my 64 bit Windows 7 system. To see this, compare how long it takes to display these two files (fast.ps was created by removing the /srgb stuff in .ps.prolog): www.milbo.org/postscript/slow.ps www.milbo.org/postscript/fast.ps The code to produce the above two files is here: www.milbo.org/postscript/slow-postscript.R This is not exactly a bug, but is it a known issue? The hack used to produce fast.ps above is fragile and not suitable for regular use. Is there a option to the postscript() function (or could one be added) that would sacrifice sRGB support for speed? Stephen Milborrow www.milbo.users.sonic.net Version info: I'm using GSview 4.9 and Ghostscript 8.64. R.version _ platform x86_64-pc-mingw32 arch x86_64 os mingw32 system x86_64, mingw32 status major 2 minor 13.0 year 2011 month 04 day13 svn rev55427 language R version.string R version 2.13.0 (2011-04-13) __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Guidelines for S3 regression models
Given how much documentation is available on R coding in general, it is surprising how little is available specifically on writing model code. Researchers who come up with a new method of regression, and who want to write an S3 model for that method, must currently go all the way back to the Venables and Ripley S programming book. On 26.06.2015 14:09, Stephen Milborrow wrote: > Once we have built a regression model, we typically want to use the > model for further processing, such as making predictions from the model > or plotting the residuals. Unfortunately, for many packages on CRAN > this can be difficult. > > For example, some models don't have a residuals method and don't save > the call or data --- so you can't tell how to generate the residuals > from the model object itself. > > A common snag is that for some models the new data for predict() has to > be a matrix; for others it has to be a data.frame. This places an > unnecessary burden on the user when both data.frames and matrices can > easily be supported by predict. > > To mitigate such issues, I'm going out on a limb and presenting some > guidelines for writers of S3 regression model functions (this document > is currently part of the plotmo package): > http://www.milbo.org/doc/modguide.pdf On 26.06.2015 16:41, Achim Zeileis wrote: I think this is a nice and useful starting point. It's probably not comprehensive (yet) but will surely help. You could add something more about writing the formula interface and the correct processing of model.frame, terms, model.response, model.matrix, model.weights, model.offset. Especially for models with linear predictors the latter two can be very useful and are often not hard to implement. In case the model has multiple parts or multiple responses, the "Formula" package (and its vignette) might also be helpful. As for the S3 methods, I would omit coefficients, fitted.values, and resid from the list. These dispatch to coef, fitted, and residuals anyway. For inference it would also be very useful to add nobs(), df.residual(), vcov(), and logLik() and/or deviance() where applicable. An overview which lists some (but not all) useful methods is in Table 1 of vignette("betareg", package = "betareg"). For coef() and vcov() it is useful/important that the names and dimension match. Then Wald tests can be easily computed in functions like car::linearHypothesis(), car::deltaMethod(), lmtest::waldtest(), or lmtest::coeftest(). Thanks for these, I'll update the document. Stephen Milborrow __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] legitimate use of :::
To avoid the NOTEs (which often triggers a 'pls fix' upon submission to CRAN), I simply copied/pasted these functions to my package, but this seems wasteful. An issue is how one acknowledges the author of the cut and pasted code. Assume that for one reason or another the original function can't easily be made available via NAMESPACEs and so to avoid CRAN submission complaints you are forced into cut and copying. If the copied code is somewhat substantial a short acknowledgment on a man page is insufficient. As far as I know the only alternative to list the author of the copied code in the Author field of your DESCRIPTION file. This makes the Author field ungainly (especially if the copied-from package has itself a lengthy Author field) and gives disproportionate credit to the author of the copied code over those in Depends. It would be good if there was a better mechanism in the DESCRIPTION file for acknowledging authors you copied code from. Tricky, lots of grey areas. On balance I think I'm in the camp of those who say ::: should allowed. __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] Modifying R_CheckStack for a speed increase
Greetings R developers, R will run a little faster when executing "pure R" code if the function R_CheckStack() is modified. With the modification, the following code for example runs 15% faster (compared to a virgin R-2.5.1 on my Windows XP machine): N = 1e7 foo <- function(x) { for (i in 1:N) x <- x + 1 x } foo(0) The crux of the modification is to change the following line in R_CheckStack() if(R_CStackLimit != -1 && usage > 0.95 * R_CStackLimit) {... to if(usage > R_CStackLen) { ... Details and modified sources can be found at ftp://ftp.sonic.net/pub/users/milbo. Regards, Stephen http://milbo.users.sonic.net __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] Just-in-time compiler for R
Greetings R developers! I have been working on a just-in-time compiler for R arithmetic expressions. An example: jit(1) # turn on just-in-time compilation for(i in 1:na)# example convolution code for(j in 1:nb) ab[i + j] <- ab[i + j] + a[i] * b[j] The loop will run about 30% faster. That's not much of a speedup, but the code is still in early development and the figure will get much better. If you are interested there is more information at www.milbo.users.sonic.net/ra. Stephen Milborrow www.milbo.users.sonic.net __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] gctorture and proc.time (PR#10600)
I'm not sure if this is connected but in R2.6.1 do_proctime is missing some PROTECTs. The current code is SEXP ans = allocVector(REALSXP, 5), nm = allocVector(STRSXP, 5); and should be SEXP ans, nm; PROTECT(ans = allocVector(REALSXP, 5)); PROTECT(nm = allocVector(STRSXP, 5)); A good way of finding these missing PROTECTs is to write say 0xEE to memory as it is freed (i.e. one must modify the source of memory.c). Then with gc torture enabled it more obvious if someone uses a just freed SEXP. There is a similar issue in a few other places which I haven't tracked down yet which I will submit to r-bugs in due course. These were found using the 0xEE trick and running make check-all. Steve www.milbo.users.sonic.net - Original Message - From: "Peter Dalgaard" <[EMAIL PROTECTED]> To: <[EMAIL PROTECTED]> Cc: <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]> Sent: Tuesday, January 22, 2008 9:56 AM Subject: Re: [Rd] gctorture and proc.time (PR#10600) [EMAIL PROTECTED] wrote: > In R version 2.6.1 (2007-11-26) > and R version 2.6.1 Patched (2008-01-19 r44061) > on openSUSE 10.2 (X86-64) > > >> gctorture() >> proc.time() >> > Error: protect(): protection stack overflow > > The problem with this is that then > > R CMD check --use-gct foo > > ALWAYS FAILS with > > >> cat("Time elapsed: ", proc.time() - get("ptime", pos = >> 'CheckExEnv'),"\n") >> > Error in proc.time() - get("ptime", pos = "CheckExEnv") : > non-numeric argument to binary operator > > This does not happen in R version 2.4.1 (2006-12-18) > > I was going to have my computing class try out --use-gct. > I guess not until this is fixed. > > I can reproduce this on SUSE 10.2 64 bit and Fedora 7 64 bit, but not on SUSE 10.3 32 bit and Fedora 8 32 bit. (The OS versions are likely not relevant, I bet it is a 64 bit issue somewhere). -- O__ Peter Dalgaard Øster Farimagsgade 5, Entr.B c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918 ~~ - ([EMAIL PROTECTED]) FAX: (+45) 35327907 __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel __ NOD32 2813 (20080122) Information __ This message was checked by NOD32 antivirus system. http://www.eset.com __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] Just-in-time compiler for R
There is a new version of the just-in-time compiler for R at www.milbo.users.sonic.net/ra/index.html With just-in-time compilation enabled, the convolution example from the "Extending R" manual now runs about 30 times faster. The web page has more information. Stephen Milborrow www.milbo.users.sonic.net __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] HOW TO AVOID LOOPS
Le sam. 12 avr. à 12:47, carlos martinez a écrit : Looking for a simple, effective a minimum execution time solution. For a vector as: c(0,0,1,0,1,1,1,0,0,1,1,0,1,0,1,1,1,1,1,1) To transform it to the following vector without using any loops: (0,0,1,0,1,2,3,0,0,1,2,0,1,0,1,2,3,4,5,6) Here is a fast solution using the Ra just-in-time compiler www.milbo.users.sonic.net/ra. jit(1) if (length(x) > 1) for (i in 2:length(x)) if (x[i]) x[i] <- x[i-1] + 1 The times in seconds for various solutions mailed to r-devel are listed below. There is some variation between runs and with the contents of x. The times shown are for set.seed(1066); x <- as.double(runif(1e6) > .5) This was tested on a WinXP 3 GHz Pentium D with Ra 1.0.7 (based on R 2.6.2). The code to generate these results is attached. vin 24 greg 11 had3.9 dan1.4 dan2 1.4 jit 0.25# code is shown above, 7 secs with standard R 2.6.2> Stephen Milborrow www.milbo.users.sonic.net # cm-post.R: compare solutions to the following post to #r-devel from carlos martinez 12 apr 2008: # Looking for a simple, effective a minimum execution time solution. # For a vector as: # c(0,0,1,0,1,1,1,0,0,1,1,0,1,0,1,1,1,1,1,1) # To transform it to the following vector without using any loops: # c(0,0,1,0,1,2,3,0,0,1,2,0,1,0,1,2,3,4,5,6) set.seed(1066) # for reproducibility N <- 1e6 x <- as.double(runif(N) > .5) x[1] <- 0 # seems to be needed for fhad (and fvin?) fvin <- function(x) { ind <- which(x == 0) unlist(lapply(mapply(seq, ind, c(tail(ind, -1) - 1, length(x))), function(y) cumsum(x[y]))) } fdan <- function(x) { d <- diff(c(0,x,0)) starts <- which(d == 1) ends <- which(d == -1) x[x == 1] <- unlist(lapply(ends - starts, function(n) 1:n)) x } fdan2 <- function(x) { runs <- rle(x) runlengths <- runs$lengths[runs$values == 1] x[x == 1] <- unlist(lapply(runlengths, function(n) 1:n)) x } fhad <- function(x) unlist(lapply(split(x, cumsum(x == 0)), seq_along)) - 1 # following requires "ra" for fast times www.milbo.users.sonic.net/ra library(jit) fjit <- function(x) { jit(1) if (length(x) > 1) for (i in 2:length(x)) if (x[i]) x[i] <- x[i-1] + 1 x } fgreg <- function(x) Reduce( function(x,y) x*y + y, x, accumulate=TRUE ) fanon <- function(x) x * unlist(lapply(rle(x)$lengths, seq)) cat("times with N =", N, "\n") cat("dan", system.time(ydan <- fdan(x))[3], "\n") cat("dan2", system.time(ydan2 <- fdan2(x))[3], "\n") cat("had", system.time(yhad <- fhad(x))[3], "\n") cat("vin", system.time(yvin <- fvin(x))[3], "\n") cat("jit", system.time(yjit <- fjit(x))[3], "\n") cat("greg", system.time(ygreg <- fgreg(x))[3], "\n") # very slow cat("anon", system.time(yanon <- fanon(x))[3], "\n") stopifnot(identical(ydan2, ydan)) stopifnot(identical(as.numeric(yhad), ydan)) stopifnot(identical(yvin, ydan)) stopifnot(identical(yjit, ydan)) stopifnot(identical(ygreg, ydan)) # stopifnot(identical(yanon, ydan)) __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] 0 ^ NaN == Inf, why?
In R, 0 ^ NaN yields Inf. I would have expected NaN or perhaps 0. Is this behaviour intended? >sessionInfo() R version 2.8.0 (2008-10-20) i386-pc-mingw32 locale: LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252 attached base packages: [1] stats graphics grDevices utils datasets methods base Steve Milborrow www.milbo.users.sonic.net __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] 0 ^ NaN == Inf, why?
It may be too late to make such changes now, but would it not be simpler if R ^ and R pow (and related functions like log) when the return type is double simply call the standard C library routine pow? That way we would be compatible with IEEE 754, with the fair assumption that the C library's pow is compatible with IEEE 754 (this I believe is a fair assumption in 2008, may not have been historically). Having special cases in the code for x==0 etc. is unnecessarily complicated, and returning NA in some cases and NaN in others introduces further complication. But I may be missing something. I think this is basically what you are saying with "Other things being equal, `^` should follow the C pow() function for numeric arguments." When I was building my just-in-time compiler for R, I looked into some discrepancies between R and IEEE 754. My tests were not exhaustive because the jit compiler in many cases uses the same C math routines as standard R, so I didn't bother to check those for compatibility. But if you are interested, the results are summarized in tables in the jit man page www.milbo.users.sonic.net/ra/jit.html. I would be prepared to sign up for doing a full test of incompatiblities between R math and IEEE 754, if people think time spent doing that is worth it. Steve www.milbo.users.sonic.net - Original Message - From: "John Chambers" <[EMAIL PROTECTED]> To: "Stephen Milborrow" <[EMAIL PROTECTED]> Cc: Sent: Saturday, October 25, 2008 7:55 PM Subject: Re: [Rd] 0 ^ NaN == Inf, why? A small PS: John Chambers wrote: > > Along the line, notice that both R_pow and pow give 0^0 as 1. (Just at > a guess, C might give 0^-0 as Inf, but I don't know how to test that in > R.) > I tried a little harder, and apparently the guess is wrong. It seems that pow(0, -0) is 1 in C. Would seem better to either have pow(0,0) and pow(0,-0) both be NaN or else 1 and Inf, but ... > John ----- Original Message - From: "John Chambers" <[EMAIL PROTECTED]> To: "Stephen Milborrow" <[EMAIL PROTECTED]> Cc: Subject: Re: [Rd] 0 ^ NaN == Inf, why? Stephen Milborrow wrote: In R, 0 ^ NaN yields Inf. I would have expected NaN or perhaps 0. Is this behaviour intended? Well, it certainly follows from the implementation. In the R_pow C routine, double R_pow(double x, double y) /* = x ^ y */ { if(x == 1. || y == 0.) return(1.); if(x == 0.) { if(y > 0.) return(0.); /* y < 0 */ return(R_PosInf); } It does seem, however, that NaN is the logical result here, which I think results from changing the implementation to: if(x == 0.) { if(y > 0.) return(0.); else if(y < 0) return(R_PosInf); else return(y); /* NA or NaN, we assert */ } Other things being equal, `^` should follow the C pow() function for numeric arguments. After writing pow() as an R function that calls this C function: > pow(0,NaN) [1] NaN > pow(0,NA) [1] NA > pow(0,0) [1] 1 The second example presumably falls out because the C function returns its second argument if that is a NaN (which a numeric NA is, in C but not in R). The modified code in R_pow mimics that behavior. Along the line, notice that both R_pow and pow give 0^0 as 1. (Just at a guess, C might give 0^-0 as Inf, but I don't know how to test that in R.) Other opinions? John __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel