[Rd] Spelling of "parameter" in summary.nls(..., correlation = TRUE) (PR#8759)
Full_Name: Henric Nilsson Version: 2.3.0 alpha (2006-04-08 r37675) OS: Windows XP SP2 Submission from: (NULL) (212.209.13.15)

The text preceding the correlation matrix in summary.nls(..., correlation = TRUE) has a spelling error: "parameter" is spelled "paraneter".

> DNase1 <- subset(DNase, Run == 1)
> fm1DNase1 <- nls(density ~ SSlogis(log(conc), Asym, xmid, scal), DNase1)
> summary(fm1DNase1, cor = TRUE)

Formula: density ~ SSlogis(log(conc), Asym, xmid, scal)

Parameters:
     Estimate Std. Error t value Pr(>|t|)
Asym  2.34518    0.07815   30.01 2.17e-13 ***
xmid  1.48309    0.08135   18.23 1.22e-10 ***
scal  1.04146    0.03227   32.27 8.51e-14 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.01919 on 13 degrees of freedom

Correlation of Paraneter Estimates:
     Asym xmid
xmid 0.99
scal 0.90 0.91

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] make check of R-alpha_2006-04-08_r37675 fails: qbeta
Peter Dalgaard wrote: > I don't see it with a current version either. What happens if you > reduce the optimization level? (I've tried both "-g" and -g "-O3"). > Is that -std=gnu99 bit necessary? My gcc is gcc (GCC) 3.3.5 (Debian 1:3.3.5-13). I've now tried with ./configure CFLAGS="-g [-O|-O2|-O3] [-std=gnu99]", i.e. with every combination from "-g" to "-g -O3 -std=gnu99". The error occurred if and only if -O2 or -O3 was used. -- Bjørn-Helge Mevik __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] make check of R-alpha_2006-04-08_r37675 fails: qbeta
Since that compiler is not even the last in the 3.3.x series, and there are now three later (released) gcc series, I think we have to write that off to an optimization bug in gcc 3.3.x. On Mon, 10 Apr 2006, Bjørn-Helge Mevik wrote: Peter Dalgaard wrote: I don't see it with a current version either. What happens if you reduce the optimization level? (I've tried both "-g" and -g "-O3"). Is that -std=gnu99 bit necessary? (No, but it helps get fast C99 functions from the OS rather than slow substitutes.) My gcc is gcc (GCC) 3.3.5 (Debian 1:3.3.5-13). I've now tried with ./configure CFLAGS="-g [-O|-O2|-O3] [-std=gnu99]", i.e. with every combination from "-g" to "-g -O3 -std=gnu99". The error occurred if and only if -O2 or -O3 was used. -- Brian D. Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595 __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] Branch changes at feature freeze
Peter Dalgaard is travelling today, so this is a 'heads up' on the effects of having gone today into feature freeze on 2.3.0. R-devel (the SVN trunk and the tarballs made available from ETHZ) is now labelled '2.4.0 Under development' and will shortly include changes intended for 2.4.0 (and not for 2.3.0). The pre-release code for 2.3.0 is on the SVN branch R-2-3-patches: daily tarballs (now labelled R-beta) remain available from http://cran.r-project.org/src/base-prerelease/ -- Brian D. Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595 __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Should demo files be run as part of R CMD check?
On Fri, 7 Apr 2006, Thomas Lumley wrote: > On Fri, 7 Apr 2006, hadley wickham wrote: > >> I was a bit surprised to note that demo files are not run as part of R >> CMD check. This seems out of keeping with the philosophy of running >> all code contained in the package (in the source, in examples etc). >> >> Should demo files be checked as part of R CMD check? >> > > > The rationale may be that a demo is entitled to assume it is being run > interactively. Checking demo(tkdensity), for example, would be > unproductive. Also, it is easy for a package author to arrange to check the demos by a test in the package's tests directory. The non-interactive demos in the R tarball are checked via 'make check-devel'. Had we been starting now, we would use the 'tests' mechanism, but on Unix-alikes the standard packages are installed and checked in different ways from contributed ones. -- Brian D. Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595 __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Run package code on R shutdown?
On Sun, 9 Apr 2006, Duncan Murdoch wrote: > I'm sure I've seen this discussed before, but haven't been able to find > it. I'd like some package code to be run when R is shut down > (approximately when a user's .Last function would be run), to clean up > properly. What is the best way to do this? The only way I know to do this is to use a finalizer, as we don't run .Last.lib on shutdown. (That's how RODBC does it.) Now, as I recall this cannot be done from reg.finalizer, only from the C-level R_RegisterCFinalizerEx, which has an optional argument to ensure that the finalizer is run 'onexit'. (I have never understood why we have that restriction, nor why reg.finalizer is primitive and not .Internal.) -- Brian D. Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595 __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
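[Editor's note] The finalizer approach Brian describes can, in later versions of R, be expressed at the R level, since reg.finalizer() eventually gained the 'onexit' argument that at the time of this thread was reachable only from R_RegisterCFinalizerEx. A minimal sketch (the package name and the cleanup action are placeholders, not anything from this thread):

```r
## Sketch: run cleanup code at R shutdown via a finalizer on the package
## namespace.  Note: 'onexit' was not available from reg.finalizer() at
## the time of this thread; R_RegisterCFinalizerEx was the C-level route.
.onLoad <- function(libname, pkgname) {
  ns <- asNamespace(pkgname)
  reg.finalizer(ns, function(e) {
    ## place cleanup here, e.g. closing connections or removing temp files
    message("cleaning up on exit")
  }, onexit = TRUE)
}
```

As Henrik notes elsewhere in this thread, no approach is guaranteed, since R can exit in several different ways.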
Re: [Rd] setIs and method dispatch in S4 classes
Hi Seth , thank you for your reply. Seth Falcon <[EMAIL PROTECTED]> writes: >Peter Ruckdeschel <[EMAIL PROTECTED]> writes: > > >> ## now: B00 mother class to B01 and B02, and again B02 "contains" B01 by >> setIs: >> setClass("B00", representation(a="numeric")) >> setClass("B01", representation(a="numeric",b="numeric"), contains= "B00") >> setClass("B02", representation(a="numeric",d="numeric"), contains= "B00") >> setIs("B02","B01",coerce=function(obj){new("B01", [EMAIL PROTECTED], [EMAIL >> PROTECTED])}, >>replace=function(obj,value){new("B01", [EMAIL PROTECTED], [EMAIL >> PROTECTED])}) >> >> # now two "+" methods for B00 and B01 >> setMethod("+", signature=c("B00","B00"), function(e1,e2)[EMAIL PROTECTED]@a}) >> setMethod("+", signature=c("B01","B01"), function(e1,e2)[EMAIL PROTECTED]@b}) >> >> x1=new("B02", a=1, d=2) >> x2=new("B02", a=1, d=3) >> >> x1+x2 ## 2 --- why? >> >> > >My impression from reading over the man page for setIs, is that it >isn't intended to be used to override the existing inheritance >hierarchy. It also mentions that the return value is the extension >info as a list, so that could also be useful in understanding what >setIs is doing. Here's the output for your example: > >Slots: > >Name:a d >Class: numeric numeric > >Extends: >Class "B00", directly >Class "B01", directly, with explicit coerce > >Use the contains arg of setClass to define the superclasses. With the >contains arg, the order determines the precedence for method lookup. >But I suspect you know that already. > > Yes, I have been aware of this, thank you. >> Is there a possibility to force usage of the B01 method /without/ >> explicitely coercing x1,x2 to B01, i.e. interfere in the dispatching >> precedence, telling R somehow (by particular arguments for setIs ?) >> to always use the is-relation defined by setIs first before mounting >> the hierarchy tree? 
>> >> > Perhaps explaining a bit more about what you are trying to accomplish > will allow someone to provide a more helpful suggestion than mine :-) In the "real" context, B00 stands for a class "AbscontDistribution", which implements absolutely continuous (a.c.) distributions. B01 is class "Gammad" which implements Gamma distributions, and B02 is class "Exp" which implements exponential distributions. The method still is "+", but interpreted as convolution. For a.c. distributions, the default method is an FFT-based numerical convolution algorithm, while for Gamma distributions (with the same scale parameter), analytic, hence much more accurate convolution formulas are used. For "Exp", I would tell R that it also 'is' a "Gammad" distribution by a call to setIs and use the "Gammad"-method. Of course, I could also declare explicitly "+" methods for signatures c("Exp", "Exp"), c("Exp", "Gammad"), and c("Gammad", "Exp") in which I would then use as(.) to coerce "Exp" to "Gammad" (and again the same procedure for further Gamma-methods). But, this would create an extra (3 or possibly much more) methods to dispatch, and I doubt whether this really is the preferred solution. > If you know the inheritance structure you want before run-time, then > I'm not seeing why you wouldn't just use the contains arg I do not want to use the "+" method for "B00" for accuracy reasons (see above). 
The reason why I do not want to implement "B01" ("Gammad") as mother class of "B02" is that (a) the slot structure is not identical --- in the real context Gamma and Exp use different parametrizations --- + rate for "Exp" (cf ?rexp) and + shape for "Gammad" (cf rgamma) (b) also class "Weibull" could be used as mother class to "Exp", and I do not want to decide whether the Weibull or the Gamma is the (more) "legitimate" mother to Exp ;-) I know: 'contains' could be a vector of classes --- c("Gammad", "Weibull") --- but then which would be the correct slot structure for "Exp" the one of "Gammad" or the one of "Weibull" ? My context is a bad example, "Gammad", "Weibull" do have the same slots, but more generally this /is/ an issue... --- So my guess was to rather implement two 'is'-relations ( "Exp" 'is' "Gammad" and "Exp" 'is' "Weibull") declared by 'setIs' , and then on run time let the dispatching mechanism decide whether to use a Gamma or a Weibull method. But maybe there is a better solution ? Any suggestions are welcome. > And if you want to force certain behavior at run-time, then I don't > see what's wrong with an explicit coercion using as(foo, "bar"). If you have two objects E1, E2 of class "Exp" (with the same rate) you (or the user for whom we provide these classes) rather want to call "+" by E1 + E2 than by as(E1, "Gammad") + as(E2,"Gammad") ... Anyway, thank you for your help Peter __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
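[Editor's note] The list archive's address obfuscation has replaced the @-slot accesses in the quoted code with "[EMAIL PROTECTED]". The following is a guess at what the runnable example looked like; the bodies of the coerce/replace functions and of the "+" methods are reconstructions consistent with the reported result, not the original text:

```r
library(methods)

setClass("B00", representation(a = "numeric"))
setClass("B01", representation(a = "numeric", b = "numeric"), contains = "B00")
setClass("B02", representation(a = "numeric", d = "numeric"), contains = "B00")

## B02 "contains" B01 via setIs rather than via contains=
setIs("B02", "B01",
      coerce  = function(obj) new("B01", a = obj@a, b = obj@d),
      replace = function(obj, value) new("B01", a = obj@a, b = obj@d))

## two "+" methods, one for B00 and one for B01
setMethod("+", signature = c("B00", "B00"), function(e1, e2) e1@a + e2@a)
setMethod("+", signature = c("B01", "B01"), function(e1, e2) e1@b + e2@b)

x1 <- new("B02", a = 1, d = 2)
x2 <- new("B02", a = 1, d = 3)
x1 + x2  ## 2 -- the inherited B00 method wins over the setIs route to B01
```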
Re: [Rd] Branch changes at feature freeze
On 4/10/2006 5:16 AM, Prof Brian Ripley wrote: > Peter Dalgaard is travelling today, so this is a 'heads up' on the effects > of having gone today into feature freeze on 2.3.0. > > R-devel (the SVN trunk and the tarballs made available from ETHZ) is now > labelled '2.4.0 Under development' and will shortly include changes > intended for 2.4.0 (and not for 2.3.0). > > The pre-release code for 2.3.0 is on the SVN branch R-2-3-patches: > daily tarballs (now labelled R-beta) remain available from > > http://cran.r-project.org/src/base-prerelease/ > For anyone who downloads the Windows builds: the "r-patched" build will stay on the old 2-2-patches branch until the release, and the "r-devel" build will continue to be made from the daily tarballs, now on the R-2-3-patches branch. From now until the release date there won't be any binary builds from the trunk. Duncan Murdoch __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] make check of R-alpha_2006-04-08_r37675 fails: qbeta
On 10 April 2006 at 10:06, Prof Brian Ripley wrote: | Since that compiler is not even the last in the 3.3.x series, and there | are now three later (released) gcc series, I think we have to write that | off to an optimization bug in gcc 3.3.x. Fair point, especially as you have to insist on using gcc 3.3.* on Debian: -- 3.3.6 is the current 3.3.* one whereas Bjørn-Helge used 3.3.5 -- 3.4.5 is the latest 3.* one supplanting 3.3.(5,6) -- 4.0.3 is the current default -- 4.1.0 is available too That appears to be the same on Debian testing and unstable. Dirk

[EMAIL PROTECTED]:~> dpkg -l | grep gcc | cut -c -78
ii gcc          4.0.2-2    The GNU C
ii gcc-2.95     2.95.4-22  The GNU C
ii gcc-3.3      3.3.6-13   The GNU C
ii gcc-3.3-base 3.3.6-13   The GNU C
ii gcc-3.4      3.4.5-2    The GNU C
ii gcc-3.4-base 3.4.5-2    The GNU C
ii gcc-4.0      4.0.3-1    The GNU C
ii gcc-4.0-base 4.0.3-1    The GNU C
ii gcc-4.1-base 4.1.0-1    The GNU C
ii libgcc1      4.1.0-1    GCC suppo
[EMAIL PROTECTED]:~>

-- Hell, there are no rules here - we're trying to accomplish something. -- Thomas A. Edison __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] make check of R-alpha_2006-04-08_r37675 fails: qbeta
Dirk Eddelbuettel wrote: > Fair point, especially as you have to insist on using gcc 3.3.* on Debian: > -- 3.3.6 is the current 3.3.* one whereas Bjørn-Helge used 3.3.5 > -- 3.4.5 is the latest 3.* one supplanting 3.3.(5,6) > -- 4.0.3 is the current default > -- 4.1.0 is available too > > That appears to be the same on Debian testing and unstable. > > Dirk
> [EMAIL PROTECTED]:~> dpkg -l | grep gcc | cut -c -78
> ii gcc          4.0.2-2    The GNU C
> ii gcc-2.95     2.95.4-22  The GNU C
> ii gcc-3.3      3.3.6-13   The GNU C
> ii gcc-3.3-base 3.3.6-13   The GNU C
> ii gcc-3.4      3.4.5-2    The GNU C
> ii gcc-3.4-base 3.4.5-2    The GNU C
> ii gcc-4.0      4.0.3-1    The GNU C
> ii gcc-4.0-base 4.0.3-1    The GNU C
> ii gcc-4.1-base 4.1.0-1    The GNU C
> ii libgcc1      4.1.0-1    GCC suppo
> [EMAIL PROTECTED]:~>

Hmmm... I don't `see' all those versions. After an `aptitude update':

9 (1) $ aptitude search gcc
[...]
i   gcc          - The GNU C compiler
i   gcc-2.95     - The GNU C compiler
p   gcc-2.95-doc - Documentation for the GNU compilers (gcc,
v   gcc-3.0      -
v   gcc-3.0-base -
v   gcc-3.0-doc  -
v   gcc-3.2      -
v   gcc-3.2-base -
v   gcc-3.2-doc  -
i A gcc-3.3      - The GNU C compiler
i A gcc-3.3-base - The GNU Compiler Collection (base package
p   gcc-3.3-doc  - Documentation for the GNU compilers (gcc,
i   gcc-3.4      - The GNU C compiler
i A gcc-3.4-base - The GNU Compiler Collection (base package
p   gcc-3.4-doc  - Documentation for the GNU compilers (gcc,
[...]

The 3.3 is 3.3.5-13, and the 3.4 is 3.4.3-13. My /etc/apt/sources.list is:

deb http://ftp.no.debian.org/debian/ sarge main non-free contrib
deb-src http://ftp.no.debian.org/debian/ sarge main non-free contrib
deb http://ftp.no.debian.org/debian-non-US sarge/non-US main contrib non-free
deb-src http://ftp.no.debian.org/debian-non-US sarge/non-US main contrib non-free
deb http://security.debian.org/ sarge/updates main contrib non-free

Why am I seeing older versions than you? I just installed gcc-3.4, but gcc --version still says 3.3.5.
What have I done (probably without knowing it) to `insist on using gcc 3.3.*', and how can I reverse that? (I have no desire to use old compiler versions. :-) -- Bjørn-Helge Mevik __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] make check of R-alpha_2006-04-08_r37675 fails: qbeta
On 10 April 2006 at 14:31, Bjørn-Helge Mevik wrote: | Dirk Eddelbuettel wrote: | | > Fair point, especially as you have to insist on using gcc 3.3.* on Debian: | > -- 3.3.6 is the current 3.3.* one whereas Bjørn-Helge used 3.3.5 | > -- 3.4.5 is the latest 3.* one supplanting 3.3.(5,6) | > -- 4.0.3 is the current default | > -- 4.1.0 is available too [...] | Hmmm... I don't `see' all those versions. After an `aptitude update': (That didn't show version numbers...) | My /etc/apt/sources.list is: | | deb http://ftp.no.debian.org/debian/ sarge main non-free contrib | deb-src http://ftp.no.debian.org/debian/ sarge main non-free contrib | deb http://ftp.no.debian.org/debian-non-US sarge/non-US main contrib non-free | deb-src http://ftp.no.debian.org/debian-non-US sarge/non-US main contrib non-free | deb http://security.debian.org/ sarge/updates main contrib non-free | | Why am I seeing older versions than you? Because you point to 'sarge' which was frozen and released a year ago. If you want something newer than Debian stable, you have to point to it. This is all off-topic here. Please consider (subscribing and) posting to r-sig-debian for R/Debian related matters, or debian-help for generic Debian questions. Hope this helps, Dirk -- Hell, there are no rules here - we're trying to accomplish something. -- Thomas A. Edison __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] Example in ?order
Hello! On R Version 2.2.1 (2005-12-20 r36812) and in SVN, the following part of the example in ?order says ## For character vectors we can make use of rank: cy <- as.character(y) rbind(x,y,z)[, order(x, -rank(y), z)] But "cy" is not used in there. -- Lep pozdrav / With regards, Gregor Gorjanc -- University of Ljubljana PhD student Biotechnical Faculty Zootechnical Department URI: http://www.bfro.uni-lj.si/MR/ggorjan Groblje 3 mail: gregor.gorjanc bfro.uni-lj.si SI-1230 Domzale tel: +386 (0)1 72 17 861 Slovenia, Europe fax: +386 (0)1 72 17 888 -- "One must learn by doing the thing; for though you think you know it, you have no certainty until you try." Sophocles ~ 450 B.C. __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
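[Editor's note] A self-contained version of what the example presumably meant to show; the data vectors and the use of cy inside rank() are guesses at the intent, since the report only quotes the lines above:

```r
## The point of rank() here: unary minus is undefined for character
## vectors, so sorting cy in decreasing order needs -rank(cy), not -cy.
x <- c(1, 1, 3, 2)
y <- c(8, 4, 2, 6)
z <- c(1, 2, 3, 4)
cy <- as.character(y)
rbind(x, y, z)[, order(x, -rank(cy), z)]
```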
Re: [Rd] Run package code on R shutdown?
On 4/10/06, Duncan Murdoch <[EMAIL PROTECTED]> wrote: > I'm sure I've seen this discussed before, but haven't been able to find > it. I'd like some package code to be run when R is shut down > (approximately when a user's .Last function would be run), to clean up > properly. What is the best way to do this? I tried to do this some time ago. My conclusion then was that it cannot be done with a guarantee, because R can exit in different ways. I implemented what I had and came up with an onSessionExit() method, available in R.utils. Check that out for a start. It modifies .Last(), but that can be circumvented by quit(callLast=FALSE). /Henrik > Duncan Murdoch > > __ > R-devel@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-devel > > -- Henrik Bengtsson Mobile: +46 708 909208 (+2h UTC) __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Should demo files be run as part of R CMD check?
> > The rationale may be that a demo is entitled to assume it is being run > > interactively. Checking demo(tkdensity), for example, would be > > unproductive. > > Also, it is easy for a package author to arrange to check the demos by a > test in the package's tests directory. Thanks for your comments - I hadn't considered the case of interactive demos, and as you say it is easy enough to add these checks by using a test in the tests directory. Would it be helpful to provide a short note to this effect in Writing R Extensions? I would be happy to provide a diff against the latest source. Hadley __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
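[Editor's note] The tests-directory arrangement both posters mention can be as small as one file; a sketch of tests/demos.R (the package name is a placeholder, and since this runs every demo it only suits packages whose demos are all non-interactive):

```r
## tests/demos.R: run every demo of the installed package so that
## R CMD check exercises the demo code too.
pkg <- "mypkg"  # placeholder for the package's own name
demos <- demo(package = pkg)$results[, "Item"]
for (d in demos)
  demo(d, package = pkg, character.only = TRUE, ask = FALSE)
```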
Re: [Rd] Run package code on R shutdown?
On 4/10/2006 6:16 AM, Prof Brian Ripley wrote: > On Sun, 9 Apr 2006, Duncan Murdoch wrote: > >> I'm sure I've seen this discussed before, but haven't been able to find >> it. I'd like some package code to be run when R is shut down >> (approximately when a user's .Last function would be run), to clean up >> properly. What is the best way to do this? > > The only way I know to do this is to use a finalizer, as we don't run > .Last.lib on shutdown. (That's how RODBC does it.) > > Now, as I recall this cannot be done from reg.finalizer, only from the > C-level R_RegisterCFinalizerEx, which has an optional argument to ensure > that the finalizer is run 'onexit'. (I have never understood why we have > that restriction, nor why reg.finalizer is primitive and not .Internal.) Thanks! Duncan Murdoch __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] get(name, envir=envir) : formal argument "envir" matched by multiple actual arguments
Hi, very sporadically and non-reproducibly, I get the following type of errors: Error in get(name, envir = envir) : formal argument "envir" matched by multiple actual arguments Error in exists(cacheName, envir = envir, inherit = FALSE) : formal argument "envir" matched by multiple actual arguments Error in paste(..., sep = sep) : formal argument "sep" matched by multiple actual arguments I cannot see how these errors can occur. Note, in the third example "..." does not contain a 'sep' (or an argument with the same prefix). The thing is that it does not happen all the time, and if I just re-run my code it works fine again. From what I can remember, I've seen this since about R v2.0.0 or so. My current version is R v2.3.0 alpha (2006-04-02 r37626) on WinXP. It has been too rare to be able to troubleshoot, and I cannot reproduce it other than by running a script for hours. If I rename the variable to say, envir2 <- envir get(name, envir=envir2) the problem seems to go away, i.e. it is not frequent enough to observe it. Has anyone else seen this? /Henrik __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
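[Editor's note] For context, the error message itself is easy to produce on purpose; what makes the report puzzling is that the quoted calls supply each argument only once:

```r
## Deliberately matching one formal twice reproduces the message:
f <- function(envir) envir
f(envir = 1, envir = 2)
## Error in f(envir = 1, envir = 2) :
##   formal argument "envir" matched by multiple actual arguments
```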
Re: [Rd] setIs and method dispatch in S4 classes
From your description of the application, it sounds like you would be better off just forcing "+" to behave as you want. Using inheritance is a much more powerful mechanism & can introduce results you don't want, as it seems to have in this case. An important point about using inheritance is that the subclass is asserted to be substitutable for the superclass for ALL purposes. This applies whether using "contains=" or setIs(). When the focus is on a particular function, it's usually better to implement methods for that function, maybe along with setAs() methods--not setIs(). It seems likely that such a solution would be cleaner in design, not to mention that it would likely work. (see also suggestion below) Peter Ruckdeschel wrote: >Hi Seth , > >thank you for your reply. > >Seth Falcon <[EMAIL PROTECTED]> writes: > > > >>Peter Ruckdeschel <[EMAIL PROTECTED]> writes: >> >> >> >> >>>## now: B00 mother class to B01 and B02, and again B02 "contains" B01 by >>>setIs: >>>setClass("B00", representation(a="numeric")) >>>setClass("B01", representation(a="numeric",b="numeric"), contains= "B00") >>>setClass("B02", representation(a="numeric",d="numeric"), contains= "B00") >>>setIs("B02","B01",coerce=function(obj){new("B01", [EMAIL PROTECTED], [EMAIL >>>PROTECTED])}, >>> replace=function(obj,value){new("B01", [EMAIL PROTECTED], [EMAIL >>> PROTECTED])}) >>> >>># now two "+" methods for B00 and B01 >>>setMethod("+", signature=c("B00","B00"), function(e1,e2)[EMAIL PROTECTED]@a}) >>>setMethod("+", signature=c("B01","B01"), function(e1,e2)[EMAIL PROTECTED]@b}) >>> >>>x1=new("B02", a=1, d=2) >>>x2=new("B02", a=1, d=3) >>> >>>x1+x2 ## 2 --- why? >>> >>> >>> >>> >>My impression from reading over the man page for setIs, is that it >>isn't intended to be used to override the existing inheritance >>hierarchy. It also mentions that the return value is the extension >>info as a list, so that could also be useful in understanding what >>setIs is doing. 
Here's the output for your example: >> >> Slots: >> >> Name:a d >> Class: numeric numeric >> >> Extends: >> Class "B00", directly >> Class "B01", directly, with explicit coerce >> >>Use the contains arg of setClass to define the superclasses. With the >>contains arg, the order determines the precedence for method lookup. >>But I suspect you know that already. >> >> >> >> >Yes, I have been aware of this, thank you. > > > >>>Is there a possibility to force usage of the B01 method /without/ >>>explicitely coercing x1,x2 to B01, i.e. interfere in the dispatching >>>precedence, telling R somehow (by particular arguments for setIs ?) >>>to always use the is-relation defined by setIs first before mounting >>>the hierarchy tree? >>> >>> >>> >>> >>Perhaps explaining a bit more about what you are trying to accomplish >>will allow someone to provide a more helpful suggestion than mine :-) >> >> > >In the "real" context, B00 stands for a class "AbscontDistribution", >which implements absolutely continuous (a.c.) distributions. B01 is >class "Gammad" which implements Gamma distributions, and B02 is >class "Exp" which implements exponential distributions. The method >still is "+", but interpreted as convolution. > >For a.c. distributions, the default method is an FFT-based numerical >convolution algorithm, while for Gamma distributions (with the same > scale parameter), analytic, hence much more accurate convolution >formulas are used. For "Exp", I would tell R that it also 'is' a "Gammad" >distribution by a call to setIs and use the "Gammad"-method. > >Of course, I could also declare explicitly "+" methods for signatures >c("Exp", "Exp"), c("Exp", "Gammad"), and c("Gammad", "Exp") in >which I would then use as(.) to coerce "Exp" to "Gammad" >(and again the same procedure for further Gamma-methods). > >But, this would create an extra (3 or possibly much more) methods >to dispatch, and I doubt whether this really is the preferred >solution. > > Why not? 
And you can avoid some of the extra methods by defining a virtual class that is the union of the classes for which you want the new methods. Something like (untested code!) setClassUnion("analyticConvolution", c("Exp", "Gammad")) setMethod("+", c("analyticConvolution", "analyticConvolution"), ) > > >>If you know the inheritance structure you want before run-time, then >>I'm not seeing why you wouldn't just use the contains arg >> >> > >I do not want to use the "+" method for "B00" for accuracy reasons >(see above). > >The reason why I do not want to implement "B01" ("Gammad") >as mother class of "B02" is that > >(a) the slot structure is not identical --- in the real context Gamma >and Exp use different parametrizations --- > + rate for "Exp" (cf ?rexp) and > + shape for "Gammad" (cf rgamma) > >(b) also class "Weibull" could be used as mother class to "Exp", >and I do not want to decide whether the
Re: [Rd] setIs and method dispatch in S4 classes
Hi John, I found your comments helpful, even though this isn't _my_ question. But now I have one of my own :-) John Chambers <[EMAIL PROTECTED]> writes: >>Of course, I could also declare explicitly "+" methods for signatures >>c("Exp", "Exp"), c("Exp", "Gammad"), and c("Gammad", "Exp") in >>which I would then use as(.) to coerce "Exp" to "Gammad" >> (and again the same procedure for further Gamma-methods). >> >>But, this would create an extra (3 or possibly much more) methods >>to dispatch, and I doubt whether this really is the preferred >>solution. >> >> > Why not? And you can avoid some of the extra methods by defining a > virtual class that is the union of the classes for which you want the > new methods. > > Something like (untested code!) > > setClassUnion("analyticConvolution", c("Exp", "Gammad")) > setMethod("+", c("analyticConvolution", "analyticConvolution"), > ) Why class union here and not an abstract superclass? If you "own" the Exp and Gammad classes, would an abstract superclass work as well? I think so. However, if you don't own the Exp and Gammad classes, I can see that the class union approach allows you the flexibility of defining a superclass post-hoc. I guess I have the sense that class unions are fancy/tricky (a number of popular languages don't have that concept, AFAIK). That isn't a reason not to use them in a language that does support them, of course. It is an interesting design question. On the one hand, one could argue for abstract superclasses when possible because they are "less tricky" (and you need them when you want to share slots). On the other hand, the class union approach provides a more loosely coupled design since members of the union don't have to know about each other. Hmm, I think I understand class unions a lot better already. Thanks. If I'm terribly off-track, please let me know. + seth __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] Suggestions to speed up median() and has.na()
Hi, I've got two suggestions for how to speed up median() by about 50%. For all iterative methods calling median() in loops this has a major impact. The second suggestion will apply to other methods too. This is what the function looks like today:

> median
function (x, na.rm = FALSE)
{
    if (is.factor(x) || mode(x) != "numeric")
        stop("need numeric data")
    if (na.rm)
        x <- x[!is.na(x)]
    else if (any(is.na(x)))
        return(NA)
    n <- length(x)
    if (n == 0)
        return(NA)
    half <- (n + 1)/2
    if (n%%2 == 1) {
        sort(x, partial = half)[half]
    }
    else {
        sum(sort(x, partial = c(half, half + 1))[c(half, half + 1)])/2
    }
}

Suggestion 1: Replace the sort() calls with .Internal(psort(x, partial)). This will avoid unnecessary overhead, especially an expensive second check for NAs using any(is.na(x)). Simple benchmarking with

x <- rnorm(10e6)
system.time(median(x))/system.time(median2(x))

where median2() is the function with the above replacements, gives about a 20-25% speed up.

Suggestion 2: Create a has.na(x) function to replace any(is.na(x)) that returns TRUE as soon as an NA value is detected. In the best case it returns TRUE after the first index; in the worst case it returns FALSE after the last index N. The cost for is.na(x) is always O(N), whereas any() is O(1) in the best case and O(N) in the worst case (if any() is implemented as I hope). A has.na() function would be very useful elsewhere too. A poor man's alternative to (2) is to have a third alternative for 'na.rm', say NA, which indicates that we know that there are no NAs in 'x'. 
The original median() is approx 50% slower (naive benchmarking) than a version with the above two improvements, if passing a large 'x' with no NAs:

median2 <- function (x, na.rm = FALSE)
{
    if (is.factor(x) || mode(x) != "numeric")
        stop("need numeric data")
    if (is.na(na.rm)) {
    }
    else if (na.rm)
        x <- x[!is.na(x)]
    else if (any(is.na(x)))
        return(NA)
    n <- length(x)
    if (n == 0)
        return(NA)
    half <- (n + 1)/2
    if (n%%2 == 1) {
        .Internal(psort(x, half))[half]
    }
    else {
        sum(.Internal(psort(x, c(half, half + 1)))[c(half, half + 1)])/2
    }
}

x <- rnorm(10e5)
K <- 10
t0 <- system.time({ for (kk in 1:K) y <- median(x) })
print(t0)
# [1] 1.82 0.14 1.98 NA NA
t1 <- system.time({ for (kk in 1:K) y <- median2(x, na.rm=NA) })
print(t1)
# [1] 1.25 0.06 1.34 NA NA
print(t0/t1)
# [1] 1.456000 2.33 1.477612 NA NA

BTW, without having checked the source code, it looks like is.na() is unnecessarily slow; is.na(sum(x)) is much faster than any(is.na(x)) on a vector without NAs. On the other hand, is.na(sum(x)) becomes awfully slow if 'x' contains NAs. /Henrik __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
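[Editor's note] Suggestion 2 can be prototyped in R itself before committing to a C implementation; a chunked scan gets the early-exit behaviour, at the price of copying each chunk (the chunk size below is an arbitrary choice, and this is a sketch rather than the proposed C-level function):

```r
## Pure-R sketch of the proposed has.na(): stop scanning at the first NA
## instead of always materialising is.na(x) for the full vector.
has.na <- function(x, chunk.size = 1e5L) {
  n <- length(x)
  i <- 1L
  while (i <= n) {
    j <- min(i + chunk.size - 1L, n)
    if (any(is.na(x[i:j]))) return(TRUE)
    i <- j + 1L
  }
  FALSE
}

has.na(c(1, 2, 3))              # FALSE
has.na(c(NA, rep(0, 1e6)))      # TRUE, after scanning only the first chunk
```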
[Rd] install.packages on unix / su (PR#8760)
Full_Name: Thomas Friedrichsmeier Version: R 2.2.1 OS: Debian / Linux Submission from: (NULL) (84.60.123.243) Wishlist item: There is a small problem using install.packages() (and update.packages()): typically I want to install packages for system-wide use, not in a user directory. Obviously this does not work without superuser rights. What I would like is to be able to specify a "become root" command for install.packages() to use. Probably this would be done using an extra argument to install.packages() and update.packages(): install.packages([...], install.wrapper=NULL). The argument value I would typically want to supply on my system (running in a KDE session) would be install.wrapper="kdesu --", i.e. I would like to run the R CMD INSTALL command through kdesu. Technically it would basically function like this: instead of cmd0 <- paste(file.path(R.home("bin"),"R"), "CMD INSTALL") in install.packages(), it would read cmd0 <- paste(install.wrapper, file.path(R.home("bin"),"R"), "CMD INSTALL"). This feature would save me a lot of small hassles. __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
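[Editor's note] Until something like install.wrapper exists, the same effect can be had from user code; a hypothetical helper along these lines (the function name and the reliance on system() plus shQuote() are mine, not part of any proposed API):

```r
## Run "R CMD INSTALL" on source package files through a "become root"
## command such as kdesu.
install.as.root <- function(pkgs, su.cmd = "kdesu --") {
  cmd <- paste(su.cmd,
               shQuote(file.path(R.home("bin"), "R")),
               "CMD INSTALL",
               paste(shQuote(pkgs), collapse = " "))
  system(cmd)
}

## e.g. install.as.root("foo_1.0.tar.gz")
```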
Re: [Rd] setIs and method dispatch in S4 classes
Seth Falcon wrote:

>Hi John,
>
>I found your comments helpful, even though this isn't _my_ question.
>But now I have one of my own :-)
>
>John Chambers <[EMAIL PROTECTED]> writes:
>
>>>Of course, I could also declare explicitly "+" methods for signatures
>>>c("Exp", "Exp"), c("Exp", "Gammad"), and c("Gammad", "Exp") in
>>>which I would then use as(.) to coerce "Exp" to "Gammad"
>>>(and again the same procedure for further Gamma-methods).
>>>
>>>But, this would create an extra (3 or possibly much more) methods
>>>to dispatch, and I doubt whether this really is the preferred
>>>solution.
>>
>>Why not? And you can avoid some of the extra methods by defining a
>>virtual class that is the union of the classes for which you want the
>>new methods.
>>
>>Something like (untested code!)
>>
>>setClassUnion("analyticConvolution", c("Exp", "Gammad"))
>>setMethod("+", c("analyticConvolution", "analyticConvolution"),
>>)
>
>Why class union here and not an abstract superclass?
>
>If you "own" the Exp and Gammad classes, would an abstract superclass
>work as well? I think so.

Yes, as is said frequently of a certain other language, "There's more than one way to do it". My own feeling is that class unions are a convenient shorthand & clearer than explicitly defining the superclass and then having to establish the inheritance separately for the two subclasses. Although the documentation mentions that they _must_ be used for classes you don't own, that's not their only purpose. Virtual classes (ahem, I assume that's what you meant by "abstract" ;-)) may or may not have slots of their own. Creating a virtual class "analyticConvolution" and doing two setIs() calls would in fact be roughly equivalent to the setClassUnion, but not as clear, IMO. If the superclass was really crucial to the model, that would make it more natural to have it explicitly in the contains= for the individual subclasses.
Here, though, it seems more like a computational convenience for a fairly small part of the overall package, so isolating it in a single setClassUnion() call seems more natural. Obviously, a question of taste and style.

>However, if you don't own the Exp and Gammad classes, I can see that
>the class union approach allows you the flexibility of defining a
>superclass post-hoc.
>
>I guess I have the sense that class unions are fancy/tricky (a number
>of popular languages don't have that concept, AFAIK). That isn't a
>reason not to use them in a language that does support them, of
>course.
>
>It is an interesting design question. On the one hand, one could
>argue for abstract superclasses when possible because they are "less
>tricky" (and you need them when you want to share slots). On the
>other hand, the class union approach provides a more loosely coupled
>design since members of the union don't have to know about each other.
>
>Hmm, I think I understand class unions a lot better already. Thanks.
>If I'm terribly off-track, please let me know.
>
>+ seth

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
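John's "(untested code!)" fragment quoted above can be made runnable; here is a hedged sketch with toy stand-ins for the "Exp" and "Gammad" classes (the slots and the method body are invented for illustration, not taken from the actual package under discussion):

```r
library(methods)

## Toy stand-ins for the real distribution classes.
setClass("Exp",    representation(rate = "numeric"))
setClass("Gammad", representation(shape = "numeric", scale = "numeric"))

## Virtual union class: both members extend it, so a single "+" method
## covers Exp+Exp, Exp+Gammad, Gammad+Exp and Gammad+Gammad.
setClassUnion("analyticConvolution", c("Exp", "Gammad"))

setMethod("+", signature("analyticConvolution", "analyticConvolution"),
          function(e1, e2) {
            ## placeholder: a real package would return the convolution here
            "dispatched via the class union"
          })

e <- new("Exp", rate = 1)
g <- new("Gammad", shape = 2, scale = 1)
res <- e + g   # one method handles every pairing of union members
```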
Re: [Rd] setIs and method dispatch in S4 classes
Hi Seth and John,

Thank you for your helpful responses,

>John Chambers <[EMAIL PROTECTED]> writes:
>>From your description of the application, it sounds like you would be
>>better off just forcing "+" to behave as you want. Using inheritance is
>>a much more powerful mechanism & can introduce results you don't want,
>>as it seems to have in this case.
>>
>>An important point about using inheritance is that the subclass is
>>asserted to be substitutable for the superclass for ALL purposes. This
>>applies whether using "contains=" or setIs().

I am not sure whether I got the meaning of "substitutable for the superclass for ALL purposes": In the application I sketched, any Exp(rate = lambda) distribution really /is/ a Gammad(shape = 1, scale = 1/lambda) distribution; so my understanding is that "Exp" is substitutable for "Gammad" for ALL purposes. "Gammad" was not designed to be the motherclass to "Exp" right from the beginning because the same 'is'-relation also applies to "Weibull": any Exp(rate = lambda) distribution /is/ a Weibull(shape = 1, scale = 1/lambda) distribution. Does "substitutable for the superclass for ALL purposes" mean 'without ambiguity' (as might enter through Weibull/Gammad)?

>>When the focus is on a particular function, it's usually better to
>>implement methods for that function, maybe along with setAs()
>>methods--not setIs().

You mean I should not leave the coercion decision up to the dispatching mechanism?

>>It seems likely that such a solution would be cleaner in design, not to
>>mention that it would likely work. (see also suggestion below)

Yes, your indication does work; thank you!

>>Peter Ruckdeschel <[EMAIL PROTECTED]> writes:
>>>Of course, I could also declare explicitly "+" methods for signatures
>>>c("Exp", "Exp"), c("Exp", "Gammad"), and c("Gammad", "Exp") in
>>>which I would then use as(.) to coerce "Exp" to "Gammad"
>>>(and again the same procedure for further Gamma-methods).
>>> >>>But, this would create an extra (3 or possibly much more) methods >>>to dispatch, and I doubt whether this really is the preferred >>>solution. >>> >> Why not? It simply did not seem to me elegant to have three calls to setMethod() doing more or less the same thing. I thought that, as elegant as R solutions from the R core are most times, there should be some mechanism to avoid this threefold code---and in fact you indicated how to--- thank you! >> And you can avoid some of the extra methods by defining a >> virtual class that is the union of the classes for which you >> want the new methods. >> >> Something like (untested code!) >> >> setClassUnion("analyticConvolution", c("Exp", "Gammad")) >> setMethod("+", c("analyticConvolution", "analyticConvolution"), >> ) Seth Falcon <[EMAIL PROTECTED]> writes: > Why class union here and not an abstract superclass? Am I right: the class generated by setClassUnion() does not enter the inheritance tree / mechanism? setClassUnion()---at least in my case---solves the problem; thank you again. [snip] Peter __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] install.packages on unix / su (PR#8760)
On 10 April 2006 at 21:14, [EMAIL PROTECTED] wrote:
| Full_Name: Thomas Friedrichsmeier
| Version: R 2.2.1
| OS: Debian / Linux
| Submission from: (NULL) (84.60.123.243)
|
| Wishlist item:
|
| There is a small problem using install.packages() (and update.packages()):
| Typically I want to install packages for system-wide use, not in a user
| directory. Obviously this does not work without superuser rights.

One can see this problem as a local system management issue for which another possible answer is to add you (and/or the other users installing R packages) to, say, group 'admin' and to make /usr/local/lib/R of group admin and group-writeable. Or create a custom group radmin. Or ...

Dirk

| What I would like to be able to do is to specify a "become root" command to use
| in install.packages(). Probably this would be done using an extra argument to
| install.packages() and update.packages():
|
| install.packages([...], install.wrapper=NULL)
|
| The argument value I would typically want to supply on my system (running in a
| KDE session) would be: install.wrapper="kdesu --". I.e. I would like to run the
| R CMD INSTALL command through kdesu.
|
| Technically it would basically function like this:
|
| Instead of
|
| cmd0 <- paste(file.path(R.home("bin"),"R"), "CMD INSTALL")
|
| in install.packages(), it would read
|
| cmd0 <- paste(install.wrapper, file.path(R.home("bin"),"R"), "CMD INSTALL")
|
| This feature would save me a lot of small hassles.

--
Hell, there are no rules here - we're trying to accomplish something.
-- Thomas A. Edison

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] install.packages on unix / su (PR#8760)
> | Wishlist item:
> |
> | There is a small problem using install.packages() (and update.packages()):
> | Typically I want to install packages for system-wide use, not in a user
> | directory. Obviously this does not work without superuser rights.
>
> One can see this problem as a local system management issue for which
> another possible answer is to add you (and/or the other users installing R
> packages) to, say, group 'admin' and to make /usr/local/lib/R of group
> admin and group-writeable. Or create a custom group radmin. Or ...

It's about convenience, no more, no less, and so it's a wishlist item, no more, and no less. I don't think the case of a non-root user working on a de-facto single user system is too uncommon on linux. It's why tools like kdesu exist in the first place. Unless there are strong reasons not to (and there may well be), I think adding some convenience option for this particular case may well be worth while.

Regards
Thomas

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Suggestions to speed up median() and has.na()
On Mon, 10 Apr 2006, Henrik Bengtsson wrote: > Hi, > > I've got two suggestions how to speed up median() about 50%. For all > iterative methods calling median() in the loops this has a major > impact. The second suggestion will apply to other methods too. I'm surprised this has a major impact -- in your example it takes much longer to generate the ten million numbers than to find the median. > Suggestion 1: > Replace the sort() calls with the .Internal(psort(x, partial)). This > will avoid unnecessary overhead, especially an expensive second check > for NAs using any(is.na(x)). Simple benchmarking with > > x <- rnorm(10e6) > system.time(median(x))/system.time(median2(x)) > > where median2() is the function with the above replacements, gives > about 20-25% speed up. There's something that seems a bit undesirable about having median() call the .Internal function for sort(). > Suggestion 2: > Create a has.na(x) function to replace any(is.na(x)) that returns TRUE > as soon as a NA value is detected. In the best case it returns after > the first index with TRUE, in the worst case it returns after the last > index N with FALSE. The cost for is.na(x) is always O(N), and any() > in the best case O(1) and in the worst case O(N) (if any() is > implemented as I hope). An has.na() function would be very useful > elsewhere too. This sounds useful (though it has missed the deadline for 2.3.0). It won't help if the typical case is no missing values, as you suggest, but it will be faster when there are missing values. > BTW, without having checked the source code, it looks like is.na() is > unnecessarily slow; is.na(sum(x)) is much faster than any(is.na(x)) on > a vector without NAs. On the other hand, is.na(sum(x)) becomes > awfully slow if 'x' contains NAs. > I don't think it is unnecessarily slow. It has to dispatch methods and it has to make sure that matrix structure is preserved. 
After that the code is just

    case REALSXP:
        for (i = 0; i < n; i++)
            LOGICAL(ans)[i] = ISNAN(REAL(x)[i]);
        break;

and it's hard to see how that can be improved. It does suggest that a faster anyNA() function would have to not be generic.

-thomas

Thomas Lumley            Assoc. Professor, Biostatistics
[EMAIL PROTECTED]        University of Washington, Seattle

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Suggestions to speed up median() and has.na()
On Mon, 10 Apr 2006, Thomas Lumley wrote:

> On Mon, 10 Apr 2006, Henrik Bengtsson wrote:
>
> > Hi,
> >
> > I've got two suggestions how to speed up median() about 50%. For all
> > iterative methods calling median() in the loops this has a major
> > impact. The second suggestion will apply to other methods too.
>
> > Suggestion 2:
> > Create a has.na(x) function to replace any(is.na(x)) that returns TRUE
> > as soon as a NA value is detected. In the best case it returns after
> > the first index with TRUE, in the worst case it returns after the last
> > index N with FALSE. The cost for is.na(x) is always O(N), and any()
> > in the best case O(1) and in the worst case O(N) (if any() is
> > implemented as I hope). An has.na() function would be very useful
> > elsewhere too.
>
> This sounds useful (though it has missed the deadline for 2.3.0).
>
> It won't help if the typical case is no missing values, as you suggest,
> but it will be faster when there are missing values.

Splus has such a function, but it is called anyMissing(). In the interests of interoperability it would be nice if R used that name. (I did not choose the name, but that is what it is.)

The following experiment using Splus seems to indicate the speedup has less to do with stopping at the first NA than it does with not making/filling/copying/whatever the big vector of logicals that is.na returns.

> # NA near start of list of 10 million integers
> { z<-replace(1:1e7,2,NA); unix.time(anyMissing(z)) }
[1] 0 0 0 0 0
> { z<-replace(1:1e7,2,NA); unix.time(any(is.na(z)))}
[1] 0.62 0.13 0.75 0.00 0.00
> # NA at end of list
> { z<-replace(1:1e7,1e7,NA); unix.time(anyMissing(z)) }
[1] 0.07 0.00 0.07 0.00 0.00
> { z<-replace(1:1e7,1e7,NA); unix.time(any(is.na(z)))}
[1] 0.64 0.11 0.75 0.00 0.00

The Splus anyMissing is an s3 generic (i.e., it calls UseMethod()). The Splus is.na is an s4 generic and its default method may invoke an s3 generic.
> > BTW, without having checked the source code, it looks like is.na() is
> > unnecessarily slow; is.na(sum(x)) is much faster than any(is.na(x)) on
> > a vector without NAs. On the other hand, is.na(sum(x)) becomes
> > awfully slow if 'x' contains NAs.
>
> I don't think it is unnecessarily slow. It has to dispatch methods and
> it has to make sure that matrix structure is preserved. After that the
> code is just
>
>     case REALSXP:
>         for (i = 0; i < n; i++)
>             LOGICAL(ans)[i] = ISNAN(REAL(x)[i]);
>         break;
>
> and it's hard to see how that can be improved. It does suggest that a
> faster anyNA() function would have to not be generic.

Bill Dunlap
Insightful Corporation
bill at insightful dot com
360-428-8146

"All statements in this message represent the opinions of the author and do not necessarily reflect Insightful Corporation policy or position."

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
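A pure-R approximation of the anyMissing() semantics discussed here (hypothetical: this is not the Splus implementation, and without C support it cannot avoid the intermediate logical vector; it only illustrates the intended interface, including the data-frame case that comes up later in the thread):

```r
## Sketch of an anyMissing()-style interface in plain R.  A fast version
## would be written in C with an early exit; this one recurses over
## data-frame/list elements and falls back to any(is.na(x)).
anyMissing2 <- function(x) {
  if (is.data.frame(x) || is.list(x))
    return(any(vapply(x, anyMissing2, logical(1))))
  any(is.na(x))
}

stopifnot(!anyMissing2(1:10))
stopifnot(anyMissing2(c(1, NA)))
stopifnot(anyMissing2(data.frame(a = 1:3, b = c(1, NA, 3))))
```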
Re: [Rd] Suggestions to speed up median() and has.na()
On 4/10/2006 7:22 PM, Thomas Lumley wrote: > On Mon, 10 Apr 2006, Henrik Bengtsson wrote: > >> Hi, >> >> I've got two suggestions how to speed up median() about 50%. For all >> iterative methods calling median() in the loops this has a major >> impact. The second suggestion will apply to other methods too. > > I'm surprised this has a major impact -- in your example it takes much > longer to generate the ten million numbers than to find the median. > >> Suggestion 1: >> Replace the sort() calls with the .Internal(psort(x, partial)). This >> will avoid unnecessary overhead, especially an expensive second check >> for NAs using any(is.na(x)). Simple benchmarking with >> >> x <- rnorm(10e6) >> system.time(median(x))/system.time(median2(x)) >> >> where median2() is the function with the above replacements, gives >> about 20-25% speed up. > > There's something that seems a bit undesirable about having median() call > the .Internal function for sort(). > >> Suggestion 2: >> Create a has.na(x) function to replace any(is.na(x)) that returns TRUE >> as soon as a NA value is detected. In the best case it returns after >> the first index with TRUE, in the worst case it returns after the last >> index N with FALSE. The cost for is.na(x) is always O(N), and any() >> in the best case O(1) and in the worst case O(N) (if any() is >> implemented as I hope). An has.na() function would be very useful >> elsewhere too. > > This sounds useful (though it has missed the deadline for 2.3.0). > > It won't help if the typical case is no missing values, as you suggest, > but it will be faster when there are missing values. I think it would help even in that case if the vector is large, because it avoids allocating and disposing of the logical vector of the same length as x. >> BTW, without having checked the source code, it looks like is.na() is >> unnecessarily slow; is.na(sum(x)) is much faster than any(is.na(x)) on >> a vector without NAs. 
On the other hand, is.na(sum(x)) becomes >> awfully slow if 'x' contains NAs. >> > > I don't think it is unnecessarily slow. It has to dispatch methods and > it has to make sure that matrix structure is preserved. After that the > code is just > > case REALSXP: > for (i = 0; i < n; i++) > LOGICAL(ans)[i] = ISNAN(REAL(x)[i]); > break; > > and it's hard to see how that can be improved. It does suggest that a > faster anyNA() function would have to not be generic. If it's necessary to make it not generic to achieve the speedup, I don't think it's worth doing. If anyNA is written not to be generic I'd guess a very common error will be to apply it to a dataframe and get a misleading "FALSE" answer. If we do that, I predict that the total amount of r-help time wasted on it will exceed the CPU time saved by orders of magnitude. Duncan Murdoch __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Suggestions to speed up median() and has.na()
On Mon, 10 Apr 2006, Duncan Murdoch wrote: > On 4/10/2006 7:22 PM, Thomas Lumley wrote: >> On Mon, 10 Apr 2006, Henrik Bengtsson wrote: >> >>> Suggestion 2: >>> Create a has.na(x) function to replace any(is.na(x)) that returns TRUE >>> as soon as a NA value is detected. In the best case it returns after >>> the first index with TRUE, in the worst case it returns after the last >>> index N with FALSE. The cost for is.na(x) is always O(N), and any() >>> in the best case O(1) and in the worst case O(N) (if any() is >>> implemented as I hope). An has.na() function would be very useful >>> elsewhere too. >> >> This sounds useful (though it has missed the deadline for 2.3.0). >> >> It won't help if the typical case is no missing values, as you suggest, but >> it will be faster when there are missing values. > > I think it would help even in that case if the vector is large, because it > avoids allocating and disposing of the logical vector of the same length as > x. That makes sense. I have just tried, and for vectors of length ten million it does make a measurable difference. >>> BTW, without having checked the source code, it looks like is.na() is >>> unnecessarily slow; is.na(sum(x)) is much faster than any(is.na(x)) on >>> a vector without NAs. On the other hand, is.na(sum(x)) becomes >>> awfully slow if 'x' contains NAs. >>> >> >> I don't think it is unnecessarily slow. It has to dispatch methods and it >> has to make sure that matrix structure is preserved. After that the code >> is just >> >> case REALSXP: >> for (i = 0; i < n; i++) >> LOGICAL(ans)[i] = ISNAN(REAL(x)[i]); >> break; >> >> and it's hard to see how that can be improved. It does suggest that a >> faster anyNA() function would have to not be generic. > > If it's necessary to make it not generic to achieve the speedup, I don't > think it's worth doing. If anyNA is written not to be generic I'd guess a > very common error will be to apply it to a dataframe and get a misleading > "FALSE" answer. 
If we do that, I predict that the total amount of r-help
> time wasted on it will exceed the CPU time saved by orders of magnitude.

I wasn't proposing that it should be stupid, just not generic. It could support data frames (sum() does, for example). If it didn't support data frames it should certainly give an error rather than the wrong answer, but if we are seriously trying to avoid delays around 0.1 seconds then going through the generic function mechanism may be a problem.

-thomas

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Suggestions to speed up median() and has.na()
On 4/10/2006 8:08 PM, Thomas Lumley wrote: > On Mon, 10 Apr 2006, Duncan Murdoch wrote: > >> On 4/10/2006 7:22 PM, Thomas Lumley wrote: >>> On Mon, 10 Apr 2006, Henrik Bengtsson wrote: >>> Suggestion 2: Create a has.na(x) function to replace any(is.na(x)) that returns TRUE as soon as a NA value is detected. In the best case it returns after the first index with TRUE, in the worst case it returns after the last index N with FALSE. The cost for is.na(x) is always O(N), and any() in the best case O(1) and in the worst case O(N) (if any() is implemented as I hope). An has.na() function would be very useful elsewhere too. >>> This sounds useful (though it has missed the deadline for 2.3.0). >>> >>> It won't help if the typical case is no missing values, as you suggest, but >>> it will be faster when there are missing values. >> I think it would help even in that case if the vector is large, because it >> avoids allocating and disposing of the logical vector of the same length as >> x. > > That makes sense. I have just tried, and for vectors of length ten > million it does make a measurable difference. > > BTW, without having checked the source code, it looks like is.na() is unnecessarily slow; is.na(sum(x)) is much faster than any(is.na(x)) on a vector without NAs. On the other hand, is.na(sum(x)) becomes awfully slow if 'x' contains NAs. >>> I don't think it is unnecessarily slow. It has to dispatch methods and it >>> has to make sure that matrix structure is preserved. After that the code >>> is just >>> >>> case REALSXP: >>> for (i = 0; i < n; i++) >>> LOGICAL(ans)[i] = ISNAN(REAL(x)[i]); >>> break; >>> >>> and it's hard to see how that can be improved. It does suggest that a >>> faster anyNA() function would have to not be generic. >> If it's necessary to make it not generic to achieve the speedup, I don't >> think it's worth doing. 
If anyNA is written not to be generic I'd guess a >> very common error will be to apply it to a dataframe and get a misleading >> "FALSE" answer. If we do that, I predict that the total amount of r-help >> time wasted on it will exceed the CPU time saved by orders of magnitude. >> > > I wasn't proposing that it should be stupid, just not generic. It could > support data frames (sum(), does, for example). If it didn't support data > frames it should certainly give an error rather than the wrong answer, but > if we are seriously trying to avoid delays around 0.1 seconds then going > through the generic function mechanism may be a problem. If it's not dataframes, it will be something else. I think it's highly desirable that any(is.na(x)) == anyNA(x) within base packages, and we should make it straightforward to maintain this identity in contributed packages. By the way, I think Bill's suggestion of calling it anyMissing makes a lot of sense. Duncan __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] eapply() fails on baseenv() (PR#8761)
eapply() works on most environments, but not on baseenv(). For example, > x <- 1 > eapply(globalenv(), function(x) x) $x [1] 1 > eapply(baseenv(), function(x) x) list() I'm probably not going to have time to work on this before 2.3.0, but I don't think it's really urgent; if no one else fixes it first I'll do it after the release. Duncan Murdoch __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
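Until a fix lands, a workaround sketch (the name eapply2 is mine) is to enumerate the bindings by name and fetch each one explicitly with get(), which works on baseenv() as well as on ordinary environments:

```r
## eapply() substitute: list the binding names, get() each value from
## the environment, and apply FUN, preserving the names on the result.
eapply2 <- function(env, FUN) {
  nms <- ls(env, all.names = TRUE)
  res <- lapply(nms, function(nm) FUN(get(nm, envir = env)))
  names(res) <- nms
  res
}

x <- 1
out <- eapply2(globalenv(), function(v) v)
out["x"]   # the binding for 'x' comes back, as eapply(globalenv(), ...) would
```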
Re: [Rd] install.packages on unix / su (PR#8760)
On Mon, 10 Apr 2006, Thomas Friedrichsmeier wrote: >> | Wishlist item: >> | >> | There is a small problem using intall.packages() (and update.packages()): >> | Typically I want to install packages for system-wide use, not in a user >> | directory. Obviously this does not work without superuser rights. [From a reply I started last night.] Not obvious at all, especially to those of us who do it all the time. Many of us set up an account to `own' R, and either install under that account or change the ownership of the library directory to that account. I think what you suggest is quite dangerous, as different directories may be visible to the user account producing the summary information and to root. Then update.packages() (run by you) and R CMD INSTALL (run by root) may do different things. This could apply both within a library directory (root might have installed a later version of a package not readable by you) and over different library trees (my personal R library is not readable by root, and indeed the main R library tree is not readable by root on our student's machines). Quoting someone else (without attribution, a breach of copyright) >> One can see this problem as a local system management issue for which >> another possible answer is to add you (and/or the user users installing R >> packages) to, say, group 'admin' and to make /usr/local/lib/R of group >> admin and group-writeable. Or create a custom group radmin. Or ... > > It's about convenience, no more, no less, and so it's a wishlist item, no > more, and no less. > I don't think the case of a non-root user working on a de-facto single user > system is too uncommon on linux. It's why tools like kdesu exist in the first > place. Unless there are strong reasons not to (and there may well be), I > think adding some convenience option for this particular case may well be > worth while. See the `strong reason' above. Two of us have suggested better solutions. 
If you want yours, you can of course patch your installation, the beauty of Open Source. But unless you can find an R-core member who is prepared to maintain your solution, it will not be going into R.

--
Brian D. Ripley,                  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel