[Rd] Spelling of "parameter" in summary.nls(..., correlation = TRUE) (PR#8759)

2006-04-10 Thread henric . nilsson
Full_Name: Henric Nilsson
Version: 2.3.0 alpha (2006-04-08 r37675)
OS: Windows XP SP2
Submission from: (NULL) (212.209.13.15)


The text preceding the correlation matrix in summary.nls(..., correlation =
TRUE) has a spelling error: parameter is spelled paraneter.

> DNase1 <- subset(DNase, Run == 1)
> fm1DNase1 <- nls(density ~ SSlogis(log(conc), Asym, xmid, scal), DNase1)
> summary(fm1DNase1, cor = TRUE)

Formula: density ~ SSlogis(log(conc), Asym, xmid, scal)

Parameters:
     Estimate Std. Error t value Pr(>|t|)
Asym  2.34518    0.07815   30.01 2.17e-13 ***
xmid  1.48309    0.08135   18.23 1.22e-10 ***
scal  1.04146    0.03227   32.27 8.51e-14 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

Residual standard error: 0.01919 on 13 degrees of freedom

Correlation of Paraneter Estimates:
     Asym xmid
xmid 0.99
scal 0.90 0.91

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] make check of R-alpha_2006-04-08_r37675 fails: qbeta

2006-04-10 Thread Bjørn-Helge Mevik
Peter Dalgaard wrote:

> I don't see it with a current version either. What happens if you
> reduce the optimization level? (I've tried both "-g" and -g "-O3").
> Is that -std=gnu99 bit necessary?

My gcc is gcc (GCC) 3.3.5 (Debian 1:3.3.5-13).

I've now tried with ./configure CFLAGS="-g [-O|-O2|-O3] [-std=gnu99]",
i.e. with every combination from "-g" to "-g -O3 -std=gnu99".  The
error occurred if and only if -O2 or -O3 was used.

-- 
Bjørn-Helge Mevik



Re: [Rd] make check of R-alpha_2006-04-08_r37675 fails: qbeta

2006-04-10 Thread Prof Brian Ripley
Since that compiler is not even the last in the 3.3.x series, and there 
are now three later (released) gcc series, I think we have to write that 
off to an optimization bug in gcc 3.3.x.


On Mon, 10 Apr 2006, Bjørn-Helge Mevik wrote:

> Peter Dalgaard wrote:
>
>> I don't see it with a current version either. What happens if you
>> reduce the optimization level? (I've tried both "-g" and -g "-O3").
>> Is that -std=gnu99 bit necessary?

(No, but it helps get fast C99 functions from the OS rather than slow 
substitutes.)

> My gcc is gcc (GCC) 3.3.5 (Debian 1:3.3.5-13).
>
> I've now tried with ./configure CFLAGS="-g [-O|-O2|-O3] [-std=gnu99]",
> i.e. with every combination from "-g" to "-g -O3 -std=gnu99".  The
> error occurred if and only if -O2 or -O3 was used.


--
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK  Fax:  +44 1865 272595


[Rd] Branch changes at feature freeze

2006-04-10 Thread Prof Brian Ripley
Peter Dalgaard is travelling today, so this is a 'heads up' on the effects 
of having gone today into feature freeze on 2.3.0.

R-devel (the SVN trunk and the tarballs made available from ETHZ) is now 
labelled '2.4.0 Under development' and will shortly include changes 
intended for 2.4.0 (and not for 2.3.0).

The pre-release code for 2.3.0 is on the SVN branch R-2-3-patches:
daily tarballs (now labelled R-beta) remain available from

http://cran.r-project.org/src/base-prerelease/

-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK  Fax:  +44 1865 272595



Re: [Rd] Should demo files be run as part of R CMD check?

2006-04-10 Thread Prof Brian Ripley
On Fri, 7 Apr 2006, Thomas Lumley wrote:

> On Fri, 7 Apr 2006, hadley wickham wrote:
>
>> I was a bit surprised to note that demo files are not run as part of R
>> CMD check.  This seems out of keeping with the philosophy of running
>> all code contained in the package (in the source, in examples etc).
>>
>> Should demo files be checked as part of R CMD check?
>>
>
>
> The rationale may be that a demo is entitled to assume it is being run
> interactively.  Checking demo(tkdensity), for example, would be
> unproductive.

Also, it is easy for a package author to arrange to check the demos by a 
test in the package's tests directory.

The non-interactive demos in the R tarball are checked via 'make 
check-devel'.  Had we been starting now, we would use the 'tests' 
mechanism, but on Unix-alikes the standard packages are installed and 
checked in different ways from contributed ones.
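
As a rough sketch of that suggestion, a package could ship a file such as
tests/run-demos.R that runs its non-interactive demos.  The file name, the
package name 'mypkg' and the helper names below are hypothetical; demo() and
its 'package', 'character.only', 'echo' and 'ask' arguments are the standard
ones.

```r
# tests/run-demos.R  (hypothetical file name)

# List the demo topics that an installed package provides.
demo_items <- function(pkg) {
  demo(package = pkg)$results[, "Item"]
}

# Run every listed demo without prompting.  An error in any demo
# propagates, so the test (and hence R CMD check) fails.
run_demos <- function(pkg, skip = character()) {
  for (topic in setdiff(demo_items(pkg), skip)) {
    demo(topic, package = pkg, character.only = TRUE,
         echo = FALSE, ask = FALSE)
  }
  invisible(TRUE)
}

# In a real package one would then call, for example:
# run_demos("mypkg", skip = "someInteractiveDemo")
```

Interactive demos (such as tkdensity) would have to be listed in 'skip'.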

-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK  Fax:  +44 1865 272595



Re: [Rd] Run package code on R shutdown?

2006-04-10 Thread Prof Brian Ripley
On Sun, 9 Apr 2006, Duncan Murdoch wrote:

> I'm sure I've seen this discussed before, but haven't been able to find
> it.  I'd like some package code to be run when R is shut down
> (approximately when a user's .Last function would be run), to clean up
> properly.  What is the best way to do this?

The only way I know to do this is to use a finalizer, as we don't run
.Last.lib on shutdown.  (That's how RODBC does it.)

Now, as I recall this cannot be done from reg.finalizer, only from the 
C-level R_RegisterCFinalizerEx, which has an optional argument to ensure 
that the finalizer is run 'onexit'.   (I have never understood why we have 
that restriction, nor why reg.finalizer is primitive and not .Internal.)
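
For readers on later R releases: reg.finalizer() there accepts an 'onexit'
argument, so an R-level version of this idea can be sketched as below.  This
is only an illustration under that assumption, not how RODBC does it; the
names 'cleanup_state', 'on_cleanup' and 'handle' are made up.

```r
# State that the finalizer updates, so its effect can be observed.
cleanup_state <- new.env()
cleanup_state$ran <- FALSE

# The clean-up code; a real package would close connections,
# remove temporary files, and so on.
on_cleanup <- function(e) {
  cleanup_state$ran <- TRUE
}

# The finalizer is attached to an environment.  While 'handle' stays
# reachable (e.g. stored in the package namespace) the finalizer is
# deferred; with onexit = TRUE it is also run when R shuts down.
handle <- new.env()
reg.finalizer(handle, on_cleanup, onexit = TRUE)
```

The finalizer also runs earlier if 'handle' becomes unreachable and is
garbage collected, so the object has to be kept alive for the whole session.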

-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK  Fax:  +44 1865 272595



Re: [Rd] setIs and method dispatch in S4 classes

2006-04-10 Thread Peter Ruckdeschel
Hi Seth ,

thank you for your reply.

Seth Falcon  <[EMAIL PROTECTED]> writes:

>Peter Ruckdeschel <[EMAIL PROTECTED]> writes:
>  
>
>> ## now: B00 mother class to B01 and B02, and again B02 "contains" B01 by
>> setIs:
>> setClass("B00", representation(a="numeric"))
>> setClass("B01", representation(a="numeric",b="numeric"), contains= "B00")
>> setClass("B02", representation(a="numeric",d="numeric"), contains= "B00")
>> setIs("B02","B01",coerce=function(obj){new("B01", [EMAIL PROTECTED], [EMAIL 
>> PROTECTED])},
>>replace=function(obj,value){new("B01", [EMAIL PROTECTED], [EMAIL 
>> PROTECTED])})
>>
>> # now two "+" methods  for B00 and B01
>> setMethod("+", signature=c("B00","B00"), function(e1,e2)[EMAIL PROTECTED]@a})
>> setMethod("+", signature=c("B01","B01"), function(e1,e2)[EMAIL PROTECTED]@b})
>>
>> x1=new("B02", a=1, d=2)
>> x2=new("B02", a=1, d=3)
>>
>> x1+x2 ## 2 --- why?
>>
>>
>
>My impression from reading over the man page for setIs, is that it
>isn't intended to be used to override the existing inheritance
>hierarchy.  It also mentions that the return value is the extension
>info as a list, so that could also be useful in understanding what
>setIs is doing.  Here's the output for your example:
>
>Slots:
>  
>Name:a   d
>Class: numeric numeric
>
>Extends: 
>Class "B00", directly
>Class "B01", directly, with explicit coerce
>
>Use the contains arg of setClass to define the superclasses.  With the
>contains arg, the order determines the precedence for method lookup.
>But I suspect you know that already.  
>  
>
Yes, I have been aware of this, thank you.

>> Is there a possibility to force usage of the B01 method /without/
>> explicitly coercing x1,x2 to B01, i.e. interfere in the dispatching 
>> precedence, telling R somehow  (by particular arguments for setIs ?)  
>> to always use the is-relation defined by setIs first before mounting 
>> the hierarchy tree?
>>
>>
> Perhaps explaining a bit more about what you are trying to accomplish
> will allow someone to provide a more helpful suggestion than mine :-)

In the "real" context, B00 stands for a class "AbscontDistribution",
which implements absolutely continuous (a.c.) distributions. B01 is
class "Gammad" which implements Gamma distributions, and B02 is
class "Exp" which implements exponential distributions. The method
still is "+", but interpreted as convolution.

For  a.c. distributions, the default method is an FFT-based numerical
convolution algorithm, while for Gamma distributions (with the same
 scale parameter), analytic, hence much more accurate convolution
formulas are used. For "Exp", I would tell R that it also 'is' a "Gammad"
distribution by a call to setIs and use the "Gammad"-method.

Of course, I could also declare explicitly "+" methods for signatures
c("Exp", "Exp"), c("Exp", "Gammad"), and c("Gammad", "Exp")  in
which I would then use as(.) to coerce "Exp" to "Gammad"
(and again the same procedure for further Gamma-methods). 

But this would create extra methods (three or possibly many more)
to dispatch, and I doubt whether this really is the preferred
solution.

> If you know the inheritance structure you want before run-time, then
> I'm not seeing why you wouldn't just use the contains arg

I do not want to use the "+"  method for "B00" for accuracy reasons
(see above).

The reason why I do not want to implement "B01" ("Gammad")
as mother class of "B02" is that

(a) the slot structure is not identical --- in the real context Gamma
and Exp use different parametrizations ---
 + rate for "Exp" (cf ?rexp) and
 + shape for "Gammad" (cf rgamma)

(b) also class "Weibull" could be used as mother class to "Exp",
and I do not want to decide whether the Weibull or the
Gamma is the (more) "legitimate" mother to Exp ;-) 

I know: 'contains' could be a vector of classes ---
c("Gammad", "Weibull")  --- but then which would be
the  correct slot structure for "Exp" the one of "Gammad"
or the one of "Weibull" ?
My context is a bad example, "Gammad", "Weibull"
do have the same slots, but more generally this /is/ an issue...
 
--- So my guess was to rather implement two 'is'-relations
( "Exp" 'is' "Gammad"  and  "Exp" 'is' "Weibull")
declared by 'setIs', and then at run time let the
dispatching mechanism decide whether to use
a Gamma or a Weibull method.

But maybe there is a better solution ?
Any suggestions are welcome.

> And if you want to force certain behavior at run-time, then I don't
> see what's wrong with an explicit coercion using as(foo, "bar").

If you have two objects E1, E2 of class "Exp" (with the same rate)
you (or the user for whom we provide these classes)  rather want to
call "+" by E1 + E2  than
by  as(E1, "Gammad") + as(E2,"Gammad")
...

Anyway, thank you for your help

Peter



Re: [Rd] Branch changes at feature freeze

2006-04-10 Thread Duncan Murdoch
On 4/10/2006 5:16 AM, Prof Brian Ripley wrote:
> Peter Dalgaard is travelling today, so this is a 'heads up' on the effects 
> of having gone today into feature freeze on 2.3.0.
> 
> R-devel (the SVN trunk and the tarballs made available from ETHZ) is now 
> labelled '2.4.0 Under development' and will shortly include changes 
> intended for 2.4.0 (and not for 2.3.0).
> 
> The pre-release code for 2.3.0 is on the SVN branch R-2-3-patches:
> daily tarballs (now labelled R-beta) remain available from
> 
> http://cran.r-project.org/src/base-prerelease/
> 

For anyone who downloads the Windows builds: the "r-patched" build will 
stay on the old 2-2-patches branch until the release, and the "r-devel" 
build will continue to be made from the daily tarballs, now on the 
R-2-3-patches branch.  From now until the release date there won't be 
any binary builds from the trunk.

Duncan Murdoch



Re: [Rd] make check of R-alpha_2006-04-08_r37675 fails: qbeta

2006-04-10 Thread Dirk Eddelbuettel

On 10 April 2006 at 10:06, Prof Brian Ripley wrote:
| Since that compiler is not even the last in the 3.3.x series, and there 
| are now three later (released) gcc series, I think we have to write that 
| off to an optimization bug in gcc 3.3.x.

Fair point, especially as you have to insist on using gcc 3.3.* on Debian:
-- 3.3.6 is the current 3.3.* one whereas Bjørn-Helge used 3.3.5
-- 3.4.5 is the latest 3.* one supplanting 3.3.(5,6)
-- 4.0.3 is the current default
-- 4.1.0 is available too

That appears to be the same on Debian testing and unstable.

Dirk

[EMAIL PROTECTED]:~> dpkg -l | grep gcc | cut -c -78
ii  gcc           4.0.2-2    The GNU C
ii  gcc-2.95      2.95.4-22  The GNU C
ii  gcc-3.3       3.3.6-13   The GNU C
ii  gcc-3.3-base  3.3.6-13   The GNU C
ii  gcc-3.4       3.4.5-2    The GNU C
ii  gcc-3.4-base  3.4.5-2    The GNU C
ii  gcc-4.0       4.0.3-1    The GNU C
ii  gcc-4.0-base  4.0.3-1    The GNU C
ii  gcc-4.1-base  4.1.0-1    The GNU C
ii  libgcc1       4.1.0-1    GCC suppo
[EMAIL PROTECTED]:~>

-- 
Hell, there are no rules here - we're trying to accomplish something. 
  -- Thomas A. Edison



Re: [Rd] make check of R-alpha_2006-04-08_r37675 fails: qbeta

2006-04-10 Thread Bjørn-Helge Mevik
Dirk Eddelbuettel wrote:

> Fair point, especially as you have to insist on using gcc 3.3.* on Debian:
> -- 3.3.6 is the current 3.3.* one whereas Bjørn-Helge used 3.3.5
> -- 3.4.5 is the latest 3.* one supplanting 3.3.(5,6)
> -- 4.0.3 is the current default
> -- 4.1.0 is available too
>
> That appears to be the same on Debian testing and unstable.
>
> Dirk
>
> [EMAIL PROTECTED]:~> dpkg -l | grep gcc | cut -c -78
> ii  gcc           4.0.2-2    The GNU C
> ii  gcc-2.95      2.95.4-22  The GNU C
> ii  gcc-3.3       3.3.6-13   The GNU C
> ii  gcc-3.3-base  3.3.6-13   The GNU C
> ii  gcc-3.4       3.4.5-2    The GNU C
> ii  gcc-3.4-base  3.4.5-2    The GNU C
> ii  gcc-4.0       4.0.3-1    The GNU C
> ii  gcc-4.0-base  4.0.3-1    The GNU C
> ii  gcc-4.1-base  4.1.0-1    The GNU C
> ii  libgcc1       4.1.0-1    GCC suppo
> [EMAIL PROTECTED]:~>

Hmmm... I don't `see' all those versions.  After an `aptitude update':

9 (1) $ aptitude search gcc
[...]
i   gcc           - The GNU C compiler
i   gcc-2.95      - The GNU C compiler
p   gcc-2.95-doc  - Documentation for the GNU compilers (gcc,
v   gcc-3.0       -
v   gcc-3.0-base  -
v   gcc-3.0-doc   -
v   gcc-3.2       -
v   gcc-3.2-base  -
v   gcc-3.2-doc   -
i A gcc-3.3       - The GNU C compiler
i A gcc-3.3-base  - The GNU Compiler Collection (base package
p   gcc-3.3-doc   - Documentation for the GNU compilers (gcc,
i   gcc-3.4       - The GNU C compiler
i A gcc-3.4-base  - The GNU Compiler Collection (base package
p   gcc-3.4-doc   - Documentation for the GNU compilers (gcc,
[...]
The 3.3 is 3.3.5-13, and the 3.4 is 3.4.3-13.

My /etc/apt/sources.list is:

deb http://ftp.no.debian.org/debian/ sarge main non-free contrib
deb-src http://ftp.no.debian.org/debian/ sarge main non-free contrib
deb http://ftp.no.debian.org/debian-non-US sarge/non-US main contrib non-free
deb-src http://ftp.no.debian.org/debian-non-US sarge/non-US main contrib non-free
deb http://security.debian.org/ sarge/updates main contrib non-free

Why am I seeing older versions than you?

I just installed gcc-3.4, but gcc --version still says 3.3.5.  What
have I done (probably without knowing it) to `insist on using gcc
3.3.*', and how can I reverse that? (I have no desire to use old
compiler versions. :-)

-- 
Bjørn-Helge Mevik



Re: [Rd] make check of R-alpha_2006-04-08_r37675 fails: qbeta

2006-04-10 Thread Dirk Eddelbuettel

On 10 April 2006 at 14:31, Bjørn-Helge Mevik wrote:
| Dirk Eddelbuettel wrote:
| 
| > Fair point, especially as you have to insist on using gcc 3.3.* on Debian:
| > -- 3.3.6 is the current 3.3.* one whereas Bjørn-Helge used 3.3.5
| > -- 3.4.5 is the latest 3.* one supplanting 3.3.(5,6)
| > -- 4.0.3 is the current default
| > -- 4.1.0 is available too

[...]
| Hmmm... I don't `see' all those versions.  After an `aptitude update':

(That didn't show version numbers...)

| My /etc/apt/sources.list is:
| 
| deb http://ftp.no.debian.org/debian/ sarge main non-free contrib
| deb-src http://ftp.no.debian.org/debian/ sarge main non-free contrib
| deb http://ftp.no.debian.org/debian-non-US sarge/non-US main contrib non-free
| deb-src http://ftp.no.debian.org/debian-non-US sarge/non-US main contrib non-free
| deb http://security.debian.org/ sarge/updates main contrib non-free
| 
| Why am I seeing older versions than you?

Because you point to 'sarge' which was frozen and released a year ago.
If you want something newer than Debian stable, you have to point to it.

This is all off-topic here. Please consider (subscribing and) posting to
r-sig-debian for R/Debian related matters, or debian-help for generic Debian
questions. 

Hope this helps, Dirk

-- 
Hell, there are no rules here - we're trying to accomplish something. 
  -- Thomas A. Edison



[Rd] Example in ?order

2006-04-10 Thread Gregor Gorjanc
Hello!

On R Version 2.2.1  (2005-12-20 r36812) and in SVN

The following part of the example in ?order says

 ## For character vectors we can make use of rank:
 cy <- as.character(y)
 rbind(x,y,z)[, order(x, -rank(y), z)]

But "cy" is not used in there.
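
For illustration, a variant that does use a character vector might look as
follows.  The small vectors are made up for this sketch and are not the ones
from ?order.

```r
x  <- c(1, 1, 2, 2)
cy <- c("a", "b", "c", "d")

# Unary minus is not defined for character vectors, but it is for
# their ranks: sort by x ascending, then by cy descending.
idx <- order(x, -rank(cy))

# Show the columns in the new order (rbind coerces x to character).
rbind(x, cy)[, idx]
```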

-- 
Lep pozdrav / With regards,
Gregor Gorjanc

--
University of Ljubljana PhD student
Biotechnical Faculty
Zootechnical Department URI: http://www.bfro.uni-lj.si/MR/ggorjan
Groblje 3   mail: gregor.gorjanc  bfro.uni-lj.si

SI-1230 Domzale tel: +386 (0)1 72 17 861
Slovenia, Europefax: +386 (0)1 72 17 888

--
"One must learn by doing the thing; for though you think you know it,
 you have no certainty until you try." Sophocles ~ 450 B.C.



Re: [Rd] Run package code on R shutdown?

2006-04-10 Thread Henrik Bengtsson
On 4/10/06, Duncan Murdoch <[EMAIL PROTECTED]> wrote:
> I'm sure I've seen this discussed before, but haven't been able to find
> it.  I'd like some package code to be run when R is shut down
> (approximately when a user's .Last function would be run), to clean up
> properly.  What is the best way to do this?

I tried to do this some time ago.  My conclusion then was that it
cannot be done with a guarantee, because R can exit in different ways.
I implemented what I had and came up with an onSessionExit() method
available in R.utils.  Check that out for a start.  It modifies
.Last(), but that can be circumvented by quit(runLast=FALSE).

/Henrik

> Duncan Murdoch
>


--
Henrik Bengtsson
Mobile: +46 708 909208 (+2h UTC)



Re: [Rd] Should demo files be run as part of R CMD check?

2006-04-10 Thread hadley wickham
> > The rationale may be that a demo is entitled to assume it is being run
> > interactively.  Checking demo(tkdensity), for example, would be
> > unproductive.
>
> Also, it is easy for a package author to arrange to check the demos by a
> test in the package's tests directory.

Thanks for your comments - I hadn't considered the case of interactive
demos, and as you say it is easy enough to add these checks by using a
test in the tests directory.  Would it be helpful to provide a short
note to this effect in Writing R Extensions?  I would be happy to
provide a diff against the latest source.

Hadley



Re: [Rd] Run package code on R shutdown?

2006-04-10 Thread Duncan Murdoch
On 4/10/2006 6:16 AM, Prof Brian Ripley wrote:
> On Sun, 9 Apr 2006, Duncan Murdoch wrote:
> 
>> I'm sure I've seen this discussed before, but haven't been able to find
>> it.  I'd like some package code to be run when R is shut down
>> (approximately when a user's .Last function would be run), to clean up
>> properly.  What is the best way to do this?
> 
> The only way I know to do this is to use a finalizer, as we don't run
> .Last.lib on shutdown.  (That's how RODBC does it.)
> 
> Now, as I recall this cannot be done from reg.finalizer, only from the 
> C-level R_RegisterCFinalizerEx, which has an optional argument to ensure 
> that the finalizer is run 'onexit'.   (I have never understood why we have 
> that restriction, nor why reg.finalizer is primitive and not .Internal.)

Thanks!

Duncan Murdoch



[Rd] get(name, envir=envir) : formal argument "envir" matched by multiple actual arguments

2006-04-10 Thread Henrik Bengtsson
Hi,

Very sporadically, and not reproducibly, I get the following type of errors:

Error in get(name, envir = envir) : formal argument "envir" matched by
multiple actual arguments

Error in exists(cacheName, envir = envir, inherit = FALSE) : formal
argument "envir" matched by multiple actual arguments

Error in paste(..., sep = sep) : formal argument "sep" matched by
multiple actual arguments

I cannot see how these errors can occur. Note, in the third example
"..." does not contain a 'sep' (or an argument with the same prefix).

The thing is that it does not happen all the time and if I just re-run
my code it works fine again.  As far as I can remember, I've seen this
since about R v2.0.0 or so.  My current version is R v2.3.0 alpha
(2006-04-02 r37626) on WinXP.  It has been too rare to be able to
troubleshoot it and I cannot reproduce it other than by running a script
for hours.  If I rename the variable to say,

envir2 <- envir
get(name, envir=envir2)

the problem seems to go away, i.e. it is not frequent enough to observe it.

Has anyone else seen this?

/Henrik



Re: [Rd] setIs and method dispatch in S4 classes

2006-04-10 Thread John Chambers
From your description of the application, it sounds like you would be 
better off just forcing "+" to behave as you want.  Using inheritance is 
a much more powerful mechanism & can introduce results you don't want, 
as it seems to have in this case.

An important point about using inheritance is that the subclass is 
asserted to be substitutable for the superclass for ALL purposes.  This 
applies whether using "contains=" or  setIs().

When the focus is on a particular function, it's usually better to 
implement methods for that function, maybe along with setAs() 
methods--not setIs().

It seems likely that such a solution would be cleaner in design, not to 
mention that it would likely work.  (see also suggestion below)


Peter Ruckdeschel wrote:

>Hi Seth ,
>
>thank you for your reply.
>
>Seth Falcon  <[EMAIL PROTECTED]> writes:
>
>  
>
>>Peter Ruckdeschel <[EMAIL PROTECTED]> writes:
>> 
>>
>>
>>
>>>## now: B00 mother class to B01 and B02, and again B02 "contains" B01 by
>>>setIs:
>>>setClass("B00", representation(a="numeric"))
>>>setClass("B01", representation(a="numeric",b="numeric"), contains= "B00")
>>>setClass("B02", representation(a="numeric",d="numeric"), contains= "B00")
>>>setIs("B02","B01",coerce=function(obj){new("B01", [EMAIL PROTECTED], [EMAIL 
>>>PROTECTED])},
>>>   replace=function(obj,value){new("B01", [EMAIL PROTECTED], [EMAIL 
>>> PROTECTED])})
>>>
>>># now two "+" methods  for B00 and B01
>>>setMethod("+", signature=c("B00","B00"), function(e1,e2)[EMAIL PROTECTED]@a})
>>>setMethod("+", signature=c("B01","B01"), function(e1,e2)[EMAIL PROTECTED]@b})
>>>
>>>x1=new("B02", a=1, d=2)
>>>x2=new("B02", a=1, d=3)
>>>
>>>x1+x2 ## 2 --- why?
>>>   
>>>
>>>  
>>>
>>My impression from reading over the man page for setIs, is that it
>>isn't intended to be used to override the existing inheritance
>>hierarchy.  It also mentions that the return value is the extension
>>info as a list, so that could also be useful in understanding what
>>setIs is doing.  Here's the output for your example:
>>
>>   Slots:
>> 
>>   Name:a   d
>>   Class: numeric numeric
>>   
>>   Extends: 
>>   Class "B00", directly
>>   Class "B01", directly, with explicit coerce
>>
>>Use the contains arg of setClass to define the superclasses.  With the
>>contains arg, the order determines the precedence for method lookup.
>>But I suspect you know that already.  
>> 
>>
>>
>>
>Yes, I have been aware of this, thank you.
>
>  
>
>>>Is there a possibility to force usage of the B01 method /without/
>>>explicitly coercing x1,x2 to B01, i.e. interfere in the dispatching 
>>>precedence, telling R somehow  (by particular arguments for setIs ?)  
>>>to always use the is-relation defined by setIs first before mounting 
>>>the hierarchy tree?
>>>   
>>>
>>>  
>>>
>>Perhaps explaining a bit more about what you are trying to accomplish
>>will allow someone to provide a more helpful suggestion than mine :-)
>>
>>
>
>In the "real" context, B00 stands for a class "AbscontDistribution",
>which implements absolutely continuous (a.c.) distributions. B01 is
>class "Gammad" which implements Gamma distributions, and B02 is
>class "Exp" which implements exponential distributions. The method
>still is "+", but interpreted as convolution.
>
>For  a.c. distributions, the default method is an FFT-based numerical
>convolution algorithm, while for Gamma distributions (with the same
> scale parameter), analytic, hence much more accurate convolution
>formulas are used. For "Exp", I would tell R that it also 'is' a "Gammad"
>distribution by a call to setIs and use the "Gammad"-method.
>
>Of course, I could also declare explicitly "+" methods for signatures
>c("Exp", "Exp"), c("Exp", "Gammad"), and c("Gammad", "Exp")  in
>which I would then use as(.) to coerce "Exp" to "Gammad"
>(and again the same procedure for further Gamma-methods). 
>
>But, this would create an extra (3 or possibly much more) methods
>to dispatch, and I doubt whether this really is the preferred
>solution.
>  
>
Why not?  And you can avoid some of the extra methods by defining a 
virtual class that is the union of the classes for which you want the 
new methods.

Something like (untested code!)

setClassUnion("analyticConvolution", c("Exp", "Gammad"))
setMethod("+", c("analyticConvolution", "analyticConvolution"), )
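
A fuller sketch of the class-union idea, filling in a method body under
assumptions: the slot names ('shape', 'scale', 'rate'), the setAs() coercion
and the same-scale check below are made up for illustration and are not taken
from the actual distribution classes.

```r
library(methods)

# Simplified stand-ins for the real classes (slot names are assumptions).
setClass("Gammad", representation(shape = "numeric", scale = "numeric"))
setClass("Exp", representation(rate = "numeric"))

# An exponential with rate r is a Gamma with shape 1 and scale 1/r.
setAs("Exp", "Gammad",
      function(from) new("Gammad", shape = 1, scale = 1 / from@rate))

# Virtual class covering every class with an analytic convolution.
setClassUnion("analyticConvolution", c("Exp", "Gammad"))

# One "+" method serves all four combinations of Exp and Gammad.
setMethod("+", signature("analyticConvolution", "analyticConvolution"),
          function(e1, e2) {
            g1 <- as(e1, "Gammad")
            g2 <- as(e2, "Gammad")
            # The closed form only holds for a common scale parameter.
            stopifnot(isTRUE(all.equal(g1@scale, g2@scale)))
            new("Gammad", shape = g1@shape + g2@shape, scale = g1@scale)
          })

s <- new("Exp", rate = 2) + new("Exp", rate = 2)
```

A real method would presumably fall back to the numerical convolution when
the scales differ rather than stop.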

>  
>
>>If you know the inheritance structure you want before run-time, then
>>I'm not seeing why you wouldn't just use the contains arg
>>
>>
>
>I do not want to use the "+"  method for "B00" for accuracy reasons
>(see above).
>
>The reason why I do not want to implement "B01" ("Gammad")
>as mother class of "B02" is that
>
>(a) the slot structure is not identical --- in the real context Gamma
>and Exp use different parametrizations ---
> + rate for "Exp" (cf ?rexp) and
> + shape for "Gammad" (cf rgamma)
>
>(b) also class "Weibull" could be used as mother class to "Exp",
>and I do not want to decide whether the 

Re: [Rd] setIs and method dispatch in S4 classes

2006-04-10 Thread Seth Falcon
Hi John,

I found your comments helpful, even though this isn't _my_ question.
But now I have one of my own :-)

John Chambers <[EMAIL PROTECTED]> writes:
>>Of course, I could also declare explicitly "+" methods for signatures
>>c("Exp", "Exp"), c("Exp", "Gammad"), and c("Gammad", "Exp")  in
>>which I would then use as(.) to coerce "Exp" to "Gammad"
>> (and again the same procedure for further Gamma-methods). 
>>
>>But, this would create an extra (3 or possibly much more) methods
>>to dispatch, and I doubt whether this really is the preferred
>>solution.
>>  
>>
> Why not?  And you can avoid some of the extra methods by defining a
> virtual class that is the union of the classes for which you want the
> new methods.
>
> Something like (untested code!)
>
> setClassUnion("analyticConvolution", c("Exp", "Gammad"))
> setMethod("+", c("analyticConvolution", "analyticConvolution"),
> )

Why class union here and not an abstract superclass?  

If you "own" the Exp and Gammad classes, would an abstract superclass
work as well?  I think so.

However, if you don't own the Exp and Gammad classes, I can see that
the class union approach allows you the flexibility of defining a
superclass post-hoc.

I guess I have the sense that class unions are fancy/tricky (a number
of popular languages don't have that concept, AFAIK).  That isn't a
reason not to use them in a language that does support them, of
course.  

It is an interesting design question.  On the one hand, one could
argue for abstract superclasses when possible because they are "less
tricky" (and you need them when you want to share slots).  On the
other hand, the class union approach provides a more loosely coupled
design since members of the union don't have to know about each other.
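
To make the comparison concrete, here is a minimal sketch of the
abstract-superclass variant.  All class, slot and generic names are
hypothetical and chosen only for this illustration.

```r
library(methods)

# Virtual superclass that owns the shared slot.
setClass("AnalyticConv", representation("VIRTUAL", scale = "numeric"))

# Both concrete classes must name the superclass when they are defined,
# which is the tighter coupling discussed above.
setClass("GammaDist", contains = "AnalyticConv",
         representation(shape = "numeric"))
setClass("ExpDist", contains = "AnalyticConv")

# Each member reports its Gamma shape; an exponential has shape 1.
setGeneric("shapeOf", function(d) standardGeneric("shapeOf"))
setMethod("shapeOf", "GammaDist", function(d) d@shape)
setMethod("shapeOf", "ExpDist", function(d) 1)

# A single "+" method written against the superclass.
setMethod("+", signature("AnalyticConv", "AnalyticConv"),
          function(e1, e2) {
            stopifnot(isTRUE(all.equal(e1@scale, e2@scale)))
            new("GammaDist", shape = shapeOf(e1) + shapeOf(e2),
                scale = e1@scale)
          })

total <- new("ExpDist", scale = 0.5) +
  new("GammaDist", shape = 2, scale = 0.5)
```

Here the shared 'scale' slot lives in the superclass, which a class union
cannot provide.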

Hmm, I think I understand class unions a lot better already.  Thanks.
If I'm terribly off-track, please let me know.  

+ seth



[Rd] Suggestions to speed up median() and has.na()

2006-04-10 Thread Henrik Bengtsson
Hi,

I've got two suggestions for speeding up median() by about 50%.  For all
iterative methods calling median() in their loops this has a major
impact.  The second suggestion will apply to other methods too.

This is what the functions look like today:

> median
function (x, na.rm = FALSE)
{
    if (is.factor(x) || mode(x) != "numeric")
        stop("need numeric data")
    if (na.rm)
        x <- x[!is.na(x)]
    else if (any(is.na(x)))
        return(NA)
    n <- length(x)
    if (n == 0)
        return(NA)
    half <- (n + 1)/2
    if (n%%2 == 1) {
        sort(x, partial = half)[half]
    }
    else {
        sum(sort(x, partial = c(half, half + 1))[c(half, half +
            1)])/2
    }
}


Suggestion 1:
Replace the sort() calls with the .Internal(psort(x, partial)).   This
will avoid unnecessary overhead, especially an expensive second check
for NAs using any(is.na(x)).  Simple benchmarking with

x <- rnorm(10e6)
system.time(median(x))/system.time(median2(x))

where median2() is the function with the above replacements, gives
about 20-25% speed up.

Suggestion 2:
Create a has.na(x) function to replace any(is.na(x)) that returns TRUE
as soon as a NA value is detected.  In the best case it returns after
the first index with TRUE, in the worst case it returns after the last
index N with FALSE.  The cost for is.na(x) is always O(N), and any()
in the best case O(1) and in the worst case O(N) (if any() is
implemented as I hope).  A has.na() function would be very useful
elsewhere too.
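
For readers on later R releases: base R there provides anyNA(), which covers
this use case.  A minimal sketch under that assumption follows; 'has_na' and
'median_checked' are hypothetical names, and this is not the implementation
proposed in this message.

```r
# Thin wrapper: TRUE if 'x' contains at least one NA (or NaN).
has_na <- function(x) anyNA(x)

# Hypothetical use in a median-like function: test for NAs without
# first building the full logical vector is.na(x).
median_checked <- function(x) {
  if (has_na(x))
    return(NA_real_)
  stats::median(x)
}

median_checked(c(3, 1, 2))
median_checked(c(3, NA, 2))
```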

A poor man's alternative to (2) is to have a third alternative to
'na.rm', say, NA, which indicates that we know that there are no NAs
in 'x'.

The original median() is approx 50% slower (naive benchmarking) than a
version with the above two improvements, if passing a large 'x' with
no NAs;

median2 <- function (x, na.rm = FALSE) {
    if (is.factor(x) || mode(x) != "numeric")
        stop("need numeric data")

    if (is.na(na.rm)) {
        # na.rm = NA: the caller asserts that 'x' has no NAs; skip the check
    } else if (na.rm)
        x <- x[!is.na(x)]
    else if (any(is.na(x)))
        return(NA)

    n <- length(x)
    if (n == 0)
        return(NA)
    half <- (n + 1)/2
    if (n%%2 == 1) {
        .Internal(psort(x, half))[half]
    }
    else {
        sum(.Internal(psort(x, c(half, half + 1)))[c(half, half + 1)])/2
    }
}

x <- rnorm(10e5)
K <- 10
t0 <- system.time({
  for (kk in 1:K)
    y <- median(x)
})
print(t0)  # [1] 1.82 0.14 1.98   NA   NA
t1 <- system.time({
  for (kk in 1:K)
    y <- median2(x, na.rm=NA)
})
print(t1)  # [1] 1.25 0.06 1.34   NA   NA
print(t0/t1)  # [1] 1.456000 2.33 1.477612   NA   NA

BTW, without having checked the source code, it looks like is.na() is
unnecessarily slow; is.na(sum(x)) is much faster than any(is.na(x)) on
a vector without NAs.  On the other hand, is.na(sum(x)) becomes
awfully slow if 'x' contains NAs.

/Henrik

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] install.packages on unix / su (PR#8760)

2006-04-10 Thread thomas . friedrichsmeier
Full_Name: Thomas Friedrichsmeier
Version: R 2.2.1
OS: Debian / Linux
Submission from: (NULL) (84.60.123.243)


Wishlist item:

There is a small problem using install.packages() (and update.packages()):
Typically I want to install packages for system-wide use, not in a user
directory. Obviously this does not work without superuser rights.
What I would like to be able to do is to specify a "become root" command to use
in install.packages (). Probably this would be done using an extra argument to
install.packages () and update.packages ():

install.packages ([...], install.wrapper=NULL)

The argument value I would typically want to supply on my system (running in a
KDE Session) would be: install.wrapper="kdesu --" . I.e. I would like to run the
R CMD INSTALL command through kdesu.

Technically it would basically function like this:

Instead of

cmd0 <- paste(file.path(R.home("bin"),"R"), "CMD INSTALL")

in install.packages (), it would read

cmd0 <- paste(install.wrapper, file.path(R.home("bin"),"R"), "CMD INSTALL")

This feature would save me a lot of small hassles.
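
A self-contained sketch of how the command string could be assembled.
The helper name build_install_cmd() and its 'r_binary' argument are
hypothetical and exist only for illustration; 'install.wrapper' is the
argument proposed above, not an existing argument of install.packages().
Using c() before paste() drops a NULL wrapper, so the default case yields
the same command as the current code.

```r
# Hypothetical helper: build the "R CMD INSTALL" command line, optionally
# prefixed by a "become root" wrapper such as "kdesu --".
build_install_cmd <- function(install.wrapper = NULL,
                              r_binary = file.path(R.home("bin"), "R")) {
  # c() silently drops NULL, so no stray separator is added when
  # no wrapper is supplied.
  paste(c(install.wrapper, r_binary, "CMD INSTALL"), collapse = " ")
}

cmd_plain   <- build_install_cmd(r_binary = "/usr/bin/R")
cmd_wrapped <- build_install_cmd("kdesu --", r_binary = "/usr/bin/R")
```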



Re: [Rd] setIs and method dispatch in S4 classes

2006-04-10 Thread John Chambers
Seth Falcon wrote:

>Hi John,
>
>I found your comments helpful, even though this isn't _my_ question.
>But now I have one of my own :-)
>
>John Chambers <[EMAIL PROTECTED]> writes:
>  
>
>>>Of course, I could also declare explicitly "+" methods for signatures
>>>c("Exp", "Exp"), c("Exp", "Gammad"), and c("Gammad", "Exp")  in
>>>which I would then use as(.) to coerce "Exp" to "Gammad"
>>>(and again the same procedure for further Gamma-methods). 
>>>
>>>But, this would create an extra (3 or possibly much more) methods
>>>to dispatch, and I doubt whether this really is the preferred
>>>solution.
>>> 
>>>
>>>  
>>>
>>Why not?  And you can avoid some of the extra methods by defining a
>>virtual class that is the union of the classes for which you want the
>>new methods.
>>
>>Something like (untested code!)
>>
>>setClassUnion("analyticConvolution", c("Exp", "Gammad"))
>>setMethod("+", c("analyticConvolution", "analyticConvolution"),
>>)
>>
>>
>
>Why class union here and not an abstract superclass?  
>
>If you "own" the Exp and Gammad classes, would an abstract superclass
>work as well?  I think so.
>  
>
Yes.  As is frequently said of a certain other language, "There's more 
than one way to do it."

My own feeling is that class unions are a convenient shorthand & clearer 
than explicitly defining the superclass and then having to establish the 
inheritance separately for the two subclasses.  Although the 
documentation mentions that they _must_ be used for classes you don't 
own, that's not their only purpose.

Virtual classes (ahem, I assume that's what you meant by "abstract"  
;-)) may or may not have slots of their own.   Creating a virtual class 
"analyticConvolution" and doing two setIs() calls would in fact be 
roughly equivalent to the setClassUnion, but not as clear, IMO.

If the superclass was really crucial to the model, that would make it 
more natural to have it explicitly in the contains= for the individual 
subclasses.  Here, though, it seems more like a computational 
convenience for a fairly small part of the overall package, so isolating 
it in a single setClassUnion() call seems more natural.

Obviously, a question of taste and style.
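
To make the class-union route concrete, here is a minimal sketch.  The
class definitions are hypothetical stand-ins (the real "Exp" and "Gammad"
classes of the package under discussion surely have more structure), and
the "+" method only handles the equal-scale case, where the sum of
independent Gamma variables is again Gamma with the shapes added.

```r
library(methods)

# Hypothetical stand-ins for the classes discussed in this thread.
setClass("Gammad", representation(shape = "numeric", scale = "numeric"))
setClass("Exp", representation(rate = "numeric"))

# Explicit coercion: Exp(rate) is Gammad(shape = 1, scale = 1/rate).
setAs("Exp", "Gammad",
      function(from) new("Gammad", shape = 1, scale = 1 / from@rate))

# One virtual class covering both, so a single "+" method suffices.
setClassUnion("analyticConvolution", c("Exp", "Gammad"))

setMethod("+", signature("analyticConvolution", "analyticConvolution"),
  function(e1, e2) {
    g1 <- as(e1, "Gammad")
    g2 <- as(e2, "Gammad")
    if (!isTRUE(all.equal(g1@scale, g2@scale)))
      stop("this sketch only handles equal scales")
    new("Gammad", shape = g1@shape + g2@shape, scale = g1@scale)
  })

s <- new("Exp", rate = 2) + new("Exp", rate = 2)
m <- new("Gammad", shape = 3, scale = 0.5) + new("Exp", rate = 2)
```

The single method replaces the three signatures c("Exp", "Exp"),
c("Exp", "Gammad") and c("Gammad", "Exp"), and neither class definition
has to mention the union.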

>However, if you don't own the Exp and Gammad classes, I can see that
>the class union approach allows you the flexibility of defining a
>superclass post-hoc.
>
>I guess I have the sense that class unions are fancy/tricky (a number
>of popular languages don't have that concept, AFAIK).  That isn't a
>reason not to use them in a language that does support them, of
>course.  
>
>It is an interesting design question.  On the one hand, one could
>argue for abstract superclasses when possible because they are "less
>tricky" (and you need them when you want to share slots).  On the
>other hand, the class union approach provides a more loosely coupled
>design since members of the union don't have to know about each other.
>
>Hmm, I think I understand class unions a lot better already.  Thanks.
>If I'm terribly off-track, please let me know.  
>
>+ seth
>




Re: [Rd] setIs and method dispatch in S4 classes

2006-04-10 Thread Peter Ruckdeschel
Hi Seth and John,

Thank you for your helpful responses,

>John Chambers <[EMAIL PROTECTED]> writes:
>>From your description of the application, it sounds like you would be
>>better off just forcing "+" to behave as you want.  Using inheritance is
>>a much more powerful mechanism & can introduce results you don't want,
>>as it seems to have in this case.
>>
>>An important point about using inheritance is that the subclass is
>>asserted to be substitutable for the superclass for ALL purposes.  This
>>applies whether using "contains=" or  setIs().

I am not sure whether I got the meaning of "substitutable for the
superclass for ALL purposes" :

In the application I sketched, any  Exp(rate = lambda) distribution
really /is/ a Gammad(shape = 1, scale = 1/lambda) distribution; 
so my understanding is that "Exp" is substitutable for "Gammad"
for ALL purposes.

"Gammad" was not designed to be the motherclass to "Exp" right
from the beginning because the same 'is'-relation also applies to
"Weibull": any Exp(rate = lambda) distribution /is/ a
Weibull(shape = 1, scale = 1/lambda) distribution.
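
The two identities can be checked numerically with the density functions
in the stats package; this is only a sanity check on a few points (the
values of 'lambda' and 'x' are arbitrary choices for illustration), not
a proof.

```r
lambda <- 2.5
x <- c(0.1, 0.5, 1, 2, 5)

d_exp     <- dexp(x, rate = lambda)
d_gamma   <- dgamma(x, shape = 1, scale = 1 / lambda)
d_weibull <- dweibull(x, shape = 1, scale = 1 / lambda)

# Both comparisons should hold up to floating-point tolerance.
gamma_ok   <- isTRUE(all.equal(d_exp, d_gamma))
weibull_ok <- isTRUE(all.equal(d_exp, d_weibull))
```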

Does "substitutable for the superclass for ALL purposes"
mean 'without ambiguity' (as might enter through Weibull/Gammad)?

>>When the focus is on a particular function, it's usually better to
>>implement methods for that function, maybe along with setAs()
>>methods--not setIs().

You mean I should not leave the coercion decision up to the dispatching
mechanism?

>>It seems likely that such a solution would be cleaner in design, not to
>>mention that it would likely work.  (see also suggestion below)

Yes, your indication does work; thank you!
 
>>Peter Ruckdeschel <[EMAIL PROTECTED]> writes:
>>>Of course, I could also declare explicitly "+" methods for signatures
>>>c("Exp", "Exp"), c("Exp", "Gammad"), and c("Gammad", "Exp") in
>>>which I would then use as(.) to coerce "Exp" to "Gammad"
>>> (and again the same procedure for further Gamma-methods).
>>>
>>>But, this would create an extra (3 or possibly much more) methods
>>>to dispatch, and I doubt whether this really is the preferred
>>>solution.
>>>
>> Why not?

It simply did not seem to me elegant to have three calls to
setMethod() doing more or less the same thing.
I thought that, as elegant as R solutions from the R core are
most times, there should be some mechanism to avoid this
threefold code---and in fact you indicated how to---
thank you!

>> And you can avoid some of the extra methods by defining a
>> virtual class that is the union of the classes for which you
>> want the new methods.
>>
>> Something like (untested code!)
>>
>> setClassUnion("analyticConvolution", c("Exp", "Gammad"))
>> setMethod("+", c("analyticConvolution", "analyticConvolution"),
>> )

Seth Falcon <[EMAIL PROTECTED]> writes:
> Why class union here and not an abstract superclass?

Am I right: the class generated by setClassUnion() does not
enter the inheritance tree / mechanism?

setClassUnion()---at least in my case---solves the problem;
thank you again.

[snip]

Peter



Re: [Rd] install.packages on unix / su (PR#8760)

2006-04-10 Thread Dirk Eddelbuettel

On 10 April 2006 at 21:14, [EMAIL PROTECTED] wrote:
| Full_Name: Thomas Friedrichsmeier
| Version: R 2.2.1
| OS: Debian / Linux
| Submission from: (NULL) (84.60.123.243)
| 
| 
| Wishlist item:
| 
| There is a small problem using install.packages() (and update.packages()):
| Typically I want to install packages for system-wide use, not in a user
| directory. Obviously this does not work without superuser rights.

One can see this problem as a local system management issue for which another
possible answer is to add yourself (and/or the other users installing R
packages) to, say, group 'admin' and to make /usr/local/lib/R of group admin
and group-writeable.  Or create a custom group radmin. Or ...

Dirk

| What I would like to be able to do is to specify a "become root" command to use
| in install.packages (). Probably this would be done using an extra argument to
| install.packages () and update.packages ():
| 
| install.packages ([...], install.wrapper=NULL)
| 
| The argument value I would typically want to supply on my system (running in a
| KDE Session) would be: install.wrapper="kdesu --" . I.e. I would like to run the
| R CMD INSTALL command through kdesu.
| 
| Technically it would basically function like this:
| 
| Instead of
| 
| cmd0 <- paste(file.path(R.home("bin"),"R"), "CMD INSTALL")
| 
| in install.packages (), it would read
| 
| cmd0 <- paste(install.wrapper, file.path(R.home("bin"),"R"), "CMD INSTALL")
| 
| This feature would save me a lot of small hassles.
| 

-- 
Hell, there are no rules here - we're trying to accomplish something. 
  -- Thomas A. Edison



Re: [Rd] install.packages on unix / su (PR#8760)

2006-04-10 Thread Thomas Friedrichsmeier
> | Wishlist item:
> |
> | There is a small problem using install.packages() (and update.packages()):
> | Typically I want to install packages for system-wide use, not in a user
> | directory. Obviously this does not work without superuser rights.
>
> One can see this problem as a local system management issue for which
> another possible answer is to add yourself (and/or the other users
> installing R packages) to, say, group 'admin' and to make /usr/local/lib/R
> of group admin and group-writeable.  Or create a custom group radmin. Or ...

It's about convenience, no more, no less, and so it's a wishlist item, no 
more, and no less.
I don't think the case of a non-root user working on a de-facto single user 
system is too uncommon on linux. It's why tools like kdesu exist in the first 
place. Unless there are strong reasons not to (and there may well be), I 
think adding some convenience option for this particular case may well be 
worth while.

Regards
Thomas






Re: [Rd] Suggestions to speed up median() and has.na()

2006-04-10 Thread Thomas Lumley
On Mon, 10 Apr 2006, Henrik Bengtsson wrote:

> Hi,
>
> I've got two suggestions how to speed up median() about 50%.  For all
> iterative methods calling median() in the loops this has a major
> impact.  The second suggestion will apply to other methods too.

I'm surprised this has a major impact -- in your example it takes much 
longer to generate the ten million numbers than to find the median.

> Suggestion 1:
> Replace the sort() calls with the .Internal(psort(x, partial)).   This
> will avoid unnecessary overhead, especially an expensive second check
> for NAs using any(is.na(x)).  Simple benchmarking with
>
> x <- rnorm(10e6)
> system.time(median(x))/system.time(median2(x))
>
> where median2() is the function with the above replacements, gives
> about 20-25% speed up.

There's something that seems a bit undesirable about having median() call 
the .Internal function for sort().

> Suggestion 2:
> Create a has.na(x) function to replace any(is.na(x)) that returns TRUE
> as soon as a NA value is detected.  In the best case it returns after
> the first index with TRUE, in the worst case it returns after the last
> index N with FALSE.  The cost for is.na(x) is always O(N), and any()
> in the best case O(1) and in the worst case O(N) (if any() is
> implemented as I hope).  An has.na() function would be very useful
> elsewhere too.

This sounds useful (though it has missed the deadline for 2.3.0).

It won't help if the typical case is no missing values, as you suggest, 
but it will be faster when there are missing values.

> BTW, without having checked the source code, it looks like is.na() is
> unnecessarily slow; is.na(sum(x)) is much faster than any(is.na(x)) on
> a vector without NAs.  On the other hand, is.na(sum(x)) becomes
> awfully slow if 'x' contains NAs.
>

I don't think  it is unnecessarily slow.  It has to dispatch methods and 
it has to make sure that matrix structure is preserved.  After that the 
code is just

 case REALSXP:
 for (i = 0; i < n; i++)
 LOGICAL(ans)[i] = ISNAN(REAL(x)[i]);
 break;

and it's hard to see how that can be improved. It does suggest that a 
faster anyNA() function would have to not be generic.


-thomas

Thomas Lumley   Assoc. Professor, Biostatistics
[EMAIL PROTECTED]   University of Washington, Seattle



Re: [Rd] Suggestions to speed up median() and has.na()

2006-04-10 Thread Bill Dunlap
On Mon, 10 Apr 2006, Thomas Lumley wrote:

> On Mon, 10 Apr 2006, Henrik Bengtsson wrote:
>
> > Hi,
> >
> > I've got two suggestions how to speed up median() about 50%.  For all
> > iterative methods calling median() in the loops this has a major
> > impact.  The second suggestion will apply to other methods too.
>
> > Suggestion 2:
> > Create a has.na(x) function to replace any(is.na(x)) that returns TRUE
> > as soon as a NA value is detected.  In the best case it returns after
> > the first index with TRUE, in the worst case it returns after the last
> > index N with FALSE.  The cost for is.na(x) is always O(N), and any()
> > in the best case O(1) and in the worst case O(N) (if any() is
> > implemented as I hope).  An has.na() function would be very useful
> > elsewhere too.
>
> This sounds useful (though it has missed the deadline for 2.3.0).
>
> It won't help if the typical case is no missing values, as you suggest,
> but it will be faster when there are missing values.

Splus has such a function, but it is called anyMissing().  In the
interests of interoperability it would be nice if R used that name.
(I did not choose the name, but that is what it is.)

The following experiment using Splus seems to indicate the speedup has
less to do with stopping at the first NA than it does with not
making/filling/copying/whatever the big vector of logicals that is.na
returns.

   > # NA near start of list of 10 million integers
   > { z<-replace(1:1e7,2,NA); unix.time(anyMissing(z)) }
   [1] 0 0 0 0 0
   > { z<-replace(1:1e7,2,NA); unix.time(any(is.na(z)))}
   [1] 0.62 0.13 0.75 0.00 0.00

   > # NA at end of list
   > { z<-replace(1:1e7,1e7,NA); unix.time(anyMissing(z)) }
   [1] 0.07 0.00 0.07 0.00 0.00
   > { z<-replace(1:1e7,1e7,NA); unix.time(any(is.na(z)))}
   [1] 0.64 0.11 0.75 0.00 0.00

The Splus anyMissing is an s3 generic (i.e., it calls UseMethod()).
The Splus is.na is an s4 generic and its default method may invoke
an s3 generic.

> > BTW, without having checked the source code, it looks like is.na() is
> > unnecessarily slow; is.na(sum(x)) is much faster than any(is.na(x)) on
> > a vector without NAs.  On the other hand, is.na(sum(x)) becomes
> > awfully slow if 'x' contains NAs.
> >
>
> I don't think  it is unnecessarily slow.  It has to dispatch methods and
> it has to make sure that matrix structure is preserved.  After that the
> code is just
>
>  case REALSXP:
>  for (i = 0; i < n; i++)
>  LOGICAL(ans)[i] = ISNAN(REAL(x)[i]);
>  break;
>
> and it's hard to see how that can be improved. It does suggest that a
> faster anyNA() function would have to not be generic.


Bill Dunlap
Insightful Corporation
bill at insightful dot com
360-428-8146

 "All statements in this message represent the opinions of the author and do
 not necessarily reflect Insightful Corporation policy or position."



Re: [Rd] Suggestions to speed up median() and has.na()

2006-04-10 Thread Duncan Murdoch
On 4/10/2006 7:22 PM, Thomas Lumley wrote:
> On Mon, 10 Apr 2006, Henrik Bengtsson wrote:
> 
>> Hi,
>>
>> I've got two suggestions how to speed up median() about 50%.  For all
>> iterative methods calling median() in the loops this has a major
>> impact.  The second suggestion will apply to other methods too.
> 
> I'm surprised this has a major impact -- in your example it takes much 
> longer to generate the ten million numbers than to find the median.
> 
>> Suggestion 1:
>> Replace the sort() calls with the .Internal(psort(x, partial)).   This
>> will avoid unnecessary overhead, especially an expensive second check
>> for NAs using any(is.na(x)).  Simple benchmarking with
>>
>> x <- rnorm(10e6)
>> system.time(median(x))/system.time(median2(x))
>>
>> where median2() is the function with the above replacements, gives
>> about 20-25% speed up.
> 
> There's something that seems a bit undesirable about having median() call 
> the .Internal function for sort().
> 
>> Suggestion 2:
>> Create a has.na(x) function to replace any(is.na(x)) that returns TRUE
>> as soon as a NA value is detected.  In the best case it returns after
>> the first index with TRUE, in the worst case it returns after the last
>> index N with FALSE.  The cost for is.na(x) is always O(N), and any()
>> in the best case O(1) and in the worst case O(N) (if any() is
>> implemented as I hope).  An has.na() function would be very useful
>> elsewhere too.
> 
> This sounds useful (though it has missed the deadline for 2.3.0).
> 
> It won't help if the typical case is no missing values, as you suggest, 
> but it will be faster when there are missing values.

I think it would help even in that case if the vector is large, because 
it avoids allocating and disposing of the logical vector of the same 
length as x.

>> BTW, without having checked the source code, it looks like is.na() is
>> unnecessarily slow; is.na(sum(x)) is much faster than any(is.na(x)) on
>> a vector without NAs.  On the other hand, is.na(sum(x)) becomes
>> awfully slow if 'x' contains NAs.
>>
> 
> I don't think  it is unnecessarily slow.  It has to dispatch methods and 
> it has to make sure that matrix structure is preserved.  After that the 
> code is just
> 
>  case REALSXP:
>  for (i = 0; i < n; i++)
>  LOGICAL(ans)[i] = ISNAN(REAL(x)[i]);
>  break;
> 
> and it's hard to see how that can be improved. It does suggest that a 
> faster anyNA() function would have to not be generic.

If it's necessary to make it not generic to achieve the speedup, I don't 
think it's worth doing.  If anyNA is written not to be generic I'd guess 
a very common error will be to apply it to a dataframe and get a 
misleading "FALSE" answer.  If we do that, I predict that the total 
amount of r-help time wasted on it will exceed the CPU time saved by 
orders of magnitude.

Duncan Murdoch



Re: [Rd] Suggestions to speed up median() and has.na()

2006-04-10 Thread Thomas Lumley
On Mon, 10 Apr 2006, Duncan Murdoch wrote:

> On 4/10/2006 7:22 PM, Thomas Lumley wrote:
>> On Mon, 10 Apr 2006, Henrik Bengtsson wrote:
>> 
>>> Suggestion 2:
>>> Create a has.na(x) function to replace any(is.na(x)) that returns TRUE
>>> as soon as a NA value is detected.  In the best case it returns after
>>> the first index with TRUE, in the worst case it returns after the last
>>> index N with FALSE.  The cost for is.na(x) is always O(N), and any()
>>> in the best case O(1) and in the worst case O(N) (if any() is
>>> implemented as I hope).  An has.na() function would be very useful
>>> elsewhere too.
>> 
>> This sounds useful (though it has missed the deadline for 2.3.0).
>> 
>> It won't help if the typical case is no missing values, as you suggest, but 
>> it will be faster when there are missing values.
>
> I think it would help even in that case if the vector is large, because it 
> avoids allocating and disposing of the logical vector of the same length as 
> x.

That makes sense. I have just tried, and for vectors of length ten 
million it does make a measurable difference.


>>> BTW, without having checked the source code, it looks like is.na() is
>>> unnecessarily slow; is.na(sum(x)) is much faster than any(is.na(x)) on
>>> a vector without NAs.  On the other hand, is.na(sum(x)) becomes
>>> awfully slow if 'x' contains NAs.
>>> 
>> 
>> I don't think  it is unnecessarily slow.  It has to dispatch methods and it 
>> has to make sure that matrix structure is preserved.  After that the code 
>> is just
>>
>>  case REALSXP:
>>  for (i = 0; i < n; i++)
>>  LOGICAL(ans)[i] = ISNAN(REAL(x)[i]);
>>  break;
>> 
>> and it's hard to see how that can be improved. It does suggest that a 
>> faster anyNA() function would have to not be generic.
>
> If it's necessary to make it not generic to achieve the speedup, I don't 
> think it's worth doing.  If anyNA is written not to be generic I'd guess a 
> very common error will be to apply it to a dataframe and get a misleading 
> "FALSE" answer.  If we do that, I predict that the total amount of r-help 
> time wasted on it will exceed the CPU time saved by orders of magnitude.
>

I wasn't proposing that it should be stupid, just not generic.  It could 
support data frames (sum() does, for example). If it didn't support data 
frames it should certainly give an error rather than the wrong answer, but 
if we are seriously trying to avoid delays around 0.1 seconds then going 
through the generic function mechanism may be a problem.


-thomas



Re: [Rd] Suggestions to speed up median() and has.na()

2006-04-10 Thread Duncan Murdoch
On 4/10/2006 8:08 PM, Thomas Lumley wrote:
> On Mon, 10 Apr 2006, Duncan Murdoch wrote:
> 
>> On 4/10/2006 7:22 PM, Thomas Lumley wrote:
>>> On Mon, 10 Apr 2006, Henrik Bengtsson wrote:
>>>
>>>> Suggestion 2:
>>>> Create a has.na(x) function to replace any(is.na(x)) that returns TRUE
>>>> as soon as a NA value is detected.  In the best case it returns after
>>>> the first index with TRUE, in the worst case it returns after the last
>>>> index N with FALSE.  The cost for is.na(x) is always O(N), and any()
>>>> in the best case O(1) and in the worst case O(N) (if any() is
>>>> implemented as I hope).  An has.na() function would be very useful
>>>> elsewhere too.
>>> This sounds useful (though it has missed the deadline for 2.3.0).
>>>
>>> It won't help if the typical case is no missing values, as you suggest, but 
>>> it will be faster when there are missing values.
>> I think it would help even in that case if the vector is large, because it 
>> avoids allocating and disposing of the logical vector of the same length as 
>> x.
> 
> That makes sense. I have just tried, and for vectors of length ten 
> million it does make a measurable difference.
> 
> 
>>>> BTW, without having checked the source code, it looks like is.na() is
>>>> unnecessarily slow; is.na(sum(x)) is much faster than any(is.na(x)) on
>>>> a vector without NAs.  On the other hand, is.na(sum(x)) becomes
>>>> awfully slow if 'x' contains NAs.
>>>>
>>> I don't think  it is unnecessarily slow.  It has to dispatch methods and it 
>>> has to make sure that matrix structure is preserved.  After that the code 
>>> is just
>>>
>>>  case REALSXP:
>>>  for (i = 0; i < n; i++)
>>>  LOGICAL(ans)[i] = ISNAN(REAL(x)[i]);
>>>  break;
>>>
>>> and it's hard to see how that can be improved. It does suggest that a 
>>> faster anyNA() function would have to not be generic.
>> If it's necessary to make it not generic to achieve the speedup, I don't 
>> think it's worth doing.  If anyNA is written not to be generic I'd guess a 
>> very common error will be to apply it to a dataframe and get a misleading 
>> "FALSE" answer.  If we do that, I predict that the total amount of r-help 
>> time wasted on it will exceed the CPU time saved by orders of magnitude.
>>
> 
> I wasn't proposing that it should be stupid, just not generic.  It could 
> support data frames (sum() does, for example). If it didn't support data 
> frames it should certainly give an error rather than the wrong answer, but 
> if we are seriously trying to avoid delays around 0.1 seconds then going 
> through the generic function mechanism may be a problem.

If it's not dataframes, it will be something else.  I think it's highly 
desirable that any(is.na(x)) == anyNA(x) within base packages, and we 
should make it straightforward to maintain this identity in contributed 
packages.
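
A sketch of what keeping that identity could look like with an S3
generic.  This is an illustration only: it is not the Splus
implementation, and the method bodies are assumptions.  The default
method falls back to any(is.na(x)), and a data frame method checks
column by column so that it can stop at the first column containing a
missing value.

```r
# Hypothetical S3 generic; the name follows the Splus function mentioned
# in this thread, the implementation is only a sketch.
anyMissing <- function(x) UseMethod("anyMissing")

# Fallback: defined to agree with any(is.na(x)).
anyMissing.default <- function(x) any(is.na(x))

# Data frames: test one column at a time and return early.
anyMissing.data.frame <- function(x) {
  for (j in seq_along(x))
    if (anyMissing(x[[j]]))
      return(TRUE)
  FALSE
}

df_with_na    <- data.frame(a = 1:3, b = c("u", NA, "w"))
df_without_na <- data.frame(a = 1:3, b = c("u", "v", "w"))
```

A contributed package could then add its own method for a new class
without touching the generic.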

By the way, I think Bill's suggestion of calling it anyMissing makes a 
lot of sense.

Duncan



[Rd] eapply() fails on baseenv() (PR#8761)

2006-04-10 Thread murdoch
eapply() works on most environments, but not on baseenv().  For example,

 > x <- 1
 > eapply(globalenv(), function(x) x)
$x
[1] 1

 > eapply(baseenv(), function(x) x)
list()

I'm probably not going to have time to work on this before 2.3.0, but I 
don't think it's really urgent; if no one else fixes it first I'll do it 
after the release.

Duncan Murdoch



Re: [Rd] install.packages on unix / su (PR#8760)

2006-04-10 Thread ripley
On Mon, 10 Apr 2006, Thomas Friedrichsmeier wrote:

>> | Wishlist item:
>> |
>> | There is a small problem using install.packages() (and update.packages()):
>> | Typically I want to install packages for system-wide use, not in a user
>> | directory. Obviously this does not work without superuser rights.

[From a reply I started last night.]

Not obvious at all, especially to those of us who do it all the time.
Many of us set up an account to `own' R, and either install under that
account or change the ownership of the library directory to that account.

I think what you suggest is quite dangerous, as different directories may
be visible to the user account producing the summary information and to
root.  Then update.packages() (run by you) and R CMD INSTALL (run by root) 
may do different things.  This could apply both within a library directory 
(root might have installed a later version of a package not readable by 
you) and over different library trees (my personal R library is not 
readable by root, and indeed the main R library tree is not readable by 
root on our students' machines).

Quoting someone else (without attribution, a breach of copyright)

>> One can see this problem as a local system management issue for which
>> another possible answer is to add yourself (and/or the other users
>> installing R packages) to, say, group 'admin' and to make /usr/local/lib/R
>> of group admin and group-writeable.  Or create a custom group radmin. Or ...
>
> It's about convenience, no more, no less, and so it's a wishlist item, no
> more, and no less.
> I don't think the case of a non-root user working on a de-facto single user
> system is too uncommon on linux. It's why tools like kdesu exist in the first
> place. Unless there are strong reasons not to (and there may well be), I
> think adding some convenience option for this particular case may well be
> worth while.

See the `strong reason' above.  Two of us have suggested better solutions.
If you want yours, you can of course patch your installation, the beauty 
of Open Source.  But unless you can find an R-core member who is prepared 
to maintain your solution, it will not be going into R.

-- 
Brian D. Ripley,                  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
