Re: [Rd] operators as S4 methods

2005-06-14 Thread Martin Maechler
For arithmetic operators,
the most elegant way often is to define so called `group methods'
for the whole group of arithmetic operators.

This actually also applies to the old (S3) classes and
methods.

One example where we do this is the 'Matrix' package,
see the source, e.g., in 
https://svn.r-project.org/R-packages/Matrix/R/dMatrix.R

Note that for a namespaced package, one also needs to import the
group generics from the 'methods' package in NAMESPACE:

## Currently, group generics need to be explicitly imported:
importFrom("methods", Arith, Compare, Math, Math2, Summary, Complex)
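To illustrate, here is a minimal sketch of such a group method --
using a hypothetical class 'myNum', *not* the Matrix code (see the
source URL above for the real thing) -- where one setMethod() covers
'+', '-', '*', '/', '^', '%%' and '%/%' at once:

  setClass("myNum", representation(x = "numeric"))
  setMethod("Arith", signature(e1 = "myNum", e2 = "myNum"),
            function(e1, e2)
                ## .Generic holds the name of the operator actually
                ## called; callGeneric() re-dispatches it on the slots:
                new("myNum", x = callGeneric(e1@x, e2@x)))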


Martin Maechler, ETH Zurich


>>>>> "Iago" == Iago Mosqueira <[EMAIL PROTECTED]>
>>>>> on Tue, 14 Jun 2005 09:23:40 +0200 writes:

Iago> Dear all,
Iago> I need to re-define some mathematical operators (+, *, /, etc) for an
Iago> S4 class based on array. All references I have found (S Programming,
Iago> Green Book) show how to define S3 methods for this (like in page 89
Iago> of S Programming for "-.polynomial"). What is the preferred S4 way
Iago> for doing this? I hope I haven't missed some obvious piece of
Iago> documentation.

Iago> Many thanks,

 
Iago> Iago Mosqueira

Iago> __
Iago> R-devel@r-project.org mailing list
Iago> https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Open device -> glibc 2.3.4 bug for Redhat Enterprise 4?

2005-06-21 Thread Martin Maechler
We have been using Redhat Enterprise 4, on some of our Linux
clients for a while,
and Christoph has just found that opening an R device for a file
without write permission gives a bad glibc error and subsequent
seg.fault:

> postscript("/blabla.ps")
*** glibc detected *** double free or corruption (!prev): 0x01505f10 ***

or

> xfig("/blabla.fig")
*** glibc detected *** double free or corruption (!prev): 0x01505f10 ***

and similar for pdf();
does not happen for jpeg() {which runs via x11},
nor e.g. for 

> sink("/bla.txt")

---

Happens both on 32-bit (Pentium) and 64-bit (AMD Athlon)
machines with the following libc :

32-bit:
  -rwxr-xr-x  1 root root 1451681 May 13 00:17 /lib/tls/libc-2.3.4.so*
64-bit:
  -rwxr-xr-x  1 root root 1490956 May 12 23:26 /lib64/tls/libc-2.3.4.so*

---

Can anyone reproduce this problem?
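In the meantime, a guard along these lines -- a hypothetical helper,
not part of R -- avoids triggering the crash:

   safePostscript <- function(file, ...) {
       ## file.access() returns 0 iff the access mode is permitted;
       ## mode = 2 tests write permission
       if (file.access(dirname(file), mode = 2) != 0)
           stop("no write permission for directory '", dirname(file), "'")
       postscript(file, ...)
   }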

Regards,
Martin

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Open device -> glibc 2.3.4 bug for Redhat Enterprise 4?

2005-06-21 Thread Martin Maechler
>>>>> "TL" == Thomas Lumley <[EMAIL PROTECTED]>
>>>>> on Tue, 21 Jun 2005 09:59:31 -0700 (PDT) writes:

TL> This was supposed to be fixed in 2.1.1 -- which version are you using?

2.1.1 -- and 2.1.0 and 2.0.0 all showed the problem.

But thanks, Thomas, looking in "NEWS" of R-devel showed that
there was a fix for this in R-devel only --- too bad it didn't
make it for R 2.1.1.

And yes, the seg.fault doesn't happen in my version of R-devel.

Further note that only the very recent libc produces the
segfault for us.  Earlier versions, including the libc-2.3.2
used in our Debian sid (on AMD Opteron), do give the correct
error message instead of the seg.fault.


TL> On Tue, 21 Jun 2005, Martin Maechler wrote:

>> We have been using Redhat Enterprise 4, on some of our Linux
>> clients for a while,
>> and Christoph has just found that opening an R device for a file
>> without write permission gives a bad glibc error and subsequent
>> seg.fault:
>> 
>>> postscript("/blabla.ps")
>> *** glibc detected *** double free or corruption (!prev): 0x01505f10 ***
>> 
>> or
>> 
>>> xfig("/blabla.fig")
>> *** glibc detected *** double free or corruption (!prev): 0x01505f10 ***
>> 
>> and similar for pdf();
>> does not happen for jpeg() {which runs via x11},
>> nor e.g. for
>> 
>>> sink("/bla.txt")
>> 
>> ---
>> 
>> Happens both on 32-bit (Pentium) and 64-bit (AMD Athlon)
>> machines with the following libc :
>> 
>> 32-bit:
>> -rwxr-xr-x  1 root root 1451681 May 13 00:17 /lib/tls/libc-2.3.4.so*
>> 64-bit:
>> -rwxr-xr-x  1 root root 1490956 May 12 23:26 /lib64/tls/libc-2.3.4.so*
>> 
>> ---
>> 
>> Can anyone reproduce this problem?
>> 
>> Regards,
>> Martin
>> 
>> __
>> R-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>> 

TL> __
TL> R-devel@r-project.org mailing list
TL> https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Trouble with ifelse and if statement (PR#7962)

2005-06-22 Thread Martin Maechler

Marc> This is not a bug and yes you have missed something.

Marc> Read R FAQ 7.31 Why doesn't R think these numbers are equal?

Marc> More information is also available here:

Marc> http://grouper.ieee.org/groups/754/

thank you, Marc.

Marc> One possible solution:

>> i
Marc> [1] 0.08 0.00 0.33 0.00 0.00 0.00 0.00 0.33 0.00 0.00 0.08 0.08 0.20
Marc> [14] 0.00 0.13

Note that a slightly more recommended way for the following is

  as.integer(sapply(i, function(x) isTRUE(all.equal(x, 0.33))))

using the  isTRUE(all.equal(...))  idiom,
which I'd recommend quite generally.
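For illustration (the first comparison is FAQ 7.31's classic example;
all of this is standard R):

  0.1 + 0.2 == 0.3                # FALSE : binary representation error
  all.equal(0.1 + 0.2, 0.3)       # TRUE
  all.equal(1, 1.5)               # "Mean relative difference: 0.5" --
                                  # a string, *not* FALSE; hence isTRUE(.)
  isTRUE(all.equal(1, 1.5))       # FALSE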

Martin

>> ifelse(sapply(i, function(x) all.equal(x, 0.33)) == "TRUE", 1, 0)
Marc> [1] 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0

>> ifelse(sapply(i, function(x) all.equal(x, 0.08)) == "TRUE", 1, 0)
Marc> [1] 1 0 0 0 0 0 0 0 0 0 1 1 0 0 0

>> ifelse(sapply(i, function(x) all.equal(x, 0.2)) == "TRUE", 1, 0)
Marc> [1] 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0

Martin

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Suggestion for the R Bugs web page

2005-06-22 Thread Martin Maechler
Thank you, Marc, for your suggestion.

> "Marc" == Marc Schwartz <[EMAIL PROTECTED]>
> on Wed, 22 Jun 2005 10:15:00 -0500 writes:

Marc> Hi all,
Marc> I would like to recommend that the following text from the R Posting
Marc> Guide be placed on the R Bug submission page in the section "Submit
Marc> New Reports", which would read as follows:

Marc> Submit New Reports

Marc> You can submit new bug reports either using an online form by clicking
Marc> on the button below or by sending email to [EMAIL PROTECTED] 

actually, nobody should advertise that e-mail (but maybe those
at ku.dk, when they talk about it inside DK),
but rather  [EMAIL PROTECTED] .

The advantage of the latter is its "genericity" and the fact
that mails are filtered a bit more.

Marc> Before you post a real bug report, make sure you read R Bugs in the
Marc> R-faq. If you're not completely and utterly sure something is a bug,
Marc> post a question to r-help, not a bug report to r-bugs - every bug
Marc> report requires manual action by one of the R-core members.

Marc> If you wish to comment upon an existing report, you cannot do that via
Marc> the web interface. Instead send an email to the above address with the
Marc> Subject: header containing (PR#999) -- replace 999 with actual report
Marc> number, of course.



Marc> Perhaps reading that brief middle section, without having to click to
Marc> another page, will help to reduce user error reports going to R Bugs
Marc> and save members of R Core some time.

Note that we (well, primarily Peter Dalgaard) have considered
complete changes to the R-bugs "system" anyway some of which
would obliterate the e-mail interface completely IIRC.


Marc> Also, as a quick pointer, I noted that there is a repeated word
Marc> ("for") on the R Home Page in the "Getting Started" box:

> R is a free software environment _for for_ statistical computing and
> graphics. It compiles and runs on a wide variety of UNIX platforms,
> Windows and MacOS. To download R, please choose your preferred CRAN
> mirror.

I've fixed that one --- I haven't checked for how many months this
has remained unreported.

Thank you, Marc!
Martin

Marc> Best regards,

Marc> Marc Schwartz

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] efficiency of sample() with prob.

2005-06-24 Thread Martin Maechler
>>>>> "Bo" == Bo Peng <[EMAIL PROTECTED]>
>>>>> on Fri, 24 Jun 2005 10:32:45 -0500 writes:

Bo> On 6/24/05, Prof Brian Ripley <[EMAIL PROTECTED]> wrote:
>> `Research' involves looking at all the competitor methods, devising a
>> near-optimal strategy and selecting amongst methods according to that
>> strategy.  It is not a quick fix we are looking for but something that
>> will be good for the long term.

Bo> I am sorry but I am afraid that I do not have enough time and
Bo> background knowledge
Bo> to do a thorough research in this area.

which I think is quite understandable.

Bo> I have tried bisection search method and the alias
Bo> method, the latter has greatly improved the performance
Bo> of my bioinformatics application. Since this method is
Bo> the only one mentioned in Knuth's book, I have no idea
Bo> about other alternatives.

I think you've also explored the space of possible inputs a bit
and have suggested that the alias method was "uniformly" better
than the current one, i.e. always better, sometimes only
slightly but sometimes considerably (and never worse).  
If this (uniform improvement) can be ``proven'' in some way,
{and that maybe a considerable "if", I haven't started to go in there}
and because the algorithm is relatively simple {i.e., there's
not much code added to the current one},
I'd think that we (R-core) should incorporate the algorithm for
the time being, until someone has time for the ``real research''
and provides even better algorithm(s).
I don't see why the phrase
   "the good is the enemy of the better"
should apply in this situation.

Martin Maechler, ETH Zurich


Bo> Attached is a slightly improved version of the alias method.

(deleted for this reply).

Bo> It may be helpful to people having similar problems.

Bo> Thanks.


Bo> --
Bo> Bo Peng
Bo> Department of Statistics
Bo> Rice University.
Bo> http://bp6.stat.rice.edu:8080/

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] (PR#7976) split() dropping levels (was "boxplot by factor")

2005-07-01 Thread Martin Maechler
> "PD" == Peter Dalgaard <[EMAIL PROTECTED]>
> on 28 Jun 2005 14:57:42 +0200 writes:

PD> "Liaw, Andy" <[EMAIL PROTECTED]> writes:
>> The issue is not with boxplot, but with split.  boxplot.formula() 
>> calls boxplot(split(split(mf[[response]], mf[-response]), ...), 
>> but look at what split() returns when there are empty levels in
>> the factor:
>> 
>> > f <- factor(gl(3, 6), levels=1:5)
>> > y <- rnorm(f)
>> > split(y, f)
>> $"1"
>> [1] 0.4832124 1.1924811 0.3657797 1.7400198 0.5577356 0.9889520
>> 
>> $"2"
>> [1] -1.1296642 -0.4808355 -0.2789933  0.1220718  0.1287742 -0.7573801
>> 
>> $"3"
>> [1]  1.2320902  0.5090700 -1.5508074  2.1373780  1.1681297 -0.7151561
>> 
>> The "culprit" is the following in split.default():
>> 
>> f <- factor(f)
>> 
>> which drops empty levels in f, if there are any.  BTW, ?split doesn't
>> mention what it does in such situation.  Perhaps it should?
>> 
>> If this is to be "fixed", I suppose an additional argument, e.g.,
>> drop=TRUE, can be added, and the corresponding line mentioned
>> above changed to something like:
>> 
>> if (drop || !is.factor(f)) f <- factor(f)
>> 
>> Then this additional argument can be passed on from boxplot.formula() to 
>> split().

PD> Alternatively, I suspect that the intention was as.factor() rather
PD> than factor(). 

at first I thought Peter was right; but the real source of
split.default contains a comment (!) and that line is

f <- factor(f) # drop extraneous levels

so it seems, this was done there very much on purpose.
OTOH, S(-plus) has implemented it quite a bit differently, and actually
does keep the empty levels in the example

  f <- factor(rep(1:3, each=6), levels=1:5); y <- rnorm(f); split(y, f)
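To make the contrast concrete (a sketch; the second comment describes
the proposed, not the current, behavior):

  f <- factor(rep(1:3, each=6), levels = 1:5)
  y <- rnorm(f)
  names(split(y, f))  # "1" "2" "3" : current R drops the empty levels
  ## with the as.factor() / 'drop = FALSE' semantics this would be
  ## "1" "2" "3" "4" "5", the last two being empty components (as in S-plus)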

PD> It does require a bit of care to fix it that way,
PD> though. There could be problems with empty levels popping up in
PD> unexpected places. 

Indeed!
Given the new facts, I think we want to go in Andy's direction
with a new argument, 'drop'

As Peter mentioned, the real question is about its default.
"drop = TRUE"   would be fully compatible with previous versions of R.
"drop = FALSE"  would be compatible with S and S-plus.

I'm going to implement it, and try to see if 'drop = FALSE'
gives changes for R and its standard packages;  if 'yes', that
would be an indication that such an R-back-compatibility breaking
change was not a good idea.  If 'no', I could commit it and see
if it has an effect on the CRAN packages.

Of course, since split() and split()<- are S3 generics, and
since there's also unsplit(),  this entails a whole slew of
changes {adding a "drop = FALSE" argument everywhere!}
and I presume will break the code of everyone who has written
their own split.foobar methods.

great...

Martin

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] (PR#7976) split() dropping levels (was "boxplot by factor")

2005-07-04 Thread Martin Maechler

[ Hmm, is everyone of those interested in changes inside R "sleeping",
  uninterested, ... 
]

>>>>> "MM" == Martin Maechler <[EMAIL PROTECTED]>
>>>>> on Fri, 1 Jul 2005 18:36:54 +0200 writes:

>>>>> "PD" == Peter Dalgaard <[EMAIL PROTECTED]>
>>>>> on 28 Jun 2005 14:57:42 +0200 writes:

PD> "Liaw, Andy" <[EMAIL PROTECTED]> writes:
>>> The issue is not with boxplot, but with split.  boxplot.formula() 
>>> calls boxplot(split(split(mf[[response]], mf[-response]), ...), 
>>> but look at what split() returns when there are empty levels in
>>> the factor:
>>> 
>>> > f <- factor(gl(3, 6), levels=1:5)
>>> > y <- rnorm(f)
>>> > split(y, f)
>>> $"1"
>>> [1] 0.4832124 1.1924811 0.3657797 1.7400198 0.5577356 0.9889520
>>> 
>>> $"2"
>>> [1] -1.1296642 -0.4808355 -0.2789933  0.1220718  0.1287742 -0.7573801
>>> 
>>> $"3"
>>> [1]  1.2320902  0.5090700 -1.5508074  2.1373780  1.1681297 -0.7151561
>>> 
>>> The "culprit" is the following in split.default():
>>> 
>>> f <- factor(f)
>>> 
>>> which drops empty levels in f, if there are any.  BTW, ?split doesn't
>>> mention what it does in such situation.  Perhaps it should?
>>> 
>>> If this is to be "fixed", I suppose an additional argument, e.g.,
>>> drop=TRUE, can be added, and the corresponding line mentioned
>>> above changed to something like:
>>> 
>>> if (drop || !is.factor(f)) f <- factor(f)
>>> 
>>> Then this additional argument can be passed on from boxplot.formula() to 
>>> split().

PD> Alternatively, I suspect that the intention was as.factor() rather
PD> than factor(). 

MM> at first I thought Peter was right; but the real source of
MM> split.default contains a comment (!) and that line is

MM> f <- factor(f) # drop extraneous levels

MM> so it seems, this was done there very much on purpose.
MM> OTOH, S(-plus) has implemented it quite a bit differently, and actually
MM> does keep the empty levels in the example

MM> f <- factor(rep(1:3, each=6), levels=1:5); y <- rnorm(f); split(y, f)

PD> It does require a bit of care to fix it that way,
PD> though. There could be problems with empty levels popping up in
PD> unexpected places. 

MM> Indeed!
MM> Given the new facts, I think we want to go in Andy's direction
MM> with a new argument, 'drop'

MM> As Peter mentioned, the real question is about its default.
MM> "drop = TRUE"   would be fully compatible with previous versions of R.
MM> "drop = FALSE"  would be compatible with S and S-plus.

MM> I'm going to implement it, and try to see if 'drop = FALSE'
MM> gives changes for R and its standard packages;  if 'yes', that
MM> would be an indication that such an R-back-compatibility breaking
MM> change was not a good idea.  If 'no', I could commit it and see
MM> if it has an effect on the CRAN packages.

MM> Of course, since split() and split()<- are S3 generics, and
MM> since there's also unsplit(),  this entails a whole slew of
MM> changes {adding a "drop = FALSE" argument everywhere!}
MM> and I presume will break the code of everyone who has written
MM> their own split.foobar methods.

MM> great...

MM> Martin

The change doesn't seem to affect the "standard" packages at all,
which is good.  On CRAN, it seems there are only two packages that
have  split() or split()<-  methods,  namely 'spatstat' and 'compositions'.

If we introduced the extra argument 'drop', 
these and every other user code defining split methods would
have to be updated to be compatible with the changed (S3)
generic having an extra argument 'drop'.

With this in mind, after more thought, I think that Peter's
initial proposal ---just replacing 'factor()' by 'as.factor()'
inside split--- seems to be nicer than introducing 'drop' and
*changing* the default behavior to 'drop = FALSE', for the
following reasons:

1) people who rely on the current behavior would have to change
   their calls to split() anyway;

2) instead of calling
   split(x, f, drop=TRUE)
   they can just as well use
   split(x, factor(f))
   which has an identical effect but does not introduce an extra
   argument 'drop'.

3) advantage of slightly higher compatibility with S

---

I intend to change this in R-devel
{with appropriate notes in NEWS !} during this week, unless
someone finds good reasons for a different (or no) change.

Martin

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] .Rbuildignore {was: ... upgrading an R (WINDOWS) installation ..}

2005-07-06 Thread Martin Maechler
> "Gabor" == Gabor Grothendieck <[EMAIL PROTECTED]>
> on Wed, 6 Jul 2005 08:24:49 -0400 writes:

  ...
  ...

Gabor> I have cleaned up my batch files (somewhat) and posted them to 
Gabor> CRAN. See my recent post:
Gabor> https://www.stat.math.ethz.ch/pipermail/r-help/2005-July/073400.html

Gabor> If any of this functionality could migrate to R
Gabor> itself that would be great.




Gabor> 2. Also if Rcmd CHECK and Rcmd INSTALL were to
Gabor> process .Rbuildignore like Rcmd BUILD does then
Gabor> makepkg.bat would not have to do a build first.

No!  {We have been here before, and I had explained before that}
this is really undesired:  ".Rbuildignore" should contain what is
ignored by build, but not by "check".
It does make sense to have extra code and / or checks for 'R CMD check'
that I as package developer want to run, but that are
  -- too time consuming
  -- too platform specific
  -- ..
to be run during the daily checks on CRAN (e.g.) /
to be run by others at all.
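For illustration, a hypothetical .Rbuildignore (one Perl-style regexp
per line, matched against file paths; the file names here are made up)
that keeps developer-only material out of the built tarball while a
local 'R CMD check' of the source directory still sees it:

  ^tests/slow-.*\.R$
  ^devel-checks$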

{And BTW, AFAIK,  'Rcmd' is now `somewhat deprecated' in favor
 of "R CMD" since the latter is portable }

--
Martin

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] qgamma with first argument 1

2005-07-11 Thread Martin Maechler
Thank you, Mr Hosking,

yes, this is a buglet.

I had started to fix this (and similar cases)
--  using a C-macro-based approach in src/nmath/dpq.h  --

For this reason, the fix will probably only appear in R-devel,
i.e., from R-2.2.0 on, and not yet in [R 2.1.1]-patched.
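For the record, these are the boundary values the fix enforces,
uniformly in the shape parameter (a sketch, using the values from
your report):

  qgamma(1, 0.1)   # Inf   (was 99.42507)
  qgamma(1, 0.5)   # Inf   (was NaN, with a warning)
  qgamma(1, 1.1)   # Inf   (already correct)
  qgamma(0, 0.5)   # 0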

Regards,
Martin Maechler, ETH Zurich

>>>>> "J" == J Hosking <[EMAIL PROTECTED]>
>>>>> on Sat, 09 Jul 2005 15:27:00 -0400 writes:

>> qgamma(1,1.1)
J> [1] Inf

J> as expected.  But for shape parameter between 0.16 and 1:

>> qgamma(1,.5)
J> [1] NaN
J> Warning message:
J> NaNs produced in: qgamma(p, shape, scale, lower.tail, log.p)

J> and for shape parameter 0.16 or less:

>> qgamma(1,.1)
J> [1] 99.42507

J> Arguments close to 1 give approximately correct results:

>> qgamma(1-1e-15,.5)
J> [1] 32.05055
>> qgamma(1-1e-15,.1)
J> [1] 28.8129

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] (PR#7976) split() dropping levels (was "boxplot by factor")

2005-07-13 Thread Martin Maechler
I have now committed the new  split(x, f, drop = FALSE)
--- entailing non-backward-compatible behavior, but consistency
with factor indexing (and with S-plus) --- split() and "split<-"
and unsplit() functions and methods to R-devel.

This does automatically fix the original poster's "boxplot by
factor" bug.
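Concretely, after this change:

  f <- factor(gl(3, 6), levels = 1:5)
  y <- rnorm(f)
  length(split(y, f))                # 5 : empty levels are now kept
  length(split(y, f, drop = TRUE))   # 3 : the previous behavior, on request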


>>>>> "MM" == Martin Maechler <[EMAIL PROTECTED]>
>>>>> on Mon, 4 Jul 2005 09:15:59 +0200 writes:


>>>>> "PD" == Peter Dalgaard <[EMAIL PROTECTED]>
>>>>> on 28 Jun 2005 14:57:42 +0200 writes:

PD> "Liaw, Andy" <[EMAIL PROTECTED]> writes:
>>>> The issue is not with boxplot, but with split.  boxplot.formula() 
>>>> calls boxplot(split(split(mf[[response]], mf[-response]), ...), 
>>>> but look at what split() returns when there are empty levels in
>>>> the factor:
>>>> 
>>>> > f <- factor(gl(3, 6), levels=1:5)
>>>> > y <- rnorm(f)
>>>> > split(y, f)
>>>> $"1"
>>>> [1] 0.4832124 1.1924811 0.3657797 1.7400198 0.5577356 0.9889520
>>>> 
>>>> $"2"
>>>> [1] -1.1296642 -0.4808355 -0.2789933  0.1220718  0.1287742 -0.7573801
>>>> 
>>>> $"3"
>>>> [1]  1.2320902  0.5090700 -1.5508074  2.1373780  1.1681297 -0.7151561
>>>> 
>>>> The "culprit" is the following in split.default():
>>>> 
>>>> f <- factor(f)
>>>> 
>>>> which drops empty levels in f, if there are any.  BTW, ?split doesn't
>>>> mention what it does in such situation.  Perhaps it should?
>>>> 
>>>> If this is to be "fixed", I suppose an additional argument, e.g.,
>>>> drop=TRUE, can be added, and the corresponding line mentioned
>>>> above changed to something like:
>>>> 
>>>> if (drop || !is.factor(f)) f <- factor(f)
>>>> 
>>>> Then this additional argument can be passed on from boxplot.formula() to 
>>>> split().

PD> Alternatively, I suspect that the intention was as.factor() rather
PD> than factor(). 

MM> at first I thought Peter was right; but the real source of
MM> split.default contains a comment (!) and that line is

MM> f <- factor(f) # drop extraneous levels

MM> so it seems, this was done there very much on purpose.
MM> OTOH, S(-plus) has implemented it quite a bit differently, and actually
MM> does keep the empty levels in the example

MM> f <- factor(rep(1:3, each=6), levels=1:5); y <- rnorm(f); split(y, f)

PD> It does require a bit of care to fix it that way,
PD> though. There could be problems with empty levels popping up in
PD> unexpected places. 

MM> Indeed!
MM> Given the new facts, I think we want to go in Andy's direction
MM> with a new argument, 'drop'

MM> As Peter mentioned, the real question is about its default.
MM> "drop = TRUE"   would be fully compatible with previous versions of R.
MM> "drop = FALSE"  would be compatible with S and S-plus.

MM> I'm going to implement it, and try to see if 'drop = FALSE'
MM> gives changes for R and its standard packages;  if 'yes', that
MM> would be an indication that such an R-back-compatibility breaking
MM> change was not a good idea.  If 'no', I could commit it and see
MM> if it has an effect on the CRAN packages.

MM> Of course, since split() and split()<- are S3 generics, and
MM> since there's also unsplit(),  this entails a whole slew of
MM> changes {adding a "drop = FALSE" argument everywhere!}
MM> and I presume will break the code of everyone who has written
MM> their own split.foobar methods.

MM> great...

MM> Martin

MM> The change doesn't seem to affect the "standard" packages at all
MM> which is good.  On CRAN, it seems there are two packages only that
MM> have  split() or split()<-  methods,  namely 'spatstat' and 
'compositions'.

MM> If we introduced the extra argument 'drop', 
MM> these and every other user code defining split methods would
MM> have to be updated to be compatible with the changed (S3)
MM> generic having an extra argument 'drop'.

MM> With this in mind, after more thought, I think that Peter's
MM> initial proposal ---just replacing 'factor()' by 'as.factor()'
MM> inside split--- seems to be nicer than introducing 'drop' [...]

Re: [Rd] R v2.1.0 patched (>2005-05-09) for Windows?

2005-07-15 Thread Martin Maechler
> "HenrikB" == Henrik Bengtsson <[EMAIL PROTECTED]>
> on Fri, 15 Jul 2005 10:01:05 +0200 writes:

HenrikB> I'm trying to troubleshoot a case where R crashes on Windows.  It
HenrikB> does not occur at all with my R v2.1.0 patched (2005-05-09), but
HenrikB> happens on R v2.1.1 (patched or non-patched) in many different
HenrikB> cases.  The R v2.2.0dev (2005-07-15) also has this problem
HenrikB> (although it won't crash on the example below).  I previously
HenrikB> reported this
HenrikB> (https://stat.ethz.ch/pipermail/r-devel/2005-June/033772.html) and
HenrikB> Duncan Murdoch kindly offered to look into the problem, but it is
HenrikB> tricky.  Now I would like to track down in which patched R v2.1.0
HenrikB> the problem first occurs and am looking for reports from newer
HenrikB> versions, but pre-R v2.1.1.


HenrikB> If you've got R v2.1.0 patched for Windows *after 2005-05-09*,
HenrikB> could you please try the following in that version of R?

HenrikB> install.packages("R.oo")
HenrikB> library(R.oo)
HenrikB> author <- "dummy"
HenrikB> rdocFile <- system.file("misc", "Exception.R", package="R.oo")
HenrikB> cat("# Empty example code\n", file="Exception.Rex")
HenrikB> Rdoc$compile(rdocFile, destPath=tempdir())
HenrikB> print("successful!")

HenrikB> If you see "successful!", that version is "ok"; otherwise R will
HenrikB> crash (or alternatively incorrectly complain about an invalid
HenrikB> regular expression; rerun and it will crash the 2nd time).  I
HenrikB> would appreciate it a lot if you report to me what you get and
HenrikB> which version of R you have.  Thanks a lot!


HenrikB> Note that this is most likely *not* due to R.oo (no
HenrikB> native code) - my wild guess is that it has to do
HenrikB> with a memory leak in the code for environments or
HenrikB> regular expressions.

but why would that only affect Windows ??

I've tried your example code also in Linux, and indeed I do see
quite some memory growth of the R process, particularly if I run

   for(i in 1:40) Rdoc$compile(rdocFile, destPath=tempdir())
   ## which takes a few minutes

my R process size grows considerably (50% - 100% depending on
the measure I use in 'ps').
So I can confirm that your guess about memory leakage {or
something close} seems quite on target.
But please don't ask me to dig further here - not for the time
being, at least.
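For the record, one way to watch this from within R rather than via
'ps' -- a sketch:

   for(i in 1:40) {
       Rdoc$compile(rdocFile, destPath = tempdir())
       if(i %% 10 == 0) print(gc())  ## the "max used" columns keep growing
   }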

BTW, I get 10 warnings, both in R 2.1.0 and in 2.1.1 patched
(see below) --- but that's probably something not really
relevant here.

Martin

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Computer algebra in R - would that be an idea??

2005-07-15 Thread Martin Maechler
>>>>> "bry" == bry  <[EMAIL PROTECTED]>
>>>>> on Fri, 15 Jul 2005 14:16:46 +0200 writes:

bry> About a year ago there was a discussion about interfacing R with J on
bry> the J forum; the best method seemed to be that outlined in this Vector
bry> article:
bry> http://www.vector.org.uk/archive/v194/finn194.htm

(which is interesting for me to see,
 if I had known that my posted functions would make it to an APL
 workshop... 
 BTW: Does one need special plugins / fonts to properly view
 the APL symbols?)


bry> and use J instead of APL

bry> http://www.jsoftware.com

well, I've learned about J as the ASCII-variant of APL, and APL
used to be my first `beloved' computer language (in high school!)
-- but does J really provide computer algebra in the sense of
Maxima, Maple or yacas... ??

(and no, please refrain from flame wars about APL vs .. vs ..,
 it's hard to refrain for me, too...)

Martin Maechler, ETH Zurich

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] help.search of precedence is empty

2005-07-15 Thread Martin Maechler
> "PatBurns" == Patrick Burns <[EMAIL PROTECTED]>
> on Fri, 15 Jul 2005 14:58:14 +0100 writes:

PatBurns> Doing
PatBurns> help.search('precedence')

PatBurns> comes up empty.  A fix would be to have the title:

PatBurns> Operator Syntax and Precedence

PatBurns> instead of

PatBurns> Operator Syntax

very good idea.
Whereas in general one should rather use a
\concept{...}
entry in order to make the page searchable for new `concepts',
in the present case, adding the "and Precedence" seems more
natural and I've just done it to R-devel.
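For reference, such entries in an Rd file would look like this
(a sketch):

  \concept{precedence}
  \concept{operator precedence}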

Martin

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] S4 generics "missing"

2005-07-15 Thread Martin Maechler
Thank you, Lars, for persisting on this topic of
missing S4 generics!
[I'm diverting this away from the original bug report; 
 since this is really about future features rather than R bugs.]

Unfortunately it's not as easy as you might think. 
 
o  One reason is the fact that currently it is still necessary 
   for bootstrapping reasons that R "can run" with only the 'base'
   package loaded, i.e. "methods" not available.  This makes
   it sometimes hard to S4-ize base functions. 

o  Then there have been performance issues hindering the
   S4-ification of ``everything''.  
   Note that e.g. S4 classes, functions, methods, etc. are not really
   proper C-level SEXPs; making them so would speed up (and lead to
   a clean-up of) the "methods" code base.

o  One other problem is with functions like cbind() which have
   a signature starting with '...' :  they need to be changed
   before you can define methods for them (see the sketch below).
   
   We've recently started to discuss this issue within R core,
   with constructive proposals by John Chambers, and I have been
   strongly considering indeed to try the case of
   cbind() and rbind() in particular.
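To make the '...' problem concrete (a quick sketch; nothing here is
committed API):

   args(cbind)  ## function (..., deparse.level = 1) : no named
                ## argument before '...' to dispatch on, so
   ## setGeneric("cbind") cannot yield a useful S4 generic without
   ## first changing the formals, e.g. to (x, ..., deparse.level = 1)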

I also hope that some of these issues will be addressed during
this summer and will eventually lead to much improved S4
facilities in R.

Martin Maechler, ETH Zurich


>>>>> "lars" == lars  <[EMAIL PROTECTED]>
>>>>> on Fri, 15 Jul 2005 01:26:50 +0200 (CEST) writes:

lars> Hi,
lars> I ran into another internal function that is missing S4 dispatch.  It
lars> is the binary operator ":".  Looking at the code, I see that it is
lars> actually a common problem.  Other candidates are operators like "~",
lars> "&&", "||" and functions like "length<-", "row", "col", "unlist",
lars> "cbind", etc.  It would for instance be nice to be able to write a
lars> matrix class that has the same operators and functions as the
lars> built-in class.  In general, I think that all the operators and
lars> functions associated with built-in types like vectors, lists,
lars> matrices and data frames should have S4 dispatch.

lars> Thanks,
lars> Lars


lars> lars wrote:

>> Hi,
>> 
>> OK, if you try to explicitly make them generic, you are told that they 
>> are implicitly already generic:
>> 
>> > setGeneric("is.finite", function(from, ...) 
>> standardGeneric("is.finite"))
>> Error in setGeneric("is.finite", function(from, ...) 
>> standardGeneric("is.finite")) :
>> "is.finite" is a primitive function;  methods can be defined, but 
>> the generic function is implicit, and can't be changed.
>> 
>> If you query about its genericness before you define your own generic, 
>> you get:
>> 
>> > isGeneric("is.finite")
>> [1] FALSE
>> 
>> But after you define your own generic, you get:
>> 
>> > setMethod("is.finite", signature(x="TS"),
>> +   function(x) {
>> +  Data(x) = callNextMethod()
>> +  x
>> +   })
>> [1] "is.finite"
>> 
>> > isGeneric("is.finite")
>> [1] TRUE
>> 
>> This all makes some sense, but I am not familiar enough with the 
>> internals to explain exactly why it is done this way. I think you will 
>> find that 'is.nan' behaves exactly the same way.
>> 
>> Thanks,
>> Lars
>> 
>> 
>> Prof Brian Ripley wrote:
>> 
>>> These functions are not generic according to the help page.
>>> The same page says explicitly that is.nan is generic.
>>> 
>>> Where did you get the (false) idea that they were generic?
>>> 
>>> On Thu, 16 Jun 2005 [EMAIL PROTECTED] wrote:
>>> 
>>>> Full_Name: Lars Hansen
>>>> Version: 2.1.0
>>>> OS: SunOS 5.8
>>>> Submission from: (NULL) (207.66.36.189)
>>>> 
>>>> 
>>>> Hi,
>>>> 
>>>> S4 method dispatch does not work for the two generic functions 
>>>> 'is.finite' and 'is.infinite'. It turns out that the C functions 
>>>> 'do_isfinite' and 'do_isinfinite' in src/main/coerce.c are missing a 
>>>> call to 'DispatchOrEval'.

Re: [Rd] Most accurate timing?

2005-08-01 Thread Martin Maechler
>>>>> "Duncan" == Duncan Murdoch <[EMAIL PROTECTED]>
>>>>> on Mon, 01 Aug 2005 08:48:39 -0400 writes:

Duncan> For a graphics display, I'd like a high resolution
Duncan> timer, something like Sys.time(), but it is only
Duncan> accurate to a second.  Is there a clock in R that
Duncan> gives a finer value?

Why can't you use  proc.time()  ?

Its help file says

 The resolution of the times will be system-specific; it is
 common for them to be recorded to of the order of 1/100
 second, and elapsed time is rounded to the nearest 1/100.
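A minimal usage sketch (element 3 of the returned vector is the
elapsed time):

  pt <- proc.time()
  Sys.sleep(0.5)          ## <code to be timed>
  (proc.time() - pt)[3]   ## ~ 0.5, at (typically) 1/100 s resolution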

Martin Maechler

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] R CMD check failing to warn when it should

2005-08-05 Thread Martin Maechler
> "DeepS" == Deepayan Sarkar <[EMAIL PROTECTED]>
> on Wed, 3 Aug 2005 13:52:32 -0500 writes:

DeepS> Hi, I recently made changes to lattice code which
DeepS> needed changes in many man pages as well. Before I
DeepS> made the appropriate changes, R CMD check was
DeepS> flagging most of the problems correctly, except for
DeepS> the man page for tmd. I have created a toy package
DeepS> that shows this, available at

DeepS> http://www.stat.wisc.edu/~deepayan/R/tmdprob_0.12-2.tar.gz

DeepS> This passes R CMD check on R 2.1.0 and r-devel from
DeepS> August 1, but it shouldn't because the code and
DeepS> documentation are inconsistent.

It's because you use \synopsis{}.  
This basically breaks all 'codoc' checking and is the reason
we (mainly Kurt, but I completely agree with him) have been
thinking about deprecating its use -- possibly using a
substitute for the few cases that might need something like it.

Kurt and I (at least) would very strongly advocate not to use
\synopsis{}, and hence writing functions and methods in a way
that can be well documented with exact \usage{}.
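I.e., instead of \synopsis{}, something along these lines (a
hypothetical shape for tmd(), not lattice's actual declaration):

  \usage{
  tmd(object, \dots)
  \method{tmd}{formula}(object, data = NULL, \dots)
  }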

Martin

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] object.size() bug?

2005-08-05 Thread Martin Maechler
>>>>> "Paul" == Paul Roebuck <[EMAIL PROTECTED]>
>>>>> on Thu, 4 Aug 2005 00:29:03 -0500 (CDT) writes:

Paul> Can someone confirm the following as a problem:

Yes, I can.  No promise for a fix in the very near future
though.

Martin Maechler, ETH Zurich

>> Can someone confirm the following as a problem:
>> 
>> R> setClass("Foo", representation(.handle = "externalptr"))
>> R> object.size(new("Foo"))
>> Error in object.size(new("Foo")) : object.size: unknown type 22
>> R> R.version.string
>> [1] "R version 2.1.1, 2005-06-20"
>> 
>> R-2.1.1/src/include/Rinternals.h
>> #define EXTPTRSXP   22/* external pointer */
>> 
>> R-2.1.1/src/main/size.c:
>> objectsize(SEXP s) has no case for external pointers

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] problem using model.frame()

2005-08-17 Thread Martin Maechler
> "GS" == Gavin Simpson <[EMAIL PROTECTED]>
> on Tue, 16 Aug 2005 18:44:23 +0100 writes:

GS> On Tue, 2005-08-16 at 12:35 -0400, Gabor Grothendieck wrote:
>> On 8/16/05, Gavin Simpson <[EMAIL PROTECTED]> wrote:
>> > On Tue, 2005-08-16 at 11:25 -0400, Gabor Grothendieck wrote:
>> > > It can handle data frames like this:
>> > >
>> > > model.frame(y1)
>> > > or
>> > > model.frame(~., y1)
>> > 
>> > Thanks Gabor,
>> > 
>> > Yes, I know that works, but I want the function coca.formula to
>> > accept a formula like this y2 ~ y1, with both y1 and y2 being data
>> > frames. It is
>> 
>> The expressions I gave work generally (i.e. lm, glm, ...), not just
>> in model.matrix, so would it be ok if the user just does this?
>> 
>> yourfunction(y2 ~., y1)

GS> Thanks again Gabor for your comments,

GS> I'd prefer the y1 ~ y2 as data frames - as this is the
GS> most natural way of doing things. I'd like to have (y2
GS> ~., y1) as well, and (y2 ~ spp1 + spp2 + spp3, y1) also
GS> work - silently without any trouble.

I'm sorry, Gavin, I tend to disagree quite a bit.

The formula notation has quite a history in the S language, and
AFAIK never was the idea to use data.frames as formula
components, but rather as "environments" in which formula
components are looked up --- exactly as Gabor has explained.

To break with such a deeply rooted principle, 
you should have very very good reasons, because you're breaking
the concepts on which all other uses of formulae are based.
And this would potentially lead to much confusion of your users,
at least in the way they should learn to think about what
formulae mean.

Martin


>> If it really is important to do it the way you describe,
>> are the data frames necessarily numeric? If so you could
>> preprocess your formula by placing as.matrix around all
>> the variables representing data frames using something
>> like this:
>> 
>> https://www.stat.math.ethz.ch/pipermail/r-help/2004-December/061485.html

GS> Yes, they are numeric matrices (as data frames). I've
GS> looked at this, but I'd prefer to not have to do too
GS> much messing with the formula.

>> Of course, if they are necessarily numeric maybe they can
>> be matrices in the first place?

GS> Because read.table etc. produce data.frames and this is
GS> the natural way to work with data in R.

but it is also slightly inefficient if they are numeric.
There are places for data frames and for matrices.

Why should it be a problem to use 
M <- as.matrix(read.table(..))
?

For large files, it could be quite a bit more efficient, needing
a bit more code, to use scan() to read the numeric data directly:

  h1 <- scan(file, what = "", nlines = 1)  ## the header line: column names
  nc <- length(h1)
  a  <- matrix(scan(file, what = numeric(), skip = 1),
               ncol = nc, byrow = TRUE, dimnames = list(NULL, h1))

maybe this would be useful to be packaged into
a small utility with usage

  read.matrix(...,  type = numeric(), ...)  
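Such a utility could look like this (a sketch; 'read.matrix' is a
hypothetical name, not an existing function):

  read.matrix <- function(file, ...) {
      h1 <- scan(file, what = "", nlines = 1, quiet = TRUE) ## header line
      matrix(scan(file, what = numeric(), skip = 1, quiet = TRUE, ...),
             ncol = length(h1), byrow = TRUE,
             dimnames = list(NULL, h1))
  }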


GS> Following your suggestions, I altered my code to
GS> evaluate the rhs of the formula and check if it was of
GS> class "data.frame". If it is then I stop processing and
GS> return it as a data.frame as this point. If not, it
GS> eventually gets passed on to model.frame() for it to
GS> deal with it.

GS> So far - limited testing - it seems to do what I wanted
GS> all along. I'm sure there's a gotcha in there somewhere
GS> but at least the code runs so I can check for problems
GS> against my examples.

GS> Right, back to writing documentation...

GS> G

>> > more intuitive, to my mind at least for this particular example and
>> > analysis, to specify the formula with a data frame on the rhs.
>> > 
>> > model.frame doesn't work with the formula "~ y1" if the object y1,
>> > in the environment when model.frame evaluates the formula, is a
>> > data.frame.  It works if y1 is a matrix, however. I'd like to work
>> > around this problem, say by creating an environment in which y1 is
>> > modified to be a matrix, if possible. Can this be done?
>> > 
>> > At the moment I have something working by grabbing the bits of the
>> > formula and then using get() to grab the named object. Of course,
>> > this won't work if someone wants to use R's formula interface with
>> > the following formula: y2 ~ var1 + var2 + var3, data = y1, or to use
>> > the subset argument common to many formula implementations. I'd like
>> > to have the function work in as general a manner as possible, so I'm
>> > fishing around for potential solutions.
>> > 
>> > All the best,
>> > 
>> > Gav
>> > 
>> > >
>> > > On 8/16/05, Gavin Simpson <[EMAIL PROTECTED]> wrote:
>> > > > Hi I'm having a problem with model.frame, encapsulated in this e

[Rd] do.call(): no need for quote {was .. Questions about calls..}

2005-08-23 Thread Martin Maechler
> "Gabor" == Gabor Grothendieck <[EMAIL PROTECTED]>
> on Mon, 22 Aug 2005 18:55:38 -0400 writes:

   ..

Gabor> Try do.call like this:

Gabor> ff <- x ~ g*h 
Gabor> do.call("substitute", list(ff, list(x = as.name("weight"))))

Just a small remark: For all those who -- like me -- have found
it  ``unpleasant'' to have to quote the first argument of do.call():
You don't have to any longer since the NEWS of R 2.1.0 contains

o   do.call() now takes either a function or a character string as
its first argument.  The supplied arguments can optionally be
quoted.

So the above could be 

   do.call(substitute, list(ff, list(x = as.name("weight"))))
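and a quick check of what one gets (the substitution happens inside
the formula):

   ff <- x ~ g*h
   do.call(substitute, list(ff, list(x = as.name("weight"))))
   ## ~>  weight ~ g * h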

--
Martin

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] weigths in boxplot

2005-08-23 Thread Martin Maechler
>>>>> "Erich" == Erich Neuwirth <[EMAIL PROTECTED]>
>>>>> on Sun, 21 Aug 2005 18:51:20 +0200 writes:

Erich> In R 2.2.0, density() can now work with weighted
Erich> observations.  It would be nice if boxplot also would
Erich> accept a weight parameter, then one could produce
Erich> consistent density estimators and boxplots.

Erich> Could the developers consider adding this feature?

The first thing I'd want is  quantile() with weights --- which I
personally find quite interesting and have wanted several times
in the past --- not wanted enough to implement though.

I'm interested to hear of (or even see C or R implementations of)
fast algorithms for "weighted quantiles".  
Code contributions are welcome too..
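To fix ideas, a simple -- not fast -- sketch of an inverse-ECDF
("type 1") weighted quantile:

  wquantile <- function(x, w, probs = c(0.25, 0.5, 0.75)) {
      o  <- order(x)
      cw <- cumsum(w[o]) / sum(w)  ## weighted ECDF at the sorted x values
      x[o][sapply(probs, function(p) which(cw >= p)[1])]
  }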

(And yes, I do know that boxplots are based on "hinges" rather than
 quartiles, but that's less interesting here.)

Martin Maechler <[EMAIL PROTECTED]> http://stat.ethz.ch/~maechler/
Seminar fuer Statistik, ETH-Zentrum  LEO C16Leonhardstr. 27
ETH (Federal Inst. Technology)  8092 Zurich SWITZERLAND
phone: +41-44-632-3408  fax: ...-1228   <><

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] histogram method for S4 class.

2005-08-26 Thread Martin Maechler
>>>>> "Ernesto" == Ernesto Jardim <[EMAIL PROTECTED]>
>>>>> on Fri, 26 Aug 2005 10:15:01 +0100 writes:

Ernesto> Deepayan Sarkar wrote:

>> [I'm CC-ing to r-devel, please post follow-ups there]

>> .

>> Deepayan

Ernesto> ...

Ernesto> ps: I'm not a subscriber of r-devel so I guess I'm
Ernesto> not able to post there, 

yes, you can post here without being a subscriber (the same as with R-help) !

Ernesto> anyway I'm CC-ing there too.

{ the idea was that you'd only send this to R-devel, not to
  R-help as well, once Deepayan has diverted it to here }

BTW: I think you *should* rather subscribe to R-devel, if you
  are an R package writer. 
  We (R-core) actually sometimes behave as if R-devel was
  read by all interested R package writers - even though we know
  it's not quite true.

  Since R-devel has a much smaller bandwidth than R-help, and
  you're already willing to read R-help, why don't you subscribe
  to R-devel?


Regards,
Martin Maechler, ETH Zurich

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] RFC: "loop connections"

2005-08-27 Thread Martin Maechler
> "David" == David Hinds <[EMAIL PROTECTED]>
> on Mon, 22 Aug 2005 23:34:15 + (UTC) writes:

David> I've just implemented a generalization of R's text connections, to
David> also support reading/writing raw binary data.  There is very little
David> new code to speak of.  For input connections, I wrote code to
David> populate the old text connection buffer from a raw vector, and
David> provided a new raw_read() method.  For output connections, I wrote
David> a raw_write() to append to a raw vector.  On input, the mode (text
David> or binary) is determined by the data type of the input object; on
David> output, I use the requested output mode (i.e. "w" / "wb").  For
David> example:

> con <- loopConnection("r", "wb")
> a <- c(10,100,1000)
> writeBin(a, con, size=4)
> r
 [1] 00 00 20 41 00 00 c8 42 00 00 7a 44
> close(con)
> con <- loopConnection(r)
> readBin(con, "double", n=3, size=4)
 [1]   10  100 1000
> close(con)

David> I think "loop connection" is a better name for this
David> sort of connection than "text connection" was even
David> for the old version; that confuses the mode of the
David> connection (text vs binary) with the mechanism (file,
David> socket, etc).

..

In the meantime, I think it has become clear that
"loopConnection" isn't necessarily a better name, and that
textConnection() has been there in "the S literature" for a
good reason and for quite a while.
Let's forget about the naming and the exact UI for the moment.

I think the main point of David's proposal is still worth
consideration:  One way to see text connections is as a way to
treat some kinds of R objects as "generalized files", i.e., connections.
And AFAICS David proposes to enlarge the kinds of R objects that
can be dealt with as connections
  from  {"character"}
  to    {"character", "raw"},
something which has some appeal to me.
IIUC, Brian Ripley is doubting the potential use for the
proposed generalization, whereas David makes a point of someone
else (the 'caTools' author) having written raw2bin / bin2raw function
for a related use case.

Maybe you can elaborate on the above a bit, David?
In any case, as you might have guessed by now, R-core would have
been more positive to a proposal to generalize current
textConnection() - fully back-compatibly - rather than renaming
it first.
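For comparison, the existing "character" case already behaves as such
a generalized file (standard R, just to fix ideas):

  tc <- textConnection("a b c")
  scan(tc, what = "")  ## "a" "b" "c" -- a character vector read as a file
  close(tc)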

Best regards,
Martin

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] generic function argument list problem

2005-08-31 Thread Martin Maechler
>>>>> "Robin" == Robin Hankin <[EMAIL PROTECTED]>
>>>>> on Wed, 31 Aug 2005 08:09:15 +0100 writes:

Robin> Hi it says in R-exts that

1) A method must have all the arguments of the generic,  
   including ... if the generic does.

2) A method must have arguments in exactly the same order as the generic.

3) A method should use the same defaults as the generic.


Robin> So, how come the arguments for rep() are (x, times, ...) and the
Robin> arguments for rep.default() are (x, times, length.out, each, ...) ?
Robin> Shouldn't these be the same?

no.  If they should be the same, the "R-exts" manual would use
a much shorter formulation than the carefully crafted points
1--3 above!

The point is that methods often have  *extra* arguments
which match the "..." of the generic. 
That's one of the points about "..." !
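A sketch, with a hypothetical class "myclass":

  ## rep()'s formals are (x, times, ...); the method may add more:
  rep.myclass <- function(x, times, length.out, each, ...) {
      ## 'length.out' and 'each' reach the method via the generic's '...'
      NextMethod()
  }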

Martin Maechler, ETH Zurich


Robin> I am writing a rep() method for objects with class "octonion", and
Robin> my function rep.octonion() has argument list (x, times, length.out,
Robin> each, ...) just like rep.default(), but R CMD check complains about
Robin> it, pointing out that rep() and rep.octonion() have different
Robin> arguments.

Robin> What do I have to do to my rep.octonion() function to make my package
Robin> pass R CMD check without warning?


Robin> --
Robin> Robin Hankin
Robin> Uncertainty Analyst
Robin> National Oceanography Centre, Southampton
Robin> European Way, Southampton SO14 3ZH, UK
Robin> tel  023-8059-7743

Robin> __
Robin> R-devel@r-project.org mailing list
Robin> https://stat.ethz.ch/mailman/listinfo/r-devel




__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] generic function argument list problem

2005-08-31 Thread Martin Maechler
> "Robin" == Robin Hankin <[EMAIL PROTECTED]>
> on Wed, 31 Aug 2005 08:09:15 +0100 writes:

  

Robin> I am writing a rep() method for objects with class "octonion", and
Robin> my function rep.octonion() has argument list (x, times, length.out,
Robin> each, ...) just like rep.default(), but R CMD check complains about
Robin> it, pointing out that rep() and rep.octonion() have different
Robin> arguments.

Hmm, not exactly, ``like rep.default'', I'm pretty sure.
Why not peek into R's /src/library/base/man/rep.Rd  
which has

\usage{
rep(x, times, \dots)

\method{rep}{default}(x, times, length.out, each, \dots)
}

and definitely passes R CMD check without a warning.

Robin> What do I have to do to my rep.octonion() function to make my package
Robin> pass R CMD check without warning?

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Why should package.skeleton() fail R CMD check?

2005-08-31 Thread Martin Maechler
>>>>> "Jari" == Jari Oksanen <[EMAIL PROTECTED]>
>>>>> on Wed, 31 Aug 2005 11:58:10 +0300 writes:

Jari> I find it a bit peculiar that a package skeleton created with the
Jari> utils function package.skeleton() fails subsequent R CMD check. I do
Jari> understand that the function is intended to produce only a skeleton
Jari> that should be edited by the package author. I think that it would be
Jari> justified to say that the skeleton *should* fail the test. However, I
Jari> have two arguments against intentional failure:

Jari> * When you produce a skeleton, a natural thing is to see if it works
Jari> and run R CMD check. It is baffling (but educating) if this fails.

yes, and the ``but educating'' part has at least kept me from
fixing the problem in the past.
However, I nowadays rather agree with you.

Jari> * The second argument is more major: If you produce a package with
Jari> several functions, you want to edit one Rd file at a time to see what
Jari> errors you made. You don't want to correct errors in other Rd files
Jari> not yet edited by you to see your own errors. This kind of incremental
Jari> editing is much more pleasant, as following strict R code is painful
Jari> even with your own mistakes.

Jari> The failure comes only from Rd files, and it seems that the violating
Jari> code is produced by the prompt.default function hidden in the utils
Jari> namespace. I attach a uniform diff file which shows the minimal set
Jari> of changes I had to make to utils:::prompt.default to produce Rd
Jari> files passing R CMD check. There are still two warnings: one on
Jari> missing source files and another on missing keywords, but these are
Jari> not fatal. This still produces bad-looking latex.

both are *desired*; a package author *should* feel some urge to
edit the files,  but I now agree that she should only get
warnings, not errors.

Jari> These are the changes I made 

Jari> * I replaced "__description__" with "description", since "__" will
Jari> give latex errors.

Jari> * I enclosed "Make other sections" within Note, so that it won't give
Jari> an error on stray top-level text. It will now appear as a numbered
Jari> latex \section{} in the dvi file, but the package author can correct
Jari> that.

Jari> * I replaced reference to a non-existent function ~~fun~~ with a
Jari> reference to function help. 

sounds all reasonable


Jari> I'm sorry for the formatting of the diff file: my emacs/ESS is
Jari> cleverer than I am and changes indentation and line breaks against
Jari> my will.

hmm; did you *edit* the *source*
or did you just edit a ``dump'' of the function definition?

Since you didn't use  text/plain  as content type, your
attachment didn't make it to the list anyway, and you have a
second chance:

Please use a "diff -u" against

   https://svn.R-project.org/R/trunk/src/library/utils/R/prompt.R

or maybe even a "diff -ubBw ..." one.

Thank you for your proposition and willingness to contribute!

Martin Maechler, ETH Zurich

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Why should package.skeleton() fail R CMD check?

2005-08-31 Thread Martin Maechler
One thing I forgot to add:

Did you try to include

- data frames
- other data

- S3 generics and methods
- S4 generics and methods

in the objects you gave to package.skeleton() ?

If we want to change the prompt*() functions such that 
package.skeleton() produces a package that only gives warnings
{for the case of no ./src/ dependence; no NAMESPACE ; no other
 package dependencies; ...}

I think we'd also need patches for the above objects' prompt*()
results.

Martin

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] R CMD BATCH on scripts without trailing newline

2005-09-01 Thread Martin Maechler
> "StEgl" == Stephen Eglen <[EMAIL PROTECTED]>
> on Thu, 1 Sep 2005 12:09:15 +0100 writes:

StEgl> If the last line of an R script does not have a
StEgl> trailing newline, a small error is produced at the
StEgl> end of the script.

StEgl> Small example.  If file eg.r contains one line:
StEgl> getwd() and there is no newline after the closing
StEgl> paren

StEgl> $ R CMD BATCH eg.r

StEgl> produces an error: $ cat eg.r.Rout

StEgl> R : Copyright 2005, The R Foundation for Statistical
StEgl> Computing Version 2.1.1 Patched (2005-09-01), ISBN
StEgl> 3-900051-07-0

StEgl> ...

>> getwd()proc.time()
StEgl> Error: syntax error Execution halted $

aahh, now I finally understand why some people append
those **ugly** unneeded ';' to the end of almost every line of R
code.  It would have helped here
:-) :-)

StEgl> Is it worth changing the BATCH script so that it adds
StEgl> a newline before adding the call to proc.time()?

Yes I think it would be.  This is trivial, at least for 
 /src/scripts/BATCH
Slightly better but more tricky:  only append a newline "when needed".
Any idea for that?

You didn't tell us the *platform* you run R on
(and BATCH does depend on the platform),
but I know that it's a version of unix,  Linux I suppose?

BTW: The windows version of "R CMD BATCH" is actually
 *documented* to work with files that don't end in newline.

Martin

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] 64 bit R for Windows

2005-09-02 Thread Martin Maechler
>>>>> "PD" == Peter Dalgaard <[EMAIL PROTECTED]>
>>>>> on 02 Sep 2005 18:48:24 +0200 writes:

PD> "Milton Lopez" <[EMAIL PROTECTED]> writes:

>> I appreciate the update. We will consider using Linux,
>> which leads me to one more question: what is the maximum
>> RAM that R can use on each platform (Linux and Windows)?
>> 
>> Thanks again for your prompt responses.

PD> On Win32, something like 3GB. Maybe a little more on
PD> Linux32, but there's a physical limit at 4GB.

for a *single* object, yes.  However (and Peter knows this
probably better than me ..), R's workspace can be very much
larger which makes it realistically possible to start *using* R
functions on objects of around 4GB.
Someone (Venables & Ripley ?) once stated the rule of thumb
that you need about 5--10 times the size of your "single" large
object for your "workspace", because (intermediate) copies,
sometimes multiple ones, are needed, at least by parts of the
current implementations of many basic algorithms / functions.
In other words, if you got a 32 GB RAM, you could probably start
to work with objects of the size of (a little less than) 4GB
relatively comfortably.

Martin Maechler

PD> On Linux 64, the motherboards set the limit in practice,
PD> 32GB systems have been reported working and I think at
PD> least 64GB should be possible. I seem to recall that the
PD> maximum _virtual_ memory is not quite 2^64, but it will
PD> be pretty huge (2^48, 256TB)?.

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] R CMD BATCH on scripts without trailing newline

2005-09-03 Thread Martin Maechler
>>>>> "MM" == Martin Maechler <[EMAIL PROTECTED]>
>>>>> on Thu, 1 Sep 2005 13:39:52 +0200 writes:

>>>>> "StEgl" == Stephen Eglen <[EMAIL PROTECTED]>
>>>>> on Thu, 1 Sep 2005 12:09:15 +0100 writes:

StEgl> If the last line of an R script does not have a
StEgl> trailing newline, a small error is produced at the
StEgl> end of the script.

StEgl> Small example.  If file eg.r contains one line:
StEgl> getwd() and there is no newline after the closing
StEgl> paren

StEgl> $ R CMD BATCH eg.r

StEgl> produces an error: $ cat eg.r.Rout

StEgl> R : Copyright 2005, The R Foundation for Statistical
StEgl> Computing Version 2.1.1 Patched (2005-09-01), ISBN
StEgl> 3-900051-07-0

StEgl> ...

>>> getwd()proc.time()
StEgl> Error: syntax error Execution halted $

MM> aahh, now I finally understand why some people append
MM> those **ugly** unneeded ';' to the end of almost every
MM> line of R code.  It would have helped here :-) :-)

StEgl> Is it worth changing the BATCH script so that it adds
StEgl> a newline before adding the call to proc.time()?

MM> Yes I think it would be.  This is trivial, at least for
MM> /src/scripts/BATCH Slightly better but more
MM> tricky: only append a newline "when needed".  Any idea
MM> for that?

It's probably not worth the extra effort (I agree with Jan on
*that*); I've added the obvious to the BATCH script used on
unix-alike platforms, now adding a newline before 'proc.time()'
unconditionally.  I hope people can live with an extra byte in
the output files.

Martin

MM> BTW: The windows version of "R CMD BATCH" is actually
MM> *documented* to work with files that don't end in
MM> newline.

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] system on windows vs. unix

2005-09-07 Thread Martin Maechler
> "Gabor" == Gabor Grothendieck <[EMAIL PROTECTED]>
> on Wed, 7 Sep 2005 00:08:05 -0400 writes:

Gabor> The R system command has different arguments on Windows and UNIX.
Gabor> I hadn't realized that, and I think it would be nice if the
Gabor> input= argument available on Windows were available on UNIX too,
Gabor> and the ignore.stderr= argument available on UNIX were available
Gabor> on Windows too.

Gabor> Even without that I could have saved some time if the help file
Gabor> had pointed out that the arguments vary from OS to OS, or even
Gabor> better, which are common and which are OS-specific.

I very much agree.

Patches (against R-devel!) are very welcome.

Regards,
Martin

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Wishlist: write.delim()

2005-09-09 Thread Martin Maechler
> "Douglas" == Douglas Grove <[EMAIL PROTECTED]>
> on Thu, 8 Sep 2005 15:33:02 -0700 (PDT) writes:

Douglas> Hi,
Douglas> It would be great if someone would add write.delim() as an
Douglas> adjunct to write.table(), just as with write.csv().

Douglas> I store a lot of data in tab-delimited files and can read
Douglas> it in easily with:  read.delim("text.txt", as.is=TRUE)
Douglas> and would love to be able to write it out as easily when
Douglas> I create these files.

Douglas> The obvious setting needed for write.delim() is sep = "\t",
Douglas> but in addition I would request the setting row.names = FALSE

Douglas> i.e. 

Douglas> write.delim(x, file) = write.table(x, file, sep = "\t",
Douglas>                                    row.names = FALSE)

i.e.,
   write.delim <- function(x, file, ...)
       write.table(x, file, sep = "\t", row.names = FALSE, ...)

So, why don't you just add that one line to your .Rprofile ?
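
A quick round trip with that one-liner (file name arbitrary):

   d <- data.frame(x = 1:2, txt = c("a", "b"))
   write.delim(d, "d.txt")
   read.delim("d.txt", as.is = TRUE)   # recovers 'd'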

In general, I don't think it's worth introducing a whole
new function just because of some frequently used arguments of an
already existing function  {{and I have wondered if it was worth
providing write.csv() at all - although there the difference to the
default write.table() is quite a bit larger}}

Martin

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Issue tracking in packages [was: Re: [R] change in read.spss, package foreign?]

2005-09-10 Thread Martin Maechler
> "TL" == Thomas Lumley <[EMAIL PROTECTED]>
> on Sat, 10 Sep 2005 09:32:29 -0700 (PDT) writes:

>>  Standard location or a mechanism like the one you
>> describe are both similar amount of work (and not much at
>> all), the HTML pages are generated by perl and I have the
>> parsed DESCRIPTION file there, i.e., using a fixed name
>> or the value of the Changelog field is basically the
>> same.
>> 

TL> In which case a Changlog entry in DESCRIPTION would be a
TL> very nice addition, and would have the advantage of not
TL> requiring changes to packages.

yes *and* does allow slightly more flexibility with almost
no cost, as Fritz confirmed.

And, BTW, Gabor,  NEWS and ChangeLog are not at all the same
thing, and it would be silly to urge users to use only one of them.
At least 'ChangeLog' is a well defined format for emacs users
that can very quickly be updated semi-automagically
("C-x 4 a" when you're in file  foo.R with function myfun(.)
 autogenerates a neat entry in a ChangeLog file);
but then really people should be allowed to use other formats
for good reasons.

Martin

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] plot(): new behavior in R-2.2.0 alpha

2005-09-13 Thread Martin Maechler
As some of you R-devel readers may know, the plot() method for
"lm" objects is based in large parts on contributions by John
Maindonald, subsequently "massaged" by me and other R-core
members.

In the statistics literature on applied regression, people have
had diverse opinions on what (and how many!) plots should be
used for goodness-of-fit / residual diagnostics, and to my
knowledge most people have agreed on wanting to see one (or more)
version of a Tukey-Anscombe plot {Residuals ~ Fitted} and a QQ
normal plot.
Another consideration was to be somewhat close to what S
(S-plus) was doing.  So we have two versions of residuals vs
fitted, one for checking  E[error] = 0, the other for checking 
Var[error] = constant.  So we got to the first three plots of
plot.lm() about which I don't want to debate at the moment
{though, there's room for improvement even there: e.g., I know of at
 least one case where plot() wasn't used because the user
 was missing the qqline() she was so used to in the QQ plot}

The topic of this e-mail is the (default) 4th plot which I had
changed; really prompted by the following:
More than three months ago, John wrote
  http://tolstoy.newcastle.edu.au/R/devel/05/04/0594.html
(which became a thread of about 20 messages, from Apr.23 -- 29, 2005)

and currently, 
NEWS for R 2.2.0 alpha contains

>> USER-VISIBLE CHANGES
>> 
>>    o plot() uses a new default for the fourth panel when
>>      'which' is not specified.
>>      ___ may change before release ___

and the header is

plot.lm <-
function (x, which = c(1:3, 5),
          caption = c("Residuals vs Fitted",
                      "Normal Q-Q", "Scale-Location",
                      "Cook's distance", "Residuals vs Leverage",
                      "Cook's distance vs Leverage"),
          ...) {..}

So we now have 6 possible plots, where 1,2,3 and 5 are the
defaults (and 1,2,3,4 were the old defaults).

For the influential points and combination of 'influential' and 'outlier'
there have been quite a few more proposals in the past. R <= 2.1.x
has been plotting the  Cook's distances vs. observation number, whereas
quite a few people in the past have noted that all influence
measures being more or less complicated functions of residuals
and "hat values" aka "leverages", (R_i, h_{ii}), it would really
make sense and fit better with the other plots
to plot residuals vs. Leverages --- with the additional idea of
adding *contours* of (equal) Cook's distances to that plot, in
case one would really want to see them.

In the mean time, this has been *active* in R-devel for quite a
while, and we haven't received any new comments.

One remaining problem I'd like to address is the "balanced AOV"
situation, something probably pretty rare nowadays in real
practice, but common of course in teaching ANOVA.
As you may remember, in a balanced design, all observations have
the same leverages h_{ii}, and the plot  R_i  vs  h_ii is really
not so useful.  In that case, the Cook's distances CD_i = c * R_i^2
and so  CD_i  vs  i {the old "4-th plot in plot.lm"} is
graphically identical to   R_i^2 vs i.
Now in that case (of identical h_ii's), I think one would really
want  "R_i  vs  i".

Question to the interested parties:

  Should there be an automatism
 ``when h_ii == const''  {"==" with a bit of numerical fuzz}
  plot a)  R_i   vs i
  or   b)  CD_i  vs i

or should users have to manually use
plot(<lm object>, which = 1:4, ...)
in such a case?
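
{For concreteness, a tiny balanced example -- data simulated, names
 mine -- where all the h_ii coincide and the R_i vs h_ii panel
 degenerates:

    set.seed(7)
    d <- data.frame(y = rnorm(12), g = gl(3, 4))
    fm <- aov(y ~ g, data = d)
    range(hatvalues(fm))      # constant 0.25 : balanced one-way layout
    par(mfrow = c(2, 2)); plot(fm)               # new default, c(1:3, 5)
    par(mfrow = c(2, 2)); plot(fm, which = 1:4)  # old default  }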

Feedback very welcome, 
particularly, you first look at the examples in help(plot.lm) 
in *R-devel* aka R-2.2.0 alpha.

Martin Maechler, ETH Zurich

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] plot(): new behavior in R-2.2.0 alpha

2005-09-14 Thread Martin Maechler
Thank you, John, for your comments!
>>>>> "JohnF" == John Fox <[EMAIL PROTECTED]>
>>>>> on Tue, 13 Sep 2005 16:41:28 -0400 writes:

JohnF> A couple of comments on the new plots (numbers 5 and 6):
JohnF> Perhaps some more thought could be given to the
JohnF> plotted contours for Cook's D (which are 0.5 and 1.0
JohnF> in the example -- large Cook's Ds). A rule-of-thumb
JohnF> cut-off for this example is 4/(n - p) = 4/(50 - 5) =
JohnF> 0.089, and the discrepancy will grow with n.

That's an interesting suggestion.  Where does the 4/(n-p) come
from? or, put differently, should I rather read one of your books? ;-)

Honestly, I'm so much a fan of R_i / h_ii that I didn't even
know that.
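
{As a quick numeric illustration -- the dataset and levels are my own
 choice, assuming the contour levels are exposed via the new
 'cook.levels' argument of plot.lm():

    fm <- lm(dist ~ speed, data = cars)
    n <- nrow(cars); p <- length(coef(fm))
    4 / (n - p)        # John's rule of thumb; here 4/48 ~= 0.083
    plot(fm, which = 5, cook.levels = c(4/(n - p), 0.5, 1.0))  }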

JohnF> I'm not terribly fond of number 6, since it seems
JohnF> natural to me to think of the relationship among
JohnF> these quantities as influence on coefficients =
JohnF> leverage * outlyingness (which corresponds to 5);
JohnF> also note how in the example, the labels for large
JohnF> residuals overplot.

I think John mainly proposed '6' because others proposed it as another
good alternative.  From the few examples I've looked at, I
haven't grown fond of it at all either.

JohnF> Finally, your remarks about balanced data are cogent
JohnF> and suggest going with 1:3 in this case (since R_i
JohnF> vs. i is pretty redundant with the QQ plot).

Ah, that's another, maybe better alternative to my proposal.

One drawback of it is for situations where people do something
like  par(mfrow = c(2,2))
before calling  plot() for several fitted lm models,
expecting each model's plots to fill one page;
and I think that's something I would always have done in such
situations where several different models are fitted and compared.

Maybe plot.lm() should "advance an empty frame" as soon as
 prod(par("mfrow")) >= 4 
in that case?
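
I.e., roughly (a sketch of that idea only):

    if (prod(par("mfrow")) >= 4)
        frame()     # advance to an empty plotting frame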

Martin

JohnF> 
JohnF> John Fox
JohnF> Department of Sociology
JohnF> McMaster University
JohnF> Hamilton, Ontario
JohnF> Canada L8S 4M4
JohnF> 905-525-9140x23604
JohnF> http://socserv.mcmaster.ca/jfox 
JohnF>  

>> -Original Message-
>> From: Martin Maechler [mailto:[EMAIL PROTECTED] 
>> Sent: Tuesday, September 13, 2005 9:18 AM
>> To: R-devel@stat.math.ethz.ch
>> Cc: John Maindonald; Werner Stahel; John Fox
>> Subject: plot(): new behavior in R-2.2.0 alpha
>> 
>> As some of you R-devel readers may know, the plot() method 
>> for "lm" objects is based in large parts on contributions by 
>> John Maindonald, subsequently "massaged" by me and other 
>> R-core members.
>> 
>> In the statistics literature on applied regression, people 
>> have had  diverse opinions on what (and how many!) plots 
>> should be used for goodness-of-fit / residual diagnostics, 
>> and to my knowledge most people have agreed to want to see 
>> one (or more) version of a Tukey-Anscombe plot {Residuals ~ 
>> Fitted} and a QQ normal plot.
>> Another consideration was to be somewhat close to what S
>> (S-plus) was doing.  So we have two versions of residuals vs 
>> fitted, one for checking  E[error] = 0, the other for 
>> checking Var[error] = constant.  So we got to the first three plots of
>> plot.lm() about which I don't want to debate at the moment 
>> {though, there's room for improvement even there: e.g., I 
>> know of at  least one case where plot() wasn't used 
>> because the user  was missing the qqline() she was so used to 
>> in the QQ plot}
>> 
>> The topic of this e-mail is the (default) 4th plot which I 
>> had changed; really prompted by the following:
>> More than three months ago, John wrote
>> http://tolstoy.newcastle.edu.au/R/devel/05/04/0594.html
>> (which became a thread of about 20 messages, from Apr.23 
>> -- 29, 2005)
>> 
>> and currently,
>> NEWS for R 2.2.0 alpha contains
>> 
>> >> USER-VISIBLE CHANGES
>> >> 
>> >>    o plot() uses a new default for the fourth panel when
>> >>      'which' is not specified.
>> >>      ___ may change before release ___
>> 
>> and the header is
>> 
>> plot.lm <-
>> function (x, which = c(1:3, 5),
>>           caption = c("Residuals vs Fitted",
>>                       "Normal Q-Q", "Scale-Location",
>>                       "Cook's distance", "Residuals vs Leverage",
>>                       "Cook's distance vs Leverage"),
>>           ...) {..}

Re: [Rd] loadings() generic in R alpha

2005-09-17 Thread Martin Maechler

> "PaulG" == Paul Gilbert <[EMAIL PROTECTED]>
> on Fri, 16 Sep 2005 14:04:37 -0400 writes:

PaulG> Brian: OK, let's leave this for now. When does the
PaulG> development cycle start for the next version that
PaulG> would allow making a function generic?

Almost immediately after 2.2.0 is released.

PaulG> Paul

PaulG> Prof Brian D Ripley wrote:

>> On Fri, 16 Sep 2005, Paul Gilbert wrote:
>> 
>> 
>> 
>>> Brian
>>> 
>>> It would help if I understood general principles. I
>>> thought one would want a case for NOT making functions
>>> generic, rather than a case for making them
>>> generic. Hopefully a case for why generics and methods
>>> are useful will not be necessary.
>>> 
>>> 
>>  Making things generic
>> 
>> 1) adds runtime cost
>> 
>> 2) essentially fixes the signature for all time
>> 
>> 3) needs the return value sufficiently well defined that
>> all current uses will not be broken by a new method.
>> (This was not a problem with e.g.  as.ts as everone knows
>> the result should be a "ts" object.  But I think it is a
>> problem with acf and loadings.)
>> 
>> I would for example be unhappy with your definition of
>> loadings() as it has no ... argument (and S-PLUS has one
>> in its loadings() generic).
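
   [For concreteness, a sketch of the '...'-carrying version of
    Paul's definitions quoted below:

       loadings <- function(x, ...) UseMethod("loadings")
       loadings.default <- function(x, ...) x$loadings
   ]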
>> 
>> So cases are necessary.  I am pretty sure that we have in
>> the past agreed that making a function generic is a Grand
>> Feature, and we are in GFF.
>> 
>> 
>> 
>> 
>>> The situation with loadings() is that I construct
>>> objects where the loadings are in a list within a list,
>>> so the simple definition in stats does not work:
>>> 
>>> loadings
>>> function (x) x$loadings
>>> <environment: namespace:stats>
>>> 
>>> Basically this definition restricts the way in which
>>> objects can be constructed, so I would like it replaced
>>> by
>>> 
>>> loadings <- function (x) UseMethod("loadings")
>>> loadings.default <- function (x) x$loadings
>>> 
>>> There may be a reason for adding a ... argument, but I
>>> have been using this generic and methods for it in my
>>> own work for a fairly long time now and have not
>>> discovered one.  The change seems rather trivial, I have
>>> tested it extensively with my own code, and there is a
>>> fairly complete test suite in place for checking changes
>>> to R, so it seems reasonable to me that this should be
>>> considered as a change that is possible in an alpha
>>> release. It would also be fairly easy to back out of if
>>> there are problems.
>>> 
>>> The reason for needing acf generic is the same, so that
>>> it can be use with more complicated objects that I
>>> construct. However, I see here that there are
>>> potentially more difficult problems, because the
>>> ... argument to the current acf (which one would want as
>>> the default method) is passed to plot.acf.  Here I can
>>> clearly see the reason for wanting to start
>>> consideration of this at an earlier point in the
>>> development cycle.
>>> 
>>> Best, Paul
>>> 
>>> Prof Brian Ripley wrote:
>>> 
>>> 
>>> 
 On Thu, 15 Sep 2005, Paul Gilbert wrote (in two
 separate messages)
 
 
 
> Could loadings() in R-2.2.0 please be made generic?
> 
> 

> Could acf() in R-2.2.0 please be made generic?
> 
> 
 I think it is too late in the process for this (and
 especially for acf). In particular, it could have
 knock-on consequences for packages and recommended
 packages are scheduled to be all fixed in stone by next
 Weds.
 
 To consider making such functions generic we would need
 
 - a case - discussion of what the arguments of the
 generic should be and what is to be specified about the
 return value.
 
 Perhaps you could raise these again with specific
 proposals early in the developement cycle for 2.3.0.
 
 (We have been a little too casual about specifying what
 generic functions should return in the past, and have
 got bitten as a result.  For example, can it be assumed
 that loadings() returns a matrix?)
 
 

PaulG> __
PaulG> R-devel@r-project.org mailing list
PaulG> https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] plot(): new behavior in R-2.2.0 alpha

2005-09-17 Thread Martin Maechler

> "Wst" == Werner Stahel <[EMAIL PROTECTED]>
> on Fri, 16 Sep 2005 09:37:02 +0200 writes:

Wst> Dear Martin, dear Johns,  thanks for including me in
Wst> your discussion.

Wst> I am a strong supporter of "Residuals vs. Hii"

>>> One remaining problem I'd like to address is the
>>> "balanced AOV" situation, ...

Wst> In order to keep the plots consistent, I suggest drawing
Wst> a histogram. Other alternatives will or can be
Wst> interesting in the general case and therefore are not a
Wst> suitable substitute for this plot.

hmm, but all other 3 default plots have
 (standardized / sqrt) residuals  on the y-axis.
I'd very much like to keep that for any forth plot.
So would we want a horizontal histogram?  And do we really want
a histogram when we've already got the QQ plot?

We need a decent proposal for a 4th plot 
{instead of  R_i vs h_ii  , when  h_ii are constant}
REAL SOON NOW  since it's feature
freeze on Monday. 
Of course the current state can be declared a bug and still be
fixed but that was not the intention...

Also, there are now at least 2 book authors among R-core (and
more book authors more generally!) in whose books there are
pictures with the "old-default" 4th plot. 
So I'd like to have convincing reasons for ``deprecating'' all 
the plot.lm() pictures in the published books.

At the moment, I'd still  go for
 
 R_i  vs i
or  sqrt|R_i| vs i  -- possibly with type = 'h' 

which could be used to "check" an important kind of "temporal" 
auto-correlation.

the latter, because in a 2 x 2 plot arrangement, this gives the
same y-axis as default plot 3.

Wst> 

Wst> Back to currently available methods:

Wst> John Maindonald discusses different contours. I like
Wst> the implementation I get currently in R-devel: contours
Wst> of Cook's distances, since they are popular and we can
Wst> then argue that the plot of D_i vs. i is no longer
Wst> needed.

what about John's proposal of different contour levels than
c(0.5, 1)  -- note that these *have* been added as arguments to
plot.lm() that a user could modify.

Wst> For most plots, I like to see a smoother along with the
Wst> points.  I suggest adding the option to include
Wst> smoothers, not only as an argument to plot.lm, but even
Wst> as an option().  I have heard of the intense
Wst> discussions about options().  With Martin, we arrived
Wst> at the conclusion that options() should never influence
Wst> calculations and results, but is suitable to adjust
Wst> outputs (numerical: digits=, graphical: smooth=) to the
Wst> user's taste.

{and John Fox agreed, `in general'}

That could be a possibility, for 2.2.0  only applied to
plot.lm() in any case, where plot.lm() would get a new argument

add.smooth = getOption("plot.add.smooth")

What do people think about the name? 
it would ``stick with us'' -- so we'd better choose it well..

>>> (4) Are there other diagnostics that ought to be
>>> included in stats? (perhaps in a function other than
>>> plot.lm(), which risks being overloaded).  One strong
>>> claimant is vif() (variance inflation factor),

   ...
   ...
   ...


Wst> As we focus on plots, my plot method includes the
Wst> option (default) to add smooths for 20 simulated
Wst> datasets (according to the fitted model).

this and others are really nice.

However not for R 2.2.x in any case.

I agree that one should rather provide `single-plot'
functions and have plot.lm() just call a few of them, instead of
making everything part of plot.lm().
There's the slight advantage that you can guarantee some
consistency (e.g. in the definition of "standardized residuals")
and save some computations when everything is in one function,
but consistency should be possible otherwise as well...
Anyway this is for 2.3.0 or later.

Martin

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] automatically adding smooth to plot: options("plot.add.smooth")

2005-09-19 Thread Martin Maechler
I've changed the subject in the hope some more people would
voice an opinion...

>>>>> "MM" == Martin Maechler <[EMAIL PROTECTED]>
>>>>> on Sat, 17 Sep 2005 17:29:20 +0200 writes:

>>>>> "Wst" == Werner Stahel <[EMAIL PROTECTED]>
>>>>> on Fri, 16 Sep 2005 09:37:02 +0200 writes:


Wst> 
Wst> 
Wst> 

Wst> For most plots, I like to see a smoother along with the
Wst> points.  I suggest adding the option to include
Wst> smoothers, not only as an argument to plot.lm, but even
Wst> as an option().  I have heard of the intense
Wst> discussions about options().  With Martin, we arrived
Wst> at the conclusion that options() should never influence
Wst> calculations and results, but is suitable to adjust
Wst> outputs (numerical: digits=, graphical: smooth=) to the
Wst> user's taste.

MM> {and John Fox agreed, `in general'}

MM> That could be a possibility, for 2.2.0 only applied to
MM> plot.lm() in any case, where plot.lm() would get a new
MM> argument

MM> add.smooth = getOption("plot.add.smooth")

MM> What do people think about the name?  it would ``stick
MM> with us'' -- so we'd better choose it well..

No reaction so far 


I've realized that I can introduce this very easily into
plot.lm():

Instead of the former argument

 panel = points

I use the new ones

 panel = if(add.smooth) panel.smooth else points,

 add.smooth = isTRUE(getOption("plot.add.smooth")),

- - - 

Now I even propose to have
  
options(add.smooth = TRUE)

as a new default.
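
In use, this would look like (a sketch; the dataset is mine, the
option name as proposed just above):

    fm <- lm(dist ~ speed, data = cars)
    plot(fm)                       # panels now use panel.smooth()
    options(add.smooth = FALSE)    # opt out: panels revert to points()
    plot(fm)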

Do I get a reaction now?
Martin

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] automatically adding smooth to plot: options("plot.add.smooth")

2005-09-19 Thread Martin Maechler
>>>>> "PaulG" == Paul Gilbert <[EMAIL PROTECTED]>
>>>>> on Mon, 19 Sep 2005 10:01:57 -0400 writes:

PaulG> Martin Maechler wrote:
>> I've changed the subject in the hope some more people
>> would voice an opinion...

PaulG> ...

>> Now I even propose to have
>> 
>> options(add.smooth = TRUE)
>> 
>> as a new default.
>> 
>> Do I get a reaction now?  Martin

PaulG> I think you may break a lot of things if you make
PaulG> this the default for plot. 

You mean plot.default().
Yes, that would be quite dangerous to do and you give a good example:

PaulG> this the default for plot. Plot gets used by other
PaulG> things (like matplot) where this default may not make
PaulG> much sense. (But I may have missed too much of the
PaulG> earlier discussion under some other subject.)

or I was not clear enough:
For R 2.2.0, the option would only be used in plot.lm().
Since I'd set its default to TRUE,  plot.lm()'s panels would use
panel.smooth(x,y,...) rather than points(x,y,...), 
and this actually does look quite useful.

Martin

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] simulate in stats

2005-09-29 Thread Martin Maechler
>>>>> "PaulG" == Paul Gilbert <[EMAIL PROTECTED]>
>>>>> on Thu, 15 Sep 2005 12:07:48 -0400 writes:

PaulG> BTW, I think there is a problem with the way the
PaulG> argument "seed" is used in the new simulate in stats.
PaulG> The problem is that repeated calls to simulate using
PaulG> the default argument will introduce a new pattern
PaulG> into the RNG:

>> stats:::simulate
PaulG> function (object, nsim = 1, seed = as.integer(runif(1, 0, 
PaulG> .Machine$integer.max)),   ...)
PaulG> UseMethod("simulate")
PaulG> 


>> stats:::simulate.lm
PaulG> function (object, nsim = 1, seed = as.integer(runif(1, 0, 
PaulG> .Machine$integer.max)),...)
PaulG> {
PaulG> if (!exists(".Random.seed", envir = .GlobalEnv))
PaulG> runif(1)
PaulG> RNGstate <- .Random.seed
PaulG> set.seed(seed)
PaulG> ...

PaulG> This should not be done, as the resulting RNG has not
PaulG> been studied or proven. A better mechanism is to have
PaulG> a default argument equal NULL, and not touch the seed
PaulG> in that case.

I agree so far.  I think the default seed should really be NULL
(with your semantic!) rather than a random number.
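
A sketch of that semantic (function name and layout are mine):

    simulate0 <- function(object, nsim = 1, seed = NULL, ...) {
        if (!is.null(seed)) {      # only then touch the RNG stream
            if (!exists(".Random.seed", envir = .GlobalEnv)) runif(1)
            oldseed <- get(".Random.seed", envir = .GlobalEnv)
            ## afterwards, restore the stream instead of re-seeding:
            on.exit(assign(".Random.seed", oldseed, envir = .GlobalEnv))
            set.seed(seed)
        }
        ## ... generate and return the 'nsim' simulations here ...
    }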


PaulG>  There are several examples of this in
PaulG> the package dse1 (in bundle dse), see for example
PaulG> simulate.ARMA and simulate.SS. They also use the
PaulG> utilities in the setRNG package to save more of the
PaulG> information necessary to reproduce
PaulG> simulations. Roughly it is done like this:
 
PaulG> simulate.x <- function (model, rng = NULL, ...)
PaulG> {
PaulG>   if (is.null(rng))
PaulG>     rng <- setRNG()  # returns the RNG setting, saved with the result
PaulG>   else {
PaulG>     old.rng <- setRNG(rng)
PaulG>     on.exit(setRNG(old.rng))
PaulG>   }
PaulG>   ...

as nobody has further delved into this in the mean time,
this is definitely too late for R 2.2.0, even if it was desired.

But I also think you should be able to live with interpreting
'seed' as 'rng' if you want, shouldn't you?

PaulG> The seed by itself is not very useful if the purpose
PaulG> is to be able to reproduce things, and I think it
PaulG> would be a good idea to incorporate the few small
PaulG> functions setRNG into stats (especially if the
PaulG> simulate mechanism is being introduced).

maybe we should reopen this topic {adopting ideas or even exact
implementations from your 'setRNG' into stats} in a few weeks,
when R 2.2.0 is released.

PaulG> The argument "nsim" presumably alleviates to some
PaulG> extent the above concern about changing the RNG
PaulG> pattern. However, in my fairly extensive experience
PaulG> it is not very workable to produce all the
PaulG> simulations and then do the analysis of them. 
PaulG> In a Monte Carlo experiment the generated data set is just
PaulG> too big.

I believe this depends very much on the topic.  The
uses that we had envisaged for simulate() don't save all the
models and then analyze them.
But maybe I'm misunderstanding your point completely here.

PaulG>  A better approach is to do the analysis and save
PaulG> only necessary information after each
PaulG> simulation. That is the approach, for example, in
PaulG> dse2:::EstEval.

PaulG> Paul

PaulG> Paul Gilbert wrote:

>> Can the arguments nsim and seed be passed as part of ... in the new 
>> simulate generic in R-2.2.0alpha package stats?

>> This would potentially allow me to use the stats generic rather than 
>> the one I define in dse. There are contexts where nsim and seed do not 
>> make sense.

Well, the current specification for simulate() has quite
deliberately been different.

I agree that there are situations where both 'nsim' and 'seed'
(or a generalization, say 'RNGstate') wouldn't make sense and
one still would like to use something like "simulate" in the
function name. 

>> I realize that the default arguments could be ignored, but
>> it does not really make sense to introduce a new generic with that in
>> mind.

I think it would depend on the exact context whether I would rather
use a (slightly) different function name, or just ignore the
ignorable arguments as you mention.


>> (I would also prefer that the "object" argument was called 
>>  "model" but this is less important.)

I'd personally agree with that;  the argument was that
'object' is very generally used in such situations.

Martin Maechler

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Question about colnames behavior

2005-10-03 Thread Martin Maechler
> "Erich" == Erich Neuwirth <[EMAIL PROTECTED]>
> on Sun, 02 Oct 2005 09:39:36 +0200 writes:

Erich> The following code
Erich> zzz<-1:10
Erich> dim(zzz)<-10
Erich> rownames(zzz)
Erich> colnames(zzz)

Erich> yields NULL for the rownames and colnames calls.
Erich> Let us set rownames

Erich> rownames(zzz)<-1:10

Erich> Now rownames(zzz) returns the expected result, but colnames(zzz)
Erich> produces an error:
Erich> Error in dn[[2]] : subscript out of bounds

Erich> So given a one-dimensional structure, the return behavior of colnames
Erich> differs depending on whether rownames are set or not.

Erich> Should the behavior of colnames be changed to make the result
Erich> independent from this fact?

yes, thank you, Erich. 
It should give an error also in the 1st  case which is
BTW identical to  
zzz <- array(1:10)

Not for R 2.2.0 though, but rather 2.2.1.

Martin

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Question about colnames behavior

2005-10-03 Thread Martin Maechler
>>>>> "BDR" == Prof Brian Ripley <[EMAIL PROTECTED]>
>>>>> on Mon, 3 Oct 2005 09:44:47 +0100 (BST) writes:

BDR> On Mon, 3 Oct 2005, Martin Maechler wrote:
>>>>>>> "Erich" == Erich Neuwirth <[EMAIL PROTECTED]>
>>>>>>> on Sun, 02 Oct 2005 09:39:36 +0200 writes:
>> 
Erich> The following code
Erich> zzz<-1:10
Erich> dim(zzz)<-10
Erich> rownames(zzz)
Erich> colnames(zzz)
>> 
Erich> yields NULL for the rownames and colnames calls.
Erich> Let us set rownames
>> 
Erich> rownames(zzz)<-1:10
>> 
Erich> Now rownames(zzz) returns the expected result, but colnames(zzz)
Erich> produces an error:
Erich> Error in dn[[2]] : subscript out of bounds
>> 
Erich> So given a one-dimensional structure, the return behavior of colnames
Erich> differs depending on whether rownames are set or not.
>> 
Erich> Should the behavior of colnames be changed to make the result
Erich> independent from this fact?
>> 
>> yes, thank you, Erich.
>> It should give an error also in the 1st  case which is
>> BTW identical to
>> zzz <- array(1:10)

BDR> Not according to my reading of the help, which says

  >> The extractor functions try to do something sensible for any
  >> matrix-like object 'x'.  If the object has 'dimnames' the first
  >> component is used as the row names, and the second component (if
  >> any) is used for the col names.

BDR> and reading on, I think it should give NULL in both cases.  You could 
BDR> argue that a 1D array is not `matrix-like', but that seems a narrow 
BDR> interpretation (especially as rownames does work for such arrays).

I was lead to my conclusion by the same help page, reading

 >> Arguments:
 >> 
 >>x: a matrix-like R object, with at least two dimensions for 'colnames'.

from which I concluded an error was appropriate for 'colnames'
when 'x' doesn't have two dimensions.

If we adopt your proposal (NULL in any case), we should
definitely also fix that paragraph...

Martin

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] vector labels are not permuted properly in a call to sort() (R 2.1)

2005-10-05 Thread Martin Maechler
> "AndyL" == Liaw, Andy <[EMAIL PROTECTED]>
> on Tue, 4 Oct 2005 13:51:11 -0400 writes:

AndyL> The `problem' is that sort() does not do anything special when given
AndyL> a matrix: it only treats it as a vector.  After sorting, it copies
AndyL> attributes of the original input to the output.  Since dimnames are
AndyL> attributes, they get copied as is.

exactly. Thanks Andy.

And I think users would want this (copying of attributes) in
many cases; in particular for user-created attributes

?sort  really talks about sorting of vectors and factors;
   and it doesn't mention attributes explicitly at all
   {which should probably be improved}.

One could wonder whether R should keep the dim & dimnames
attributes for arrays and matrices.
S-plus (6.2) simply drops them {returning a bare unnamed vector},
and that seems pretty reasonable to me.

At least the user would never make the wrong assumptions that
Greg made about ``matrix sorting''.
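
{What the S-plus behavior would amount to, using Andy's 'y' below:

     sort(c(y))   # c() strips dim & dimnames: a bare sorted vector  }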


AndyL> Try:

>> y <- matrix(8:1, 4, 2, dimnames=list(LETTERS[1:4], NULL))
>> y
AndyL>   [,1] [,2]
AndyL> A    8    4
AndyL> B    7    3
AndyL> C    6    2
AndyL> D    5    1
>> sort(y)
AndyL>   [,1] [,2]
AndyL> A    1    5
AndyL> B    2    6
AndyL> C    3    7
AndyL> D    4    8

AndyL> Notice the row names stay the same.  I'd argue that this is the
AndyL> correct behavior.

AndyL> Andy


>> From: Greg Finak
>> 
>> Not sure if this is the correct forum for this, 

yes, R-devel is the proper forum.
{also since this is really a proposal for a change in R ...}

>> but I've found what I
>> would consider to be a potentially serious bug for the
>> unsuspecting user.
>> Given a numeric vector V with class labels in R,  the following calls
>> 
>> 1.
>> > sort(as.matrix(V))
>> 
>> and
>> 
>> 2.
>> >as.matrix(sort(V))
>> 
>> produce different ouput. The vector is sorted properly in 
>> both cases,  
>> but only 2. produces the correct labeling of the vector. The call to  
>> 1. produces a vector with incorrect labels (not sorted).
>> 
>> Code:
>> >X<-c("A","B","C","D","E","F","G","H")
>> >Y<-rev(1:8)
>> >names(Y)<-X
>> > Y
>> A B C D E F G H
>> 8 7 6 5 4 3 2 1
>> > sort(as.matrix(Y))
>>   [,1]
>> A    1
>> B    2
>> C    3
>> D    4
>> E    5
>> F    6
>> G    7
>> H    8
>> > as.matrix(sort(Y))
>>   [,1]
>> H    1
>> G    2
>> F    3
>> E    4
>> D    5
>> C    6
>> B    7
>> A    8
>>

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Documenting newly created generic versions of non-generic base R functions

2005-10-18 Thread Martin Maechler
> "GS" == Gavin Simpson <[EMAIL PROTECTED]>
> on Mon, 10 Oct 2005 17:29:42 +0100 writes:

GS> Hi,
GS> Following the Writing R Extensions manual, I created a method for the
GS> cor function. As cor is not a generic, I followed the advice of section
GS> 6.1 of the same manual and did the following:

GS> cor <- function(x, ...) UseMethod("cor")
GS> cor.default <- stats::cor
GS> cor.symcoca <- function(x, ...) { some code }

GS> I used package.skeleton to create the basic set-up of my package,
GS> containing the above functions.

GS> Do I need to provide a .Rd file for cor and cor.default? - seeing as
GS> cor.default is cor currently.

GS> What is the best way to handle documenting functions produced using the
GS> above hi-jack methodology?

I'd probably write one help page, mainly for the "symcoca"
method, but also with

\alias{cor.default}
\alias{cor.symcoca}

and
\usage{
\method{cor}{symcoca}(..)
}

and would mention 'cor.default' and your redefinition of cor,
also \code{\link[stats]{cor}}
in the \description{...} or \details{} and/or other appropriate
places.

Regards,
Martin

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] is.na<- problem

2005-10-20 Thread Martin Maechler
> "Marc" == Marc Schwartz <[EMAIL PROTECTED]>
> on Wed, 19 Oct 2005 20:28:05 -0500 writes:
   .

>> > In reviewing the Green Book on the top of page 143, it shows an example
>> > in which the RHS of the assignment are the indices into the LHS object
>> > which are to be set to NA. For example:
>> >
>> > > xx <- c(0:5)
>> >
>> > > xx
>> > [1] 0 1 2 3 4 5
>> >
>> > > is.na(xx) <- c(3, 4)
>> >
>> > > xx
>> > [1]  0  1 NA NA  4  5
>> >
>> >

   ...

Marc> In all honesty, while I understood the concept from reading the help
Marc> page, it was not truly clear until I read the Green Book and saw the
Marc> example as to how to actually use the function.

Marc> It would probably be worthwhile to add an example of use to the help
Marc> page.

good idea. I've added a version of yours:

(xx <- c(0:4))
is.na(xx) <- c(2, 4)
xx #> 0 NA  2 NA  4

Martin

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Alpha and Beta testing of R versions

2005-11-04 Thread Martin Maechler
[Mainly for R-foundation members; but kept in public for general
 brainstorming...]

> "Simon" == Simon Urbanek <[EMAIL PROTECTED]>
> on Thu, 3 Nov 2005 12:16:25 -0500 writes:

  <>

Simon> As Brian was saying, the error was fixed in R
Simon> immediately after the release - strangely enough no
Simon> one reported the error during the alpha and beta
Simon> cycle although both the GUI and R binaries were
Simon> available for download :(.

Unfortunately, the phrase "strangely enough" could be replaced with
``as almost always''.

Maybe we (the R-foundation) should give serious thoughts to
offer prizes for valid bug reports during alpha and beta
testing.  These could include
- Reduced fee for 'useR' and 'DSC' conferences
- being listed as helpful person in R's 'THANKS' file
  {but that may not entice those who are already listed},
  or even in the NEWS of the new release 
  or on the "Hall of fame of R beta testers"

In order to discourage an increased number of non-bug reports we
may have to also open a "hall of shame" though...

Martin

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Brainstorm: Alpha and Beta testing of R versions

2005-11-07 Thread Martin Maechler
Thanks a lot, 
Andrew, for your input!

A general point about your suggestions:  You seem to assume that
bug reports are typically entered via the R-bugs web interface
(which is down at the moment, and probably for a few more dozen hours),
rather than via  R's builtin  bug.report() function or a
simple e-mail to [EMAIL PROTECTED]  [[which will also not
work properly for the moment, as long as the bug repository is
suffering from a fiber cable cut in Copenhagen]].

For some dinosaurs like me, having to fill in a web page rather
than sending e-mail would be quite a loss of comfort, but
actually, it might not be a bad idea to require a unique
bug-entry interface -- actually we have been thinking of moving
to bugzilla -- if only Peter Dalgaard could find a smart enough
person (even to be paid) who'd port all the old bug reports into the
new format.. 


>>>>> "Andrew" == Andrew Robinson <[EMAIL PROTECTED]>
>>>>> on Sun, 6 Nov 2005 11:01:30 +1100 writes:

Andrew> Hi Martin, On Fri, Nov 04, 2005 at 09:58:47AM +0100,
Andrew> Martin Maechler wrote:
>> [Mainly for R-foundation members; but kept in public for
>> general brainstorming...]

Andrew> I'll take up the invitation to brainstorm.

good, thanks!

Andrew> As a user of R for a number of years, I'd really
Andrew> like to perform some useful service.  I use a
Andrew> relatively obscure platform (FreeBSD) and I can
Andrew> compile code.  I'd like to think that I'm in the
Andrew> target market for beta testing :).  

indeed!

Andrew> But, I'm timid. I do not feel, in general, that R core welcomes bug
Andrew> reports.

I think that's a partly wrong feeling, understandably nourished
by some of our reactions about some "bug reports" that stemmed
from user misconceptions.  As you've remarked below, I've
expressed gratitude more than once for helpful bug reports.

Andrew> I think that there are several things that could be
Andrew> tried to encourage more, and more useful, bug
Andrew> reports.

Andrew> 1) Put the following text on the *front page* of the
Andrew> tracking system, so that it is seen before the
Andrew> reader clicks on "New Bug Report":

Andrew> "Before submitting a bug report, please read Chapter
Andrew> `R Bugs' of `The R FAQ'. It describes what a bug is
Andrew> and how to report a bug.

Andrew> If you are not sure whether you have observed a bug
Andrew> or not, it is a good idea to ask on the mailing list
Andrew> R-Help by sending an e-mail to
Andrew> r-help@stat.math.ethz.ch rather than submitting a
Andrew> bug report."

Andrew> (BTW is this true also for alpha/beta testing?)

Yes, in principle.  The only thing to be changed would be
   sub("-help", "-devel",  )

Andrew> 2) Try to use the structure of the reporting page to
Andrew> prompt good reporting.  On the report page,
Andrew> summarize the key points of identifying and
Andrew> reporting a bug in a checklist format.  Maybe even
Andrew> insist that the boxes be checked before allowing
Andrew> submission.  Include separate text boxes for
Andrew> description and sample code, to suggest that sample
Andrew> code is valued.

Andrew> 3) On either or both pages (and in FAQ), explain
Andrew> that thoughtful bug reports are valued and
Andrew> appreciated.  Further, explain that bug reports that
Andrew> do not follow the protocol are less valuable, and
Andrew> take more time.

Andrew> 4) Add checkboxes to the report page for alpha/beta.
Andrew> (I suggest this for the purposes of marketing, not
Andrew> organization.)

Andrew> 5) On the report page, include hyperlinks to
Andrew> archived bug reports that were good.  Do likewise
Andrew> with some artificial bug reports that are bad.

Andrew> 6) Add an intermediate, draft step for bug
Andrew> submission, to allow checking.  If possible, include
Andrew> as part of this step an automated pattern matching
Andrew> call that identifies similarly texted bug reports,
Andrew> provides links to the reports, and invites a
Andrew> last-minute cross-check.


Andrew> 7) Keep a list of people who report useful bugs in
Andrew> alpha/beta phase on the website.  Many academics
Andrew> could point to it as evidence of community service.

>> In order to discourage an increased number of non-bug
>> reports we may have to also open a "hall of shame"
>> though...

Andrew> 8) I'm sure that you're being ironic!  

indeed I was, partly.  The point was just that if the bug
reporting will 

Re: [Rd] Dead link in documentation for dbinom

2005-11-15 Thread Martin Maechler
Thank you, Ivan, for the documentation update;
Yes, such small "fixes"/patches are welcome as well.

Martin

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Package manpage DCF hooks

2005-11-15 Thread Martin Maechler
> "Paul" == Paul Roebuck <[EMAIL PROTECTED]>
> on Mon, 14 Nov 2005 15:57:04 -0600 (CST) writes:

Paul> On Mon, 14 Nov 2005, Gabor Grothendieck wrote:
>> On 11/14/05, Paul Roebuck <[EMAIL PROTECTED]> wrote:
>> 
>> > Was looking at what was output for <pkg>-package.Rd
>> > and wondered if there was any means (via macro, etc)
>> > to merge some of the same information with a template
>> > for my package manpage? As much (all?) of the generated
>> > information was already provided in the DESCRIPTION, I'd
>> > prefer not to have to update the information in multiple
>> > places. I'm thinking here that I could provide a template
>> > file "<pkg>-package.Rd.in" and during build, the
>> > DCF information could be substituted appropriately and
>> > "<pkg>-package.Rd" would be output.
>> >
>> > see also:
>> >promptPackage method
>> 
>> What I do is make my whatever-package.Rd page be
>> the central page where one can get a list of all
>> the other places one can look for info (rather than
>> placing the info itself there).  See, for example,
>> 
>> library(dyn)
>> package?dyn

Paul> Thanks for your reply. That gives me some additional
Paul> ideas but still think being able to display DCF
Paul> information and public function listing would be a nice
Paul> thing to have. For example, 'dyn-package.Rd' repeats its
Paul> DCF description.

which I agree is not ideal.  I agree that such information
should in principle reside in one place and be
``auto-distributed'' to other places during package installation
and maybe also package load time.

Note that  packageDescription("dyn")
returns an object that contains (and may print if you want) the
DCF information.
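
E.g. (using 'stats' purely as an example package):

    pd <- packageDescription("stats")
    pd$Description                # one DCF field, as a string
    ## or the raw DCF contents:
    read.dcf(system.file("DESCRIPTION", package = "stats"))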

One possibility I see would be the convention that the
'generated' (text, html, tex) help files for  '<pkg>-package'
would combine both the packageDescription() and
the contents of  <pkg>-package.Rd.

Martin

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Package manpage DCF hooks

2005-11-16 Thread Martin Maechler
>>>>> "Paul" == Paul Roebuck <[EMAIL PROTECTED]>
>>>>> on Tue, 15 Nov 2005 13:07:47 -0600 (CST) writes:

Paul> On Tue, 15 Nov 2005, Martin Maechler wrote:
>> >>>>> "Paul" == Paul Roebuck <[EMAIL PROTECTED]>
>> >>>>> on Mon, 14 Nov 2005 15:57:04 -0600 (CST) writes:
>> 
Paul> On Mon, 14 Nov 2005, Gabor Grothendieck wrote:
>> >> On 11/14/05, Paul Roebuck <[EMAIL PROTECTED]> wrote:
>> >>
>> >> > Was looking at what was output for <pkg>-package.Rd
>> >> > and wondered if there was any means (via macro, etc)
>> >> > to merge some of the same information with a template
>> >> > for my package manpage? As much (all?) of the generated
>> >> > information was already provided in the DESCRIPTION, I'd
>> >> > prefer not to have to update the information in multiple
>> >> > places. I'm thinking here that I could provide a template
>> >> > file "<pkg>-package.Rd.in" and during build, the
>> >> > DCF information could be substituted appropriately and
>> >> > "<pkg>-package.Rd" would be output.
>> >> >
>> >> > see also:
>> >> >promptPackage method
>> >>
>> >> What I do is make my whatever-package.Rd page be
>> >> the central page where one can get a list of all
>> >> the other places one can look for info (rather than
>> >> placing the info itself there).  See, for example,
>> >>
>> >> library(dyn)
>> >> package?dyn
>> 
Paul> Thanks for your reply. That gives me some additional
Paul> ideas but still think being able to display DCF
Paul> information and public function listing would be a nice
Paul> thing to have. For example, 'dyn-package.Rd' repeats its
Paul> DCF description.
>> 
>> which I agree is not ideal.  I agree that such information
>> should in principle reside in one place and be
>> ``auto-distributed'' to other places during package installation
>> and maybe also package load time.
>> 
>> Note that packageDescription("dyn") returns an object that
>> contains (and may print if you want) the DCF information.

Paul> I'm aware of this, having used it in various places. What
Paul> I don't know is how to access/use it during package
Paul> installation (if even possible). Using read.dcf and a sed
Paul> script, I could probably manage to perform the template
Paul> merge. But I don't know how to invoke such without adding
Paul> a configure script (overkill for R-only packages), as
Paul> 'install.R' is meant for something else.

>> One possibility I see would be the convention that the
>> 'generated' (text, html, tex) help files for '<pkg>-package'
>> would combine both the packageDescription() and
>> the contents of <pkg>-package.Rd.

Paul> Well, a system-level approach would be preferable to doing
Paul> this per-package.

Definitely, and actually I was only thinking of the former.

Paul>  R-2.3 then?

With the help of contributions from smart R-devel
readers/contributors, that should be fairly plausible;
otherwise I'm much less confident.

Martin

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] challenge: using 'subset = ' inside function ..

2005-11-18 Thread Martin Maechler
I've been asked by someone else whom I originally taught 
`` to just work with substitute() and then all will be fine'' ...

But it looks to me that I've been caught here.

Is it possible to make this work along the way we thought it should?

1)  Inside a function, say tst(), with a 'formula' and a 'data' argument,
2)  call another modeling function using 'subset = <subset>' with the
    *original* data,
3)  where <subset> is really computed from 'formula' itself ..

It would probably be pretty easy to use a modified 'data' (data
frame) inside tst(), instead of trying to use the original data;
but let's assume for the moment that this is not at all wanted.


Here is example code {that fails}
showing several other possibilities that fail as well


tst <- function(formula, data, na.action = na.omit) {

    stopifnot(inherits(formula, "formula"), length(formula) == 3)
    ## I want to fit a model to those observations that have 'Y > 0',
    ## where 'Y' is the left-hand side (LHS).
    ## The natural approach is using 'subset', since I want to keep 'data' intact.
    ## It's really lm(), glm(), gam(), ... but the problem is with model.frame():

    cat("subsetting expression: ")
    print(substitute(Y > 0, list(Y = formula[[2]])))  # is perfect
    YY <- formula[[2]]
    cat("  or   "); print(bquote(.(YY) > 0))

    mf <- model.frame(formula, data = data,
                      subset = bquote(.(YY) > 0),
                      ##or subset = substitute(Y > 0, list(Y = formula[[2]])),
                      ##or subset = eval(substitute(Y > 0, list(Y = formula[[2]]))),
                      ##or subset = as.expression(bquote(.(formula[[2]]) > 0)),
                      ##or subset = bquote(.(formula[[2]]) > 0),
                      na.action = na.action)
    mf
}


## never works
tst(ncases ~ agegp + alcgp, data = esoph)

traceback() #--> shows that inside model.frame.default
#eval(substitute(subset, ...))  is called as well



Happy quizzing..

Martin Maechler, ETH Zurich

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] make check fails for R 2.3.0 (PR#8343)

2005-11-22 Thread Martin Maechler
>>>>> "Arne" == Arne Henningsen <[EMAIL PROTECTED]>
>>>>> on Tue, 22 Nov 2005 16:19:19 +0100 writes:

.

>> You are reporting as a bug in R a problem on your own system in an
>> unreleased ('unstable') version of R.  

Arne> I used this version to check my R packages because the
Arne> packages on CRAN are checked by R-devel, too.

>> Since it is unstable and
>> unreleased, such things are by definition not bugs in R.

Arne> Sorry, I did not know this. I thought that my report could help you. 
Arne> The next time I find an error in R-devel I won't report it.

No; please do "report" the problem, which may be useful for
development, but please do *NOT* use the bug repository, and
probably don't assume it's a bug in R, unless you have quite a
bit of experience with R bugs and non-bugs.

Instead, just send e-mail to R-devel and explain,
and you may actually be helping R development, particularly if you
are willing to investigate some details that we may ask you
about.


>> Others are not seeing this, so we cannot do anything
>> about the problems seen on your system.  This is not
>> at all a new test, and although random it is run with
>> set.seed(1). I can reproduce the result in the output
>> file (on my systems) exactly by
>> 
>> > set.seed(1)
>> > hist(replicate(100, mean(rexp(10))))
>> 
>> Please see if you can debug it on your own system.  (My guess would be
>> that it only occurs as part of the test file.)

Arne> Yes, that's exactly the case. If you want any further
Arne> information please don't hesitate to contact
Arne> me. Otherwise I won't bother you anymore with this
Arne> issue.

Too bad.
It might have been interesting to see what

 set.seed(1)
 replicate(100, mean(rexp(10)))

or also

 set.seed(1)
 hist(replicate(100, mean(rexp(10))))
 traceback()

gives on your R-devel installation.
That's why Brian Ripley helped you by mentioning 'set.seed(1)'.

Regards,
Martin Maechler, ETH Zurich

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Help page of "par"

2005-11-26 Thread Martin Maechler
Thank you, Berwin.

You are definitely right,
and I have committed a fix to R-patched and R-devel.

Maybe  help(par)  is just too long a document to be really read .. ;-)

Martin

> "BeT" == Berwin A Turlach <[EMAIL PROTECTED]>
> on Sun, 27 Nov 2005 00:51:51 +0800 writes:

BeT> Dear all,
BeT> the second paragraph on the value returned by par() on the help page
BeT> of par says:

BeT> When just one parameter is queried, the value is a character
BeT> string. When two or more parameters are queried, the result is a
BeT> list of character strings, with the list names giving the
BeT> parameters.

BeT> But this does not seem to be correct:

>> par("lty", "ask", "lwd", "oma")
BeT> $lty
BeT> [1] "solid"

BeT> $ask
BeT> [1] FALSE

BeT> $lwd
BeT> [1] 1

BeT> $oma
BeT> [1] 0 0 0 0

BeT> Only the first one is a character string, the other ones are a
BeT> logical, a number and a vector of numbers, respectively.  Should it
BeT> rather be something like (also in view of the next sentence):

BeT> When just one parameter is queried, the value of that parameter
BeT> is returned as a vector.  When two or more parameters are
BeT> queried, their values are returned in a list, with the list names
BeT> giving the parameters.

BeT> Cheers,

BeT> Berwin

BeT> __
BeT> R-devel@r-project.org mailing list
BeT> https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] as.data.frame() : needs "..." ?!

2005-11-28 Thread Martin Maechler
   [diverted from R-help to R-devel]

> "Gabor" == Gabor Grothendieck <[EMAIL PROTECTED]>
> on Sun, 27 Nov 2005 14:16:34 -0500 writes:

  <>

Gabor> making use of as.data.frame.table we can shorten that
Gabor> slightly to just:

Gabor> as.data.frame.table(table(Species = iris$Species),
Gabor> responseName = "Count")

Gabor> Incidently, I just noticed that there is an
Gabor> inconsistency between as.data.frame and
Gabor> as.data.frame.table making it impossible to shorten
Gabor> as.data.frame.table to as.data.frame in the above due
Gabor> to the responseName= argument which is not referenced
Gabor> in the generic.

>> args(as.data.frame)
Gabor> function (x, row.names = NULL, optional = FALSE)
Gabor> NULL
>> args(as.data.frame.table)
Gabor> function (x, row.names = NULL, optional = FALSE,
Gabor>     responseName = "Freq")
Gabor> NULL

  {If you used  str() instead of args(), you wouldn't get the
   superfluous extra 'NULL' line}

I think this is an example where we (R-core) haven't followed
our own recommendations, namely, that  generic functions (and
methods) need to have a (trailing) "..." argument
just so that new methods can have further arguments.

I'm wondering a bit... 
or could there be a good reason in the present case,
why this hasn't been done?
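
I.e., presumably just (a sketch):

    as.data.frame <- function(x, row.names = NULL, optional = FALSE, ...)
        UseMethod("as.data.frame")

so that methods like as.data.frame.table can add arguments such as
'responseName' without mismatching the generic.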

Martin

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] all.equal() for mismatching names {was "Enlightenment sought..."}

2005-12-02 Thread Martin Maechler
>>>>> "BeT" == Berwin A Turlach <[EMAIL PROTECTED]>
>>>>> on Fri, 2 Dec 2005 18:31:13 +0800 writes:

BeT> First, I recently had reasons to read the help page of as.vector() and
BeT> noticed in the example section the following example:

BeT> x <- c(a = 1, b = 2)
BeT> is.vector(x)
BeT> as.vector(x)
BeT> all.equal(x, as.vector(x)) ## FALSE

actually 'FALSE' was never the case,  but "non-TRUE" once was, see below.

BeT> However, in all versions of R in which I executed this example, the
BeT> all.equal command returned TRUE which suggest that either the comment
BeT> in the help file is wrong or the all.equal/as.vector combination does
BeT> not work as intended in this case.  For the former case, I attach
BeT> below a patch which would fix vector.Rd.

We recently had the following posting on R-devel
https://stat.ethz.ch/pipermail/r-devel/2005-October/034962.html
(Subject: [Rd] all.equal() improvements (PR#8191))
where Andrew Piskorsky proposed a (quite
extensive) patch to all.equal()  in order to  make sure that
things like names must match for all.equal() to return TRUE.


I did agree back then, and Brian partly disagreed with the very
valid argument that all.equal() has been used in code testing
(particularly R CMD check for packages), and that changes to make all.equal()
more "picky" might well have bad consequences for package
testing.  Also Andy didn't provide the necessary patches to the
documentation that would have been entailed.  
Well, all that's just an excuse for the fact that I had really
lost the topic out of sight ;-)

However, I'd like to take up the case, and I believe we should
fix all.equal(), for at least the following reasons:

1- logical consistency

2- earlier R versions were more picky about name mismatch
   (upto R version 1.6.2) :

  > x <- c(a=1, b=pi); all.equal(x, as.vector(x))
  [1] "names for target but not for current"
  [2] "TRUE"

3- two versions of S-plus were more picky too,
   in particular, S+3.4 which used to be our prototype:
 
   > x <- c(a=1, b=pi); all.equal(x, as.vector(x))
   [1] "names for target but not for current"
   attr(, "continue"):
   [1] T

   Here's Splus 6.2 :

   > x <- c(a=1, b=pi); all.equal(x, as.vector(x))
   [1] "target, current classes differ: named : numeric"
   [2] "class of target is \"named\", class of current is \"numeric\" (coercing 
target to class of current)"

----

I really don't expect package checks to fail because of such a
change.
If some did start failing, a fix should be quite simple for
the package author and would help find inconsistencies in their
own code IMO.

Martin Maechler, ETH Zurich

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] t() dropping NULL dimnames {was "all.equal() for mismatching names..."}

2005-12-02 Thread Martin Maechler
>>>>> "MM" == Martin Maechler <[EMAIL PROTECTED]>
>>>>> on Fri, 2 Dec 2005 14:44:22 +0100 writes:

>>>>> "BeT" == Berwin A Turlach <[EMAIL PROTECTED]>
>>>>> on Fri, 2 Dec 2005 18:31:13 +0800 writes:

BeT> First, I recently had reasons to read the help page of as.vector() and
BeT> noticed in the example section the following example:

BeT> x <- c(a = 1, b = 2)
BeT> is.vector(x)
BeT> as.vector(x)
BeT> all.equal(x, as.vector(x)) ## FALSE

   MM> actually 'FALSE' was never the case,  but "non-TRUE" once was, see below.

BeT> However, in all versions of R in which I executed this example, the
BeT> all.equal command returned TRUE which suggest that either the comment
BeT> in the help file is wrong or the all.equal/as.vector combination does
BeT> not work as intended in this case.  For the former case, I attach
BeT> below a patch which would fix vector.Rd.

MM> We recently had the following posting on R-devel
MM> https://stat.ethz.ch/pipermail/r-devel/2005-October/034962.html
MM> (Subject: [Rd] all.equal() improvements (PR#8191))
MM> where Andrew Piskorski proposed a (quite
MM> extensive) patch to all.equal()  in order to  make sure that
MM> things like names must match for all.equal() to return TRUE.

I'm testing the first part of Andy's proposition
{the 2nd part was about making the result strings more informative for
 the case where all.equal() does *not* return TRUE}.

Interestingly, it did break 'make check', and for a somewhat
subtle reason;
something we could consider another (typically inconsequential)
inconsistency:

t() drops dimnames when they are list(NULL,NULL) 
and has been doing so at least since R version 1.0.0 :

 x <- cbind(1:2, 2:1); dimnames(x) <- list(NULL, NULL) 
 identical(x, t(x))  ## -> FALSE !
 str(t(x)) # "no dimnames" (i.e. dimnames(x) === NULL)

Now I'm looking into changing that one

Martin

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] [R] computing the variance

2005-12-05 Thread Martin Maechler
{from R-help, diverted to R-devel}:

UweL> Wang Tian Hua wrote:

UweL> hi, when i was computing the variance of a simple
UweL> vector, i found unexpect result. not sure whether it
UweL> is a bug.

UweL> Not a bug! ?var:

UweL> "The denominator n - 1 is used which gives an unbiased
UweL>  estimator of the (co)variance for
UweL>  i.i.d. observations."


UweL> > var(c(1,2,3))
UweL> [1] 1  #which should be 2/3.
UweL> > var(c(1,2,3,4,5))
UweL> [1] 2.5 #which should be 10/5=2
UweL> 
UweL> it seems to me that the program uses (sample size - 1) instead of
UweL> sample size at the denominator. how can i rectify this?

UweL> Simply change it by:

UweL> x <- c(1,2,3,4,5)
UweL> n <- length(x)
UweL> var(x)*(n-1)/n

UweL> if you really want it.

It seems Insightful at some point in time has given in to
this user request, and S-plus nowadays has
an argument  "unbiased = TRUE"
where the user can choose {to shoot (him/her)self in the foot and}
require 'unbiased = FALSE'.
{and there's also 'SumSquares = FALSE', which allows skipping the
 division (by N or N-1) altogether}

Since in some ``schools of statistics'' people are really still
taught to use a 1/N variance, we could envisage providing such an
argument to var() {and cov()} as well.  Otherwise, people define
their own variance function such as
  VAR <- function(x, ...) (n - 1)/n * var(x, ...)  # with n = length(x)
Should we?
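
(For concreteness, a runnable version of such a function -- the name
 varN is just illustrative -- gives the original poster's expected values:

  varN <- function(x, ...) {
      n <- length(x)
      (n - 1)/n * var(x, ...)
  }
  varN(c(1, 2, 3))        ## 2/3
  varN(c(1, 2, 3, 4, 5))  ## 2
)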

BTW: S+ even has the 'unbiased' argument for cor() where of course it
really doesn't make any difference (!), and actually I think it is
rather misleading, since the sample correlation is not unbiased
in almost all cases AFAICS.

Martin

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] typo in `eurodist'

2005-12-08 Thread Martin Maechler
> "Torsten" == Torsten Hothorn <[EMAIL PROTECTED]>
> on Thu, 8 Dec 2005 08:51:57 +0100 (CET) writes:

Torsten> On Wed, 7 Dec 2005, Prof Brian Ripley wrote:

>> I've often wondered about that.

Torsten> and the copy editor did too :-)

>> I've presumed that the names were
>> deliberate, so have you checked the stated source?  It's not readily
>> available to me (as one would expect in Oxford)?

Torsten> our library doesn't seem to have a copy of `The Cambridge
Torsten> Encyclopaedia', so I can't check either. Google has 74.900 hits for
Torsten> `Gibralta' (more than one would expect for a typo, I think)
Torsten> and 57.700.000 for `Gibraltar'.

Torsten> So maybe both spellings are in use.

Well,  do you expect web authors to have a much lower rate of
typos than 1:770 ?
My limited experience with "google voting for spelling correction"
has rather lowered my expectations of web authors' education in
orthography...

Martin

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] extension to missing()? {was "hist() ... helpful warning? (PR#8376)"}

2005-12-12 Thread Martin Maechler
   [taken off R-bugs as a non-bug]

> "AndrewC" == clausen  <[EMAIL PROTECTED]>
> on Sun, 11 Dec 2005 08:40:01 +0100 (CET) writes:

AndrewC> Hi Brian,
AndrewC> On Sun, Dec 11, 2005 at 04:34:50AM +, Prof Brian Ripley wrote:
>> Did you check the help page?  ?plot.histogram shows plot.histogram has a 
>> 'freq' argument, and the correct usage is
>> 
>> plot(hist(x), freq=FALSE)

AndrewC> Ah, thanks for the explanation.

AndrewC> It didn't occur to me to check the plot.histogram()
AndrewC> help page.  

[ even though it's prominently mentioned on  help(hist)  ?? ]

AndrewC> Besides, even if I had read it, I still don't think
AndrewC> the semantics would have been clear to me without
AndrewC> additional experimentation.

AndrewC> Perhaps it might be helpful to document in the
AndrewC> hist() help page which attributes are stored in the
AndrewC> hist() object.  
you mean the 'histogram' object.

Yes, that might be helpful; diffs against
  https://svn.R-project.org/R/trunk/src/library/graphics/man/hist.Rd
are welcome.

AndrewC> Alternatively/additionally, hist()
AndrewC> could emit a warning or error if plot=FALSE and
AndrewC> irrelevant (non-stored) attributes are set.

interesting proposal.
I've looked at it for a bit, and found that it seems not to be
doable both easily and elegantly, at least not along the first
line I've tried; so I think it raises a slightly more general,
somewhat interesting problem:

Since *most* arguments of hist.default, including '...', are only
made use of when plot = TRUE, and since the code issuing the
warning would have to look at all of them, I wanted a nicely
maintainable solution which looks at {almost} all formals() and
checks which of them are missing().
Since formals() is a list,
is.miss <- lapply(formals(), missing)
was the first thing I tried, but it failed with
 Error in lapply(fm, missing) : 2 arguments passed to 'missing' which requires 1

which might be a bit astonishing {missing is Primitive though..}
and of course
is.miss <- lapply(formals(), function(n) missing(n))
``works'' but trivially {why ?} and hence not usefully.

I've needed to make use of eval() and substitute() in order to
use missing() here.
Hence, I'm wondering if we could generalize missing(),
by something like   missing(all.formals = TRUE)  {or better syntax},
which would make the following a bit easier.
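
To make the idiom concrete outside of hist(), here is a small
self-contained sketch (f is just a toy function):

  f <- function(a, b, c) {
      nf <- names(formals())
      missE <- lapply(nf, function(n)
                      substitute(missing(.), list(. = as.name(n))))
      names(missE) <- nf
      sapply(missE, eval, envir = environment())
  }
  f(1, c = 3)  ##  a: FALSE   b: TRUE   c: FALSE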

Here's a context diff of my working version of hist.default()
which implements the above proposal:

--- hist.R  (Revision 36695)
+++ hist.R  (working copy)
@@ -108,7 +108,19 @@
              axes = axes, labels = labels, ...)
         invisible(r)
     }
-    else r
+    else { ## plot is FALSE
+        nf <- names(formals()) ## all formals but those 4:
+        nf <- nf[match(nf, c("x", "breaks", "nclass", "plot"), nomatch=0) == 0]
+        missE <- lapply(nf, function(n)
+                        substitute(missing(.), list(. = as.name(n))))
+        not.miss <- ! sapply(missE, eval, envir = environment())
+        if(any(not.miss))
+            warning(sprintf(ngettext(sum(not.miss),
+                                     "argument %s is not made use of",
+                                     "arguments %s are not made use of"),
+                            paste(sQuote(nf[not.miss]), collapse=", ")))
+        r
+    }
 }
 
 plot.histogram <-

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] correct address for R-bugs ..

2005-12-13 Thread Martin Maechler
>>>>> "Greg" == Gregory Warnes <[EMAIL PROTECTED]>
>>>>> on Mon, 12 Dec 2005 14:03:00 -0500 writes:

Greg> I got an email error message when I attempted to
Greg> send this from my work account.  I have manually
Greg> added it to the bug tracker, and am resending from
Greg> my personal account.


Greg> -G

Greg> On 12/12/05, Warnes, Gregory R <[EMAIL PROTECTED]> wrote:
>> 
>> 
>> 
>> >  -Original Message-
>> > From: Warnes, Gregory R
>> > Sent: Monday, December 12, 2005 1:53 PM
>> > To:   '[EMAIL PROTECTED]'
^^^

Can you tell where you took this address from?

We'd very much like that R bug reports be sent to
     [EMAIL PROTECTED]

(from where they are forwarded to the repository in Denmark,
 *after* having been virus- and spam-filtered).

Martin Maechler, ETH ZUrich

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] SVN-REVSION altered when building R-devel out of tree from last snapshot

2005-12-20 Thread Martin Maechler
> "Herve" == Herve Pages <[EMAIL PROTECTED]>
> on Mon, 19 Dec 2005 17:10:58 -0800 writes:

Herve> Hi,
Herve> Today I downloaded and compiled the last R-devel snapshot.
Herve> The SVN-REVISION in the tarball contains the following:

Herve> Revision: 36792
Herve> Last Changed Date: 2005-12-18

Herve> But after compiling on Unix (I compiled out of tree), 

i.e. "in a separate build directory tree"

Herve> I ended up with an SVN-REVSION file containing:

Herve> Revision: unknown
Herve> Last Changed Date: Today

Herve> in the build tree.

I can confirm this wrong behavior (Linux Redhat EL4).
There must be something not yet perfect in our 'make' setup
there.  If we are not in the srcdir, we create a 'non-tarball'
file which I think is wrong;  in any case, this is a buglet we'll fix.

Thank you, Herve!

Herve> Then when I start R, I get:

Herve> R : Copyright Today, The R Foundation for Statistical Computing
Herve> Version 2.3.0 Under development (unstable) (Today-Today-Today)
Herve> ISBN 3-900051-07-0

Herve> even if I naively edit the SVN-REVISION in the build tree before to
Herve> start R.

Herve> I got this problem on a 64-bit SUSE Linux 9.2, a 32-bit SUSE Linux 
9.2
Herve> and a Solaris 2.9 sparc system.
Herve> On Windows however (where I built R directly in the source tree) I 
don't
Herve> have this problem.

Herve> We need to update R-devel on our various build machines in order to 
test
Herve> Bioconductor devel packages with last R-devel and we try to have the 
exact
Herve> same R revision number on every test-machine. Last time I updated 
R-devel
Herve> was 12/01/2005 and I used the same procedure that I

[ you mean 12th of January? ;-)  {yes, it would help to use
  international 2005-12-01 or then Dec 01, 2005}
]

Herve> am using today but
Herve> I didn't have the SVN-REVISION problem.

Herve> Also I didn't try to build R-devel from SVN. Maybe
Herve> this could solve the problem.

that would definitely solve it, since that's what all of R-core
do "all the time".  
But the way you did, should also work; that's what the tarballs
are for!

Herve> It's just that using the tarball was easier to manage.
Herve> Anyway I thought it might be worth reporting.

Definitely.
Thank you again, Hervé !

Herve> Regards,

Herve> Hervé
 

Herve> -- 
Herve> 
Herve> Hervé Pagès
Herve> E-mail: [EMAIL PROTECTED]
Herve> Phone: (206) 667-5791
Herve> Fax: (206) 667-1319

Herve> __
Herve> R-devel@r-project.org mailing list
Herve> https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] pmin(), pmax() - slower than necessary for common cases

2005-12-20 Thread Martin Maechler
A few hours ago, I was making a small point on the R-SIG-robust
mailing list on the point that  ifelse() was not too efficient
in a situation where  pmax() could easily be used instead.

However, this has reminded me of some timing experiments that I
did 13 years ago with S-plus -- where I found that pmin() /
pmax() were really relatively slow for the most common case
where they are used with only two arguments {and typically one
of the arguments is a scalar; but that's not even important here}.
The main reason is that the functions accept an arbitrary number
of arguments and do recycling.
Their source is at
  https://svn.R-project.org/R/trunk/src/library/base/R/pmax.R

In April 2001 (as I see), I had repeated my timings with R (1.2.2)
which confirmed the picture more or less,  but for some reason I
never drew "proper" consequences of my findings.
Of course one can argue  pmax() & pmin() are still quite fast
functions; OTOH the experiment below shows that -- at least for the
special case with 2 (matching) arguments -- they could be made faster
by about a factor of 19 ...

I don't have yet a constructive proposition; just note the fact that

  pmin. <- function(k,x) (x+k - abs(x-k))/2
  pmax. <- function(k,x) (x+k + abs(x-k))/2

are probably the fastest way of computing  pmin() and pmax() of
two arguments {yes, they "suffer" from rounding error of about 1
to 2 bits...} currently in R. 
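
(A quick sanity check of these claims, with pmax. as just defined:

  x <- rnorm(1000)
  all.equal((x + abs(x))/2, pmax(0, x))  ## TRUE
  all.equal(pmax.(1, x), pmax(1, x))     ## TRUE, up to the 1-2 bits mentioned
)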
One "solution" could be to provide  pmin2() and pmax2()
functions based on trivial .Internal() versions.

The experiments below are for the special case of  k=0  where I
found the above mentioned factor of 19 which is a bit
overoptimistic for the general case; here is my  pmax-ex.R  source file
(as a text/plain ASCII attachment --> easy cut & paste)
demonstrating what I claim above.

 Martin Maechler, Aug.1992 (S+ 3.x) --- the same applies to R 1.2.2

 Observation:  (abs(x) + x) / 2  is  MUCH faster than  pmax(0,x) !!

 The function  pmax.fast below is  very slightly slower than (|x|+x)/2

### "this" directory  --- adapt!
thisDir <- "/u/maechler/R/MM/MISC/Speed"


### For R's source,
###  egrep 'pm(ax|in)'  src/library/*/R/*.R
### shows you that most uses of  pmax() / pmin() are really just with arguments
###  (scalar, vector)  where the fast versions would be much better!

pmax.fast <- function(scalar, vector)
{
 ## Purpose: FAST substitute for pmax(s, v) when length(s) == 1
 ## Author: Martin Maechler, Date: 21 Aug 1992 (for S)
 vector[ scalar > vector ] <- scalar
 vector
}
## 2 things:
##   1) the above also works when 'scalar' == vector of same length
##   2) The following is even (quite a bit!) faster :

pmin. <- function(k,x) (x+k - abs(x-k))/2
pmax. <- function(k,x) (x+k + abs(x-k))/2


### The following are small scale timing simulations which confirm:

N <- 20 ## number of experiment replications
kinds <- c("abs", "[.>.]<-", "Fpmax", "pmax0.", "pmax.0")
Tmat <- matrix(NA, nrow = N, ncol = length(kinds),
   dimnames = list(NULL, kinds))
T0 <- Tmat[,1] # `control group'
n.inner <- 800 # should depend on the speed of your R / S  (i.e. CPU)

set.seed(101)
for(k in 1:N) {
cat(k,"")
## no longer set.seed(101)
x <- rnorm(1000)
Tmat[k, "pmax0."] <- system.time(for(i in 1:n.inner)y <- pmax(0,x))[1]
Tmat[k, "pmax.0"] <- system.time(for(i in 1:n.inner)y <- pmax(x,0))[1]
Tmat[k,"abs"] <- system.time(for(i in 1:n.inner)y <- (x + abs(x))/2)[1]
Tmat[k,"[.>.]<-"] <- system.time(for(i in 1:n.inner)y <-{x[0 > x] <- 
0;x})[1]
Tmat[k,  "Fpmax"] <- system.time(for(i in 1:n.inner)y <- pmax.fast(0,x))[1]
}

save(Tmat, file = file.path(thisDir, "pmax-Tmat.rda"))

###-- Restart here {saving simulation/timing}:

if(!exists("Tmat"))
load(file.path(thisDir, "pmax-Tmat.rda"))

(Tm <- apply(Tmat, 2, mean, trim = .1))
##      abs  [.>.]<-    Fpmax   pmax0.   pmax.0
## 0.025625 0.078750 0.077500 0.488125 0.511250
round(100 * Tm / Tm[1])
##     abs [.>.]<-   Fpmax  pmax0.  pmax.0
##     100     307     302    1905    1995
## earlier:
##     abs [.>.]<-   Fpmax  pmax0.  pmax.0
##     100     289     344    1804    1884


## pmax0. is really a bit faster than pmax.0 :
## P < .001 (for Wilcoxon; outliers!)
t.test(Tmat[,4], Tmat[,5], paired = TRUE)
t.test(Tmat[,4], Tmat[,5], paired = FALSE)# since random samples
wilcox.test(Tmat[,4], Tmat[,5])# P = 0.00012 {but ties -> doubt}

boxplot(data.frame(Tmat, check.names = FALSE),
notch = TRUE, ylim = range(0,Tmat),
main = "CPU times used for versions of pmax(0,x)")
mtext(paste("x <- rnorm(1000)","  ", N

[Rd] R-bugs e-mail {was ... (Debian Bug 344248): ...}

2005-12-21 Thread Martin Maechler
PLEASE, PLEASE:
do use
[EMAIL PROTECTED]
and nothing else
(It will go to Copenhagen alright currently,
 but if we could ensure everyone used the above address,
 it would become quite a bit easier to prevent most spam from getting
 into the R bug repository)

Martin

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] 'sessionInfo()' instead of 'version'

2005-12-29 Thread Martin Maechler
> "roger" == roger koenker <[EMAIL PROTECTED]>
> on Thu, 29 Dec 2005 14:07:19 -0600 writes:

roger> In a private response to Tony Plate's suggestion to
roger> replace version() output with sessionInfo() in R-help
roger> requests,

>> roger koenker wrote:
>>> Thanks for this, it would seem useful to have version
>>> numbers for the packages too?

roger> and Tony replied,
>>  Sounds sensible to me!  If I were you I'd send a message
>> to R-devel suggesting this.  AFAIK, some changes to
>> sessionInfo() are already being considered, so this is a
>> good time to suggest that.

roger> So, for what it is worth

but the version numbers of the non-standard packages are
*there* -- so what do you mean ?

  > sessionInfo()
  R version 2.2.1, 2005-12-20, x86_64-unknown-linux-gnu 

  attached base packages:
  [1] "graphics"  "grDevices" "datasets"  "utils" "methods"   "stats"
  [7] "base" 

  other attached packages:
   cluster fortunes  sfsmisc 
  "1.10.2"  "1.2-0" "0.95-2" 
  >

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] all.equal() improvements (PR#8191)

2006-01-02 Thread Martin Maechler
>>>>> "BDR" == Prof Brian Ripley <[EMAIL PROTECTED]>
>>>>> on Mon, 2 Jan 2006 20:39:18 + (GMT) writes:

BDR> Martin,
BDR> I have some tests running over CRAN now (RUnit has also failed), 

thank you, Brian, for the feedback

BDR> but have  already noticed things like

>> swiss[, 1] -> x
>> names(x) <- rownames(swiss)
>> all.equal(x, x[1:10])
BDR> [1] "Names: Lengths (47, 10) differ (string compare on first 10)"
BDR> [2] "Numeric: lengths (47, 10) differ"

BDR> which is telling me the obvious, with the result that the reports from 
BDR> e.g. rpart are cluttered to the detriment of legibility.

BDR> I think we need to think harder about what should be reported when the
BDR> objects differ in mode or length.

I agree;  the above is good example.
[OTOH, I don't think the above behavior is a complete show
 stopper, since it's somewhat close to the way S-plus does things]

BDR> Brian

BDR> On Mon, 2 Jan 2006 [EMAIL PROTECTED] wrote:

>> I'm "happy" to have found the first problem myself:
>> 
>> 'Matrix' doesn't pass R CMD check  anymore with the change I had
>> committed:

BDR> I am seeing a problem in setGeneric which stops it being installed.

{ah yes;  for R-devel you need the "next" version of Matrix
 the important part of which I'll commit shortly to R-packages;  it
 will take another day before I'll upload it to CRAN}


>> Basically because of this:
>> 
>> > all.equal(cbind(1:5), matrix(1:5, 5,1, dimnames=list(NULL,NULL)))
>> [1] "Attributes: < Names: Lengths (1, 2) differ (string compare on first 
1) >"
>> [2] "Attributes: < Length mismatch: comparison on first 1 components >"
>> 
>> This new behavior is "S-compatible" insofar as S-plus 6.1 also
>> returns non-TRUE.
>> 
>> Is this what we want?
>> {we'll see soon how many other CRAN packages are having problems for it}
>> 
>> In my intuition, I'd have liked all.equal()  to return TRUE for the 
above,
>> since in principle,  dimnames = NULL  or dimnames = list(NULL,NULL)
>> is a trivial difference.
>> OTOH, it will need "special case" code to assure this, and I
>> wonder if that's worth it.
>> 
>> Please comment!
>> Martin
>> 
>>>>>>> "MM" == Martin Maechler <[EMAIL PROTECTED]>
>>>>>>> on Mon,  2 Jan 2006 18:00:15 +0100 (CET) writes:
>> 
MM> I've now finally finalized my work on a subset of Andy's
MM> propositions, and committed it to R-devel.
>> 
MM> The current change doesn't show in our own checks and
MM> examples, but may well in other people's package checks.
MM> For this reason, I've also added a line to the
MM> 'USER-VISIBLE CHANGES' part of the NEWS file.

BDR> -- 
BDR> Brian D. Ripley,  [EMAIL PROTECTED]
BDR> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
BDR> University of Oxford, Tel:  +44 1865 272861 (self)
BDR> 1 South Parks Road, +44 1865 272866 (PA)
BDR> Oxford OX1 3TG, UKFax:  +44 1865 272595

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Pb with agrep()

2006-01-05 Thread Martin Maechler
>>>>> "Herve" == Herve Pages <[EMAIL PROTECTED]>
>>>>> on Wed, 04 Jan 2006 17:29:35 -0800 writes:

Herve> Happy new year everybody,
Herve> I'm getting the following while trying to use the agrep() function:

>> pattern <- "XXX"
>> subject <- c("oo", "oooXooo", "oooXXooo", "oooXXXooo")
>> max <- list(ins=0, del=0, sub=0) # I want exact matches only
>> agrep(pattern, subject, max=max)
Herve> [1] 4

Herve> OK

>> max$sub <- 1 # One allowed substitution
>> agrep(pattern, subject, max=max)
Herve> [1] 3 4

Herve> OK

>> max$sub <- 2 # Two allowed substitutions
>> agrep(pattern, subject, max=max)
Herve> [1] 3 4

Herve> Wrong!

No. 
You have overlooked the fact that 'max.distance = 0.1' (10%) 
*remains* the default, even when 'max.distance' is specified as
a list as in your example [from  "?agrep" ] :

>> max.distance: Maximum distance allowed for a match.  Expressed either
>>   as integer, or as a fraction of the pattern length (will be
>>   replaced by the smallest integer not less than the
>>   corresponding fraction), or a list with possible components
>> 
>>   'all': maximal (overall) distance
>> 
>>   'insertions': maximum number/fraction of insertions
>> 
>>   'deletions': maximum number/fraction of deletions
>> 
>>   'substitutions': maximum number/fraction of substitutions
>> 
>>>>>>   If 'all' is missing, it is set to 10%, the other components
>>>>>>   default to 'all'.  The component names can be abbreviated. 

If you specify max$all as "100%", i.e., as 0.  ('< 1' !), everything works
as you expect it:

agrep(pattern, subject, max = list(ins=0, del=0, sub= 2, all = 0.))
## --> 2 3 4


Martin Maechler, ETH Zurich

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Using gcc4 visibility features

2006-01-05 Thread Martin Maechler
> "elijah" == elijah wright <[EMAIL PROTECTED]>
> on Thu, 5 Jan 2006 09:13:15 -0600 (CST) writes:

>> Subject: [Rd] Using gcc4 visibility features
>> 
>> R-devel now makes use of gcc4's visibility features: for
>> an in-depth account see
>> 
>> http://people.redhat.com/drepper/dsohowto.pdf


elijah> does this mean that we now have a dependency on
elijah> gcc4, or just that it "can" use the feature of gcc4?

the latter (of course!)

elijah> clarification, please.

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] prod(numeric(0)) surprise

2006-01-08 Thread Martin Maechler
> "Ben" == Ben Bolker <[EMAIL PROTECTED]>
> on Sun, 08 Jan 2006 21:40:05 -0500 writes:

Ben> Duncan Murdoch wrote:
>> On 1/8/2006 9:24 PM, Ben Bolker wrote:
>> 
>>> It surprised me that prod(numeric(0)) is 1.  I guess if
>>> you say (operation(nothing) == identity element) this
>>> makes sense, but ??
>> 
>> 
>> What value were you expecting, or were you expecting an
>> error?  I can't think how any other value could be
>> justified, and throwing an error would make a lot of
>> formulas more complicated.
>> 
>>>
>> 
>> 
>> That's a fairly standard mathematical convention, which
>> is presumably why sum and prod work that way.
>> 
>> Duncan Murdoch

Ben>OK.  I guess I was expecting NaN/NA (as opposed to
Ben> an error), but I take the "this makes everything else
Ben> more complicated" point.  Should this be documented or
Ben> is it just too obvious ... ?  (Funny -- I'm willing to
Ben> take gamma(1)==1 without any argument or suggestion
Ben> that it should be documented ...)

see?  so it looks to me as if you have finally convinced
yourself that '1' is the most reasonable result.. ;-)

Anyway, I've added a sentence to help(prod)  {which matches
the sentence in help(sum), BTW}.
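
I.e., both empty "reductions" return the identity element of their
operation:

  prod(numeric(0))  ## 1
  sum(numeric(0))   ## 0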

Martin

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] [R] ouml in an .Rd

2006-01-10 Thread Martin Maechler
> "PaulG" == Paul Gilbert <[EMAIL PROTECTED]>
> on Mon, 09 Jan 2006 15:27:12 -0500 writes:

PaulG> (moved from r-help) Ok, UTF-8 works on some of my
PaulG> machines and latin1 on others. If I use one I get
PaulG> failure or spurious characters when I build on the
PaulG> wrong machine. Are .Rd files suppose to work on
PaulG> different platforms when there are special
PaulG> characters, 

yes, they are. That's why we have \encoding{} and \enc{}
nowadays, and the "Writing R Extensions" manual has been
documenting this for a while, currently [an excerpt:]

 >> 2.10 Encoding
 >> =============
 >> 
 >> `Rd' files  are text files  and so it  is impossible to  deduce the
 >> encoding they are written in: ASCII, UTF-8, Latin-1, Latin-9 _etc_.  So
 >> the  `\encoding{}' directive  must  be  used  to specify  the
 >> encoding: if not present the processing to HTML assumes that the file is
 >> in Latin-1 (ISO-8859-1).   This is used when creating  the header of the
 >> HTML conversion  and to make a  comment in the examples  file.  It is
 >> also used to indicate to LaTeX how to process the file (see below).
 >> 
 >>Wherever possible, avoid non-ASCII chars in `Rd' files.
 >> 
 >>For convenience, encoding names `latin1' and `latin2' are always
 >> recognized: these and `UTF-8' are likely to work fairly widely.

 >> 
 >> 

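In practice, then, a minimal pair of directives in the .Rd file
(assuming the file itself is really saved in Latin-1) would be

  \encoding{latin1}
  ...
  \enc{Jöreskog}{Joreskog}

i.e., the actual character in the file, plus the \encoding{} declaration.
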

I'm a bit surprised that you haven't succeeded finding this
information in the extension manual.  
After all, it's  *the*  R manual for package writers.

Martin

PaulG> or is this a known limitation?

(not at all)

PaulG> Paul

PaulG> Prof Brian Ripley wrote:

>> It means what it says: you need to put the actual
>> character in the file, and specify the encoding for the
>> file via \encoding.  (For you, UTF-8 or latin1, I would
>> guess.)
>> 
>> It's not a question of trying variations, rather of
>> following instructions.
>> 
>> On Fri, 6 Jan 2006, Paul Gilbert wrote:
>> 
>>> I am trying to put an ouml in an .Rd file with no
>>> success. Writing R Extensions suggests:
>>> 
>>> Text which might need to be represented differently in
>>> different encodings should be marked by |\enc|,
>>> e.g. |\enc{Jöreskog}{Joreskog}| where the first argument
>>> will be used where encodings are allowed and the second
>>> should be ASCII (and is used for e.g. the text
>>> conversion).
>>> 
>>> (Above may get mangled by the mail.) I have tried
>>> variations
>>> 
>>> \enc{J"oreskog}{Joreskog} \enc{J\"oreskog}{Joreskog}
>>> \enc{Jo\"reskog}{Joreskog} \enc{Jo\"reskog}{Joreskog}
>>> \enc{J\"{o}reskog}{Joreskog}
>>> \enc{J\\"{o}reskog}{Joreskog}
>>> \enc{Jöoreskog}{Joreskog}
>>> 
>>> all with no effect on the generated pdf file.
>>> Suggestions would be appreciated.
>>> 
>>> Thanks, Paul Gilbert
>>> 
>>> __
>>> R-help@stat.math.ethz.ch mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do
>>> read the posting guide!
>>> http://www.R-project.org/posting-guide.html
>>> 
>>

PaulG> __
PaulG> R-devel@r-project.org mailing list
PaulG> https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] eigen()

2006-01-10 Thread Martin Maechler
> "BDR" == Prof Brian Ripley <[EMAIL PROTECTED]>
> on Tue, 10 Jan 2006 15:01:00 + (GMT) writes:

BDR> I haven't seen most of this thread, but this is a classic case of 
passing 
BDR> integers instead of doubles.  And indeed

BDR> else if(is.numeric(x)) {
BDR> storage.mode(x) <- "double"

BDR> has been removed from eigen.R in R-devel in r36952.  So that's the 
BDR> culprit.

and I am the culprit of that revision.  I'll fix this ASAP.
Martin

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] [R] "infinite recursion" in do.call when lme4 loaded only

2006-01-13 Thread Martin Maechler
>>>>> "MM" == Martin Maechler <[EMAIL PROTECTED]>
>>>>> on Fri, 13 Jan 2006 12:24:51 +0100 writes:

>>>>> "Dieter" == Dieter Menne <[EMAIL PROTECTED]>
>>>>> on Thu, 12 Jan 2006 18:14:32 + (UTC) writes:

Dieter> Peter Dalgaard  biostat.ku.dk> writes:
>>> > A larg program which worked with lme4/R about a year ago failed when I
>>> > re-run it today. I reproduced the problem with the program below.

>>> > -- When lme4 is loaded (but never used), the do.call fails
>>> >with infinite recursion after 60 seconds. Memory used increases
>>> >    beyond bounds in task manager.
>>> 
>>> However, it surely has to do with methods dispatch:
>>> 
>>> > system.time(do.call("rbind.data.frame",caScore))
>>> [1] 0.99 0.00 0.99 0.00 0.00
>>> 
>>> which provides you with another workaround.

Dieter> Peter, I had increased the optional value already, but I still 
don't understand 
Dieter> what this recursion overflow has to do with the lm4 loading.

MM> Aahh, you've hit a secret ;-)  no, but a semi-hidden feature:
MM> lme4 loads Matrix and Matrix  activates versions of rbind() and
MM> cbind() which use rbind2/cbind2 which are S4 generics and
MM> default methods that are slightly different from the
MM> original base rbind() and cbind(). 
MM> This was a necessity since the original rbind(), cbind() have
MM> first argument "...", i.e. an invalid signature for S4 method
MM> dispatch.

MM> This was in NEWS for R 2.2.0 :

MM> o   Experimental versions of cbind() and rbind() in methods package,
MM> based on new generic function cbind2(x,y) and rbind2().  This will
MM> allow the equivalent of S4 methods for cbind() and rbind() ---
MM> currently only after an explicit activation call, see ?cbind2.

MM> And 'Matrix' uses the activation call in its .OnLoad hook.
MM> This is now getting much too technical to explain for R-help, so
MM> if we want to go there, we should move this topic to R-devel,
MM> and I'd like to do so, and will be glad if you can provide more
MM> details on how exactly you're using rbind.

One thing -- very useful for you -- I forgot to add:

You can easily and quickly revert the  "other cbind/rbind
activation" by using

methods:::bind_activation(FALSE)

so you don't need to unload lme4 or Matrix,  and you can
reactivate them again after your special computation by

methods:::bind_activation(on = TRUE)

Martin

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] symbols function -- possible enhancements

2006-01-18 Thread Martin Maechler
Hi Jean,

now that you've been told  `the truth' ..  :

If you'd like to carefully look at symbols() and its help page and see
which arguments ('axes' but maybe more) would be useful to pass
to plot.default and if you provide enhanced versions of the two files
 https://svn.r-project.org/R/trunk/src/library/graphics/R/symbols.R
and  https://svn.r-project.org/R/trunk/src/library/graphics/man/symbols.Rd

I'll gladly look at them and incorporate them for R 2.3.0
(unless they break something)

Best regards,
Martin Maechler


>>>>> "BDR" == Prof Brian Ripley <[EMAIL PROTECTED]>
>>>>> on Tue, 17 Jan 2006 23:15:19 + (GMT) writes:

BDR> On Tue, 17 Jan 2006, Thomas Lumley wrote:
>> On Tue, 17 Jan 2006, Jean Eid wrote:
>> 
>>> Hi
>>> 
>>> I do not get why the symbols function produces warnings when axes=F is
>>> added. The following example illustrate this
>>> 
>>>> symbols(0,10, inches=T, circles=1, axes=F, xlab="", ylab="")
>>> Warning message:
>>> parameter "axes" could not be set in high-level plot() function
>>> 
>>> 
>>> I augmented symbols and added the axes=F argument to the plot function
>>> inside the original symbols function. It works as expected, no warning
>>> message. I am just lost as to why the extra arguments in symbols (...)
>>> are not behaving as expected.
>>> 
>> 
>> The ... argument is also passed to .Internal, and presumably the code 
>> there gives the warning.

BDR> Indeed.  axes=F is not in the allowed list

BDR> ...: graphics parameters can also be passed to this function, as
BDR> can the plot aspect ratio 'asp' (see 'plot.window').

BDR> People confuse 'axes' with the graphics parameters, but it is in fact 
an 
BDR> argument to plot.default.  (The corresponding graphics parameters
BDR> xaxt and yaxt do work.)  R-devel gives a more informative message:

>> attach(trees)
>> symbols(Height, Volume, circles = Girth/24, inches = FALSE, axes=F)
BDR> Warning message:
BDR> "axes" is not a graphical parameter in: symbols(x, y, type, data, 
inches, 
BDR> bg, fg, ...)

BDR> We do ask people to read the help pages before posting for a good 
reason: 
BDR> the information is usually there in a more complete and accurate form 
than 
BDR> people remember.

BDR> -- 
BDR> Brian D. Ripley,  [EMAIL PROTECTED]
BDR> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
BDR> University of Oxford, Tel:  +44 1865 272861 (self)
BDR> 1 South Parks Road, +44 1865 272866 (PA)
BDR> Oxford OX1 3TG, UKFax:  +44 1865 272595

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] natural sorting

2006-01-18 Thread Martin Maechler
> "Greg" == Warnes, Gregory R <[EMAIL PROTECTED]>
> on Tue, 17 Jan 2006 14:48:46 -0500 writes:

Greg> The 'mixedsort' function in the 'gtools' package does
Greg> this.  It is probably slower than the c version you
Greg> point to, but it is already working in R.

Thank you, Greg.

BTW, given the thread, this is a typical example where it might
be very useful to add the following two concepts to the 
   mixedsort.Rd file in gtools :

\concept{natural sort}
\concept{dictionary sort}

so that mixedsort() will be quickly found by help.search("natural sort")
and possibly also via the Java search from the HTML help interface?
(I never use it; I use help.search() {or else RSiteSearch()}
 exclusively.)

Martin


>> -Original Message-
>> From: [EMAIL PROTECTED]
>> [mailto:[EMAIL PROTECTED] Behalf Of Andrew Piskorski
>> Sent: Thursday, January 12, 2006 10:40 AM
>> To: R Development Mailing List
>> Subject: Re: [Rd] natural sorting
>> 
>> 
>> On Wed, Jan 11, 2006 at 05:45:10PM -0500, Gabor Grothendieck wrote:
>> > It would be nifty to incorporate this into R or into an R package:
>> > 
>> > http://sourcefrog.net/projects/natsort/
>> 
>> Btw, I haven't looked at the implementation, but Tcl also contains
>> equivalent functionality, they call it dictionary sort:
>> 
>> http://tcl.activestate.com/man/tcl8.4/TclCmd/lsort.htm
>> 
>> -- 
>> Andrew Piskorski <[EMAIL PROTECTED]>
>> http://www.piskorski.com/

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] R: ecdf - linear

2006-01-18 Thread Martin Maechler
I'm replying to R-devel, the mailing list which should be used
to discuss R feature enhancements.

>>>>> "Norman" == Norman Warthmann <[EMAIL PROTECTED]>
>>>>> on Wed, 18 Jan 2006 11:33:22 +0100 writes:

Norman> .. 

Norman> Is there a specific reason why in the ecdf-function
Norman> the variable method="constant" is hard-coded?
yes, see below

Norman> I would like to use method="linear" and I have created
Norman> a new function based on yours just changing this and
Norman> it seems to work. I am now wondering whether you did
Norman> that on purpose? Maybe because there is problems
Norman> that are not obvious? If there aren't I would like
Norman> to put in a feature request to include the "method"-
Norman> argument into ecdf.

It can't be done the way you did it:

The class "ecdf" inherits from class "stepfun" which is defined
to be "Step functions" and a step function *is* piecewise
constant (also every definition of ecdf in math/statistics
only uses a piecewise constant function).

Of course, it does make sense in some contexts to linearly
(or even "smoothly") interpolate an ecdf, one important context
being versions of "smoothed bootstrap", but the result is not a
proper ecdf anymore. 

I think you should rather define a function that takes an ecdf
(of class "ecdf" from R) as input
and returns a piecewise linear function {resulting from
approxfun() as in your example below}. However that result  may
*NOT* inherit from "ecdf" (nor "stepfun").

And for that reason {returning a different class}, this
extension should NOT become part of ecdf() itself.

If you write such a "ecdf -> interpolated_ecdf" transforming
function, it might be useful to include in the ecdf() help page
later, so "keep us posted".

Regards,
Martin Maechler, ETH Zurich



Norman> my changed function:

N>>   ecdf_linear<-function (x)
N>>   {
N>>x <- sort(x)
N>>n <- length(x)
N>>if (n < 1)
N>>stop("'x' must have 1 or more non-missing values")
N>>vals <- sort(unique(x))
N>>rval <- approxfun(vals, cumsum(tabulate(match(x,vals)))/n,  
N>>   method = "linear", yleft = 0, yright = 1, f = 0,ties = "ordered")
N>>class(rval) <- c("ecdf", "stepfun", class(rval))
N>>attr(rval, "call") <- sys.call()
N>>rval
N>>   }

N>>   test<-c(1,2,7,8,9,10,10,10,12,13,13,13,14)
N>>   constant<-ecdf(test)
N>>   linear<- ecdf_linear(test)
N>>   plot(constant(1:14),type="b")
N>>   points(linear(1:14),type="b",col="red")

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] proposed pbirthday fix

2006-01-23 Thread Martin Maechler
> "ken" == ken knoblauch <[EMAIL PROTECTED]>
> on Mon, 23 Jan 2006 09:43:28 +0100 writes:

ken> Actually, since NaN's are also detected in na.action
ken> operations, a simpler fix might just be to use the
ken> na.rm = TRUE option of min

ken> upper <- min(n^k/(c^(k - 1)), 1, na.rm = TRUE)

Well, I liked your first fix better -- thank you for it! --
since it's always good practice to formulate things so as to avoid
overflow when possible.
All things considered, I think I'd go for

   upper <- min( exp(k * log(n) - (k-1) * log(c)), 1, na.rm = TRUE)
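
(With the numbers from Ken's report, the difference is easy to see:

   n <- 1000; k <- 250; c <- 365
   n^k / c^(k-1)                      ## NaN : Inf/Inf
   exp(k * log(n) - (k-1) * log(c))   ## finite (~ 1e112), so min(., 1) works
)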

Martin 

Ken> Recent news articles concerning an article from The
Ken> Lancet with fabricated data indicate that in the sample
Ken> containing some 900 or so patients, more than 200 had the
Ken> same birthday.  I was curious and tried out the p and q
Ken> birthday functions but pbirthday could not handle 250
Ken> coincidences with n = 1000.  The calculation of upper
Ken> prior to using uniroot produces NaN,

Ken> upper<-min(n^k/(c^(k-1)),1)

Ken> I was able to get it to work by using logs, however, as
Ken> in the following version

>> function(n, classes = 365, coincident = 2){
>> k <- coincident
>> c <- classes
>> if (coincident < 2) return(1)
>> if (coincident > n) return(0)
>> if (n > classes * (coincident - 1)) return(1)
>> eps <- 1e-14
>> if (qbirthday(1 - eps, classes, coincident) <= n)
>> return(1 - eps)
>> f <- function(p) qbirthday(p, c, k) - n
>> lower <- 0
>> upper <- min( exp( k * log(n) - (k-1) * log(c) ), 1 )
>> nmin <- uniroot(f, c(lower, upper), tol = eps)
>> nmin$root
>> }

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] proposed pbirthday fix

2006-01-23 Thread Martin Maechler
>>>>> "MM" == Martin Maechler <[EMAIL PROTECTED]>
>>>>> on Mon, 23 Jan 2006 11:52:55 +0100 writes:

>>>>> "ken" == ken knoblauch <[EMAIL PROTECTED]>
>>>>> on Mon, 23 Jan 2006 09:43:28 +0100 writes:

ken> Actually, since NaN's are also detected in na.action
ken> operations, a simpler fix might just be to use the
ken> na.rm = TRUE option of min

ken> upper <- min(n^k/(c^(k - 1)), 1, na.rm = TRUE)

MM> Well, I liked your first fix better -- thank you for it! --
MM> since it's always good practice to formulate such as to avoid
MM> overflow when possible. 
MM> All things considered, I think I'd go for

MM> upper <- min( exp(k * log(n) - (k-1) * log(c)), 1, na.rm = TRUE)

MM> Martin 

Ken> Recent news articles concerning an article from The
Ken> Lancet with fabricated data indicate that in the sample
Ken> containing some 900 or so patients, more than 200 had the
Ken> same birthday.  I was curious and tried out the p and q
Ken> birthday functions but pbirthday could not handle 250
Ken> coincidences with n = 1000.  The calculation of upper
Ken> prior to using uniroot produces NaN,

Ken> upper<-min(n^k/(c^(k-1)),1)

Ken> I was able to get it to work by using logs, however, as
Ken> in the following version

>>> function(n, classes = 365, coincident = 2){
..

>>> upper <- min( exp( k * log(n) - (k-1) * log(c) ), 1 )
>>> nmin <- uniroot(f, c(lower, upper), tol = eps)
>>> nmin$root
>>> }

Well, now after inspection, I think ``get it to work''
is a bit of an exaggeration, at least for a purist like me
(some famous fortune teller once guessed it may be because I'm ... Swiss)
who doesn't like to lose precision in probability computations
unnecessarily. One can do much better:

The version of [pq]birthday() I've just committed to R-devel *) now gives

> sapply(c(20,50,100,200), function(k) pbirthday(1000, coincident= k))
[1]  8.596245e-08  9.252349e-41 2.395639e-112 1.758236e-285

whereas the 'na.rm=TRUE' fix  would simply give

[1] 8.596245e-08 0.00e+00 0.00e+00 0.00e+00

--
Martin Maechler, ETH Zurich

*) peek at https://svn.r-project.org/R/trunk/src/library/stats/R/pbirthday.R

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] [R] help with read.table() function

2006-01-30 Thread Martin Maechler
> "Duncan" == Duncan Murdoch <[EMAIL PROTECTED]>
> on Sun, 29 Jan 2006 16:35:50 -0500 writes:

Duncan> On 1/29/2006 1:29 PM, Prof Brian Ripley wrote:
>> On Sun, 29 Jan 2006, Marc Schwartz wrote:
>> 
>>> I would argue against this.
>>> 
>>> If this were the default, that is requiring user
>>> interaction, it would break a fair amount of code that I
>>> (and I am sure a lot of others have) where automation is
>>> critical.

>>  I don't see how.  The current default is
>> 
>>> read.table()
>> Error in read.table() : argument "file" is missing, with
>> no default
>> 
>> so the only change is that the default might do something
>> useful.
>> 
>> Nor do I see the change would help, as the same people
>> would still use a character string for 'file' and not
>> omit the argument.  (It seems very unlikely that they
>> would read any documentation that suggested things had
>> changed.)

Duncan> No, but people teaching new users (or answering
Duncan> R-help questions) would have a simpler answer: just
Duncan> use read.table().

but I am not sure that people teaching R should advocate such a
read.table;
if they did, new R users would get the idea that this is
the way to use R.
I still think R should eventually be used for "Programming with Data"
rather than a GUI for ``clicking results together''.
Hence users should be taught (in the 2nd or 3rd part, not the
1st one of their introduction to R)
to work with R scripts, writing functions etc.

And similarly to Marc, I would never want default behavior to
start up GUI elements: it is also much more error-prone; just
consider the  "choose CRAN mirror" GUI that we had recently
introduced, and the many questions and "bug" reports it produced.

I know that I am biased in my views here;
but I strongly advocate the  "useRs becoming programmeRs" theme
and hence rather keep R consistent as a programming language,
partly agreeing with Gabor here.

>> The same issue could be made over scan(), where the
>> current default is useful.

Duncan> scan() is very useful for small reads, and rarely
Duncan> needed for reading big formatted files, 

{people might disagree with this; given scan() is more efficient
 for large files;  but that's not really the topic here.}

Duncan> so I wouldn't propose to change it.  
good.

Duncan> The inconsistency
Duncan> with read.table would be unfortunate, but no worse
Duncan> than the current one.


>>> A lot of the issues seem to be user errors, file
>>> permission errors, hidden extensions as is pointed out
>>> below and related issues. If there is a legitimate bug
>>> in R resulting in these issues, then let's patch
>>> that. However, I don't think that I can recall
>>> reproducible situations where a bug in R is the root
>>> cause of these problems.

>>  Nor I.
>> 
>> Note that file.choose does not protect you against file
>> permission issues (actually, on a command-line Unix-alike
>> it does nothing much useful at all):
>> 
>>> readLines(file.choose())
>> Enter file name: errs.txt

Duncan> No, it's not helpful here, but again it makes things
Duncan> no worse, and there's always the possibility that
Duncan> someone would improve file.choose().

I strongly prefer the current usage

  read.table(file.choose(), )

which implicitly ``explains'' how the file name is chosen, over a
new default
  read.table( .)

I'd like basic R functions not to call menu(), GUI... parts 
unless it's really the main task of that function.

Martin


   .
   .

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] [R] help with read.table() function

2006-01-30 Thread Martin Maechler
>>>>> "Duncan" == Duncan Murdoch <[EMAIL PROTECTED]>
>>>>> on Mon, 30 Jan 2006 09:58:23 -0500 writes:

    Duncan> On 1/30/2006 4:16 AM, Martin Maechler wrote:
>>>>>>> "Duncan" == Duncan Murdoch <[EMAIL PROTECTED]> on
>>>>>>> Sun, 29 Jan 2006 16:35:50 -0500 writes:
>>
Duncan> On 1/29/2006 1:29 PM, Prof Brian Ripley wrote:
>> >> On Sun, 29 Jan 2006, Marc Schwartz wrote:
>> >> 
>> >>> I would argue against this.
>> >>> 
>> >>> If this were the default, that is requiring user >>>
>> interaction, it would break a fair amount of code that I
>> >>> (and I am sure a lot of others have) where automation
>> is >>> critical.
>> 
>> >> I don't see how.  The current default is
>> >> 
>> >>> read.table() >> Error in read.table() : argument
>> "file" is missing, with >> no default
>> >> 
>> >> so the only change is that the default might do
>> something >> useful.
>> >> 
>> >> Nor do I see the change would help, as the same people
>> >> would still use a character string for 'file' and not
>> >> omit the argument.  (It seems very unlikely that they
>> >> would read any documentation that suggested things had
>> >> changed.)
>> 
Duncan> No, but people teaching new users (or answering
Duncan> R-help questions) would have a simpler answer: just
Duncan> use read.table().
>>  but I am not sure that people teaching R should advocate
>> such a read.table; if they did, new R users would get
>> the idea that this is the way to use R.

Duncan> I'd say "a way to use R", and I think teachers
Duncan> *should* present such a use.  It insulates users
Duncan> from uninteresting details, just as now it's
Duncan> probably good to advocate using file.choose() rather
Duncan> than explaining paths and escape characters before
Duncan> beginners can do anything with data.  Later on
Duncan> they'll need to learn those things, but not from the
Duncan> beginning.

>> I still think R should eventually be used for
>> "Programming with Data" rather than a GUI for ``clicking
>> results together''.  Hence users should be taught (in the
>> 2nd or 3rd part, not the 1st one of their introduction to
>> R) to work with R scripts, writing functions etc.

Duncan> Right, I agree here too.  This would soften the
Duncan> shock of the 1st introduction, but as soon as the
Duncan> students are ready to look at functions and
Duncan> understand default parameters, they'd be able to see
Duncan> that the default value for the "file" argument is
Duncan> file.choose().  They might become curious about it
Duncan> and call it by itself and discover that it is
Duncan> possible to program GUI elements (assuming that
Duncan> file.choose() calls one).

>> And similar to Marc, I would never want default behavior
>> to start up a GUI elements: It is also much more
>> error-prone; just consider the "choose CRAN mirror" GUI
>> that we had recently introduced, and the many questions
>> and "bug" reports it produced.
>> 
>> I know that I am biased in my views here; but I strongly
>> advocate the "useRs becoming programmeRs" theme and hence
>> rather keep R consistent as a programming language,
>> partly agreeing with Gabor here.

Duncan> I think I disagree with you because I think GUI
Duncan> programming is programming.  I don't want beginners
Duncan> to think that there are two kinds of programs:
Duncan> command-line programs that they can write, and GUI
Duncan> programs that only Microsoft can write.  I want them
Duncan> to think that programming is programming.  Doing
Duncan> complex things is harder than doing easy things, but
Duncan> it's not qualitatively different.

Actually, I completely agree with what you said here.

However we disagree to some extent about the implications
(on teaching R, learning R, ..) of making GUI elements defaults
for basic R functions.  
Also the phrase  "consistent as a programming language"  was
about the fact that for some functions, the default file(name) would be
GUI-dispatching whereas for other functions it would

Re: [Rd] colnames(tapply(...)) (PR#8539)

2006-01-30 Thread Martin Maechler
>>>>> "DavidB" == David Brahm <[EMAIL PROTECTED]>
>>>>> on Mon, 30 Jan 2006 18:39:05 +0100 (CET) writes:

DavidB> Wasn't there once a time when tapply(f,f,sum) (with "f" a vector)
DavidB> returned a vector instead of a 1D array?  Then colnames(x) would 
just
DavidB> give NULL instead of an error.  Sorry my memory isn't more precise.

well, it was very good...

R-0.16 had this;
R-0.63.3 (March 3, 1999) already didn't anymore, i.e., it
already did return a 1D array.

So, indeed Karl must have used a *very* old version of R.
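
(In any recent R, the result is a 1D array, where names() is the
 appropriate accessor:

   f <- rep(c(1,2), each = 5)
   x <- tapply(f, f, sum)
   dim(x)    ## 2  -- a 1D array, hence no colnames
   names(x)  ## "1" "2"
)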
Martin Maechler, ETH Zurich

DavidB> -- David Brahm ([EMAIL PROTECTED])


DavidB> -Original Message-
DavidB> From: [EMAIL PROTECTED]
DavidB> [mailto:[EMAIL PROTECTED] On Behalf Of Prof Brian Ripley
DavidB> Sent: Monday, January 30, 2006 3:45 AM
DavidB> To: [EMAIL PROTECTED]
DavidB> Cc: [EMAIL PROTECTED]; r-devel@stat.math.ethz.ch
DavidB> Subject: Re: [Rd] colnames(tapply(...)) (PR#8539)


DavidB> On Mon, 30 Jan 2006 [EMAIL PROTECTED] wrote:

>> I would like to bring to your attention the following error message
>> which didn't appear on previous versions (long time ago?)
>> 
>> Thanks for all your effort
>> 
>> Karl
>> 
>> Version 2.2.1 Patched (2006-01-21 r37153)
>> 
>> > f <- rep(c(1,2),each=5)
>> > x <- tapply(f,f,sum)
>> > colnames(x)
>> Error in dn[[2]] : subscript out of bounds

DavidB> What is inappropriate about this?  x is a 1D array, so it does not 
have
DavidB> column names (or columns).  Indeed, the help page says

DavidB> x: a matrix-like R object, with at least two dimensions for
DavidB> 'colnames'.

DavidB> The exact same message appears in 1.6.2, more than three years old (and
DavidB> the earliest version I still have running). If earlier versions did not
DavidB> the earliest version I still have running). If earlier versions did 
not=20
DavidB> have an error message, that was probably a bug.

DavidB> --
DavidB> Brian D. Ripley,  [EMAIL PROTECTED]
DavidB> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
DavidB> University of Oxford, Tel:  +44 1865 272861 (self)
DavidB> 1 South Parks Road, +44 1865 272866 (PA)
DavidB> Oxford OX1 3TG, UKFax:  +44 1865 272595

DavidB> __
DavidB> R-devel@r-project.org mailing list
DavidB> https://stat.ethz.ch/mailman/listinfo/r-devel

DavidB> __
DavidB> R-devel@r-project.org mailing list
DavidB> https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] What about a bib file

2006-01-31 Thread Martin Maechler
>>>>> "Vince" == Vincent Carey 525-2265 <[EMAIL PROTECTED]>
>>>>> on Mon, 30 Jan 2006 17:49:37 -0500 (EST) writes:

Vince> Romain Francois suggests that a central bibliographic database
Vince> (possibly in bibtex format) might be useful for reference inclusion
Vince> in R package man pages.  This has been discussed by a small
Vince> group, with one proposal presented for a package-specific bibtex 
database
Vince> placed in a dedicated package subdirectory.  Man page references 
would
Vince> then cite the sources enumerated in the database using their bibtex
Vince> tags.  This approach could encourage better annotation and should
Vince> confer greater accuracy on package:literature referencing.

a very good idea!
I've wished more than once that we had something like that in
place...

My intermediate workaround has been the following, e.g., in
package 'cluster', in man/fanny.Rd,  I have
   \seealso{
 \code{\link{agnes}} for background and references;
 
   }

and then no \references{.} in the fanny.Rd file;  but this
workaround is not very satisfactory,
and I am looking forward to your proposals.

Martin Maechler, ETH Zurich


Vince> This does not rule out a central archive that might include all the
Vince> references cited in base man pages.

Vince> We are doing some work on harvesting the bibliographic citations
Vince> in man pages in an R distribution, and converting them to a regular
Vince> format.  The \references section is free form, so the conversion
Vince> is not trivial, but progress has been made.

Vince> The infrastructure required to use this approach to propagate
Vince> (e.g., bibtex-formatted) bibliographic data into the man pages that
Vince> cite the sources is not yet available, but we hope to have some
Vince> prototypes in the next month.

Vince> [apologies if i mess up the threading on this topic; i did not 
receive
Vince> the original e-mail to r-devel]

Vince> Vince Carey
Vince> [EMAIL PROTECTED]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] SaveImage, LazyLoad, S4 and all that {was "install.R ... files"}

2006-02-03 Thread Martin Maechler
> "Seth" == Seth Falcon <[EMAIL PROTECTED]>
> on Thu, 02 Feb 2006 11:32:42 -0800 writes:

Seth> Thanks for the explanation of LazyLoad, that's very helpful.
Seth> On  1 Feb 2006, [EMAIL PROTECTED] wrote:
>> There is no intention to withdraw SaveImage: yes.  Rather, if
>> lazy-loading is not doing a complete job, we could see if it could
>> be improved.

Seth> It seems to me that LazyLoad does something different with respect to
Seth> packages listed in Depends and/or how it interacts with namespaces.

Seth> I'm testing using the Bioconductor package graph and find that if I
Seth> change SaveImage to LazyLoad I get the following:

Interesting.

I also had the vague feeling that  'SaveImage'  was said to be
important when using  S4 classes and methods; particularly when
some methods are for generics from a different package/Namespace
and other methods for `base' classes (or other classes defined
elsewhere).
This is the case of 'Matrix', my primary experience here.
OTOH, we now only use 'LazyLoad: yes' , not (any more?)
'SaveImage: yes' -- and honestly I don't know / remember why.

Martin


Seth> ** preparing package for lazy loading
Seth> Error in makeClassRepresentation(Class, properties, superClasses, 
prototype,  : 
Seth> couldn't find function "getuuid"  

Seth> Looking at the NAMESPACE for the graph package, it looks like it is
Seth> missing some imports.  I added lines:
Seth> import(Ruuid)
Seth> exportClasses(Ruuid)

Seth> Aside: am I correct in my reading of the extension manual that if one
Seth> uses S4 classes from another package with a namespace, one
Seth> must import the classes and *also* export them?

Seth> Now I see this:

Seth> ** preparing package for lazy loading
Seth> Error in getClass("Ruuid") : "Ruuid" is not a defined class
Seth> Error: unable to load R code in package 'graph'
Seth> Execution halted   

Seth> But Ruuid _is_ defined and exported in the Ruuid package.

Seth> Is there a known difference in how dependencies and imports are
Seth> handled with LazyLoad as opposed to SaveImage?  

Seth> Thanks,

Seth> + seth

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] match gets confused by S4 objects

2006-02-07 Thread Martin Maechler
>>>>> "BDR" == Prof Brian Ripley <[EMAIL PROTECTED]>
>>>>> on Mon, 6 Feb 2006 19:44:50 + (GMT) writes:

BDR> An S4 object is just a list with attributes, so a
BDR> vector type.  match() works with all vector types
BDR> including lists, as you found out (or could have read).

yes, the internal representation of S4 objects is such -- seen
from a non-S4 perspective.

BDR> If in the future those proposing it do re-implement an
BDR> S4 object as an new SEXP then this will change, but for
BDR> now the cost of detecting objects which might have an
BDR> S4 class defined somewhere is just too high (and would
BDR> fall on those who do not use S4 classes).

Just for further explanation, put into other words and a
slightly changed point of view: 

Yes, many R functions get confused by S4 objects, 
most notably,  c()  (!)

 - because they only look at the "internal representation"

 - and because it's expensive to always ``look twice'';
   particularly from the internal C code.
   There's a relatively simple check from R code which we've been
   using for str() :

   >> if(has.class <- !is.null(cl <- attr(object, "class"))) { # S3 or S4 class
   >>## FIXME: a kludge
   >>S4 <- !is.null(attr(cl, "package")) || cl == "classRepresentation"
   >>## better, but needs 'methods':   length(methods::getSlots(cl)) > 0
   >> }

   which --- when only testing for S4-presence --- you could collapse to

  if(!is.null(cl <- attr(object, "class")) &&
     (!is.null(attr(cl, "package")) ||
      cl == "classRepresentation")) {

      ...have.S4.object...

  }

  but note the comment  >>>>   ## FIXME: a kludge   <<<
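
In a fresh session, the kludge can be seen at work like this (a toy
class, as in Seth's message below; this is just an illustration, not
part of str()):

    setClass("FOO", representation(a = "numeric"))
    foo <- new("FOO", a = 10)
    cl <- attr(foo, "class")
    attr(cl, "package")          # ".GlobalEnv" : present, so 'foo' is S4
    attr(class(data.frame()), "package")  # NULL : an S3 class, hence not S4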

The solution has been agreed to be changing the internal
representation of S4 objects making them a new SEXP (basic R
"type"); and as Brian alludes to, the problem is that those in
R-core that want to and are able to do this didn't have the time
for that till now.

Martin Maechler, ETH Zurich


BDR> On Mon, 6 Feb 2006, Seth Falcon wrote:

>> If one accidentally calls match(x, obj), where obj is any S4 instance,
>> the result is NA.
>> 
>> I was expecting an error because, in general, if a match method is not
>> defined for a particular S4 class, I don't know what a reasonable
>> default could be.  Specifically, here's what I see
>> 
>> setClass("FOO", representation(a="numeric"))
>> foo <- new("FOO", a=10)
>> match("a", foo)
>> [1] NA
>> 
>> And my thinking is that this should be an error, along the lines of
>> match("a", function(x) x)
>> 
>> Unless, of course, a specific method for match, table="FOO" has been
>> defined.


BDR> -- 
BDR> Brian D. Ripley,  [EMAIL PROTECTED]
BDR> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
BDR> University of Oxford, Tel:  +44 1865 272861 (self)
BDR> 1 South Parks Road, +44 1865 272866 (PA)
BDR> Oxford OX1 3TG, UKFax:  +44 1865 272595

BDR> __
BDR> R-devel@r-project.org mailing list
BDR> https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] match gets confused by S4 objects

2006-02-07 Thread Martin Maechler
> "Seth" == Seth Falcon <[EMAIL PROTECTED]>
> on Tue, 07 Feb 2006 07:20:17 -0800 writes:

Seth> On  7 Feb 2006, [EMAIL PROTECTED] wrote:
>> The solution has been agreed to be changing the internal
>> representation of S4 objects making them a new SEXP (basic R
>> "type"); and as Brian alludes to, the problem is that those in
>> R-core that want to and are able to do this didn't have the time
>> for that till now.

Seth> The explanations from you and Brian are helpful, thanks.  I was aware
Seth> that the issue is the internal representation of S4 objects and was
Seth> hoping there might be a cheap work around until a new SEXP comes
Seth> around.

Seth> It seems that S4 instances are less trivial to detect than one might
Seth> expect before actually trying it.  

Seth> I suppose one work around is to have an S4Basic class that defines
Seth> methods for match(), c(), etc and raises an error.  Then extending
Seth> this class gives you some protection.

well; not so easy for c() !! {see the hoops we had to jump through to do
this for cbind() / rbind() (used in 'Matrix')}.

But it might be interesting; particularly since some have said
they'd expect a considerable performance penalty when all these basic
functions would become S4 generics...
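
For match() itself, though, Seth's idea is easy to sketch (class names
and error message purely illustrative; note that the method's formals
must match those of base::match()):

    setClass("S4Basic", representation("VIRTUAL"))
    setMethod("match", signature(table = "S4Basic"),
              function(x, table, nomatch = NA_integer_, incomparables = NULL)
                  stop("match() is not meaningful for S4 objects of class ",
                       class(table)))
    setClass("FOO2", representation(a = "numeric"), contains = "S4Basic")
    try(match("a", new("FOO2", a = 10)))  # now an error, not NA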

Martin

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Rscript failing with h5 reveals bugs in h5 (and 'R CMD check')

2017-12-28 Thread Martin Maechler
> Duncan Murdoch 
> on Wed, 27 Dec 2017 06:13:12 -0500 writes:

> On 26/12/2017 9:40 AM, Dirk Eddelbuettel wrote:
>> 
>> On 26 December 2017 at 22:14, Sun Yijiang wrote: | Thanks
>> for the solution.  Now I know the work-arounds, but still
>> don't | quite get it. Why does R_DEFAULT_PACKAGES has
>> anything to do with | library(methods)?
>> 
>> Because it governs which packages are loaded by default.
>> And while R also loads 'methods', Rscript does
>> not. Source of endless confusion.

> Mostly irrelevant correction of the jargon: that setting
> controls which packages are "attached" by default.
> library(h5) would be enough to load methods, because h5
> imports things from methods.  But loading doesn't put a
> package on the search list.  library(methods) both loads
> methods (if it hasn't already been loaded), and attaches
> it.

>> 
>> | If library(h5) works, it should just work, not depend
>> on an environment variable.
>> 
>> Every package using S4 will fail under Rscript unless
>> 'methods' explicitly.

> That's not quite true (or quite English, as per
> fortune(112)).  The "gmp" package imports methods, and
> it works in Rscript.  What doesn't work is to expect
> library(h5) or library(gmp) to cause methods functions
> like show() to be available to the user.

But indeed, in this case Sun's  test.R  script did not use any
such user level functions, and it still did not work when
methods is not attached...
and indeed that's the case also with R if you run it without
loading methods e.g. by

  R_DEFAULT_PACKAGES=NULL R CMD BATCH test.R

shows the same error...

===> There is really a bug in  h5 :  It does not import enough
from methods or it would all work fine, even with Rscript !!
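
The shape of the fix is a one-liner in h5's NAMESPACE (a sketch; exactly
which functions h5 needs from 'methods' would have to be read off its
sources):

    import(methods)
    ## or, more selectively, something like
    ## importFrom(methods, setClass, setMethod, new, initialize, validObject)

(The old-style alternative, 'Depends: methods' in DESCRIPTION, would also
make the Rscript case work, since Depends packages get attached.)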

===> So we have something relevant to R-devel , actually at
least one bug in R's  checking :

Why did

    R CMD check --as-cran h5

not see that h5 defines methods for initialize() but never
imports it from methods, and so "should not work" when methods
is not attached?

--

After all, the fact that the default packages attached at the
beginning differ between R and Rscript  has contributed to
revealing a bug in both 'h5' and R's checking procedures.

Maybe we should keep Rscript's incompatibility therefore ;-)

Martin

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Fixed BLAS tests for external BLAS library

2018-01-05 Thread Martin Maechler
>>>>> Tomas Kalibera 
>>>>> on Fri, 5 Jan 2018 00:41:47 +0100 writes:

> In practical terms, failing tests are not preventing anyone from using 
> an optimized BLAS/LAPACK implementation they trust. Building R with 
> dynamically linked BLAS on Unix is supported, documented and easy for 
> anyone who builds R from source. It is also how Debian/Ubuntu R packages 
> are built by default, so R uses whichever BLAS is installed in the 
> system and the user does not have to build from source. There is no 
> reason why not to do the same thing with another optimized BLAS on 
> another OS/distribution.

> You may be right that reg-BLAS is too strict (it is testing matrix 
> products, expecting equivalence to naive three-loop algorithm, just part 
> of it really uses BLAS). I just wanted a concrete example to think about 
> as I can't repeat it (e.g. it passes with openblas), but maybe someone 
> else will be able to repeat and possibly adjust.

> Tomas

Yes, indeed!  I strongly agree with Tomas:  This is about
serious quality assurance of an important part of R,
and replacing all identical() checks there with all.equal()
-- which has a default tolerance allowing __HALF__ of the
   precision to be lost !! -- in the way you, Simon, proposed,
would basically destroy the QC/QA we have in place there.

As Tomas said, *some* of the checks possibly should be done via
all.equal, but with a very small tolerance; other checks,
however, should not allow a tolerance, e.g., all the arithmetic
involving very small integer valued numbers should definitely be exact.
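
For the record, a small illustration of the difference (base R only):

    identical(1, 1 + 1e-9)    # FALSE : exact, bit-wise comparison
    all.equal (1, 1 + 1e-9)   # TRUE  : default tolerance 1.5e-8,
                              #         i.e. ~ sqrt(.Machine$double.eps)
    isTRUE(all.equal(1, 1 + 1e-9, tolerance = 1e-12))  # FALSE : much stricter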

That's why Tomas' (private!) reply, asking for specific details
is 100% appropriate, indeed.

With R we have had a philosophy of trying hard to be correct
first, and fast second... and indeed the last 20 years have
shown many cases where R's use (and checks) actually have
revealed not only inaccuracies but sometimes also bugs in
LAPACK/BLAS implementations, where it sometimes seems some are
only interested in speed, rather than correctness.

Martin Maechler
ETH Zurich

> On 01/04/2018 09:23 PM, Simon Guest wrote:
>> Hi Tomas,
>> 
>> Thanks for your reply.
>> 
>> I find your response curious, however.  Surely the identical() test is 
>> simply incorrect when catering for possibly different BLAS 
>> implementations?  Or is it the case that conformant BLAS 
>> implementations all produce bit-identical results, which seems 
>> unlikely?  (Sorry, I am unfamiliar with the BLAS spec.)  Although 
>> whatever the answer to this theoretical question, the CentOS 7 
>> external BLAS library evidently doesn't produce bit-identical results.
>> 
>> If you don't agree that replacing identical() with all.equal() is 
>> clearly the right thing to do, as demonstrated by the CentOS 7 
>> external BLAS library failing the test, then I think I will give up 
>> now trying to help improve the R sources.  I simply can't justify to 
>> my client more time spent on making this work, when we already have a 
>> local solution (which I hoped others would be able to benefit from).  
>> Ah well.
>> 
>> cheers,
>> Simon
>> 
>> On 5 January 2018 at 00:07, Tomas Kalibera wrote:
>> 
>> Hi Simon,
>> 
>> we'd need more information to consider this - particularly which
>> expression gives an imprecise result with ACML and what are the
>> computed values, differences. It is not common for optimized BLAS
>> implementations to fail reg-BLAS.R tests, but it is common for
>> them to report numerical differences in tests of various
>> recommended packages where more complicated computations are done
>> (e.g. nlme), on various platforms.
>> 
>> Best
>> Tomas
>> 
>> 
>> On 12/18/2017 08:56 PM, Simon Guest wrote:
>> 
>> We build R with dynamically linked BLAS and LAPACK libraries,
>> in order
>> to use the AMD Core Math Library (ACML) multi-threaded
>> implementation
>> of these routines on our 64 core servers.  This is great, and our
>> users really appreciate it.
>> 
>> However, when building like this, make check fails on the
>> reg-BLAS.R
>> test.  The reason for this is that the expected test output is
>> checked
>> using identical.  By changing all uses of identical in this
>> file to
>> all.equal, the tests pass.
>

Re: [Rd] Better error message in loadNamespace

2018-01-22 Thread Martin Maechler
>>>>> Thomas Lin Pedersen 
>>>>> on Mon, 22 Jan 2018 14:32:27 +0100 writes:

> Hi I’ve just spend a bit of time debugging an error
> arising in `loadNamespace`. The bottom line is that the
> `vI` object is assigned within an `if` block but expected
> to exist for all of the remaining code. In some cases
> where the package library has been corrupted or when it
> resides on a network drive with bad connection this can
> lead to error messages complaining about `vI` object not
> existing. Debugging through the error is difficult, both
> because `loadNamespace` is called recursively through the
> dependency graph and the error can arise at any depth. And
> because the recursive calls are wrapped in `try` so the
> code breaks some distance from the point where the error
> occurred.

> I will suggest mitigating this by adding an `else` clause
> to the `if` block where `vI` gets assigned that warns
> about potential corruption of the library and names the
> package that caused the error.

Not sure this is desirable... in general even though it may well
be desirable in your use case...

You will be aware that this an important function that maybe
called many times, e.g., notably even at R startup time and so
must be very robust [hence the many try* settings] and must use
messages/warnings that are suppressable etc etc.

On reading the source, I tend to agree with you that it looks
odd there is no  else  clause to that if(), but then there may
be subtle good reasons for that we don't see now.

> I can open a bug report if you wish, but I would require a
> bugzilla account for that. Otherwise you’re also welcome
> to take it from here.

I'll do that for you in any case.

Martin Maechler
ETH Zurich


> With best wishes Thomas Lin Pedersen

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] Inconsistent rank in qr()

2018-01-22 Thread Martin Maechler
> Serguei Sokol 
> on Mon, 22 Jan 2018 17:57:47 +0100 writes:

> On 22/01/2018 at 17:40, Keith O'Hara wrote:
>> This behavior is noted in the qr documentation, no?
>> 
>> rank - the rank of x as computed by the decomposition(*): always full
>> rank in the LAPACK case.
> For me a "full rank matrix" is a matrix whose rank is indeed
> min(nrow(A), ncol(A)), but here the meaning of "always full rank" is
> somewhat confusing. Does it mean that only full rank matrices must be
> submitted to qr() when LAPACK=TRUE?
> Maybe there is a jargon where "full rank" is a synonym of
> min(nrow(A), ncol(A)) for any matrix, but the fix to stick with the
> commonly admitted rank definition (i.e. the number of linearly
> independent columns in A) is so easy. Why exclude the LAPACK case
> from it (even if properly documented)?

Because 99.5% of callers of qr()  never look at '$rank', 
so why should we compute it every time qr() is called?

==> Matrix :: rankMatrix() does use "qr" as one of its several methods.

--

As wiser people than me have said (I'm paraphrasing; I can't find a nice citation):

  While the rank of a matrix is a very well defined concept in
  mathematics (theory), its practical computation on a finite
  precision computer is much more challenging.

The ?rankMatrix  help page (package Matrix, part of your R)
   https://stat.ethz.ch/R-manual/R-devel/library/Matrix/html/rankMatrix.html
starts with the following 'Description':

   Compute ‘the’ matrix rank, a well-defined functional in theory(*),
   somewhat ambiguous in practice. We provide several methods, the
   default corresponding to Matlab's definition.

   (*) The rank of a n x m matrix A, rk(A), is the maximal number of
   linearly independent columns (or rows); hence rk(A) <= min(n,m).
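
In code, using Serguei's 2 x 2 example from below:

    a <- diag(2); a[2, 2] <- 0        # mathematical rank: 1
    qr(a, LAPACK = FALSE)$rank        # 1 : LINPACK estimates the rank
    qr(a, LAPACK = TRUE )$rank        # 2 : i.e. min(dim(a)), as documented
    Matrix::rankMatrix(a)             # 1 : the Matlab-like default method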


>>> On Jan 22, 2018, at 11:21 AM, Serguei Sokol wrote:
>>> 
>>> Hi,
>>> 
>>> I have noticed different rank values calculated by qr() depending on
>>> the LAPACK parameter. When it is FALSE (default) a true rank is
>>> estimated and returned.
>>> Unfortunately, when LAPACK is set to TRUE, min(nrow(A), ncol(A)) is
>>> returned, which is only occasionally the true rank.
>>> 
>>> Would it not be more consistent to replace the rank in the latter case
>>> by something based on the following pseudo code?
>>> 
>>> d = abs(diag(qr))
>>> rank = sum(d >= d[1]*tol)
>>> 
>>> Here, we rely on the fact that column pivoting is activated in the
>>> called lapack routine (dgeqp3) and the diagonal terms of the qr matrix
>>> are put in decreasing order (according to their absolute values).
>>> 
>>> Serguei.
>>> 
>>> How to reproduce:
>>> 
>>> a=diag(2)
>>> a[2,2]=0
>>> qaf=qr(a, LAPACK=FALSE)
>>> qaf$rank # shows 1. OK it's the true rank value
>>> qat=qr(a, LAPACK=TRUE)
>>> qat$rank #shows 2. Bad, it's not the expected value.
>>> 

> -- 
> Serguei Sokol
> Ingenieur de recherche INRA

> Cellule mathématique
> LISBP, INSA/INRA UMR 792, INSA/CNRS UMR 5504
> 135 Avenue de Rangueil
> 31077 Toulouse Cedex 04

> tel: +33 5 6155 9849
> email: so...@insa-toulouse.fr
> http://www.lisbp.fr

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] sum() returns NA on a long *logical* vector when nb of TRUE values exceeds 2^31

2018-01-27 Thread Martin Maechler
> Henrik Bengtsson 
> on Thu, 25 Jan 2018 09:30:42 -0800 writes:

> Just following up on this old thread since matrixStats 0.53.0 is now
> out, which supports this use case:

>> x <- rep(TRUE, times = 2^31)

>> y <- sum(x)
>> y
> [1] NA
> Warning message:
> In sum(x) : integer overflow - use sum(as.numeric(.))

>> y <- matrixStats::sum2(x, mode = "double")
>> y
> [1] 2147483648
>> str(y)
> num 2.15e+09

> No coercion is taking place, so the memory overhead is zero:

>> profmem::profmem(y <- matrixStats::sum2(x, mode = "double"))
> Rprofmem memory profiling of:
> y <- matrixStats::sum2(x, mode = "double")

> Memory allocations:
> bytes calls
> total 0

> /Henrik

Thank you, Henrik, for the reminder.

Back in June, I had mentioned to Hervé and R-devel that
'logical' should continue to be treated as 'integer', as in all
arithmetic in (S and) R.  Hervé did mention the isum()
function in the C code which is relevant here .. which does have
a LONG INT counter already -- *but* if we consider that sum()
has '...', i.e. a conceptually arbitrary number of long vector
integer arguments, that counter won't suffice even there.

Before talking about implementation / patch, I think we should
consider 2 possible goals of a change --- I agree the status quo
is not a real option

1) sum(x) for logical and integer x  would return a double
  in any case and overflow should not happen (unless for
  the case where the result would be larger than
  .Machine$double.xmax, which I think will not be possible
  even with "arbitrary" nargs() of sum).

2) sum(x) for logical and integer x  should return an integer in
   all cases where there is no overflow, including returning
   NA_integer_ in case of NAs.
   If there would be an overflow, it must be detected "in time"
   and the result should be double.

The big advantage of 2) is that it is back compatible in 99.x %
of use cases, and another advantage that it may be a very small
bit more efficient.  Also, in the case of "counting" (logical),
it is nice to get an integer instead of double when we can --
entirely analogously to the behavior of length() which returns
integer whenever possible.
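
Concretely, under 2) one would see (a sketch; this is the behavior that
was later implemented, see the follow-up below):

    typeof(sum(1:10))                   # "integer": no overflow, stays integer
    s <- sum(.Machine$integer.max, 1L)  # overflows the integer range, so ...
    typeof(s)                           # ... "double", with s == 2^31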

The advantage of 1) is uniformity.

We should (at least provisionally) decide between 1) and 2) and then go for 
that.
It could be that going for 1) may have bad
compatibility-consequences in package space, because indeed we
had documented that sum() would be integer for logical and integer arguments.

I currently don't really have time to
{work on implementing + dealing with the consequences}
for either ..

Martin

> On Fri, Jun 2, 2017 at 1:58 PM, Henrik Bengtsson
>  wrote:
>> I second this feature request (it's understandable that this and
>> possibly other parts of the code was left behind / forgotten after the
>> introduction of long vector).
>> 
>> I think mean() avoids full copies, so in the meanwhile, you can work
>> around this limitation using:
>> 
>> countTRUE <- function(x, na.rm = FALSE) {
>> nx <- length(x)
>> if (nx < .Machine$integer.max) return(sum(x, na.rm = na.rm))
>> nx * mean(x, na.rm = na.rm)
>> }
>> 
>> (not sure if one needs to worry about rounding errors, i.e. whether
>> nx * mean(x) is always an exact integer)
>> 
>> x <- rep(TRUE, times = .Machine$integer.max+1)
>> object.size(x)
>> ## 8589934632 bytes
>> 
>> p <- profmem::profmem( n <- countTRUE(x) )
>> str(n)
>> ## num 2.15e+09
>> print(n == .Machine$integer.max + 1)
>> ## [1] TRUE
>> 
>> print(p)
>> ## Rprofmem memory profiling of:
>> ## n <- countTRUE(x)
>> ##
>> ## Memory allocations:
>> ##  bytes calls
>> ## total 0
>> 
>> 
>> FYI / related: I've just updated matrixStats::sum2() to support
>> logicals (develop branch) and I'll also try to update
>> matrixStats::count() to count beyond .Machine$integer.max.
>> 
>> /Henrik
>> 
>> On Fri, Jun 2, 2017 at 4:05 AM, Hervé Pagès  wrote:
>>> Hi,
>>> 
>>> I have a long numeric vector 'xx' and I want to use sum() to count
>>> the number of elements that satisfy some criteria like non-zero
>>> values or values lower than a certain threshold etc...
>>> 
>>> The problem is: sum() returns an NA (with a warning) if the count
>>> is greater than 2^31. For example:
>>> 
>>> > xx <- runif(3e9)
>>> > sum(xx < 0.9)
>>> [1] NA
>>> Warning message:
>>> In sum(xx < 0.9) : integer overflow - use sum(as.numeric(.))
>>> 
>>> This already takes a long time and doing sum(as.numeric(.)) would
>>> take even longer and require allocation of 24Gb of memory just to
>>> store an intermediate numeric vector made of 0s and 1s. Plus, having
>>> to do sum(as.numeric(.)) every time I need to count things is not
>>> convenient and is easy to forget.
>>> 
>>> It seems t

Re: [Rd] withTimeout bug, it does not work properly with nlme anymore

2018-01-30 Thread Martin Maechler
>>>>> Ramiro Barrantes 
>>>>> on Mon, 27 Nov 2017 21:02:52 + writes:

> Hello, I was relying on withTimeout (from R.utils) to help
> me stop nlme when it "hangs".  However, recently this
> stopped working.  I am pasting a reproducible example
> below: withTimeout should stop nlme after 10 seconds but
> the code will generate data for which nlme does not
> converge (or takes too long) and withTimeout does not stop
> it.  I tried this both on a linux (64 bit, CentOS 7, R
> 3.4.1, nlme 3.1-131 R.util 2.6, and also with R 3.2.5) and
> mac (Sierra 10.13.1, R 3.4.2, same versions of nlme and
> R.utils).  It takes over R and I need to use brute-force
> to stop it.  As mentioned, this used to work and it is
> very helpful for the purposes of having a loop where nlme
> goes through many models.

> Thank you in advance for any help, Ramiro

Dear Ramiro,

as I thought you were reporting a bug about  R.utils'  withTimeout(),
I, and maybe others, have not reacted.

You've addressed this again in a non-public e-mail,
and indeed the underlying bug is really in nlme  which you do
mention implicitly.

I'm appending a version of your example that is not using R.utils
at all and reproducibly hangs for me with R 3.4.3, R 3.4.3
patched and R-devel (and almost surely earlier versions of R
which I did not check).

Indeed, the call to nlme() "stalls" / "hangs" / "freezes" R,
cannot be terminated in a regular way, and, like
you, I do need "brute force" to stop it, killing the R process
too.

As the maintainer of the 'nlme'  *is* R-core,
we are asked to fix this, at least making it interruptible.

Still I should not take time for that for the next couple of
weeks as I should fulfill several other day-job duties
instead, and so will not promise anything here.

Tested (minimal) patches are welcome!

Here's a version of your script slightly simplified which
exhibits the problem and shows the problem indeed does not
happen in nlminb() -- which I wrongly assumed for a while --
but indeed in nlme's call to own .C() code.

I am looking into fixing this (making it interruptible, or detecting
the infinite loop).
My guess is that it only happens in degenerate cases like here.
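
To see why no R-level timeout can help with such a hang, here is a
minimal sketch (base R only; setTimeLimit() is what R.utils'
withTimeout() builds on): time limits fire only at interrupt check
points, which the looping .C() code in nlme never reaches.

    slowR <- function() repeat NULL   # an R-level infinite loop: interruptible
    g <- function() {
        setTimeLimit(elapsed = 1, transient = TRUE)
        slowR()
    }
    try(g())  # errors after ~1 second: "reached elapsed time limit"
    ## wrapping the nlme() call below the same way changes nothing:
    ## its infinite loop sits inside compiled code.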

Martin Maechler
ETH Zurich


## From: Ramiro Barrantes 
## To: "r-devel@r-project.org" 
## Subject: [Rd] withTimeout bug, it does not work properly with nlme anymore
## Date: Mon, 27 Nov 2017 21:02:52 +

## Hello,

## I was relying on withTimeout (from R.utils) to help me stop nlme when it
## "hangs".  However, recently this stopped working.  I am pasting a
## reproducible example below: withTimeout should stop nlme after 10 seconds
## but the code will generate data for which nlme does not converge (or takes
## too long) and withTimeout does not stop it.  I tried this both on a linux
## (64 bit, CentOS 7, R 3.4.1, nlme 3.1-131 R.util 2.6, and also with R
## 3.2.5) and mac (Sierra 10.13.1, R 3.4.2, same versions of nlme and
## R.utils).  It takes over R and I need to use brute-force to stop it.  As
## mentioned, this used to work and it is very helpful for the purposes of
## having a loop where nlme goes through many models.

## Thank you in advance for any help,
## Ramiro

## ((Modifications by Martin Maechler)
dat <- data.frame(
    x = c(3.69,3.69,3.69,3.69,3.69,3.69,3.69,3.69,3.69,3.69,3.69,3.69,
          3,3,3,3,3,3,3,3,3,3,3,3,
          2.3,2.3,2.3,2.3,2.3,2.3,2.3,2.3,2.3,2.3,2.3,2.3,
          1.61,1.61,1.61,1.61,1.61,1.61,1.61,1.61,1.61,1.61,1.61,1.61,
          0.92,0.92,0.92,0.92,0.92,0.92,0.92,0.92,0.92,0.92,0.92,0.92,
          0.22,0.22,0.22,0.22,0.22,0.22,0.22,0.22,0.22,0.22,0.22,0.22,
          -0.47,-0.47,-0.47,-0.47,-0.47,-0.47,-0.47,-0.47,-0.47,-0.47,-0.47,-0.47,
          -1.86,-1.86,-1.86,-1.86,-1.86,-1.86,-1.86,-1.86,-1.86,-1.86,-1.86,-1.86),
    y = c(0.35,0.69,0.57,1.48,6.08,-0.34,0.53,1.66,0.02,4.4,8.42,3.3,
          2.32,-2.3,7.52,-2.12,3.41,-4.76,7.9,5.04,10.26,-1.42,7.85,-1.88,
          3.81,-2.59,4.32,5.7,1.18,-1.74,1.81,6.16,4.2,-0.39,1.55,-1.4,
          1.76,-4.14,-2.36,-0.24,4.8,-7.07,1.34,1.98,0.86,-3.96,-0.61,2.68,
          -1.65,-2.06,3.67,-0.19,2.33,3.78,2.16,0.35,-5.6,1.32,2.99,4.21,
          -0.9,4.32,-4.01,2.03,0.9,-0.74,-5.78,5.76,0.52,1.37,-0.9,-4.06,
          -0.49,-2.39,-2.67,-0.71,-0.4,2.55,0.97,1.96,8.13,-5.93,4.01,0.79,
          -5.61,0.29,4.92,-2.89,-3.24,-3.06,-0.23,0.71,0.75,4.6,1.35,-3.35),
    f.block = rep(1:4, 24),
    id = paste0("a", rep(c(2,1,3), each = 4)))
str(dat)
## 'data.frame':96 obs. of  4 variables:
##  $ x  : num  3.69 3.69 3.69 3.69 3.69 3.69 3.69 3.69 3.69 3.69 ...
##  $ y  : num  0.35 0.69 0.57 1.48 6.08 -0.34 0.53 1.66 0.02 4.4 ...
##  $ f.block: num  1 2 3 4 1 2 3 4 1 2 ...
##  $ id : Factor w/ 3 levels "a1","a2","a3": 2 2 2 2 1 1 1 1 3 3 ...

table(dat$id) # 32 x 3 -- indeed the 2 factors are perfectly balanced:
xtabs(~ id + f.block, data = dat)

Re: [Rd] as.list method for by Objects

2018-02-01 Thread Martin Maechler
> Michael Lawrence 
> on Tue, 30 Jan 2018 15:57:42 -0800 writes:

> I just meant that the minimal contract for as.list() appears to be that it
> returns a VECSXP. To the user, we might say that is.list() will always
> return TRUE.

Indeed. I also agree with Herv'e that the user level
documentation should rather mention  is.list(.) |--> TRUE  than
VECSXP, and interestingly for the experts among us,
the  is.list() primitive gives TRUE not only for  VECSXP  but
also for LISTSXP (the good ole' pairlists).
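
For instance:

    is.list(list(1))       # TRUE : typeof() is "list"     (VECSXP)
    is.list(pairlist(1))   # TRUE : typeof() is "pairlist" (LISTSXP)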

> I'm not sure we can expect consistency across methods
> beyond that, nor is it feasible at this point to match the
> semantics of the methods package. It deals in "class
> space" while as.list() deals in "typeof() space".

> Michael

Yes, and that *is* the extra complexity we have in R (inherited
from S, I'd say)  which ideally wasn't there and of course is
not there in much younger languages/systems such as julia.

And --- by the way let me preach, for the "class space" ---
do __never__ use

  if(class(obj) == "<className>")

in your code (I see this so often, shockingly to me ...) but rather use

  if(inherits(obj, "<className>"))

instead.
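
A two-line illustration of why (class names made up):

    x <- structure(list(), class = c("myFrame", "data.frame"))
    class(x) == "data.frame"   # FALSE TRUE : length 2, useless in an if()
    inherits(x, "data.frame")  # TRUE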

Martin



> On Tue, Jan 30, 2018 at 3:47 PM, Hervé Pagès  wrote:

>> On 01/30/2018 02:50 PM, Michael Lawrence wrote:
>> 
>>> by() does not always return a list. In Gabe's example, it returns an
>>> integer, thus it is coerced to a list. as.list() means that it should 
be a
>>> VECSXP, not necessarily with "list" in the class attribute.
>>> 
>> 
>> The documentation is not particularly clear about what as.list()
>> means for list derivatives. IMO clarifications should stick to
>> simple concepts and formulations like "is.list(x) is TRUE" or
>> "x is a list or a list derivative" rather than "x is a VECSXP".
>> Coercion is useful beyond the use case of implementing a .C entry
>> point and calling as.numeric/as.list/etc... on its arguments.
>> 
>> This is why I was hoping that we could maybe discuss the possibility
>> of making the as.list() contract less vague than just "as.list()
>> must return a list or a list derivative".
>> 
>> Again, I think that 2 things weight quite a lot in that discussion:
>> 1) as.list() returns an object of class "data.frame" on a
>> data.frame (strict coercion). If all what as.list() needed to
>> do was to return a VECSXP, then as.list.default() already does
>> this on a data.frame so why did someone bother adding an
>> as.list.data.frame method that does strict coercion?
>> 2) The S4 coercion system based on as() does strict coercion by
>> default.
>> 
>> H.
>> 
>> 
>>> Michael
>>> 
>>> 
>>> On Tue, Jan 30, 2018 at 2:41 PM, Hervé Pagès wrote:
>>> 
>>> Hi Gabe,
>>> 
>>> Interestingly the behavior of as.list() on by objects seem to
>>> depend on the object itself:
>>> 
>>> > b1 <- by(1:2, 1:2, identity)
>>> > class(as.list(b1))
>>> [1] "list"
>>> 
>>> > b2 <- by(warpbreaks[, 1:2], warpbreaks[,"tension"], summary)
>>> > class(as.list(b2))
>>> [1] "by"
>>> 
>>> This is with R 3.4.3 and R devel (2017-12-11 r73889).
>>> 
>>> H.
>>> 
>>> On 01/30/2018 02:33 PM, Gabriel Becker wrote:
>>> 
>>> Dario,
>>> 
>>> What version of R are you using. In my mildly old 3.4.0
>>> installation and in the version of Revel I have lying around
>>> (also mildly old...)  I don't see the behavior I think you are
>>> describing
>>> 
>>> > b = by(1:2, 1:2, identity)
>>> 
>>> > class(as.list(b))
>>> 
>>> [1] "list"
>>> 
>>> > sessionInfo()
>>> 
>>> R Under development (unstable) (2017-12-19 r73926)
>>> 
>>> Platform: x86_64-apple-darwin15.6.0 (64-bit)
>>> 
>>> Running under: OS X El Capitan 10.11.6
>>> 
>>> 
>>> Matrix products: default
>>> 
>>> BLAS:
>>> /Users/beckerg4/local/Rdevel/R.framework/Versions/3.5/Resources/lib/libRblas.dylib
>>> 
>>> LAPACK:
>>> /Users/beckerg4/local/Rdevel/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
>>> 
>>> 
>>> locale:
>>> 
>>> [1]
>>> en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>>> 
>>> 
>>> attached base packages:
>>> 
>>> [1] stats graphics  grDevices utils datasets
>>> methods   base
>>> 
>>> 
>>> loaded via a namespace (and not attached):
>>> 
>>> [1] compiler_3.5.0
>>> 
>>> >
>>> 
>>> 
>>> As for by not having a class definition, no S3 class has an
>>> explicit definition, so this is somewhat par for the course
>>> here...
>>> 
>>> did I misunderstand something?
>>> 
>>> 
>>> ~G
>>> 
>>> On Tue, Jan 30, 2018 at 2:24 PM, Hervé Pagès wrote:

Re: [Rd] as.list method for by Objects

2018-02-01 Thread Martin Maechler
> Michael Lawrence 
> on Tue, 30 Jan 2018 10:37:38 -0800 writes:

> I agree that it would make sense for the object to have c("by", "list") as
> its class attribute, since the object is known to behave as a list.

Well, but that (list behavior) applies to most non-simple S3
classed objects, say "data.frame", say "lm" to start with real basic ones.

The later part of the discussion, seems more relevant to me.
Adding "list" to the class attribute seems as wrong to me as
e.g. adding "double" to "Date" or "POSIXct" (and many more such cases).

For the present case, we should stay with focusing on  is.list()
being true after as.list() .. the same we would do with
as.numeric() and is.numeric().

Martin

> However, it may be too disruptive to make this change at this point.
> Hard to predict.

> Michael

> On Mon, Jan 29, 2018 at 5:00 PM, Dario Strbenac 

> wrote:

>> Good day,
>> 
>> I'd like to suggest the addition of an as.list method for a by object
>> that actually returns a list of class "list". This would make it safer
>> to do type-checking, because is.list also returns TRUE for a data.frame
>> variable and using class(result) == "list" is an alternative that only
>> returns TRUE for lists. It's also confusing initially that
>> 
>> > class(x)
>> [1] "by"
>> > is.list(x)
>> [1] TRUE
>> 
>> since there's no explicit class definition for "by" and no mention if it
>> has any superclasses.
>> 
>> --
>> Dario Strbenac
>> University of Sydney
>> Camperdown NSW 2050
>> Australia
>> 
>> __
>> R-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>> 
>> 

> [[alternative HTML version deleted]]

> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] sum() returns NA on a long *logical* vector when nb of TRUE values exceeds 2^31

2018-02-01 Thread Martin Maechler
>>>>> Hervé Pagès 
>>>>> on Tue, 30 Jan 2018 13:30:18 -0800 writes:

> Hi Martin, Henrik,
> Thanks for the follow up.

> @Martin: I vote for 2) without *any* hesitation :-)

> (and uniformity could be restored at some point in the
> future by having prod(), rowSums(), colSums(), and others
> align with the behavior of length() and sum())

As a matter of fact, I had procrastinated and worked at
implementing '2)' already a bit on the weekend and made it work
- more or less.  It needs a bit more work, and I had also been considering
replacing the numbers in the current overflow check

if (ii++ > 1000) {                                          \
    ii = 0;                                                 \
    if (s > 9000000000000000L || s < -9000000000000000L) {  \
        if(!updated) updated = TRUE;                        \
        *value = NA_INTEGER;                                \
        warningcall(call, _("integer overflow - use sum(as.numeric(.))")); \
        return updated;                                     \
    }                                                       \
}                                                           \

i.e. think of tweaking the '1000' and the '9000000000000000L',
but decided to leave these and add comments there about why, for
the moment.
They may look arbitrary, but are not at all: If you multiply
them (which looks correct, if we check the sum 's' only every 1000-th
time ...((still not sure they *are* correct))) you get  9*10^18
which is only slightly smaller than  2^63 - 1 which may be the
maximal "LONG_INT" integer we have.

So, in the end, at least for now, we do not quite go all the way
but overflow a bit earlier,... but do potentially gain a bit of
speed, notably with the ITERATE_BY_REGION(..) macros
(which I did not show above).

Will hopefully become available in R-devel real soon now.

Martin

> Cheers,
> H.


> On 01/27/2018 03:06 AM, Martin Maechler wrote:
>>>>>>> Henrik Bengtsson 
>>>>>>> on Thu, 25 Jan 2018 09:30:42 -0800 writes:
>> 
>> > Just following up on this old thread since matrixStats 0.53.0 is now
>> > out, which supports this use case:
>> 
>> >> x <- rep(TRUE, times = 2^31)
>> 
>> >> y <- sum(x)
>> >> y
>> > [1] NA
>> > Warning message:
>> > In sum(x) : integer overflow - use sum(as.numeric(.))
>> 
>> >> y <- matrixStats::sum2(x, mode = "double")
>> >> y
>> > [1] 2147483648
>> >> str(y)
>> > num 2.15e+09
>> 
>> > No coercion is taking place, so the memory overhead is zero:
>> 
>> >> profmem::profmem(y <- matrixStats::sum2(x, mode = "double"))
>> > Rprofmem memory profiling of:
>> > y <- matrixStats::sum2(x, mode = "double")
>> 
>> > Memory allocations:
>> > bytes calls
>> > total 0
>> 
>> > /Henrik
>> 
>> Thank you, Henrik, for the reminder.
>> 
>> Back in June, I had mentioned to Hervé and R-devel that
>> 'logical' should remain to be treated as 'integer' as in all
>> arithmetic in (S and) R. Hervé did mention the isum()
>> function in the C code which is relevant here .. which does have
>> a LONG INT counter already -- *but* if we consider that sum()
>> has '...' i.e. a conceptually arbitrary number of long vector
>> integer arguments that counter won't suffice even there.
>> 
>> Before talking about implementation / patch, I think we should
>> consider 2 possible goals of a change --- I agree the status quo
>> is not a real option
>> 
>> 1) sum(x) for logical and integer x  would return a double
>> in any case and overflow should not happen (unless for
>> the case where the result would be larger the
>> .Machine$double.max which I think will not be possible
>> even with "arbitrary" nargs() of sum.
>> 
>> 2) sum(x) for logical and integer x  should return an integer in
>> all cases there is no overflow, including returning
>> NA_integer_ in case of NAs.
>> If there would be an overflow it must be detected "in time"
>> and the result should be double.
>> 
>> The big advantage of 2) is that it is back compatible in 99

Re: [Rd] as.list method for by Objects

2018-02-01 Thread Martin Maechler
>>>>> Michael Lawrence 
>>>>> on Thu, 1 Feb 2018 06:12:20 -0800 writes:

> On Thu, Feb 1, 2018 at 1:21 AM, Martin Maechler 

> wrote:

>> >>>>> Michael Lawrence 
>> >>>>> on Tue, 30 Jan 2018 10:37:38 -0800 writes:
>> 
>> > I agree that it would make sense for the object to have c("by",
>> "list") as
>> > its class attribute, since the object is known to behave as a list.
>> 
>> Well, but that (list behavior) applies to most non-simple S3
>> classed objects, say "data.frame", say "lm" to start with real basic 
ones.
>> 
>> The later part of the discussion, seems more relevant to me.
>> Adding "list" to the class attribute seems as wrong to me as
>> e.g. adding "double" to "Date" or "POSIXct" (and many more such cases).
>> 
>> 
> There's a distinction though. Date and POSIXct should not really behave as
> double values (an implementation detail), but "by" is expected to behave 
as
> a list (when it is one).

yes, you are right  As I'm "never"(*) using by(), I'm glad
to leave this issue to you.

Martin

---
*) Never  [James Bond, 1983]

> For the present case, we should stay with focusing on  is.list()
>> being true after as.list() .. the same we would do with
>> as.numeric() and is.numeric().
>> 
>> Martin
>> 
>> > However, it may be too disruptive to make this change at this
>> point.
>> > Hard to predict.
>> 
>> > Michael
>> 
>> > On Mon, Jan 29, 2018 at 5:00 PM, Dario Strbenac <
>> dstr7...@uni.sydney.edu.au>
>> > wrote:
>> 
>> >> Good day,
>> >>
>> >> I'd like to suggest the addition of an as.list method for a by
>> object that
>> >> actually returns a list of class "list". This would make it safer
>> to do
>> >> type-checking, because is.list also returns TRUE for a data.frame
>> variable
>> >> and using class(result) == "list" is an alternative that only
>> returns TRUE
>> >> for lists. It's also confusing initially that
>> >>
>> >> > class(x)
>> >> [1] "by"
>> >> > is.list(x)
>> >> [1] TRUE
>> >>
>> >> since there's no explicit class definition for "by" and no mention
>> if it
>> >> has any superclasses.
>> >>
>> >> --
>> >> Dario Strbenac
>> >> University of Sydney
>> >> Camperdown NSW 2050
>> >> Australia
>> >>
>> >> __
>> >> R-devel@r-project.org mailing list
>> >> https://stat.ethz.ch/mailman/listinfo/r-devel
>> >>
>> >>
>> 
>> > [[alternative HTML version deleted]]
>> 
>> > __
>> > R-devel@r-project.org mailing list
>> > https://stat.ethz.ch/mailman/listinfo/r-devel
>> 
>> 

> [[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] as.list method for by Objects

2018-02-02 Thread Martin Maechler
>>>>> Henrik Bengtsson 
>>>>> on Thu, 1 Feb 2018 10:26:23 -0800 writes:

> On Thu, Feb 1, 2018 at 12:14 AM, Martin Maechler
>  wrote:
>>>>>>> Michael Lawrence 
>>>>>>> on Tue, 30 Jan 2018 15:57:42 -0800 writes:
>> 
>> > I just meant that the minimal contract for as.list() appears to be 
that it
>> > returns a VECSXP. To the user, we might say that is.list() will always
>> > return TRUE.
>> 
>> Indeed. I also agree with Herv'e that the user level
>> documentation should rather mention  is.list(.) |--> TRUE  than
>> VECSXP, and interestingly for the experts among us,
>> the  is.list() primitive gives not only TRUE for  VECSXP  but
>> also of LISTSXP (the good ole' pairlists).
>> 
>> > I'm not sure we can expect consistency across methods
>> > beyond that, nor is it feasible at this point to match the
>> > semantics of the methods package. It deals in "class
>> > space" while as.list() deals in "typeof() space".
>> 
>> > Michael
>> 
>> Yes, and that *is* the extra complexity we have in R (inherited
>> from S, I'd say)  which ideally wasn't there and of course is
>> not there in much younger languages/systems such as julia.
>> 
>> And --- by the way let me preach, for the "class space" ---
>> do __never__ use
>> 
>> if(class(obj) == "<className>")
>> 
>> in your code (I see this so often, shockingly to me ...) but rather use
>> 
>> if(inherits(obj, "<className>"))
>> 
>> instead.

> Second this one.  But, soon (*) the former will at least give the
> correct answer when length(class(obj)) == 1 
> and produce an error
> otherwise.

Not quite; I think you did not get the real danger in using
'class(.) == *':
What you say above would only be true if there were only S3 classes!
Try the following small R snippet

myDate <- setClass("myDate", contains = "Date")
(d <- myDate(Sys.Date()))
## Object of class "myDate"
## [1] "2018-02-02"
class(d) == "Date"  # is FALSE (hence of length 1)
inherits(d, "Date") # is TRUE

> So, several of these cases will be caught at run-time in a
> near future.

Maybe.  But all the others are  still wrong, as I show above.
Martin

> (*) When _R_CHECK_LENGTH_1_CONDITION_=true becomes the default
> behavior - hopefully by R 3.5.0.

>> 
>> Martin

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] sum() returns NA on a long *logical* vector when nb of TRUE values exceeds 2^31

2018-02-05 Thread Martin Maechler
>>>>> Martin Maechler 
>>>>> on Thu, 1 Feb 2018 16:34:04 +0100 writes:

> >>>>> Hervé Pagès 
> >>>>> on Tue, 30 Jan 2018 13:30:18 -0800 writes:
> 
> > Hi Martin, Henrik,
> > Thanks for the follow up.
> 
> > @Martin: I vote for 2) without *any* hesitation :-)
> 
> > (and uniformity could be restored at some point in the
> > future by having prod(), rowSums(), colSums(), and others
> > align with the behavior of length() and sum())
> 
> As a matter of fact, I had procrastinated and worked at
> implementing '2)' already a bit on the weekend and made it work
> - more or less.  It needs a bit more work, and I had also been considering
> replacing the numbers in the current overflow check
> 
>   if (ii++ > 1000) {                                          \
>       ii = 0;                                                 \
>       if (s > 9000000000000000L || s < -9000000000000000L) {  \
>           if(!updated) updated = TRUE;                        \
>           *value = NA_INTEGER;                                \
>           warningcall(call, _("integer overflow - use sum(as.numeric(.))")); \
>           return updated;                                     \
>       }                                                       \
>   }                                                           \
> 
> i.e. think of tweaking the '1000' and the '9000000000000000L',
> but decided to leave these and add comments there about why, for
> the moment.
> They may look arbitrary, but are not at all: If you multiply
> them (which looks correct, if we check the sum 's' only every 1000-th
> time ...((still not sure they *are* correct))) you get  9*10^18
> which is only slightly smaller than  2^63 - 1 which may be the
> maximal "LONG_INT" integer we have.
> 
> So, in the end, at least for now, we do not quite go all the way
> but overflow a bit earlier,... but do potentially gain a bit of
> speed, notably with the ITERATE_BY_REGION(..) macros
> (which I did not show above).
> 
> Will hopefully become available in R-devel real soon now.
>
> Martin

After finishing that... I challenged myself that one should be able to do
better, namely "no overflow" (because of large/many
integer/logical), and so introduced  irsum()  which uses a double 
precision accumulator for integer/logical  ... but would really
only be used when the 64-bit int accumulator would get close to
overflow.
The resulting code is not really beautiful, and also contains a
comment " (a waste, rare; FIXME ?) ".
If anybody feels like finding a more elegant version without the
"waste" case, go ahead and be our guest ! 

Testing the code does need access to a platform with enough GB
RAM, say 32 (and I have run the checks only on servers with >
100 GB RAM). This concerns the new checks at the (current) end
of /tests/reg-large.R

In R-devel svn rev >= 74208  for a few minutes now.

Martin

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Duplicate column names created by base::merge() when by.x has the same name as a column in y

2018-02-22 Thread Martin Maechler
> Gabriel Becker 
> on Wed, 21 Feb 2018 07:11:44 -0800 writes:

> Hi all,
> For the record this approach isn't 100% backwards compatible, because
> names(mergeddf) will be incompatibly different. That's why I claimed
> backwards compatible-ish

exactly.

> That said it's still worth considering imho because of the reasons stated
> (and honestly one particular simple reading of the docs might suggest that
> this was the intended behavior all along). I'm not a member of R-core though,
> so I can't do said considering myself.

I agree with Scott, Frederik and you that this change seems
worth considering.
As Duncan Murdoch has mentioned, this alone may not be
sufficient.

In addition to your proposed patch (which I have simplified, not
using intersect() but working with underlying  match()
directly), it is little work to introduce an extra argument, I'm
calling  'no.dups = TRUE'  which when set to false would mirror
current R's behavior... and documenting it, then also documents the
new behavior (to some extent).
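
A small reproduction of the problem (column names made up for
illustration):

    x <- data.frame(id = 1:2, v = 3:4)
    y <- data.frame(key = 1:2, id = 5:6)
    names(merge(x, y, by.x = "id", by.y = "key"))
    ## [1] "id" "v"  "id"  : the second "id" (from y) is unreachable by name;
    ## with the patch (and no.dups = TRUE) it becomes "id.y".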

My plan is to commit it soonish ;-)
Martin

> Best,
> ~G

> On Feb 20, 2018 7:15 PM,  wrote:

> Hi Scott,

> I tried the new patch and can confirm that it has the advertised
> behavior on a couple of test cases. I think it makes sense to apply
> it, because any existing code which refers to a second duplicate
> data.frame column by name is already broken, while if the reference is
> by numerical index then changing the column name shouldn't break it.

> I don't know if you need to update the documentation as part of your
> patch, or if whoever applies it would be happy to do that. Somebody
> from R core want to weigh in on this?

> I attach a file with the test example from your original email as well
> as a second test case I added with two "by" columns.

> Thanks,

> Frederick

> On Wed, Feb 21, 2018 at 10:06:21AM +1100, Scott Ritchie wrote:
>> Hi Frederick,
>> 
>> It looks like I didn't overwrite the patch.diff file after the last 
edits.
>> Here's the correct patch (attached and copied below):
>> 
>> Index: src/library/base/R/merge.R
>> ===
>> --- src/library/base/R/merge.R (revision 74280)
>> +++ src/library/base/R/merge.R (working copy)
>> @@ -157,6 +157,14 @@
>> }
>> 
>> if(has.common.nms) names(y) <- nm.y
>> +## If by.x %in% names(y) then duplicate column names still arise,
>> +## apply suffixes to just y - this keeps backwards compatibility
>> +## when referring to by.x in the resulting data.frame
>> +dupe.keyx <- intersect(nm.by, names(y))
>> +if(length(dupe.keyx)) {
>> +  if(nzchar(suffixes[2L]))
>> +    names(y)[match(dupe.keyx, names(y), 0L)] <- paste(dupe.keyx, suffixes[2L], sep="")
>> +}
>> nm <- c(names(x), names(y))
>> if(any(d <- duplicated(nm)))
>> if(sum(d) > 1L)
>> 
>> Best,
>> 
>> Scott
>> 
>> On 21 February 2018 at 08:23,  wrote:
>> 
>> > Hi Scott,
>> >
>> > I think that's a good idea and I tried your patch on my copy of the
>> > repository. But it looks to me like the recent patch is identical to
>> > the previous one, can you confirm this?
>> >
>> > Frederick
>> >
>> > On Mon, Feb 19, 2018 at 07:19:32AM +1100, Scott Ritchie wrote:
>> > > Thanks Gabriel,
>> > >
>> > > I think your suggested approach is 100% backwards compatible
>> > >
>> > > Currently in the case of duplicate column names only the first can be
>> > > indexed by its name. This will always be the column appearing in 
by.x,
>> > > meaning the column in y with the same name cannot be accessed.
> Appending
>> > > ".y" (suffixes[2L]) to this column means it can now be accessed, 
while
>> > > keeping the current behaviour of making the key columns always
> accessible
>> > > by using the names provided to by.x.
>> > >
>> > > I've attached a new patch that has this behaviour.
>> > >
>> > > Best,
>> > >
>> > > Scott
>> > >
>> > > On 19 February 2018 at 05:08, Gabriel Becker 
>> > wrote:
>> > >
>> > > > It seems like there is a way that is backwards compatible-ish in 
the
>> > sense
>> > > > mentioned and still has the (arguably, but a good argument I think)
>> > better
>> > > > behavior:
>> > > >
>> > > > if by.x is 'name', (AND by.y is not also 'name'), then x's 'name'
>> > column
>> > > > is called name and y's 'name' column (not used int he merge) is
>> > changed to
>> > > > name.y.
>> > > >
>> > > > Now of course this would still change output, but it would change
> it to
>> > > > something I think would be better, while retaining the 'merge
> columns
>> > > > reta

Re: [Rd] Duplicate column names created by base::merge() when by.x has the same name as a column in y

2018-02-23 Thread Martin Maechler
>>>>> Scott Ritchie 
>>>>> on Fri, 23 Feb 2018 12:32:41 +1100 writes:

> Thanks Martin!
> Can you clarify the functionality of the 'no.dups' argument so I can 
change
> my patch to `data.table:::merge.data.table` accordingly?

> - When `no.dups=TRUE`, will the suffix be added to the by.x column name? Or
>   will it behave like the second variant, where only the column in y has
>   the suffix added?
> - When `no.dups=FALSE`, will the output be the same as it is currently (no
>   suffix added to either column)? Or will the suffix be added to the column
>   in y?

I had started from your patch... and worked from there.
So, there's no need (and use) to provide another one.

I also needed to update the man page, add a regression test, add
an entry to NEWS.Rd ...

Just wait until I commit..
Martin




> Best,

> Scott

> On 22 February 2018 at 22:31, Martin Maechler 
> wrote:

>> >>>>> Gabriel Becker 
>> >>>>> on Wed, 21 Feb 2018 07:11:44 -0800 writes:
>> 
>> > Hi all,
>> > For the record this approach isn't 100% backwards compatible, because
>> > names(mergeddf) will be incompatibly different. That's why I claimed
>> > backwards compatible-ish
>> 
>> exactly.
>> 
>> > That said it's still worth considering imho because of the reasons
>> > stated (and honestly one particular simple reading of the docs might
>> > suggest that this was the intended behavior all along). I'm not a
>> > member of R-core though, so I can't do said considering myself.
>> 
>> I agree with Scott, Frederik and you that this change seems
>> worth considering.
>> As Duncan Murdoch has mentioned, this alone may not be
>> sufficient.
>> 
>> In addition to your proposed patch (which I have simplified, not
>> using intersect() but working with underlying  match()
>> directly), it is little work to introduce an extra argument, I'm
>> calling  'no.dups = TRUE'  which when set to false would mirror
>> current R's behavior... and documenting it, then also documents the
>> new behavior (to some extent).
>> 
>> My plan is to commit it soonish ;-)
>> Martin
>> 
>> > Best,
>> > ~G
>> 
>> > On Feb 20, 2018 7:15 PM,  wrote:
>> 
>> > Hi Scott,
>> 
>> > I tried the new patch and can confirm that it has the advertised
>> > behavior on a couple of test cases. I think it makes sense to apply
>> > it, because any existing code which refers to a second duplicate
>> > data.frame column by name is already broken, while if the reference
>> is
>> > by numerical index then changing the column name shouldn't break it.
>> 
>> > I don't know if you need to update the documentation as part of your
>> > patch, or if whoever applies it would be happy to do that. Somebody
>> > from R core want to weigh in on this?
>> 
>> > I attach a file with the test example from your original email as
>> well
>> > as a second test case I added with two "by" columns.
>> 
>> > Thanks,
>> 
>> > Frederick
>> 
>> > On Wed, Feb 21, 2018 at 10:06:21AM +1100, Scott Ritchie wrote:
>> >> Hi Frederick,
>> >>
>> >> It looks like I didn't overwrite the patch.diff file after the last
>> edits.
>> >> Here's the correct patch (attached and copied below):
>> >>
>> >> Index: src/library/base/R/merge.R
>> >> ===
>> >> --- src/library/base/R/merge.R (revision 74280)
>> >> +++ src/library/base/R/merge.R (working copy)
>> >> @@ -157,6 +157,14 @@
>> >> }
>> >>
>> >> if(has.common.nms) names(y) <- nm.y
>> >> +## If by.x %in% names(y) then duplicate column names still arise,
>> >> +## apply suffixes to just y - this keeps backwards compatibility
>> >> +## when referring to by.x in the resulting data.frame
>> >> +dupe.keyx <- intersect(nm.by, names(y))
>> >> +if(length(dupe.keyx)) {
>> >> +  

Re: [Rd] Problem with R_registerRoutines

2018-02-23 Thread Martin Maechler
>   
> on Fri, 23 Feb 2018 15:43:43 + writes:

> Thanks a lot for your answer Jeroen!
> I should have mentioned that I had actually only checked with the 
win-builder, as I did not have R-devel installed on my computer.
> But based on your answer I installed R-devel locally on a Linux-server 
(Redhat), and the package could be checked without the NOTE. So you might be 
right that this is a windows issue. However, another package that I am 
maintaining does not get any notes from the check on the win-builder (including 
fortran-code), so there is still something I don't understand here.

> Anyway, does this mean that the package might be accepted on CRAN without 
further changes?

at least not automatically, and not very probably in my gut
feeling... but I may be wrong.

Did you use  R CMD check --as-cran  
on Linux ?

Martin

> Thanks,
> Jon


> 
> From: Jeroen Ooms [jeroeno...@gmail.com]
> Sent: 23 February 2018 13:36
> To: SKOIEN Jon (JRC-ISPRA)
> Cc: r-devel
> Subject: Re: [Rd] Problem with R_registerRoutines

> On Windows this warning may be a false positive if R cannot find
> "objdump.exe" which is required for this check. I think this is
> actually a bug in R because it should be looking for "objdump.exe"
> inside BINPREF (where gcc is) rather than on the PATH.

> Can you check if you get the same warning if you upload the package to
> https://win-builder.r-project.org ?






> On Fri, Feb 23, 2018 at 10:28 AM,   wrote:
>> Dear list,
>> 
>> I am trying to update a package to pass the CRAN-checks.
>> But I am struggling with the following note:
>> 
>> File 'psgp/libs/i386/psgp.dll':
>> Found no calls to: 'R_registerRoutines', 'R_useDynamicSymbols'
>> File 'psgp/libs/x64/psgp.dll':
>> Found no calls to: 'R_registerRoutines', 'R_useDynamicSymbols'
>> 
>> It is good practice to register native routines and to disable symbol
>> search.
>> 
>> 
>> I did already run:
>> tools::package_native_routine_registration_skeleton(".")
>> This gave me some code, including a function R_init_psgp, which includes 
calls to the functions above, and also the names of the C++ functions to be 
called from R.
>> I first saved this code in registerDynamicSymbol.c and added 
.registration = TRUE to useDynLib in the NAMESPACE file.
>> I still get the error above. As I saw that the file has different names 
in other packages, I have also tried to save it psgp_init.c, and in init.cpp, 
still with the same error message.
>> 
>> I have read the relevant part of the R extensions manual, but could not 
find anything that could help me with this problem.
>> I have had a look at the similar files in other packages (including one 
of my own, which works), and the initialization seems fine to me.
>> There is surely something I have overlooked, is anyone able to give me a 
hint to where I might look? The code is in C++, not sure if that could have 
anything to do with the problem?
>> 
>> Thanks,
>> Jon
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> __
>> R-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel

> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] scale.default gives an incorrect error message when is.numeric() fails on a dgeMatrix

2018-03-01 Thread Martin Maechler
>>>>> Michael Chirico 
>>>>> on Tue, 27 Feb 2018 20:18:34 +0800 writes:

Slightly amended 'Subject': (unimportant mistake: a dgeMatrix is *not* sparse)

MM: modified to commented R code,  slightly changed from your post:


## I am attempting to use the lars package with a sparse input feature matrix,
## but the following fails:

library(Matrix)
library(lars)
data(diabetes) # from 'lars'
##UAagghh! not like this -- both attach() *and*   as.data.frame()  are horrific!
##UA  attach(diabetes)
##UA  x = as(as.matrix(as.data.frame(x)), 'dgCMatrix')
x <- as(unclass(diabetes$x), "dgCMatrix")
lars(x, y, intercept = FALSE)
## Error in scale.default(x, FALSE, normx) :
##   length of 'scale' must equal the number of columns of 'x'

## More specifically, scale.default fails as called from lars():
normx <- new("dgeMatrix",
  x = c(4, 0, 9, 1, 1, -1, 4, -2, 6, 6)*1e-14, Dim = c(1L, 10L),
  Dimnames = list(NULL,
  c("x.age", "x.sex", "x.bmi", "x.map", "x.tc",
"x.ldl", "x.hdl", "x.tch", "x.ltg", "x.glu")))
scale.default(x, center=FALSE, scale = normx)
## Error in scale.default(x, center = FALSE, scale = normx) :
##   length of 'scale' must equal the number of columns of 'x'

>  The problem is that this check fails because is.numeric(normx) is FALSE:

>  if (is.numeric(scale) && length(scale) == nc)

>  So, the error message is misleading. In fact length(scale) is the same as
>  nc.

Correct, twice.
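
The crux, restated with the 'normx' from above:

    is.numeric(normx)   # FALSE : a 1 x 10 "dgeMatrix" is not a numeric vector
    length(normx)       # 10    : equal to ncol(x), so the message is misleading
    ## is.numeric(as.numeric(normx)) is TRUE, which is why the
    ## scale = as.numeric(normx) call below works.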

>  At a minimum, the error message needs to be repaired; do we also want to
>  attempt as.numeric(normx) (which I believe would have allowed scale to work
>  in this case)?

It seems sensible to allow  both 'center' and 'scale' to only
have to *obey*  as.numeric(.)  rather than fulfill is.numeric(.).

Though that is not a bug in scale()  as its help page has always
said that 'center' and 'scale' should either be a logical value
or a numeric vector.

For that reason I can really claim a bug in 'lars' which should
really not use

   scale(x, FALSE, normx)

but rather

   scale(x, FALSE, scale = as.numeric(normx))

and then all would work.

> -

>  (I'm aware that there are some import issues in lars, as the offending line
>  to create normx *should* work, as is.numeric(sqrt(drop(rep(1, nrow(x)) %*%
>  (x^2)))) is TRUE -- it's simply that lars doesn't import the appropriate S4
>  methods)

>  Michael Chirico

Yes, 'lars' has _not_ been updated since  Spring 2013, notably
because its authors have been saying (for rather more than 5
years I think) that one should really use 

 require("glmnet")

instead.

Your point is still valid that it would be easy to enhance
base :: scale.default()  so it'd work in more cases.

Thank you for that.  I do plan to consider such a change in
R-devel (planned to become R 3.5.0 in April).

Martin Maechler,
ETH Zurich

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Unclosed parenthesis in grep.Rd

2018-03-05 Thread Martin Maechler
> Hugh Parsonage 
> on Mon, 5 Mar 2018 13:39:24 +1100 writes:

> Lines 129-131: \code{grep(value = FALSE)} returns a vector
> of the indices of the elements of \code{x} that yielded a
> match (or not, for \code{invert = TRUE}. This will be an
> integer vector unless the input

> There should be a closing parenthesis after \code{invert =
> TRUE}

Thank you, Hugh!  I've added the ')' now.
Martin

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

