Re: [Rd] sweep sanity checking?

2007-08-22 Thread Turner, Heather
Petr Savicky kindly brought this thread to my attention as I'm afraid it
had passed me by. As one of the contributors to the earlier discussion
on adding warnings to sweep I would like to give my support to Petr's
proposed patch.

For the record I should say that Petr was right to point out that the
use of MARGIN in my examples did not make sense
https://stat.ethz.ch/pipermail/r-devel/2007-July/046487.html
so I have no quibble with that.

I think it is sensible too, to use the dim attribute of STATS as the
basis of the test, when the dim attribute is present. This provides a
way to control the strength of the test in the case of sweeping out a
vector, as Petr describes in his message below. I think that the
proposed patch successfully brings together the different views on what
should be tested, which was the stumbling block last time around
https://stat.ethz.ch/pipermail/r-help/2005-June/074037.html

Even if people set check.margin = FALSE for reasons of speed, this in
itself should be a useful check, since they will need to be confident
that the test is unnecessary.

Heather

-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Petr Savicky
Sent: 08 August 2007 07:54
To: r-devel@r-project.org
Subject: Re: [Rd] sweep sanity checking?

Thanks to Martin Maechler for his comments, advice and for pointing
out the speed problem. Thanks also to Ben Bolker for tests of speed,
which confirm that for small arrays, a slowdown by a factor of about
1.2 - 1.5 may occur. Now, I would like to present a new version of sweep,
which is simpler and has an option to avoid the test. This is expected
to be used in scripts, where the programmer is quite sure that the
usage is correct and speed is required. The new version differs from
the previous one in the following:

1. The option check.margin has a different meaning. It defaults to TRUE
   and it determines whether the test is performed or not.

2. Since check.margin has the meaning above, it cannot be used
   to select which test should be performed. This depends on the
   type of STATS. The suggested sweep function contains two tests:
   - a vector test by Heather Turner, which is used if STATS
     has no dim attribute and, hence, is a vector (STATS should
     not be anything other than a vector or an array)
   - an array test, used if STATS has a dim attribute.
   The vector test allows some kinds of recycling, while the array test
   does not. Hence, in the most common case, where x is a matrix
   and STATS is a vector, if the user would like to be warned whenever
   the length of the vector is not exactly the right one, the following
   call is suggested: sweep(x,MARGIN,as.array(STATS)). Otherwise, a warning
   will be generated only if length(STATS) does not divide the specified
   dimension of x, which is nrow(x) (MARGIN=1) or ncol(x) (MARGIN=2).

3. If STATS is an array, then the test is more restrictive than in
   the previous version. It is now required that, after deleting
   dimensions of extent one, the remaining dimensions coincide.
   The previous version additionally allowed the cases where dim(STATS)
   is a prefix of dim(x)[MARGIN], for example, if dim(STATS) = k1 and
   dim(x)[MARGIN] = c(k1,k2).
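
The three points above can be illustrated with a short sketch. It assumes
an R version whose sweep() already has the check.margin argument with the
behaviour described in this message; the helper warns() is hypothetical
and exists only to report whether a call raised a warning.

```r
x <- matrix(1:12, nrow = 4)            # dim(x) is c(4, 3)

# Hypothetical helper for illustration: TRUE if evaluating expr warns.
warns <- function(expr) {
    tryCatch({ expr; FALSE }, warning = function(w) TRUE)
}

# Vector test (STATS has no dim attribute): recycling is tolerated
# when length(STATS) divides nrow(x).
warns(sweep(x, 1, c(1, 2)))            # 2 divides 4: no warning expected
warns(sweep(x, 1, 1:3))                # 3 does not divide 4: warning expected

# Array test (STATS has a dim attribute): an exact match is required.
warns(sweep(x, 1, as.array(c(1, 2))))  # dim 2 vs 4: warning expected
warns(sweep(x, 1, as.array(1:4)))      # dim 4 vs 4: no warning expected

# Dimensions of extent one are dropped before the comparison.
warns(sweep(x, 1, array(1:4, c(4, 1))))

# check.margin = FALSE skips the test altogether.
warns(sweep(x, 1, 1:3, check.margin = FALSE))
```

Wrapping STATS in as.array() is thus the way to ask for the strict test
when STATS is a plain vector.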

The code of the tests in the suggested sweep is based on the previous
suggestions
 https://stat.ethz.ch/pipermail/r-help/2005-June/073989.html by Robin Hankin
 https://stat.ethz.ch/pipermail/r-help/2005-June/074001.html by Heather Turner
 https://stat.ethz.ch/pipermail/r-devel/2007-June/046217.html by Ben Bolker
with some further modifications.

The modification of sweep.Rd was prepared by Ben Bolker and me.

I would like to encourage everybody who wishes to express an opinion
on the patch to do so now. In my opinion, the suggested new code has
stabilized, in the sense that I will not modify it unless
there is negative feedback.

A patch against R-devel_2007-08-06 is attached. It contains tabs. If they
are corrupted by email transfer, use the link
  http://www.cs.cas.cz/~savicky/R-devel/patch-sweep
which is an identical copy.

Petr Savicky.



--- R-devel_2007-08-06/src/library/base/R/sweep.R   2007-07-27 17:51:13.0 +0200
+++ R-devel_2007-08-06-sweep/src/library/base/R/sweep.R 2007-08-07 10:30:12.383672960 +0200
@@ -14,10 +14,29 @@
 #  A copy of the GNU General Public License is available at
 #  http://www.r-project.org/Licenses/
 
-sweep <- function(x, MARGIN, STATS, FUN = "-", ...)
+sweep <- function(x, MARGIN, STATS, FUN = "-", check.margin=TRUE, ...)
 {
 FUN <- match.fun(FUN)
 dims <- dim(x)
+   if (check.margin) {
+   dimmargin <- dims[MARGIN]
+   dimstats <- dim(STATS)
+   lstats <- length(STATS)
+   if (lstats > prod(dimmargin)) {
+   warning("length of STATS greater than the extent of dim(x)[MARGIN]")
+   } else if (is.null(d

[Rd] paste() with NAs .. change worth pursuing?

2007-08-22 Thread Martin Maechler

Consider this example code

 c1 <- letters[1:7]; c2 <- LETTERS[1:7]
 c1[2] <- c2[3:4] <- NA
 rbind(c1,c2)

  ##   [,1] [,2] [,3] [,4] [,5] [,6] [,7]
  ## c1 "a"  NA   "c"  "d"  "e"  "f"  "g" 
  ## c2 "A"  "B"  NA   NA   "E"  "F"  "G" 

  paste(c1,c2)

  ## -> [1] "a A"  "NA B" "c NA" "d NA" "e E"  "f F"  "g G" 

where a more logical result would have entries 2:4 equal to
  NA 
i.e.,  as.character(NA)
aka    NA_character_

Is this worth pursuing, or does anyone see why not?
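
A minimal sketch of the proposed behaviour, written as a wrapper so that it
can be tried without changing paste() itself. The name paste_na is
hypothetical and not part of base R; the wrapper handles only the sep
argument and assumes that no argument has length zero.

```r
# Hypothetical wrapper: like paste(), but any position where at least
# one argument is NA yields NA_character_ instead of a string
# containing "NA".
paste_na <- function(..., sep = " ") {
    args <- list(...)
    res <- paste(..., sep = sep)
    n <- length(res)
    # Recycle each argument's NA mask to the result length, then combine.
    any_na <- Reduce(`|`, lapply(args, function(a) rep_len(is.na(a), n)))
    res[any_na] <- NA_character_
    res
}

c1 <- letters[1:7]; c2 <- LETTERS[1:7]
c1[2] <- c2[3:4] <- NA
paste_na(c1, c2)
## -> [1] "a A" NA    NA    NA    "e E" "f F" "g G"
```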

Regards,
Martin

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] paste() with NAs .. change worth pursuing?

2007-08-22 Thread Duncan Murdoch
On 8/22/2007 11:50 AM, Martin Maechler wrote:
> Consider this example code
> 
>  c1 <- letters[1:7]; c2 <- LETTERS[1:7]
>  c1[2] <- c2[3:4] <- NA
>  rbind(c1,c2)
> 
>   ##   [,1] [,2] [,3] [,4] [,5] [,6] [,7]
>   ## c1 "a"  NA   "c"  "d"  "e"  "f"  "g" 
>   ## c2 "A"  "B"  NA   NA   "E"  "F"  "G" 
> 
>   paste(c1,c2)
> 
>   ## -> [1] "a A"  "NA B" "c NA" "d NA" "e E"  "f F"  "g G" 
> 
> where a more logical result would have entries 2:4 equal to
>   NA 
> i.e.,  as.character(NA)
> aka    NA_character_
> 
> Is this worth pursuing, or does anyone see why not?

A fairly common use of paste is to put together reports for human 
consumption.  Currently we have

 > p <- as.character(NA)
 > paste("the value of p is", p)
[1] "the value of p is NA"

which looks reasonable. Would this become

 > p <- as.character(NA)
 > paste("the value of p is", p)
[1] NA

under your proposal?  (In a quick search I was unable to find a real 
example where this would happen, but it would worry me...)
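
If paste() did propagate NA, report code of this kind could keep its
current output by converting NA to the string "NA" before pasting. A
minimal sketch; the variable names p_txt and msg are used only for
illustration:

```r
p <- as.character(NA)
# Replace a missing value by the literal string "NA" before pasting,
# so the result does not depend on how paste() treats NA.
p_txt <- ifelse(is.na(p), "NA", p)
msg <- paste("the value of p is", p_txt)
msg
## [1] "the value of p is NA"
```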

Duncan Murdoch



Re: [Rd] compiling R under cygwin

2007-08-22 Thread Denham Robert
>> For various reasons,
> I think it is only courteous to mention some good reasons if you want
> to take up people's time.

Some of the reasons we would like a cygwin version aren't necessarily
good reasons.  We have been using cygwin for some time, mostly to deal
with scripting in a combined windows/unix environment.  We have a setup
which allows windows users to run many scripts in the same way as unix
users.  These scripts are often python or shell scripts.  We have R
installed on the unix machines, and the system administrators would like
to be able to have R on windows in the same environment.  This setup
also means that the administrator can fairly easily maintain the version
of software used on all users' machines.  Probably this could all be
managed and still use the native windows version of R, but the
administrator is familiar with cygwin and they could manage this
software in the same way they manage other packages.

We would like to be able to use linux machines on PCs but unfortunately
we have restrictions imposed on us that prevent this.  This restriction
also goes as far as the use of virtual machines.  My personal preference
would be to run linux on my work pc, and use a virtual machine to run
windows software, such as ArcGIS and Imagine, that are not available for
linux.  This does not seem to be an option for us.

One thing I was interested in was knowing if there are others who also
would like a cygwin version.  From the replies to my post, and from a
search of the mailing list archive, I think that there is little demand
for this.  We would, however, be prepared to help in some way for the
few people who are interested.   



Robert Denham
Environmental Statistician
Remote Sensing Centre
Telephone 07 3896 9899 
www.nrw.qld.gov.au
 
Department of Natural Resources & Water
QScape Building, 80 Meiers Road, Indooroopilly Qld 4068

-----Original Message-----
From: Prof Brian Ripley [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, 21 August 2007 9:53 PM
To: Duncan Murdoch
Cc: Denham Robert; r-devel@r-project.org
Subject: Re: [Rd] compiling R under cygwin

Yes,

> What is the advantage of building this?

was my question too.  If you want a Unix-like version of R on PC
hardware running Windows why not run a Unix-like OS under a virtual
machine?

Quite a lot of the details are wrong: using FLIBS, BLAS_LIBS and LIBS as
intended will solve most of the problems.  I would use --disable-nls
--disable-mbcs as you don't need them (and in particular you don't
benefit from MBCS support on Windows unless you are in a CJK locale).

Note that 2.5.1 is released and there is unlikely to be a 2.5.2, so any
changes would be made only to R-devel.  If there is a convincing case to
tailor a build for Cygwin there we can probably do so rather easily, but
the need for ongoing support would be a worry.

(If platforms are not used and in particular not tested in the
alpha/beta testing phases then the ability to build on them crumbles
away.  We seem to be down to regular testers on Linux, Windows, MacOS
X, Solaris and FreeBSD, with some help on AIX after a patch with none.)

On Tue, 21 Aug 2007, Duncan Murdoch wrote:

> Denham Robert wrote:
>> For various reasons,

I think it is only courteous to mention some good reasons if you want to
take up people's time.

>> it suits our workplace to have a cygwin version of R.  I am pretty 
>> sure that cygwin is still not a supported environment for R, but we 
>> have managed to compile R-2.5.1 under cygwin without too many dramas.
>> Our procedure is described below.  We still have a few problems 
>> compiling libraries without manually changing files from .so to .dll,
>> but it seems ok.
>>
> I would expect other subtle problems as well, because Cygwin is not a 
> normal Unix.  I don't know whether any of these differences matter to 
> R, but some things to look out for are:
>
> - you can't unlink a file while it is open
> - filenames are not case sensitive
> - file permissions have strange defaults (everything is executable)
> - I think the executable format still needs to be Windows format
> - There's no such thing as a pty
> - You'll probably need X11 for graphics, and will lose support for 
> Windows metafile output (wmf)
>>
>> I was wondering whether this information is likely to be useful to 
>> others, and if we should spend any time looking in to ways in which 
>> the configure/build/install code could be modified to allow a 
>> standard install.
>>
> What is the advantage of building this?  I don't think we want to 
> support platforms just for the sake of supporting more platforms, but 
> if there's a real need for it, that would be different.
>
> Duncan Murdoch
>>
>> Notes on building R under cygwin:
>>
>> export FFLAGS=-O3
>> export CFLAGS=-O3
>> export CXXFLAGS=-O3
>> export OBJCFLAGS=-O3
>> export FCFLAGS=-O3
>> export LDFLAGS='-lblas -lg2c -lintl'
>>
>> export R_OSTYPE=unix
>>
>> ./configure --prefix=/opt/freeware/R/R-2.5.1 \ 
>> --with-tcl-config=/usr/lib/tclConfi

Re: [Rd] compiling R under cygwin

2007-08-22 Thread Prof Brian Ripley
On Thu, 23 Aug 2007, Denham Robert wrote:

>>> For various reasons,
>> I think it is only courteous to mention some good reasons if you want
>> to take up people's time.
>
> Some of the reasons we would like a cygwin version aren't necessarily
> good reasons.  We have been using cygwin for some time, mostly to deal
> with scripting in a combined windows/unix environment.  We have a setup
> which allows windows users to run many scripts in the same way as unix
> users.  These scripts are often python or shell scripts.  We have R
> installed on the unix machines, and the system administrators would like
> to be able to have R on windows in the same environment.  This setup
> also means that the administrator can fairly easily maintain the version
> of software used on all users' machines.  Probably this could all be
> managed and still use the native windows version of R, but the
> administrator is familiar with cygwin and they could manage this
> software in the same way they manage other packages.

Yes, it could almost certainly be done with Rterm.exe.

The issue I came across was the so-called 'posix file paths' that Cygwin 
uses.  Most (but not all) Windows programs accept file paths with / as the 
path separator, and most (but not all, e.g. tar) Cygwin programs accept 
paths of the form c:/path/to/file.  So provided you use that as your
format, interworking with Unix and Unix-like shells works fine.  It used to 
be the case that if you had just one drive C: then Cygwin programs 
produced paths of the form /path/to/file that also worked on Windows.  Now 
they produce /cygdrive/c/path/to/file that works nowhere else.
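
To make the two path forms concrete, here is a minimal R sketch that maps
a /cygdrive/<letter>/... path back to the <letter>:/... form that both
kinds of program tend to accept. The function name cygdrive_to_windows is
hypothetical, and the sketch assumes the default /cygdrive prefix; Cygwin's
own cygpath utility (for example cygpath -m) is the more robust way to do
this conversion.

```r
# Hypothetical helper: rewrite "/cygdrive/c/..." as "c:/...".
# Paths that do not start with /cygdrive/<letter> are returned unchanged.
cygdrive_to_windows <- function(path) {
    sub("^/cygdrive/([A-Za-z])(/|$)", "\\1:/", path)
}

cygdrive_to_windows("/cygdrive/c/path/to/file")
## [1] "c:/path/to/file"
cygdrive_to_windows("c:/path/to/file")   # already in the portable form
## [1] "c:/path/to/file"
```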

In general this is a minor nuisance, but I needed to be able to 
cross-build R in an environment where I only have Cygwin-based 
cross-compilers, and there the path issues bit me: I needed a version of R 
that accepted and returned Cygwin-style paths.  So I made the configure 
changes necessary to build R under Cygwin, and had it running in 20 mins.

> We would like to be able to use linux machines on PCs but unfortunately
> we have restrictions imposed on us that prevent this.  This restriction
> also goes as far as the use of virtual machines.  My personal preference
> would be to run linux on my work pc, and use a virtual machine to run
> windows software, such as ArcGIS and Imagine, that are not available for
> linux.  This does not seem to be an option for us.
>
> One thing I was interested in was knowing if there are others who also
> would like a cygwin version.  From the replies to my post, and from a
> search of the mailing list archive, I think that there is little demand
> for this.  We would, however, be prepared to help in some way for the
> few people who are interested.

As I said earlier, it builds out of the box in R-devel (with suitable 
options documented in the R-admin manual).  No guarantees that it will 
continue to do so unless tested in the alpha/beta phase though.  As no 
other platform we use nowadays requires that shared objects/dynamic 
libraries have all imports satisfied at build time, this is liable to get 
broken.

But I would encourage people to use Rterm.exe if it can be made to do what 
you need.
