[Rd] Automatic implementation of "trivial" constraints in optimization

2007-08-07 Thread Christophe Pouzat
Hi all,

I am wondering if anyone has implemented (or at least tried to) an automatic
reparametrization in order to satisfy "trivial" constraints (in the sense of
Dennis & Schnabel, 1983) in optimization problems.
To be perhaps clearer, let us consider a simple bi-exponential model for some
recorded signal (sorry for the LaTeX notation, I hope it isn't too
confusing):
$s(t) = A \left( p_{fast} \exp(-\frac{t}{t_{fast}}) + p_{slow}
\exp(-\frac{t}{t_{slow}}) \right)$
where we would have: $t_{fast} > 0$ and $t_{slow} > 0$ , $0 < p_{fast},
p_{slow} < 1$ and $p_{fast}+p_{slow}=1$. In addition we could want to
enforce $A>0$, say, because our signal corresponds physically to a
concentration and for ease of interpretation having $t_{fast} < t_{slow}$
would not hurt. We could then reparametrize our problem with:
$A = \exp (\alpha)$
$t_{fast} = \exp (\tau_{fast})$
$t_{slow} = t_{fast} + \exp (\tau_{slow})$
$p_{fast} = 0.5 + 0.5 \frac{\pi_{fast}}{\sqrt{1+\pi_{fast}^2}}$
$p_{slow} = 1 - p_{fast}$
and the four parameters: $\alpha, \tau_{fast}, \tau_{slow}, \pi_{fast}$
would be unconstrained.
One could then think of allowing "optim" users to specify more general
boundaries ("lower", "upper") which would be used to generate a parameter
transformation function. This function, say u2c (for "unconstrained to
constrained"),  would take a vector of parameters as one of its arguments
and would return another vector (perhaps of a longer length). Then optim
would internally optimize f(u2c(p)) instead of f. Clearly more sophisticated
constraint specifications (than "lower" and "upper") would be required, like
a way to declare that some parameters form a simplex or that others are
encoding a variance-covariance matrix (in the latter case the pdLogChol
function of nlme already provides the reparametrization if I'm not wrong).
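
As a rough illustration of the idea, here is a minimal sketch of the
reparametrization above written by hand. The names u2c, biexp and rss are
hypothetical, the data are simulated, and optim() is used unchanged; an
automatic version would have to generate something like u2c from the
constraint specification.

```r
# Hypothetical helper: map the four unconstrained parameters
# (alpha, tau_fast, tau_slow, pi_fast) to the constrained ones.
u2c <- function(p) {
  p <- unname(p)
  A      <- exp(p[1])                              # A > 0
  t_fast <- exp(p[2])                              # t_fast > 0
  t_slow <- t_fast + exp(p[3])                     # t_slow > t_fast
  p_fast <- 0.5 + 0.5 * p[4] / sqrt(1 + p[4]^2)    # 0 < p_fast < 1
  c(A = A, t_fast = t_fast, t_slow = t_slow,
    p_fast = p_fast, p_slow = 1 - p_fast)
}

# Bi-exponential model in terms of the constrained parameters.
biexp <- function(t, th) {
  th[["A"]] * (th[["p_fast"]] * exp(-t / th[["t_fast"]]) +
               th[["p_slow"]] * exp(-t / th[["t_slow"]]))
}

# Simulated data (assumed values, for illustration only).
set.seed(1)
t_obs <- seq(0.1, 10, by = 0.1)
truth <- c(A = 2, t_fast = 0.5, t_slow = 3, p_fast = 0.7, p_slow = 0.3)
y_obs <- biexp(t_obs, truth) + rnorm(length(t_obs), sd = 0.01)

# Objective seen by optim(): f(u2c(p)) instead of f.
rss <- function(p) sum((y_obs - biexp(t_obs, u2c(p)))^2)

start <- c(alpha = 0, tau_fast = 0, tau_slow = 0, pi_fast = 0)
fit   <- optim(start, rss)
est   <- u2c(fit$par)   # back on the constrained scale
```

Whatever optim() returns, est satisfies the constraints by construction,
which is the whole point of the transformation.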

Assuming that such a functionality does not exist yet in R (if it does,
sorry to have missed it) do you guys think that:
1) it's not necessary because users can take care of it for themselves
2) it would be complicated and not general enough
3) it would be worth trying

Thanks for your opinions,

Christophe

-- 
A Master Carpenter has many tools and is expert with most of them. If you
only know how to use a hammer, every problem starts to look like a nail.
Stay away from that trap.
Richard B Johnson.
--

Christophe Pouzat
Laboratoire de Physiologie Cerebrale
CNRS UMR 8118
UFR biomedicale de l'Universite Paris V
45, rue des Saints Peres
75006 PARIS
France

tel: +33 (0)1 42 86 38 28
fax: +33 (0)1 42 86 38 30


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Optimization in R

2007-08-07 Thread Duncan Murdoch
Those are small parts of the calculation, not the whole thing.  The 
original point was that optim() is a very thin wrapper around the code 
to do the optimization.  I just don't see a need to make it more 
complicated so it can be used to wrap other methods.  Authors of new 
optimization methods can just create new functions, following the 
pattern set by optim(), and it will be easier for almost everyone.

Duncan Murdoch

On 06/08/2007 11:18 PM, Andrew Robinson wrote:
>  ... Variance and correlation model classes in nlme.
> 
> Cheers
> 
> Andrew
> 
> On Mon, Aug 06, 2007 at 09:55:38PM -0500, hadley wickham wrote:
>> On 8/4/07, Duncan Murdoch <[EMAIL PROTECTED]> wrote:
>>> On 04/08/2007 2:53 PM, Gabor Grothendieck wrote:
>>>> The example of generic functions.
>>> Show me an example where we have a list of ways to do a calculation
>>> passed as an argument (analogous to the method argument of optim), where
>>> the user is allowed to add his own function to the list.
>> Bin width selection in hist?  Family functions for glm?  Those come
>> quickly to my mind, but I'm sure there are others.
>>
>> Hadley
>>
>



Re: [Rd] Optimization in R

2007-08-07 Thread hadley wickham
On 8/7/07, Duncan Murdoch <[EMAIL PROTECTED]> wrote:
> Those are small parts of the calculation, not the whole thing.  The
> original point was that optim() is a very thin wrapper around the code
> to do the optimization.  I just don't see a need to make it more
> complicated so it can be used to wrap other methods.  Authors of new
> optimization methods can just create new functions, following the
> pattern set by optim(), and it will be easier for almost everyone.

Another alternative would be to describe a common interface to
optimisation functions (like the modelling functions).  Otherwise it
becomes a hassle to switch in and out different functions because they
each have slightly different interfaces (eg. clustering and
classification algorithms).
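
To make the point concrete, a small sketch of what such a common interface
might look like. The wrapper name minimise and its return format are
hypothetical, not an existing R function; it only hides the fact that
optim() and nlm() take their arguments in a different order and name their
results differently.

```r
# Hypothetical common front end to two existing optimisers.
minimise <- function(par, fn, optimiser = c("optim", "nlm"), ...) {
  optimiser <- match.arg(optimiser)
  if (optimiser == "optim") {
    res <- optim(par, fn, ...)           # optim(par, fn): $par, $value
    list(par = res$par, value = res$value)
  } else {
    res <- nlm(fn, par, ...)             # nlm(f, p): $estimate, $minimum
    list(par = res$estimate, value = res$minimum)
  }
}

f <- function(p) sum((p - c(1, 2))^2)    # toy objective, minimum at (1, 2)
a <- minimise(c(0, 0), f, "optim", method = "BFGS")
b <- minimise(c(0, 0), f, "nlm")
```

With this, switching optimiser is a one-word change at the call site.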

Hadley



Re: [Rd] Sligthly OT Re: Makefile for embedding OpenBUGS in R package

2007-08-07 Thread Hin-Tak Leung
Prof Brian Ripley wrote:
> OpenBUGS is distributed under GPL2, so this seems not to apply.
> It is distributed as source and as binaries: the difficulty is that it 
> is written in Object Pascal for which a compiler is not readily available.

Argh, I just thought of a proper technical reason, and I think I have 
spotted a possible bug in the original poster's code! Some choose to do
dlopen() when the DLL/so is in a non-standard/non-system location, as an
alternative to setting LD_LIBRARY_PATH explicitly or other link-loader
magics.

The line:
handle = dlopen("./brugs.so", RTLD_LAZY);

This seems to suggest so. However, the problem with this code is that
the current directory (./) may not be where the user thinks it is.
I think the user meant to prepend $R_HOME/library//inst/
somehow to "brugs.so", and to dlopen
"$R_HOME/library//inst/brugs.so" instead.

Hin-Tak





[Rd] Embedded nuls in strings

2007-08-07 Thread Herve Pages
Hi,

?rawToChar
 'rawToChar' converts raw bytes either to a single character string
 or a character vector of single bytes.  (Note that a single
 character string could contain embedded nuls.)

Allowing embedded nuls in a string might be an interesting experiment, but it
seems to cause trouble for most of the string manipulation functions.

A string with an embedded 0:

  raw0 <- as.raw(c(65:68, 0 , 70))
  string0 <- rawToChar(raw0)

> string0
[1] "ABCD\0F"

nchar() should return 6:
> nchar(string0)
[1] 4

In addition, this embedded nul seems to break almost all string
manipulation/searching functions:
  grep("F", string0)
  strsplit(string0, split=NULL, fixed=TRUE)[[1]]
  tolower(string0)
  chartr("F", "x", string0)
  substr(string0, 6, 6)
  ...
  etc...

Not very surprisingly, they all seem to treat string0 as if it was "ABCD"!

Cheers,
H.



Re: [Rd] Embedded nuls in strings

2007-08-07 Thread Steven McKinney
I get similar results on an Apple Mac G5
running OS X, though nchar() works.

>   raw0 <- as.raw(c(65:68, 0 , 70))
>   string0 <- rawToChar(raw0)
> raw0
[1] 41 42 43 44 00 46
> string0
[1] "ABCD\0F"

> nchar(string0)
[1] 6

> grep("F", string0)
integer(0)
>   strsplit(string0, split=NULL, fixed=TRUE)[[1]]
[1] "A" "B" "C" "D"
>   tolower(string0)
[1] "abcd"
>   chartr("F", "x", string0)
[1] "ABCD"
>   substr(string0, 6, 6)
[1] ""
> 
> sessionInfo()
R version 2.5.1 (2007-06-27) 
powerpc-apple-darwin8.9.1 

locale:
en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8

attached base packages:
[1] "splines"   "stats" "graphics"  "grDevices" "utils" "datasets"  
"methods"   "base" 
> 



Steven McKinney

Statistician
Molecular Oncology and Breast Cancer Program
British Columbia Cancer Research Centre

email: smckinney +at+ bccrc +dot+ ca

tel: 604-675-8000 x7561

BCCRC
Molecular Oncology
675 West 10th Ave, Floor 4
Vancouver B.C. 
V5Z 1L3
Canada






Re: [Rd] Embedded nuls in strings

2007-08-07 Thread Duncan Murdoch
On 07/08/2007 5:06 PM, Herve Pages wrote:
> Hi,
> 
> ?rawToChar
>  'rawToChar' converts raw bytes either to a single character string
>  or a character vector of single bytes.  (Note that a single
>  character string could contain embedded nuls.)
> 
> Allowing embedded nuls in a string might be an interesting experiment but it
> seems to cause some troubles to most of the string manipulation functions.
> 
> A string with an embedded 0:
> 
>   raw0 <- as.raw(c(65:68, 0 , 70))
>   string0 <- rawToChar(raw0)
> 
>> string0
> [1] "ABCD\0F"
> 
> nchar() should return 6:
>> nchar(string0)
> [1] 4

You don't state your R version.  The default type of counting in nchar() 
has recently changed from "bytes" (where 6 is correct) to "chars" (where 
4 is correct).

Duncan Murdoch



Re: [Rd] Embedded nuls in strings

2007-08-07 Thread Herve Pages
Duncan Murdoch wrote:
> On 07/08/2007 5:06 PM, Herve Pages wrote:
>> Hi,
>>
>> ?rawToChar
>>  'rawToChar' converts raw bytes either to a single character string
>>  or a character vector of single bytes.  (Note that a single
>>  character string could contain embedded nuls.)
>>
>> Allowing embedded nuls in a string might be an interesting experiment
>> but it
>> seems to cause some troubles to most of the string manipulation
>> functions.
>>
>> A string with an embedded 0:
>>
>>   raw0 <- as.raw(c(65:68, 0 , 70))
>>   string0 <- rawToChar(raw0)
>>
>>> string0
>> [1] "ABCD\0F"
>>
>> nchar() should return 6:
>>> nchar(string0)
>> [1] 4
> 
> You don't state your R version.  The default type of counting in nchar()
> has recently changed from "bytes" (where 6 is correct) to "chars" (where
> 4 is correct).


Oops, sorry:

> sessionInfo()
R version 2.6.0 Under development (unstable) (2007-07-02 r42107)
x86_64-unknown-linux-gnu

locale:
LC_CTYPE=en_US;LC_NUMERIC=C;LC_TIME=en_US;LC_COLLATE=en_US;LC_MONETARY=en_US;LC_MESSAGES=en_US;LC_PAPER=en_US;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US;LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

loaded via a namespace (and not attached):
[1] rcompgen_0.1-15


And indeed:
  raw0 <- as.raw(c(65:68, 0 , 70))
  string0 <- rawToChar(raw0)

> nchar(string0, type="chars")
[1] 4
> nchar(string0, type="bytes")
[1] 6


In addition to the string functions already mentioned, it's worth noting
that 'paste' doesn't seem to be "embedded nul aware" either:

> paste(string0, "G", sep="")
[1] "ABCDG"

Same for serialization:

> save(string0, file="string0.rda")
> load("string0.rda")
> string0
[1] "ABCD"

One comment about the nchar man page:
  'chars' The number of human-readable characters.

"human-readable" seems to be used for "everything but a nul" here, which can be
confusing.
For example, one would generally think of ASCII codes 1 to 31 as not
"human-readable", but nchar() seems to disagree:

> string1 <- rawToChar(as.raw(1:31))
> string1
[1]
"\001\002\003\004\005\006\a\b\t\n\v\f\r\016\017\020\021\022\023\024\025\026\027\030\031\032\033\034\035\036\037"
> nchar(string1, type="chars")
[1] 31


Cheers,
H.


> 
> Duncan Murdoch
>



Re: [Rd] Embedded nuls in strings

2007-08-07 Thread Duncan Murdoch
On 07/08/2007 6:29 PM, Herve Pages wrote:
> Duncan Murdoch wrote:
>> On 07/08/2007 5:06 PM, Herve Pages wrote:
>>> Hi,
>>>
>>> ?rawToChar
>>>  'rawToChar' converts raw bytes either to a single character string
>>>  or a character vector of single bytes.  (Note that a single
>>>  character string could contain embedded nuls.)
>>>
>>> Allowing embedded nuls in a string might be an interesting experiment
>>> but it
>>> seems to cause some troubles to most of the string manipulation
>>> functions.
>>>
>>> A string with an embedded 0:
>>>
>>>   raw0 <- as.raw(c(65:68, 0 , 70))
>>>   string0 <- rawToChar(raw0)
>>>
>>>> string0
>>> [1] "ABCD\0F"
>>>
>>> nchar() should return 6:
>>>> nchar(string0)
>>> [1] 4
>> You don't state your R version.  The default type of counting in nchar()
>> has recently changed from "bytes" (where 6 is correct) to "chars" (where
>> 4 is correct).
> 
> 
> Oops, sorry:
> 
>> sessionInfo()
> R version 2.6.0 Under development (unstable) (2007-07-02 r42107)
> x86_64-unknown-linux-gnu
> 
> locale:
> LC_CTYPE=en_US;LC_NUMERIC=C;LC_TIME=en_US;LC_COLLATE=en_US;LC_MONETARY=en_US;LC_MESSAGES=en_US;LC_PAPER=en_US;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US;LC_IDENTIFICATION=C
> 
> attached base packages:
> [1] stats graphics  grDevices utils datasets  methods   base
> 
> loaded via a namespace (and not attached):
> [1] rcompgen_0.1-15
> 
> 
> And indeed:
>   raw0 <- as.raw(c(65:68, 0 , 70))
>   string0 <- rawToChar(raw0)
> 
>> nchar(string0, type="chars")
> [1] 4
>> nchar(string0, type="bytes")
> [1] 6
> 
> 
> In addition to the string functions already mentioned before, it's worth 
> noting that
> 'paste' doesn't seem to be "embedded nul aware" neither:
> 
>> paste(string0, "G", sep="")
> [1] "ABCDG"
> 
> Same for serialization:
> 
>> save(string0, file="string0.rda")
>> load("string0.rda")
>> string0
> [1] "ABCD"

Of these, I'd say the serialization is the only case where it would be 
reasonable to fix the behaviour.  R depends on C run-time functions for 
most of the string operations, and they'll stop at a null.  So if this 
isn't documented behaviour, it should be, but it's not reasonable to 
rewrite the C run-time string functions just to handle such weird 
objects.  Functions like "grep" require thousands of lines of code, not 
written by us, and in my opinion maintaining changes to it is not 
something the R project should take on.

As to serialization:  there's a comment in the source that embedded 
nulls are handled by it, and that's true up to R-patched, but not in 
R-devel.  Looks like someone has introduced a bug.

Duncan Murdoch
> 
> One comment about the nchar man page:
>   'chars' The number of human-readable characters.
> 
> "human-readable" seems to be used for "everything but a nul" here which can 
> be confusing.
> For example one would generally think of ascii codes 1 to 31 as non 
> "human-readable" but
> nchar() seems to disagree:
> 
>> string1 <- rawToChar(as.raw(1:31))
>> string1
> [1]
> "\001\002\003\004\005\006\a\b\t\n\v\f\r\016\017\020\021\022\023\024\025\026\027\030\031\032\033\034\035\036\037"
>> nchar(string1, type="chars")
> [1] 31

No, "human-readable" also has other meanings in multi-byte encodings. 
If an e-acute is encoded in two bytes in your locale, it still only 
counts as one human-readable character.
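
A quick way to see the difference between the two counting types (a small
sketch; enc2utf8() is used so that the byte count does not depend on the
locale):

```r
x <- enc2utf8("\u00e9")            # e-acute, stored as UTF-8
n_chars <- nchar(x, type = "chars")   # 1: one character
n_bytes <- nchar(x, type = "bytes")   # 2: two bytes in UTF-8
```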



Re: [Rd] Embedded nuls in strings

2007-08-07 Thread Herve Pages
Duncan Murdoch wrote:
> On 07/08/2007 6:29 PM, Herve Pages wrote:
[...]
>> Same for serialization:
>>
>>> save(string0, file="string0.rda")
>>> load("string0.rda")
>>> string0
>> [1] "ABCD"
> 
> Of these, I'd say the serialization is the only case where it would be
> reasonable to fix the behaviour.  R depends on C run-time functions for
> most of the string operations, and they'll stop at a null.  So if this
> isn't documented behaviour, it should be, but it's not reasonable to
> rewrite the C run-time string functions just to handle such weird
> objects.  Functions like "grep" require thousands of lines of code, not
> written by us, and in my opinion maintaining changes to it is not
> something the R project should take on.

I was not (of course) suggesting that all the string manipulation functions be fixed.
I'm just wondering why R would try to support embedded nuls in the first
place, given that they can only be a source of trouble.

What about this:

  > string0
  [1] "ABCD\0F"
  > string0 == "ABCD"
  [1] TRUE

string0 is obviously different from "ABCD"!

Maybe it's easier to change the semantics of rawToChar() so it doesn't return
a string with embedded nuls. More generally speaking, base functions should
always return "clean" strings.

> 
> As to serialization:  there's a comment in the source that embedded
> nulls are handled by it, and that's true up to R-patched, but not in
> R-devel.  Looks like someone has introduced a bug.
> 
> Duncan Murdoch
>>
>> One comment about the nchar man page:
>>   'chars' The number of human-readable characters.
>>
>> "human-readable" seems to be used for "everything but a nul" here
>> which can be confusing.
>> For example one would generally think of ascii codes 1 to 31 as non
>> "human-readable" but
>> nchar() seems to disagree:
>>
>>> string1 <- rawToChar(as.raw(1:31))
>>> string1
>> [1]
>> "\001\002\003\004\005\006\a\b\t\n\v\f\r\016\017\020\021\022\023\024\025\026\027\030\031\032\033\034\035\036\037"
>>
>>> nchar(string1, type="chars")
>> [1] 31
> 
> No, "human-readable" also has other meanings in multi-byte encodings. If
> an e-acute is encoded in two bytes in your locale, it still only counts
> as one human-readable character.
>



Re: [Rd] Embedded nuls in strings

2007-08-07 Thread Duncan Murdoch
On 07/08/2007 9:13 PM, Herve Pages wrote:
> Duncan Murdoch wrote:
>> On 07/08/2007 6:29 PM, Herve Pages wrote:
> [...]
>>> Same for serialization:
>>>
>>>> save(string0, file="string0.rda")
>>>> load("string0.rda")
>>>> string0
>>> [1] "ABCD"
>> Of these, I'd say the serialization is the only case where it would be
>> reasonable to fix the behaviour.  R depends on C run-time functions for
>> most of the string operations, and they'll stop at a null.  So if this
>> isn't documented behaviour, it should be, but it's not reasonable to
>> rewrite the C run-time string functions just to handle such weird
>> objects.  Functions like "grep" require thousands of lines of code, not
>> written by us, and in my opinion maintaining changes to it is not
>> something the R project should take on.
> 
> I was not (of course) suggesting to fix all the string manipulation functions.
> I'm just wondering why R would try to support embedded nuls in the first
> place given that they can only be a source of troubles.

I think this predates raw vectors, so this would have been the only way 
to handle strings with embedded nulls.  C has problems with those, but 
not all other languages do.
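
For what it's worth, a minimal sketch of handling such data with raw
vectors instead, so that no character string ever contains a nul. The
variable names are only for illustration, and the sketch assumes there is
at least one nul byte in the data.

```r
raw0 <- as.raw(c(65:68, 0, 70))        # bytes "ABCD", nul, "F"

n_bytes <- length(raw0)                # byte count, nul included
nul_pos <- which(raw0 == as.raw(0))    # position(s) of the nul byte(s)

# Convert only the nul-free pieces to character strings.
before <- rawToChar(raw0[seq_len(nul_pos[1] - 1)])
after  <- rawToChar(raw0[-seq_len(nul_pos[1])])

# Or drop the nul bytes altogether before converting.
cleaned <- rawToChar(raw0[raw0 != as.raw(0)])
```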

> 
> What about this:
> 
>   > string0
>   [1] "ABCD\0F"
>   > string0 == "ABCD"
>   [1] TRUE
> 
> string0 is obviously different from "ABCD"!

This is documented behaviour, from ?Comparison:

"When comparisons are made between character strings, parts of the
  strings after embedded 'nul' characters are ignored.  (This is
  necessary as the position of 'nul' in the collation sequence is
  undefined, and we want one of '<', '==' and '>' to be true for any
  comparison.)"

But notice

 > identical(string0, "ABCD")
[1] FALSE

This is documented as

  "Comparison of character strings allows for embedded 'nul'
  characters."

Duncan Murdoch

> 
> Maybe it's easier to change the semantic of rawToChar() so it doesn't return
> a string with embedded nuls. More generally speaking, base functions should
> always return "clean" strings.
> 
>> As to serialization:  there's a comment in the source that embedded
>> nulls are handled by it, and that's true up to R-patched, but not in
>> R-devel.  Looks like someone has introduced a bug.
>>
>> Duncan Murdoch
>>> One comment about the nchar man page:
>>>   'chars' The number of human-readable characters.
>>>
>>> "human-readable" seems to be used for "everything but a nul" here
>>> which can be confusing.
>>> For example one would generally think of ascii codes 1 to 31 as non
>>> "human-readable" but
>>> nchar() seems to disagree:
>>>
>>>> string1 <- rawToChar(as.raw(1:31))
>>>> string1
>>> [1]
>>> "\001\002\003\004\005\006\a\b\t\n\v\f\r\016\017\020\021\022\023\024\025\026\027\030\031\032\033\034\035\036\037"
>>>
>>>> nchar(string1, type="chars")
>>> [1] 31
>> No, "human-readable" also has other meanings in multi-byte encodings. If
>> an e-acute is encoded in two bytes in your locale, it still only counts
>> as one human-readable character.
>>



Re: [Rd] sweep sanity checking?

2007-08-07 Thread Petr Savicky
Thanks to Martin Maechler for his comments, advice and for pointing
out the speed problem. Thanks also to Ben Bolker for tests of speed,
which confirm that for small arrays, a slowdown by a factor of about
1.2 - 1.5 may occur. Now, I would like to present a new version of sweep,
which is simpler and has an option to avoid the test. This is expected
to be used in scripts, where the programmer is quite sure that the
usage is correct and speed is required. The new version differs from
the previous one in the following:

1. The option check.margin has a different meaning. It defaults to TRUE
   and it determines whether the test is performed or not.

2. Since check.margin has the meaning above, it cannot be used
   to select which test should be performed. This depends on the
   type of STATS. The suggested sweep function contains two tests:
   - a vector test by Heather Turner, which is used if STATS
     has no dim attribute and, hence, is a vector (STATS should
     not be anything other than a vector or an array)
   - an array test, used if STATS has a dim attribute.
   The vector test allows some kinds of recycling, while the array test
   does not. Hence, in the most common case, where x is a matrix
   and STATS is a vector, a user who wants to be warned whenever the length
   of the vector is not exactly the right one should call
   sweep(x,MARGIN,as.array(STATS)). Otherwise, a warning
   will be generated only if length(STATS) does not divide the specified
   dimension of x, which is nrow(x) (MARGIN=1) or ncol(x) (MARGIN=2).

3. If STATS is an array, then the test is more restrictive than in
   the previous version. It is now required that, after deleting
   dimensions with one level, the remaining dimensions coincide.
   The previous version additionally allowed cases where dim(STATS)
   is a prefix of dim(x)[MARGIN], for example, dim(STATS) = k1 and
   dim(x)[MARGIN] = c(k1,k2).
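
To illustrate the checks described in points 1-3, here is a small sketch.
It assumes a sweep() that has the check.margin argument proposed in this
patch (current R releases do); the exact warning text may differ between
versions, so it is captured rather than printed.

```r
x <- matrix(1:6, nrow = 2)               # 2 x 3 matrix

# STATS has length ncol(x): the vector test passes, no warning.
centred <- sweep(x, 2, colMeans(x))

# STATS has length 2, which does not divide ncol(x) = 3:
# the vector test issues a warning, captured here as a string.
msg <- tryCatch(sweep(x, 2, 1:2),
                warning = function(w) conditionMessage(w))

# Same call with the test switched off: no warning, STATS is recycled.
quiet <- sweep(x, 2, 1:2, check.margin = FALSE)
```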

The code of the tests in the suggested sweep is based on the previous suggestions
 https://stat.ethz.ch/pipermail/r-help/2005-June/073989.html by Robin Hankin
 https://stat.ethz.ch/pipermail/r-help/2005-June/074001.html by Heather Turner
 https://stat.ethz.ch/pipermail/r-devel/2007-June/046217.html by Ben Bolker
with some further modifications.

The modification of sweep.Rd was prepared by Ben Bolker and me.

I would like to encourage everybody who wants to express an opinion
on the patch to do so now. In my opinion, the suggested new code
has stabilized, in the sense that I will not modify it unless
there is negative feedback.

A patch against R-devel_2007-08-06 is attached. It contains tabs. If they
are corrupted by email transfer, use the link
  http://www.cs.cas.cz/~savicky/R-devel/patch-sweep
which is an identical copy.

Petr Savicky.


--- R-devel_2007-08-06/src/library/base/R/sweep.R   2007-07-27 17:51:13.0 +0200
+++ R-devel_2007-08-06-sweep/src/library/base/R/sweep.R 2007-08-07 10:30:12.383672960 +0200
@@ -14,10 +14,29 @@
 #  A copy of the GNU General Public License is available at
 #  http://www.r-project.org/Licenses/
 
-sweep <- function(x, MARGIN, STATS, FUN = "-", ...)
+sweep <- function(x, MARGIN, STATS, FUN = "-", check.margin=TRUE, ...)
 {
 FUN <- match.fun(FUN)
 dims <- dim(x)
+   if (check.margin) {
+   dimmargin <- dims[MARGIN]
+   dimstats <- dim(STATS)
+   lstats <- length(STATS)
+   if (lstats > prod(dimmargin)) {
+   warning("length of STATS greater than the extent of dim(x)[MARGIN]")
+   } else if (is.null(dimstats)) { # STATS is a vector
+   cumDim <- c(1, cumprod(dimmargin))
+   upper <- min(cumDim[cumDim >= lstats])
+   lower <- max(cumDim[cumDim <= lstats])
+   if (upper %% lstats != 0 || lstats %% lower != 0)
+   warning("STATS does not recycle exactly across MARGIN")
+   } else {
+   dimmargin <- dimmargin[dimmargin > 1]
+   dimstats <- dimstats[dimstats > 1]
+   if (length(dimstats) != length(dimmargin) || any(dimstats != dimmargin))
+   warning("length(STATS) or dim(STATS) do not match dim(x)[MARGIN]")
+   }
+   }
 perm <- c(MARGIN, (1:length(dims))[ - MARGIN])
 FUN(x, aperm(array(STATS, dims[perm]), order(perm)), ...)
 }
--- R-devel_2007-08-06/src/library/base/man/sweep.Rd   2007-07-27 17:51:35.0 +0200
+++ R-devel_2007-08-06-sweep/src/library/base/man/sweep.Rd  2007-08-07 10:29:45.517757200 +0200
@@ -11,7 +11,7 @@
   statistic.
 }
 \usage{
-sweep(x, MARGIN, STATS, FUN="-", \dots)
+sweep(x, MARGIN, STATS, FUN="-", check.margin=TRUE, \dots)
 }
 \arguments{
   \item{x}{an array.}
@@ -22,8 +22,18 @@
 case of binary ope