Re: [Rd] RFC: tapply(*, ..., init.value = NA)

2017-01-31 Thread Suharto Anggono Suharto Anggono via R-devel
Function 'aggregate.data.frame' in R has taken a different route. With 
drop=FALSE, the function is also applied to subset corresponding to combination 
of grouping variables that doesn't appear in the data (example 2 in 
https://stat.ethz.ch/pipermail/r-devel/2017-January/073678.html).

Because 'default' is used only when simplification happens, putting 'default' 
after 'simplify' in the argument list may be more logical. Anyway, it doesn't 
affect call to 'tapply' because the argument 'default' must be specified by 
name.

With the code using
if(missing(default)) ,
I consider the stated default value of 'default',
default = NA ,
misleading because the code doesn't use it. Also,
tapply(1:3, 1:3, as.raw)
is not the same as
tapply(1:3, 1:3, as.raw, default = NA) .
The accurate statement is the code in
if(missing(default)) ,
but it involves the local variable 'ans'.

As far as I know, the result of function 'array' is not a classed object and 
the default method of  `[<-` will be used in the 'tapply' code portion.

As far as I know, the result of 'lapply' is a list without class. So, 'unlist' 
applied to it uses the default method and the 'unlist' result is a vector or a 
factor.

With the change, the result of
tapply(1:3, 1:3, factor, levels=3:1)
is of mode "character". The value is from the internal code, not from the 
factor levels. It is worse than before the change, where it is really the 
internal code, integer.
In the documentation, the description of argument 'simplify' says: "If 'TRUE' 
(the default), then if 'FUN' always returns a scalar, 'tapply' returns an array 
with the mode of the scalar."

To initialize array, a zero-length vector can also be used.

For 'xtabs', I think that it is better if the result has storage mode "integer" 
if 'sum' results are of storage mode "integer", as in R 3.3.2. As 'default' 
argument for 'tapply', 'xtabs' can use 0L, or use 0L or 0 depending on storage 
mode of the summed quantity.
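
The effect of an integer 'default' on empty cells can be sketched as follows. This is only an illustration, assuming an R version whose tapply() already has the 'default' argument discussed in this thread; the data and the names x, g1, g2 are made up for the example.

```r
# Hypothetical data: the combination (b, v) never occurs.
x  <- c(1L, 2L, 3L)
g1 <- factor(c("a", "a", "b"), levels = c("a", "b"))
g2 <- factor(c("u", "v", "u"), levels = c("u", "v"))

# Without 'default', the empty cell is filled with NA.
res_na <- tapply(x, list(g1, g2), sum)

# With default = 0L, the empty cell is 0 and the result stays integer,
# which is the behaviour suggested above for 'xtabs'.
res_0 <- tapply(x, list(g1, g2), sum, default = 0L)

storage.mode(res_0)
```

Passing 0 instead of 0L would make the whole result double, which is why the choice between 0L and 0 depends on the storage mode of the summed quantity.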


> Henrik Bengtsson 
> on Fri, 27 Jan 2017 09:46:15 -0800 writes:

> On Fri, Jan 27, 2017 at 12:34 AM, Martin Maechler
>  wrote:
>> 
>> > On Jan 26, 2017 07:50, "William Dunlap via R-devel"
>>  > wrote:
>> 
>> > It would be cool if the default for tapply's init.value could be
>> > FUN(X[0]), so it would be 0 for FUN=sum or FUN=length, TRUE for
>> > FUN=all, -Inf for FUN=max, etc. But that would take time and would
>> > break code for which FUN did not work on length-0 objects.
>> 
>> > Bill Dunlap
>> > TIBCO Software
>> > wdunlap tibco.com
>> 
>> I had the same idea (after my first post), so I agree
>> that would be nice. One could argue it would take time
>> only if the user is too lazy to specify the value, and we
>> could use tryCatch(FUN(X[0]), error = NA) to safeguard
>> against those functions that fail for 0 length arg.
>> 
>> But I think the main reason for _not_ setting such a
>> default is back-compatibility.  In my proposal, the new
>> argument would not be any change by default and so all
>> current uses of tapply() would remain unchanged.
>> 
>>> Henrik Bengtsson  on
>>> Thu, 26 Jan 2017 07:57:08 -0800 writes:
>> 
>> > On a related note, the storage mode should try to match ans[[1]] (or
>> > unlist:ed and) when allocating 'ansmat' to avoid coercion and hence a
>> > full copy.
>> 
>> Yes, related indeed; and would fall "in line" with Bill's
>> idea.  OTOH, it could be implemented independently, by
>> something like
>> 
>> if(missing(init.value)) init.value <- if(length(ans))
>> as.vector(NA, mode=storage.mode(ans[[1]])) else NA

> I would probably do something like:

>   ans <- unlist(ans, recursive = FALSE, use.names = FALSE)
>   if (length(ans)) storage.mode(init.value) <- storage.mode(ans[[1]])
>   ansmat <- array(init.value, dim = extent, dimnames = namelist)

> instead.  That completely avoids having to use missing() and the value
> of 'init.value' will be coerced later if not done upfront.  use.names
> = FALSE speeds up unlist().

Thank you, Henrik.
That's a good idea to do the unlist() first, and with 'use.names=FALSE'.
I'll copy that.

On the other hand, "brutally" modifying  'init.value' (now called 'default')
even when the user has specified it is not acceptable I think.
You are right that it would be coerced anyway subsequently, but
the coercion will happen in whatever method of  `[<-` will be
appropriate.
Good S3 and S4 programmers will write such methods for their classes.

For that reason, I'm even more conservative now, only fiddle in
case of an atomic 'ans' and make use of the corresponding '['
method rather than as.vector(.) ... because that will fulfill
the following new regression test {not fulfilled in current R}:

identical(tapply(1:3, 1:3, as.raw),
  array(as.raw(1:3), 3L, dimnames=list(1:3)))

Also, I've done a few more things -- treating if(.) .
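
The FUN(X[0]) idea quoted earlier in this message, together with the tryCatch() safeguard, can be sketched roughly as below. Two things here are assumptions rather than anything from the thread: the helper name empty_default is hypothetical, and the handler is written as error = function(e) NA because tryCatch() needs a function as handler, so the literal error = NA from the quoted text would not work as written.

```r
# Hypothetical helper: what FUN returns for a zero-length version of X,
# falling back to NA if FUN fails on empty input.
empty_default <- function(FUN, X) {
  tryCatch(
    suppressWarnings(FUN(X[0L])),   # max() warns on empty input
    error = function(e) NA          # handler must be a function
  )
}

empty_default(sum, 1:3)                        # 0L
empty_default(length, 1:3)                     # 0L
empty_default(all, c(TRUE, FALSE))             # TRUE
empty_default(max, c(1.5, 2))                  # -Inf
empty_default(function(x) stop("empty"), 1:3)  # NA
```

This only shows what such a default would look like; as noted above, using it by default would change the results of existing tapply() calls.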

[Rd] rnbinom Returns Error that says optional argument is missing

2017-01-31 Thread Thomas Roh
I am trying to reset the default arguments in the rnbinom function with the
following example code:

params <- c("size" = 1, "mu" = 1)
formals(rnbinom)[names(params)] <- params
rnbinom(n = 10)

It returns the following:

Error in rnbinom(n = 10) : argument "prob" is missing, with no default

If I set the defaults with this code:

params <- c("size" = 1, "prob" = .5)
formals(rnbinom)[names(params)] <- params
rnbinom(n = 10)

The function works correctly. The documentation specifies that you can set
mu or prob with size. I understand that the problem is that arguments left at
their defaults are still treated as missing, but it seems unintentional that
setting defaults for "prob" and "size" actually works.

Here is the function definition:

function (n, size, prob, mu)
{
    if (!missing(mu)) {
        if (!missing(prob))
            stop("'prob' and 'mu' both specified")
        .Call(C_rnbinom_mu, n, size, mu)
    }
    else .Call(C_rnbinom, n, size, prob)
}

-- 
Thomas Roh
thms...@gmail.com

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] RFC: tapply(*, ..., init.value = NA)

2017-01-31 Thread Martin Maechler
> Suharto Anggono Suharto Anggono via R-devel 
> on Tue, 31 Jan 2017 15:43:53 + writes:

> Function 'aggregate.data.frame' in R has taken a different route. With 
> drop=FALSE, the function is also applied to subset corresponding to combination 
> of grouping variables that doesn't appear in the data (example 2 in 
> https://stat.ethz.ch/pipermail/r-devel/2017-January/073678.html).

Interesting point (I couldn't easily find 'the example 2' though).
However, aggregate.data.frame() is a considerably more
sophisticated function and one goal was to change tapply() as
little as possible for compatibility (and maintenance!) reasons .

> Because 'default' is used only when simplification happens, putting 'default' 
> after 'simplify' in the argument list may be more logical. 

Yes, from this point of view, you are right; I had thought about
that too; on the other hand, it belongs "closely" to the 'FUN'
and I think that's why I had decided not to change the proposal..

> Anyway, it doesn't affect call to 'tapply' because the argument 'default' 
> must be specified by name.

Exactly.. so we keep the order as is.

> With the code using
>if(missing(default)) ,
> I consider the stated default value of 'default',
>default = NA ,
> misleading because the code doesn't use it. 

I know and I also had thought about it and decided to keep it 
in the spirit of "self documentation" because  "in spirit", the
default still *is* NA.

> Also,
>  tapply(1:3, 1:3, as.raw)
> is not the same as
>  tapply(1:3, 1:3, as.raw, default = NA) .
> The accurate statement is the code in
> if(missing(default)) ,
> but it involves the local variable 'ans'.

exactly.  But putting that whole expression in there would look
confusing to those using  str(tapply), args(tapply) or similar
inspection to quickly get a glimpse of the function user "interface".
That's why we typically don't do that and rather slightly cheat
with the formal default, for the above "didactical" purposes.

If you are puristic about this, then missing() should almost never
be used when the function argument has a formal default.

I don't have a too strong opinion here, and we do have quite a
few other cases, where the formal default argument is not always
used because of   if(missing(.))  clauses.

I think I could be convinced to drop the '= NA' from the formal
argument list..


> As far as I know, the result of function 'array' is not a classed 
> object and the default method of  `[<-` will be used in the 'tapply' code 
> portion.

> As far as I know, the result of 'lapply' is a list without class. So, 
> 'unlist' applied to it uses the default method and the 'unlist' result is a 
> vector or a factor.

You may be right here
  ((or not:  If a package author makes array() into an S3 generic and defines
S3method(array, *) and she or another make tapply() into a
generic with methods,  are we really sure that this code
would not be used ??))

still, the as.raw example did not easily work without a warning
when using as.vector() .. or similar.

> With the change, the result of

> tapply(1:3, 1:3, factor, levels=3:1)

> is of mode "character". The value is from the internal code, not from the 
> factor levels. It is worse than before the change, where it is really the 
> internal code, integer.

I agree that this change is not desirable.
One could argue that it was quite a "lucky coincidence" that the previous
code returned the internal integer codes though..


> In the documentation, the description of argument 'simplify' says: "If 
> 'TRUE' (the default), then if 'FUN' always returns a scalar, 'tapply' returns 
> an array with the mode of the scalar."


> To initialize array, a zero-length vector can also be used.

yes, of course; but my  ans[0L][1L]  had the purpose to get the
correct mode specific version of NA .. which works for raw (by
getting '00' because "raw" has *no* NA!).

So it seems I need an additional   !is.factor(ans)  there ...
a bit ugly.
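
What  ans[0L][1L]  yields for a few types can be checked with a small sketch like the one below. The helper name typed_fill is hypothetical and only wraps the expression from the text; it is not the code in tapply().

```r
# Hypothetical wrapper around the expression discussed above.
typed_fill <- function(ans) ans[0L][1L]

typed_fill(1:3)           # NA_integer_
typed_fill(c("a", "b"))   # NA_character_
typed_fill(as.raw(1:3))   # raw 00, since raw has no NA

# For a factor the result is itself a factor (an NA with the same levels),
# which is why an extra !is.factor(ans) check is needed.
f <- typed_fill(factor(c("x", "y")))
is.factor(f)
```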


-

> For 'xtabs', I think that it is better if the result has storage mode 
> "integer" if 'sum' results are of storage mode "integer", as in R 3.3.2. 

you are right, that *is* preferable

>  As 'default' argument for 'tapply', 'xtabs' can use 0L, or use 0L or 0 
> depending on storage mode of the summed quantity.

indeed, that will be an improvement there!



Re: [Rd] rnbinom Returns Error that says optional argument is missing

2017-01-31 Thread Joris Meys
Hi Thomas,

This seems fully expected behaviour. Obviously unspecified arguments are
evaluated as missing regardless of a default value. So if you set mu as a
default, the function will call C_rnbinom with n, size and prob. As prob is
not specified you get the error one would expect. Specifying a default
value for prob also makes rnbinom call C_rnbinom, but in this case there is
a prob value so it works.
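
The behaviour of missing() described above can be reproduced with a small stand-in for the rnbinom() dispatch. The function pick_branch below is hypothetical and only mimics the structure of the branch test; it does not call any of the C code.

```r
# Hypothetical stand-in: 'mu' has a formal default, 'prob' has none.
pick_branch <- function(prob, mu = 1) {
  # missing(mu) is TRUE whenever the caller did not supply 'mu',
  # even though a default value exists.
  if (!missing(mu)) "mu branch" else "prob branch"
}

pick_branch()         # "prob branch"
pick_branch(mu = 1)   # "mu branch"
```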

I don't know what you consider "unintentional", but everything works as
expected and imho as intended as well. Changing the formals of a function comes
with no guarantees, and setting a default value for an argument that
previously had none comes with the risk of breaking things (as you
noticed).

If you want to use a default value for mu, you have to change the body of
the function as well, eg:

> formals(rnbinom)[c('size','mu')] <- c(1,1)
> body(rnbinom) <- quote(.Call(C_rnbinom_mu, n, size, mu))
> rnbinom(10)
 [1] 0 4 2 0 3 0 4 0 0 2

That's really hacking away and something I would never suggest to people,
but it works.
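
A less invasive alternative, offered here only as a sketch, is to leave rnbinom() untouched and put the defaults in a thin wrapper that always passes 'mu' explicitly. The wrapper name rnbinom_mu is hypothetical.

```r
# Hypothetical wrapper: defaults live here, stats::rnbinom() is unchanged.
rnbinom_mu <- function(n, size = 1, mu = 1) {
  # 'mu' is supplied explicitly, so missing(mu) is FALSE inside rnbinom().
  stats::rnbinom(n, size = size, mu = mu)
}

set.seed(1)
draws <- rnbinom_mu(10)
length(draws)
```

Because the wrapper supplies 'mu' in the call, the missing() test inside rnbinom() takes the 'mu' branch without any change to its formals or body.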

Hope this explains
Cheers
Joris


On Tue, Jan 31, 2017 at 5:39 PM, Thomas Roh  wrote:

> I am trying to reset the default arguments in the rnbinom function with the
> following example code:
>
> params <- c("size" = 1, "mu" = 1)
> formals(rnbinom)[names(params)] <- params
> rnbinom(n = 10)
>
> It returns the following:
>
> Error in rnbinom(n = 10) : argument "prob" is missing, with no default
>
> If I set the defaults with this code:
>
> params <- c("size" = 1, "prob" = .5)
> formals(rnbinom)[names(params)] <- params
> rnbinom(n = 10)
>
> The function works correctly. The documentation specifies that you can set
> mu or prob with size. I understand that the problem lies in default
> arguments are evaluated as missing, but it seems unintentional that setting
> "prob" and "size" defaults will actually evaluate.
>
> Here is the function call:
>
> function (n, size, prob, mu)
> {
>     if (!missing(mu)) {
>         if (!missing(prob))
>             stop("'prob' and 'mu' both specified")
>         .Call(C_rnbinom_mu, n, size, mu)
>     }
>     else .Call(C_rnbinom, n, size, prob)
> }



-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Mathematical Modelling, Statistics and Bio-Informatics

tel :  +32 (0)9 264 61 79
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php



[Rd] Unexpected EOF in R-patched_2017-01-30

2017-01-31 Thread Avraham Adler
Hello.

When trying to unpack today's version of R-patched, I get the following error:

C:\R>tar -xf R-patched_2017-01-30.tar.gz

gzip: stdin: unexpected end of file
tar: Unexpected EOF in archive
tar: Unexpected EOF in archive
tar: Error is not recoverable: exiting now

I got the same error for R-patched_2017-01-30.tar.gz but not for R-3.3.2.tar.gz.

Thank you,

Avi



Re: [Rd] Unexpected EOF in R-patched_2017-01-30

2017-01-31 Thread peter dalgaard

> On 31 Jan 2017, at 18:56 , Avraham Adler  wrote:
> 
> Hello.
> 
> When trying to unpack today's version of R-patched,

From which source? The files from cran.r-project.org seem OK, both those in 
src/base-prerelease and those from ETHZ. Also, is it not "tar -xfz" when 
reading a compressed file?

-pd 

> I get the following error:
> 
> C:\R>tar -xf R-patched_2017-01-30.tar.gz
> 
> gzip: stdin: unexpected end of file
> tar: Unexpected EOF in archive
> tar: Unexpected EOF in archive
> tar: Error is not recoverable: exiting now
> 
> I got the same error for R-patched_2017-01-30.tar.gz but not for 
> R-3.3.2.tar.gz.
> 
> Thank you,
> 
> Avi

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd@cbs.dk  Priv: pda...@gmail.com



Re: [Rd] Unexpected EOF in R-patched_2017-01-30

2017-01-31 Thread Avraham Adler
On Tue, Jan 31, 2017 at 3:30 PM, peter dalgaard  wrote:
>
>> On 31 Jan 2017, at 18:56 , Avraham Adler  wrote:
>>
>> Hello.
>>
>> When trying to unpack today's version of R-patched,
>
> From which source? The files from cran.r-project.org seems OK, both those in 
> src/base-prerelease and those from ETHZ. Also, is it not "tar -xfz" when 
> reading a compressed file?
>
> -pd

>From 

Also, while passing z is not in the instructions given in Installation
and Administration [1], I tried passing -xzf and it did not work. I
believe f has to be last if the file name follows immediately.

[1]  


Thanks,

Avi

>> I get the following error:
>>
>> C:\R>tar -xf R-patched_2017-01-30.tar.gz
>>
>> gzip: stdin: unexpected end of file
>> tar: Unexpected EOF in archive
>> tar: Unexpected EOF in archive
>> tar: Error is not recoverable: exiting now
>>
>> I got the same error for R-patched_2017-01-30.tar.gz but not for 
>> R-3.3.2.tar.gz.
>>
>> Thank you,
>>
>> Avi
>>
