Re: [Rd] sum() returns NA on a long *logical* vector when nb of TRUE values exceeds 2^31

2017-06-06 Thread Martin Maechler
> Hervé Pagès 
> on Fri, 2 Jun 2017 04:05:15 -0700 writes:

> Hi, I have a long numeric vector 'xx' and I want to use
> sum() to count the number of elements that satisfy some
> criteria like non-zero values or values lower than a
> certain threshold etc...

> The problem is: sum() returns an NA (with a warning) if
> the count is greater than 2^31. For example:

>   > xx <- runif(3e9)
>   > sum(xx < 0.9)
>   [1] NA
>   Warning message:
>   In sum(xx < 0.9) : integer overflow - use sum(as.numeric(.))

> This already takes a long time and doing
> sum(as.numeric(.)) would take even longer and require
> allocation of 24Gb of memory just to store an intermediate
> numeric vector made of 0s and 1s. Plus, having to do
> sum(as.numeric(.)) every time I need to count things is
> not convenient and is easy to forget.

> It seems that sum() on a logical vector could be modified
> to return the count as a double when it cannot be
> represented as an integer.  Note that length() already
> does this so that wouldn't create a precedent. Also and
> FWIW prod() avoids the problem by always returning a
> double, whatever the type of the input is (except on a
> complex vector).

> I can provide a patch if this change sounds reasonable.

This sounds very reasonable,  thank you Hervé, for the report,
and even more for a (small) patch.

Martin

> Cheers, H.

> -- 
> Hervé Pagès

> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B514
> P.O. Box 19024
> Seattle, WA 98109-1024

> E-mail: hpa...@fredhutch.org
> Phone: (206) 667-5791
> Fax: (206) 667-1319

> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
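
As an illustration of the workaround discussed in this thread, here is a minimal sketch that counts matching elements chunk by chunk and accumulates the count in a double, so that no full-length numeric copy of the logical vector is needed. The helper name `count_matches`, its `pred` argument and the default `chunk_size` are assumptions made for this example; they are not part of base R or of the patch mentioned above.

```r
# Result types of the functions mentioned in the report.
typeof(sum(c(TRUE, FALSE, TRUE)))  # "integer": this is where overflow can occur
typeof(prod(1:3))                  # "double": prod() returns a double here

# Hypothetical helper (not part of base R): count the elements of 'x' that
# satisfy 'pred' one chunk at a time, accumulating the count in a double.
count_matches <- function(x, pred, chunk_size = 1e7) {
  n <- length(x)
  total <- 0                       # double accumulator, exact up to 2^53
  start <- 1
  while (start <= n) {
    end <- min(start + chunk_size - 1, n)
    # Each partial sum is at most chunk_size, so it fits in an integer.
    total <- total + sum(pred(x[start:end]))
    start <- end + 1
  }
  total
}

xs <- c(0.10, 0.95, 0.50, 0.99, 0.20)
count_matches(xs, function(v) v < 0.9, chunk_size = 2)  # 3
```

Only one chunk of logical values is materialised at a time, so peak memory is bounded by `chunk_size` rather than by `length(x)`.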



Re: [Rd] Usage of PROTECT_WITH_INDEX in R-exts

2017-06-06 Thread Martin Maechler
> Kirill Müller 
> on Mon, 5 Jun 2017 17:30:20 +0200 writes:

> Hi I've noted a minor inconsistency in the documentation:
> Current R-exts reads

> s = PROTECT_WITH_INDEX(eval(OS->R_fcall, OS->R_env), &ipx);

> but I believe it has to be

> PROTECT_WITH_INDEX(s = eval(OS->R_fcall, OS->R_env), &ipx);

> because PROTECT_WITH_INDEX() returns void.

Yes indeed, thank you Kirill!

Note that the same is true for its partner function|macro REPROTECT().

However, as  PROTECT() is used a gazillion times  and
PROTECT_WITH_INDEX() is used about 100 times less often, and PROTECT()
*does* return the SEXP,
I do wonder why PROTECT_WITH_INDEX() and REPROTECT() could not
behave the same as PROTECT()
(a look at the source code suggests that such a change would be trivial).
I assume the usual compiler optimizations would not create less
efficient code when the idiom   PROTECT_WITH_INDEX(s = ...)
is used, i.e., when the return value is not used?

Maybe this is mainly a matter of taste,  but I find the use of

   SEXP s = PROTECT(...);

quite nice in typical cases where this appears early in a function.
Also for that reason -- but even more for consistency -- it
would also be nice if  PROTECT_WITH_INDEX()  behaved the same.

Martin

> Best regards
> Kirill



Re: [Rd] surprisingly, S4 classes with a "dim" or "dimnames" slot are final (in the Java sense)

2017-06-06 Thread Michael Lawrence
Thanks for the report. The issue is that one cannot set special attributes
like names, dim, dimnames, etc. on S4 objects. I was already working on this
and will have a fix soon.

> a2 <- new("A2")
> dim(a2) <- c(2, 3)
Error in dim(a2) <- c(2, 3) : invalid first argument


On Mon, Jun 5, 2017 at 6:08 PM, Hervé Pagès  wrote:

> Hi,
>
> It's nice to be able to define S4 classes with slots that correspond
> to standard attributes:
>
>   setClass("A1", slots=c(names="character"))
>   setClass("A2", slots=c(dim="integer"))
>   setClass("A3", slots=c(dimnames="list"))
>
> By doing this, one gets a few methods for free:
>
>   a1 <- new("A1", names=letters[1:3])
>   names(a1) # "a" "b" "c"
>   a2 <- new("A2", dim=4:3)
>   nrow(a2)  # 4
>   a3 <- new("A3", dimnames=list(NULL, letters[1:3]))
>   colnames(a3)  # "a" "b" "c"
>
> However, when it comes to subclassing, some of these slots cause
> problems. I can extend A1:
>
>   setClass("B1", contains="A1")
>
> but trying to extend A2 or A3 produces an error (with a non-informative
> message in the 1st case and a somewhat obscure one in the 2nd):
>
>   setClass("B2", contains="A2")
>   # Error in attr(prototype, slotName) <- attr(pri, slotName) :
>   #   invalid first argument
>
>   setClass("B3", contains="A3")
>   # Error in attr(prototype, slotName) <- attr(pri, slotName) :
>   #   'dimnames' applied to non-array
>
> So it seems that the presence of a "dim" or "dimnames" slot prevents a
> class from being extended. Is this expected? I couldn't find anything
> in TFM about this. Sorry if I missed it.
>
> Thanks,
> H.
>
> --
> Hervé Pagès
>
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B514
> P.O. Box 19024
> Seattle, WA 98109-1024
>
> E-mail: hpa...@fredhutch.org
> Phone:  (206) 667-5791
> Fax:    (206) 667-1319
>

Re: [Rd] [bug] droplevels() also drop object attributes (comment…)

2017-06-06 Thread Martin Maechler
> Martin Maechler 
> on Tue, 16 May 2017 11:01:23 +0200 writes:

> Serge Bibauw 
> on Mon, 15 May 2017 11:59:32 -0400 writes:

>> Hi,

>> Just reporting a small bug… not really a big deal, but I
>> don’t think that is intended: droplevels() also drops all
>> object’s attributes.

> Yes.  The help page for droplevels (or the simple
> definition of 'droplevels.factor') clearly indicates that
> the method for factors is really just a call to factor(x,
> exclude = *)

> and that _is_ quite an important base function whose
> semantics should not be changed lightly. Still, let's
> continue :

> Looking a bit, I see that the current behavior of factor()
> {and hence droplevels} has been unchanged in this respect
> for the whole history of R, well, at least for more than
> 17 years (R 1.0.1, April 2000).

> I'd agree there _is_ a bug, at least in the documentation
> which does *not* mention that currently, all attributes
> are dropped but "names", "levels" (and "class").

> OTOH, factor() would only need a small change to make it
> preserve all attributes (but "class" and "levels" which
> are set explicitly).

> I'm sure this will break some checks in some packages.  Is
> it worth it?

> e.g., our own R  QC checks currently check (the printing of) the
> following (in tests/reg-tests-2.R ):

>   > ## some tests of factor matrices
>   > A <- factor(7:12)
>   > dim(A) <- c(2, 3)
>   > A
>        [,1] [,2] [,3]
>   [1,] 7    9    11  
>   [2,] 8    10   12  
>   Levels: 7 8 9 10 11 12
>   > str(A)
>    factor [1:2, 1:3] 7 8 9 10 ...
>    - attr(*, "levels")= chr [1:6] "7" "8" "9" "10" ...
>   > A[, 1:2]
>        [,1] [,2]
>   [1,] 7    9   
>   [2,] 8    10  
>   Levels: 7 8 9 10 11 12
>   > A[, 1:2, drop=TRUE]
>   [1] 7  8  9  10
>   Levels: 7 8 9 10
> 
> with the proposed change to factor(),
> the last call would change its result:
> 
>   > A[, 1:2, drop=TRUE]
>        [,1] [,2]
>   [1,] 7    9   
>   [2,] 8    10  
>   Levels: 7 8 9 10

> because 'drop=TRUE' calls factor(..) and that would also
> preserve the "dim" attribute.  I would think that the
> changed behavior _is_ better, and is also according to
> documentation, because the help page for [.factor explains
> that 'drop = TRUE' drops levels, but _not_ that it
> transforms a factor matrix into a factor (vector).

> Martin

I'm finally coming back to this.
It still seems to make sense to change the behavior of factor(), and
hence of droplevels(), here, and I plan to commit this change
within a day.

Martin Maechler
ETH Zurich
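
For readers who need the extra attributes kept regardless of which R version they run, the following is a minimal sketch of a wrapper that saves the attributes before calling droplevels() and restores them afterwards. The name `droplevels_keep_attrs` is hypothetical and the approach is only one possible workaround; it is not the change to factor() described above.

```r
# Hypothetical wrapper (not part of base R): drop unused levels but carry
# over every attribute except the ones droplevels() manages itself.
droplevels_keep_attrs <- function(f) {
  stopifnot(is.factor(f))
  extra <- attributes(f)
  extra <- extra[setdiff(names(extra), c("levels", "class", "names"))]
  out <- droplevels(f)
  # Re-attach the saved attributes; harmless if they were already kept.
  for (nm in names(extra)) attr(out, nm) <- extra[[nm]]
  out
}

f <- factor(c("a", "b", "a"), levels = c("a", "b", "c"))
attr(f, "note") <- "measured in 2017"
g <- droplevels_keep_attrs(f)
levels(g)        # "a" "b"
attr(g, "note")  # "measured in 2017"
```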


[Rd] Unexpected interaction between missing() and a blank expression

2017-06-06 Thread Hong Ooi via R-devel
This is something I came across just now:

f <- function(x) missing(x)
z <- quote(expr=)

f(z)
# TRUE

The object z contains the equivalent of a missing function argument. Another 
method for generating a missing arg would be alist(a=)$a .

Should f(z) return TRUE in this case? I interpret missing() as checking whether 
the parent function call had a value supplied for the given argument. Here, I 
have supplied an argument (z), so I would expect f to return FALSE.



Re: [Rd] Unexpected interaction between missing() and a blank expression

2017-06-06 Thread peter dalgaard


Missing values propagate in R, e.g.

> f <- function(x) missing(x)
> g <- function(y) f(y)
> g()
[1] TRUE

This is technically done by having a "missing" object, which is not really
intended to be visible to users, but pops up in a few esoteric constructions.
Trying to do anything constructive with the missing object usually leads to grief,
or at least surprises, e.g.:

> z <-quote(expr=)
> z <- z
Error: argument "z" is missing, with no default

-pd
-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd@cbs.dk  Priv: pda...@gmail.com



Re: [Rd] surprisingly, S4 classes with a "dim" or "dimnames" slot are final (in the Java sense)

2017-06-06 Thread Michael Lawrence
I've fixed this and will commit soon.

Disregard my dim<-() example; that behaves as expected (the class needs a
dim<-() method).

Michael
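
To illustrate the remark that the class needs its own dim<-() method, here is a minimal sketch. The class "A2" is the one from the original report; the method body, which simply stores the coerced value in the slot, is an assumption made for this example and not code from the fix.

```r
library(methods)

# Class with a slot named after the standard "dim" attribute, as in the report.
setClass("A2", slots = c(dim = "integer"))

# Replacement method so that dim(x) <- value works on "A2" objects.
# Storing as.integer(value) in the slot is an illustrative choice.
setMethod("dim<-", "A2", function(x, value) {
  x@dim <- as.integer(value)
  x
})

a2 <- new("A2", dim = 4:3)
dim(a2) <- c(2, 3)
dim(a2)  # 2 3
```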


Re: [Rd] Usage of PROTECT_WITH_INDEX in R-exts

2017-06-06 Thread Kirill Müller



Thanks, Martin, this sounds reasonable. I've put together a patch for
review [1]; a diff that can be applied to an SVN checkout (via
`cat | patch -p1`) is at [2]. The code compiles on my system.



-Kirill


[1] https://github.com/krlmlr/r-source/pull/5/files

[2] https://patch-diff.githubusercontent.com/raw/krlmlr/r-source/pull/5.diff





[Rd] Philosophy behind converting Fortran to C for use in R

2017-06-06 Thread Avraham Adler
Hello.

This is not a question about a bug or even best practices; rather I'm
trying to understand the philosophy or theory as to why certain
portions of the R codebase are written as they are. If this question
is better posed elsewhere, please point me in the proper direction.

In the thread about the issues with the Tukey line, Martin said [1]:

> when this topic came up last (for me) in Dec. 2014, I did spend about 2 days
> work (or more?) to get the FORTRAN code from the 1981 - book (which is
> abbreviated the "ABC of EDA") from a somewhat useful OCR scan into compilable
> Fortran code and then f2c'ed, wrote an R interface function found problems…

I have seen this in the R source code and elsewhere, that native
Fortran is converted to C via f2c and then run as C within R. This is
notwithstanding R's ability to use Fortran, either directly through
.Fortran() [2] or via .Call() using simple helper C-wrappers [3].

I'm curious as to the reason. Is it because much of the code was
written before Fortran 90 compilers were freely available? Does it
help with maintenance or make debugging easier? Is it faster or more
likely to compile cleanly?

Thank you,

Avi

[1] https://stat.ethz.ch/pipermail/r-devel/2017-May/074363.html
[2] Such as kmeans does for the Hartigan-Wong method in the stats package
[3] Such as the mvtnorm package does


Re: [Rd] Philosophy behind converting Fortran to C for use in R

2017-06-06 Thread William Dunlap via R-devel
Here are three reasons for converting Fortran code, especially older
Fortran code, to C:

1. The C-Fortran interface is not standardized.  Various Fortran compilers
pass logical and character arguments in various ways.  Various Fortran
compilers mangle function and common block names in various ways.  You can
avoid that problem by restricting R to using a certain Fortran compiler,
but that can make porting R to a new platform difficult.

2. By default, variables in Fortran routines are not allocated on the
stack, but are statically allocated, making recursion hard.

3. New CS graduates tend not to know Fortran.

(There are good reasons for not translating as well, risk and time being
the main ones.)


Bill Dunlap
TIBCO Software
wdunlap tibco.com


Re: [Rd] surprisingly, S4 classes with a "dim" or "dimnames" slot are final (in the Java sense)

2017-06-06 Thread Hervé Pagès

Thanks Michael for taking care of this.  H.
