[Rd] object.size vs lobstr::obj_size

2020-02-19 Thread Stefan Schreiber
I have posted this question on R-help where it was suggested to me
that I might get a better response on R-devel. So far I have gotten no
response. The post I am talking about is here:
https://stat.ethz.ch/pipermail/r-help/2020-February/465700.html

My apologies for cross-posting, which I know is impolite; I should
have posted on R-devel in the first place, but I wasn't sure.

Here is my question again:

I am currently working through Advanced R by H. Wickham and came
across the `lobstr::obj_size` function which appears to calculate the
size of an object by taking into account whether the same object has
been referenced multiple times, e.g.

x <- runif(1e6)
y <- list(x, x, x)
lobstr::obj_size(y)
# 8,000,128 B

# versus:
object.size(y)
# 24000224 bytes

The "Details" section of `?object.size` reads: [...] but
does not detect if elements of a list are shared [...].

My questions are:

(1) is the result of `obj_size()` the "correct" one when it comes to
actual size used in memory?

(2) And if yes, why wouldn't `object.size()` be updated to reflect the
more precise calculation of an object in question similar to
`obj_size()`?

There are probably valid reasons for this and any insight would be
greatly appreciated.
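
For readers who want to reproduce the comparison, here is a minimal base-R sketch. It only shows that `object.size()` reports roughly three times the size of `x` for the list `y`; it does not settle which figure is "correct". The variable names `size_x`, `size_y` and `ratio` are illustrative, and the `lobstr` call is guarded because the package may not be installed.

```r
# Build one numeric vector and a list holding it three times.
x <- runif(1e6)
y <- list(x, x, x)

# object.size() adds up the size of every list element separately.
size_x <- as.numeric(object.size(x))
size_y <- as.numeric(object.size(y))
ratio  <- size_y / size_x   # roughly 3: each element is counted again

print(ratio)

# Optional: compare with lobstr, if it happens to be installed.
if (requireNamespace("lobstr", quietly = TRUE)) {
  print(lobstr::obj_size(y))
}
```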

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] dimnames incoherence?

2020-02-19 Thread Serguei Sokol

Hi,

I was bitten by a small inconsistency in dimnames assignment, or maybe I
missed something.
Here is the case. If I assign row names via dimnames(a)[[1]] when
nrow(a)=1, an error is thrown. But if I do the same when nrow(a) > 1,
it's OK. Is one of these cases behaving unexpectedly? Both? Neither?


a=as.matrix(1)
dimnames(a)[[1]]="a" # error: 'dimnames' must be a list

aa=as.matrix(1:2)
dimnames(aa)[[1]]=c("a", "b") # OK

In the second case, dimnames(aa) is not a list either (just as in the
first case), but it works.

I would expect either both to work or neither.
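
A minimal sketch for reproducing both cases without stopping at the error. The outcome of the one-row case may depend on the R version, so it is only captured, not asserted; `res_one` and `b` are illustrative names, and assigning the whole dimnames list at once is shown as one possible workaround, not as the recommended fix.

```r
# One-row case: capture the outcome instead of stopping on an error.
a <- as.matrix(1)
res_one <- tryCatch({
  dimnames(a)[[1]] <- "a"
  "ok"
}, error = function(e) conditionMessage(e))
print(res_one)

# Two-row case: the same nested replacement works.
aa <- as.matrix(1:2)
dimnames(aa)[[1]] <- c("a", "b")
print(rownames(aa))

# Possible workaround: assign the complete dimnames list in one step.
b <- as.matrix(1)
dimnames(b) <- list("a", NULL)
print(rownames(b))
```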

Your thoughts are welcome.
Best,
Serguei.

PS: the same applies to dimnames(a)[[2]]<-.

> sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Mageia 7

Matrix products: default
BLAS/LAPACK: /home/opt/OpenBLAS/lib/libopenblas_sandybridge-r0.3.6.so

locale:
 [1] LC_CTYPE=fr_FR.UTF-8   LC_NUMERIC=C
 [3] LC_TIME=fr_FR.UTF-8    LC_COLLATE=fr_FR.UTF-8
 [5] LC_MONETARY=fr_FR.UTF-8    LC_MESSAGES=fr_FR.UTF-8
 [7] LC_PAPER=fr_FR.UTF-8   LC_NAME=C
 [9] LC_ADDRESS=C   LC_TELEPHONE=C
[11] LC_MEASUREMENT=fr_FR.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats graphics  grDevices utils datasets methods
[8] base

other attached packages:
[1] multbxxc_1.0.1    rmumps_5.2.1-11
[3] arrApply_2.1  RcppArmadillo_0.9.800.4.0
[5] Rcpp_1.0.3    slam_0.1-47
[7] nnls_1.4

loaded via a namespace (and not attached):
[1] compiler_3.6.1   tools_3.6.1  codetools_0.2-16



Re: [Rd] dimnames incoherence?

2020-02-19 Thread Martin Maechler
> Serguei Sokol 
> on Wed, 19 Feb 2020 15:21:21 +0100 writes:

> Hi,
> I was bitten by a small inconsistency in dimnames assignment, or maybe I
> missed something.
> Here is the case. If I assign row names via dimnames(a)[[1]] when
> nrow(a)=1, an error is thrown. But if I do the same when nrow(a) > 1,
> it's OK. Is one of these cases behaving unexpectedly? Both? Neither?

> a=as.matrix(1)
> dimnames(a)[[1]]="a" # error: 'dimnames' must be a list

> aa=as.matrix(1:2)
> dimnames(aa)[[1]]=c("a", "b") # OK

> In the second case, dimnames(aa) is not a list either (just as in the
> first case), but it works.
> I would expect either both to work or neither.

I agree (even though I'm strongly advising people to use '<-'
instead of '=');
which in this case helps you get the name of the function really
involved:  It is  `dimnames<-`  (which is implemented in C
entirely, for matrices and other arrays).


> Your thoughts are welcome.

I think we'd be happy if you report this formally on R's
bugzilla - https://bugs.r-project.org/ - as a bug.

  --> https://www.r-project.org/bugs.html

From reading bugs.html you will note that you should ask for an account
there; as I'm one of the people who get such requests by e-mail, in this
case I can do it directly (if you confirm in a private e-mail that you
want one).

> Best,
> Serguei.

> PS: the same applies to dimnames(a)[[2]]<-.

(of course)

NB *and*  importantly, the buglet is still in current versions of R 

Best,
Martin



Re: [Rd] dimnames incoherence?

2020-02-19 Thread William Dunlap via R-devel
How far would you like to go with the automatic creation of dimnames in
nested replacement operations on arrays?  It currently works nicely with [<-:
> a <- array(numeric(), dim=c(2,0,1)); dimnames(a)[3] <- list("One")
> str(a)
 num[1:2, 0 , 1]
 - attr(*, "dimnames")=List of 3
  ..$ : NULL
  ..$ : NULL
  ..$ : chr "One"

It works most of the time (except for length=1) for [[<-:
> a <- array(numeric(), dim=c(2,0,1)); dimnames(a)[[1]] <- c("X1","X2")
> a <- array(numeric(), dim=c(2,0,1)); dimnames(a)[[2]] <- character()
> a <- array(numeric(), dim=c(2,0,1)); dimnames(a)[[3]] <- "Z1"
Error in dimnames(a)[[3]] <- "Z1" : 'dimnames' must be a list

It does not work at all for names<-, unless dimnames(a) is first set to a list:
> a <- array(numeric(), dim=c(2,0,1)); names(dimnames(a)) <- c("X","Y","Z")
Error in names(dimnames(a)) <- c("X", "Y", "Z") :
  attempt to set an attribute on NULL
> a <- array(numeric(), dim=c(2,0,1)); dimnames(a) <- vector("list",3)
> names(dimnames(a)) <- c("X","Y","Z")
> str(a)
 num[1:2, 0 , 1]
 - attr(*, "dimnames")=List of 3
  ..$ X: NULL
  ..$ Y: NULL
  ..$ Z: NULL
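
The names<- case above can be sketched as a script that does not stop at the error. The outcome of the nested call may differ between R versions, so it is only captured in the illustrative variable `nested`. Assigning a complete, named dimnames list in one step (here on a second array `b`, also an illustrative name) is one possible way to sidestep the nested replacement; it is an assumption of this sketch, not something proposed in the thread.

```r
# Nested replacement on an array without dimnames: capture the outcome.
a <- array(numeric(), dim = c(2, 0, 1))
nested <- tryCatch({
  names(dimnames(a)) <- c("X", "Y", "Z")
  "ok"
}, error = function(e) conditionMessage(e))
print(nested)

# Possible workaround: set names and values of dimnames in a single step.
b <- array(numeric(), dim = c(2, 0, 1))
dimnames(b) <- list(X = c("X1", "X2"), Y = NULL, Z = "Z1")
str(b)
```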

Bill Dunlap
TIBCO Software
wdunlap tibco.com




Re: [Rd] dimnames incoherence?

2020-02-19 Thread Martin Maechler
> Martin Maechler 
> on Wed, 19 Feb 2020 18:06:57 +0100 writes:

> Serguei Sokol 
> on Wed, 19 Feb 2020 15:21:21 +0100 writes:

>> Hi,
>> I was bitten by a small inconsistency in dimnames assignment, or maybe I
>> missed something.
>> Here is the case. If I assign row names via dimnames(a)[[1]] when
>> nrow(a)=1, an error is thrown. But if I do the same when nrow(a) > 1,
>> it's OK. Is one of these cases behaving unexpectedly? Both? Neither?

>> a=as.matrix(1)
>> dimnames(a)[[1]]="a" # error: 'dimnames' must be a list

>> aa=as.matrix(1:2)
>> dimnames(aa)[[1]]=c("a", "b") # OK

>> In the second case, dimnames(aa) is not a list either (just as in the
>> first case), but it works.
>> I would expect either both to work or neither.

> I agree (even though I'm strongly advising people to use '<-'
> instead of '=');
> which in this case helps you get the name of the function really
> involved:  It is  `dimnames<-`  (which is implemented in C
> entirely, for matrices and other arrays).

As a matter of fact, I wrote too quickly: the culprit here is
the  `[[<-`  function (rather than `dimnames<-`),
which has a special "inconsistency" feature when used to "add to NULL".
It was almost surely inherited from S, but I now think we should
consider dropping it as we aim for R 4.0.0:

It's documented in ?Extract  that  length 1  `[[.]]`-assignment works
specially for NULL (and dimnames(.) are NULL here).

Note you need to read and understand one of the tougher sections
in the official  'R Language Definition'  Manual,
section -- 3.4.4 Subset assignment ---
i.e.,
https://cran.r-project.org/doc/manuals/r-release/R-lang.html#Subset-assignment

notably this part: 

Nesting of complex assignments is evaluated recursively 

 names(x)[3] <- "Three"

is equivalent to

 `*tmp*` <- x
 x <- "names<-"(`*tmp*`, value="[<-"(names(`*tmp*`), 3, value="Three"))
 rm(`*tmp*`)

and then apply this to our

 dimnames(a)[[1]] <- "a"

and so replace

 -  'names<-' by 'dimnames<-'
 -  '[<-' by '[[<-'
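
Spelled out as runnable R, the expansion and the substitution above might look like the sketch below. It uses the two-row matrix from the original report, because that case works; the names `x1`, `x2`, `tmp`, `inner` and `aa2` are illustrative only.

```r
# names(x)[3] <- "Three", written directly and as its expanded form.
x  <- c(1, 2, 3)
x1 <- x
names(x1)[3] <- "Three"

tmp <- x
x2  <- `names<-`(tmp, value = `[<-`(names(tmp), 3, value = "Three"))
print(identical(x1, x2))

# Same pattern with 'dimnames<-' and '[[<-' for the two-row matrix.
aa    <- as.matrix(1:2)
inner <- `[[<-`(dimnames(aa), 1, value = c("a", "b"))  # dimnames(aa) is NULL
aa2   <- `dimnames<-`(aa, value = inner)
print(rownames(aa2))
```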

--

Here is the rest of my analysis as valid R code
{this is not new, Peter Dalgaard had explained this 10 or 20
 years ago to a mailing list audience IIRC} : 

## MM: The problematic behavior (bug ?) is in `[[<-`, not in `dimnames<-` :

`[[<-`(NULL, 1,   "a" ) # gives  "a"  (*not* a list)
`[[<-`(NULL, 1, c("a","b")) # gives list(c("a","b"))  !!

##==> in C code: in  subassign.c  [ ~/R/D/r-devel/R/src/main/subassign.c ]
##==> function (~ 340 lines)
##do_subassign2_dflt(SEXP call, SEXP op, SEXP args, SEXP rho)
## has
"
line   svn r.  svn auth.  c.o.d.e...
----   ------  ---------  ----------
1741     4166  ihaka      if (isNull(x)) {
1742    45446  ripley         if (isNull(y)) {
1743    76166  luke               UNPROTECT(2); /* args, y */
1744     4166  ihaka              return x;
1745    45446  ripley         }
1746    35680  murdoch        if (length(y) == 1)
1747    68094  luke               x = allocVector(TYPEOF(y), 0);
1748    24954  ripley         else
1749    68094  luke               x = allocVector(VECSXP, 0);
1750     1820  ihaka      }
----   ------  ---------  ----------
"
## so clearly, in case the value is of length 1, no list is created.

## For dimnames<-, replacing NULL by list() should be done in both cases, and
## then things work:
`[[<-`(list(), 1,   "a" ) # gives list( "a" )
`[[<-`(list(), 1, c("a","b")) # gives list(c("a","b"))  !!

## but the problem here is that  `[[<-` at this time in the game
## does *not* know that it comes from dimnames<- 
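
The four calls above can be compared side by side with the sketch below. The result of the length-1 assignment to NULL is the behavior under discussion and may differ between R versions, so it is printed rather than assumed; the `res_*` names are illustrative.

```r
# Assigning into NULL versus into an empty list with `[[<-`.
res_null_1 <- `[[<-`(NULL, 1, "a")            # the disputed length-1 case
res_null_2 <- `[[<-`(NULL, 1, c("a", "b"))
res_list_1 <- `[[<-`(list(), 1, "a")
res_list_2 <- `[[<-`(list(), 1, c("a", "b"))

# Report which results are lists.
print(c(null_len1 = is.list(res_null_1),
        null_len2 = is.list(res_null_2),
        list_len1 = is.list(res_list_1),
        list_len2 = is.list(res_list_2)))
```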

---

If we change the behavior of `[[<-` assignment to NULL from

 `[[<-`(NULL, 1, "a" ) # gives  "a"  (*not* a list)

to

 `[[<-`(NULL, 1, "a" ) # gives  list("a")


then we have more consistency there *and* your bug is fixed too.
Of course, in other situations back-compatibility would be
broken as well.

At the moment, I think we (R Core Team) should consider doing
that here.

Martin


