[Rd] Sys.readlink (on BSD vs Linux)

2016-02-29 Thread Sven E. Templer
Hello all,

the function `Sys.readlink` uses the system's readlink command to resolve 
symlink paths. On OSX/BSD the command has a different meaning than on Linux [1].

There exists the tool 'realpath', which seems suitable for the task, at least 
applied at the command line level [2]. It is used in `normalizePath`.

I suggest the following (at least the latter):
* use realpath instead of readlink within Sys.readlink (do_readlink -> 
do_normalizepath)
* link to `normalizePath` in the Rd document, possibly mentioning the 
difference

Many thanks,
Sven

[1] see
https://www.freebsd.org/cgi/man.cgi?query=readlink
vs
http://linux.die.net/man/1/readlink

[2]
https://www.freebsd.org/cgi/man.cgi?query=realpath
http://linux.die.net/man/1/realpath


---

Sven E. Templer
Bioinformatics Core Group

Max Planck Institute for Biology of Ageing
Joseph-Stelzmann-Strasse 9b
50931 Cologne, Germany

Phone: 0049 (0)221 37970 325
temp...@age.mpg.de
http://www.age.mpg.de/the-science/core-facilities/bioinformatics/

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Sys.readlink (on BSD vs Linux)

2016-02-29 Thread Mikko Korpela
On 29.02.2016 10:34, Sven E. Templer wrote:
> Hello together,
> 
> the function `Sys.readlink` uses the system's readlink command to resolve 
> symlink paths. On OSX/BSD the command has a different meaning than on Linux 
> [1].
> 
> There exists the tool 'realpath', which seems suitable for the task, at least 
> applied at the command line level [2]. It is used in `normalizePath`.
> 
> I suggest (at least the latter) to
> * use realpath instead readlink within Sys.readlink (do_readlink -> 
> do_normalizepath)
> * link to `normalizePath` in the Rd document, eventually mentioning the 
> difference
> 
> Many thanks,
> Sven
> 
> [1] see
> https://www.freebsd.org/cgi/man.cgi?query=readlink
> vs
> http://linux.die.net/man/1/readlink
> 
> [2]
> https://www.freebsd.org/cgi/man.cgi?query=realpath
> http://linux.die.net/man/1/realpath

What do you mean by "different meaning"? How are the command line tools
[1] relevant when R is using the C function 'readlink'?

http://pubs.opengroup.org/onlinepubs/9699919799/functions/readlink.html
https://www.freebsd.org/cgi/man.cgi?query=readlink&sektion=2
http://man7.org/linux/man-pages/man2/readlink.2.html

-- 
Mikko Korpela
Aalto University School of Science
Department of Computer Science



Re: [Rd] Sys.readlink (on BSD vs Linux)

2016-02-29 Thread Sven Templer
Hello,

sorry for not being clear enough.

The following code, run on OS X, illustrates my problem:

mkdir ~/test
ln -s ~/test ~/testlink
touch ~/test/foo
Rscript -e 'Sys.readlink(c("~/test/foo", "~/testlink/foo")); 
normalizePath(c("~/test/foo","~/testlink/foo"))'

I expected `Sys.readlink` to show the same output as `normalizePath`.
Also, I assumed that the readlink.h used by R is the same as the one behind the 
system's `readlink` command, and thus mimics the command-line difference.

Am I wrong about the latter? In any case, the behaviour is confusing, hence the 
request to at least mention `normalizePath` in the Rd of `Sys.readlink`.

Best,
Sven




Re: [Rd] Sys.readlink (on BSD vs Linux)

2016-02-29 Thread Sven Templer

> On 29 Feb 2016, at 11:59, Sven Templer  wrote:
> 
> Also, I think the readlink.h imported to R to be the same as from the 
> system's `readlink` command, thus mimicking the command line difference.

Please ignore this statement, sorry.


Re: [Rd] Sys.readlink (on BSD vs Linux)

2016-02-29 Thread Simon Urbanek

> On Feb 29, 2016, at 5:59 AM, Sven Templer  wrote:
> 
> Hello,
> 
> sorry for not being clear enough.
> 
> My problem is represented with the following code, running on OSX:
> 
> mkdir ~/test
> ln -s ~/test ~/testlink
> touch ~/test/foo
> Rscript -e 'Sys.readlink(c("~/test/foo", "~/testlink/foo")); 
> normalizePath(c("~/test/foo","~/testlink/foo"))'
> 
> I expected `Sys.readlink` to show the same output as `normalizePath`.


Why? To quote from the Sys.readlink() docs:

Value:

 A character vector of the same length as ‘paths’.  The entries are
 the path of the file linked to, ‘""’ if the path is not a symbolic
 link.

Since you are referring to a file and not a link, the result is "", as expected, 
on both OS X and Linux.
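
To make the distinction concrete, here is a minimal C sketch of the two POSIX calls discussed in this thread. It is not R's actual implementation: the helper names link_target, canonical_path and same_location are made up for illustration, and only readlink(2) and realpath(3) are real. readlink() fails for anything that is not itself a symbolic link, which corresponds to the "" entry documented for Sys.readlink(), whereas realpath() resolves every symlink along the path, which is presumably closer to what was expected here.

```c
#define _XOPEN_SOURCE 700   /* expose readlink() and realpath() under strict -std */
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: copy the target of the symlink `path` into buf.
   Returns 1 on success. If `path` is not itself a symlink (or on any other
   error), buf is set to "" and 0 is returned. */
int link_target(const char *path, char *buf, size_t bufsize)
{
    assert(bufsize > 0);
    ssize_t n = readlink(path, buf, bufsize - 1);  /* does not NUL-terminate */
    if (n < 0) {
        buf[0] = '\0';
        return 0;
    }
    buf[n] = '\0';
    return 1;
}

/* Hypothetical helper: resolve all symlinks, "." and ".." in `path`.
   buf must hold at least PATH_MAX bytes. Returns 1 on success, 0 on error. */
int canonical_path(const char *path, char *buf)
{
    return realpath(path, buf) != NULL;
}

/* Hypothetical helper: 1 if both paths resolve to the same canonical path. */
int same_location(const char *p, const char *q)
{
    char a[PATH_MAX], b[PATH_MAX];
    return canonical_path(p, a) && canonical_path(q, b) && strcmp(a, b) == 0;
}
```

With the layout from the example above, link_target() would be expected to yield "" for both ~/test/foo and ~/testlink/foo (after expanding ~), because the last path component is a regular file in both cases, while same_location() would report that the two paths refer to the same file.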



Re: [Rd] Sys.readlink (on BSD vs Linux)

2016-02-29 Thread Sven Templer
Yes,

`Sys.readlink` is returning values as documented and expected.
I confused the C library functions with the coreutils commands and did not 
read carefully enough; please excuse me for that.
In my opinion, a link to `normalizePath` in the 'See Also' section would help.

Regards,
Sven


[Rd] Source code of early S versions

2016-02-29 Thread Barry Rowlingson
According to Wikipedia:

"In 1980 the first version of S was distributed outside Bell
Laboratories and in 1981 source versions were made available."

but I've been unable to locate any version of S online. Does anyone
have a copy, somewhere, rusting away on an old hard disk or slowly
flaking off a tape? I've had a rummage round the CMU Statlib on
archive.org but no sign of it, and it's hard to search for "S"
generally.

 Obviously this would be for archaeological purposes, but there's
bound to be someone out there who'd like to try and compile it on a
modern system. It might at least be nice to see it in a nice format on
Gitlab, for example. But maybe there are licensing problems.

 Anyone interested in the history of S should read Richard Becker's
article from the mid 90s:

http://sas.uwaterloo.ca/~rwoldfor/software/R-code/historyOfS.pdf

Barry

[apologies if S talk is off-topic. Surprisingly I've just discovered
the S-news mailing list still runs, but looking at the recent archive
I don't think I'd get much success there]



Re: [Rd] [patch] Support many columns in model.matrix

2016-02-29 Thread Martin Maechler
> Karl Millar via R-devel 
> on Fri, 26 Feb 2016 15:58:20 -0800 writes:

> Generating a model matrix with very large numbers of
> columns overflows the stack and/or runs very slowly, due
> to the implementation of TrimRepeats().

> This patch modifies it to use Rf_duplicated() to find the
> duplicates.  This makes the running time linear in the
> number of columns and eliminates the recursive function
> calls.
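
As a rough illustration of why hashing helps, here is a small C sketch. It is not the patched TrimRepeats() code: trim_repeats below is a hypothetical stand-in that works on plain ints rather than on R's model terms, and the hash table is a simple open-addressing one written for this example. Comparing every element against all others costs quadratic time, whereas a single pass with a hash table, which is the general idea behind duplicated(), is linear on average and needs no recursion.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: remove repeated values from x[0..n-1] in place,
   keeping the first occurrence of each value in its original order.
   Returns the new length, or -1 if memory allocation fails. */
int trim_repeats(int *x, int n)
{
    assert(n >= 0);
    if (n == 0)
        return 0;

    /* Table size: a power of two at least 2*n, so the load factor is <= 0.5. */
    size_t cap = 1;
    while (cap < 2 * (size_t)n)
        cap <<= 1;

    /* Each slot holds an index into the already compacted prefix of x,
       or -1 if the slot is empty. */
    int *slot = malloc(cap * sizeof *slot);
    if (slot == NULL)
        return -1;
    for (size_t i = 0; i < cap; i++)
        slot[i] = -1;

    int kept = 0;
    for (int i = 0; i < n; i++) {
        /* Multiplicative hash, then linear probing on collisions. */
        size_t h = ((size_t)(unsigned)x[i] * 2654435761u) & (cap - 1);
        while (slot[h] != -1 && x[slot[h]] != x[i])
            h = (h + 1) & (cap - 1);
        if (slot[h] == -1) {        /* first time this value is seen */
            slot[h] = kept;
            x[kept++] = x[i];
        }
    }
    free(slot);
    return kept;
}
```

Each element is hashed once and probed a small expected number of times, so the work grows in proportion to n, and the loop replaces any recursive list traversal.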

Thank you, Karl.
I've committed this (very slightly modified) to R-devel,

(after also looking for an example that runs on a non-huge
 computer and shows the difference):

nF <- 11 ; set.seed(1)
lff <- setNames(replicate(nF, as.factor(rpois(128, 1/4)), simplify=FALSE), 
letters[1:nF])
str(dd <- as.data.frame(lff)); prod(sapply(dd, nlevels))
## 'data.frame':	128 obs. of  11 variables:
##  $ a: Factor w/ 3 levels "0","1","2": 1 1 1 2 1 2 2 1 1 1 ...
##  $ b: Factor w/ 3 levels "0","1","2": 1 1 1 1 1 1 2 1 1 1 ...
##  $ c: Factor w/ 3 levels "0","1","2": 1 1 1 2 1 1 1 2 1 1 ...
##  $ d: Factor w/ 3 levels "0","1","2": 1 1 2 2 1 2 1 1 2 1 ...
##  $ e: Factor w/ 3 levels "0","1","2": 1 1 1 1 1 1 1 1 2 1 ...
##  $ f: Factor w/ 2 levels "0","1": 2 1 2 1 2 1 1 2 1 2 ...
##  $ g: Factor w/ 4 levels "0","1","2","3": 2 1 1 2 1 3 1 1 1 1 ...
##  $ h: Factor w/ 4 levels "0","1","2","4": 1 1 1 1 2 1 1 1 1 1 ...
##  $ i: Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 2 ...
##  $ j: Factor w/ 3 levels "0","1","2": 1 2 3 1 1 1 1 1 1 1 ...
##  $ k: Factor w/ 3 levels "0","1","2": 1 1 1 1 1 1 1 1 1 1 ...
## 
## [1] 139968

system.time(mff <- model.matrix(~ . ^ 11, dd, contrasts = list(a = 
"contr.helmert")))
##  user  system elapsed 
## 0.255   0.033   0.287  --- *with* the patch on my desktop (16 GB)
## 1.489   0.031   1.522  --- for R-patched (i.e. w/o the patch)

> dim(mff)
[1]    128 139968
> object.size(mff)
154791504 bytes

---

BTW: These examples would gain tremendously if I finally got
 around to providing

   model.matrix(..., sparse = TRUE)

which would then produce a Matrix-package sparse matrix.

Even for this somewhat small case, a sparse matrix is smaller by a
factor of 13.5:

> s1 <- object.size(mff); s2 <- object.size(M <- Matrix::Matrix(mff)); 
> as.vector( s1/s2 )
[1] 13.47043

I'm happy to collaborate with you on adding such a (C level)
interface to sparse matrices for this case.

Martin Maechler



[Rd] Function name exported incorrectly in DLL, strange entries in tmp.def

2016-02-29 Thread Stravs, Michael
Hi,

I originally posted this on the Rcpp github tracker, but it was suggested I 
post it here.

I tried to compile the package https://github.com/khabbazian/sparseAHC/ under 
Windows. The package requires C++11 so I had to install the R devel build with 
gcc 4.9.3, and the latest Rtools.

I got compilation and installation to work using Rcpp (0.12.3, from CRAN 
source). Package loads fine. However, when I tried to use the functions:
* the Rcpp exported function ```sparseAHC_dgCIsSymmetric``` works correctly
* the Rcpp exported function ```sparseAHC_run_sparseAHC``` doesn't work.

I could not see anything wrong with the source files and therefore looked at 
the DLL with DependencyWalker. Interestingly:
* ```sparseAHC_dgCIsSymmetric``` is named correctly
* ```sparseAHC_run_sparseAHC``` is named
```sparseAHC_run_sparseAHC.weak._ZNSt4listIiSaIiEE7emplaceIJiEEESt14_List_iteratorIiESt20_List_const_iteratorIiEDpOT_._ZNK4Rcpp14not_compatible4whatEv.weak._ZNSt6vectorIdSaIdEE19_M_emplace_back_auxIJdEEEvDpOT_._ZNK4Rcpp14not_compatible4whatEv.weak._ZNSt6vectorIiSaIiEE19_M_emplace_back_auxIJRKiEEEvDpOT_._ZNK4Rcpp14not_compatible4whatEv.weak._ZNSt6vectorISt4pairIS0_IiiEdESaIS2_EE12emplace_backIJS2_EEEvDpOT_._ZNK4Rcpp14not_compatible4whatEv.weak._ZNSt6vectorISt4pairIS0_IiiEdESaIS2_EE19_M_emplace_back_auxIJS2_EEEvDpOT_._ZNK4Rcpp14not_compatible4whatEv.weak._ZNSt8_Rb_treeIiSt4pairIKiSt14_List_iteratorI7EdgeObjEESt10_Select1stIS5_ESt4lessIiESaIS5_EE22_M_emplace_hint_uniqueIJRKSt21piecewise_construct_tSt5tupleIJRS1_EESG_IJESt17_Rb_tree_iteratorIS5_ESt23_Rb_tree_const_iteratorIS5_EDpOT_._ZNK4Rcpp14not_compatible4whatEv.weak._ZNSt8_Rb_treeIiSt4pairIKiSt14_List_iteratorIiEESt10_Select1stIS4_ESt4lessIiESaIS4_EE22_M_emplace_hint_uniqueIJRKSt21piecewise_construct_tSt5tupleIJOiEESF_IJESt17_Rb_tree_iteratorIS4_ESt23_Rb_tree_const_iteratorIS4_EDpOT_._ZNK4Rcpp14not_compatible4whatEv
 instead!

To find out what is going on, I compiled again and captured the ```tmp.def``` 
which is generated during compilation. As one can see, directly after the 
problematic function name there are a lot of entries starting with ```.weak``` 
that are apparently picked up incorrectly by the linker:
```
[...]
_ZZN4Rcpp8internal12exitRNGScopeEvE3fun
_ZZN4Rcpp8internal13enterRNGScopeEvE3fun
sparseAHC_dgCIsSymmetric
sparseAHC_run_sparseAHC
.weak._ZNSt4listIiSaIiEE7emplaceIJiEEESt14_List_iteratorIiESt20_List_const_iteratorIiEDpOT_._ZNK4Rcpp14not_compatible4whatEv
.weak._ZNSt6vectorIdSaIdEE19_M_emplace_back_auxIJdEEEvDpOT_._ZNK4Rcpp14not_compatible4whatEv
.weak._ZNSt6vectorIiSaIiEE19_M_emplace_back_auxIJRKiEEEvDpOT_._ZNK4Rcpp14not_compatible4whatEv
.weak._ZNSt6vectorISt4pairIS0_IiiEdESaIS2_EE12emplace_backIJS2_EEEvDpOT_._ZNK4Rcpp14not_compatible4whatEv
.weak._ZNSt6vectorISt4pairIS0_IiiEdESaIS2_EE19_M_emplace_back_auxIJS2_EEEvDpOT_._ZNK4Rcpp14not_compatible4whatEv
.weak._ZNSt8_Rb_treeIiSt4pairIKiSt14_List_iteratorI7EdgeObjEESt10_Select1stIS5_ESt4lessIiESaIS5_EE22_M_emplace_hint_uniqueIJRKSt21piecewise_construct_tSt5tupleIJRS1_EESG_IJESt17_Rb_tree_iteratorIS5_ESt23_Rb_tree_const_iteratorIS5_EDpOT_._ZNK4Rcpp14not_compatible4whatEv
.weak._ZNSt8_Rb_treeIiSt4pairIKiSt14_List_iteratorIiEESt10_Select1stIS4_ESt4lessIiESaIS4_EE22_M_emplace_hint_uniqueIJRKSt21piecewise_construct_tSt5tupleIJOiEESF_IJESt17_Rb_tree_iteratorIS4_ESt23_Rb_tree_const_iteratorIS4_EDpOT_._ZNK4Rcpp14not_compatible4whatEv
_Z12order_leavesRN5Eigen6MatrixIdLin1ELin1ELi0ELin1ELin1EEEi
_Z13run_sparseAHCN5Eigen12SparseMatrixIdLi0EiEEN4Rcpp6VectorILi16ENS2_15PreserveStorageEEE
_Z14dgCIsSymmetricN5Eigen12SparseMatrixIdLi0EiEEd
[...]
```

I cannot find this problem documented anywhere. But it seems that somehow 
additional exports are generated that start with ```.weak```, and the linker 
mangles all of them into one function name.

Help?

Michael Stravs
Eawag
Umweltchemie
BU E 23
Überlandstrasse 133
8600 Dübendorf
+41 58 765 6742



Re: [Rd] Source code of early S versions

2016-02-29 Thread John Chambers
The Wikipedia statement may be a bit misleading.

S was never open source.  Source versions would only have been available with a 
nondisclosure agreement, and relatively few copies would have been distributed 
in source.  There was a small but valuable "beta test" network, mainly 
university statistics departments.

And two shameless plugs:

1.  there is a chapter on the history of all this in my forthcoming book on 
"Extending R"

2. Rick Becker will give a keynote talk on the history of S at the useR! 2016 
conference (user2016.org); 2016 is the 40th anniversary of the first work on S.

John

PS:  somehow "historical" would be less unnerving than "archeological"




Re: [Rd] [patch] Support many columns in model.matrix

2016-02-29 Thread Karl Millar via R-devel
Thanks.

Couldn't you implement model.matrix(..., sparse = TRUE)  with a small
amount of R code similar to MatrixModels::model.Matrix ?



Re: [Rd] iconv to UTF-16 encoding produces error due to embedded nulls (write.table with fileEncoding param)

2016-02-29 Thread Duncan Murdoch
I have just committed your first patch (the strlen() replacement) to 
R-devel, and will soon put it in R-patched as well.  I won't have time to 
look at this again before the 3.2.4 release, so your file.show() patch 
isn't going to make it unless someone else gets to it.


There's still a faint chance that I'll do more in R-devel before 3.3.0, 
but I think it would be best if there were bug reports about both of these 
problems so they don't get forgotten.  Since the first one is mainly a 
Windows problem, I'll write that one up; I'd appreciate it if you could 
write up the file.show() issue, after checking against R-devel rev 70247 
or higher.


Duncan Murdoch

On 25/02/2016 5:54 AM, Mikko Korpela wrote:

On 25.02.2016 11:31, Mikko Korpela wrote:

On 23.02.2016 14:06, Mikko Korpela wrote:

On 23.02.2016 11:37, Martin Maechler wrote:

nospam@altfeld-im de 
 on Mon, 22 Feb 2016 18:45:59 +0100 writes:


 > Dear R developers
 > I think I have found a bug that can be reproduced with two lines of code
 > and I am very thankful to get your first assessment or feed-back on my
 > report.

 > If this is the wrong mailing list or I did something wrong
 > (e. g. semi "anonymous" email address to protect my privacy and defend
 > unwanted spam) please let me know since I am new here.

 > Thank you very much :-)

 > J. Altfeld

Dear J.,
(yes, a bit less anonymity would be very welcomed here!),

You are right, this is a bug, at least in the documentation, but
probably "all real", indeed,

but read on.

 > On Tue, 2016-02-16 at 18:25 +0100, nos...@altfeld-im.de wrote:
 >>
 >>
 >> If I execute the code from the "?write.table" examples section
 >>
 >> x <- data.frame(a = I("a \" quote"), b = pi)
 >> # (omitted code)
 >> write.csv(x, file = "foo.csv", fileEncoding = "UTF-16LE")
 >>
 >> the resulting CSV file has a size of 6 bytes which is too short
 >> (truncated):
 >>
 >> """,3

reproducibly, yes.
If you look at what write.csv does
and then simplify, you can get a similar wrong result by

   write.table(x, file = "foo.tab", fileEncoding = "UTF-16LE")

which results in a file with one line

""" 3

and if you debug  write.table() you see that its building blocks
here are
 file <- file(, encoding = fileEncoding)

a writeLines(*, file=file)  for the column headers,

and then "deeper down" C code which I did not investigate.


I took a look at connections.c. There is a call to strlen() that gets
confused by null characters. I think the obvious fix is to avoid the
call to strlen() as the size is already known:

Index: src/main/connections.c
===================================================================
--- src/main/connections.c  (revision 70213)
+++ src/main/connections.c  (working copy)
@@ -369,7 +369,7 @@
                 /* is this safe? */
                 warning(_("invalid char string in output conversion"));
                 *ob = '\0';
-                con->write(outbuf, 1, strlen(outbuf), con);
+                con->write(outbuf, 1, ob - outbuf, con);
             } while(again && inb > 0);  /* it seems some iconv signal -1 on
                                            zero-length input */
         } else
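
The effect is easy to reproduce outside R. In the sketch below, all three helper names are hypothetical and the ASCII-only encoder merely stands in for iconv(). Every ASCII character becomes two bytes in UTF-16LE, the second being zero, so strlen() stops after the first byte while the pointer difference gives the real length.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for iconv(): encode ASCII text as UTF-16LE into out.
   Returns a pointer one past the last byte written (like `ob` in the patch)
   and stores a terminating '\0' there, as connections.c does. */
char *ascii_to_utf16le(const char *text, char *out)
{
    for (; *text != '\0'; text++) {
        *out++ = *text;   /* low byte: the ASCII code */
        *out++ = '\0';    /* high byte: always zero for ASCII */
    }
    *out = '\0';
    return out;
}

/* Byte count a strlen()-based write would use: stops at the first zero byte. */
size_t bytes_by_strlen(const char *outbuf)
{
    return strlen(outbuf);
}

/* Byte count from the known end pointer, as in the patched line. */
size_t bytes_by_pointer(const char *outbuf, const char *ob)
{
    assert(ob >= outbuf);
    return (size_t)(ob - outbuf);
}
```

For the three characters "CBA", the buffer holds six bytes of data, yet the strlen()-based count is 1, which is consistent with the truncated files reported earlier in this thread.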




But just looking a bit at such a file() object with writeLines()
seems slightly revealing, as e.g., 'eol' does not seem to
"work" for this encoding:

 > fn <- tempfile("ffoo"); ff <- file(fn, open="w", encoding = "UTF-16LE")
 > writeLines(LETTERS[3:1], ff); writeLines("|", ff); writeLines(">a", ff)
 > close(ff)
 > file.show(fn)
 CBA|>
 > file.size(fn)
 [1] 5
 >


With the patch applied:

 > readLines(fn, encoding="UTF-16LE", skipNul=TRUE)
 [1] "C"  "B"  "A"  "|"  ">a"
 > file.size(fn)
 [1] 22

I just realized that I was misusing the encoding argument of
readLines(). The code above works by accident, but the following would
be more appropriate:

 > ff <- file(fn, open="r", encoding="UTF-16LE")
 > readLines(ff)
 [1] "C"  "B"  "A"  "|"  ">a"
 > close(ff)

Testing on Linux, with the patch applied. (As noted by Duncan Murdoch,
the patch is incomplete on Windows.)

Before inspecting the file with readLines() I tried file.show() but it
did not work as expected. On Linux using a UTF-8 locale, the result of
trying to show the truly UTF-16LE encoded file with

 > file.show(fn, encoding="UTF-16LE")

was a pager showing "<43>" (quotes not included) followed by several
empty lines.

With the following patch, the command works correctly (in this case, on
this platform, not tested comprehensively). The idea is to read the
input file "raw" in order to avoid problems with null characters. The
input then needs to be split into lines after iconv(), or it could be
written to the output file with cat() if the style of line termination
characters does not matter. The 'perl = TRUE' is for assumed performance
advantage only. It can be removed, or one might want to test if there is
a

Re: [Rd] Source code of early S versions

2016-02-29 Thread Barry Rowlingson
On Mon, Feb 29, 2016 at 6:17 PM, John Chambers  wrote:
> The Wikipedia statement may be a bit misleading.
>
> S was never open source.  Source versions would only have been available with 
> a nondisclosure agreement, and relatively few copies would have been 
> distributed in source.  There was a small but valuable "beta test" network, 
> mainly university statistics departments.

So it was free (or at least distribution cost only), but with a
nondisclosure agreement? Did binaries circulate freely, legally or
otherwise? Okay, guess I'll read the book.

 I'm sure I saw S source early in my career (1990 or so), possibly on
an early Sun 3/60 system or even the on-the-way-out Whitechapel MG-1
workstations.

> And two shameless plugs:
>
> 1.  there is a chapter on the history of all this in my forthcoming book on 
> "Extending R"

 That will sit nicely on the shelf next to "Extending The S System"
that Allan Wilks gave me :)

> PS:  somehow "historical" would be less unnerving than "archeological"

 At least I didn't say palaeontological.

Thanks for the response.

Barry



Re: [Rd] Source code of early S versions

2016-02-29 Thread Jari Oksanen

> On 29 Feb 2016, at 20:54 pm, Barry Rowlingson  
> wrote:
> 
> On Mon, Feb 29, 2016 at 6:17 PM, John Chambers  wrote:
>> The Wikipedia statement may be a bit misleading.
>> 
>> S was never open source.  Source versions would only have been available 
>> with a nondisclosure agreement, and relatively few copies would have been 
>> distributed in source.  There was a small but valuable "beta test" network, 
>> mainly university statistics departments.
> 
> So it was free (or at least distribution cost only), but with a
> nondisclosure agreement? Did binaries circulate freely, legally or
> otherwise? Okay, guess I'll read the book.
> 
I don’t think I have seen S source, but some other Bell software has a license 
of this type:

C THIS INFORMATION IS PROPRIETARY AND IS THE
C PROPERTY OF BELL TELEPHONE LABORATORIES,
C INCORPORATED.  ITS REPRODUCTION OR DISCLOSURE
C TO OTHERS, EITHER ORALLY OR IN WRITING, IS
C PROHIBITED WITHOUT WRITTEN PRERMISSION OF
C BELL LABORATORIES.

C IT IS UNDERSTOOD THAT THESE MATERIALS WILL BE USED FOR
C EDUCATIONAL AND INSTRUCTIONAL PURPOSES ONLY.

(Obviously in FORTRAN)

So the code was “open” in the sense that you could see it, and it had to be 
“open” because source code was the only way to distribute software before 
the era of widespread platforms that allowed binary distribution (such as 
VAX/VMS or Intel/MS-DOS). However, the license in effect says that although 
you can see the code, you are not even allowed to tell anybody that you have 
seen it. I don’t know how this is interpreted currently, but you could ask the 
current owner, Nokia.

Cheers, Jari Oksanen

Re: [Rd] iconv to UTF-16 encoding produces error due to embedded nulls (write.table with fileEncoding param)

2016-02-29 Thread Mikko Korpela
The file.show() issue is now in the bug tracker. I used a slightly
different example to demonstrate the problem.

https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=16738

- Mikko

On 29.02.2016 20:30, Duncan Murdoch wrote:
> I have just committed your first patch (the strlen() replacement) to
> R-devel, and will soon put it in R-patched as well.  I wont have time to
> look at this again before the 3.2.4 release, so your file.show() patch
> isn't going to make it unless someone else gets to it.
> 
> There's still a faint chance that I'll do more in R-devel before 3.3.0,
> but I think it's best if there were bug reports about both of these
> problems so they don't get forgotten.  Since the first one is mainly a
> Windows problem, I'll write that one up; I'd appreciate it if you could
> write up the file.show() issue, after checking against R-devel rev 70247
> or higher.
> 
> Duncan Murdoch
> 
> On 25/02/2016 5:54 AM, Mikko Korpela wrote:
>> On 25.02.2016 11:31, Mikko Korpela wrote:
>>> On 23.02.2016 14:06, Mikko Korpela wrote:
 On 23.02.2016 11:37, Martin Maechler wrote:
>> nospam@altfeld-im de 
>>  on Mon, 22 Feb 2016 18:45:59 +0100 writes:
>
>  > Dear R developers
>  > I think I have found a bug that can be reproduced with two
> lines of code
>  > and I am very thankful to get your first assessment or
> feed-back on my
>  > report.
>
>  > If this is the wrong mailing list or I did something wrong
>  > (e. g. semi "anonymous" email address to protect my privacy
> and defend
>  > unwanted spam) please let me know since I am new here.
>
>  > Thank you very much :-)
>
>  > J. Altfeld
>
> Dear J.,
> (yes, a bit less anonymity would be very welcome here!),
>
> You are right, this is a bug, at least in the documentation, but
> probably "all real", indeed,
>
> but read on.
>
>  > On Tue, 2016-02-16 at 18:25 +0100, nos...@altfeld-im.de wrote:
>  >>
>  >>
>  >> If I execute the code from the "?write.table" examples section
>  >>
>  >> x <- data.frame(a = I("a \" quote"), b = pi)
>  >> # (omitted code)
>  >> write.csv(x, file = "foo.csv", fileEncoding = "UTF-16LE")
>  >>
>  >> the resulting CSV file has a size of 6 bytes which is too
> short
>  >> (truncated):
>  >>
>  >> """,3
>
> reproducibly, yes.
> If you look at what write.csv does
> and then simplify, you can get a similar wrong result by
>
>    write.table(x, file = "foo.tab", fileEncoding = "UTF-16LE")
>
> which results in a file with one line
>
> """ 3
>
> and if you debug  write.table() you see that its building blocks
> here are
>  file <- file(, encoding = fileEncoding)
>
> a  writeLines(*, file=file)  for the column headers,
>
> and then "deeper down" C code which I did not investigate.

 I took a look at connections.c. There is a call to strlen() that gets
 confused by null characters. I think the obvious fix is to avoid the
 call to strlen() as the size is already known:

 Index: src/main/connections.c
 ===================================================================
 --- src/main/connections.c (revision 70213)
 +++ src/main/connections.c (working copy)
 @@ -369,7 +369,7 @@
   /* is this safe? */
   warning(_("invalid char string in output conversion"));
   *ob = '\0';
 -con->write(outbuf, 1, strlen(outbuf), con);
 +con->write(outbuf, 1, ob - outbuf, con);
   } while(again && inb > 0);  /* it seems some iconv signal -1 on
  zero-length input */
   } else
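
To illustrate why the strlen() call in the patch above loses data, here is a minimal sketch in C. It is not code from connections.c: the function names and the fill_utf16le() helper are hypothetical, and the helper only imitates what iconv() would leave in the buffer for plain ASCII input converted to UTF-16LE.

```c
#include <stddef.h>
#include <string.h>

/* Length as the old code computed it: stops at the first 0x00 byte. */
size_t len_by_strlen(const char *outbuf)
{
    return strlen(outbuf);
}

/* Length as the patch computes it: distance from the start of the buffer
   to the write position, so embedded 0x00 bytes are counted too. */
size_t len_by_pointer(const char *outbuf, const char *ob)
{
    return (size_t)(ob - outbuf);
}

/* Hypothetical helper (assumption, not R code): store ASCII text as
   UTF-16LE, i.e. each character followed by 0x00, and return the position
   just past the last byte written. The caller must supply at least
   2 * strlen(ascii) + 1 bytes. */
char *fill_utf16le(char *outbuf, const char *ascii)
{
    char *ob = outbuf;
    while (*ascii) {
        *ob++ = *ascii++;
        *ob++ = '\0';
    }
    *ob = '\0';   /* terminator, as written by the code around the patch */
    return ob;
}
```

With "CBA" as input, len_by_strlen() reports 1 byte while len_by_pointer() reports 6. That seems consistent with the truncated output in the writeLines() example, where only the first byte of each piece of text reached the file.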


>
> But just looking a bit at such a file() object with writeLines()
> seems slightly revealing, as e.g., 'eol' does not seem to
> "work" for this encoding:
>
>  > fn <- tempfile("ffoo"); ff <- file(fn, open="w", encoding =
> "UTF-16LE")
>  > writeLines(LETTERS[3:1], ff); writeLines("|", ff);
> writeLines(">a", ff)
>  > close(ff)
>  > file.show(fn)
>  CBA|>
>  > file.size(fn)
>  [1] 5
>  >

 With the patch applied:

  > readLines(fn, encoding="UTF-16LE", skipNul=TRUE)
  [1] "C"  "B"  "A"  "|"  ">a"
  > file.size(fn)
  [1] 22
>>> I just realized that I was misusing the encoding argument of
>>> readLines(). The code above works by accident, but the following would
>>> be more appropriate:
>>>
>>>  > ff <- file(fn, open="r", encoding="UTF-16LE")
>>>  > readLines(ff)
>>>  [1] "C"  "B"  "A"  "|"  ">a"
>>>  > close(ff)
>>>
>>> Testing on Linux, with the patch applied. (As noted by Duncan Murdoch,
>>> the patch is incomplete on Windows.)
>> Before

[Rd] Milestone: 8000 packages on CRAN

2016-02-29 Thread Henrik Bengtsson
Another 1000 packages were added to CRAN, which took less than 7
months. Today (February 29, 2016), the Comprehensive R Archive Network
(CRAN) [1] reports:

“Currently, the CRAN package repository features 8002 available packages.”

The rate at which new packages are added to CRAN is increasing.  In
2013-2014 we had 1000 packages added to CRAN in 355 days (2.8 per
day), the following 1000 packages took 287 days (3.5 per day) and now
the most recent 1000 packages clocked in at an impressive 201 days
(5.0 per day).  Since the start of CRAN 18.9 years ago on April 23,
1997 [2], there has been on average one new package appearing on CRAN
every 20.6 hours - it is actually more frequent than that because
dropped/archived packages are not accounted for. The 8000 packages on
CRAN are maintained by ~4279 people [3].

Thanks to the CRAN team and to all package developers. You can give
back by carefully reporting bugs to the maintainers, by properly citing
any packages you use in your publications, cf. citation("pkg name"),
and by helping others use R.

Milestones:

2016-02-29: 8000 packages [this post]
2015-08-12: 7000 packages [11]
2014-10-29: 6000 packages [10]
2013-11-08: 5000 packages [9]
2012-08-23: 4000 packages [8]
2011-05-12: 3000 packages [7]
2009-10-04: 2000 packages [6]
2007-04-12: 1000 packages [5]
2004-10-01: 500 packages [4]
2003-04-01: 250 packages [4]
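
The intervals and rates quoted in this post can be rechecked from the milestone dates listed here. The following is a rough sketch in C, not something from the original post: the function names are made up for illustration, and the date arithmetic is the usual days-from-civil calculation for the Gregorian calendar.

```c
/* Days since 1970-01-01 for a Gregorian date (standard days-from-civil
   arithmetic; an assumption of this sketch, not taken from the post). */
long days_from_civil(int y, int m, int d)
{
    y -= (m <= 2);                            /* count Jan/Feb with the previous year */
    long era = (y >= 0 ? y : y - 399) / 400;  /* 400-year cycle */
    long yoe = y - era * 400;                 /* year within the cycle, 0..399 */
    long doy = (153 * (m + (m > 2 ? -3 : 9)) + 2) / 5 + d - 1;  /* day of year, counted from March 1 */
    long doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
    return era * 146097 + doe - 719468;
}

/* Number of days from the first date to the second. */
long days_between(int y1, int m1, int d1, int y2, int m2, int d2)
{
    return days_from_civil(y2, m2, d2) - days_from_civil(y1, m1, d1);
}

/* Average number of packages added per day over an interval. */
double packages_per_day(long packages, long days)
{
    return (double)packages / (double)days;
}

/* Average number of hours between new packages over an interval. */
double hours_per_package(long packages, long days)
{
    return 24.0 * (double)days / (double)packages;
}
```

For example, days_between(2015, 8, 12, 2016, 2, 29) gives 201, and packages_per_day(1000, 201) is just under 5. From April 23, 1997 to February 29, 2016 is 6886 days, so hours_per_package(8002, 6886) comes out at about 20.65 hours.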

These data are for CRAN only. There are many more packages elsewhere,
e.g. R-Forge, Bioconductor, GitHub etc.

[1] http://cran.r-project.org/web/packages/
[2] https://en.wikipedia.org/wiki/R_(programming_language)#Milestones
[3] http://www.r-pkg.org/
[4] Private data
[5] https://stat.ethz.ch/pipermail/r-devel/2007-April/045359.html
[6] https://stat.ethz.ch/pipermail/r-devel/2009-October/055049.html
[7] https://stat.ethz.ch/pipermail/r-devel/2011-May/061002.html
[8] https://stat.ethz.ch/pipermail/r-devel/2012-August/064675.html
[9] https://stat.ethz.ch/pipermail/r-devel/2013-November/067935.html
[10] https://stat.ethz.ch/pipermail/r-devel/2014-October/069997.html
[11] https://stat.ethz.ch/pipermail/r-package-devel/2015q3/000393.html

Thanks

Henrik
(a long-term fan)
