[Rd] Sys.readlink (on BSD vs Linux)
Hello all,

the function `Sys.readlink` uses the system's readlink command to resolve
symlink paths. On OSX/BSD the command has a different meaning than on Linux
[1].

There is a tool, 'realpath', which seems suitable for the task, at least
when applied at the command line level [2]. It is used in `normalizePath`.

I suggest (at least the latter) to
* use realpath instead of readlink within Sys.readlink (do_readlink ->
  do_normalizepath)
* link to `normalizePath` in the Rd document, possibly mentioning the
  difference

Many thanks,
Sven

[1] see
https://www.freebsd.org/cgi/man.cgi?query=readlink
vs
http://linux.die.net/man/1/readlink

[2]
https://www.freebsd.org/cgi/man.cgi?query=realpath
http://linux.die.net/man/1/realpath

---
Sven E. Templer
Bioinformatics Core Group
Max Planck Institute for Biology of Ageing
Joseph-Stelzmann-Strasse 9b
50931 Cologne, Germany
Phone: 0049 (0)221 37970 325
temp...@age.mpg.de
http://www.age.mpg.de/the-science/core-facilities/bioinformatics/

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Sys.readlink (on BSD vs Linux)
On 29.02.2016 10:34, Sven E. Templer wrote:
> Hello together,
>
> the function `Sys.readlink` uses the system's readlink command to resolve
> symlink paths. On OSX/BSD the command has a different meaning than on Linux
> [1].
>
> There exists the tool 'realpath', which seems suitable for the task, at least
> applied at the command line level [2]. It is used in `normalizePath`.
>
> I suggest (at least the latter) to
> * use realpath instead readlink within Sys.readlink (do_readlink ->
>   do_normalizepath)
> * link to `normalizePath` in the Rd document, eventually mentioning the
>   difference
>
> Many thanks,
> Sven
>
> [1] see
> https://www.freebsd.org/cgi/man.cgi?query=readlink
> vs
> http://linux.die.net/man/1/readlink
>
> [2]
> https://www.freebsd.org/cgi/man.cgi?query=realpath
> http://linux.die.net/man/1/realpath

What do you mean by "different meaning"? How are the command line tools
[1] relevant when R is using the C function 'readlink'?

http://pubs.opengroup.org/onlinepubs/9699919799/functions/readlink.html
https://www.freebsd.org/cgi/man.cgi?query=readlink&sektion=2
http://man7.org/linux/man-pages/man2/readlink.2.html

--
Mikko Korpela
Aalto University School of Science
Department of Computer Science
Re: [Rd] Sys.readlink (on BSD vs Linux)
Hello,

sorry for not being clear enough.

My problem is represented with the following code, running on OSX:

mkdir ~/test
ln -s ~/test ~/testlink
touch ~/test/foo
Rscript -e 'Sys.readlink(c("~/test/foo", "~/testlink/foo")); normalizePath(c("~/test/foo","~/testlink/foo"))'

I expected `Sys.readlink` to show the same output as `normalizePath`.

Also, I think the readlink.h imported to R to be the same as from the
system's `readlink` command, thus mimicking the command line difference.

Am I wrong with the latter? Anyway, the behaviour is irritating, thus the
request to at least mention `normalizePath` in the Rd of `Sys.readlink`.

Best,
Sven

> On 29 Feb 2016, at 11:44, Mikko Korpela wrote:
>
> What do you mean by "different meaning"? How are the command line tools
> [1] relevant when R is using the C function 'readlink'?
Re: [Rd] Sys.readlink (on BSD vs Linux)
> On 29 Feb 2016, at 11:59, Sven Templer wrote:
>
> Also, I think the readlink.h imported to R to be the same as from the
> system's `readlink` command, thus mimicking the command line difference.

Please ignore this statement, sorry.
Re: [Rd] Sys.readlink (on BSD vs Linux)
> On Feb 29, 2016, at 5:59 AM, Sven Templer wrote:
>
> Hello,
>
> sorry for not being clear enough.
>
> My problem is represented with the following code, running on OSX:
>
> mkdir ~/test
> ln -s ~/test ~/testlink
> touch ~/test/foo
> Rscript -e 'Sys.readlink(c("~/test/foo", "~/testlink/foo"));
> normalizePath(c("~/test/foo","~/testlink/foo"))'
>
> I expected `Sys.readlink` to show the same output as `normalizePath`.

Why? To quote from the Sys.readlink() docs:

Value:

     A character vector of the same length as ‘paths’.  The entries are
     the path of the file linked to, ‘""’ if the path is not a symbolic
     link.

Since you are referring to a file and not a link, the result is as expected
"" - both on OS X and Linux.
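The distinction drawn in this reply can be checked with a minimal R sketch. It assumes a Unix-alike system on which symbolic links can be created; the directory and file names under `tempdir()` are hypothetical stand-ins for the `~/test` and `~/testlink` paths used in the thread.

```r
# Minimal sketch, assuming a Unix-alike where file.symlink() succeeds.
# All names below (readlink-demo, test, testlink, foo) are made up.
base <- file.path(tempdir(), "readlink-demo")
unlink(base, recursive = TRUE)
dir.create(base)

target <- file.path(base, "test")      # a real directory
link   <- file.path(base, "testlink")  # a symlink pointing at it
dir.create(target)
file.create(file.path(target, "foo"))
ok <- file.symlink(target, link)

paths <- c(file.path(target, "foo"),   # regular file
           file.path(link, "foo"),     # regular file reached via a symlinked directory
           link)                       # the symlink itself

# Sys.readlink() only reports where a link points; non-links give "".
res_link <- Sys.readlink(paths)

# normalizePath() resolves every symlink along the path.
res_norm <- normalizePath(paths)

print(res_link)
print(res_norm)
```

Only the third path is itself a symbolic link, so only it yields a non-empty result from `Sys.readlink()`, whereas `normalizePath()` maps the first two paths to the same canonical file.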
Re: [Rd] Sys.readlink (on BSD vs Linux)
Yes, `Sys.readlink` is returning values as explained/expected. I was very
confused by mixing C library functions with coreutils and not reading
carefully enough, please excuse me for that.

A link to `normalizePath` would be of help in the 'See Also' section, in my
opinion.

Regards,
Sven

> On 29 Feb 2016, at 16:02, Simon Urbanek wrote:
>
> since you are referring to a file and not a link the result is as expected ""
> - both on OS X and Linux.
[Rd] Source code of early S versions
According to Wikipedia:

"In 1980 the first version of S was distributed outside Bell
Laboratories and in 1981 source versions were made available."

but I've been unable to locate any version of S online. Does anyone
have a copy, somewhere, rusting away on an old hard disk or slowly
flaking off a tape? I've had a rummage round the CMU Statlib on
archive.org but no sign of it, and it's hard to search for "S"
generally.

Obviously this would be for archaeological purposes, but there's
bound to be someone out there who'd like to try and compile it on a
modern system. It might at least be nice to see it in a nice format on
Gitlab, for example. But maybe there are licensing problems.

Anyone interested in the history of S should read Richard Becker's
article from the mid 90s:

http://sas.uwaterloo.ca/~rwoldfor/software/R-code/historyOfS.pdf

Barry

[apologies if S talk is off-topic. Surprisingly I've just discovered
the S-news mailing list still runs, but looking at the recent archive
I don't think I'd get much success there]
Re: [Rd] [patch] Support many columns in model.matrix
> Karl Millar via R-devel
>     on Fri, 26 Feb 2016 15:58:20 -0800 writes:

    > Generating a model matrix with very large numbers of
    > columns overflows the stack and/or runs very slowly, due
    > to the implementation of TrimRepeats().

    > This patch modifies it to use Rf_duplicated() to find the
    > duplicates. This makes the running time linear in the
    > number of columns and eliminates the recursive function
    > calls.

Thank you, Karl.
I've committed this (very slightly modified) to R-devel,

(also after looking for an example that runs on a non-huge
computer and shows the difference) :

  nF <- 11 ; set.seed(1)
  lff <- setNames(replicate(nF, as.factor(rpois(128, 1/4)), simplify=FALSE),
                  letters[1:nF])
  str(dd <- as.data.frame(lff)); prod(sapply(dd, nlevels))
  ## 'data.frame':  128 obs. of  11 variables:
  ##  $ a: Factor w/ 3 levels "0","1","2": 1 1 1 2 1 2 2 1 1 1 ...
  ##  $ b: Factor w/ 3 levels "0","1","2": 1 1 1 1 1 1 2 1 1 1 ...
  ##  $ c: Factor w/ 3 levels "0","1","2": 1 1 1 2 1 1 1 2 1 1 ...
  ##  $ d: Factor w/ 3 levels "0","1","2": 1 1 2 2 1 2 1 1 2 1 ...
  ##  $ e: Factor w/ 3 levels "0","1","2": 1 1 1 1 1 1 1 1 2 1 ...
  ##  $ f: Factor w/ 2 levels "0","1": 2 1 2 1 2 1 1 2 1 2 ...
  ##  $ g: Factor w/ 4 levels "0","1","2","3": 2 1 1 2 1 3 1 1 1 1 ...
  ##  $ h: Factor w/ 4 levels "0","1","2","4": 1 1 1 1 2 1 1 1 1 1 ...
  ##  $ i: Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 2 ...
  ##  $ j: Factor w/ 3 levels "0","1","2": 1 2 3 1 1 1 1 1 1 1 ...
  ##  $ k: Factor w/ 3 levels "0","1","2": 1 1 1 1 1 1 1 1 1 1 ...
  ##
  ## [1] 139968

  system.time(mff <- model.matrix(~ . ^ 11, dd,
                                  contrasts = list(a = "contr.helmert")))
  ##  user  system elapsed
  ## 0.255   0.033   0.287 --- *with* the patch on my desktop (16 GB)
  ## 1.489   0.031   1.522 --- for R-patched (i.e. w/o the patch)

  > dim(mff)
  [1]    128 139968
  > object.size(mff)
  154791504 bytes

---

BTW: This example would gain tremendously if I finally got
around to providing

   model.matrix(, sparse = TRUE)

which would then produce a Matrix-package sparse matrix.

Even for this somewhat small case, a sparse matrix is a factor
of 13.5 x smaller :

  > s1 <- object.size(mff); s2 <- object.size(M <- Matrix::Matrix(mff));
  > as.vector( s1/s2 )
  [1] 13.47043

I'm happy to collaborate with you on adding such a (C level)
interface to sparse matrices for this case.

Martin Maechler
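The size difference between a dense and a sparse model matrix can be seen on a much smaller problem. The sketch below is not the `model.matrix(, sparse = TRUE)` interface proposed above, which did not exist at the time of this message. It uses the existing `sparse.model.matrix()` function from the Matrix package instead, and assumes that package is installed. The choice of five factors and the variable names are illustrative.

```r
# Minimal sketch, assuming the Matrix package is available.
# Five factors instead of eleven keeps the example small.
library(Matrix)

set.seed(1)
nF  <- 5
lff <- setNames(replicate(nF, as.factor(rpois(128, 1/4)), simplify = FALSE),
                letters[1:nF])
dd  <- as.data.frame(lff)

# All interactions up to order 5, written out without the '.' shorthand.
form <- ~ (a + b + c + d + e)^5

dense  <- model.matrix(form, dd)          # base R dense matrix
sparse <- sparse.model.matrix(form, dd)   # Matrix-package sparse matrix

# Same shape and same entries, different storage.
same_dim     <- identical(dim(sparse), dim(dense))
same_entries <- all(as.matrix(sparse) == dense)

# Ratio of memory footprints (dense / sparse).
ratio <- as.vector(object.size(dense) / object.size(sparse))
print(c(same_dim = same_dim, same_entries = same_entries))
print(ratio)
```

Most entries are zero because most factor values fall in the baseline level, which is why sparse storage pays off as the number of interaction columns grows.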
[Rd] Function name exported incorrectly in DLL, strange entries in tmp.def
Hi,

I originally posted this on the Rcpp github tracker, but it was suggested I post it here.

I tried to compile the package https://github.com/khabbazian/sparseAHC/ under Windows. The package requires C++11 so I had to install the R devel build with gcc 4.9.3, and the latest Rtools. I got compilation and installation to work using Rcpp (0.12.3, from CRAN source). Package loads fine. However, when I tried to use the functions:

* the Rcpp exported function ```sparseAHC_dgCIsSymmetric``` works correctly
* the Rcpp exported function ```sparseAHC_run_sparseAHC``` doesn't work.

I could not see anything wrong with the source files and therefore looked at the DLL with DependencyWalker. Interestingly:

* ```sparseAHC_dgCIsSymmetric``` is named correctly
* ```sparseAHC_run_sparseAHC``` is named ```sparseAHC_run_sparseAHC.weak._ZNSt4listIiSaIiEE7emplaceIJiEEESt14_List_iteratorIiESt20_List_const_iteratorIiEDpOT_._ZNK4Rcpp14not_compatible4whatEv.weak._ZNSt6vectorIdSaIdEE19_M_emplace_back_auxIJdEEEvDpOT_._ZNK4Rcpp14not_compatible4whatEv.weak._ZNSt6vectorIiSaIiEE19_M_emplace_back_auxIJRKiEEEvDpOT_._ZNK4Rcpp14not_compatible4whatEv.weak._ZNSt6vectorISt4pairIS0_IiiEdESaIS2_EE12emplace_backIJS2_EEEvDpOT_._ZNK4Rcpp14not_compatible4whatEv.weak._ZNSt6vectorISt4pairIS0_IiiEdESaIS2_EE19_M_emplace_back_auxIJS2_EEEvDpOT_._ZNK4Rcpp14not_compatible4whatEv.weak._ZNSt8_Rb_treeIiSt4pairIKiSt14_List_iteratorI7EdgeObjEESt10_Select1stIS5_ESt4lessIiESaIS5_EE22_M_emplace_hint_uniqueIJRKSt21piecewise_construct_tSt5tupleIJRS1_EESG_IJESt17_Rb_tree_iteratorIS5_ESt23_Rb_tree_const_iteratorIS5_EDpOT_._ZNK4Rcpp14not_compatible4whatEv.weak._ZNSt8_Rb_treeIiSt4pairIKiSt14_List_iteratorIiEESt10_Select1stIS4_ESt4lessIiESaIS4_EE22_M_emplace_hint_uniqueIJRKSt21piecewise_construct_tSt5tupleIJOiEESF_IJESt17_Rb_tree_iteratorIS4_ESt23_Rb_tree_const_iteratorIS4_EDpOT_._ZNK4Rcpp14not_compatible4whatEv``` instead!
To find out what is going on, I compiled again and captured the ```tmp.def``` which is generated during compilation. As one can see, directly behind the problematic function name there are a lot of entries starting with ```.weak``` that are apparently incorrectly picked up by the linker:

```
[...]
ZZN4Rcpp8internal12exitRNGScopeEvE3fun
_ZZN4Rcpp8internal13enterRNGScopeEvE3fun
sparseAHC_dgCIsSymmetric
sparseAHC_run_sparseAHC
.weak._ZNSt4listIiSaIiEE7emplaceIJiEEESt14_List_iteratorIiESt20_List_const_iteratorIiEDpOT_._ZNK4Rcpp14not_compatible4whatEv
.weak._ZNSt6vectorIdSaIdEE19_M_emplace_back_auxIJdEEEvDpOT_._ZNK4Rcpp14not_compatible4whatEv
.weak._ZNSt6vectorIiSaIiEE19_M_emplace_back_auxIJRKiEEEvDpOT_._ZNK4Rcpp14not_compatible4whatEv
.weak._ZNSt6vectorISt4pairIS0_IiiEdESaIS2_EE12emplace_backIJS2_EEEvDpOT_._ZNK4Rcpp14not_compatible4whatEv
.weak._ZNSt6vectorISt4pairIS0_IiiEdESaIS2_EE19_M_emplace_back_auxIJS2_EEEvDpOT_._ZNK4Rcpp14not_compatible4whatEv
.weak._ZNSt8_Rb_treeIiSt4pairIKiSt14_List_iteratorI7EdgeObjEESt10_Select1stIS5_ESt4lessIiESaIS5_EE22_M_emplace_hint_uniqueIJRKSt21piecewise_construct_tSt5tupleIJRS1_EESG_IJESt17_Rb_tree_iteratorIS5_ESt23_Rb_tree_const_iteratorIS5_EDpOT_._ZNK4Rcpp14not_compatible4whatEv
.weak._ZNSt8_Rb_treeIiSt4pairIKiSt14_List_iteratorIiEESt10_Select1stIS4_ESt4lessIiESaIS4_EE22_M_emplace_hint_uniqueIJRKSt21piecewise_construct_tSt5tupleIJOiEESF_IJESt17_Rb_tree_iteratorIS4_ESt23_Rb_tree_const_iteratorIS4_EDpOT_._ZNK4Rcpp14not_compatible4whatEv
_Z12order_leavesRN5Eigen6MatrixIdLin1ELin1ELi0ELin1ELin1EEEi
_Z13run_sparseAHCN5Eigen12SparseMatrixIdLi0EiEEN4Rcpp6VectorILi16ENS2_15PreserveStorageEEE
_Z14dgCIsSymmetricN5Eigen12SparseMatrixIdLi0EiEEd
[...]
```

I cannot find this problem documented anywhere. But it seems that somehow additional exports are generated that start with ```.weak```, and the linker mangles all of them into one function name.

Help?
Michael Stravs
Eawag
Umweltchemie
BU E 23
Überlandstrasse 133
8600 Dübendorf
+41 58 765 6742
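The suspicious entries in a listing like the one above can be picked out mechanically. The following is a diagnostic sketch only. It does not reproduce how R generates `tmp.def`, and the rule "an export name is a plain C identifier" is an assumption made for illustration. The sample names are taken from the listing in this message.

```r
# Diagnostic sketch: flag export-list entries that are not plain C identifiers.
# The identifier rule below is an assumption for illustration, not R's own logic.
exports <- c(
  "sparseAHC_dgCIsSymmetric",
  "sparseAHC_run_sparseAHC",
  ".weak._ZNSt6vectorIdSaIdEE19_M_emplace_back_auxIJdEEEvDpOT_._ZNK4Rcpp14not_compatible4whatEv",
  "_Z14dgCIsSymmetricN5Eigen12SparseMatrixIdLi0EiEEd"
)

# Letters, digits and underscores only, not starting with a digit.
is_plain <- grepl("^[A-Za-z_][A-Za-z0-9_]*$", exports)

suspicious <- exports[!is_plain]
print(suspicious)
```

Here only the `.weak.` entry is flagged, because it starts with a dot and contains embedded dots, while the mangled C++ names and the two `sparseAHC_` wrappers pass the check.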
Re: [Rd] Source code of early S versions
The Wikipedia statement may be a bit misleading.

S was never open source. Source versions would only have been available with a nondisclosure agreement, and relatively few copies would have been distributed in source. There was a small but valuable "beta test" network, mainly university statistics departments.

And two shameless plugs:

1. there is a chapter on the history of all this in my forthcoming book on "Extending R"

2. Rick Becker will give a keynote talk on the history of S at the useR! 2016 conference (user2016.org); 2016 is the 40th anniversary of the first work on S.

John

PS: somehow "historical" would be less unnerving than "archeological"

On Feb 29, 2016, at 8:40 AM, Barry Rowlingson wrote:

> According to Wikipedia:
>
> "In 1980 the first version of S was distributed outside Bell
> Laboratories and in 1981 source versions were made available."
>
> but I've been unable to locate any version of S online. Does anyone
> have a copy, somewhere, rusting away on an old hard disk or slowly
> flaking off a tape? I've had a rummage round the CMU Statlib on
> archive.org but no sign of it, and its hard to search for "S"
> generally.
Re: [Rd] [patch] Support many columns in model.matrix
Thanks.

Couldn't you implement model.matrix(..., sparse = TRUE) with a small amount of R code similar to MatrixModels::model.Matrix ?

On Mon, Feb 29, 2016 at 10:01 AM, Martin Maechler wrote:
> BTW: These example would gain tremendously if I finally got
> around to provide
>
>    model.matrix(, sparse = TRUE)
>
> which would then produce a Matrix-package sparse matrix.
>
> I'm happy to collaborate with you on adding such a (C level)
> interface to sparse matrices for this case.
>
> Martin Maechler
Re: [Rd] iconv to UTF-16 encoding produces error due to embedded nulls (write.table with fileEncoding param)
I have just committed your first patch (the strlen() replacement) to
R-devel, and will soon put it in R-patched as well. I won't have time to
look at this again before the 3.2.4 release, so your file.show() patch
isn't going to make it unless someone else gets to it.

There's still a faint chance that I'll do more in R-devel before 3.3.0,
but I think it's best if there were bug reports about both of these
problems so they don't get forgotten. Since the first one is mainly a
Windows problem, I'll write that one up; I'd appreciate it if you could
write up the file.show() issue, after checking against R-devel rev 70247
or higher.

Duncan Murdoch

On 25/02/2016 5:54 AM, Mikko Korpela wrote:
> On 25.02.2016 11:31, Mikko Korpela wrote:
>> On 23.02.2016 14:06, Mikko Korpela wrote:
>>> On 23.02.2016 11:37, Martin Maechler wrote:
>>>> nospam@altfeld-im de on Mon, 22 Feb 2016 18:45:59 +0100 writes:
>>>>
>>>> > Dear R developers
>>>> > I think I have found a bug that can be reproduced with two lines of code
>>>> > and I am very thankful to get your first assessment or feed-back on my
>>>> > report.
>>>>
>>>> > If this is the wrong mailing list or I did something wrong
>>>> > (e. g. semi "anonymous" email address to protect my privacy and defend
>>>> > unwanted spam) please let me know since I am new here.
>>>>
>>>> > Thank you very much :-)
>>>>
>>>> > J. Altfeld
>>>>
>>>> Dear J.,
>>>> (yes, a bit less anonymity would be very welcomed here!),
>>>>
>>>> You are right, this is a bug, at least in the documentation, but
>>>> probably "all real", indeed,
>>>>
>>>> but read on.
>>>>
>>>> > On Tue, 2016-02-16 at 18:25 +0100, nos...@altfeld-im.de wrote:
>>>> >>
>>>> >> If I execute the code from the "?write.table" examples section
>>>> >>
>>>> >> x <- data.frame(a = I("a \" quote"), b = pi)
>>>> >> # (omitted code)
>>>> >> write.csv(x, file = "foo.csv", fileEncoding = "UTF-16LE")
>>>> >>
>>>> >> the resulting CSV file has a size of 6 bytes which is too short
>>>> >> (truncated):
>>>> >>
>>>> >> """,3
>>>>
>>>> reproducibly, yes.
>>>> If you look at what write.csv does
>>>> and then simplify, you can get a similar wrong result by
>>>>
>>>>    write.table(x, file = "foo.tab", fileEncoding = "UTF-16LE")
>>>>
>>>> which results in a file with one line
>>>>
>>>> """ 3
>>>>
>>>> and if you debug write.table() you see that its building blocks
>>>> here are
>>>>    file <- file(, encoding = fileEncoding)
>>>>
>>>>    a writeLines(*, file=file) for the column headers,
>>>>
>>>> and then "deeper down" C code which I did not investigate.
>>>
>>> I took a look at connections.c. There is a call to strlen() that gets
>>> confused by null characters. I think the obvious fix is to avoid the
>>> call to strlen() as the size is already known:
>>>
>>> Index: src/main/connections.c
>>> ===================================================================
>>> --- src/main/connections.c (revision 70213)
>>> +++ src/main/connections.c (working copy)
>>> @@ -369,7 +369,7 @@
>>>          /* is this safe? */
>>>          warning(_("invalid char string in output conversion"));
>>>          *ob = '\0';
>>> -        con->write(outbuf, 1, strlen(outbuf), con);
>>> +        con->write(outbuf, 1, ob - outbuf, con);
>>>      } while(again && inb > 0); /* it seems some iconv signal -1 on
>>>                                    zero-length input */
>>>  } else
>>>
>>>> But just looking a bit at such a file() object with writeLines()
>>>> seems slightly revealing, as e.g., 'eol' does not seem to
>>>> "work" for this encoding:
>>>>
>>>>   > fn <- tempfile("ffoo"); ff <- file(fn, open="w", encoding = "UTF-16LE")
>>>>   > writeLines(LETTERS[3:1], ff); writeLines("|", ff); writeLines(">a", ff)
>>>>   > close(ff)
>>>>   > file.show(fn)
>>>>   CBA|>
>>>>   > file.size(fn)
>>>>   [1] 5
>>>>   >
>>>
>>> With the patch applied:
>>>
>>>   > readLines(fn, encoding="UTF-16LE", skipNul=TRUE)
>>>   [1] "C" "B" "A" "|" ">a"
>>>   > file.size(fn)
>>>   [1] 22
>>
>> I just realized that I was misusing the encoding argument of
>> readLines(). The code above works by accident, but the following would
>> be more appropriate:
>>
>>   > ff <- file(fn, open="r", encoding="UTF-16LE")
>>   > readLines(ff)
>>   [1] "C" "B" "A" "|" ">a"
>>   > close(ff)
>>
>> Testing on Linux, with the patch applied. (As noted by Duncan Murdoch,
>> the patch is incomplete on Windows.)
>
> Before inspecting the file with readLines() I tried file.show() but it
> did not work as expected. On Linux using a UTF-8 locale, the result of
> trying to show the truly UTF-16LE encoded file with
>
>   > file.show(fn, encoding="UTF-16LE")
>
> was a pager showing "<43>" (quotes not included) followed by several
> empty lines.
>
> With the following patch, the command works correctly (in this case, on
> this platform, not tested comprehensively). The idea is to read the
> input file "raw" in order to avoid problems with null characters. The
> input then needs to be split into lines after iconv(), or it could be
> written to the output file with cat() if the style of line termination
> characters does not matter. The 'perl = TRUE' is for assumed
> performance advantage only. It can be removed, or one might want to
> test if there is a
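The reason strlen() truncates UTF-16 output, as described in the quoted discussion, can be shown from R without touching the C code. The sketch below assumes the platform's iconv supports "UTF-16LE". The variable names and the temporary file are illustrative, and the "strlen-like" count is an emulation in R, not a call into connections.c.

```r
# Minimal sketch, assuming iconv on this platform knows "UTF-16LE".
# In UTF-16LE every ASCII character is followed by a zero byte.
bytes <- iconv("CBA", from = "UTF-8", to = "UTF-16LE", toRaw = TRUE)[[1]]
print(bytes)

n_total <- length(bytes)

# Emulate C strlen(): count bytes before the first zero byte.
n_strlen <- which(bytes == as.raw(0))[1] - 1L
print(c(total = n_total, strlen_like = n_strlen))

# Reading a UTF-16LE file back: put the encoding on the connection,
# as in the corrected readLines() usage quoted above.
fn <- tempfile("utf16-")
writeBin(iconv("C\nB\nA\n", from = "UTF-8", to = "UTF-16LE", toRaw = TRUE)[[1]], fn)
con <- file(fn, open = "r", encoding = "UTF-16LE")
lines <- readLines(con)
close(con)
print(lines)
```

A length taken from the output pointer, as in the `ob - outbuf` patch, counts all converted bytes, while a zero-terminated count stops after the first character.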
Re: [Rd] Source code of early S versions
On Mon, Feb 29, 2016 at 6:17 PM, John Chambers wrote:
> The Wikipedia statement may be a bit misleading.
>
> S was never open source. Source versions would only have been available with
> a nondisclosure agreement, and relatively few copies would have been
> distributed in source. There was a small but valuable "beta test" network,
> mainly university statistics departments.

So it was free (or at least distribution cost only), but with a
nondisclosure agreement? Did binaries circulate freely, legally or
otherwise? Okay, guess I'll read the book.

I'm sure I saw S source early in my career (1990 or so), possibly on
an early Sun 3/60 system or even the on-the-way-out Whitechapel MG-1
workstations.

> And two shameless plugs:
>
> 1. there is a chapter on the history of all this in my forthcoming book on
> "Extending R"

That will sit nicely on the shelf next to "Extending The S System"
that Allan Wilks gave me :)

> PS: somehow "historical" would be less unnerving than "archeological"

At least I didn't say palaeontological.

Thanks for the response.

Barry
Re: [Rd] Source code of early S versions
> On 29 Feb 2016, at 20:54 pm, Barry Rowlingson wrote:
>
> On Mon, Feb 29, 2016 at 6:17 PM, John Chambers wrote:
>> The Wikipedia statement may be a bit misleading.
>>
>> S was never open source. Source versions would only have been available
>> with a nondisclosure agreement, and relatively few copies would have been
>> distributed in source. There was a small but valuable "beta test" network,
>> mainly university statistics departments.
>
> So it was free (or at least distribution cost only), but with a
> nondisclosure agreement? Did binaries circulate freely, legally or
> otherwise? Okay, guess I'll read the book.
>

I don’t think I have seen S source, but some other Bell software has license of this type:

C  THIS INFORMATION IS PROPRIETARY AND IS THE
C  PROPERTY OF BELL TELEPHONE LABORATORIES,
C  INCORPORATED. ITS REPRODUCTION OR DISCLOSURE
C  TO OTHERS, EITHER ORALLY OR IN WRITING, IS
C  PROHIBITED WITHOUT WRITTEN PRERMISSION OF
C  BELL LABORATORIES.
C  IT IS UNDERSTOOD THAT THESE MATERIALS WILL BE USED FOR
C  EDUCATIONAL AND INSTRUCTIONAL PURPOSES ONLY.

(Obviously in FORTRAN)

So the code was “open” in the sense that you could see the code, and it had to be “open", because source code was the only way to distribute software before the era of widespread platforms allowing binary distributions (such as VAX/VMS or Intel/MS-DOS). However, the license in effect says that although you can see the code, you are not even allowed to tell anybody that you have seen it. I don’t know how this is interpreted currently, but you may ask the current owner, Nokia.

Cheers, Jari Oksanen
Re: [Rd] iconv to UTF-16 encoding produces error due to embedded nulls (write.table with fileEncoding param)
The file.show() issue is now in the bug tracker. I used a slightly
different example to demonstrate the problem.

https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=16738

- Mikko

On 29.02.2016 20:30, Duncan Murdoch wrote:
> I have just committed your first patch (the strlen() replacement) to
> R-devel, and will soon put it in R-patched as well. I won't have time to
> look at this again before the 3.2.4 release, so your file.show() patch
> isn't going to make it unless someone else gets to it.
>
> There's still a faint chance that I'll do more in R-devel before 3.3.0,
> but I think it's best if there were bug reports about both of these
> problems so they don't get forgotten. Since the first one is mainly a
> Windows problem, I'll write that one up; I'd appreciate it if you could
> write up the file.show() issue, after checking against R-devel rev 70247
> or higher.
>
> Duncan Murdoch
>
> On 25/02/2016 5:54 AM, Mikko Korpela wrote:
>> On 25.02.2016 11:31, Mikko Korpela wrote:
>>> On 23.02.2016 14:06, Mikko Korpela wrote:
>>>> On 23.02.2016 11:37, Martin Maechler wrote:
>>>>> >> nospam@altfeld-im de
>>>>> >> on Mon, 22 Feb 2016 18:45:59 +0100 writes:
>>>>>
>>>>> > Dear R developers
>>>>> > I think I have found a bug that can be reproduced with two lines of code
>>>>> > and I am very thankful to get your first assessment or feed-back on my
>>>>> > report.
>>>>>
>>>>> > If this is the wrong mailing list or I did something wrong
>>>>> > (e. g. semi "anonymous" email address to protect my privacy and defend
>>>>> > unwanted spam) please let me know since I am new here.
>>>>>
>>>>> > Thank you very much :-)
>>>>>
>>>>> > J. Altfeld
>>>>>
>>>>> Dear J.,
>>>>> (yes, a bit less anonymity would be very welcomed here!),
>>>>>
>>>>> You are right, this is a bug, at least in the documentation, but
>>>>> probably "all real", indeed,
>>>>>
>>>>> but read on.
>>>>>
>>>>> > On Tue, 2016-02-16 at 18:25 +0100, nos...@altfeld-im.de wrote:
>>>>> >>
>>>>> >> If I execute the code from the "?write.table" examples section
>>>>> >>
>>>>> >> x <- data.frame(a = I("a \" quote"), b = pi)
>>>>> >> # (omitted code)
>>>>> >> write.csv(x, file = "foo.csv", fileEncoding = "UTF-16LE")
>>>>> >>
>>>>> >> the resulting CSV file has a size of 6 bytes which is too short
>>>>> >> (truncated):
>>>>> >>
>>>>> >> """,3
>>>>>
>>>>> reproducibly, yes.
>>>>> If you look at what write.csv does
>>>>> and then simplify, you can get a similar wrong result by
>>>>>
>>>>>    write.table(x, file = "foo.tab", fileEncoding = "UTF-16LE")
>>>>>
>>>>> which results in a file with one line
>>>>>
>>>>> """ 3
>>>>>
>>>>> and if you debug write.table() you see that its building blocks
>>>>> here are
>>>>>    file <- file(, encoding = fileEncoding)
>>>>>
>>>>> a writeLines(*, file=file) for the column headers,
>>>>>
>>>>> and then "deeper down" C code which I did not investigate.
>>>>
>>>> I took a look at connections.c. There is a call to strlen() that gets
>>>> confused by null characters. I think the obvious fix is to avoid the
>>>> call to strlen() as the size is already known:
>>>>
>>>> Index: src/main/connections.c
>>>> ===================================================================
>>>> --- src/main/connections.c	(revision 70213)
>>>> +++ src/main/connections.c	(working copy)
>>>> @@ -369,7 +369,7 @@
>>>>  		/* is this safe? */
>>>>  		warning(_("invalid char string in output conversion"));
>>>>  	    *ob = '\0';
>>>> -	    con->write(outbuf, 1, strlen(outbuf), con);
>>>> +	    con->write(outbuf, 1, ob - outbuf, con);
>>>>  	} while(again && inb > 0);  /* it seems some iconv signal -1 on
>>>>  				       zero-length input */
>>>>      } else
>>>>
>>>>> But just looking a bit at such a file() object with writeLines()
>>>>> seems slightly revealing, as e.g., 'eol' does not seem to
>>>>> "work" for this encoding:
>>>>>
>>>>> > fn <- tempfile("ffoo"); ff <- file(fn, open="w", encoding = "UTF-16LE")
>>>>> > writeLines(LETTERS[3:1], ff); writeLines("|", ff); writeLines(">a", ff)
>>>>> > close(ff)
>>>>> > file.show(fn)
>>>>> CBA|>
>>>>> > file.size(fn)
>>>>> [1] 5
>>>>
>>>> With the patch applied:
>>>>
>>>> > readLines(fn, encoding="UTF-16LE", skipNul=TRUE)
>>>> [1] "C"  "B"  "A"  "|"  ">a"
>>>> > file.size(fn)
>>>> [1] 22
>>>
>>> I just realized that I was misusing the encoding argument of
>>> readLines(). The code above works by accident, but the following would
>>> be more appropriate:
>>>
>>> > ff <- file(fn, open="r", encoding="UTF-16LE")
>>> > readLines(ff)
>>> [1] "C"  "B"  "A"  "|"  ">a"
>>> > close(ff)
>>>
>>> Testing on Linux, with the patch applied. (As noted by Duncan Murdoch,
>>> the patch is incomplete on Windows.)
>> Before
[Rd] Milestone: 8000 packages on CRAN
Another 1000 packages were added to CRAN, which took less than 7
months. Today (February 29, 2016), the Comprehensive R Archive Network
(CRAN) [1] reports:

“Currently, the CRAN package repository features 8002 available packages.”

The rate with which new packages are added to CRAN is increasing. In
2013-2014 we had 1000 packages added to CRAN in 355 days (2.8 per day),
the following 1000 packages took 287 days (3.5 per day) and now the most
recent 1000 packages clocked in at an impressive 201 days (5.0 per day).
Since the start of CRAN 18.9 years ago on April 23, 1997 [2], there has
been on average one new package appearing on CRAN every 20.6 hours - it
is actually more frequent than that because dropped/archived packages
are not accounted for. The 8000 packages on CRAN are maintained by ~4279
people [3].

Thanks to the CRAN team and to all package developers. You can give back
by carefully reporting bugs to the maintainers, properly citing any
packages you use in your publications, cf. citation("pkg name"), and
helping others use R.

Milestones:

2016-02-29: 8000 packages [this post]
2015-08-12: 7000 packages [11]
2014-10-29: 6000 packages [10]
2013-11-08: 5000 packages [9]
2012-08-23: 4000 packages [8]
2011-05-12: 3000 packages [7]
2009-10-04: 2000 packages [6]
2007-04-12: 1000 packages [5]
2004-10-01: 500 packages [4]
2003-04-01: 250 packages [4]

These data are for CRAN only. There are many more packages elsewhere,
e.g. R-Forge, Bioconductor, GitHub etc.

[1] http://cran.r-project.org/web/packages/
[2] https://en.wikipedia.org/wiki/R_(programming_language)#Milestones
[3] http://www.r-pkg.org/
[4] Private data
[5] https://stat.ethz.ch/pipermail/r-devel/2007-April/045359.html
[6] https://stat.ethz.ch/pipermail/r-devel/2009-October/055049.html
[7] https://stat.ethz.ch/pipermail/r-devel/2011-May/061002.html
[8] https://stat.ethz.ch/pipermail/r-devel/2012-August/064675.html
[9] https://stat.ethz.ch/pipermail/r-devel/2013-November/067935.html
[10] https://stat.ethz.ch/pipermail/r-devel/2014-October/069997.html
[11] https://stat.ethz.ch/pipermail/r-package-devel/2015q3/000393.html

Thanks

Henrik
(a long-term fan)