Re: [Rd] Runnable R packages
Some other major tech companies have in the past made wide use of Runnable R
Archives (".Rar" files), similar to Python .par files [1], and integrated them
completely into the proprietary R package build system in use there. I thought
there were a few systems like this that had made their way to CRAN or the useR!
conferences, but I don't have a link.

Building something specific to your organization on top of the Python .par
framework to archive up R, your needed packages/shared libraries, and other
dependencies, with a runner script to R CMD RUN your entry point in a sandbox,
is a pretty straightforward way to have control in a way that makes sense for
your environment.

- Murray

[1] https://google.github.io/subpar/subpar.html

On Mon, Jan 7, 2019 at 12:53 PM David Lindelof wrote:
> Dear all,
>
> I’m working as a data scientist in a major tech company. I have been using
> R for almost 20 years now and there’s one issue that’s been bugging me of
> late. I apologize in advance if this has been discussed before.
>
> R has traditionally been used for running short scripts or data analysis
> notebooks, but there has recently been growing interest in developing full
> applications in the language. Three examples come to mind:
>
> 1) The Shiny web application framework, which facilitates the development
> of rich, interactive web applications
> 2) The httr package, which provides lower-level facilities than Shiny for
> writing web services
> 3) Batch jobs run by data scientists according to, say, a cron schedule
>
> Compared with other languages, R’s support for such applications is rather
> poor.
> The Rscript program is generally used to run an R script or an
> arbitrary R expression, but I feel it suffers from a few problems:
>
> 1) It encourages developers of batch jobs to provide their code in a single
> R file (bad for code structure and unit-testability)
> 2) It provides no way to deal with dependencies on other packages
> 3) It provides no way to "run" an application provided as an R package
>
> For example, let’s say I want to run a Shiny application that I provide as
> an R package (to keep the code modular, to benefit from unit tests, and to
> declare dependencies properly). I would then need to a) uncompress my R
> package, b) somehow ensure my dependencies are installed, and c) call
> runApp(). This can get tedious, fast.
>
> Other languages let the developer package their code in "runnable"
> artefacts, and let the developer specify the main entry point. The
> mechanics depend on the language but are remarkably similar, and suggest a
> way to implement this in R. Through declarations in some file, the
> developer can often specify dependencies and declare where the program’s
> "main" function resides. Consider Java:
>
> Artefact: .jar file
> Declarations file: Manifest file
> Entry point: declared as 'Main-Class'
> Executed as: java -jar
>
> Or Python:
>
> Artefact: Python package, typically as .tar.gz source distribution file
> Declarations file: setup.py (which specifies dependencies)
> Entry point: special __main__() function
> Executed as: python -m
>
> R has already much of this machinery:
>
> Artefact: R package
> Declarations file: DESCRIPTION
> Entry point: ?
> Executed as: ?
>
> I feel that R could benefit from letting the developer specify, possibly in
> DESCRIPTION, how to "run" the package.
> The package could then be run
> through, for example, a new R CMD command:
>
> R CMD RUN
>
> I’m sure there are plenty of wrinkles in this idea that need to be ironed
> out, but is this something that has ever been considered, or that is on R’s
> roadmap?
>
> Thanks for reading so far,
>
> David Lindelöf, Ph.D.
> +41 (0)79 415 66 41 or skype:david.lindelof
> http://computersandbuildings.com
> Follow me on Twitter:
> http://twitter.com/dlindelof

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
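Until something like R CMD RUN exists, a thin launcher script is a common workaround. A minimal sketch, noting that the `run_package()` helper and the `main()` naming convention are hypothetical, not an established standard:

```r
# Minimal sketch of a "run this package" launcher, usable via Rscript.
# The convention that a package exposes an entry-point function named
# main() is an assumption for illustration, not an R standard.
run_package <- function(pkg, main = "main", ...) {
  if (!requireNamespace(pkg, quietly = TRUE))
    stop("package '", pkg, "' is not installed")
  entry <- get(main, envir = asNamespace(pkg))  # look up the entry point
  entry(...)
}

# For illustration only, treat stats::runif as an "entry point":
length(run_package("stats", "runif", n = 3))
```

A real R CMD RUN would presumably also resolve the dependencies declared in DESCRIPTION; `requireNamespace()` here only checks that the package is already installed.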
Re: [Rd] R-devel on FreeBSD: Support for C99 complex type is required
On Sun, Feb 6, 2011 at 8:50 AM, Rainer Hurling wrote:
>> I think this is really a FreeBSD support question. In 2011, an OS really
>> should have support for a 1999 standard. Darwin, a FreeBSD derivative,
>> does and its help page says
>
> Hmm, on FreeBSD I really have no other piece of software which complains
> about lack of C99.

FreeBSD is planning on switching to a different compiler, llvm/clang, and so
the version of gcc is stale, but it should still be more than sufficient to
support C99. FreeBSD started a C99 effort a decade ago; I haven't heard from
this initiative in a long time, as I thought it was completed.

http://www.freebsd.org/projects/c99/index.html

There is, I believe, experimental support for llvm/clang built into FreeBSD 9,
so you could try compiling with that instead of gcc.

> Ok, I understand. This seems consistent. I will try to contact FreeBSD
> support about it. Please do not change back the behaviour for FreeBSD
> (towards emulation code) until this is clarified.

Yes, please mail freebsd-standa...@google.com

I haven't looked at what autoconf is testing exactly, but I suspect simply
another argument must be provided in the autoconf script to get it to pull up
the C99 math functions it's looking for.

- Murray
Re: [Rd] R-devel on FreeBSD: Support for C99 complex type is required
On Sun, Feb 6, 2011 at 9:24 AM, Murray Stokely wrote:
> Yes, please mail freebsd-standa...@google.com

Ugh, that should be freebsd-standa...@freebsd.org of course. Silly brain-o.

- Murray
[Rd] Signal handling / alarm timeouts
What are the ramifications of setting up user signal handling to allow the use
of e.g. alarm(2) to send a SIGALRM to the R process at some number of seconds
in the future, to e.g. interrupt a routine that is taking too long to complete?

I can't find any R language support for this (e.g. a timeout argument to
tryCatch() would be ideal), so I am wondering what kinds of problems are to be
expected if I do this with native C code in a package. Are there other ways to
accomplish timeouts for blocks of R code like this?

- Murray
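For blocks of pure R code (as opposed to long-running compiled code), base R's `setTimeLimit()` already provides a signal-free way to do this. A sketch; note the limit is only checked at R's own interrupt checkpoints, so it will not break out of a blocking C call:

```r
# Time out a block of R code with base R's setTimeLimit().
# The expired limit surfaces as an ordinary R error, which tryCatch() traps.
with_timeout <- function(expr, seconds) {
  setTimeLimit(elapsed = seconds, transient = TRUE)
  on.exit(setTimeLimit(elapsed = Inf), add = TRUE)  # always clear the limit
  tryCatch(expr, error = function(e) NULL)          # NULL signals timeout/error
}

res <- with_timeout({
  repeat Sys.sleep(0.01)  # loops forever, but reaches R's interrupt checks
}, seconds = 1)
is.null(res)  # TRUE: evaluation was cut off after about one second
```

The R.utils package wraps this same mechanism in `withTimeout()`, which additionally distinguishes timeout conditions from other errors.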
Re: [Rd] How to handle INT8 data
The lack of 64-bit integer support causes lots of problems when dealing with
certain types of data where the loss of precision from coercing to 53 bits
with double is unacceptable. Two packages were developed to deal with this:
int64 and bit64. You may need to find archival versions of these packages if
they've fallen off CRAN.

Murray (mobile phone)

On Jan 20, 2017 7:20 AM, "Gabriel Becker" wrote:

I am not on R-core, so cannot speak to future plans to internally support
int8 (though my impression is that there aren't any, at least none that are
close to fruition).

The standard way of dealing with whole numbers too big to fit in an integer
is to put them in a numeric (double down in C land). This can represent
integers up to 2^53 without loss of precision; see
http://stackoverflow.com/questions/1848700/biggest-integer-that-can-be-stored-in-a-double
This is how long vector indices are (currently) implemented in R. If it's
good enough for indices, it's probably good enough for whatever you need
them for.

Hope that helps.

~G

On Fri, Jan 20, 2017 at 6:33 AM, Nicolas Paris wrote:
> Hello R users,
>
> I have to deal with int8 data with R. AFAIK R only handles int4
> with the `as.integer` function [1]. I wonder:
> 1. What is the better approach to handle int8? `as.character`?
> `as.numeric`?
> 2. Is there any plan to handle int8 in the future? As you might know,
> int4 is too small to deal with the earth's population right now.
>
> Thanks for your ideas,
>
> int8 eg:
>
> human_id
> --
> -1311071933951566764
> -4708675461424073238
> -6865005668390999818
> 5578000650960353108
> -3219674686933841021
> -6469229889308771589
> -606871692563545028
> -8199987422425699249
> -463287495999648233
> 7675955260644241951
>
> reference:
> 1.
> https://www.r-bloggers.com/r-in-a-64-bit-world/
>
> --
> Nicolas PARIS

--
Gabriel Becker, PhD
Associate Scientist (Bioinformatics)
Genentech Research
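To make the precision problem concrete with ids of the magnitude shown above (a base-R sketch; the second id is a hypothetical neighbor of the first, differing only in the last digit):

```r
# Two distinct 64-bit-sized identifiers, held as character strings.
ids <- c("-1311071933951566764", "-1311071933951566765")

length(unique(ids))              # 2: distinct as character
length(unique(as.numeric(ids)))  # 1: they collapse to the same double
```

At this magnitude adjacent doubles are 256 apart, so `as.character` preserves identity while `as.numeric` silently merges distinct ids; bit64/int64 additionally give you real arithmetic on such values.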
Re: [Rd] How to handle INT8 data
2^53 == 2^53+1
[1] TRUE

Which makes joining or grouping data sets with 64-bit identifiers problematic.

Murray (mobile)

On Jan 20, 2017 9:15 AM, "Nicolas Paris" wrote:

On 20 Jan 2017 at 18:09, Murray Stokely wrote:
> The lack of 64 bit integer support causes lots of problems when dealing
> with certain types of data where the loss of precision from coercing to
> 53 bits with double is unacceptable.

Hello Murray,

Do you mean that e.g. -1311071933951566764 loses precision during
as.numeric(-1311071933951566764)?

Thanks,

> Two packages were developed to deal with this: int64 and bit64.
>
> You may need to find archival versions of these packages if they've
> fallen off CRAN.
>
> Murray (mobile phone)
>
> On Jan 20, 2017 7:20 AM, "Gabriel Becker" wrote:
>
> I am not on R-core, so cannot speak to future plans to internally support
> int8 (though my impression is that there aren't any, at least none that
> are close to fruition).
>
> The standard way of dealing with whole numbers too big to fit in an
> integer is to put them in a numeric (double down in C land). This can
> represent integers up to 2^53 without loss of precision; see
> http://stackoverflow.com/questions/1848700/biggest-integer-that-can-be-stored-in-a-double
> This is how long vector indices are (currently) implemented in R. If it's
> good enough for indices, it's probably good enough for whatever you need
> them for.
>
> Hope that helps.
>
> ~G
>
> On Fri, Jan 20, 2017 at 6:33 AM, Nicolas Paris wrote:
>
> > Hello R users,
> >
> > I have to deal with int8 data with R. AFAIK R only handles int4
> > with the `as.integer` function [1]. I wonder:
> > 1. What is the better approach to handle int8? `as.character`?
> > `as.numeric`?
> > 2. Is there any plan to handle int8 in the future? As you might know,
> > int4 is too small to deal with the earth's population right now.
> > Thanks for your ideas,
> >
> > int8 eg:
> >
> > human_id
> > --
> > -1311071933951566764
> > -4708675461424073238
> > -6865005668390999818
> > 5578000650960353108
> > -3219674686933841021
> > -6469229889308771589
> > -606871692563545028
> > -8199987422425699249
> > -463287495999648233
> > 7675955260644241951
> >
> > reference:
> > 1. https://www.r-bloggers.com/r-in-a-64-bit-world/
> >
> > --
> > Nicolas PARIS
>
> --
> Gabriel Becker, PhD
> Associate Scientist (Bioinformatics)
> Genentech Research

--
Nicolas PARIS
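A small sketch of the join problem, using 2^53 and 2^53 + 1 as stand-in identifiers: kept as character, the keys stay distinct; coerced to double, they collide and the merge produces spurious matches.

```r
ids <- c("9007199254740992", "9007199254740993")  # 2^53 and 2^53 + 1

d_chr <- data.frame(id = ids, v = 1:2, stringsAsFactors = FALSE)
d_num <- data.frame(id = as.numeric(ids), v = 1:2)

nrow(merge(d_chr, d_chr, by = "id"))  # 2: each key matches only itself
nrow(merge(d_num, d_num, by = "id"))  # 4: both ids collide as doubles
```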
Re: [Rd] regenerate Rscript after moving R installation
Simon, do you have some examples of packages with this attribute?

Removing the hard-coding of paths in base R and Rscript is one of the many
local patches we've maintained in the R I use at my workplace since at least
the R 2.5 days. We do this to enable us to send R and all its dependencies
off to build farms, unit test clusters, and production clusters for running
parallel computations, among other use cases where the path of the build
server is irrelevant to the server running the R code.

I don't recall running into any packages where an absolute path from the
build host was hard-coded into the package such that we had to update code
to get it to work. But maybe I'm just not using those packages.

- Murray

On Sat, Sep 21, 2013 at 6:45 PM, Simon Urbanek wrote:
> I forgot to mention that some packages bake in paths as well, so even if
> you fix both R and Rscript, it will still not work in general.
>
> On Sep 22, 2013, at 3:42 AM, Simon Urbanek wrote:
>
> > On Sep 21, 2013, at 8:43 PM, Tobias Verbeke <
> > tobias.verb...@openanalytics.eu> wrote:
> >
> >> L.S.
> >>
> >> In this bug report
> >>
> >> https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=14493#c1
> >>
> >> it is mentioned that after moving an R installation
> >> one should regenerate the Rscript executable.
> >>
> >> Is there an easy way to do so (after an R installation has been
> >> moved)?
> >
> > You cannot move installed R. Once you run make install, there are
> > several places in which paths get baked in - mainly Rscript and the R
> > start script. What I typically do for deployment on the Labs machines
> > is to use make install rhome= where is some path that I can always
> > create a symlink in (I also use DESTDIR so that path doesn't actually
> > need to exist on the build machine and it avoids polluting --prefix,
> > which is not needed). That way you can move R wherever you want as long
> > as you keep that one symlink up to date.
> > Cheers,
> > Simon
> >
> >> I have not found any information in the R Installation and
> >> Administration manual.
> >>
> >> Many thanks in advance for any pointer.
> >>
> >> Best wishes,
> >> Tobias
> >>
> >> P.S. The background to this question is the usage of Rscript
> >> calls in the Makevars files of some R packages on CRAN, so
> >> the 'broken' Rscript prevents installation of certain R packages.
> >>
> >> --
> >>
> >> Tobias Verbeke
> >> Manager
> >>
> >> OpenAnalytics BVBA
> >> Jupiterstraat 20
> >> 2600 Antwerp
> >> Belgium
> >>
> >> E tobias.verb...@openanalytics.eu
> >> M +32 499 36 33 15
> >> http://www.openanalytics.eu
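From a running session you can at least inspect what the (possibly baked-in) paths currently resolve to, which helps diagnose a moved installation. A small base-R sketch:

```r
# Where does this R installation think it lives?
R.home()                 # the R home directory this process is using
Sys.getenv("R_HOME")     # what the launcher script exported, if anything
file.exists(file.path(R.home("bin"), "Rscript"))  # companion Rscript binary
```

If `R.home()` points at the old, pre-move location, the launcher scripts still carry the baked-in path and the symlink trick above (or a reinstall) is needed.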
Re: [Rd] Determining files opened by an R session
Most operating systems have tools which allow you to audit the resources used
by a running process, for example the 'lsof' (list open files) command on
Unix and Mac OS X, or, for more complex dynamic tracing, the DTrace framework,
again on Mac OS X or BSD Unix.

I'm not sure what the Windows equivalent would be, or what platform you are
using, but given the number of ways that code in packages and such may be
accessing files in C code, possibly based on environment variables or other
configuration parameters, I would want to lean heavily on the operating
system's tools for things like this rather than rely on parsing your R code
looking for specific file access.

- Murray

On Mon, Nov 4, 2013 at 1:32 PM, Martin Gregory wrote:
> I'm using R in a regulated environment and one of the requirements is to
> be able to trace how a result is arrived at. I would like to be able to
> determine which files are opened in read or write mode by an R session,
> for example when a program uses source, sink, file, open, read.table,
> write.table or any of the other functions which can be used to read or
> write files. I'm also interested in output to graphics devices.
>
> I've looked in the documentation but only found information relating to
> profiling. Looking through the source code it seems that much file I/O is
> done via the C functions *_open in main/connections.c, but I don't see
> anything there that looks like logging.
>
> Could someone let me know if it is possible to log which files are opened?
>
> Regards,
> Martin Gregory
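Complementing the OS-level tools, R itself can report the connections opened through its own connection API (file, gzfile, url, ...), though not files opened directly by C code in packages. A sketch using base R's `showConnections()`:

```r
# Inspect the connections this R session currently has open.
f <- tempfile()
con <- file(f, open = "w")

sc <- showConnections()     # matrix with one row per user-opened connection
colnames(sc)                # includes "description", "class", "mode", ...
f %in% sc[, "description"]  # TRUE while our file connection is open

close(con)
f %in% showConnections()[, "description"]  # FALSE once it is closed
```

For audit purposes this only covers R-level connections; pairing it with lsof or DTrace is still needed to see everything the process touches.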
Re: [Rd] inflate zlib compressed data using base R or CRAN package?
I think none of these examples describes the zlib-compressed data block
inside a binary file that the OP asked about, as your examples all rely on
gzip or zip headers.

Greg, is memDecompress what you are looking for?

- Murray

On Wed, Nov 27, 2013 at 5:22 PM, Dirk Eddelbuettel wrote:
>
> On 27 November 2013 at 18:38, Dirk Eddelbuettel wrote:
> |
> | On 27 November 2013 at 23:49, Dr Gregory Jefferis wrote:
> | | I have a binary file type that includes a zlib compressed data block
> | | (ie not gzip). Is anyone aware of a way using base R or a CRAN package
> | | to decompress this kind of data (from disk or memory). So far I have
> | | found Rcompression::decompress on omegahat, but I would prefer to keep
> | | dependencies on CRAN (or bioconductor). I am also trying to avoid
> | | writing yet another C level interface to part of zlib.
> |
> | Unless I am missing something, this is in base R; see help(connections).
> |
> | Here is a quick demo:
> |
> | R> write.csv(trees, file="/tmp/trees.csv")  # data we all have
> | R> system("gzip -v /tmp/trees.csv")         # as I am lazy here
> | /tmp/trees.csv: 50.5% -- replaced with /tmp/trees.csv.gz
> | R> read.csv(gzfile("/tmp/trees.csv.gz"))    # works out of the box
>
> Oh, and in case you meant a zip file containing a data file, that also
> works.
> First converting what I did last:
>
> edd@max:/tmp$ gunzip trees.csv.gz
> edd@max:/tmp$ zip trees.zip trees.csv
>   adding: trees.csv (deflated 50%)
> edd@max:/tmp$
>
> Then reading the csv from inside the zip file:
>
> R> read.csv(unz("/tmp/trees.zip", "trees.csv"))
>     X Girth Height Volume
> 1   1   8.3     70   10.3
> 2   2   8.6     65   10.3
> 3   3   8.8     63   10.2
> 4   4  10.5     72   16.4
> 5   5  10.7     81   18.8
> 6   6  10.8     83   19.7
> 7   7  11.0     66   15.6
> 8   8  11.0     75   18.2
> 9   9  11.1     80   22.6
> 10 10  11.2     75   19.9
> 11 11  11.3     79   24.2
> 12 12  11.4     76   21.0
> 13 13  11.4     76   21.4
> 14 14  11.7     69   21.3
> 15 15  12.0     75   19.1
> 16 16  12.9     74   22.2
> 17 17  12.9     85   33.8
> 18 18  13.3     86   27.4
> 19 19  13.7     71   25.7
> 20 20  13.8     64   24.9
> 21 21  14.0     78   34.5
> 22 22  14.2     80   31.7
> 23 23  14.5     74   36.3
> 24 24  16.0     72   38.3
> 25 25  16.3     77   42.6
> 26 26  17.3     81   55.4
> 27 27  17.5     82   55.7
> 28 28  17.9     80   58.3
> 29 29  18.0     80   51.5
> 30 30  18.0     80   51.0
> 31 31  20.6     87   77.0
> R>
>
> Regards, Dirk
>
> --
> Dirk Eddelbuettel | e...@debian.org | http://dirk.eddelbuettel.com
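To answer the memDecompress() question concretely, here is a base-R sketch of an in-memory round trip. Note that `memCompress(type = "gzip")` emits a deflate stream (historically with a zlib rather than a gzip wrapper; check ?memCompress on your R version), which is the kind of bare zlib block the OP describes:

```r
# Round-trip raw bytes through an in-memory deflate stream with base R only.
orig <- charToRaw(strrep("zlib-compressed block ", 10))

z <- memCompress(orig, type = "gzip")     # compress to a raw vector
length(z) < length(orig)                  # the repetitive data shrank

back <- memDecompress(z, type = "gzip")   # inflate it again
identical(back, orig)                     # TRUE: lossless round trip
```

If the compressed block sits at a known offset inside a binary file, `readBin()` the relevant bytes into a raw vector first and pass that to `memDecompress()`.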
Re: [Rd] C API to get numrow of data frame
The simplest case would be:

    int num_rows = Rf_length(VECTOR_ELT(dataframe, 0));
    int num_columns = Rf_length(dataframe);

There may be edge cases for which this doesn't work; one would need to look
into how the dim primitive is implemented to be sure.

- Murray

On Mon, Mar 31, 2014 at 4:40 PM, Sandip Nandi wrote:
> Hi,
>
> Is there any C API equivalent to the R function nrow for a data frame?
>
> x <- data.frame()
> n <- nrow(x)
> print(n)
> 0
>
> Example:
> My C function deals with a data frame, and I don't want to send the
> number of rows of the data frame separately; I want to detect it from the
> function itself. My function takes a data frame as argument and does some
> work on it. I want an API equivalent to nrow. I tried Rf_nrows, Rf_ncols.
> Not much help.
>
> SEXP writeRR(SEXP dataframe) {
>
> }
>
> Any help is very appreciated.
>
> Thanks,
> Sandip
Re: [Rd] C API to get numrow of data frame
I didn't look at the row names because I believe that would be incorrect if
the row names were stored internally in the compact form. See ?.set_row_names
(hat tip to Tim Hesterberg, who showed me this years ago):

    'row.names' can be stored internally in compact form.
    '.set_row_names(n)' generates that form for automatic row names of
    length 'n', to be assigned to 'attr(, "row.names")'.

    '.row_names_info' gives information on the internal form of the row
    names for a data frame: for details of what information, see the
    argument 'type'.

The function I wrote obviously doesn't work for 0-row or 0-column
data.frames; you need to check for that.

On Mon, Mar 31, 2014 at 6:12 PM, Gábor Csárdi wrote:
> I think it is actually better to check the length of the row names, in
> case the data frame has zero columns. (FIXME, of course.)
>
> Gabor
>
> On Mon, Mar 31, 2014 at 8:04 PM, Murray Stokely wrote:
>>
>> The simplest case would be:
>>
>>    int num_rows = Rf_length(VECTOR_ELT(dataframe, 0));
>>    int num_columns = Rf_length(dataframe);
>>
>> There may be edge cases for which this doesn't work; would need to
>> look into how the dim primitive is implemented to be sure.
>>
>>    - Murray
>>
>> On Mon, Mar 31, 2014 at 4:40 PM, Sandip Nandi wrote:
>> > Hi,
>> >
>> > Is there any C API equivalent to the R function nrow for a data frame?
>> >
>> > x <- data.frame()
>> > n <- nrow(x)
>> > print(n)
>> > 0
>> >
>> > Example:
>> > My C function deals with a data frame, and I don't want to send the
>> > number of rows of the data frame separately; I want to detect it from
>> > the function itself. My function takes a data frame as argument and
>> > does some work on it. I want an API equivalent to nrow. I tried
>> > Rf_nrows, Rf_ncols. Not much help.
>> >
>> > SEXP writeRR(SEXP dataframe) {
>> >
>> > }
>> >
>> > Any help is very appreciated.
>> > Thanks,
>> > Sandip
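The compact form is easy to see from the R level, which also shows why reading the "row.names" attribute naively from C can mislead. A quick sketch with `.row_names_info()`:

```r
# Automatic row names are stored compactly; .row_names_info() reveals this.
df <- data.frame(x = 1:5)

.row_names_info(df, type = 1L)  # -5: a negative value signals the compact form
.row_names_info(df, type = 2L)  #  5: the implied number of rows
attr(df, "row.names")           # expanded to 1:5 when accessed from R
```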
Re: [Rd] type.convert and doubles
On Thu, Apr 17, 2014 at 6:42 AM, McGehee, Robert wrote:
> Here's my use case: I have a function that pulls arbitrary financial data
> from a web service call, such as a stock's industry, price, volume, etc.,
> by reading the web output as a text table. The data may be either
> character (industry, stock name, etc.) or numeric (price, volume, etc.),
> and the function generally doesn't know the class in advance. The problem
> is that we frequently get numeric values represented with more precision
> than actually exists, for instance a price of "2.6999" rather than
> "2.70". The numeric representation is exactly one digit too much for
> type.convert, which (in R 3.1.0) converts it to character instead of
> numeric (not what I want). This caused a bunch of "non-numeric argument
> to binary operator" errors to appear today as numeric data was now being
> represented as characters.
>
> I have no doubt that this will probably cause some unwanted RODBC side
> effects for us as well. IMO, getting the class right is more important
> than infinite precision. What use is a character representation of a
> number anyway if you can't perform arithmetic on it? I would favor at
> least making the new behavior optional, but I think many packages (like
> RODBC) potentially need to be patched to code around the new feature if
> it's left in.

The uses of character representations of numbers are many: unique
identifiers/user ids, hash codes, timestamps, or other values where rounding
to the nearest value that can be represented as a numeric type would
completely change the results of any data analysis performed on that data.

Database join operations are certainly an area where R's previous behavior
of silently dropping precision of numbers with type.convert can get you into
trouble.
For example, join operations or group-by operations performed in R code
would produce erroneous results if you are joining/grouping by a key without
the full precision of your underlying data. Records can get joined up
incorrectly or aggregated with the wrong groups.

If you later want to do arithmetic on them, you can choose to lose precision
by using as.numeric(), or use one of the large-number packages on CRAN (gmp,
int64, bit64, etc.). But once you've dropped the precision with as.numeric
you can never get it back, which is why the previous behavior was clearly
dangerous.

I think I had some additional examples in the original bug/patch I filed
about this issue a few years ago, but I'm unable to find it on
bugs.r-project.org and it's not referenced in the commit descriptions or
NEWS file.

- Murray
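A base-R sketch of the group-by hazard, using 2^53 and 2^53 + 1 as stand-in identifiers:

```r
# Two distinct ids that become indistinguishable after coercion to double.
ids <- c("9007199254740992", "9007199254740993")  # 2^53 and 2^53 + 1
amounts <- c(10, 20)

tapply(amounts, ids, sum)              # two groups, as intended: 10 and 20
tapply(amounts, as.numeric(ids), sum)  # one merged group of 30
```

The aggregated totals are silently wrong in the second call; nothing warns that two keys were fused.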
Re: [Rd] type.convert and doubles
On Thu, Apr 17, 2014 at 2:35 PM, Gabor Grothendieck wrote:
> Only if you knew that that column was supposed to be numeric.

The columns that are "supposed" to be numeric are those that can fit into a
numeric data type. Previously that was not always the case, with columns
that could not be represented as a numeric erroneously coerced into a
truncated/rounded numeric.

> There is nothing in type.convert or read.table to allow you to override
> how it works (colClasses only works if you knew which columns are which
> in the first place), nor is there anything to allow you to know which
> columns were affected so that you know which columns to look at to fix it
> yourself afterwards.

You want a casting operation in your SQL query or similar if you want a
rounded type that will always fit in a double: CAST or CONVERT operators in
SQL, or similar for however you are getting the data you want to use with
type.convert(). This is all application specific and sort of beyond the
scope of type.convert(), which now behaves as it has been documented to
behave.

In my code for this kind of thing I have, however, typically introduced an
option() to let the user control casting behavior, e.g. for 64-bit ints in
C++: should they be returned as truncated-precision numeric types or as
full-precision data in a character string representation? In the RProtoBuf
package we let the user specify an option() to choose which behavior they
need for their application, as a shortcut to just always returning the safer
character representation and making them coerce to numeric often.

- Murray
Re: [Rd] type.convert and doubles
Yes, I'm also strongly in favor of having an option for this. If there were
an option in base R for controlling this, we would just use that and get rid
of the separate RProtoBuf.int64AsString option we use in the RProtoBuf
package on CRAN to control whether 64-bit int types from C++ are returned to
R as numerics or character vectors.

I agree that reasonable people can disagree about the default, but I found
my original bug report about this, so I will counter Robert's example with
my favorite example of what was wrong with the previous behavior:

tmp <- data.frame(n=c("72057594037927936", "72057594037927937"),
                  name=c("foo", "bar"))
length(unique(tmp$n))  # 2
write.csv(tmp, "/tmp/foo.csv", quote=FALSE, row.names=FALSE)
data <- read.csv("/tmp/foo.csv")
length(unique(data$n))  # 1

- Murray

On Sat, Apr 19, 2014 at 10:06 AM, Simon Urbanek wrote:
> On Apr 19, 2014, at 9:00 AM, Martin Maechler wrote:
>
>>> McGehee, Robert
>>>     on Thu, 17 Apr 2014 19:15:47 -0400 writes:
>
>>> This is all application specific and sort of beyond the scope of
>>> type.convert(), which now behaves as it has been documented to behave.
>>
>>> That's only a true statement because the documentation was changed to
>>> reflect the new behavior! The new feature in type.convert certainly
>>> does not behave according to the documentation as of R 3.0.3. Here's a
>>> snippet:
>>
>>> The first type that can accept all the
>>> non-missing values is chosen (numeric and complex return values
>>> will be represented approximately, of course).
>>
>>> The key phrase is in parentheses, which reminds the user to expect a
>>> possible loss of precision. That important parenthetical was removed
>>> from the documentation in R 3.1.0 (among other changes).
>>
>>> Putting aside the fact that this introduces a large amount of
>>> unnecessary work rewriting SQL / data import code and SQL packages, my
>>> biggest conceptual problem is that I can no longer rely on a particular
>>> function call returning a particular class.
>>> In my example querying stock prices, about 5% of prices came back as
>>> factors and the remaining 95% as numeric, so we had random errors
>>> popping up throughout the morning.
>>
>>> Here's a short example showing how the new behavior can be unreliable.
>>> I pass a character representation of a uniformly distributed random
>>> variable to type.convert. 90% of the time it is converted to "numeric"
>>> and 10% of the time it is a "factor" (in R 3.1.0). In the 10% of cases
>>> in which type.convert converts to a factor, the leading non-zero digit
>>> is always a 9. So if you were expecting a numeric value, then 1 in 10
>>> times you may have a bug in your code that didn't exist before.
>>
>>> options(digits=16)
>>> cl <- NULL; for (i in 1:10000) cl[i] <- class(type.convert(format(runif(1))))
>>> table(cl)
>>> cl
>>>  factor numeric
>>>     990    9010
>>
>> Yes.
>>
>> Murray's point is valid, too.
>>
>> But in my view, with the reasoning we have seen here,
>> *and* with the well known software design principle of
>> "least surprise" in mind,
>> I also do think that the default for type.convert() should be what
>> it has been for > 10 years now.
>
> I think there should be two separate discussions:
>
> a) Have an option (argument to type.convert and possibly read.table) to
> enable/disable this behavior. I'm strongly in favor of this.
>
> b) Decide what the default for a) will be. I have no strong opinion; I
> can see arguments in both directions.
>
> But most importantly, I think a) is better than the status quo - even if
> the discussion about b) drags out.
>
> Cheers,
> Simon
Re: [Rd] bug in sum() on integer vector
FYI, the new int64 package on CRAN gets this right, but is of course
somewhat slower, since it is not doing hardware 64-bit arithmetic.

x <- c(rep(180003L, 1000), -rep(120002L, 1500))
library(int64)
sum(as.int64(x))
# [1] 0

- Murray

2011/12/9 Hervé Pagès:
> Hi,
>
> x <- c(rep(180003L, 1000), -rep(120002L, 1500))
>
> This is correct:
>
> > sum(as.double(x))
> [1] 0
>
> This is not:
>
> > sum(x)
> [1] 4996000
>
> Returning NA (with a warning) would also be acceptable for the latter.
> That would make it consistent with cumsum(x):
>
> > cumsum(x)[length(x)]
> [1] NA
> Warning message:
> Integer overflow in 'cumsum'; use 'cumsum(as.numeric(.))'
>
> Thanks!
> H.
>
> > sessionInfo()
> R version 2.14.0 (2011-10-31)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
>  [1] LC_CTYPE=en_CA.UTF-8       LC_NUMERIC=C
>  [3] LC_TIME=en_CA.UTF-8        LC_COLLATE=en_CA.UTF-8
>  [5] LC_MONETARY=en_CA.UTF-8    LC_MESSAGES=en_CA.UTF-8
>  [7] LC_PAPER=C                 LC_NAME=C
>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> --
> Hervé Pagès
>
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B514
> P.O. Box 19024
> Seattle, WA 98109-1024
>
> E-mail: hpa...@fhcrc.org
> Phone: (206) 667-5791
> Fax: (206) 667-1319
Re: [Rd] CRAN policies
Lots of very sensible policies here. I have one request, as someone who has in several cases had to involve company lawyers over intellectual property issues with packages on CRAN: the first bullet point, on ownership of copyright and intellectual property rights, could be strengthened further.

To the existing text

"The ownership of copyright and intellectual property rights of all components of the package must be clear and unambiguous (including from the authors specification in the DESCRIPTION file). Where code is copied (or derived) from the work of others (including from R itself), care must be taken that any copyright statements are preserved and authorship is not misrepresented. Trademarks must be respected."

I would add a few additional points:

1. The text of the license itself should be included in the package in a LICENSE or COPYING file. Most of these licenses have blanks that need to be filled in with names and other data, and referencing a license name in the DESCRIPTION file alone is not a great way to handle licensing metadata (it is a great complement to a full, filled-out license in the package itself).

2. Per-file copyright comment headers can help immensely with ensuring compliance and with catching the accidental incorporation of files under a different license. Comment header blocks with the author name and terms of distribution could be recommended for all source files.

- Murray

On Tue, Mar 27, 2012 at 4:52 AM, Prof Brian Ripley wrote:
> CRAN has for some time had a policies page at
> http://cran.r-project.org/web/packages/policies.html
> and we would like to draw this to the attention of package maintainers. In
> particular, please
>
> - always send a submission email to c...@r-project.org with the package
> name and version on the subject line. Emails sent to individual members of
> the team will result in delays at best.
>
> - run R CMD check --as-cran on the tarball before you submit it. Do
> this with the latest version of R possible: definitely R 2.14.2,
> preferably R 2.15.0 RC or a recent R-devel. (Later versions of R are
> able to give better diagnostics, e.g. for compiled code and especially
> on Windows. They may also have extra checks for recently uncovered
> problems.)
>
> Also, please note that CRAN has a very heavy workload (186 packages were
> published last week) and to remain viable needs package maintainers to make
> its life as easy as possible.
>
> Kurt Hornik
> Uwe Ligges
> Brian Ripley
Re: [Rd] R-devel on FreeBSD: new C99 functions don't build
On Tue, May 15, 2012 at 10:05 AM, Rainer Hurling wrote:
> About April 25th, there had been some changes within R-devel's
> src/nmath/pnbeta.c (and probably some other relevant places), and now
> building R-devel on FreeBSD 10.0-CURRENT (amd64) with gcc-4.6.4 and
> math/R-devel (selfmade forked port from math/R) fails like this:
>
> It seems that at least one new C99 function (log1pl) is introduced in
> R-devel, see
>
> src/nmath/pnbeta.c:l95
> return (double) (log_p ? log1pl(-ans) : (1 - ans));

AFAIK, Bruce Evans is not happy with the numerical accuracy of other open-source implementations of log1pl and so has blocked their inclusion in FreeBSD pending work on a better implementation. Can you put a conditional FreeBSD check here and use log1p instead of log1pl as a workaround?

I can admire the FreeBSD libm maintainers' insistence on correctness and technical purity, but it can be a bit of a pain for things like this.

- Murray
Re: [Rd] r-devel fails tests for parallel
On Thu, May 17, 2012 at 8:09 AM, Prof Brian Ripley wrote:
> This is getting increasingly difficult. GCC 4.6.x and 4.7.x detect a lot of
> errors (especially C++ errors) that earlier versions did not -- and that
> means CRAN gets a fair number of submissions that we cannot compile. And
> there have been a lot of optimization advances since 4.1.x.

I would also point out that clang has significantly better error detection and diagnostics than current GCC. Installations stuck on old GCC releases for GPLv3 reasons should really migrate to clang/llvm.

- Murray
Re: [Rd] [PATCH] R ignores PATH_MAX and fails in long directories (PR#14228)
Indeed, thanks to ripley@ for submitting it. I don't see a note in the NEWS file, and it would be nice to point this out as fixed, since others may have run into this problem. Could someone submit something like this as well?

Index: NEWS
===================================================================
--- NEWS        (revision 51276)
+++ NEWS        (working copy)
@@ -577,6 +577,8 @@

     o read.fwf() works again when 'file' is a connection.

+    o R now works correctly with filesystem paths longer than 255
+      characters on platforms that support it (PR#14228).

 CHANGES IN R VERSION 2.10.1

On Thu, Mar 11, 2010 at 9:12 AM, Seth Falcon wrote:
> On 3/11/10 12:45 AM, Henrik Bengtsson wrote:
>>
>> Thanks for the troubleshooting.
>>
>> I just want to second this patch; it would be great if PATH_MAX could
>> be used everywhere.
>
> The patch, or at least something quite similar, was applied in r51229.
>
> + seth
>
> --
> Seth Falcon | @sfalcon | http://userprimary.net/
Re: [Rd] R in sandbox/jail (long question)
On Tue, May 18, 2010 at 7:38 PM, Assaf Gordon wrote:
> I've found this old thread:
> http://r.789695.n4.nabble.com/R-in-a-sandbox-jail-td921991.html
> But for technical reasons I'd prefer not to set up a chroot jail.

I would also point out that the state of the art in the operating-system community has moved on significantly since 1982, when chroot was added. BSD Jails, Solaris Zones/Containers, SELinux, etc. all provide much more control over the system calls, network connections, and file and device access granted to applications in different jails/zones. These operating-system capabilities solve exactly the kinds of problems you are trying to solve by painstakingly modifying R, but in a more secure and configurable manner.

- Murray