Re: [Rd] Does anyone use Sweave (RweaveLatex) option "expand=FALSE"?
I use it, frequently. The idea for it goes back to some of Knuth's original literate programming ideas for developing weave and tangle when he was writing TeX (the program). I want to be able to document the pieces of some complex algorithm without having to see all of the gory details.

For instance, I have code that looks like the following. (Note that this is typed on the fly rather than copied from actual source, so there may be typos.)

<<mainloop>>=
for (i in 1:nSamples) {
  <<...>>
  for (j in 1:nChromosomes) {
    <<...>>
    <<...>>
    <<...>>
    <<...>>
    <<...>>
  }
}
@

Each of the <<...>> chunk references is itself a fairly long piece of code defined and documented somewhere else. (Some of them may themselves be written in the same form to reduce the final size of a chunk to something a human has a chance of understanding. That's the difference between weave and tangle in the original implementation.) By blocking expansion, I can focus on the main steps without having them lost in pages and pages of code.

So I vote strongly for retaining "expand=FALSE".

Best,
Kevin

Duncan Murdoch wrote:
> On 19/08/2010 4:29 PM, Claudia Beleites wrote:
> > I never used it. I got curious, though. What would be a situation that benefits from this option?
>
> When I put it in, I thought it would be for people who were writing about Sweave.
>
> Duncan Murdoch
>
> > Maybe a use case could be found by "brute force" (grep all .Rnw files on CRAN for the option)?
> >
> > Claudia

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
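For readers unfamiliar with the option, a hypothetical sketch of what this looks like in an .Rnw file (chunk names and code are illustrative, not from the original post; expand is the RweaveLatex chunk option under discussion, and with expand=FALSE the chunk reference is shown as written rather than replaced by its contents):

```latex
% Define and document the detailed piece once, where it belongs:
<<get-sample-data>>=
dat <- read.table(sampleFiles[i])
@

% Present the main loop without expanding the pieces it references:
<<mainloop, expand=FALSE>>=
for (i in 1:nSamples) {
  <<get-sample-data>>
}
@
```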
Re: [Rd] Does anyone use Sweave (RweaveLatex) option "expand=FALSE"?
I can certainly live with the line number matching some other part of the code.

Duncan Murdoch wrote:
> On 19/08/2010 5:07 PM, Kevin Coombes wrote:
> > I use it, frequently. The idea for it goes back to some of Knuth's original literate programming ideas for developing weave and tangle when he was writing TeX (the program). I want to be able to document the pieces of some complex algorithm without having to see all of the gory details. For instance, I have code that looks like the following. (Note that this is typed on the fly rather than copied from actual source, so there may be typos.)
>
> Okay, thanks. I'll keep it in.
>
> So now I have a question: suppose you have an error (a syntax error at this point, maybe other kinds of error in the future) in one of the inner chunks, but that chunk wasn't eval'd; mainloop was eval'd. So the error is going to be reported as occurring in chunk mainloop, but with a line number from somewhere else in the file. Is that a problem?
>
> Duncan Murdoch
>
> > [...]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Does anyone use Sweave (RweaveLatex) option "expand=FALSE"?
I picked the example from segmenting chromosomes for a reason. I have a fair chunk of code that deals with not quite exceeding the amount of RAM available in the machine sitting on my desktop. If I use functions, then the pass-by-value semantics of R will push me beyond the limits at some points. (This is an empirical statement, not a theoretical one. I was bitten by it several times while trying to analyze a couple of these datasets. And, yes, I know I can get around this by buying a bigger and better machine; it's on order...) The real point is that using functions can be detrimental to the efficiency of the program, in ways that have real-world consequences.

I haven't thought about doing the same thing with expressions. Expressions don't have quite the same semantics as chunks, and you'd have to make sure the evaluation was delayed so that you could use the current values of things that were computed in the meantime, and I already know how to do this with chunks without having to think so hard.

Using expressions would, however, help with the one difficulty that I have with reusing <<...>> chunk references (independent of whether or not I use 'expand=FALSE'). I usually work inside emacs, using the emacs-speaks-statistics (ESS) package. ESS doesn't know how to evaluate a <<...>> reference inside another chunk, so if I want to step through the code during development, I have to jump around myself to locate the source chunks. With expressions that wouldn't matter.

As I ramble on about this, it occurs to me that the underlying issue is that chunks are not first-class objects either in the LaTeX world or in the R world part of Sweave. If there were a way to promote them to first-class objects somehow, then it might make my use of ESS easier while simultaneously making it easier for Duncan to figure out how to report the correct line numbers. But I only have an extremely vague idea of how one might start to do that...
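As a rough illustration of the expression idea (not from the original message; the names are made up), a chunk-like step can be stored unevaluated and run later, so it sees whatever values have been computed in the meantime:

```r
# Store a "chunk" as an unevaluated expression; nothing runs yet.
get.sample.data <- quote({
  sampled <- total * fraction   # uses the values of total/fraction at eval time
})

# Later, once the inputs exist, evaluate it in the current environment,
# much like a chunk reference being expanded in place.
total <- 10
fraction <- 0.5
eval(get.sample.data)
sampled   # 5
```

The evaluation is delayed exactly as described above: quote() captures the code, and eval() runs it against the current workspace.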
Kevin

Matt Shotwell wrote:
> On Thu, 2010-08-19 at 17:07 -0400, Kevin Coombes wrote:
> > [...]
> > By blocking expansion, I can focus on the main steps without having them lost in pages and pages of code.
>
> Couldn't you achieve the same amount of abstraction using function calls, rather than embedded code chunks? The reader can then see real code, rather than non-code, or meta-code, or whatever. Alternatively, represent the code chunks as R expressions, then evaluate the expressions at the appropriate points.
>
> -Matt
>
> > So I vote strongly for retaining "expand=FALSE".
> >
> > Best,
> > Kevin

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Fwd: Re: RSiteSearch, sos, rdocumentation.org, ...?
Would it make sense to recreate the "searchable R help pages" by feeding them all into elasticsearch, which will automatically index them and also provides an extensive (HTTP+JSON-based) API to perform complex searches?

On 9/8/2016 10:31 AM, Jonathan Baron wrote:
> On 09/08/16 07:09, John Merrill wrote:
> > Given Google's commitment to R, I don't think that they'd be at all averse to supporting a custom search box on the package page. It might well be a good thing for "someone" to examine the API for setting up such a page and to investigate how to mark the main CRAN page as searchable.
>
> The main CRAN page is not ideal. We need to be able to search the help files. My site has only the html help files for each package (except the ones I use, which are fully installed), so someone should re-create that. The CRAN page has a "Reference manual" in pdf for every package, but the individual functions are not separated.
>
> But, yes, Google would work, even for my page. And the sos package would have to be modified for that. As I said, I'm not going to do this. But I would welcome it.
>
> Jon

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
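As a sketch of what the search side could look like once help pages are indexed (the index and field names here are purely illustrative assumptions, not an existing service), a query against Elasticsearch's standard search API might be as simple as:

```json
{
  "query": {
    "multi_match": {
      "query": "robust scatterplot smoothing",
      "fields": ["title^2", "text"]
    }
  }
}
```

POSTed to a hypothetical `/r-help/_search` endpoint, this would rank matches in help-page titles above matches in body text.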
Re: [Rd] Milestone: 12,000 packages on CRAN
Cool. Since I got a package accepted overnight, I'm going to take credit for being #12,000.

It does look like the apparent exponential growth in packages may have finally come to an end, however, collapsing back to something nearly linear. Note that under an exponential growth model, CRAN will reach 15,000 packages at about Thanksgiving 2018. Under a now-linear model, that milestone won't occur until some time in March 2019. Remember, you read it here first.

Kevin

On 12/15/2017 8:29 AM, Henrik Bengtsson wrote:
> CRAN reached 12,000 packages [1] today (December 15, 2017). A huge thank you to the CRAN team!
>
> Milestones:
>
> 2017-12-15 12000 pkgs (+6.1/day over 165 days) 6910 mnts (+3.2/day)
> 2017-07-04 11000 pkgs (+6.3/day over 159 days) 6377 mnts (+3.3/day)
> 2017-01-27 10000 pkgs (+6.3/day over 158 days) 5845 mnts (+3.5/day)
> 2016-08-22  9000 pkgs (+5.7/day over 175 days) 5289 mnts (+5.8/day)
> 2016-02-29  8000 pkgs (+5.0/day over 201 days) 4279 mnts (+0.7/day)
> 2015-08-12  7000 pkgs (+3.4/day over 287 days) 4130 mnts (+2.4/day)
> 2014-10-29  6000 pkgs (+3.0/day over 335 days) 3444 mnts (+1.6/day)
> 2013-11-08  5000 pkgs (+2.7/day over 442 days) 2900 mnts (+1.2/day)
> 2012-08-23  4000 pkgs (+2.1/day over 469 days) 2350 mnts
> 2011-05-12  3000 pkgs (+1.7/day over 585 days)
> 2009-10-04  2000 pkgs (+1.1/day over 906 days)
> 2007-04-12  1000 pkgs
> 2004-10-01   500 pkgs
> 2003-04-01   250 pkgs
> 2002-09-17    68 pkgs
> 1997-04-23    12 pkgs
>
> These data are for CRAN only [1-14]. There are many more packages elsewhere, e.g. Bioconductor, GitHub, R-Forge etc.
>
> [1] https://cran.r-project.org/web/packages/
> [2] https://en.wikipedia.org/wiki/R_(programming_language)#Milestones
> [3] https://www.r-pkg.org/
> [4] Legacy data collected privately
> [5] https://stat.ethz.ch/pipermail/r-devel/2007-April/045359.html
> [6] https://stat.ethz.ch/pipermail/r-devel/2009-October/055049.html
> [7] https://stat.ethz.ch/pipermail/r-devel/2011-May/061002.html
> [8] https://stat.ethz.ch/pipermail/r-devel/2012-August/064675.html
> [9] https://stat.ethz.ch/pipermail/r-devel/2013-November/067935.html
> [10] https://stat.ethz.ch/pipermail/r-devel/2014-October/069997.html
> [11] https://stat.ethz.ch/pipermail/r-package-devel/2015q3/000393.html
> [12] https://stat.ethz.ch/pipermail/r-devel/2016-February/072388.html
> [13] https://stat.ethz.ch/pipermail/r-devel/2016-August/073011.html
> [14] Local CRAN mirror data (https://cran.r-project.org/mirror-howto.html)
>
> All the best,
> Henrik (just one of many)

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
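Kevin's back-of-the-envelope linear projection can be reproduced from the milestones quoted above. This sketch fits the rate to just the most recent segment (11,000 to 12,000), so the exact date is illustrative rather than a forecast:

```r
# Linear extrapolation from the last two milestones in Henrik's table.
d1 <- as.Date("2017-07-04")   # 11000 packages
d2 <- as.Date("2017-12-15")   # 12000 packages
rate <- 1000 / as.numeric(d2 - d1)   # packages per day, roughly 6.1
days.to.15000 <- (15000 - 12000) / rate
d2 + round(days.to.15000)     # lands in spring 2019
```

Under this simple linear model the 15,000-package milestone falls in the spring of 2019, the same ballpark as the "March 2019" estimate in the message.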
Re: [Rd] Not documenting a function and not getting a check error?
I am fairly certain that the check for documentation is really just a check for the presence of the function name in an "alias" line.

My circumstantial evidence, from a package in the early stages of development, came from changing the name of a function. I updated everything else (usage, examples, etc.) but forgot to change the alias. I got a warning that the newly named function was not documented, and it took me a while to figure out why R CMD check was still complaining.

I am also pretty sure that, when looking for help in at least one existing package (can't remember which one), I clicked on the link and got sent to a page that said absolutely nothing about the function I was interested in.

On Fri, Jan 6, 2023, 4:48 AM Duncan Murdoch wrote:
> On 05/01/2023 10:10 p.m., Deepayan Sarkar wrote:
> > On Fri, Jan 6, 2023 at 1:49 AM Duncan Murdoch wrote:
> > >
> > > I'm in the process of a fairly large overhaul of the exports from the rgl package, with an aim of simplifying maintenance of the package. During this work, I came across the reverse dependency geomorph that calls the rgl.primitive function.
> > >
> > > I had forgotten that rgl.primitive was still exported: I've been thinking of it as an internal function for a few years now. I was surprised geomorph was able to call it.
> > >
> > > Particularly surprising to me was the fact that it is not properly documented. One of the help topics lists it as an alias, but it contains no usage info, and it is not mentioned in the .Rd file other than the alias. And yet "R CMD check rgl" has never complained about it.
> > >
> > > Is this intentional?
> >
> > Does the Rd file that documents it have \keyword{internal}? These are not checked fully (as I realized recently while working on the help system), and I guess that's intentional.
>
> No, not marked internal. Here's a simple example: a package that exports f and g, and has only one help page:
>
> NAMESPACE:
>
> export(f, g)
>
> R/source.R:
>
> f <- function() "this is f"
> g <- function() "this is g"
>
> man/f.Rd:
>
> \name{f}
> \alias{f}
> \alias{g}
> \title{This is f.}
> \description{This does nothing}
> \usage{f()}
>
> No complaints about the lack of documentation of g.
>
> Duncan Murdoch

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] An interesting chat with ChatGPT
Chat bots are like politicians, or talking dogs. The fact that they exist is interesting. But no sane person would believe anything they say.

On Mon, Feb 13, 2023, 10:58 AM Boris Steipe wrote:
> Duncan -
>
> Indeed, this has now been well documented; I have called these constructions "Schrödinger Facts", since they arise from a superposition of truths in the training data that collapse into an untruth when observed.
>
> https://sentientsyllabus.substack.com/p/chatgpts-achilles-heel
>
> Now, the curious question is: why can it program? Why is its production of R syntax less vulnerable to this effect than a literature quote, or reference? Maybe it has to do with the fact that in producing correct syntax there is a ground truth - errors just will not run. But I'm not sure.
>
> Regardless: it signals that we need a reinvigorated culture of validation. Actually, ChatGPT will happily make test data for you. Which, in a way, it just did ;-)
>
> Boris
>
> > On 2023-02-13, at 08:37, Duncan Murdoch wrote:
> >
> > I was wondering which alternative would match in the regular expression "(a)|(b)" if both "a" and "b" match the target string. What appears to happen is that the first match wins, and that's how it is documented in Python, but I was looking for docs on this in R.
> >
> > As far as I can see, the ?regex help page doesn't resolve this. So just for fun, I decided to ask ChatGPT, and got an interesting response. I won't show the whole conversation here, but just some extracts:
> >
> > ---
> > Me: In a regular expression in R, if both "a" and "b" match, which does "(a)|(b)" match? Please include authoritative references in your answer.
> >
> > ChatGPT: [long answer about grep]
> >
> > Me: In grep(), it doesn't matter which of a or b matches. It's only in sub() or gsub() that it makes a difference. Do you know an authoritative reference that discusses those functions?
> >
> > ChatGPT: [long irrelevant answer about sub and gsub, with a reference to "R Cookbook" by P. Teetor.]
> >
> > Me: Teetor's book is not authoritative, and the other reference doesn't discuss ambiguous alternation.
> >
> > ... finally we get to the interesting part ...
> >
> > ChatGPT: I apologize for the mistake in my previous answer.
> >
> > A more authoritative reference for the behavior of regular expressions in R can be found in the official R documentation.
> >
> > Regarding the behavior of ambiguous alternation, the regex section of the manual "An Introduction to R" explains it as follows:
> >
> > "When several alternatives are possible, the first alternative is chosen, so it is important to specify alternatives in the order in which they should be tried."
> >
> > Here is the reference for the official R documentation:
> >
> > "An Introduction to R" manual: https://cran.r-project.org/doc/manuals/r-release/R-intro.html#Regular-expression-facilities
> >
> > I hope this reference provides the information you were looking for.
> > ---
> >
> > All right, an authoritative answer! The only problem is that *it doesn't exist*. There's no such anchor in that document. R-intro isn't a very rapidly changing document, so I could look at every change recorded in svn, and that section has *never* existed.
> >
> > I guess ChatGPT is a pattern matching machine, and if such a question had an answer, it would probably be in a location like that. (I would have guessed a help page like ?regex instead of R-intro, but I had just told it that answer was wrong.)
> >
> > Duncan Murdoch
>
> --
> Boris Steipe MD, PhD
>
> Professor em.
> Department of Biochemistry
> Temerty Faculty of Medicine
> University of Toronto

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
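For what it's worth, the behavior Duncan asked about can be checked empirically rather than asking a chat bot. With perl = TRUE, R uses PCRE, where alternation is documented to be ordered: the first alternative that succeeds at the leftmost position wins. The default TRE engine aims for POSIX leftmost-longest semantics, so the two engines may disagree on an ambiguous pattern (a sketch, not an authoritative reference):

```r
# PCRE (perl = TRUE): ordered alternation. "(a)|(ab)" matches just "a" in "ab",
# so only the "a" is replaced.
sub("(a)|(ab)", "X", "ab", perl = TRUE)   # "Xb"

# Default TRE engine: POSIX semantics prefer the longest match at the same
# position, so the same call may replace the whole of "ab" instead.
sub("(a)|(ab)", "X", "ab")
```

This is exactly the kind of small experiment that settles the question the quoted conversation failed to answer.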
Re: [Rd] read.csv
I was horrified when I saw John Weinstein's article about Excel turning gene names into dates. Mainly because I had been complaining about that phenomenon for years, and it never remotely occurred to me that you could get a publication out of it.

I eventually rectified the situation by publishing "Blasted Cell Line Names", describing how to match different researchers' recordings of the names of cell lines by applying techniques for DNA or protein sequence alignment.

Best,
Kevin

On Tue, Apr 16, 2024, 4:51 PM Reed A. Cartwright wrote:
> Gene names being misinterpreted by spreadsheet software (read.csv is no different) is a classic issue in bioinformatics. It seems like every practitioner ends up encountering this issue in due time. E.g.
>
> https://pubmed.ncbi.nlm.nih.gov/15214961/
> https://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-1044-7
> https://www.nature.com/articles/d41586-021-02211-4
> https://www.theverge.com/2020/8/6/21355674/human-genes-rename-microsoft-excel-misreading-dates
>
> On Tue, Apr 16, 2024 at 3:46 AM jing hua zhao wrote:
> >
> > Dear R-developers,
> >
> > I came to a somewhat unexpected behaviour of read.csv() which is trivial but worthwhile to note -- my data involves a protein named "1433E", but to save space I drop the quotes so it becomes:
> >
> > Gene,SNP,prot,log10p
> > YWHAE,13:62129097_C_T,1433E,7.35
> > YWHAE,4:72617557_T_TA,1433E,7.73
> >
> > Both read.csv() and readr::read_csv() consider the prot(ein) name as numeric 1433 (possibly confused by scientific notation), which only alerted me when I tried to combine data:
> >
> > all_data <- data.frame()
> > for (protein in proteins[1:7])
> > {
> >   cat(protein, ":\n")
> >   f <- paste0(protein, ".csv")
> >   if (file.exists(f))
> >   {
> >     p <- read.csv(f)
> >     print(p)
> >     if (nrow(p) > 0) all_data <- bind_rows(all_data, p)
> >   }
> > }
> >
> > proteins[1:7]
> > [1] "1433B" "1433E" "1433F" "1433G" "1433S" "1433T" "1433Z"
> >
> > dplyr::bind_rows() failed to work due to incompatible types; nevertheless rbind() went ahead without warnings.
> >
> > Best wishes,
> >
> > Jing Hua

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
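A common workaround for this class of problem (a minimal sketch, not from the thread itself) is to disable automatic type conversion with colClasses and convert the genuinely numeric columns yourself:

```r
# Identifiers like "1433E" survive if every column is read as character.
csv <- "Gene,SNP,prot,log10p
YWHAE,13:62129097_C_T,1433E,7.35"

d <- read.csv(text = csv, colClasses = "character")
d$prot                              # "1433E", untouched
d$log10p <- as.numeric(d$log10p)    # convert only what is truly numeric
```

Reading everything as character and opting in to numeric conversion is also what keeps data frames type-compatible when they are later combined with rbind() or dplyr::bind_rows().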
Re: [Rd] suggestion for "sets" tools upgrade
As a mathematician by training (and a former practicing mathematician, both of which qualifications I rarely feel compelled to pull out of the closet), I have to agree with Michael's challenge to the original assertion about the "mathematical concept of sets". Sets are collections of distinct objects (at least in Cantor's original naive definition) and do not have a notion of "duplicate values". In the modern axiomatic definition, one axiom is that "two sets are equal if and only if they contain the same members".

To expand on Michael's example, the union of {1, 2} with {1, 3} is {1, 2, 3}, not {1, 2, 1, 3}, since there is only one distinct object designated by the value "1". A computer programming language could choose to use the ordered vector (or list) [1, 2, 1, 3] as an internal representation of the union of [1, 2] and [1, 3], but it would then have to work hard to perform every other meaningful set operation. For instance, the cardinality of the union still has to equal three (not four, which is the length of the list), since there are exactly three distinct objects that are members. And, as Michael points out, the set represented by [1, 2, 3] has to be equal to the set represented by [1, 2, 1, 3], since they contain exactly the same members.

Kevin

On 2/6/2014 9:39 PM, R. Michael Weylandt wrote:
> On Thu, Feb 6, 2014 at 8:31 PM, Carl Witthoft wrote:
> > First, let me apologize in advance if this is the wrong place to submit a suggestion for a change to functions in the base-R package. It never really occurred to me that I'd have an idea worthy of such a change.
> >
> > My idea is to provide an upgrade to all the "sets" tools (intersect, union, setdiff, setequal) that allows the user to apply them in a strictly algebraic style. The current tools, as well documented, remove duplicate values in the input vectors. This can be helpful in stats work, but is inconsistent with the mathematical concept of sets and set measure.
>
> No comments about back-compatibility concerns, etc., but why do you think this is closer to the "mathematical concept of sets"? As I learned them, sets have no repeats (or order), and other languages with set primitives tend to agree:
>
> python> {1,1,2,3} == {1,2,3}
> True
>
> I believe C++ calls what you're looking for a multiset (albeit with a guarantee of orderedness).
>
> Cheers,
> Michael

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
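For concreteness, base R's existing set tools already behave exactly the way Kevin describes:

```r
# union() returns the distinct members, not a concatenation.
u <- union(c(1, 2), c(1, 3))
u           # 1 2 3
length(u)   # 3 -- the cardinality, not the length-4 concatenation

# Two representations with the same distinct members denote the same set.
setequal(c(1, 2, 1, 3), c(1, 2, 3))   # TRUE
```

So the "algebraic" behavior requested in the thread would be a multiset extension, not a fix to the set semantics.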
Re: [Rd] The case for freezing CRAN
On 3/20/2014 9:00 AM, Therneau, Terry M., Ph.D. wrote:
> On 03/20/2014 07:48 AM, Michael Weylandt wrote:
> > On Mar 20, 2014, at 8:19, "Therneau, Terry M., Ph.D." wrote:
> > > There is a central assertion to this argument that I don't follow: "At the end of the day most published results obtained with R just won't be reproducible." This is a very strong assertion. What is the evidence for it?
> >
> > If I've understood Jeroen correctly, his point might be alternatively phrased as "won't be reproducED" (i.e., end-user difficulties, not software availability).
> >
> > Michael
>
> That was my point as well. Of the 30+ Sweave documents that I've produced, I can't think of one that will change its output with a new version of R. My 0/30 estimate is at odds with the "nearly all" assertion. Perhaps I only do dull things?
>
> Terry T.

The only concrete example that comes to mind from my own Sweave reports was actually caused by BioConductor and not CRAN. I had a set of analyses that used DNAcopy, and the results changed substantially with a new release of the package, in which they changed the default values to the main function call. As a result, I've taken to writing out more of the defaults that I previously just accepted. There have been a few minor issues similar to this one (with changes to parts of the Mclust package??). So my estimates are somewhat higher than 0/30 but are still a long way from "almost all".

Kevin

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
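One defensive pattern that follows from Kevin's DNAcopy experience: look up the defaults in force at analysis time and spell them out in the call, instead of accepting them silently. DNAcopy isn't needed to illustrate the idea; formals() works on any R function, and stats::mad here is just a stand-in:

```r
# Inspect the current default before relying on it...
formals(stats::mad)$constant   # 1.4826 in current base R

# ...then make it explicit in the call, so a future change in the
# package's default cannot silently alter the reported results.
m <- mad(c(1, 2, 3, 100), constant = 1.4826)
```

Combined with recording sessionInfo() in the report, this makes a Sweave document considerably more robust to package upgrades.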
Re: [Rd] R CMD check for the R code from vignettes
Hi,

Unless someone is planning to change Stangle to include inline expressions (which I am *not* advocating), I think that relying on side effects within an \Sexpr construction is a bad idea. So my own coding style is to restrict my use of \Sexpr to calls of the form \Sexpr{show.the.value.of.this.variable}. As a result, I more or less believe that having R CMD check use Stangle and report an error is probably a good thing.

There is a completely separate question about the relationship between Sweave/Stangle or knit/purl and literate programming that is linked to your question about whether to use Stangle on vignettes. The underlying model(s) in R have drifted away from Knuth's original conception, for some good reasons. The original goal of literate programming was to be able to explain the algorithms and data structures in the code to humans. For that purpose, it was important to have named code chunks that you could move around, which would allow you to describe the algorithm starting from a high-level overview and then drilling down into the details. From this perspective, "tangle" was critical to being able to reconstruct a program that would compile and run correctly.

The vast majority of applications of Sweave/Stangle or knit/purl in modern R have a completely different goal: to produce some sort of document that describes the results of an analysis to a non-programmer or non-statistician. For this goal, "weave" is much more important than "tangle", because the most important aspect is the ability to integrate the results (figures, tables, etc.) of running the code into the document that gets passed off to the person for whom the analysis was prepared. As a result, the number of times in my daily work that I need to explicitly invoke Stangle (or purl) is many orders of magnitude smaller than the number of times that I invoke Sweave (or knitr).
-- Kevin

On 5/30/2014 1:04 AM, Yihui Xie wrote:
> Hi,
>
> Recently I saw a couple of cases in which the package vignettes were somewhat complicated, so that Stangle() (or knitr::purl() or other tangling functions) can fail to produce the exact R code that is executed by the weaving function Sweave() (or knitr::knit(), ...). For example, this is a valid document that can pass the weaving process but cannot generate a valid R script to be source()d:
>
> \documentclass{article}
> \begin{document}
> Assign 1 to x: \Sexpr{x <- 1}
>
> <<>>=
> x + 1
> @
>
> \end{document}
>
> That is because the inline R code is not written to the R script during the tangling process. When an R package vignette contains inline R code expressions that have significant side effects, R CMD check can fail because the tangled output is not correct. What I showed here is only a trivial example, and I have seen two packages that have more complicated scenarios than this.
>
> Anyway, the key thing that I want to discuss here is: since the R code in the vignette has been executed once during the weaving process, does it make much sense to execute the code generated from the tangle function? In other words, if the weaving process has succeeded, is it necessary to source() the R script again? The two options here are:
>
> 1. Do not check the R code from vignettes;
> 2. Or fix the tangle function so that it produces exactly what was executed in the weaving process. If this is done, I'm back to my previous question: does it make sense to run the code twice?
>
> To push this a little further, personally I do not quite appreciate literate programming in R as two separate steps, namely weave and tangle. In particular, I do not see the value of tangle, considering Sweave() (or knitr::knit()) as the new "source()". Therefore eventually I tend to just drop tangle, but perhaps I missed something here, and I'd like to hear what other people think about it.
>
> Regards,
> Yihui
>
> --
> Yihui Xie
> Web: http://yihui.name

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
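Kevin's side-effect-free style might look like this in an .Rnw vignette (the variable names are illustrative): do the work in a chunk, which Stangle extracts faithfully, and keep \Sexpr{} purely for displaying a value:

```latex
% Computation lives in a chunk, so it survives tangling:
<<compute>>=
n.genes <- nrow(expression.data)
@

% Inline code only displays an already-computed value:
We analyzed \Sexpr{n.genes} genes in total.
```

Written this way, Yihui's example ceases to be a problem: the tangled script and the woven document execute exactly the same code.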
Re: [Rd] R CMD check for the R code from vignettes
"Doc, it hurts when I do this." "So, don't do that."

If no one in R Core does anything about this issue (in terms of changing Sweave or Stangle), then the solution still remains very simple: authors of vignettes should avoid using anything in \Sexpr{} that has a side effect. As long as they do that, the code will tangle correctly and produce the same result as Sweave.

R CMD check already detects other things which may or may not be outright errors but are viewed as bad practice. I think it is bad practice to put code with side effects into an \Sexpr. So, I don't do that. And if I did do that accidentally, I really wouldn't mind if R CMD check warned me about it.

-- Kevin

On 6/2/2014 6:28 PM, Gavin Simpson wrote:
> On 2 June 2014 15:59, Duncan Murdoch wrote:
> > On 03/06/2014, 4:12 AM, Gavin Simpson wrote:
> > > On 2 June 2014 11:44, Duncan Murdoch <murdoch.dun...@gmail.com> wrote:
> > > > Several of us have told you the real harm: it means that users can't easily extract a script that replicates the computations done in the vignette. That's a useful thing to be able to do.
> > >
> > > Isn't the issue here that `tangle()` doesn't, currently, "extract a script that replicates the computations done in the vignette", but rather does so only partially?
> >
> > No, I think the issue is that some people don't want to have to guarantee that the tangled source produces the same results. R doesn't guarantee it; it is up to the author to do so.
>
> I think those issues have become conflated on this thread; the R CMD check issues raised the problem that side effects in \Sexpr may lead to tangle() generating an R script that may not work, or do so only incorrectly. Whatever the ensuing discussion, the above issue is not ideal, and as you mention below it could be solved by not allowing side effects in \Sexpr, fixing tangle so that \Sexpr is recorded, or some other workaround. People seem to be arguing across one another throughout this thread. Yihui has identified an infelicity in the tangle implementation. Turning off tangling + sourcing in R CMD check may not be a desirable solution, so if the aim is to extract R code to replicate the computations in the vignette, tangle() needs to be modified to allow for (optional) inclusion of \Sexpr "chunks".
>
> > That's one solution, and the other is to limit \Sexpr code to things with no side effects, as Sweave was originally designed.
>
> That would be perfectly fine also; clarifying usage etc. helps, and whilst it may inconvenience those authors that exploited the ambiguity, there is a solution now that anyone can write their own vignette drivers.
>
> > > To move this thread forwards, would contributions that added this optional feature to tangle() be considered by R Core? If so, perhaps those affected by the current infelicity might wish to propose patches to the R sources which implement a solution?
> >
> > As I said before, I'm more sympathetic to that solution than to dropping the requirement that tangled code should work. I think the changes to base R need only be minimal: only an extra argument to the driver code for the tangling. Users who want to use this feature should write their own (or use someone else's if they don't mind an extra dependency) as a "non-Sweave vignette driver", whose implementation is to call Stangle with the non-default parameter setting.
> >
> > Duncan Murdoch
>
> I agree, and given that the changes to base R would be minimal and yet solve the problem for those wanting to allow & tangle side effects in \Sexpr (or allow them to solve it with a driver), it is disappointing to note i) the length of this thread (!) and ii) the often irrelevant arguments that some contributors have offered. (Do note this is not directed specifically at you, Duncan.) It has not gone without notice of late the increasing regularity with which threads here descend into irrelevant or antagonistic directions.
>
> G

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] R 3.0, Rtools 3.0, Windows 7 64-bit, and permission agony
Having finally found some free time, I was going to use it to update a bunch of R packages from 2.15 to 3.0. I am running Windows 7, 64-bit professional. This is on a brand-new laptop using vanilla settings when installing the operating system.

Problem 1: I installed R 3.0 to the default location (C:\Program Files\R\R-3.0.0). The first thing I tried to do was install BioConductor. This failed (permission denied). Thinking that this might be a BioConductor problem, I then tried to install a (semirandom) package from CRAN. This also failed. In both cases, when using the GUI, the error message is almost incomprehensible. You get a pop-up window that *only* says "Do you want to use a private library instead?" Since this wasn't what I wanted to do, I said "no". Only after the pop-up closes does the command window print the error message telling me that permission was denied for R to write to its own library location.

Dumb Fix to Problem 1: So, I uninstalled R and then reinstalled to a nonstandard location (C:\R\R-3.0.0). Now I can successfully install packages from CRAN and BioConductor (hooray!). But I run directly into:

Problem 2: Emacs Speaks Statistics (ESS) can no longer find the R binary. When R was installed in the default location, ESS worked. When R 2.15 (or earlier) was installed in the same nonstandard location, I could get ESS to find the R binaries by including (setq ess-directory-containing-r "C:") in my .emacs file, but that no longer works.

Dumb Fix to Problem 2: Hack into ess-site.el and put the complete, explicit path to the correct binary into (setq-default inferior-R-program-name "FULLPATHHERE"), which will break as soon as I upgrade R (assuming I am foolish enough to ever do that again).

Now I am ready to rebuild my R packages. I have this nice perl script that goes through the following procedure: 1. Set the path to include the correct Rtools directory.
(For reasons that Gabor Grothendieck has pointed out previously, this is not a permanent part of the path, since doing so would override some built-in Windows commands.) 2. Build a source tarball via R CMD build $package. 3. Build a Windows binary version (as a zip file) via R CMD INSTALL --build $tarball. 4. Check the package via R CMD check --as-cran $tarball. 5. Install the package via R CMD INSTALL $tarball.

Problem 3: Step 3 fails, with the error message "Running 'zip' failed".

Dumb Fix to Problem 3: Install the GnuWin32 version of zip, and make sure that its location is earlier in the path than the version that comes with Rtools.

Problem 4: Step 4 fails when running the test scripts that accompany the package. The error message is the semicryptic "cannot open file 'c:\Users\krc\AppData\Local\Temp\Rtmp' Permission denied".

Dumb Fix to Problem 4: Write this email message and hope someone with even more patience than I have has already found a better way to get all this stuff to work.

Tired of spinning my wheels, Kevin

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
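A hedged sketch for Problem 1, and an alternative to reinstalling R: accept a user-writable package library explicitly. The path below is only an example; `.libPaths()` and `install.packages()` are the relevant base functions. (Relatedly, Problem 4's Rtmp permission failure can often be worked around by pointing the TMPDIR environment variable at a writable directory before running R CMD check.)

```r
# Sketch (example path): install packages into a per-user library
# instead of the non-writable C:\Program Files location.
lib <- file.path(Sys.getenv("USERPROFILE"), "R", "win-library", "3.0")
dir.create(lib, recursive = TRUE, showWarnings = FALSE)
.libPaths(c(lib, .libPaths()))   # search the private library first
install.packages("codetools", lib = lib)
```

This is essentially what the "Do you want to use a private library instead?" pop-up offers, made explicit and scriptable.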
Re: [Rd] R 3.0, Rtools 3.0, Windows 7 64-bit, and permission agony
On 4/20/2013 12:54 PM, Duncan Murdoch wrote: On 13-04-20 12:30 PM, Gabor Grothendieck wrote: On Sat, Apr 20, 2013 at 11:49 AM, Duncan Murdoch wrote: On 13-04-20 11:09 AM, Gabor Grothendieck wrote: On Sat, Apr 20, 2013 at 10:45 AM, Hadley Wickham wrote: Just curious: how often do you use the Windows find command?

We have put instructions in place for people to run the install process with a renamed Rtools find command (which I think is the only conflict). The issue is that more users who want to use the command line commands are familiar with the Unix variant (which came first, by the way) than the Windows one, so renaming the Rtools one would cause trouble for more people.

It's not just find - it's also sort. And really R has no business clobbering built-in Windows commands. This is just wrong and really causes anyone who does any significant amount of Windows batch programming (or uses batch programs of any complexity) endless problems.

Which is presumably why Rtools doesn't modify the path by default. Better solutions (e.g. RStudio and devtools) temporarily set the path when you're calling R CMD *.

I am well aware of the various kludges to address this, including my own batchfiles ( http://batchfiles.googlecode.com ), which handles this by temporarily changing the path as well; however, the real problem is that Rtools does not play nice with Windows, and that needs to be addressed directly.

It has been. You ignored it. Duncan Murdoch

If some change to address this has been made, that would be great, but there is no mention of it on the Rtools page in the change history section (the only documented change relates to the png/tiff/jpeg libraries), there was no announcement that I saw, and Rtools\bin still contains find and sort, so what specifically is the change?

It's not a change to Rtools, it's a change to the build system in R: it allows you to rename sort or find in your own copy of Rtools, and R will use whatever you specify.
You were informed of this when I did it in 2007, and I've mentioned it when the topic comes up here, most recently in the message quoted above. That's a long time ago, so I don't remember if you tried it then, but I've never heard a complaint from anyone else that it doesn't work. Duncan Murdoch __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel How do you do that? (More explicitly, what steps would I have to take to redefine things like find.exe and sort.exe in Rtools so that R would know how to find them and use them? I can't figure that out from the earlier parts of these messages.) __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] R 3.0, Rtools 3.0, Windows 7 64-bit, and permission agony
On 4/20/2013 1:21 PM, Duncan Murdoch wrote: On 13-04-20 2:02 PM, Kevin Coombes wrote: On 4/20/2013 12:54 PM, Duncan Murdoch wrote: It's not a change to Rtools, it's a change to the build system in R: it allows you to rename sort or find in your own copy of Rtools, and R will use whatever you specify. You were informed of this when I did it in 2007, and I've mentioned it when the topic comes up here, most recently in the message quoted above. That's a long time ago, so I don't remember if you tried it then, but I've never heard a complaint from anyone else that it doesn't work. Duncan Murdoch

How do you do that? (More explicitly, what steps would I have to take to redefine things like find.exe and sort.exe in Rtools so that R would know how to find them and use them? I can't figure that out from the earlier parts of these messages.)

Rename them to whatever you want in the Rtools install, then edit the definitions. I think currently they are in src/gnuwin32/Makefile and src/gnuwin32/MkRules (one in each), but I'd suggest you just search files named M* for the strings "sort" and "find", in case I've got it wrong, or it has changed since the last time I looked. If you try to build R itself rather than just packages, you may need to do more edits, because some of the makefiles for things like the jpeg libraries weren't written by us, and may have these commands hard-coded. Duncan Murdoch

To most Windows users, the "Rtools install" would seem to refer to getting the bundled Rtools30.exe from the CRAN web site, double-clicking on it, selecting the options from the GUI windows that appear, and clicking "install". There is no option in this procedure to change the names of find or sort. As far as I can tell, the steps you are recommending take place in an earlier build step.
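To spell out the recipe being described: rename the tools on disk, then tell R's build files the new names. The variable names below are illustrative guesses, not the actual ones; as Duncan says, search the files named M* in src/gnuwin32 of the R sources for the real definitions.

```makefile
# Illustrative sketch of an edit to src/gnuwin32/MkRules, assuming the
# Rtools executables have been renamed on disk, e.g.
#   C:\Rtools\bin\find.exe -> C:\Rtools\bin\gnufind.exe
#   C:\Rtools\bin\sort.exe -> C:\Rtools\bin\gnusort.exe
# Then point R's build machinery at the new names (hypothetical variables):
FIND = gnufind
SORT = gnusort
```

As the follow-up notes, this only helps someone building against the R sources, not someone running the stock Rtools installer.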
This would require the user who wants to do this to rebuild Rtools in its entirety, which is more trouble than it is likely to be worth. Especially when you can avoid the problem by using your own batch script or perl script to reset the path on those relatively rare occasions when you need to use Rtools.

Since building Rtools for a Windows machine is something that CRAN does on a regular basis, why can't they just change the names there (and not bother the UNIX users, and probably not even bother the UNIX users who find themselves banished to the Windows wilderness)? Just call them "unixfind" and "unixsort" and everyone will be able to figure it out.

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] R 3.0, Rtools 3.0, Windows 7 64-bit, and permission agony
Here's the short answer: Whatever you used to do should still work.

I started this thread, not knowing that it was going to get sucked into a whirlpool on the fringes of an operating system religious war. My sincerest apologies to everyone who has gotten confused as a consequence. I only ran into problems because I installed R 3.0 and Rtools 3.0 on a new machine, and accidentally put Rtools in a different location from where it used to reside on the machines I used for R 2.whatever. And so the scripts I used to build packages no longer worked because the path was wrong. If you can avoid doing something silly like that, then your old methods for building and maintaining packages should work the same way they always did.

On 4/21/2013 8:22 PM, steven mosher wrote: Well, color me confused as heck. I've upgraded to R 3.0 so that I can bring my packages up to date, but the instructions surrounding Rtools30 are not a model of clarity.

On Sun, Apr 21, 2013 at 4:04 PM, Gabor Grothendieck wrote: On Sun, Apr 21, 2013 at 6:17 PM, Henrik Bengtsson wrote: I (as well) keep a specific Rsetup.bat file for launching Windows cmd.exe with the proper PATH etc. set up for building R packages etc. It's only after this thread that I gave it a second thought; you can indeed temporarily set the PATH via ~/.Rprofile or ~/.Renviron, which *are* processed at the very beginning when calling 'R CMD ...'.

EXAMPLE WITH .Rprofile: ## ~/.Rprofile (e.g. C:/User/foo/.Rprofile): path <- unlist(strsplit(Sys.getenv("PATH"), ";")); path <- c("C:\\Rtools\\bin", "C:\\Rtools\\gcc-4.6.3\\bin", path); Sys.setenv("PATH"=paste(unique(path), collapse=";"));

## DISABLED: x:\> R --no-init-file CMD INSTALL matrixStats_0.6.2.tar.gz * installing to library 'C:/Users/hb/R/win-library/3.0' * installing *source* package 'matrixStats' ...
** libs *** arch - i386 ERROR: compilation failed for package 'matrixStats' * removing 'C:/Users/hb/R/win-library/3.0/matrixStats'

## ENABLED: x:\> R CMD INSTALL matrixStats_0.6.2.tar.gz * installing to library 'C:/Users/hb/R/win-library/3.0' * installing *source* package 'matrixStats' ... ** libs *** arch - i386 gcc -m32 -I"C:/PROGRA~1/R/R-3.0.0patched/include" -DNDEBUG [...] [...] * DONE (matrixStats)

EXAMPLE WITH .Renviron: ## ~/.Renviron (e.g. C:/User/foo/.Renviron): # Backslashes are preserved iff put within quotes PATH="C:\Rtools\bin;C:\Rtools\gcc-4.6.3\bin;${PATH}"

x:\> R --no-environ CMD INSTALL matrixStats_0.6.2.tar.gz => fails x:\> R CMD INSTALL matrixStats_0.6.2.tar.gz => works

As long as R is on the PATH, either of the above approaches removes the need to add Rtools to the PATH via a BAT file, and it won't clutter up your PATH. This begs the question (as somewhat already proposed): instead of users/developers doing this manually, would it be possible to have 'R CMD ...' locate and add Rtools to the PATH internally? That would certainly lower the barriers for newcomers to install packages from source that need compilation. Obviously, this doesn't make the tools (e.g. make) in Rtools available outside of R, and it does not allow you to build R itself from source, but it does cover the very common use cases of calling 'R CMD build/INSTALL/check/...'. /Henrik

PS. Hadley, is this what you meant when you wrote "Better solutions (e.g. RStudio and devtools) temporarily set the path when you're calling R CMD *.", or are those approaches only for when you call 'R CMD' from the R prompt? I believe the latter, but I just want to make sure I didn't miss something.

That seems like a reasonable approach, although the code shown does entail more setup and ongoing maintenance by the user than R.bat, which does not require that the user edit any files and additionally locates R itself and has many other features.
Also, because R.bat locates R itself, it can be useful even if you are not doing development. On the other hand, if you are looking to do development strictly from within R, then devtools is already developed.

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Defining a method in two packages
Wouldn't it make sense to simply create a "ranef" package whose only role in the universe is to create the generic function that lme4, coxme, and anyone else who needs it could just import, without getting tons of additional and (depending on the application) irrelevant code? Best, Kevin

Uwe Ligges wrote: On 08.03.2010 17:16, Terry Therneau wrote: Brian & Uwe, Thanks for responding. Let me see if I can refine the query and move towards a solution.

From Uwe: Of course, after loading lme4, you can still use the ranef from coxme: coxme::ranef(fit)

Of course, but I'm interested in other users (as well as myself) and prefer to avoid the 'secret handshake' form of a call.

In your own package, you could simply import the generic from coxme. I don't understand this. You could import the generic from the other package and define your own methods for it in order to make dispatching work correctly.

From Brian: My solution would though be NOT to reuse a name that is already established in another package (nlme has used it for many years). The design problem is that generic foo() in package B might have nothing to do with foo() in package A. When it does, we expect B ...

I disagree completely. It is precisely because of the prominence of nlme and lmer that I want to reprise their methods: my users have a much better chance of remembering how to do things. If I followed this logic to its conclusion, one should never define a print() method because it might conflict with the base definition. The consequence is that I am under obligation to NOT make my method something different than Doug's, if I want to satisfy the goal of user level consistency. Several aspects of coxme purposefully mimic lmer, even in cases (such as print.coxme) where his layout is not precisely what I would have chosen.

Then please follow my suggestion and import the generic from the packages mentioned above in your namespace.
Then you could extend it with your own methods without having to define another generic of the same name, and avoid the conflicts.

I really do not want to require lme4 just to pick up the methods definition. It's a huge package, and there is no code in common. Both packages work very hard to be efficient via sparse matrix methods, but the actual details are completely different due to the mathematical structure of our underlying likelihoods. Use of both in the same analysis would be rare, so my issue won't be common.

Well, then things become complicated, if not impossible.

The situation can be alleviated by making S3 methods visible. Thus if coxme exported coxme:::ranef.coxme and lme4 had a default method ranef <- function(object, ...) UseMethod("ranef")

I have no objection to exporting my method. If a joint change to lme4 and coxme is the best solution, I will take the discussion off line with Doug. Is this the best way forward?

I think so. Best wishes, Uwe

Terry

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
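Kevin's proposal at the top of the thread can be sketched concretely; the package name is hypothetical and the method body is only a placeholder:

```r
## In a tiny (hypothetical) "ranefGenerics" package, whose sole
## content is the generic function:
ranef <- function(object, ...) UseMethod("ranef")

## In coxme, the NAMESPACE file would then contain
##   importFrom(ranefGenerics, ranef)
##   S3method(ranef, coxme)
## and the package supplies its own method:
ranef.coxme <- function(object, ...) {
  object$frail   # placeholder: return the fitted random effects
}
```

With both lme4 and coxme importing the generic from the same lightweight package, dispatch works no matter which package is loaded first, and neither needs to depend on the other.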
Re: [Rd] [R] library(): load library from a specified location
If we're counting votes, then I vote "no". And I'd be willing to help stuff the ballot box and even volunteer to count the final tallies in order to make sure that the "no" side wins.

I understand the logical argument in favor of "use" or "require" or "borrow". I am not swayed. Backwards compatibility matters. A lot. This proposed change breaks an unfathomably large amount of existing code. With zero gain in terms of performance or reliability. It probably does not even help new users just learning the language, since they still have to be confused about why there are two functions that do almost the same thing in terms of loading packages. Even with a "long deprecation" time, I don't see the value. Just train yourself to interpret > library(aPackage) as the syntactic form of the thing in R that has the semantic meaning: "go to the library and bring back aPackage". Curmudgeonly, Kevin

Martin Maechler wrote: [ re-diverted to R-devel ] Barry Rowlingson on Tue, 30 Mar 2010 20:15:00 +0100 writes: > On Tue, Mar 30, 2010 at 7:58 PM, Rolf Turner > wrote: >> But ***please*** say ``load *package*'', not ``load >> library''. The *location* (collection of packages) from >> which you wish to load the given package is the >> ``library''. > Anyone vote for deprecating the library() function and > renaming it use() or requiring require() instead?

I'm voting pro. We (R core) had planned to do this, probably about five to eight years ago, then started discussing possible features of the new use() function, of making a package into an "object" that you'd want to interrogate, ... and then probably got tired ;-) With the many moons passed, I'd now tend to *not* add features, but really rename 'library' to 'use' and create a library() with a deprecation message which then simply calls use()... and yes, I'd allow a very exceptionally long deprecation period of two to five years before making library() defunct.
Martin

> I mean, when I go get a book out of our library, I don't > say "I'd like to library Case Studies in Spatial Point > Process Modelling". Maybe we should use > 'borrow(package)'? Then it might be clear you were getting > a package from a library, and that you magically put it > back at the end of your R session > Slightly silly mood this evening > Barry

> __ > r-h...@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do > read the posting guide > http://www.R-project.org/posting-guide.html and provide > commented, minimal, self-contained, reproducible code.

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
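As a footnote to the vote: whatever the names end up being, the practical difference between the two existing loaders is worth remembering; a quick sketch ("aPackage" is a placeholder, not a real package):

```r
# library() signals an error if the package is missing;
# require() returns FALSE (with a warning), so it can be tested:
if (require("aPackage", quietly = TRUE)) {
  # ... use the package ...
} else {
  message("aPackage is not installed; skipping")
}
```

This conditional-loading idiom is the main reason require() exists at all, which is part of why collapsing the pair into a single use() is less simple than it first appears.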