[Rd] Accessing ENVSXP and CLOSXP while processing parsed R code
Hello Guys,

Following up on my earlier mail, where I am trying to write an alternative front-end for R, I have a question about accessing the closures and environments in R code.

Here is a function taken, and modified a little, from "Lexical Scope and Statistical Computing":

    f <- function() {
      Rmlfun <- function(x) {
        sumx <- sum(x)
        n <- length(x)
        function(mu) n*log(mu) - mu*sumx
      }
      efun <- Rmlfun(1:10)
      y1 <- efun(3)
      print(y1)
      efun2 <- Rmlfun(20:30)
      y2 <- efun2(3)
      print(y2)
    }

Now the assignment efun <- Rmlfun(1:10) creates a closure where function(mu) n*log(mu) - mu*sumx is returned, and sumx and n are added to the existing environment.

I can parse the code using

    PROTECT(e = R_ParseVector(tmp, 1, &status, R_NilValue));

where tmp is the buffer containing the same source. I can walk the resulting parser output and build an alternative abstract syntax tree (AST). I would like to include the information about closures/environments in my AST so that I can possibly do some optimizations.

My question is, how can I get hold of this information?

One thing I noticed while 'walking' through the parser output: I never encounter a CLOSXP (which I check using TYPEOF()), even though a closure is created in the above code. Is it the case that this information is meant just for the internal "eval*" functions and is not exposed to application writers?

Thanks,
Rob

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Accessing ENVSXP and CLOSXP while processing parsed R code
On 11-11-07 5:24 AM, Rob Anderson wrote:
> Hello Guys,
>
> Following up my earlier mail where I am trying to write an alternative front-end for R, I had a question about accessing the closures and environments in R code.
>
> Here's the function taken and modified a little from "Lexical Scope and Statistical Computing"
>
>     f <- function() {
>       Rmlfun <- function(x) {
>         sumx <- sum(x)
>         n <- length(x)
>         function(mu) n*log(mu) - mu*sumx
>       }
>       efun <- Rmlfun(1:10)
>       y1 <- efun(3)
>       print(y1)
>       efun2 <- Rmlfun(20:30)
>       y2 <- efun2(3)
>       print(y2)
>     }
>
> Now assignment efun <- Rmlfun(1:10) creates a closure where function(mu) n*log(mu) - mu*sumx is returned and sumx and n are added to the existing environment.

That's not correct. When you call Rmlfun, an evaluation frame (environment) is created. It contains the argument x. Then sumx and n are added to it. Then the anonymous closure is created, with body n*log(mu) - mu*sumx, and the closure's environment is the evaluation frame from the call to Rmlfun.

> I can parse the code using
>
>     PROTECT(e = R_ParseVector(tmp, 1, &status, R_NilValue));
>
> where tmp is the buffer containing the same source. I can walk the resultant parser output and build an alternative abstract syntax tree (AST). I would like to include the information about closure/environments in my AST so that I can possibly do some optimizations.

It's not there, except potentially. When you call the function "function" to create the closure, that's when the closure is created. That doesn't happen at parse time. Rmlfun is created when you evaluate f(), and then the anonymous function is created when you call Rmlfun() within it.

> My question is, how can I get hold of this information?
>
> One thing I noticed while 'walking' through the parser output, I never encounter a CLOSXP (which I check using TYPEOF()), even though in the above code, closure is created. Is it the case that this information is meant just for the internal "eval*" function and not exposed to application writers?

No, there's nothing hidden; it just didn't exist at the time you were looking for it.

Duncan Murdoch
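The distinction between parse time and evaluation time can be checked from R itself. What follows is only a minimal sketch, not taken from the thread: it parses a shortened version of the example (the names src, parsed, env and the other helper variables are assumptions made for illustration) and compares the types seen before and after evaluation. At the R level, typeof() reports "language" for what TYPEOF() at the C level would report as LANGSXP, and "closure" for CLOSXP.

```r
# Source text of a shortened version of the example; nothing is evaluated yet.
src <- "
f <- function() {
  Rmlfun <- function(x) {
    sumx <- sum(x)
    n <- length(x)
    function(mu) n * log(mu) - mu * sumx
  }
  Rmlfun(1:10)
}"

# Parsing yields an expression vector holding unevaluated calls.
parsed <- parse(text = src, keep.source = FALSE)
assign_call <- parsed[[1]]      # the call `f <- function() {...}`
fun_call <- assign_call[[3]]    # a call to `function`, not yet a closure

parse_types <- c(typeof(parsed), typeof(assign_call), typeof(fun_call))

# Evaluating the assignment is what creates the closure f.
env <- new.env()
eval(assign_call, env)
f <- get("f", envir = env)
efun <- f()                     # runs Rmlfun(1:10) inside f and returns its result

eval_types <- c(typeof(f), typeof(efun))

# The inner closure's environment is the evaluation frame of the Rmlfun call.
frame_vars <- ls(environment(efun))
n_in_frame <- get("n", envir = environment(efun))
```

Under these assumptions, the parsed objects are all calls or expression vectors, and closures appear only once eval() has run. A front-end that wants closure and environment information would therefore have to derive it by analysing the calls to `function` in the tree, or by inspecting objects after evaluation.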
Re: [Rd] Efficiency of factor objects
Stavros Macrakis (alum.mit.edu) writes:
>
> data.table certainly has some useful mechanisms, and I've been
> experimenting with it as an implementation mechanism, though it's not a
> drop-in substitute for factors. Also, though it is efficient for set
> operations between small sets and large sets, it is not very efficient for
> operations between two large sets

As a general statement that could do with some clarification ;) data.table likes keys consisting of multiple ordered columns, e.g. (id, date). It is (I believe) efficient for joining two large 2+ column keyed data sets because the upper bound of each row's one-sided binary search is localised in that case (by group of the previous key column).

As I understand it, Stavros has a different type of 'two large datasets': English language website data. Each set is one large vector of uniformly distributed unique strings. That appears to be quite a different problem to multiple columns of many times duplicated data.

Matthew

> Thanks everyone, and if you do come across a relevant CRAN package, I'd be
> very interested in hearing about it.
>
> -s
Re: [Rd] Efficiency of factor objects
On Sunday 06 November 2011 at 19:00 -0500, Stavros Macrakis wrote:
> Milan, Jeff, Patrick,
>
> Thank you for your comments and suggestions.
>
> Milan,
>
> This is far from a "completely theoretical problem". I am performing
> text analytics on a corpus of about 2m documents. There are tens of
> thousands of distinct words (lemmata). It seems to me that the
> natural representation of words is as an "enumeration type" -- in R
> terms, a "factor".

Interesting. What does your data look like? I've used the tm package, and for me there are only two representations of text corpora: a list of texts, each of which is basically a character string with attributes; and a document-term matrix, with documents as rows, terms as columns, and counts at their intersection.

So I wonder how you're using factors. Do you have a factor containing words for each text?

> Why do I think factors are the "natural way" of representing such
> things? Because for most kinds of analysis, only their identity
> matters (not their spelling as words), but the human user would like
> to see names, not numbers. That is pretty much the definition of an
> enumeration type. In terms of R implementation, R is very efficient in
> dealing with integer identities and indexing (e.g. tabulate) and not
> very efficient in dealing with character identities -- indeed, 'table'
> first converts strings into factors. Of course I could represent the
> lemmata as integers, and perform the translation between integers and
> strings myself, but that would just be duplicating the function of an
> enumeration type.

My point was that the efficiency of factors is due to the redundancy of their levels. You usually have very few levels and many observations (in my work, often 10 levels and 100,000s of observations). If each level only appears a few times on average, you don't save that much memory by using a factor.

Since you have a real use case for that, I withdraw my criticism of your suggestion being useless. ;-) But I'm still not sure R core devs would like to follow it, since your application can be considered non-standard, and worth a specialized class.

Cheers
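The point about integer codes and redundant levels can be illustrated with a small base-R sketch. The data here are made up for the example (three hypothetical levels and 100,000 observations, which is Milan's "few levels, many observations" case, not Stavros's corpus), and all variable names are assumptions.

```r
set.seed(1)
words <- c("alpha", "beta", "gamma")             # hypothetical levels
x_chr <- sample(words, 1e5, replace = TRUE)      # character representation
x_fac <- factor(x_chr, levels = words)           # enumeration-style representation

# A factor is stored as integer codes plus a levels attribute.
codes <- as.integer(x_fac)

# tabulate() counts directly on the integer codes.
counts <- tabulate(codes, nbins = nlevels(x_fac))
names(counts) <- levels(x_fac)

# table() on the character vector gives the same counts.
tab <- table(x_chr)

# Memory footprint of the two representations, for comparison.
size_chr <- object.size(x_chr)
size_fac <- object.size(x_fac)
```

On a typical 64-bit build one would expect size_fac to be smaller than size_chr here, but the saving depends on the platform and shrinks as the number of distinct levels approaches the number of observations, which is the situation discussed above.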
Re: [Rd] CRAN: How to list a non-Sweave doc under "Vignettes:" on package page?
On 11/6/2011 7:41 PM, Henrik Bengtsson wrote:
> Hi,
>
> is it possible to have non-Sweave vignettes(*) in inst/doc/ be listed under 'Downloads' on CRAN package pages? For instance, in my R.rsp package I have a inst/doc/report.pdf (part of the source *.tar.gz) that is not detected/listed. The PDF is not based on a Sweave vignette but an *.tex.rsp vignette that is dynamically created via inst/doc/Makefile. It is listed
>
> (*) BTW, can the term "vignette" be used for any inst/doc/ document, or should it be reserved for Sweave+LaTeX-based documents?

I have a related problem/question and a request to R-Core to consider relaxing the requirements for vignettes when, for one reason or another, they cannot be built entirely via Sweave.

In such cases, perhaps package authors can provide alternative metadata in the form of an 00index.html or something similar to allow such vignettes to be more visible. Those who face this problem would then be able to figure out a Makefile or manual way to maintain the metadata file. What would be required to implement this?

In my case, my Guerry package has a vignette, inst/doc/MultiSpat.pdf, originally built entirely with Sweave. However, the vignette require()d a package available only on R-Forge, which the author does not wish to release to CRAN.

At some point, ~ R 2.10, this triggered a WARNING/ERROR from the CRAN check daemon, in spite of the fact that the vignette .Rnw file contained the following hack designed to make sure that all necessary packages were available anywhere:

    \subsection{Installation and loading of required packages}
    Several packages must be installed to run the different analyses:

    <=FALSE, width=7, height=7>>=
    pkg <- c("maptools","spdep","ade4","Guerry","spacemakeR")
    inst.pkg <- row.names(installed.packages())
    pkg2inst <- pmatch(pkg,inst.pkg)
    if(any(is.na(pkg2inst[1:4])))
      install.packages(pkg[which(is.na(pkg2inst[1:4]))],
                       repos="http://cran.at.r-project.org")
    if(is.na(pkg2inst[5]))
      install.packages("spacemakeR", repos="http://R-Forge.R-project.org")
    library(maptools)
    library(ade4)
    library(spdep)
    library(spacemakeR)
    library(Guerry)
    @

However, this hack was deemed unacceptable for a CRAN package vignette. In the end, the only solution that would satisfy the CRAN check daemon was to delete the source inst/doc/MultiSpat.Rnw file from the package. Consequently, the .pdf vignette remains in the package, but it is not listed as a vignette on CRAN, nor found via vignette().

best,
-Michael

--
Michael Friendly     Email: friendly AT yorku DOT ca
Professor, Psychology Dept.
York University      Voice: 416 736-5115 x66249 Fax: 416 736-5814
4700 Keele Street    Web: http://www.datavis.ca
Toronto, ONT  M3J 1P3 CANADA
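For comparison with the install-on-demand chunk quoted above, here is a sketch of a chunk that only checks whether the packages are present and skips the analysis otherwise, without installing anything. It is an illustration rather than what the Guerry vignette did, and nothing in the thread says it would have satisfied the CRAN checks. It assumes a current R in which requireNamespace() is available; missing_packages and run_analysis are hypothetical names.

```r
# Return the names of packages that cannot be loaded, without installing anything.
missing_packages <- function(pkgs) {
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!available]
}

# Package names taken from the vignette chunk quoted in the message above.
needed <- c("maptools", "spdep", "ade4", "Guerry", "spacemakeR")
not_installed <- missing_packages(needed)

# Later chunks could be guarded by this flag instead of failing.
run_analysis <- length(not_installed) == 0
if (!run_analysis) {
  message("Skipping analysis; missing packages: ",
          paste(not_installed, collapse = ", "))
}
```

The design choice is to leave installation to the reader: the vignette degrades to a message when a dependency such as an R-Forge-only package is absent, rather than modifying the user's library at build time.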
Re: [Rd] Efficiency of factor objects
Matthew,

Yes, the case I am thinking of is a 1-column key; sorry for the overgeneralization. I haven't thought much about the multi-column key case.

-s

On Mon, Nov 7, 2011 at 12:48, Matthew Dowle wrote:
> Stavros Macrakis (alum.mit.edu) writes:
> >
> > data.table certainly has some useful mechanisms, and I've been
> > experimenting with it as an implementation mechanism, though it's not a
> > drop-in substitute for factors. Also, though it is efficient for set
> > operations between small sets and large sets, it is not very efficient for
> > operations between two large sets
>
> As a general statement that could do with some clarification ;) data.table
> likes keys consisting of multiple ordered columns, e.g. (id,date). It is (I
> believe) efficient for joining two large 2+ column keyed data sets because the
> upper bound of each row's one-sided binary search is localised in that case (by
> group of the previous key column).
>
> As I understand it, Stavros has a different type of 'two large datasets' :
> English language website data. Each set is one large vector of uniformly
> distributed unique strings. That appears to be quite a different problem to
> multiple columns of many times duplicated data.
>
> Matthew
>
> > Thanks everyone, and if you do come across a relevant CRAN package, I'd be
> > very interested in hearing about it.
> >
> > -s
Re: [Rd] CRAN: How to list a non-Sweave doc under "Vignettes:" on package page?
G'day Henrik,

On Sun, 6 Nov 2011 16:41:22 -0800 Henrik Bengtsson wrote:
> is it possible to have non-Sweave vignettes(*) in inst/doc/ be listed
> under 'Downloads' on CRAN package pages?

As far as I know, only by a little trick. Create a Sweave-based vignette that uses the pdfpages package to include the .pdf file that you want to have listed. This dummy vignette should then be listed on CRAN.

See the lasso2 package for an example. The vignette in inst/doc/ in that package is actually a bit more complicated than necessary. As I think there is no point in having two nearly identical copies of PDF files in a package, I use .Rbuildignore to have the original PDF file not included in the source package. This started to create a problem when R decided to rebuild vignettes during the checking process and pdfpages decided to hang if the PDF file to be included was missing.

HTH.

Cheers,
Berwin
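The pdfpages trick can be sketched as a small R helper that writes such a wrapper file. This is an illustration only and is not the lasso2 vignette: the helper name write_wrapper_vignette, the output file name report-wrapper.Rnw and the index entry title are assumptions, while report.pdf is the file name from Henrik's question. The wrapper relies on the \VignetteIndexEntry comment that Sweave vignettes use and on the \includepdf command of the pdfpages LaTeX package.

```r
# Write a minimal Sweave wrapper that includes an existing PDF via pdfpages.
# pdf_file is the PDF to be wrapped, title the vignette index entry,
# out_file the .Rnw file to create.
write_wrapper_vignette <- function(pdf_file, title, out_file) {
  lines <- c(
    sprintf("%%\\VignetteIndexEntry{%s}", title),
    "\\documentclass{article}",
    "\\usepackage{pdfpages}",
    "\\begin{document}",
    sprintf("\\includepdf[pages=-]{%s}", pdf_file),   # include every page
    "\\end{document}"
  )
  writeLines(lines, out_file)
  invisible(lines)
}

# Written to a temporary directory here; in a package it would live in inst/doc/.
wrapper <- file.path(tempdir(), "report-wrapper.Rnw")
wrapper_lines <- write_wrapper_vignette("report.pdf", "Report", wrapper)
```

As noted above, the included PDF has to be present whenever the wrapper is rebuilt, so excluding it from the source package can make the vignette build hang or fail during checking.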
Re: [Rd] CRAN: How to list a non-Sweave doc under "Vignettes:" on package page?
> How CRAN behaves and how the help package system behaves may be two
> different problems. My question is specifically on how CRAN works.
>
> To have inst/doc/ documents to be listed on the package's help page,
> you can add an inst/doc/index.html file, cf. Section 'Writing package
> vignettes' in 'Writing R Extensions'. You can use the following
> index.html file as a template:

But they still won't be listed under vignette() - I see this solution as a temporary hack until non-Sweave vignettes become first class citizens.

Hadley

--
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/