Re: [Rd] modifying large R objects in place
On Fri, Sep 28, 2007 at 08:14:45AM -0500, Luke Tierney wrote:
[...]
> [...] A related issue is that user-defined
> assignment functions always see a NAMED of 2 and hence cannot modify
> in place. We've been trying to come up with a reasonable solution to
> this, so far without success but I'm moderately hopeful.

If a user-defined function evaluates its body in its parent environment
using Peter Dalgaard's suggestion eval.parent(substitute( )), then the
NAMED attribute is not increased and the function may do in-place
modifications.

On Fri, Sep 28, 2007 at 12:39:30AM +0200, Peter Dalgaard wrote:
> Longer-term, I still have some hope for better reference counting, but
> the semantics of environments make it really ugly -- an environment can
> contain an object that contains the environment, a simple example being
>
>   f <- function()
>       g <- function() 0
>   f()

On Fri, Sep 28, 2007 at 09:46:39AM -0400, Duncan Murdoch wrote:
> f has no input; its output is the function g, whose environment is the
> evaluation environment of f. g is never used, but it is returned as the
> value of f. Thus we have the loop:
>
> g refers to the environment.
> the environment contains g.
>
> Even though the result of f() was never saved, two things (the
> environment and g) got created and each would have non-zero reference
> count.

Thank you very much for the example and explanation. I would not have
guessed that something like this is possible, but now I see that it may,
in fact, be quite common. For example

  something <- function() {
      a <- 1:5
      b <- 6:10
      c <- c("a","a","b","b","b")
      mf <- model.frame(c ~ a + b)
      mf
  }
  mf1 <- something()
  e1 <- attr(attr(mf1,"terms"),".Environment")
  mf2 <- eval(expression(mf),envir=e1)
  e2 <- attr(attr(mf2,"terms"),".Environment")
  print(identical(e1,e2)) # TRUE

seems to be a similar situation. Here, the references go in the sequence
mf1 -> e1 -> mf2 -> e1. I think that mf2 is, in fact, already the same
object as mf1, but I do not know how to demonstrate this.
However, both mf1 and mf2 refer to the same environment, so
e1 -> mf2 -> e1 is a cycle for sure.

On Fri, Sep 28, 2007 at 08:14:45AM -0500, Luke Tierney wrote:
> >If yes, is it possible during gc() to determine also cases,
> >when NAMED may be dropped from 2 to 1? How much would this increase
> >the complexity of gc()?
>
> Probably not impossible but would be a fair bit of work with probably
> not much gain as the NAMED values would still be high until the next
> gc of the appropriate level, which will probably be a fair time as an
> object being modified is likely to be older, but the interval in which
> there would be a benefit is short.

On Fri, Sep 28, 2007 at 04:36:40PM +0100, Prof Brian Ripley wrote:
[...]
> On Fri, 28 Sep 2007, Luke Tierney wrote:
[...]
> >approach may be possible. A related issue is that user-defined
> >assignment functions always see a NAMED of 2 and hence cannot modify
> >in place. We've been trying to come up with a reasonable solution to
> >this, so far without success but I'm moderately hopeful.
>
> I am not persuaded that the difference between NAMED=1/2 makes much
> difference in general use of R, and I recall Ross saying that he no
> longer believed that this was a worthwhile optimization. It's not just
> 'user-defined' replacement functions, but also all the system-defined
> closures (including all methods for the generic replacement functions
> which are primitive) that are unable to benefit from it.

I am thinking about the following situation. The user creates a large
matrix A and then performs a sequence of operations on it. Some of the
operations scan the matrix in a read-only manner (calculating e.g. some
summaries); some operations are top-level commands which modify the
matrix itself. I do not argue that such a sequence of operations should
be done in place by default. However, I think that R should provide
tools which allow this to be done in place, if the user does some extra
work.
If the matrix is really large, then in-place operations are not only
more space efficient, but also more time efficient. Using the
information from the current thread, there are two possible approaches
to achieve this.

1. The initial matrix should not be generated by the "matrix" function,
   due to the observation by Henrik Bengtsson (this is the issue with
   dimnames). The matrix may be initiated using e.g.

     .Internal(matrix(data, nrow, ncol, byrow))

   The matrix should not be scanned using an R function which evaluates
   its body in its own environment. This includes the functions nrow,
   ncol, colSums, rowSums and probably more. The matrix may be scanned
   by functions which use eval.parent(substitute( )) and avoid giving
   the matrix a new name. The user may prepare versions of nrow, ncol,
   colSums, rowSums, etc. with this property.

2. If the NAMED attribute of A may be decreased from 2 to 1 during an
   operation similar to garbage collection (if A is not in a refere
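The eval.parent(substitute( )) idiom from this thread can be sketched as follows; `inc` and `nrow2` are made-up names for illustration, and whether a copy is actually avoided depends on the R version and on the object's NAMED value:

```r
## Hypothetical sketch of the idiom discussed above. inc() performs the
## assignment in the caller's frame, so the matrix is not given a new
## name inside a closure; nrow2() reads dim() there for the same reason.
inc <- function(x, by)
    eval.parent(substitute(x <- x + by))

nrow2 <- function(x)
    eval.parent(substitute(dim(x)[1]))

A <- matrix(0, 2, 3)
inc(A, 1)          # evaluates A <- A + 1 in this frame
stopifnot(all(A == 1), nrow2(A) == 2)
```

Whether the assignment in `inc` really happens without duplication still depends on A's NAMED value at that point; the idiom only avoids the extra NAMED bump that an ordinary closure argument would cause.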
[Rd] R-Server remote-controlled via browser-GUI
Hi Jeff,

I have read your paper from 2005, and your rapache solution sounds
good. I was wondering whether you did something about the state
problem... You should put a changelog on your website. What changes
will come with 1.0?

And what is brew exactly for? It is like a tuned "cat", mixing R output
and text, right? So if I build a GUI, brew could generate the HTML. But
I don't have to use brew, do I? What are the advantages of using brew
with rapache? What do you think of openstatserver, btw?

I think there are lots of interesting and promising approaches around
in the R community. But as with all OSS, it is the same here: many
people working on slightly different solutions for the same problem,
none of the solutions feature complete, some not even actively
developed, and only a few that can be considered stable or beyond the
beta stage. Well, I work almost only with OSS, so I am used to it :-)
It is just difficult to navigate through the possibilities without
spending too much time on research.

Have a nice day,
Josuah

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
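For what it's worth, brew is indeed roughly a "tuned cat": template text is copied through verbatim, code between `<% %>` delimiters is evaluated as R, and `<%= expr %>` is replaced by the printed value of expr. A minimal sketch, assuming the brew package from CRAN is installed:

```r
## Minimal brew template sketch (requires the brew package from CRAN).
## Plain text passes through; <%= expr %> is replaced by expr's value.
library(brew)
brew(text = "There are <%= nrow(mtcars) %> rows in mtcars.\n")
```

So brew can generate the HTML for a GUI, but nothing in rapache requires it; any R code that writes to the output connection works.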
Re: [Rd] CHAR () and Rmpi
Hao Yu,

I spot two types of problematic code. Certainly the memcpy in
conversions.c:54 and 56 will cause problems, but I'm not sure whether
those functions are actually used?

The second paradigm is, e.g., Rmpi.c:561

  MPI_Recv(CHAR(STRING_ELT(sexp_data,i)), slen, MPI_CHAR, source, tag,
           comm[commn], &status[statusn]);

where the first argument to MPI_Recv is a buffer that MPI_Recv will
fill. sexp_data is a user-supplied character vector.

A not-clever solution creates a temporary buffer via R_alloc (for
garbage-collected memory) or R_Calloc (for user-managed memory,
probably appropriate in a loop where you'd like to reuse the buffer),
passes the buffer to MPI_Recv, and then uses SET_STRING_ELT with the
now-filled temporary buffer converted to a CHARSXP with mkChar. I think
this is backward compatible.

The user-supplied character vector has gone to waste, used only to pass
in the length of the expected string. mkChar will copy the temporary
buffer (unless an identical CHARSXP already exists), so there are
potentially three memory allocations per string! I suspect most users
rely on higher-level access (mpi.par*Apply, mpi.*.Robj, etc.) where
this inefficiency is not important or can be addressed without
modifying the public interface.

Martin

Prof Brian Ripley <[EMAIL PROTECTED]> writes:

> I'm not sure what your sticking point here is. If mpi does not modify
> data in a (char *) pointer, then that really is a (const char *)
> pointer and the headers are being unhelpful in not telling the
> compiler that the data are constant.
>
> If that is the case you need to use casts to (char *) and the
> following private define may be useful to you:
>
>   #define CHAR_RW(x) ((char *) CHAR(x))
>
> However, you ask
>
>> Is there an easy way to get a char pointer to
>> STRING_ELT((sexp_rdata),0) that is also backward compatible with old
>> R versions?
> and the answer is that there is no such way, since (const char *) and
> (char *) are not the same thing, and any package that wants to alter
> the contents of a string element needs to create a new CHARSXP to be
> that element.
>
> BTW, you still have not changed Rmpi to remove the configure problems
> on 64-bit systems (including assuming libs are in /usr/lib not
> /usr/lib64) that I pointed out a long time ago.
>
> On Fri, 28 Sep 2007, Hao Yu wrote:
>
>> Hi. I am the maintainer of the Rmpi package. Now I have a problem
>> regarding the change of CHAR() in R 2.6.0. According to the R 2.6.0
>> NEWS:
>>
>>   CHAR() now returns (const char *) since CHARSXPs should no
>>   longer be modified in place. This change allows compilers to
>>   warn or error about improper modification. Thanks to Herve
>>   Pages for the suggestion.
>>
>> Unfortunately this causes Rmpi to fail, since MPI requires char
>> pointers rather than const char pointers. Normally I use
>>
>>   CHAR(STRING_ELT((sexp_rdata),0))
>>
>> to get the pointer to MPI where an R character vector (in the C
>> sense) is stored. Because of the change, all character messengers
>> fail. Is there an easy way to get a char pointer to
>> STRING_ELT((sexp_rdata),0) that is also backward compatible with old
>> R versions? BTW, Rmpi does not do any modification of characters at
>> the C level.
>>
>> Thanks
>> Hao Yu
>
> --
> Brian D. Ripley, [EMAIL PROTECTED]
> Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
> University of Oxford, Tel: +44 1865 272861 (self)
> 1 South Parks Road, +44 1865 272866 (PA)
> Oxford OX1 3TG, UK Fax: +44 1865 272595

--
Martin Morgan
Bioconductor / Computational Biology
http://bioconductor.org
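The temporary-buffer approach Martin describes might look roughly like the following at the C level. This is an untested sketch against the R API and MPI headers, not code from Rmpi, and the helper's name and arguments are made up:

```c
/* Untested sketch of the temporary-buffer idea; recv_string and its
 * arguments are illustrative, not taken from Rmpi. */
#include <Rinternals.h>
#include <mpi.h>

static void recv_string(SEXP sexp_data, int i, int slen,
                        int source, int tag,
                        MPI_Comm comm, MPI_Status *status)
{
    /* R_alloc memory is reclaimed by R after the .Call returns,
     * so no explicit free is needed. */
    char *buf = R_alloc(slen + 1, sizeof(char));
    MPI_Recv(buf, slen, MPI_CHAR, source, tag, comm, status);
    buf[slen] = '\0';
    /* mkChar copies buf into a (possibly cached) CHARSXP */
    SET_STRING_ELT(sexp_data, i, mkChar(buf));
}
```

As Martin notes, this costs up to three allocations per string (the wasted user-supplied element, the scratch buffer, and mkChar's copy), but it keeps CHAR() strictly read-only as R 2.6.0 requires.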
[Rd] as.Date.numeric
I noticed that R 2.7.0 will have as.Date.numeric with a second,
non-optional origin argument. Frankly, I would prefer that it default
to the Epoch, since it's a nuisance to specify, but at the very least I
think that .Epoch should be provided as a builtin variable.
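To illustrate the nuisance being described: with the R 2.7.0 behaviour, the origin must be spelled out on every call, and .Epoch below is the proposal, not an existing variable.

```r
## x is interpreted as days since the origin; the origin must currently
## be given explicitly on every call.
d <- as.Date(14000, origin = "1970-01-01")

## The proposal amounts to having a constant like this built in
## (.Epoch does not exist in R; this defines it by hand):
.Epoch <- as.Date("1970-01-01")
stopifnot(as.Date(14000, origin = .Epoch) == d)
```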
[Rd] Fwd: smart updates and rolling windows
Greetings R'ers!

I have been looking for mathematics libraries for event stream
processing / time series simulation. Mathematics libraries for event
stream processing require two key features:

1) "smart updates" -- functions use optimal update algorithms; for
   example, once the mean is calculated for an event stream, subsequent
   calls to the function are computed from the previous value of the
   mean rather than by brute-force recalculation;

2) "rolling calculations" -- functions take a lag parameter for sample
   size, for example the mean of the last 100 events.

I found a couple of simple summary statistics implemented like this in
the zoo package. I have also found implementations of smart updates in
some other languages (Apache Commons Math, and Boost Accumulators), but
these only support accumulated calculations, not rolling calculations.

I have built libraries for this before, and I am currently working on a
new version -- but before I reinvent the wheel, I am trying to find
some folks in the community with similar interests to collaborate with.
My personal use for this is financial time series analysis, so I am
interested in implementing these high-performance algorithms for
classical statistics, robust statistics, regression models, etc.

Best!
/brad
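The "smart update" idea for a rolling window can be sketched in a few lines; roll_mean_update is a made-up name, and a production version would also have to guard against floating-point drift over long streams:

```r
## O(1) rolling-mean update: when a new event arrives, slide the window
## by adding the entering value and subtracting the leaving one,
## instead of recomputing mean() over the whole window each time.
roll_mean_update <- function(prev_mean, entering, leaving, k)
    prev_mean + (entering - leaving) / k

x <- c(1, 4, 2, 8, 5, 7)
k <- 3
m <- mean(x[1:k])                         # brute-force start-up
for (i in (k + 1):length(x))
    m <- roll_mean_update(m, x[i], x[i - k], k)
stopifnot(all.equal(m, mean(x[4:6])))     # matches direct recomputation
```

zoo's rollmean covers the vectorised (batch) case; the incremental form above is what an event-by-event stream processor would need.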