Re: [Rd] modifying large R objects in place

2007-09-29 Thread Petr Savicky
On Fri, Sep 28, 2007 at 08:14:45AM -0500, Luke Tierney wrote:
[...]
> [...] A related issue is that user-defined
> assignment functions always see a NAMED of 2 and hence cannot modify
> in place. We've been trying to come up with a reasonable solution to
> this, so far without success but I'm moderately hopeful.

If a user-defined function evaluates its body in its parent environment,
using Peter Dalgaard's suggestion of eval.parent(substitute(  )),
then the NAMED attribute is not increased and the function may modify
the object in place.
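
A minimal sketch of this pattern follows. The helper name set_elt and
the vector v are hypothetical and only for illustration. The assignment
is evaluated in the caller's frame, so the object never gets a new name
inside the function. Whether a copy is actually avoided depends on the
R version.

```r
# Hypothetical helper (not part of R): assign one element of an object
# that lives in the caller's frame, without binding it to a local name.
set_elt <- function(x, i, value) {
    # substitute() replaces x, i and value by the caller's expressions,
    # so the call below becomes e.g. v[2] <- 10, evaluated in the caller.
    eval.parent(substitute(x[i] <- value))
}

v <- c(1, 2, 3)
set_elt(v, 2, 10)
v
```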

On Fri, Sep 28, 2007 at 12:39:30AM +0200, Peter Dalgaard wrote:
> Longer-term, I still have some hope for better reference counting, but 
> the semantics of environments make it really ugly -- an environment can 
> contain an object that contains the environment, a simple example being 
> 
> f <- function()
>     g <- function() 0
> f()
> 

On Fri, Sep 28, 2007 at 09:46:39AM -0400, Duncan Murdoch wrote:
> f has no input; its output is the function g, whose environment is the 
> evaluation environment of f.  g is never used, but it is returned as the 
> value of f.  Thus we have the loop:
> 
> g refers to the environment.
> the environment contains g.
> 
> Even though the result of f() was never saved, two things (the 
> environment and g) got created and each would have non-zero reference 
> count.

Thank you very much for the example and explanation. I would
not have guessed that something like this is possible, but now I see
that it may, in fact, be quite common. For example
  something <- function()
  {
      a <- 1:5
      b <- 6:10
      c <- c("a","a","b","b","b")
      mf <- model.frame(c ~ a + b)
      mf
  }
  mf1 <- something()
  e1 <- attr(attr(mf1,"terms"),".Environment")
  mf2 <- eval(expression(mf),envir=e1)
  e2 <- attr(attr(mf2,"terms"),".Environment")
  print(identical(e1,e2)) # TRUE
seems to be a similar situation. Here, the references go in the
sequence mf1 -> e1 -> mf2 -> e1. I think that already mf2 is
the same as mf1, but I do not know how to demonstrate this.
However, both mf1 and mf2 refer to the same environment, so
e1 -> mf2 -> e1 is a cycle for sure.
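
The simpler cycle from the quoted f/g example can be checked directly.
This is only a sketch; the names h and e are introduced here for
illustration and do not appear in the original example.

```r
f <- function()
    g <- function() 0

h <- f()              # h is the closure that f() created under the name g
e <- environment(h)   # the evaluation environment of that call to f

# The environment contains g ...
exists("g", envir = e, inherits = FALSE)
# ... and g refers back to the same environment, which closes the loop.
identical(environment(e$g), e)
```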

On Fri, Sep 28, 2007 at 08:14:45AM -0500, Luke Tierney wrote:
> >If yes, is it possible during gc() to determine also cases,
> >when NAMED may be dropped from 2 to 1? How much would this increase
> >the complexity of gc()?
> 
> Probably not impossible but would be a fair bit of work with probably
> not much gain as the NAMED values would still be high until the next
> gc of the appropriate level, which will probably be a fair time as an
> object being modified is likely to be older, but the interval in which
> there would be a benefit is short.

On Fri, Sep 28, 2007 at 04:36:40PM +0100, Prof Brian Ripley wrote:
[...]
> On Fri, 28 Sep 2007, Luke Tierney wrote:
[...]
> >approach may be possible. A related issue is that user-defined
> >assignment functions always see a NAMED of 2 and hence cannot modify
> >in place. We've been trying to come up with a reasonable solution to
> >this, so far without success but I'm moderately hopeful.
> 
> I am not persuaded that the difference between NAMED=1/2 makes much 
> difference in general use of R, and I recall Ross saying that he no longer 
> believed that this was a worthwhile optimization.  It's not just 
> 'user-defined' replacement functions, but also all the system-defined 
> closures (including all methods for the generic replacement functions 
> which are primitive) that are unable to benefit from it.

I am thinking about the following situation. The user creates a large
matrix A and then performs a sequence of operations on it. Some of
the operations scan the matrix in a read-only manner (e.g. calculating
summaries); others are top-level commands that modify the matrix
itself. I do not argue that such a sequence of operations should
be done in place by default. However, I think that R should provide
tools that allow this to be done in place if the user does some extra
work. If the matrix is really large, then in-place operations are not
only more space efficient, but also more time efficient.

Using the information from the current thread, there are two
possible approaches to achieve this.

1. The initial matrix should not be generated by the "matrix" function,
   because of the observation by Henrik Bengtsson (this is the issue
   with dimnames). The matrix may be initialized using e.g.
       .Internal(matrix(data, nrow, ncol, byrow))

   The matrix should not be scanned using an R function that evaluates
   its body in its own environment. This includes the functions nrow,
   ncol, colSums, rowSums and probably more. The matrix may be scanned by
   functions that use eval.parent(substitute(  )) and avoid giving
   the matrix a new name. The user may prepare versions of nrow, ncol,
   colSums, rowSums, etc. with this property.

2. If NAMED attribute of A may be decreased from 2 to 1 during an operation
   similar to garbage collection (if A is not in a refere

[Rd] R-Server remotecontrolled via browser-GUI

2007-09-29 Thread idontwant googeltospyafterme
hi jeff,
i have read your paper from 2005 and your rapache solution sounds
good. i was wondering if u did sth about the state problem... you
should put a changelog on your website. what changes will come with
1.0?
and what is brew exactly for? it is like a tuned "cat", mixing
r-output and text, right?
so if i build a gui, brew could generate the html. but i don't have to
use brew, do i? what are the advantages of using brew with rapache?

what do you think of openstatserver btw?

i think there are lots of interesting and promising approaches around
in R-community.
but as with all OSS, same here. many people working on slightly
different solutions for the same problem. and none of the solutions is
feature complete, some are not even actively developed and only a few
are to be seen as stable or beyond beta stage.
well, i work almost only with OSS, so i got used to it :-)
it's just difficult to navigate thru the possibilities without
spending too much time on research.

have a nice day,

Josuah

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] CHAR () and Rmpi

2007-09-29 Thread Martin Morgan
Hao Yu,

I spot two types of problematic code. Certainly the memcpy in
conversions.c:54 and 56 will cause problems, but I'm not sure whether
those functions are actually used?

The second paradigm is, e.g., Rmpi.c:561

MPI_Recv(CHAR(STRING_ELT(sexp_data,i)),
 slen,MPI_CHAR,source,tag, comm[commn],&status[statusn]);

where the first argument to MPI_Recv is a buffer that MPI_Recv will
fill. sexp_data is a user-supplied character vector. A not-clever
solution creates a temporary buffer via R_alloc (for garbage-collected
memory) or R_Calloc (for user-managed memory, probably appropriate in
a loop where you'd like to reuse the buffer), passes the buffer to
MPI_Recv, and then SET_STRING_ELT with the now filled temporary buffer
converted to a CHARSXP with mkChar. I think this is backward
compatible. The user-supplied character vector has gone to waste, used
only to pass in the length of the expected string. mkChar will copy
the temporary buffer (unless an identical CHARSXP already exists), so
that there are potentially three memory allocations per string!  I
suspect most users rely on higher-level access (mpi.par*Apply,
mpi.*.Robj, etc) where this inefficiency is not important or can be
addressed without modifying the public interface.

Martin

Prof Brian Ripley <[EMAIL PROTECTED]> writes:

> I'm not sure what your sticking point here is.  If mpi does not modify 
> data in a (char *) pointer, then that really is a (const char *) pointer 
> and the headers are being unhelpful in not telling the compiler that 
> the data are constant.
>
> If that is the case you need to use casts to (char *) and the following 
> private define may be useful to you:
>
> #define CHAR_RW(x) ((char *) CHAR(x))
>
>
> However, you ask
>
>> Is there an easy way to get a char pointer to STRING_ELT((sexp_rdata),0) 
>> and is also backward compatible to old R versions.
>
> and the answer is that there is no such way, since (const char *) and 
> (char *) are not the same thing and any package that wants to alter the 
> contents of a string element needs to create a new CHARSXP to be that 
> element.
>
>
> BTW, you still have not changed Rmpi to remove the configure problems on 
> 64-bit systems (including assuming libs are in /usr/lib not /usr/lib64) I 
> pointed out a long time ago.
>
>
> On Fri, 28 Sep 2007, Hao Yu wrote:
>
>> Hi. I am the maintainer of Rmpi package. Now I have a problem regarding
>> the change of CHAR () in R 2.6.0. According to R 2.6.0 NEWS:
>> ***
>> CHAR() now returns (const char *) since CHARSXPs should no
>> longer be modified in place.  This change allows compilers to
>> warn or error about improper modification.  Thanks to Herve
>> Pages for the suggestion.
>> ***
>> Unfortunately this causes Rmpi to fail since MPI requires char pointers
>> rather than const char pointers. Normally I use
>>     CHAR(STRING_ELT((sexp_rdata),0))
>> to get the pointer to MPI where an R character vector (C sense) is stored.
>> Because of the change, all character messengers fail. Is there an easy way
>> to get a char pointer to STRING_ELT((sexp_rdata),0) and is also backward
>> compatible to old R versions. BTW Rmpi does not do any modification of
>> characters at C level.
>>
>> Thanks
>> Hao Yu
>>
>>
>
> -- 
> Brian D. Ripley,                  [EMAIL PROTECTED]
> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
> University of Oxford,             Tel:  +44 1865 272861 (self)
> 1 South Parks Road,                     +44 1865 272866 (PA)
> Oxford OX1 3TG, UK                Fax:  +44 1865 272595

-- 
Martin Morgan
Bioconductor / Computational Biology
http://bioconductor.org



[Rd] as.Date.numeric

2007-09-29 Thread Gabor Grothendieck
I noticed that R 2.7.0 will have as.Date.numeric with a second,
non-optional origin argument.  Frankly, I would prefer that it default
to the Epoch, since it's a nuisance to specify, but at the very least
I think that .Epoch should be provided as a builtin variable.
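
To illustrate, here is a small sketch of the call with an explicit
origin. The variable epoch is a hypothetical stand-in for the proposed
.Epoch, which is not an existing R object.

```r
# Hypothetical stand-in for the proposed builtin .Epoch
epoch <- as.Date("1970-01-01")

# Number of days since the Epoch, converted back to a Date
d <- as.Date(13785, origin = epoch)
format(d)   # "2007-09-29"
```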



[Rd] Fwd: smart updates and rolling windows

2007-09-29 Thread Bradford Cross
Greetings R'ers!

I have been looking for mathematics libraries for event stream processing /
time series simulation.  Mathematics libraries for event stream processing
require two key features: 1) "smart updates" (functions use optimal update
algorithms, e.g. once the mean is calculated for an event stream,
subsequent calls to the function are computed from the previous value of the
mean rather than by brute-force recalculation), and 2) "rolling calculations"
(functions take a lag parameter for the sample size, e.g. the mean of the
last 100 events).
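
As a rough sketch of what I mean by these two features, in R (the
function names update_mean and rolling_mean are made up for this
illustration and are not taken from any package):

```r
# "Smart update": fold one new observation into a running mean using
# only the previous mean and the count, with no pass over old data.
update_mean <- function(state, x) {
    n <- state$n + 1
    list(n = n, mean = state$mean + (x - state$mean) / n)
}

# "Rolling calculation": mean of each window of the last k events,
# computed from cumulative sums instead of re-summing every window.
rolling_mean <- function(x, k) {
    stopifnot(k >= 1, k <= length(x))
    cs <- cumsum(c(0, x))
    (cs[(k + 1):length(cs)] - cs[1:(length(cs) - k)]) / k
}

state <- list(n = 0, mean = 0)
for (x in c(2, 4, 6)) state <- update_mean(state, x)
state$mean            # running mean after three events

rolling_mean(1:5, 3)  # means of the windows 1:3, 2:4 and 3:5
```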

I found a couple of simple summary statistics implemented like this in the zoo
package.  I have also found implementations of smart updates in libraries for
other languages (Apache Commons Math and Boost Accumulators), but these only
support accumulated calculations, not rolling calculations.

I have built libraries for this before, and I am currently working on a new
version - but before I reinvent the wheel I am trying to find some folks in
the community with similar interests to collaborate with.

My personal use for this is financial time series analysis, so I am
interested in  implementing these high-performance algorithms for classical
statistics, robust statistics, regression models, etc.

Best!

/brad
