Re: [Rd] Implicit vs explicit printing and the call stack

2007-05-13 Thread Prof Brian Ripley
First, it was not clear that you are talking about the output of 
traceback(), which is _a representation of_ the call stack and depends on 
the details of deparsing.

Second, the difference is I believe not implicit vs explicit printing, but 
of printing an evaluated object vs printing a call.  What your subject 
line would correspond to is

x <- ggplot(mtcars, aes(x = cyl, y = -mpg)) + scale_y_log10() + geom_point()
x vs print(x)

I can't demonstrate (what is aes?), but I expect that would show minimal 
differences.

The issue seems to be whether arguments have been evaluated, and if so 
whether they were promises.  Promises can be deparsed back to the symbolic 
representation rather than to their value.  When you call print explicitly 
_on the ggplot call_, you have promises to arguments (to support lazy 
evaluation).  As far as I know (this is a detail of your code), the object 
ggplot returns has evaluated 'mtcars' and contains the value and not the 
promise, hence the difference in the deparsing of the call stack.
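
A minimal sketch of that distinction, assuming only base R and the datasets package: the same call is built once around the symbol and once around the evaluated data frame, and each is deparsed. The function name print.ggplot is used purely as a label inside the calls; nothing is dispatched or plotted.

```r
# One call carries the symbol 'mtcars'; the other carries the evaluated
# data frame itself, as happens when a value (not a promise) is passed.
by_name  <- quote(print.ggplot(mtcars))
by_value <- as.call(list(as.name("print.ggplot"), datasets::mtcars))

name_lines  <- deparse(by_name)   # a single short line
value_lines <- deparse(by_value)  # the whole data frame, spelled out

length(name_lines)
length(value_lines)
```

The second deparse has to write out every column of the data frame, which is the kind of output shown in the traceback quoted below.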

You chose to call traceback() without 'max.lines', and the answer would 
seem to be

1) don't use a buggy print method,
2) if you have to (to fix it), use 'max.lines' or do this in two steps.

If you want your end users to see a nicer representation, reconsider what 
you actually return in your object.


On Sat, 12 May 2007, hadley wickham wrote:

> Hi everyone,
>
> I've run into a bit of strange problem with implicit vs explicit
> printing and the call stack. I've included an example at the bottom of
> this email.  The basic problem is that I have an S3 object with a
> print method.  When the object is implicitly printed (ie. typed
> directly into the console) the function arguments in the call stack
> are exploded out to their actual values, rather than just the name I
> typed in (see below for an example if my language is confusing).  When
> I explicitly "print" the object, the call stack is fine.
>
> This is not just of academic interest, because with a larger dataset
> and an implicit print, there is a noticeable delay before control
> returns to the prompt  (I can't quantify it exactly because
> system.time requires a explicit print, but it's on the order of a few
> seconds).
>
> I'm not sure if I've provided enough information to be able to solve
> the problem, so please let me know what additional details would be
> useful.
>
> Thanks,
>
> Hadley
>
>
>> ggplot(mtcars, aes(x=cyl, y=-mpg)) + scale_y_log10() + geom_point()
> Error in grid.pretty(.$domain()) : infinite axis extents 
> [GEPretty(-inf,inf,5)]
> In addition: Warning messages:
> 1: NaNs produced in: log(x, base)
> 2: no non-missing arguments to min; returning Inf
> 3: no non-missing arguments to max; returning -Inf
> 4: no non-missing arguments to min; returning Inf
> 5: no non-missing arguments to max; returning -Inf
>> traceback()
> 16: .Call(L_pretty, range)
> 15: grid.pretty(.$domain())
> 14: get("breaks", env = .$y(), inherits = TRUE)(.$y(), ...)
> 13: .$y()$breaks()
> 12: range(at)
> 11: as.numeric(x)
> 10: unit(range(at), "native")
> 9: ggaxis_line(at, position)
> 8: ggaxis(.$y()$breaks(), .$y()$labels(), "left", range$y)
> 7: get("guide_axes", env = coordinates, inherits = TRUE)(coordinates,
>   ...)
> 6: coordinates$guide_axes()
> 5: guides_basic(plot, scales, cs)
> 4: ggplot_plot(x, ...)
> 3: grid.draw(ggplot_plot(x, ...))
> 2: print.ggplot(list(data = list(mpg = c(21, 21, 22.8, 21.4, 18.7,
>   18.1, 14.3, 24.4, 22.8, 19.2, 17.8, 16.4, 17.3, 15.2, 10.4, 10.4,
>   14.7, 32.4, 30.4, 33.9, 21.5, 15.5, 15.2, 13.3, 19.2, 27.3, 26,
>   30.4, 15.8, 19.7, 15, 21.4), cyl = c(6, 6, 4, 6, 8, 6, 8, 4,
>   4, 6, 6, 8, 8, 8, 8, 8, 8, 4, 4, 4, 4, 8, 8, 8, 8, 4, 4, 4, 8,
>   6, 8, 4), disp = c(160, 160, 108, 258, 360, 225, 360, 146.7,
>   140.8, 167.6, 167.6, 275.8, 275.8, 275.8, 472, 460, 440, 78.7,
>   75.7, 71.1, 120.1, 318, 304, 350, 400, 79, 120.3, 95.1, 351,
>   145, 301, 121), hp = c(110, 110, 93, 110, 175, 105, 245, 62,
>   95, 123, 123, 180, 180, 180, 205, 215, 230, 66, 52, 65, 97, 150,
>   150, 245, 175, 66, 91, 113, 264, 175, 335, 109), drat = c(3.9,
>   3.9, 3.85, 3.08, 3.15, 2.76, 3.21, 3.69, 3.92, 3.92, 3.92, 3.07,
>   3.07, 3.07, 2.93, 3, 3.23, 4.08, 4.93, 4.22, 3.7, 2.76, 3.15,
>   3.73, 3.08, 4.08, 4.43, 3.77, 4.22, 3.62, 3.54, 4.11), wt = c(2.62,
>   2.875, 2.32, 3.215, 3.44, 3.46, 3.57, 3.19, 3.15, 3.44, 3.44,
>   4.07, 3.73, 3.78, 5.25, 5.424, 5.345, 2.2, 1.615, 1.835, 2.465,
>   3.52, 3.435, 3.84, 3.845, 1.935, 2.14, 1.513, 3.17, 2.77, 3.57,
>   2.78), qsec = c(16.46, 17.02, 18.61, 19.44, 17.02, 20.22, 15.84,
>   20, 22.9, 18.3, 18.9, 17.4, 17.6, 18, 17.98, 17.82, 17.42, 19.47,
>   18.52, 19.9, 20.01, 16.87, 17.3, 15.41, 17.05, 18.9, 16.7, 16.9,
>   14.5, 15.5, 14.6, 18.6), vs = c(0, 0, 1, 1, 0, 1, 0, 1, 1, 1,
>   1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0,
>   1), am = c(1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>   1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1), gear = c(4, 4,

Re: [Rd] Implicit vs explicit printing and the call stack

2007-05-13 Thread hadley wickham
> First, it was not clear that you are talking about the output of
> traceback(), which is _a representation of_ the call stack and depends on
> the details of deparsing.

Given that there is a substantial delay in one case, and not in the
other, I had assumed (perhaps falsely) that there was something more
fundamental going on than simply the representation of the call stack
by traceback. Perhaps I should have been more clear that the output of
traceback was a symptom of the problem, not what I was really
interested in.

> Second, the difference is I believe not implicit vs explicit printing, but
> of printing an evaluated object vs printing a call.  What your subject
> line would correspond to is
>
> x <- ggplot(mtcars, aes(x = cyl, y = -mpg)) + scale_y_log10() + geom_point()
> x vs print(x)
>
> I can't demonstrate (what is aes?), but I expect that would show minimal
> differences.

I don't think that is the case.  Using Gabor's reproducible example:

library(lattice)

x <- xyplot(conc ~ uptake, CO2, xlim = Inf)
x
traceback()
print(x)
traceback()

shows the same differences that I reported.

> The issue seems to be whether arguments have been evaluated, and if so
> whether they were promises.  Promises can be deparsed back to the symbolic
> representation rather than to their value.  When you call print explicitly
> _on the ggplot call_, you have promises to arguments (to support lazy
> evaluation).  As far as I know (this is a detail of your code), the object
> ggplot returns has evaluated 'mtcars' and contains the value and not the
> promise, hence the difference in the deparsing of the call stack.

I'm not sure I completely follow, and I don't think your reasoning
holds given the example above. Please correct me if I am wrong.  My
code for ggplot.print follows if that helps.

print.ggplot <- function(x, newpage = is.null(vp), vp = NULL, ...) {
  if (newpage) grid.newpage()
  if (is.null(vp)) {
    grid.draw(ggplot_plot(x, ...))
  } else {
    pushViewport(vp)
    grid.draw(ggplot_plot(x, ...))
    upViewport()
  }
}


>
> You chose to call traceback() without 'max.lines', and the answer would
> seem to be
>
> 1) don't use a buggy print method,
> 2) if you have to (to fix it), use 'max.lines' or do this in two steps.
>
> If you want your end users to see a nicer representation, reconsider what
> you actually return in your object.

I don't really care about the representation in the call stack
(although it is a pain when debugging, even though max.lines helps),
the main problem is the 4-5 second delay after an error before control
returns to the user.  This does not occur with an explicit print.

Hadley

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Implicit vs explicit printing and the call stack

2007-05-13 Thread Prof Brian Ripley
On Sun, 13 May 2007, hadley wickham wrote:

>> First, it was not clear that you are talking about the output of
>> traceback(), which is _a representation of_ the call stack and depends on
>> the details of deparsing.
>
> Given that there is a substantial delay in one case, and not in the
> other, I had assumed (perhaps falsely) that there was something more
> fundamental going on than simply the representation of the call stack
> by traceback. Perhaps I should have been more clear that the output of
> traceback was a symptom of the problem, not what I was really
> interested in.

traceback() does very little apart from show you .Traceback, which is 
constructed (as a representation) at the time of the error.  My guess is 
that the concern is the time it takes to create .Traceback, but see
below.
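
As a rough analogue of that step (not how R builds .Traceback internally), the sketch below records the call stack from a calling handler at the moment of the error and deparses each frame, keeping only the first line of each call. The names store and buggy_print are hypothetical; the nlines argument of deparse() plays a role similar to 'max.lines' in traceback().

```r
store <- new.env()

# Stand-in for a print method that fails.
buggy_print <- function(x) stop("infinite axis extents")

# A calling handler runs before the stack unwinds, so sys.calls()
# still sees the frames that led to the error.
tryCatch(
  withCallingHandlers(
    buggy_print(datasets::mtcars),
    error = function(e) assign("calls", sys.calls(), envir = store)
  ),
  error = function(e) invisible(NULL)
)

# Deparse every recorded call, truncating each to its first line.
frames <- as.list(store$calls)
frame_summary <- vapply(
  frames,
  function(cl) deparse(cl, nlines = 1L)[1L],
  character(1)
)
frame_summary
```

The cost of the deparsing step grows with the size of whatever each call contains, which is why a call holding a whole data set is more expensive to record than one holding a symbol.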

>> Second, the difference is I believe not implicit vs explicit printing, but
>> of printing an evaluated object vs printing a call.  What your subject
>> line would correspond to is
>> 
>> x <- ggplot(mtcars, aes(x = cyl, y = -mpg)) + scale_y_log10() + 
>> geom_point()
>> x vs print(x)
>> 
>> I can't demonstrate (what is aes?), but I expect that would show minimal
>> differences.
>
> I don't think that is the case.  Using Gabor's reproducible example:
>
> library(lattice)
>
> x <- xyplot(conc ~ uptake, CO2, xlim = Inf)
> x
> traceback()
> print(x)
> traceback()
>
> shows the same differences that I reported.

It is not the same as regards my comments: your example showed the details 
of mtcars, and that appeared to be your concern.  That example does not 
show the details of CO2.

Part of my point is that if your returned object does not contain the 
evaluated dataset (but, say, a promise to it), this will not happen.

>> The issue seems to be whether arguments have been evaluated, and if so
>> whether they were promises.  Promises can be deparsed back to the symbolic
>> representation rather than to their value.  When you call print explicitly
>> _on the ggplot call_, you have promises to arguments (to support lazy
>> evaluation).  As far as I know (this is a detail of your code), the object
>> ggplot returns has evaluated 'mtcars' and contains the value and not the
>> promise, hence the difference in the deparsing of the call stack.
>
> I'm not sure I completely follow, and I don't think your reasoning
> holds given the example above. Please correct me if I am wrong.

It does.  In one traceback() you have print.trellis(x), in the other 
print.trellis().  This is because in print(x), 
print.trellis is passed a promise, and in 'x', 'x' has been evaluated and 
print and print.trellis are passed the value (a list).

[ ... ]

Let me try to spell out the mechanisms in so far as I understand them (R 
lacks technical documentation):

A) print(x).

The R evaluator sets up an argument list containing a promise to evaluate 
'x' in the calling environment. It then calls UseMethod("print"), which 
needs to evaluate the first argument to find the class, finds the method, 
and manipulates the call to be to print.ggplot.  At that point the details 
get complicated, but when evaluation gets to the body of print.ggplot, the 
intention is that it is just as if print.ggplot(x) had been called and so 
the first argument is a promise to evaluate 'x' in the calling environment 
of print().  (Because the details vary across the internals, I am not sure 
without tracing through the code if the promise is a new one or the 
evaluated one: in either case it will deparse to 'x'.)

B) 'x' at top-level.

This creates an anonymous object whose value is the result of eval(x) (at 
C level).  The R evaluator then notices that the R_Visible flag is set and 
calls PrintValueEnv on the anonymous object.  The latter notices that the 
object has the OBJECT bit set and so calls print() on the anonymous 
object. At that point the name 'x' is not available and the anonymous 
object deparses to its value.


Suppose that printing integers had a bug.  Then my understanding is that 
print(2L+3L) would have '2L+3L' in the deparsed call, and '2L+3L' would 
have '5' (or '5L').  That is intended to illustrate that deparsing to the 
call may or may not be more compact than to the value.
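
The integer case can be checked directly with deparse(), here as a small sketch using base R only:

```r
# Deparsing the unevaluated call keeps the expression as typed.
call_form <- deparse(quote(2L + 3L))

# Deparsing the evaluated result shows only the value.
value_form <- deparse(2L + 3L)

call_form
value_form
```

Here the value is the shorter of the two; for a data frame the call is the shorter one.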

>> You chose to call traceback() without 'max.lines', and the answer would
>> seem to be
>> 
>> 1) don't use a buggy print method,
>> 2) if you have to (to fix it), use 'max.lines' or do this in two steps.
>> 
>> If you want your end users to see a nicer representation, reconsider what
>> you actually return in your object.
>
> I don't really care about the representation in the call stack
> (although it is a pain when debugging, even though max.lines helps),
> the main problem is the 4-5 second delay after an error before control
> returns to the user.  This does not occur with an explicit print.

I can't reproduce that of course.  It might well be that deparsing the 
calls to dump the call stack to .Traceback is the problem, in which case 
my '1)' applies. (See the comments a

Re: [Rd] Implicit vs explicit printing and the call stack

2007-05-13 Thread hadley wickham
On 5/13/07, Prof Brian Ripley wrote:
> On Sun, 13 May 2007, hadley wickham wrote:
>
> >> First, it was not clear that you are talking about the output of
> >> traceback(), which is _a representation of_ the call stack and depends on
> >> the details of deparsing.
> >
> > Given that there is a substantial delay in one case, and not in the
> > other, I had assumed (perhaps falsely) that there was something more
> > fundamental going on than simply the representation of the call stack
> > by traceback. Perhaps I should have been more clear that the output of
> > traceback was a symptom of the problem, not what I was really
> > interested in.
>
> traceback() does very little apart from show you .Traceback, which is
> constructed (as a representation) at the time of the error.  My guess is
> that the concern is the time it takes to create .Traceback, but see
> below.
>
> >> Second, the difference is I believe not implicit vs explicit printing, but
> >> of printing an evaluated object vs printing a call.  What your subject
> >> line would correspond to is
> >>
> >> x <- ggplot(mtcars, aes(x = cyl, y = -mpg)) + scale_y_log10() +
> >> geom_point()
> >> x vs print(x)
> >>
> >> I can't demonstrate (what is aes?), but I expect that would show minimal
> >> differences.
> >
> > I don't think that is the case.  Using Gabor's reproducible example:
> >
> > library(lattice)
> >
> > x <- xyplot(conc ~ uptake, CO2, xlim = Inf)
> > x
> > traceback()
> > print(x)
> > traceback()
> >
> > shows the same differences that I reported.
>
> It is not the same as regards my comments: your example showed the details
> of mtcars, and that appeared to be your concern.  That example does not
> show the details of CO2.

Oh, I see - it is much more verbose, but CO2 has not been exploded out.

> Part of my point is that if your returned object does not contain the
> evaluated dataset (but, say, a promise to it), this will not happen.

What is the suggested way to create such a promise? A closure? eg.

ggplot <- function(data) {
  list(data = function() data)
}

A promise would have the desirable property of not storing a copy of
the data in the object, and you wouldn't need to update the plot
object if you changed the data in between creation and first plotting
(but not between first and second plotting).

Or delayed assign?

ggplot <- function(data) {
  out <- new.env()
  delayedAssign("data", data, assign.env = out)
  out
}

Or something else?
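
A small sketch of how the delayedAssign variant behaves, assuming base R only. The constructor is renamed make_plot (a hypothetical name) so that it does not clash with the real ggplot function, and a plain numeric vector stands in for the data set.

```r
# Hypothetical constructor: stores a promise to 'data', not its value.
make_plot <- function(data) {
  out <- new.env()
  delayedAssign("data", data, assign.env = out)
  out
}

d <- c(1, 2, 3)
p <- make_plot(d)

d <- c(10, 20)        # changed before the promise is forced
first_use <- p$data   # the promise is forced here

d <- 99               # changed after the promise was forced
second_use <- p$data  # still the value seen at first use
```

This matches the property described above: a change to the data between creation and first use is picked up, while a change after first use is not.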

> >> The issue seems to be whether arguments have been evaluated, and if so
> >> whether they were promises.  Promises can be deparsed back to the symbolic
> >> representation rather than to their value.  When you call print explicitly
> >> _on the ggplot call_, you have promises to arguments (to support lazy
> >> evaluation).  As far as I know (this is a detail of your code), the object
> >> ggplot returns has evaluated 'mtcars' and contains the value and not the
> >> promise, hence the difference in the deparsing of the call stack.
> >
> > I'm not sure I completely follow, and I don't think your reasoning
> > holds given the example above. Please correct me if I am wrong.
>
> It does.  In one traceback() you have print.trellis(x), in the other
> print.trellis().  This is because in print(x),
> print.trellis is passed a promise, and in 'x', 'x' has been evaluated and
> print and print.trellis are passed the value (a list).

Ok, I think I get it now.  Thanks for the detailed explanation.

> [ ... ]
>
> Let me try to spell out the mechanisms in so far as I understand them (R
> lacks technical documentation):
>
> A) print(x).
>
> The R evaluator sets up an argument list containing a promise to evaluate
> 'x' in the calling environment. It then calls UseMethod("print"), which
> needs to evaluate the first argument to find the class, finds the method,
> and manipulates the call to be to print.ggplot.  At that point the details
> get complicated, but when evaluation gets to the body of print.ggplot, the
> intention is that it is just as if print.ggplot(x) had been called and so
> the first argument is a promise to evaluate 'x' in the calling environment
> of print().  (Because the details vary across the internals, I am not sure
> without tracing through the code if the promise is a new one or the
> evaluated one: in either case it will deparse to 'x'.)
>
> B) 'x' at top-level.
>
> This creates an anonymous object whose value is the result of eval(x) (at
> C level).  The R evaluator then notices that the R_Visible flag is set and
> calls PrintValueEnv on the anonymous object.  The latter notices that the
> object has the OBJECT bit set and so calls print() on the anonymous
> object. At that point the name 'x' is not available and the anonymous
> object deparses to its value.
>
>
> Suppose that printing integers had a bug.  Then my understanding is that
> print(2L+3L) would have '2L+3L' in the deparsed call, and '2L+3L' would
> have '5' (or '5L').  That is intended to illustrate that deparsing t

Re: [Rd] Strange behavior of debugger

2007-05-13 Thread Duncan Murdoch
On 13/05/2007 12:21 AM, Tong Wang wrote:
> Hi, All:
> I had some trouble debugging C source dynamically loaded into R. When I
> issued N in gdb (or insight), the debugger, instead of moving downward step
> by step, jumped to strange positions (upward, downward, one step, a few steps
> away).
>
> To enter the debugger, I issued gdb(insight) Rgui.exe in Cygwin and added
> this line: asm("int $3"); to my C code. After
> entering R, I did something like: dyn.load("mypath/mycode.dll"), then out
> <- .C("myfun", arg1=as.numeric(a),..)
> The C files are compiled with: R CMD SHLIB -d myfile.c
>
> I am using Win XP + Cygwin, and I have a binary version and a
> Cygwin-compiled version of R-2.4.1 installed. The same
> behavior shows up in both installations.

I think you're seeing the code rearrangements that happen when gcc 
optimizes the code.  It puts some functions inline, it shares similar 
sequences of instructions between different blocks of code, etc.

If this is causing problems in debugging, change the -O3 to a lower 
level of optimization, e.g. -O0 (i.e. oh zero) in the relevant Makefile: 
  src/gnuwin32/MakeDll if you're debugging your own package, 
src/gnuwin32/Makefile for most of R, etc.  However, be aware that in 
case of nasty bugs, this may change the behaviour of your program.

By the way, if you used Cygwin compilers to build, expect problems.  The 
supported compiler is MinGW.

> One thing is, even though I set the env variable DEBUG to T when I built R
> from source in Cygwin, the Rgui.exe I got doesn't seem to contain debug info
> (although R.exe does). I am not sure if this means I did something wrong.

Perhaps you didn't recompile everything with the flag set.  You need it 
set both when the .o object files are created and later when they are 
linked.  And DEBUG=T has no effect on the initial entry point of 
Rgui.exe, because it is coming from the MinGW libraries, not from R 
source, and R doesn't compile those.

Duncan Murdoch



[Rd] symbollic differentiation in R

2007-05-13 Thread Andrew Clausen
Hi all,

I wrote a symbolic differentiation function in R, which can be downloaded
here:

http://www.econ.upenn.edu/~clausen/computing/Deriv.R
http://www.econ.upenn.edu/~clausen/computing/Simplify.R

It is just a prototype.  Of course, R already contains two differentiation
functions: D and deriv.  However, these functions have several limitations.
They can probably be fixed, but since they are written in C, this would
require a lot of work.  Limitations include:
 * The derivatives table can't be modified at runtime, and is only available
in C.
 * The output of "deriv" can not be differentiated again.
 * Neither function can substitute function calls.  eg:
   f <- function(x, y) x + y; deriv(f(x, x^2), "x")
 * They can't differentiate vector-valued functions (although my code also
can't do this yet)
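
For context, a short sketch of the two built-in interfaces as they behave in base R (stats package). D() returns an unevaluated expression, which can itself be passed to D() again; deriv() returns code that computes a value together with its gradient, and it is that output which cannot be differentiated further. The expression x^3 and the evaluation point are arbitrary choices for illustration.

```r
# D() works on expressions and returns an expression.
first  <- D(quote(x^3), "x")   # 3 * x^2
second <- D(first, "x")        # 3 * (2 * x)
second_at_2 <- eval(second, list(x = 2))

# deriv() with function.arg = TRUE returns a function whose result
# carries the gradient as an attribute.
grad_fn <- deriv(~ x^3, "x", function.arg = TRUE)
val <- grad_fn(2)
grad_at_2 <- attr(val, "gradient")[1, "x"]
```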

I think these limitations are fairly important.  As it stands, it's rather
difficult to automatically differentiate a likelihood function.  Ideally, I
would like to be able to write

ll <- function(mean, sd)
    -sum(log(dnorm(x, mean, sd)))

ll.deriv <- Deriv.function(ll)

I can't get this to work with my code since:
 * sum can't add a list of vectors (although I could easily write a sum
replacement.)
 * "x" is assumed to be a scalar in this context.  I'm not sure if there's a
good way to generalize.

The above code would work right now if there were one parameter (so
sum doesn't screw it up) and one scalar data point "x".

Is there an existing way of doing this that is close to being this convenient?
Is it really much easier to solve the limitations I listed with a fresh
R implementation?

Cheers,
Andrew



[Rd] relist, an inverse operator to unlist

2007-05-13 Thread Andrew Clausen
Hi all,

I wrote a function called relist, which is an inverse to the existing
unlist function:

http://www.econ.upenn.edu/~clausen/computing/relist.R

Some functions need many parameters, which are most easily represented in
complex structures.  Unfortunately, many mathematical functions in R,
including optim, nlm, and grad can only operate on functions whose domain is
a vector.  R has a function to convert complex objects into a vector
representation.  This file provides an inverse operation called "unlist" to
convert vectors back to the convenient structural representation.  Together,
these functions allow structured functions to have simple mathematical
interfaces.

For example, a likelihood function for a multivariate normal model needs a
variance-covariance matrix and a mean vector.  It would be most convenient to
represent it as a list containing a vector and a matrix.  A typical parameter
might look like

list(mean=c(0, 1), vcov=cbind(c(1, 1), c(1, 0)))

However, optim can't operate on functions that take lists as input; it
only likes vectors.  The solution is conversion:

 initial.param <- list(mean=c(0, 1), vcov=cbind(c(1, 1), c(1, 0)))

 ll <- function(param.vector)
 {
     param <- relist(initial.param, param.vector)
     -sum(dnorm(x, mean=param$mean, vcov=param$vcov, log=TRUE))
     # note: dnorm doesn't do vcov... but I hope you get the point
 }

 optim(unlist(initial.param), ll)

"relist" takes two parameters: skeleton and flesh.  Skeleton is a sample
object that has the right "shape" but the wrong content.  "flesh" is a vector
with the right content but the wrong shape.  Invoking

relist(skeleton, flesh)

will put the content of flesh on the skeleton.

As long as "skeleton" has the right shape, it should be a precise inverse
of unlist.  These equalities hold:

relist(skeleton, unlist(x)) == x
unlist(relist(skeleton, y)) == y
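
To make the two equalities concrete, here is a minimal sketch of the idea. It is not the implementation in relist.R linked above: relist_sketch is a hypothetical helper that keeps the (skeleton, flesh) argument order described here and handles only nested lists of atomic vectors and matrices.

```r
# Hypothetical helper: walk the skeleton, filling each leaf in order
# with the next values taken from 'flesh'.
relist_sketch <- function(skeleton, flesh) {
  fill <- function(skel, pos) {
    if (is.list(skel)) {
      for (i in seq_along(skel)) {
        res <- fill(skel[[i]], pos)
        skel[[i]] <- res$value
        pos <- res$pos
      }
      list(value = skel, pos = pos)
    } else {
      n <- length(skel)
      # skel[] <- ... keeps dim and other attributes of the leaf.
      skel[] <- unname(flesh[seq.int(pos, length.out = n)])
      list(value = skel, pos = pos + n)
    }
  }
  fill(skeleton, 1L)$value
}

skeleton <- list(mean = c(0, 1), vcov = cbind(c(1, 1), c(1, 0)))

# First equality: relisting the unlisted object recovers it.
round_trip <- relist_sketch(skeleton, unlist(skeleton))

# Second equality: unlisting the relisted vector recovers the vector.
y <- c(5, 6, 7, 8, 9, 10)
back_to_vector <- unname(unlist(relist_sketch(skeleton, y)))
```

The matrix leaf is filled in column-major order, which is the order in which unlist() flattens it.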

Is there any easy way to do this without my new relist function?  Is there any
interest in including this in R's base package?  (Or anywhere else?)  Any
comments on the implementation?  

Cheers,
Andrew



Re: [Rd] symbollic differentiation in R

2007-05-13 Thread Gabor Grothendieck
On 5/13/07, Andrew Clausen wrote:
> Hi all,
>
> I wrote a symbolic differentiation function in R, which can be downloaded
> here:
>
>http://www.econ.upenn.edu/~clausen/computing/Deriv.R
>http://www.econ.upenn.edu/~clausen/computing/Simplify.R
>
> It is just a prototype.  Of course, R already contains two differentiation
> functions: D and deriv.  However, these functions have several limitations.
> They can probably be fixed, but since they are written in C, this would
> require a lot of work.  Limitations include:
>  * The derivatives table can't be modified at runtime, and is only available
> in C.
>  * The output of "deriv" can not be differentiated again.

Try this:

> D(D(quote(x^3), "x"), "x")
3 * (2 * x)

>  * Neither function can substitute function calls.  eg:
>   f <- function(x, y) x + y; deriv(f(x, x^2), "x")

Try Ryacas package:

> library(Ryacas)
> x <- Sym("x")
> f <- function(x)x^2
> deriv(f(x^3))
expression(6 * x^5)

>  * They can't differentiate vector-valued functions (although my code also
> can't do this yet)

> library(Ryacas)
> x <- Sym("x")
> deriv(List(x, x^2))
expression(list(1, 2 * x))


>
> I think these limitations are fairly important.  As it stands, it's rather
> difficult to automatically differentiate a likelihood function.  Ideally, I
> would like to be able to write
>
>ll <- function(mean, sd)
>-sum(log(dnorm(x, mean, sd)))
>
>ll.deriv <- Deriv.function(ll)
>
> I can't get this to work with my code since:
>  * sum can't add a list of vectors (although I could easily write a sum
> replacement.)
>  * "x" is assumed to be a scalar in this context.  I'm not sure if there's a
> good way to generalize.
>
> The above code would work right now if there were one parameter (so
> sum doesn't screw it up) and one scalar data point "x".
>
> Is there an existing way of doing this that is close to being this convenient?
> Is it really much easier to solve the limitations I listed with a fresh
> R implementation?
>
> Cheers,
> Andrew
>



Re: [Rd] relist, an inverse operator to unlist

2007-05-13 Thread Andrew Clausen
On Sun, May 13, 2007 at 01:29:11PM -0400, Andrew Clausen wrote:
> R has a function to convert complex objects into a vector
> representation.  This file provides an inverse operation called "unlist" to
> convert vectors back to the convenient structural representation.

Oops.  I meant to say:

R has a function to convert complex objects into a vector representation called
"unlist".  This file provides an inverse operation called "relist" to convert
vectors back to the convenient structural representation.



[Rd] Help understanding LAPACK symbol resolution

2007-05-13 Thread Martin Morgan
R developers,

I am trying to understand how symbols are resolved, so that I can
configure a package that I contributed to, and so that I can provide
guidance to (linux / OSX) users of the package. To be concrete, my
package uses the LAPACK Fortran symbol zsysv. This is not in
libRlapack, but is defined on my system in the library
/usr/lib64/liblapack.so.

* I suspect that the reason the symbol is not in libRlapack is just
  one of economy, i.e., no use for the symbol in R routines, rather
  than for other nefarious reasons (?? some fundamental incompatibility
  with R?)

I guess that most of my package users will have an R built without
special attention to their lapack library, so will start with
something like

[EMAIL PROTECTED]:~> R CMD config LAPACK_LIBS
-L/home/mtmorgan/arch/x86_64/R-devel/lib -lRlapack

My R is built with --enable-R-shlib, so predictably enough

R CMD INSTALL --clean 

is 'successful' (zsysv_ is marked as unresolved in the .so, but
this doesn't stop compiling and linking). Also predictably enough,
loading the package in R indicates 'undefined symbol: zsysv_'. Inside
R, LD_LIBRARY_PATH starts with the R_HOME/lib, and includes /usr/lib64,
so I surmise that the libraries defined at compile / link are the ones
where symbols are searched (rather than all libraries in
LD_LIBRARY_PATH).
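
A few checks that can be run from inside R to see what the session itself reports, sketched with base R functions only. The results are system-specific, so no expected output is shown. Note that is.loaded() only looks at DLLs already loaded into the session, so FALSE does not by itself mean the symbol is missing from every library on the search path.

```r
# Version of the LAPACK library the running R is using.
lapack_version <- La_version()

# Is the Fortran symbol visible in any DLL loaded so far?
zsysv_loaded <- is.loaded("zsysv", type = "Fortran")

# The library search path as seen by the R process.
ld_path <- strsplit(Sys.getenv("LD_LIBRARY_PATH"), ":", fixed = TRUE)[[1]]

lapack_version
zsysv_loaded
ld_path
```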

To allow the user to provide a specific LAPACK, I added lines to a
configure.in file that allow for a --with-lapack

LAPACK_LIBS=`"${R_HOME}/bin/R" CMD config LAPACK_LIBS`
AC_ARG_WITH([lapack],
AC_HELP_STRING([--with-lapack=LIB_PATH],
[LAPACK library location with complex routines]),
[LAPACK_LIBS=$withval])

added a check to see that zsysv_ is actually available

AC_CHECK_FUNC(zsysv_,,
AC_MSG_ERROR([lapack needs zsysv_ in ${LAPACK_LIBS}]))

and substituted LAPACK_LIBS into a Makevars.in file

AC_SUBST(LAPACK_LIBS)
AC_OUTPUT(src/Makevars)

Makevars.in:
[EMAIL PROTECTED]@

I then install my package with

R CMD INSTALL --clean --configure-args=--with-lapack=-llapack 

or more generally

R CMD INSTALL --clean \
   --configure-args="--with-lapack='-L/usr/lib64 -llapack'" 

This 'works', in the sense that the package compiles, loads, and
apparently runs as expected. I'm concerned though about how lapack is
being found, and how symbols are actually being resolved.

When I

[EMAIL PROTECTED]:~> ldd .so

I see an entry

liblapack.so.3 => /usr/lib64/liblapack.so.3 (0x2b0928a1c000)

and I do NOT see an entry pointing to libRlapack. Am I right in
interpreting this to mean:

* All LAPACK symbols in my package, including those that
  coincidentally have a definition in libRlapack, resolve to
  /usr/lib64/liblapack.so?

* liblapack.so will be found without any need to specify
  LD_LIBRARY_PATH, or other configuration variables? Or is the library
  being found because my LD_LIBRARY_PATH already includes /usr/lib64?
  If the latter, how can the user 'best' configure their system to
  find the required library (I think I'm looking for something between
  'get the system administrator to install lapack in a findable place'
  and 'set LD_LIBRARY_PATH before starting R').

* Resolving symbols to libraries will occur in a way consistent with
  the last two points (as opposed to the implementation details)
  across platforms, compilers, and static vs. shared libraries?

Thanks for any reassurance or corrective guidance.

Martin
-- 
Martin Morgan
Bioconductor / Computational Biology
http://bioconductor.org



Re: [Rd] Help understanding LAPACK symbol resolution

2007-05-13 Thread Prof Brian Ripley
On Sun, 13 May 2007, Martin Morgan wrote:

> R developers,
>
> I am trying to understand how symbols are resolved, so that I can
> configure a package that I contributed to, and so that I can provide
> guidance to (linux / OSX) users of the package. To be concrete, my
> package uses the LAPACK Fortran symbol zsysv. This is not in
> libRlapack, but is defined on my system in the library
> /usr/lib64/liblapack.so.
>
> * I suspect that the reason the symbol is not in libRlapack is just
>  one of economy, i.e., no use for the symbol in R routines, rather
>  than for other nefarious reasons (?? some fundamental incompatibility
>  with R?)

Space saving.  'Writing R Extensions' covers this.

> I guess that most of my package users will have an R built without
> special attention to their lapack library, so will start with
> something like
>
> [EMAIL PROTECTED]:~> R CMD config LAPACK_LIBS
> -L/home/mtmorgan/arch/x86_64/R-devel/lib -lRlapack
>
> My R is built with --enable-R-shlib, so predictably enough
>
> R CMD INSTALL --clean 
>
> is 'successful' (zsysv_ is marked as unresolved in the .so, but
> this doesn't stop compiling and linking). Also predictably enough,
> loading the package in R indicates 'undefined symbol: zsysv_'. Inside
> R, LD_LIBRARY_PATH starts with the R_HOME/lib, and includes /usr/lib64,
> so I surmise that the libraries defined at compile / link are the ones
> where symbols are searched (rather than all libraries in
> LD_LIBRARY_PATH).

Not the way R is usually built.  Library dirs specified by -L during 
configure are added to R_LIBRARY_PATH, but not those specified by the 
environment LD_LIBRARY_PATH at build time.  Most loaders have a -R/-rpath 
option, but R does not (by default) make use of it.  (I personally think 
it should: ELF originates on Solaris and that makes very effective use of 
-R.)

At run time ld.so searches its cache as well as LD_LIBRARY_PATH.  The 
order is system-specific: Linux says

o (ELF only) Using the DT_RPATH dynamic section attribute  of  the
  binary  if present and DT_RUNPATH attribute does not exist.  Use
  of DT_RPATH is deprecated.

o Using the environment variable LD_LIBRARY_PATH.  Except  if  the
  executable  is  a set-user-ID/set-group-ID binary, in which case
  it is ignored.

o (ELF only) Using the DT_RUNPATH dynamic section attribute of the
  binary if present.

o From  the  cache file /etc/ld.so.cache which contains a compiled
  list of candidate libraries previously found  in  the  augmented
  library  path.  If, however, the binary was linked with -z nodeflib
  linker option, libraries in the default library  paths  are
  skipped.

o In  the default path /lib, and then /usr/lib.  If the binary was
  linked with -z nodeflib linker option, this step is skipped.

(and for a 64-bit system, read lib64 for lib).
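
One way to see what the running R process will offer the loader is to look from inside R.  This is only a sketch under assumptions: it uses base R calls (Sys.getenv, getLoadedDLLs, is.loaded) on a Unix-alike where LD_LIBRARY_PATH is the relevant variable, and the symbol name zsysv is simply the one from the question.  Whether it is reported as loaded depends entirely on the local build and on which packages are attached.

```r
# Directories in the LD_LIBRARY_PATH seen by this R process (may be empty).
ld_path <- Sys.getenv("LD_LIBRARY_PATH")
ld_dirs <- strsplit(ld_path, .Platform$path.sep, fixed = TRUE)[[1]]
print(ld_dirs)

# Shared objects R has loaded so far, with the paths they were found at.
dlls <- getLoadedDLLs()
dll_paths <- vapply(dlls, function(d) d[["path"]], character(1))
print(dll_paths)

# TRUE only if some currently loaded shared object exports the Fortran symbol.
has_zsysv <- is.loaded("zsysv", type = "Fortran")
print(has_zsysv)
```

This does not replay the ld.so search order quoted above; it only reports the inputs (the path variable) and the outcome (which files were actually mapped).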


> To allow the user to provide a specific LAPACK, I added lines to a
> configure.in file that allow for a --with-lapack
>
> LAPACK_LIBS=`"${R_HOME}/bin/R" CMD config LAPACK_LIBS`
> AC_ARG_WITH([lapack],
>   AC_HELP_STRING([--with-lapack=LIB_PATH],
>   [LAPACK library location with complex routines]),
>   [LAPACK_LIBS=$withval])
>
> added a check to see that zsysv_ is actually available
>
> AC_CHECK_FUNC(zsysv_,,
>   AC_MSG_ERROR([lapack needs zsysv_ in ${LAPACK_LIBS}]))
>
> and substituted LAPACK_LIBS into a Makevars.in file
>
> AC_SUBST(LAPACK_LIBS)
> AC_OUTPUT(src/Makevars)
>
> Makevars.in:
> [EMAIL PROTECTED]@
>
> I then install my package with
>
> R CMD INSTALL --clean --configure-args=--with-lapack=-llapack 
>
> or more generally
>
> R CMD INSTALL --clean \
>   --configure-args="--with-lapack='-L/usr/lib64 -llapack'" 
>
> This 'works', in the sense that the package compiles, loads, and
> apparently runs as expected. I'm concerned though about how lapack is
> being found, and how symbols are actually being resolved.
>
> When I
>
> [EMAIL PROTECTED]:~> ldd .so
>
> I see an entry
>
>liblapack.so.3 => /usr/lib64/liblapack.so.3 (0x2b0928a1c000)
>
> and I do NOT see an entry pointing to libRlapack. Am I right in
> interpreting this to mean:
>
> * All LAPACK symbols in my package, including those that
>  coincidentally have a definition in libRlapack, resolve to
>  /usr/lib64/liblapack.so?

Yes.  libRlapack.so will not be in the search path.

> * liblapack.so will be found without any need to specify
>  LD_LIBRARY_PATH, or other configuration variables? Or is the library
>  being found because my LD_LIBRARY_PATH already includes /usr/lib64?

Both ld (used for linking) and ld.so (used at runtime) look in that path by 
default.

>  If the latter, how can the user 'best' configure their system to
>  find the required library (I think I'm looking for something between
>  'get the system administrator to install lapack in a findable place'
>  and 'set LD_LIBRARY_PATH before starting R').

Better to set it in the ld.so cache (via a file in /etc/ld.

Re: [Rd] symbollic differentiation in R

2007-05-13 Thread Gabor Grothendieck
On 5/13/07, Gabor Grothendieck <[EMAIL PROTECTED]> wrote:
> On 5/13/07, Andrew Clausen <[EMAIL PROTECTED]> wrote:
> > Hi all,
> >
> > I wrote a symbolic differentiation function in R, which can be downloaded
> > here:
> >
> >http://www.econ.upenn.edu/~clausen/computing/Deriv.R
> >http://www.econ.upenn.edu/~clausen/computing/Simplify.R
> >
> > It is just a prototype.  Of course, R already contains two differentiation
> > functions: D and deriv.  However, these functions have several limitations.
> > They can probably be fixed, but since they are written in C, this would
> > require a lot of work.  Limitations include:
> >  * The derivatives table can't be modified at runtime, and is only available
> > in C.
> >  * The output of "deriv" can not be differentiated again.
>
> Try this:
>
> > D(D(quote(x^3), "x"), "x")
> 3 * (2 * x)
>
> >  * Neither function can substitute function calls.  eg:
> >   f <- function(x, y) x + y; deriv(f(x, x^2), "x")
>
> Try Ryacas package:

I had omitted one line.  f has to be registered with yacas:

>
> > library(Ryacas)
> > x <- Sym("x")
> > f <- function(x)x^2

yacas(f)

> > deriv(f(x^3))
> expression(6 * x^5)
>
> >  * They can't differentiate vector-valued functions (although my code also
> > can't do this yet)
>
> > library(Ryacas)
> > x <- Sym("x")
> > deriv(List(x, x^2))
> expression(list(1, 2 * x))
>
>
> >
> > I think these limitations are fairly important.  As it stands, it's rather
> > difficult to automatically differentiate a likelihood function.  Ideally, I
> > would like to be able to write
> >
> >ll <- function(mean, sd)
> >-sum(log(dnorm(x, mean, sd)))
> >
> >ll.deriv <- Deriv.function(ll)
> >
> > > I can't get this to work with my code since:
> > >  * sum can't add a list of vectors (although I could easily write a sum
> > > replacement.)
> > >  * "x" is assumed to be a scalar in this context.  I'm not sure if there's a
> > > good way to generalize.
> >
> > The above code would work right now if there were one parameter (so
> > sum doesn't screw it up) and one scalar data point "x".
> >
> > Is there an existing way of doing this that is close to being this 
> > convenient?
> > Is it really much easier to solve the limitations I listed with a fresh
> > R implementation?
> >
> > Cheers,
> > Andrew
> >
> > __
> > R-devel@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
> >
>
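
For the one-data-point case mentioned in the quoted message, base R's deriv() already gets fairly close.  The sketch below is an illustration rather than anything from the thread: the negative log-likelihood is written out by hand (deriv's derivative table does not include dnorm with a log argument), and the name nll is made up for the example.

```r
# Negative log-likelihood of one scalar observation x under N(mean, sd),
# written out explicitly so that deriv() can differentiate it.
nll <- deriv(~ log(sd) + 0.5 * log(2 * pi) + (x - mean)^2 / (2 * sd^2),
             c("mean", "sd"),
             function.arg = c("x", "mean", "sd"))

val <- nll(x = 1, mean = 0, sd = 2)
grad <- attr(val, "gradient")   # 1 x 2 matrix with columns "mean" and "sd"
print(grad)
```

By hand, the partial derivatives are -(x - mean)/sd^2 and 1/sd - (x - mean)^2/sd^3, which gives -0.25 and 0.375 at the point used here.  Summing over a data vector still has to be done outside deriv().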


Re: [Rd] relist, an inverse operator to unlist

2007-05-13 Thread Gabor Grothendieck
I suggest you define a "relist" class and then define an unlist
method for it which stores the skeleton as an attribute.  Then
one would not have to specify skeleton in the relist command
so

relist(unlist(relist(x))) === x

1. relist(x) is the same as x except it gets an additional class "relist".
2. unlist(relist(x)) invokes the relist method of unlist on relist(x)
returning another relist object
3. relist(unlist(relist(x))) then recreates relist(x)


On 5/13/07, Andrew Clausen <[EMAIL PROTECTED]> wrote:
> Hi all,
>
> I wrote a function called relist, which is an inverse to the existing
> unlist function:
>
>http://www.econ.upenn.edu/~clausen/computing/relist.R
>
> Some functions need many parameters, which are most easily represented in
> complex structures.  Unfortunately, many mathematical functions in R,
> including optim, nlm, and grad can only operate on functions whose domain is
> a vector.  R has a function, "unlist", to convert complex objects into a vector
> representation.  This file provides an inverse operation called "relist" to
> convert vectors back to the convenient structural representation.  Together,
> these functions allow structured functions to have simple mathematical
> interfaces.
>
> For example, a likelihood function for a multivariate normal model needs a
> variance-covariance matrix and a mean vector.  It would be most convenient to
> represent it as a list containing a vector and a matrix.  A typical parameter
> might look like
>
>list(mean=c(0, 1), vcov=cbind(c(1, 1), c(1, 0)))
>
> However, optim can't operate on functions that take lists as input; it
> only likes vectors.  The solution is conversion:
>
> initial.param <- list(mean=c(0, 1), vcov=cbind(c(1, 1), c(1, 0)))
>
> ll <- function(param.vector)
> {
>param <- relist(initial.param, param.vector)
>-sum(dnorm(x, mean=param$mean, vcov=param$vcov, log=TRUE))
># note: dnorm doesn't do vcov... but I hope you get the point
> }
>
> optim(unlist(initial.param), ll)
>
> "relist" takes two parameters: skeleton and flesh.  Skeleton is a sample
> object that has the right "shape" but the wrong content.  "flesh" is a vector
> with the right content but the wrong shape.  Invoking
>
>relist(skeleton, flesh)
>
> will put the content of flesh on the skeleton.
>
> As long as "skeleton" has the right shape, it should be a precise inverse
> of unlist.  These equalities hold:
>
>relist(skeleton, unlist(x)) == x
>unlist(relist(skeleton, y)) == y
>
> Is there any easy way to do this without my new relist function?  Is there any
> interest in including this in R's base package?  (Or anywhere else?)  Any
> comments on the implementation?
>
> Cheers,
> Andrew
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>


[Rd] select.list() not on front of other windows

2007-05-13 Thread Henrik Bengtsson
When calling

  select.list(letters[1:3])

in a fresh R session (R v2.4.1, v2.5.0, v2.6.0 devel) on WinXP using
*Rterm*, the dialog does *not* come up in front of other windows the
first time you call it.  Under Rgui it works just fine.

If you do:

 1) select.list(letters[1:3])
 2) bring the window to front manually
 3) select an option and press OK
 4) select.list(letters[1:3])

the (second) dialog comes up in front of all other windows.

/Henrik


Re: [Rd] Help understanding LAPACK symbol resolution

2007-05-13 Thread Martin Morgan
Prof. Ripley,

Thank you for the very helpful guidance and pointer to fastICA. 

Martin

Prof Brian Ripley <[EMAIL PROTECTED]> writes:

> On Sun, 13 May 2007, Martin Morgan wrote:
>
>> R developers,
>>
>> I am trying to understand how symbols are resolved, so that I can
>> configure a package that I contributed to, and so that I can provide
>> guidance to (linux / OSX) users of the package. To be concrete, my
>> package uses the LAPACK Fortran symbol zsysv. This is not in
>> libRlapack, but is defined on my system in the library
>> /usr/lib64/liblapack.so.
>>
>> * I suspect that the reason the symbol is not in libRlapack is just
>>  one of economy, i.e., no use for the symbol in R routines, rather
>>  than for other nefarious reasons (?? some fundamental incompatibility
>>  with R?)
>
> Space saving.  'Writing R Extensions' covers this.
>
>> I guess that most of my package users will have an R built without
>> special attention to their lapack library, so will start with
>> something like
>>
>> [EMAIL PROTECTED]:~> R CMD config LAPACK_LIBS
>> -L/home/mtmorgan/arch/x86_64/R-devel/lib -lRlapack
>>
>> My R is built with --enable-R-shlib, so predictably enough
>>
>> R CMD INSTALL --clean 
>>
>> is 'successful' (zsysv_ is marked as unresolved in the .so, but
>> this doesn't stop compiling and linking). Also predictably enough,
>> loading the package in R indicates 'undefined symbol: zsysv_'. Inside
>> R, LD_LIBRARY_PATH starts with the R_HOME/lib, and includes /usr/lib64,
>> so I surmise that the libraries defined at compile / link are the ones
>> where symbols are searched (rather than all libraries in
>> LD_LIBRARY_PATH).
>
> Not the way R is usually built.  Library dirs specified by -L during
> configure are added to R_LIBRARY_PATH, but not those specified by the
> environment LD_LIBRARY_PATH at build time.  Most loaders have a
> -R/-rpath option, but R does not (by default) make use of it.  (I
> personally think it should: ELF originates on Solaris and that makes
> very effective use of -R.)
>
> At run time ld.so searches its cache as well as LD_LIBRARY_PATH.  The
> order is system-specific: Linux says
>
> o (ELF only) Using the DT_RPATH dynamic section attribute  of  the
>   binary  if present and DT_RUNPATH attribute does not exist.  Use
>   of DT_RPATH is deprecated.
>
> o Using the environment variable LD_LIBRARY_PATH.  Except  if  the
>   executable  is  a set-user-ID/set-group-ID binary, in which case
>   it is ignored.
>
> o (ELF only) Using the DT_RUNPATH dynamic section attribute of the
>   binary if present.
>
> o From  the  cache file /etc/ld.so.cache which contains a compiled
>   list of candidate libraries previously found  in  the  augmented
>   library  path.  If, however, the binary was linked with -z nodeflib
>   linker option, libraries in the default library  paths  are
>   skipped.
>
> o In  the default path /lib, and then /usr/lib.  If the binary was
>   linked with -z nodeflib linker option, this step is skipped.
>
> (and for a 64-bit system, read lib64 for lib).
>
>
>> To allow the user to provide a specific LAPACK, I added lines to a
>> configure.in file that allow for a --with-lapack
>>
>> LAPACK_LIBS=`"${R_HOME}/bin/R" CMD config LAPACK_LIBS`
>> AC_ARG_WITH([lapack],
>>  AC_HELP_STRING([--with-lapack=LIB_PATH],
>>  [LAPACK library location with complex routines]),
>>  [LAPACK_LIBS=$withval])
>>
>> added a check to see that zsysv_ is actually available
>>
>> AC_CHECK_FUNC(zsysv_,,
>>  AC_MSG_ERROR([lapack needs zsysv_ in ${LAPACK_LIBS}]))
>>
>> and substituted LAPACK_LIBS into a Makevars.in file
>>
>> AC_SUBST(LAPACK_LIBS)
>> AC_OUTPUT(src/Makevars)
>>
>> Makevars.in:
>> [EMAIL PROTECTED]@
>>
>> I then install my package with
>>
>> R CMD INSTALL --clean --configure-args=--with-lapack=-llapack 
>>
>> or more generally
>>
>> R CMD INSTALL --clean \
>>   --configure-args="--with-lapack='-L/usr/lib64 -llapack'" 
>>
>> This 'works', in the sense that the package compiles, loads, and
>> apparently runs as expected. I'm concerned though about how lapack is
>> being found, and how symbols are actually being resolved.
>>
>> When I
>>
>> [EMAIL PROTECTED]:~> ldd .so
>>
>> I see an entry
>>
>>liblapack.so.3 => /usr/lib64/liblapack.so.3 (0x2b0928a1c000)
>>
>> and I do NOT see an entry pointing to libRlapack. Am I right in
>> interpreting this to mean:
>>
>> * All LAPACK symbols in my package, including those that
>>  coincidentally have a definition in libRlapack, resolve to
>>  /usr/lib64/liblapack.so?
>
> Yes.  libRlapack.so will not be in the search path.
>
>> * liblapack.so will be found without any need to specify
>>  LD_LIBRARY_PATH, or other configuration variables? Or is the library
>>  being found because my LD_LIBRARY_PATH already includes /usr/lib64?
>
> Both ld (used for linking) and ld.so (used at runtime) look in that
> path by default.
>
>>  If the latter, how can the user 'b

[Rd] Native implementation of rowMedians()

2007-05-13 Thread Henrik Bengtsson
Hi,

I've got a version of rowMedians(x, na.rm=FALSE) for matrices that
handles missing values implemented in C.  It has been optimized for
memory and speed.  To avoid coercing integers to doubles, and hence
allocating an additional 200% memory, there is one C function for
integers and one for doubles.
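
For comparison, the behaviour described above can be written in plain R with apply().  This is a reference sketch only, not the R.native code; the helper name row_medians_ref is made up for the example.

```r
# Reference row medians in plain R: one call to median() per row.
row_medians_ref <- function(x, na.rm = FALSE) {
  apply(x, 1, median, na.rm = na.rm)
}

# Small integer matrix with one missing value (filled column by column).
x <- matrix(c(1L, 4L, NA, 2L, 5L, 8L, 3L, 6L, 9L), nrow = 3)

m_keep <- row_medians_ref(x)                # third row contains NA
m_drop <- row_medians_ref(x, na.rm = TRUE)  # NA dropped before taking the median
print(m_keep)
print(m_drop)
```

The rows are (1, 2, 3), (4, 5, 6) and (NA, 8, 9), so with na.rm = TRUE the medians are 2, 5 and 8.5, and without it the third one is NA.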

The rowMedians() implementation is currently sitting in my non-CRAN
package R.native available by:

source("http://www.braju.com/R/hbLite.R")
hbLite("R.native")
library(R.native)
example(rowMedians)

The source code package is available at:

 http://www.braju.com/R/repos/R.native_0.1.2.tar.gz

Before I submit a package to CRAN consisting of pretty much just
rowMedians(), would it make more sense for it to go into one of the
core packages?  If so, how should I proceed?

/Henrik


Re: [Rd] relist, an inverse operator to unlist

2007-05-13 Thread Andrew Clausen
Hi Gabor,

Thanks for the interesting suggestion.  I must confess I got lost -- is
it something like this?
 * unlist() could attach skeleton to every vector it returns.
 * relist() could then use the skeleton attached to the vector to reconstruct
the object.  The interface might be

relist <- function(flesh, skeleton=attributes(flesh)$skeleton)

For example:

par <- list(mean=c(0, 0), vcov=rbind(c(1, 1), c(1, 1)))
vector.for.optim <- unlist(par)
print(attributes(vector.for.optim)$skeleton)    # the skeleton is stored!
converted.back.again <- relist(vector.for.optim)

Some concerns:
 * the metadata might get lost in some applications -- although it seems
to work fine with optim().  But, if we provide both interfaces (where
skeleton=attributes(flesh)$skeleton is the default), then there should be no problem.
 * would there be any bad side-effects of changing the existing unlist
interface?  I suppose an option like "save.skeleton" could be added to unlist.
I expect there would be some objections to enabling this as default behaviour,
as it would significantly increase the storage requirements of the output.
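
The first concern can be checked directly in base R.  In this sketch the attribute name "skeleton" is just the one proposed above, attached by hand (unlist itself does nothing of the sort), and the variable names are illustrative.

```r
p <- list(mean = c(0, 0), vcov = rbind(c(1, 1), c(1, 1)))
v <- unlist(p)
attr(v, "skeleton") <- p   # hypothetical attribute, attached manually

# Arithmetic copies attributes, but subsetting and c() keep only the names.
kept_by_arith  <- !is.null(attr(v * 2, "skeleton"))
kept_by_subset <- !is.null(attr(v[1:3], "skeleton"))
kept_by_c      <- !is.null(attr(c(v), "skeleton"))
print(c(arith = kept_by_arith, subset = kept_by_subset, c = kept_by_c))
```

So whether the skeleton survives depends on what the consumer does to the vector, which is an argument for keeping an explicit skeleton argument as well.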

Cheers,
Andrew

On Sun, May 13, 2007 at 07:02:37PM -0400, Gabor Grothendieck wrote:
> I suggest you define a "relist" class and then define an unlist
> method for it which stores the skeleton as an attribute.  Then
> one would not have to specify skeleton in the relist command
> so
> 
> relist(unlist(relist(x))) === x
> 
> 1. relist(x) is the same as x except it gets an additional class "relist".
> 2. unlist(relist(x)) invokes the relist method of unlist on relist(x)
> returning another relist object
> 3. relist(unlist(relist(x))) then recreates relist(x)
> 
> 
> On 5/13/07, Andrew Clausen <[EMAIL PROTECTED]> wrote:
> >Hi all,
> >
> >I wrote a function called relist, which is an inverse to the existing
> >unlist function:
> >
> >   http://www.econ.upenn.edu/~clausen/computing/relist.R
> >
> >Some functions need many parameters, which are most easily represented in
> >complex structures.  Unfortunately, many mathematical functions in R,
> >including optim, nlm, and grad can only operate on functions whose domain is
> >a vector.  R has a function, "unlist", to convert complex objects into a vector
> >representation.  This file provides an inverse operation called "relist" to
> >convert vectors back to the convenient structural representation.  Together,
> >these functions allow structured functions to have simple mathematical
> >interfaces.
> >
> >For example, a likelihood function for a multivariate normal model needs a
> >variance-covariance matrix and a mean vector.  It would be most convenient to
> >represent it as a list containing a vector and a matrix.  A typical parameter
> >might look like
> >
> >   list(mean=c(0, 1), vcov=cbind(c(1, 1), c(1, 0)))
> >
> >However, optim can't operate on functions that take lists as input; it
> >only likes vectors.  The solution is conversion:
> >
> >initial.param <- list(mean=c(0, 1), vcov=cbind(c(1, 1), c(1, 0)))
> >
> >ll <- function(param.vector)
> >{
> >   param <- relist(initial.param, param.vector)
> >   -sum(dnorm(x, mean=param$mean, vcov=param$vcov, log=TRUE))
> >   # note: dnorm doesn't do vcov... but I hope you get the point
> >}
> >
> >optim(unlist(initial.param), ll)
> >
> >"relist" takes two parameters: skeleton and flesh.  Skeleton is a sample
> >object that has the right "shape" but the wrong content.  "flesh" is a vector
> >with the right content but the wrong shape.  Invoking
> >
> >   relist(skeleton, flesh)
> >
> >will put the content of flesh on the skeleton.
> >
> >As long as "skeleton" has the right shape, it should be a precise inverse
> >of unlist.  These equalities hold:
> >
> >   relist(skeleton, unlist(x)) == x
> >   unlist(relist(skeleton, y)) == y
> >
> >Is there any easy way to do this without my new relist function?  Is there any
> >interest in including this in R's base package?  (Or anywhere else?)  Any
> >comments on the implementation?
> >
> >Cheers,
> >Andrew
> >
> >__
> >R-devel@r-project.org mailing list
> >https://stat.ethz.ch/mailman/listinfo/r-devel
> >


Re: [Rd] relist, an inverse operator to unlist

2007-05-13 Thread Gabor Grothendieck
unlist would not attach a skeleton to every vector it returns, only
the relist method of unlist would.   That way just that method needs
to be added and no changes to unlist itself are needed.

Before applying unlist to an object you would coerce the object to
class "relist" to force the relist method of unlist to be invoked.

Here is an outline of the code:

as.relist <- function(x) {
   if (!inherits(x, "relist")) class(x) <- c("relist", class(x))
   x
}

unlist.relist <- function(x, ...) {
   y <- x
   cl <- class(y)
   class(y) <- cl[- grep("relist", cl)]
   z <- unlist(y)
   attr(z, "relist") <- y
   as.relist(z)
}

relist <- function(x, skeleton = attr(x, "relist")) {
   # simpler version of relist so test can be executed
   skeleton
}

# test
x <- list(a = 1:2, b = 3)
class(as.relist(x))
unlist(as.relist(x))
relist(unlist(as.relist(x)))
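
The relist() in the outline above is only a stub that returns the stored skeleton.  A version that actually pours new values into the skeleton could look roughly like the following.  It is a sketch, assuming a flat (non-nested) list of atomic vectors and matrices; the name relist_sketch and its argument order are made up here and are not taken from relist.R.

```r
# Fill a flat skeleton list with the values in 'flesh', piece by piece.
relist_sketch <- function(flesh, skeleton) {
  out <- skeleton
  pos <- 0L
  for (i in seq_along(skeleton)) {
    n <- length(skeleton[[i]])
    piece <- flesh[pos + seq_len(n)]
    # Copy dim, names etc. from the skeleton element (drops flesh's own names).
    attributes(piece) <- attributes(skeleton[[i]])
    out[[i]] <- piece
    pos <- pos + n
  }
  out
}

skeleton <- list(mean = c(0, 1), vcov = cbind(c(1, 1), c(1, 0)))
flesh <- unname(unlist(skeleton)) + 10   # same shape, different content
param <- relist_sketch(flesh, skeleton)
print(param)
```

With this, unlist(relist_sketch(flesh, skeleton)) gives back the values of flesh, which is the second of the two equalities in the original proposal.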


On 5/14/07, Andrew Clausen <[EMAIL PROTECTED]> wrote:
> Hi Gabor,
>
> Thanks for the interesting suggestion.  I must confess I got lost -- is
> it something like this?
>  * unlist() could attach skeleton to every vector it returns.
>  * relist() could then use the skeleton attached to the vector to reconstruct
> the object.  The interface might be
>
>relist <- function(flesh, skeleton=attributes(flesh)$skeleton)
>
> For example:
>
>par <- list(mean=c(0, 0), vcov=rbind(c(1, 1), c(1, 1)))
>vector.for.optim <- unlist(par)
>print(attributes(vector.for.optim)$skeleton)    # the skeleton is stored!
>converted.back.again <- relist(vector.for.optim)
>
> Some concerns:
>  * the metadata might get lost in some applications -- although it seems
> to work fine with optim().  But, if we provide both interfaces (where
> skeleton=attributes(flesh)$skeleton is the default), then there should be no problem.
>  * would there be any bad side-effects of changing the existing unlist
> interface?  I suppose an option like "save.skeleton" could be added to unlist.
> I expect there would be some objections to enabling this as default behaviour,
> as it would significantly increase the storage requirements of the output.
>
> Cheers,
> Andrew
>
> On Sun, May 13, 2007 at 07:02:37PM -0400, Gabor Grothendieck wrote:
> > I suggest you define a "relist" class and then define an unlist
> > method for it which stores the skeleton as an attribute.  Then
> > one would not have to specify skeleton in the relist command
> > so
> >
> > relist(unlist(relist(x))) === x
> >
> > 1. relist(x) is the same as x except it gets an additional class "relist".
> > 2. unlist(relist(x)) invokes the relist method of unlist on relist(x)
> > returning another relist object
> > 3. relist(unlist(relist(x))) then recreates relist(x)
> >
> >
> > On 5/13/07, Andrew Clausen <[EMAIL PROTECTED]> wrote:
> > >Hi all,
> > >
> > >I wrote a function called relist, which is an inverse to the existing
> > >unlist function:
> > >
> > >   http://www.econ.upenn.edu/~clausen/computing/relist.R
> > >
> > >Some functions need many parameters, which are most easily represented in
> > >complex structures.  Unfortunately, many mathematical functions in R,
> > >including optim, nlm, and grad can only operate on functions whose domain is
> > >a vector.  R has a function, "unlist", to convert complex objects into a vector
> > >representation.  This file provides an inverse operation called "relist" to
> > >convert vectors back to the convenient structural representation.  Together,
> > >these functions allow structured functions to have simple mathematical
> > >interfaces.
> > >
> > >For example, a likelihood function for a multivariate normal model needs a
> > >variance-covariance matrix and a mean vector.  It would be most convenient to
> > >represent it as a list containing a vector and a matrix.  A typical parameter
> > >might look like
> > >
> > >   list(mean=c(0, 1), vcov=cbind(c(1, 1), c(1, 0)))
> > >
> > >However, optim can't operate on functions that take lists as input; it
> > >only likes vectors.  The solution is conversion:
> > >
> > >initial.param <- list(mean=c(0, 1), vcov=cbind(c(1, 1), c(1, 0)))
> > >
> > >ll <- function(param.vector)
> > >{
> > >   param <- relist(initial.param, param.vector)
> > >   -sum(dnorm(x, mean=param$mean, vcov=param$vcov, log=TRUE))
> > >   # note: dnorm doesn't do vcov... but I hope you get the point
> > >}
> > >
> > >optim(unlist(initial.param), ll)
> > >
> > >"relist" takes two parameters: skeleton and flesh.  Skeleton is a sample
> > >object that has the right "shape" but the wrong content.  "flesh" is a vector
> > >with the right content but the wrong shape.  Invoking
> > >
> > >   relist(skeleton, flesh)
> > >
> > >will put the content of flesh on the skeleton.
> > >
> > >As long as "skeleton" has the right shape, it should be a precise inverse
> > >of unlist.  These equalities hold:
> > >
> > >   relist(skeleton, unlist(x)) == x
> > >   unlist(relist(skeleto