Re: [Rd] environment question

2010-12-27 Thread Duncan Murdoch

On 10-12-26 4:30 PM, Paul Johnson wrote:
> Hello, everybody.
>
> I'm putting together some lecture notes and course exercises on R
> programming.  My plan is to pick some R packages, ask students to read
> through code and see why things work, maybe make some changes.  As I
> look for examples, I'm running up against the problem that packages
> use coding idioms that are unfamiliar to me.
>
> A difficult thing for me is explaining scope of variables in R
> functions.  When should we pass an object to a function, when should
> we let the R system search about for an object?  I've been puzzling
> through ?environment for quite a while.

Take a look at the Language Definition, not just the ?environment page.

>
> Here's an example from one of the packages that I like, called "ltm".
> In the function "ltm.fit" the work of calculating estimates is sent to
> different functions like "EM" and "loglikltm" and "scoreltm".  Before
> that, this is used:
>
> environment(EM)<- environment(loglikltm)<- environment(scoreltm)<-
> environment()
>
> ##and then EM is called
> res.EM<- EM(betas, constraint, control$iter.em, control$verbose)
>
> I want to make sure I understand this. The environment line gets the
> current environment and then assigns it for those 3 functions, right?
> All variables and functions that can be accessed from the current
> position in the code become available to function EM, loglikltm,
> scoreltm.

That's one way to think of it, but it is slightly more accurate to say 
that three new functions are created, whose associated environments are 
set to the current environment.
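A minimal standalone sketch of that point (names are mine, not from ltm): re-assigning a function's environment via the replacement form actually builds a new local copy of the function, leaving the original untouched.

```r
f <- function() x + 1    # defined in the global environment; no 'x' visible there

g <- function() {
  x <- 10
  ## The replacement call creates a *local copy* of f whose environment
  ## is g's evaluation frame, so this copy can see g's 'x'.
  environment(f) <- environment()
  f()
}

g()   # 11
## The global f is unchanged and still cannot find 'x'.
```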


>
> So, which options should be explicitly inserted into a function call,
> which should be left in the environment for R to find when it needs
> them?

That's a matter of style.  I would say that it is usually better style 
not to mess around with a function's environment.


>
> 1. I *think* that when EM is called, the variables "betas",
> "constraint", and "control" are already in the environment.

That need not be true, as long as they are in the environment by the 
time EM, loglikltm, scoreltm are called.


>
> The EM function is declared like this, using the same words "beta" and
> "constraint"
>
> EM<-
> function (betas, constraint, iter, verbose = FALSE) {
>
> It seems to me that if I wrote the function call like this (leave out
> "betas" and "constraint")
>
> res.EM<- EM(control$iter.em, control$verbose)
>
> R will run EM and go find "betas" and "constraint" in the environment,
> there was no need to name them as arguments.

Including them as arguments means that new local copies will be created 
in the evaluation frame.
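A tiny illustration of that call-by-value behaviour (again not code from ltm): assigning to an argument only changes the local copy in the function's evaluation frame.

```r
betas <- 1
f <- function(betas) {
  betas <- betas + 1   # modifies only the local copy in f's frame
  betas
}
f(betas)   # 2
betas      # still 1: the caller's object is unchanged
```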


>
>
> 2 Is a function like EM allowed to alter objects that it finds through
> the environment, ones that are not passed as arguments? I understand
> that a function cannot alter an object that is passed explicitly, but
> what about the ones it grabs from the environment?

Yes it's allowed, but the usual rules of assignment won't do it.  Read 
about the <<- operator for modifying things that are not local.  In summary:


 beta <- 1

creates or modifies a local variable, while

 beta <<- 1

goes looking for beta, and modifies the first one it finds.  If it fails 
to find one, it creates one in the global environment.
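For instance (a standalone sketch, not code from ltm):

```r
counter <- 0
bump <- function() {
  tmp <- 1                  # '<-'  creates a variable local to bump()
  counter <<- counter + 1   # '<<-' finds and modifies the global 'counter'
  invisible(NULL)
}
bump(); bump()
counter          # 2
exists("tmp")    # FALSE: the local vanished with the evaluation frame
```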


Duncan Murdoch

> If you have ideas about packages that might be handy teaching
> examples, please let me know.
>
> pj

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] minor problem in strsplit help file

2010-12-27 Thread Martin Maechler
> "PatB" == Patrick Burns 
> on Fri, 24 Dec 2010 10:27:23 + writes:

PatB> The 'extended' argument to 'strsplit' has been
PatB> removed, but it is still mentioned in the argument
PatB> items in the help file for 'fixed' and 'perl'.

Indeed; thank you Pat!
I've committed a fix.

Martin

PatB> -- Patrick Burns pbu...@pburns.seanet.com twitter:
PatB> @portfolioprobe http://www.portfolioprobe.com/blog
PatB> http://www.burns-stat.com (home of 'Some hints for the
PatB> R beginner' and 'The R Inferno')



[Rd] aperm() should retain class of input object

2010-12-27 Thread Michael Friendly
aperm() was designed for multidimensional arrays, but is also useful for
table objects, particularly with the lattice, vcd and vcdExtra packages.
However, aperm() was designed and implemented before other related object
classes were conceived, and I propose a small tune-up to make it more
generally useful.

The problem is that aperm() always returns an object of class 'array',
which causes problems for methods designed for table objects.  It also
requires some package writers to implement both .array and .table methods
for the same functionality, usually one in terms of the other.  Some
examples of unexpected, and initially perplexing, results (when only
methods for one class are implemented) are shown below.


> library(vcd)
> pairs(UCBAdmissions, shade=TRUE)
> UCB <- aperm(UCBAdmissions, c(2, 1, 3))
>
> # UCB is now an array, not a table
> pairs(UCB, shade=TRUE)
There were 50 or more warnings (use warnings() to see the first 50)
>
> # fix it, to get pairs.table
> class(UCB) <- "table"
> pairs(UCB, shade=TRUE)
>



Of course, I can define a new function, tperm(), that does what I think
should be the expected behavior:

# aperm, for table objects
tperm <- function(a, perm, resize = TRUE) {
    result <- aperm(a, perm, resize)
    class(result) <- class(a)
    result
}

But I think it is more natural to include this functionality in aperm()
itself.  Thus, I propose the following revision of base::aperm(), at the
R level:

aperm <- function (a, perm, resize = TRUE, keep.class = TRUE)
{
    if (missing(perm))
        perm <- integer(0L)
    result <- .Internal(aperm(a, perm, resize))
    if (keep.class) class(result) <- class(a)
    result
}


I don't think this would break any existing code, except where someone
depended on coercion to an array.  The drop-in replacement for aperm()
would set keep.class=FALSE by default, but I think TRUE is more natural.

FWIW, here are the methods for table and array objects
from my current (non-representative) session.

> methods(class="table")
 [1] as.data.frame.table barchart.table*     cloud.table*        contourplot.table*  dotplot.table*
 [6] head.table*         levelplot.table*    pairs.table*        plot.table*         print.table
[11] summary.table       tail.table*

   Non-visible functions are asterisked
>
> methods(class="array")
[1] anyDuplicated.array as.data.frame.array as.raster.array*    barchart.array*     contourplot.array*  dotplot.array*
[7] duplicated.array    levelplot.array*    unique.array

--
Michael Friendly Email: friendly AT yorku DOT ca
Professor, Psychology Dept.
York University  Voice: 416 736-5115 x66249 Fax: 416 736-5814
4700 Keele StreetWeb:http://www.datavis.ca
Toronto, ONT  M3J 1P3 CANADA



Re: [Rd] RFC: sapply() limitation from vector to matrix, but not further

2010-12-27 Thread Martin Maechler
Finally finding time to come back to this.
Remember that I started this thread by proposing a version of sapply()
which does not just stop at turning the lapply() result into a matrix(), but
instead --- only when the new argument ARRAY = TRUE is set ---
may return an array() of any (appropriate) order, in those cases where
the lapply() result elements are all arrays of the same dim().

On Wed, Dec 1, 2010 at 19:51, Hadley Wickham  wrote:
>> A downside of that approach is that lapply(X,...) can
>> cause a lot of unneeded memory to be allocated (length(X)
>> SEXP's).  Those SEXP's would be tossed out by simplify() but
>> the peak memory usage would remain high.  sapply() can
>> be written to avoid the intermediate list structure.
>
> But the upside is reusable code that can be used in multiple places -
> what about the simplification code used by mapply and tapply? Why are
> there three different implementations of simplification?
>
> Hadley

I have now looked into using a version of what Hadley had proposed.
Note (to Bill's point) that the current implementation of sapply()
does go via lapply() and
that we have  vapply()  as a faster version of sapply()  with less
copying (hopefully).

Very unfortunately, vapply() -- which was only created 13 months ago --
has inherited the ``illogical'' behavior of sapply(),
in that it does not build a higher-rank array when each element
is already (say) a matrix.
...
Consequently, we also need a patch to vapply(),
and I do wonder if we should not make "ARRAY=TRUE" the default there,
since with vapply() you specify a result value, and if you specify a
matrix, the total result should stack these matrices into an array of
rank 3, etc.
Looking at it, the patch is not so much work... notably if we don't
use a new argument but really let  FUN.VALUE determine what the result
should look like.
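To make the intended semantics concrete, here is a sketch of what a patched vapply() would do when FUN.VALUE is itself a matrix (this is the proposed behaviour, not necessarily what R 2.12.x does):

```r
## FUN returns a 2 x 2 matrix for each element, and FUN.VALUE declares
## that shape, so the results should stack into a 2 x 2 x 3 array:
res <- vapply(1:3, function(i) diag(2) * i, FUN.VALUE = matrix(0, 2, 2))
dim(res)     # c(2, 2, 3) under the proposed stacking behaviour
res[, , 2]   # the second slice, i.e. diag(2) * 2
```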

More comments are still welcome...
Martin



Re: [Rd] RFC: sapply() limitation from vector to matrix, but not further

2010-12-27 Thread Gabor Grothendieck
On Wed, Dec 1, 2010 at 3:39 AM, Martin Maechler
 wrote:
> My proposal -- implemented and "make check" tested --
> is to add an optional argument  'ARRAY'
> which allows
>
>> sapply(v, myF, y = 2*(1:5), ARRAY=TRUE)

It would reduce the proliferation of arguments if the simplify=
argument were extended to allow this, e.g. simplify = "array" or
perhaps simplify = n would allow a maximum of n dimensions.
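Sketched as a usage example, the extended interface would look like this (hypothetical at the time of writing; sapply() in R 2.12.x does not accept it):

```r
## Each call returns a 2 x 2 matrix; simplify = "array" would stack
## them into a 2 x 2 x 3 array instead of a 4 x 3 matrix:
res <- sapply(1:3, function(i) matrix(i, 2, 2), simplify = "array")
dim(res)   # c(2, 2, 3)
```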

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com



[Rd] rJava question

2010-12-27 Thread Dominick Samperi
After some trial and error I figured out how to pass matrices from R to
Java and back using rJava, but this method is not documented and I wonder
if there is a better way?

Anyway, here is what I found works:

(m = matrix(as.double(1:12),3,4))
[shows m as you would expect]

jtest <- .jnew("JTest")
(v <- .jcall(jtest, '[[D', 'myfunc', .jarray(m), evalArray=FALSE))
[shows v = m + 10]

Here the JTest class has a method named myfunc that accepts
a double[][] and returns a double[][]. It simply adds 10 to every
element.

The parameter 'evalArray' is confusing because when
evalArray=TRUE the result is NOT evaluated (a list is returned
that you then have to apply .jevalArray to in order to get the answer).

There seems to be an option to have a java reference returned
instead of the actual matrix. Can the R side manipulate the
matrix (on the java side) through this reference?

Thanks,
Dominick
