[Rd] Summer of Code, LLVM, parallelization and R

2009-03-15 Thread Florian Gross

Hi everybody,

I'm currently working towards my Master's degree as a student of  
Computer Science at the University of Saarbrücken and highly  
interested in compiler construction, interpretation techniques,  
optimization, programming languages and more. :)


Two professors of my university approached me about an interesting  
project just a few days ago: Developing an LLVM-based JIT compilation
back-end for R. The primary goal would be the generation of parallel /  
vectorized code, but other ways of increasing performance might be  
very interesting as well.


I've thought a bit about this and am now wondering if this would make  
sense as a project for Google's Summer of Code program -- I have seen  
that the R foundation was accepted as a mentoring organization in 2008  
and has applied to be one again this year.


I've already taken part in the SoC program thrice (working on Novell's  
JScript.NET compiler and run-time environment in 2005, writing a  
debugger for the Ruby programming language in 2006 and working on a  
detailed specification for the Ruby programming language in 2007) and  
it has always been a lot of fun and a great experience. One thing that  
was particularly helpful was getting into contact with the development  
communities so easily.


What do you folks think? Would this be of benefit to the R community?  
Would it be a good candidate for this year's SoC installment? :)


Also, if some thinking in this direction has already been done or if  
you have any other pointers, please don't hesitate to reply!


Thanks a lot in advance!

Kind regards,
Florian Gross
__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Could you please add "time<-" as a generic function in the 'stats' package ?

2009-03-15 Thread Yohan Chalabi
 "JC" == John Chambers 
 on Wed, 11 Mar 2009 19:10:29 -0700

   JC> The problems are related to masking objects (in this case ) in
   JC> the search list, not especially related to methods.
   JC>
   JC> It was in order to get around such problems that NAMESPACE
   JC> was added to
   JC> R.  You should use it, but it applies to evaluating calls
   JC> to functions
   JC> in the package, by avoiding the dependency on the order of
   JC> packages in
   JC> the search list.  To ensure correct results, you need to call a
   JC> function from your package (i.e., one that is not masked).  The
   JC> computations in the function will see what has been imported
   JC> into the
   JC> namespace.
   JC>
   JC> For example, if you do the following:
   JC>
   JC> 1.  add a NAMESPACE file, for example containing:
   JC>
   JC> import(stats)
   JC> import(zoo)
   JC> exportPattern("^[a-zA-Z]")
   JC>
   JC> 2.  Do the computations in a function in your package,
   JC> say doDemo(),
   JC> with a few show(time()) lines added to print things.
   JC>
   JC> 3.  With the import(zoo), no need to define as an S3 generic.
   JC>
   JC> Then things behave with or without zoo attached, because the
   JC> computations are defined by your namespace.


Thank you for your responses.

'timeSeries' and 'zoo' both have functionality for time series
management. Although they have similar concepts, they are intrinsically
different; the former package uses S4 classes and the latter S3 classes.

Until now both packages have been able to coexist and have been  
independent from each other.

As I mentioned in my previous post, both packages define methods to
extract the timestamps of their respective classes with the function
'time'.

I agree with you that if we had used a function name and its
assignment version defined in 'zoo', we should have imported it from
that namespace. But in this case, 'time<-' is the natural extension of a
function already present in a base package.

Until now we defined the S3 generic 'time<-' so that both packages  
could coexist without needing to import the function from the  
namespace of the other. But this workaround won't work anymore if we  
define an S4 generic.

We are thus asking the R developers if they could add 'time<-'  as a  
generic in 'stats' because it is the natural extension of an existing  
function. This will ensure that packages can continue to coexist and  
remain independent.
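The duplication Yohan describes can be sketched as follows. This is a minimal illustration, with a hypothetical class and attribute name (nothing here is taken from 'timeSeries' or 'zoo'); it shows the S3 replacement generic that each package currently has to define for itself:

```r
## The S3 replacement generic both packages define independently:
`time<-` <- function(x, value) UseMethod("time<-")

## A method for a hypothetical class "myseries" (illustrative only):
`time<-.myseries` <- function(x, value) {
  attr(x, "timestamps") <- value   # store the new timestamps
  x
}

obj <- structure(list(data = 1:2), class = "myseries")
time(obj) <- c("2009-03-01", "2009-03-02")
attr(obj, "timestamps")
```

If 'stats' provided the generic, both packages could simply register methods for it instead of each shipping its own copy of the `UseMethod` stub.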

Best regards,
Yohan

-- 
PhD student
Swiss Federal Institute of Technology
Zurich

www.ethz.ch

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Assigning to factor[[i]]

2009-03-15 Thread Stavros Macrakis
I am a bit confused about the semantics of classes, [, and [[.

For at least some important built-in classes (factors and dates), both
the getter and the setter methods of [ operate on the class, but
though the getter method of [[ operates on the class, the setter
method operates on the underlying vector.  Is this behavior
documented? (I haven't found any documentation of it.) Is it
intentional?  (i.e. is it a bug or a feature?)  There are also cases
where invalid assignments don't signal an error.

A simple example:

> fact <- factor(2,levels=2:4)# master copy
> f0 <- fact; f0; dput(f0)
[1] 2
Levels: 2 3 4
structure(1L, .Label = c("2", "3", "4"), class = "factor")

> f0 <- fact; f0[1] <- 3; f0; dput(f0) # use [ setter
[1] 3
Levels: 2 3 4
structure(2L, .Label = c("2", "3", "4"), class = "factor")


> f0 <- fact; f0[[1]] <- 3L; f0; dput(f0)   # use [[ setter
[1] 4                                                        # ? didn't convert 3 to factor
Levels: 2 3 4
structure(3L, .Label = c("2", "3", "4"), class = "factor")   # modified underlying vector
> f0[1]
[1] 4
Levels: 2 3 4
# but result is a valid factor

> f0 <- fact; f0[[1]] <- 3; f0; dput(f0)   # use [[ setter
[1] 4
Levels: 2 3 4
structure(3, .Label = c("2", "3", "4"), class = "factor")    # didn't convert to 3L
> f0[1]
Error in class(y) <- oldClass(x) :
  adding class "factor" to an invalid object

I suppose f0[1] and f0[[1]] fail here because the underlying vector
must be integer and not numeric? If so, why didn't assigning to
f0[[1]] cause an error? And why didn't printing f0 cause the same
error?

Here are some more examples. Consider

fac <- factor(c("b","a","c"),levels=c("b","c","a"))

f <- fac; f[1] <- "c"; dput(f)
# structure(c(2L, 3L, 2L), .Label = c("b", "c", "a"), class = "factor")
 OK, implicit conversion of "c" to factor(c) was performed

f <- fac; f[1] <- 25; dput(f)
# Warning message:
# In `[<-.factor`(`*tmp*`, 1, value = 25) :
#   invalid factor level, NAs generated
# structure(c(NA, 3L, 2L), .Label = c("b", "c", "a"), class = "factor")
 OK, warning given for invalid value, which becomes an NA
 Same thing happens for f[1]<-"foo"

So far, so good.  Now compare to what happens with fac[[...]] <- ...

f <- fac; f[[1]] <- 25; dput(f)
# structure(c(25, 3, 2), .Label = c("b", "c", "a"), class = "factor")
 No error given, but invalid factor generated

f <- fac; f[[1]] <- "c"; dput(f)
# structure(c("c", "3", "2"), .Label = c("b", "c", "a"), class = "factor")
 No conversion performed; no error given; invalid factor generated

f
# [1] <NA> <NA> <NA>
# Levels: b c a
 Prints as though it were factor(c(NA,NA,NA)) with no warning/error

f[]
# Error in class(y) <- oldClass(x) :
#  adding class "factor" to an invalid object
 But f[] gives an error
 Same error with f[1] and f[[1]]

Another interesting case is f[1] <- list(NULL) -- which correctly
gives an error -- versus f[[1]] <- list(), which gives no error but
results in an f which is not a factor at all:

f <- fac; f[[1]]<-list(); class(f); dput(f)
[1] "list"
list(list(), 3L, 2L)

I can see that being able to modify the underlying vector of a classed
object directly would be very valuable functionality, but there is an
asymmetry here: f[[1]]<- modifies the underlying vector, but f[[1]]
accesses the classed vector.  Presumably you need to do
unclass(f)[[1]] to see the underlying value.  But on the other hand,
unclass doesn't have a setter (`unclass<-`), so you can't say
unclass(f)[[1]] <- ...
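Since `unclass<-` does not exist, the closest workaround is copy, modify, reclass. A hedged sketch (this is one way to do it, not necessarily the recommended idiom):

```r
## Modify the underlying integer codes of a factor directly:
f <- factor(c("b", "a", "c"), levels = c("b", "c", "a"))
y <- unclass(f)   # plain integer vector; the levels survive as an attribute
y[[1]] <- 2L      # set a code directly; code 2L maps to level "c"
class(y) <- "factor"
y[1]              # a valid factor element again
```

Assigning only valid integer codes here is what keeps the result a well-formed factor, unlike the `f[[1]] <- 25` example above.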

I have not been able to find documentation of all this in the R
Language Definition or in the man page for [/[[, but perhaps I'm
looking in the wrong place?

-s

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Conversion and rounding of POSIXct

2009-03-15 Thread Stavros Macrakis
POSIXct/lt supports fractional seconds (see Sub-second Accuracy
section of man page), but there seem to be some inconsistencies in
their handling.

Converting to POSIXlt and back does not give back the same time for
times before the origin:

> t0 <- as.POSIXct('1934-01-05 23:59:59.1')
> t0
[1] "1934-01-06 00:00:00 EST"  # rounding issue, see below
> as.POSIXlt(t0)
[1] "1934-01-06 00:00:00 EST"
> as.POSIXct(as.POSIXlt(t0))
[1] "1934-01-06 00:00:01 EST"   # ???
> as.POSIXct(as.POSIXlt(t0)) - t0
Time difference of 1 secs

Also, POSIXct always rounds up when printing for times before the origin:

> as.POSIXct('1934-01-05 10:10:23')
[1] "1934-01-05 10:10:23 EST"
> as.POSIXct('1934-01-05 10:10:23.1')
[1] "1934-01-05 10:10:24 EST"

and always rounds down when printing times after the origin:

as.POSIXct('2010-01-05 23:59:59.4')
[1] "2010-01-05 23:59:59 EST"
> as.POSIXct('2010-01-05 23:59:59.6')
[1] "2010-01-05 23:59:59 EST"
> as.POSIXct('2010-01-05 23:59:59.999')
[1] "2010-01-05 23:59:59 EST"

But the Description section says that POSIXct "represent[s] calendar
dates and times (to the nearest second)".  "Nearest" would seem to
imply rounding to the nearest second when printing, not rounding up or down.
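One plausible explanation for the asymmetry (an assumption on my part, not stated above): POSIXct is a signed numeric offset in seconds from 1970-01-01, and truncation toward zero looks like rounding up for negative (pre-epoch) values and rounding down for positive ones:

```r
## Illustrative offsets only; the exact values are made up:
secs_pre  <- -1135350576.9   # a pre-1970 time is a negative offset
secs_post <-  1262753999.6   # a post-1970 time is a positive offset

trunc(secs_pre)    # toward zero on a negative number = later in calendar time
trunc(secs_post)   # toward zero on a positive number = earlier in calendar time
```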

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Conversion and rounding of POSIXct

2009-03-15 Thread Dirk Eddelbuettel

Stavros,

Two really quick comments:

a) you need to enable sub-second print formats
b) AFAIK pre-epoch times are second-class citizens

R> options("digits.secs"=6)   ## print with 6 digits for microseconds
R> t0 <- as.POSIXct('1974-01-05 23:59:59.1')
R> t0
[1] "1974-01-05 23:59:59.1 CST"
R> as.POSIXlt(t0)
[1] "1974-01-05 23:59:59.1 CST"
R> as.POSIXct(as.POSIXlt(t0)) - t0
Time difference of 0 secs

All that said, POSIXt is still under-documented and rather mysterious so I
won't / can't comment on all aspects of your post but the above should shed
some light on the first few items.

Hth, Dirk

-- 
Three out of two people have difficulties with fractions.

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Conversion and rounding of POSIXct

2009-03-15 Thread Stavros Macrakis
On Sun, Mar 15, 2009 at 1:04 PM, Dirk Eddelbuettel  wrote:

Dirk,

Thanks for your reply.

> a) you need to enable sub-second print formats

Yes, if I want to display sub-second printing.  But I was just looking
at the rounding behavior.

> b) AFAIK pre-epoch times are second-class citizens

In what sense?  That bugs in their handling won't be fixed?  If so, it
would be nice to document that.

Thanks again,

   -s

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Error compiling rgl package

2009-03-15 Thread Duncan Murdoch

On 12/03/2009 3:16 PM, Mohammad Nikseresht wrote:

Hi,

I receive the following error while I try to install rgl package:

CC -xtarget=native64 -I/opt/R-2.8.1/lib/R/include 
-I/opt/SUNWhpc/HPC8.1/sun/include -DHAVE_PNG_H -I/usr/include/libpng12 
-DHAVE_FREETYPE -Iext/ftgl -I/usr/sfw/include/freetype2 
-I/usr/sfw/include -Iext -I/opt/SUNWhpc/HPC8.1/sun/include 
-I/usr/sfw/include -I/opt/csw/include-KPIC  -O -c Background.cpp -o 
Background.o

"math.h", line 47: Error: modf is not a member of file level.
"math.h", line 48: Error: modff is not a member of file level.
"Shape.hpp", line 58: Error: The function "strncpy" must have a prototype.
3 Error(s) detected.

I am using Sun studio 12.
I suspect that this is an incompatibility between g++ and Sun studio CC.
I would appreciate anything you could share from your experience.


Brian Ripley contributed some patches that should help with this.  Could 
you check out the source from R-forge, and confirm that it now compiles 
on your system?  (Or wait for the tarball there to be updated to 0.84-1 
in a few hours, and download that.)


Thanks, Brian, for the patch.

Duncan Murdoch

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Could you please add "time<-" as a generic function in the 'stats' package ?

2009-03-15 Thread John Chambers
I understand the problem and wasn't voting either way on the S3 
replacement function generic you want in stats.  Prof. Ripley noted that 
it's odd to have stats doing that to solve the problems of two outside 
packages when  it doesn't even have the function concerned,  but others 
may have opinions either way.  It's certainly not a good precedent.  
Every time a package writes an S3 generic version of a function in a 
base package (or, in this case, not in a base package), should the base 
package convert its function to an S3 generic?  (This problem is one 
reason for the S4 implicit generic idea, so that methods can be written 
compatibly for existing functions.)

Your request requires writing and documenting the new function, so at 
least you should provide a patch that can be inserted without adding 
more work for R-core.

But that was not my main point.

The point is that such problems with name conflicts arise in many 
ways--I agree that they arise especially easily when one package uses S4 
and another S3 methods with the same named function.  It can and does 
arise anyway, e.g., the two versions of gam() noted in my book (p. 26). 
The general solution is to have a namespace for your package and to 
ensure that it imports only what you want.  Then the results are 
independent of packages attached, _provided_ the user is calling a 
function from your package.

Users calling both packages from the global environment may have to be 
specific as to which version they want, say by using the "::" operator.  
This is a consequence (a deficiency if you like) of the classic S and R 
rule of using the first version of a function encountered.  It's 
possible that the evaluator in the future could be more sophisticated 
and recognize the situation of compatible S3 and S4 functions, but it 
won't be for 2.9.0.
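The "::" qualification mentioned above can be sketched with a minimal example (not taken from the thread):

```r
## Explicit package qualification bypasses the search path entirely,
## so it is immune to masking by later-attached packages:
x <- ts(1:4, start = 2000)
stats::time(x)   # always the stats version, whatever else is attached
```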

The addition to stats won't help unless/until zoo and any other package 
with a replacement version of time() removes that function.

John

Yohan Chalabi wrote:
> "JC" == John Chambers 
> on Wed, 11 Mar 2009 19:10:29 -0700
>   
>
>JC> The problems are related to masking objects (in this case ) in
>JC> the search list, not especially related to methods.
>JC>
>JC> It was in order to get around such problems that NAMESPACE
>JC> was added to
>JC> R.  You should use it, but it applies to evaluating calls
>JC> to functions
>JC> in the package, by avoiding the dependency on the order of
>JC> packages in
>JC> the search list.  To ensure correct results, you need to call a
>JC> function from your package (i.e., one that is not masked).  The
>JC> computations in the function will see what has been imported
>JC> into the
>JC> namespace.
>JC>
>JC> For example, if you do the following:
>JC>
>JC> 1.  add a NAMESPACE file, for example containing:
>JC>
>JC> import(stats)
>JC> import(zoo)
>JC> exportPattern("^[a-zA-Z]")
>JC>
>JC> 2.  Do the computations in a function in your package,
>JC> say doDemo(),
>JC> with a few show(time()) lines added to print things.
>JC>
>JC> 3.  With the import(zoo), no need to define as an S3 generic.
>JC>
>JC> Then things behave with or without zoo attached, because the
>JC> computations are defined by your namespace.
>
>
> Thank you for your responses.
>
> 'timeSeries' and 'zoo' both have functionality for time series
> management. Although they have similar concepts, they are intrinsically
> different; the former package uses S4 classes and the latter S3 classes.
>
> Until now both packages have been able to coexist and have been  
> independent from each other.
>
> As I mentioned in my previous post, both packages define methods to  
> extract timestamps of their respective classes with the function  
> 'time' .
>
> I agree with you that if we had used a function name and its  
> assignment version defined in 'zoo', we should import it from their  
> namespace. But in this case, 'time<-' is the natural extension of a  
> function already present in a base package.
>   
That wasn't my point.   It was only your demo that required importing 
zoo into your dummy package.
> Until now we defined the S3 generic 'time<-' so that both packages  
> could coexist without needing to import the function from the  
> namespace of the other. But this workaround won't work anymore if we  
> define an S4 generic.
>
> We are thus asking the R developers if they could add 'time<-'  as a  
> generic in 'stats' because it is the natural extension of an existing  
> function. This will ensure that packages can continue to coexist and  
> remain independent.
>
> Best regards,
> Yohan
>
>   

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Definition of [[

2009-03-15 Thread Stavros Macrakis
The semantics of [ and [[ don't seem to be fully specified in the
Reference manual.  In particular, I can't find where the following
cases are covered:

> cc <- c(1); ll <- list(1)

> cc[3]
[1] NA
OK, RefMan says: If i is positive and exceeds length(x) then the
corresponding selection is NA.

> dput(ll[3])
list(NULL)
? i is positive and exceeds length(x); why isn't this list(NA)?

> ll[[3]]
Error in list(1)[[3]] : subscript out of bounds
? Why does this return NA for an atomic vector, but give an error for
a generic vector?

> cc[[3]] <- 34; dput(cc)
c(1, NA, 34)
OK

ll[[3]] <- 34; dput(ll)
list(1, NULL, 34)
Why is second element NULL, not NA?
And why is it OK to set an undefined ll[[3]], but not to get it?

I assume that these are features, not bugs, but I can't find
documentation for them.

-s

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] surprising behaviour of names<-

2009-03-15 Thread Wacek Kusnierczyk
Berwin A Turlach wrote:
>
> Obviously, assuming that R really executes 
>   *tmp* <- x
>   x <- "names<-"('*tmp*', value=c("a","b"))
> under the hood, in the C code, then *tmp* does not end up in the symbol
> table and does not persist beyond the execution of 
>   names(x) <- c("a","b")
>
>   

to prove that i take you seriously, i have peeked into the code, and
found that indeed there is a temporary binding for *tmp* made behind the
scenes -- sort of. unfortunately, it is not done carefully enough to
avoid possible interference with the user's code:

'*tmp*' = 0
`*tmp*`
# 0

x = 1
names(x) = 'foo'
`*tmp*`
# error: object "*tmp*" not found


given that `*tmp*` is a perfectly legal (though some would say
'non-standard') name, it would be good if somewhere here a warning were
issued -- perhaps where i assign to `*tmp*`, because `*tmp*` is not just
any non-standard name, but one that is 'obviously' used under the hood
to perform black magic.

it also appears that the explanation given in, e.g., the r language
definition (draft, of course) sec. 3.4.4:

"
Assignment to subsets of a structure is a special case of a general
mechanism for complex
assignment:
x[3:5] <- 13:15
The result of this commands is as if the following had been executed
‘*tmp*‘ <- x
x <- "[<-"(‘*tmp*‘, 3:5, value=13:15)
"

is incomplete (because the final result is not '*tmp*' having the value
of x, as it might seem, but rather '*tmp*' having been unbound).

so the suggestion for the documenters is to add to the end of the
section (or wherever else it is appropriate) a warning to the effect
that in the end '*tmp*' will be removed, even if the user has explicitly
defined it earlier in the same scope.

or maybe have the implementation not rely on a user-forgeable name? for
example, the '.Last.value' name is automatically bound to the most
recently returned value, but it resides in package:base and does not
collide with bindings using it made by the user:

.Last.value = 0

1
.Last.value
# 0, not 1

1
base::.Last.value
# 1, not 0


why could not '*tmp*' be bound and unbound outside of the user's
namespace? (i guess it's easier to update the docs -- or just ignore the
issue.)


as an aside, trace('<-') will pick up only one of the uses of '<-'
implied by the code above:

x <- 1:10

trace('<-')
x[3:5] <- 13:15
# trace: x[3:5] <- 13:15
# trace: x <- `[<-`(`*tmp*`, 3:5, value = 13:15)

which is somewhat confusing, because then '*tmp*' appears in the trace
somewhat ex machina. (again, the explanation is in the source code, but
the traceback could have been more informative.)

cheers,
vQ

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Definition of [[

2009-03-15 Thread Duncan Murdoch

On 15/03/2009 2:31 PM, Stavros Macrakis wrote:

The semantics of [ and [[ don't seem to be fully specified in the
Reference manual.  In particular, I can't find where the following
cases are covered:


cc <- c(1); ll <- list(1)



cc[3]

[1] NA
OK, RefMan says: If i is positive and exceeds length(x) then the
corresponding selection is NA.


dput(ll[3])

list(NULL)
? i is positive and exceeds length(x); why isn't this list(NA)?


Because the sentence you read was talking about "simple vectors", and ll 
is presumably not a simple vector.  So what is a simple vector?  That is 
not explicitly defined, and it probably should be.  I think it is 
"atomic vectors, except those with a class that has a method for [".





ll[[3]]

Error in list(1)[[3]] : subscript out of bounds
? Why does this return NA for an atomic vector, but give an error for
a generic vector?


cc[[3]] <- 34; dput(cc)

c(1, NA, 34)
OK

ll[[3]] <- 34; dput(ll)
list(1, NULL, 34)
Why is second element NULL, not NA?


NA is a length 1 atomic vector with a specific type matching the type of 
cc.  It makes more sense in this context to put in a NULL, and return 
list(NULL) for ll[3].



And why is it OK to set an undefined ll[[3]], but not to get it?


Lots of code grows vectors by setting elements beyond the end of them, 
so whether or not that's a good idea, it's not likely to change.


I think an argument could be made that ll[[toobig]] should return NULL 
rather than trigger an error, but on the other hand, the current 
behaviour allows the programmer to choose:  if you are assuming that a 
particular element exists, use ll[[element]], and R will tell you when 
your assumption is wrong.  If you aren't sure, use ll[element] and 
you'll get NA or list(NULL) if the element isn't there.
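The choice Duncan describes can be sketched directly:

```r
## [[ asserts the element exists; [ is the forgiving form.
ll <- list(1)

res <- tryCatch(ll[[3]], error = function(e) "out of bounds")
res      # the [[ form signals the failed assumption

ll[3]    # the [ form quietly yields list(NULL)
```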



I assume that these are features, not bugs, but I can't find
documentation for them.


There is more documentation in the man page for Extract, but I think it 
is incomplete.  The most complete documentation is of course the source 
code, but it may not answer the question of what's intentional and 
what's accidental.


Duncan Murdoch

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Definition of [[

2009-03-15 Thread Stavros Macrakis
Duncan,

Thanks for the reply.

On Sun, Mar 15, 2009 at 4:43 PM, Duncan Murdoch  wrote:
> On 15/03/2009 2:31 PM, Stavros Macrakis wrote:

>> dput(ll[3])
>> list(NULL)
>> ? i is positive and exceeds length(x); why isn't this list(NA)?
>
> Because the sentence you read was talking about "simple vectors", and ll is
> presumably not a simple vector.  So what is a simple vector?  That is not
> explicitly defined, and it probably should be.  I think it is "atomic
> vectors, except those with a class that has a method for [".

The four subsections of 3.4 Indexing are 3.4.1 Indexing by vectors,
3.4.2 Indexing matrices and arrays, 3.4.3 Indexing other structures,
and 3.4.4 Subset assignment, so the context seems to be saying that
"simple vectors" are those which are not matrices or arrays, and those
("other structures") which do not overload [.

Even if the definition of 'simple vector' were clarified to cover only
atomic vectors, I still can't find any text specifying that list(3)[5]
=> list(NULL).

For that matter, it would leave the subscripting of important
built-ins such as factors and dates, etc. undefined. Obviously the
intuition is that vectors of factors or vectors of dates would do the
'same thing' as vectors of integers or of strings, but 3.4.3 doesn't
say what that thing is.

>>> ll[[3]]
>>
>> Error in list(1)[[3]] : subscript out of bounds
>> ? Why does this return NA for an atomic vector, but give an error for
>> a generic vector?
>>
>>> cc[[3]] <- 34; dput(cc)
>>
>> c(1, NA, 34)
>> OK
>>
>> ll[[3]] <- 34; dput(ll)
>> list(1, NULL, 34)
>> Why is second element NULL, not NA?
>
> NA is a length 1 atomic vector with a specific type matching the type of cc.
>  It makes more sense in this context to put in a NULL, and return a
> list(NULL) for ll[3].

Understood that that's the rationale, but where is it documented?

Also, if that's the rationale, it seems to say that NULL is the
equivalent of NA for list elements, but in fact NULL does not function
like NA:

> is.na(NULL)
logical(0)
Warning message:
In is.na(NULL) : is.na() applied to non-(list or vector) of type 'NULL'
> is.na(list(NULL))
[1] FALSE

Indeed, NA seems to both up-convert and down-convert nicely to other
forms of NA:

> dput(as.integer(as.logical(c(TRUE,NA,TRUE))))
c(1L, NA, 1L)
> dput(as.logical(as.integer(c(TRUE,NA,TRUE))))
c(TRUE, NA, TRUE)

and are not converted to NULL when converted to generic vector:

> dput(as.list(c(TRUE,NA,TRUE)))
list(TRUE, NA, TRUE)

and NA is preserved when downconverting:

> dput(as.logical(as.list(c(TRUE,NA,23))))
c(TRUE, NA, TRUE)

But if you try to downconvert NULL, you get an error

> dput(as.integer(list(NULL)))
Error in isS4(x) : (list) object cannot be coerced to type 'integer'

So I don't see why NULL is the right way to represent NA, especially
since NULL is a perfectly good list element, distinct from NA.

>> And why is it OK to set an undefined ll[[3]], but not to get it?
>
> Lots of code grows vectors by setting elements beyond the end of them, so
> whether or not that's a good idea, it's not likely to change.

I wasn't suggesting changing this.

> I think an argument could be made that ll[[toobig]] should return NULL
> rather than trigger an error, but on the other hand, the current behaviour
> allows the programmer to choose:  if you are assuming that a particular
> element exists, use ll[[element]], and R will tell you when your assumption
> is wrong.  If you aren't sure, use ll[element] and you'll get NA or
> list(NULL) if the element isn't there.

Yes, that could make sense, but why would it be true for ll[[toobig]]
but not cc[[toobig]]?

>> I assume that these are features, not bugs, but I can't find
>> documentation for them.

> There is more documentation in the man page for Extract, but I think it is
> incomplete.

Yes, I was looking at that man page, and I don't think it resolves any
of the above questions.

> The most complete documentation is of course the source code,
> but it may not answer the question of what's intentional and what's
> accidental.

Well, that's one issue.  But another is that there should be a
specification addressed to users, who should not have to understand
internals.

 -s

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Definition of [[

2009-03-15 Thread Wacek Kusnierczyk
Stavros Macrakis wrote:
>
> Well, that's one issue.  But another is that there should be a
> specification addressed to users, who should not have to understand
> internals.
>   

this should really be taken seriously.

vQ

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] miscomputation (PR#13594)

2009-03-15 Thread sarmad
Full_Name: Majid Sarmad
Version: 2.8.1
OS: Linux / Windows
Submission from: (NULL) (194.225.128.135)


With thanks to Alberto Viglione: in the HW.tests function of the homtest
package, there is the following line

V2 <- (sum(ni * ((ti - tauReg)^2 + (t3i - tau3Reg)^2)) / sum(ni))^0.5


which is a typo and leads to a miscomputation. It should be

V2 <- sum(ni * ((ti - tauReg)^2 + (t3i - tau3Reg)^2)^0.5) / sum(ni)


as it is in the help file of the function:

V2 = sum[i=1..k] ni * {(t^(i) - t^R)^2 + (t3^(i) - t3^R)^2}^(1/2) / sum[i=1..k] ni


Similarly, in

V2s[i] <- (sum(ni * ((ti.sim - tauReg.sim)^2 + (t3i.sim - tau3Reg.sim)^2)) / sum(ni))^0.5
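A quick numeric check with made-up data (the values below are purely illustrative) confirms the two expressions are not equivalent: one is the square root of a weighted mean, the other a weighted mean of square roots.

```r
## Illustrative inputs, not from homtest:
ni  <- c(3, 5)
ti  <- c(0.20, 0.30); tauReg  <- 0.25
t3i <- c(0.10, 0.15); tau3Reg <- 0.12

v2_wrong <- (sum(ni * ((ti - tauReg)^2 + (t3i - tau3Reg)^2)) / sum(ni))^0.5
v2_right <- sum(ni * ((ti - tauReg)^2 + (t3i - tau3Reg)^2)^0.5) / sum(ni)

c(v2_wrong, v2_right)   # close here, but not equal in general
```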

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Bug Report Fwd: MANOVA Data (PR#13595)

2009-03-15 Thread dvdbooth

Hi. There appears to be a bug in the R function manova. My friend and I
both ran it the same way as shown below (his run) with the shown data set.
His results are shown below; we both got the same results. I was running
R 2.3.1. I'm not sure what version he used.
Thanks very much,
David Booth
Kent State University

-Original Message-
From: dvdbo...@cs.com
To: kb...@ilstu.edu
Sent: Sun, 15 Mar 2009 7:01 pm
Subject: Re: MANOVA Data
 Ken,

Did you notice that the Wilks, Roy, etc. p-values are all the same? Pillai
is almost the SAS result. Can't figure it out. I'll submit a bug report.
What's Velleman going to talk about? Thanks for looking at the R.

Best,

Dave
-Original Message-
From: Ken Berk
To: dvdbo...@cs.com
Sent: Sun, 15 Mar 2009 3:45 pm
Subject: Re: Fwd: MANOVA Data
At 08:07 PM 3/5/2009, you wrote:



Hi Ken,

I've run the attached data set (a one-way MANOVA example from the SAS
manual chapter on MANOVA) in both SAS and R and I don't get the same
results. Do you have any suggestions about how I can find out what's
going on?

Thanks,

Dave
-Original Message-
From: dvdbo...@cs.com
To: dvdbo...@aol.com
Sent: Thu, 5 Mar 2009 5:06 pm
Subject: MANOVA Data
Email message sent from CompuServe - visit us today at http://www.cs.com
Hello, David

My R results are clearly crap, as shown below.

The degrees of freedom are clearly wrong, as is apparent when looking at
the univariate anovas.

SAS gives the correct answers.

I don't know what to do about R.

Ken
COUNT  REWGRP  COMMIT  SATIS  STAY
    1       1      16     19    18
    2       1      18     15    17
    3       1      18     14    14
    4       1      16     20    10
    5       1      15     13    17
    6       1      12     15    11
    7       2      16     20    13
    8       2      18     14    16
    9       2      13     10    14
   10       2      17     13    19
   11       2      14     18    15
   12       2      19     16    18
   13       3      20     18    16
   14       3      18     15    19
   15       3      13     14    17
   16       3      12     16    15
   17       3      16     17    18
   18       3      14     19    15

> attach(booth)
> Y <- cbind(COMMIT, SATIS, STAY)
> fit <- manova(Y ~ REWGRP)
> summary(fit, test="Pillai")
          Df  Pillai approx F num Df den Df Pr(>F)
REWGRP     1 0.22731  1.37283      3     14 0.2918
Residuals 16

> summary(fit, test="Wilks")
          Df   Wilks approx F num Df den Df Pr(>F)
REWGRP     1 0.77269  1.37283      3     14 0.2918
Residuals 16

> summary(fit, test="Hotelling-Lawley")
          Df Hotelling-Lawley approx F num Df den Df Pr(>F)
REWGRP     1          0.29418  1.37283      3     14 0.2918
Residuals 16

> summary(fit, test="Roy")
          Df     Roy approx F num Df den Df Pr(>F)
REWGRP     1 0.29418  1.37283      3     14 0.2918
Residuals 16

> summary(fit)
          Df  Pillai approx F num Df den Df Pr(>F)
REWGRP     1 0.22731  1.37283      3     14 0.2918
Residuals 16

> summary.aov(fit)
 Response COMMIT :
            Df  Sum Sq Mean Sq F value Pr(>F)
REWGRP       1   0.333   0.333  0.0532 0.8204
Residuals   16 100.167   6.260

 Response SATIS :
            Df  Sum Sq Mean Sq F value Pr(>F)
REWGRP       1   0.750   0.750  0.0945 0.7625
Residuals   16 127.028   7.939

 Response STAY :
            Df  Sum Sq Mean Sq F value Pr(>F)
REWGRP       1  14.083  14.083  2.3013 0.1488
Residuals   16  97.917   6.120

>
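
[Editorial note, not part of the original thread: one hedged guess at the cause. If REWGRP was read in as a numeric column, manova() fits it as a single regressor with 1 hypothesis df rather than as a 3-level grouping factor, which would explain the Df mismatch with SAS. A sketch of the fix, assuming the data frame is called booth as above:

```r
## Hypothetical fix (a guess, not confirmed in the thread):
## declare the grouping variable a factor so the MANOVA uses
## 3 groups (2 hypothesis df) instead of a 1-df slope.
booth$REWGRP <- factor(booth$REWGRP)
fit2 <- manova(cbind(COMMIT, SATIS, STAY) ~ REWGRP, data = booth)
summary(fit2, test = "Pillai")
summary.aov(fit2)
```

With the factor coding, the REWGRP rows of both the multivariate and univariate tables should show 2 numerator df and 15 residual df.]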

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Definition of [[

2009-03-15 Thread Duncan Murdoch

Just a couple of inline comments down below:

On 15/03/2009 5:30 PM, Stavros Macrakis wrote:

Duncan,

Thanks for the reply.

On Sun, Mar 15, 2009 at 4:43 PM, Duncan Murdoch  wrote:

On 15/03/2009 2:31 PM, Stavros Macrakis wrote:



dput(ll[3])
list(NULL)
i is positive and exceeds length(x); why isn't this list(NA)?

Because the sentence you read was talking about "simple vectors", and ll is
presumably not a simple vector.  So what is a simple vector?  That is not
explicitly defined, and it probably should be.  I think it is "atomic
vectors, except those with a class that has a method for [".


The four subsections of 3.4 Indexing are 3.4.1 Indexing by vectors,
3.4.2 Indexing matrices and arrays, 3.4.3 Indexing other structures,
and 3.4.4 Subset assignment, so the context seems to be saying that
"simple vectors" are those which are not matrices or arrays, and those
("other structures") which do not overload [.

Even if the definition of 'simple vector' were clarified to cover only
atomic vectors, I still can't find any text specifying that list(3)[5]
=> list(NULL).

For that matter, it would leave the subscripting of important
built-ins such as factors, dates, etc. undefined. Obviously the
intuition is that vectors of factors or vectors of dates should do the
'same thing' as vectors of integers or of strings, but 3.4.3 doesn't
say what that thing is.


ll[[3]]

Error in list(1)[[3]] : subscript out of bounds
Why does this return NA for an atomic vector, but give an error for
a generic vector?


cc[[3]] <- 34; dput(cc)

c(1, NA, 34)
OK

ll[[3]] <- 34; dput(ll)
list(1, NULL, 34)
Why is second element NULL, not NA?

NA is a length 1 atomic vector with a specific type matching the type of cc.
 It makes more sense in this context to put in a NULL, and to return
list(NULL) for ll[3].


Understood that that's the rationale, but where is it documented?

Also, if that's the rationale, it seems to say that NULL is the
equivalent of NA for list elements, but in fact NULL does not function
like NA:


is.na(NULL)

logical(0)
Warning message:
In is.na(NULL) : is.na() applied to non-(list or vector) of type 'NULL'

is.na(list(NULL))

[1] FALSE

Indeed, NA seems to both up-convert and down-convert nicely to other
forms of NA:


dput(as.integer(as.logical(c(TRUE,NA,TRUE))))

c(1L, NA, 1L)

dput(as.logical(as.integer(c(TRUE,NA,TRUE))))

c(TRUE, NA, TRUE)

and are not converted to NULL when converted to generic vector:


dput(as.list(c(TRUE,NA,TRUE)))

list(TRUE, NA, TRUE)

and NA is preserved when downconverting:


dput(as.logical(as.list(c(TRUE,NA,23))))

c(TRUE, NA, TRUE)

But if you try to downconvert NULL, you get an error


dput(as.integer(list(NULL)))

Error in isS4(x) : (list) object cannot be coerced to type 'integer'

So I don't see why NULL is the right way to represent NA, especially
since NULL is a perfectly good list element, distinct from NA.


And why is it OK to set an undefined ll[[3]], but not to get it?

Lots of code grows vectors by setting elements beyond the end of them, so
whether or not that's a good idea, it's not likely to change.


I wasn't suggesting changing this.


I think an argument could be made that ll[[toobig]] should return NULL
rather than trigger an error, but on the other hand, the current behaviour
allows the programmer to choose:  if you are assuming that a particular
element exists, use ll[[element]], and R will tell you when your assumption
is wrong.  If you aren't sure, use ll[element] and you'll get NA or
list(NULL) if the element isn't there.


Yes, that could make sense, but why would it be true for ll[[toobig]]
but not cc[[toobig]]?


But it is:

> cc <- c(1)
> cc[[3]]
Error in cc[[3]] : subscript out of bounds
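
[Editorial note: a compact sketch of the convention discussed above, as reported in this thread for R of that era. The try() calls just capture the errors:

```r
cc <- c(1, 2)     # atomic vector
ll <- list(1, 2)  # generic vector (list)

cc[5]         # NA           -- `[` pads atomic vectors with NA
ll[5]         # list(NULL)   -- `[` pads lists with NULL
try(cc[[5]])  # Error: subscript out of bounds
try(ll[[5]])  # Error: subscript out of bounds
```

So `[` is the forgiving form for both vector types, and `[[` is strict for both; only the out-of-range filler (NA vs NULL) differs.]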


I assume that these are features, not bugs, but I can't find
documentation for them.



There is more documentation in the man page for Extract, but I think it is
incomplete.


Yes, I was looking at that man page, and I don't think it resolves any
of the above questions.


The most complete documentation is of course the source code,
but it may not answer the question of what's intentional and what's
accidental.


Well, that's one issue.  But another is that there should be a
specification addressed to users, who should not have to understand
internals.


I agree, but not so strongly that I will drop everything and write one.

Duncan Murdoch

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Using and 'eval' and environments with active bindings

2009-03-15 Thread Roger D. Peng
The following code produces an error in current R-devel

f <- function(value) {
    if (!missing(value))
        100
    else
        2
}
e <- new.env()
makeActiveBinding("x", f, e)
eval(substitute(list(x)), e)

The error, after calling 'eval' is

Error in eval(expr, envir, enclos) :
  element 1 is empty;
   the part of the args list of 'list' being evaluated was:
   (x)


It has something to do with the change in R_isMissing in revision
r48118 but I'm not quite knowledgeable enough to understand what the
problem is. In R 2.8.1 the result was simply


> eval(substitute(list(x)), e)
[[1]]
[1] 2

I can't say I know what the output should be but I'd like some
clarification on whether this is a bug.
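
[Editorial note, for readers unfamiliar with active bindings: the binding function is called with no arguments on read and with one argument on write. A minimal sketch, with illustrative values only:

```r
e <- new.env()
makeActiveBinding("x", function(value) {
    if (missing(value)) "read"      # get: function called with no arguments
    else paste("wrote", value)      # set: function called with the new value
}, e)
get("x", envir = e)         # evaluates the function; returns "read"
assign("x", 42, envir = e)  # invokes the binding function with value = 42
```

Roger's f above uses the same missing(value) dispatch, so eval(substitute(list(x)), e) should trigger the read branch and yield 2.]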

Thanks,
-roger
-- 
Roger D. Peng  |  http://www.biostat.jhsph.edu/~rpeng/

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] [OT] Debian now has a new section 'gnu-r'

2009-03-15 Thread Dirk Eddelbuettel

Joerg Jaspert, one of the ftpmasters / archive maintainers within Debian,
today posted a new list of 'Sections' to debian-devel-announce (see eg here
http://www.nabble.com/forum/ViewPost.jtp?post=22524830&framed=y )

This now includes a new Section: 

   gnu-r    Everything about GNU R, a statistical computation and
            graphics system

which gives R just about the same footing Perl and Python had -- a new
section in the archive (and Ruby, Java, Haskell, OCaml, and PHP got the same
treatment). I think none of the 'R-within-Debian' maintainers saw this
coming.  For the record, the current list of R packages within Debian is
included below.

Cheers, Dirk


r-base-dev              gnu-r
r-base-core             gnu-r
r-base-core-ra          gnu-r
r-base                  gnu-r
r-cran-abind            gnu-r
r-cran-acepack          gnu-r
r-cran-adapt            gnu-r
r-cran-bayesm           gnu-r
r-cran-bitops           gnu-r
r-cran-boot             gnu-r
r-cran-cairodevice      gnu-r
r-cran-car              gnu-r
r-cran-catools          gnu-r
r-cran-chron            gnu-r
r-cran-cluster          gnu-r
r-cran-coda             gnu-r
r-cran-codetools        gnu-r
r-cran-combinat         gnu-r
r-cran-date             gnu-r
r-cran-dbi              gnu-r
r-cran-design           gnu-r
r-cran-eco              gnu-r
r-cran-effects          gnu-r
r-cran-farma            gnu-r
r-cran-fasianoptions    gnu-r
r-cran-fassets          gnu-r
r-cran-fbasics          gnu-r
r-cran-fbonds           gnu-r
r-cran-fcalendar        gnu-r
r-cran-fcopulae         gnu-r
r-cran-fecofin          gnu-r
r-cran-fexoticoptions   gnu-r
r-cran-fextremes        gnu-r
r-cran-fgarch           gnu-r
r-cran-fimport          gnu-r
r-cran-fmultivar        gnu-r
r-cran-fnonlinear       gnu-r
r-cran-foptions         gnu-r
r-cran-foreign          gnu-r
r-cran-fportfolio       gnu-r
r-cran-fregression      gnu-r
r-cran-fseries          gnu-r
r-cran-ftrading         gnu-r
r-cran-funitroots       gnu-r
r-cran-futilities       gnu-r
r-cran-gdata            gnu-r
r-cran-getopt           gnu-r
r-cran-gmaps            gnu-r
r-cran-gmodels          gnu-r
r-cran-gplots           gnu-r
r-cran-gregmisc         gnu-r
r-cran-gtools           gnu-r
r-cran-hdf5             gnu-r
r-cran-hmisc            gnu-r
r-cran-its              gnu-r
r-cran-jit              gnu-r
r-cran-kernsmooth       gnu-r
r-cran-latticeextra     gnu-r
r-cran-lattice          gnu-r
r-cran-lme4             gnu-r
r-cran-lmtest           gnu-r
r-cran-lpsolve          gnu-r
r-cran-mapdata          gnu-r
r-cran-maps             gnu-r
r-cran-matchit          gnu-r
r-cran-matrix           gnu-r
r-cran-mcmcpack         gnu-r
r-cran-mgcv             gnu-r
r-cran-misc3d           gnu-r
r-cran-mnormt           gnu-r
r-cran-mnp              gnu-r
r-cran-multcomp         gnu-r
r-cran-mvtnorm          gnu-r
r-cran-nlme             gnu-r
r-cran-nws              gnu-r
r-cran-plotrix          gnu-r
r-cran-polspline        gnu-r
r-cran-pscl             gnu-r
r-cran-psy              gnu-r
r-cran-qtl              gnu-r
r-cran-quadprog         gnu-r
r-cran-rcmdr            gnu-r
r-cran-rcolorbrewer     gnu-r
r-cran-rcpp             gnu-r
r-cran-relimp           gnu-r
r-cran-rggobi           gnu-r
r-cran-rgl              gnu-r
r-cran-rglpk            gnu-r
r-cran-rgtk2            gnu-r
r-cran-rjava            gnu-r
r-cran-rmetrics         gnu-r
r-cran-rmpi             gnu-r
r-cran-rmysql           gnu-r
r-cran-robustbase       gnu-r
r-cran-rocr             gnu-r
r-cran-rodbc            gnu-r
r-cran-rpart            gnu-r
r-cran-rpvm             gnu-r
r-cran-rquantlib        gnu-r
r-cran-rserve           gnu-r
r-cran-rsprng           gnu-r
r-cran-runit            gnu-r
r-cran-sandwich         gnu-r
r-cran-sm               gnu-r
r-cran-sn               gnu-r
r-cran-snow             gnu-r
r-cran-strucchange      gnu-r
r-cran-survival         gnu-r
r-cran-timedate         gnu-r
r-cran-timeseries       gnu-r
r-cran-tkrplot          gnu-r
r-cran-tseries          gnu-r
r-cran-urca

Re: [Rd] surprising behaviour of names<-

2009-03-15 Thread Berwin A Turlach
G'day Wacek,

On Sun, 15 Mar 2009 21:01:33 +0100
Wacek Kusnierczyk  wrote:

> Berwin A Turlach wrote:
> >
> > Obviously, assuming that R really executes 
> > *tmp* <- x
> > x <- "names<-"('*tmp*', value=c("a","b"))
> > under the hood, in the C code, then *tmp* does not end up in the
> > symbol table and does not persist beyond the execution of 
> > names(x) <- c("a","b")
> >
> >   
> 
> to prove that i take you seriously, i have peeked into the code, and
> found that indeed there is a temporary binding for *tmp* made behind
> the scenes -- sort of. unfortunately, it is not done carefully enough
> to avoid possible interference with the user's code:
> 
> '*tmp*' = 0
> `*tmp*`
> # 0
> 
> x = 1
> names(x) = 'foo'
> `*tmp*`
> # error: object "*tmp*" not found
> 
> `*ugly*`

I agree, and I am a bit flabbergasted.  I had not expected that
something like this would happen and I am indeed not aware of anything
in the documentation that warns about this; but others may prove me
wrong on this.

> given that `*tmp*`is a perfectly legal (though some would say
> 'non-standard') name, it would be good if somewhere here a warning
> were issued -- perhaps where i assign to `*tmp*`, because `*tmp*` is
> not just any non-standard name, but one that is 'obviously' used
> under the hood to perform black magic.

Now I wonder whether there are any other objects (with non-standard
names) that can be nuked by operations performed under the hood.

I guess the best thing is to stay away from non-standard names, if only
to save the typing of back-ticks. :)
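
[Editorial note: for reference, the R Language Definition's section on subset assignment sketches the replacement-function expansion, including a final rm() step, which is what makes a user-level `*tmp*` vanish:

```r
x <- 1
## names(x) <- "foo" is described as roughly equivalent to:
`*tmp*` <- x
x <- `names<-`(`*tmp*`, value = "foo")
rm(`*tmp*`)   # removes whatever binding is named *tmp* -- including yours
```

The real mechanism lives in C rather than expanding to this R code, but the rm() in the documented sketch matches the observed behaviour above.]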

Thanks for letting me know, I have learned something new today.

Cheers,

Berwin

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel