[Rd] Summer of Code, LLVM, parallelization and R
Hi everybody, I'm currently working towards my Master's degree as a student of Computer Science at the University of Saarbrücken and highly interested in compiler construction, interpretation techniques, optimization, programming languages and more. :) Two professors of my university approached me about an interesting project just a few days ago: Developing an LLVM-based JIT compilation back-end for R. The primary goal would be the generation of parallel / vectorized code, but other ways of increasing performance might be very interesting as well. I've thought a bit about this and am now wondering if this would make sense as a project for Google's Summer of Code program -- I have seen that the R Foundation was accepted as a mentoring organization in 2008 and has applied to be one again this year. I've already taken part in the SoC program thrice (working on Novell's JScript.NET compiler and run-time environment in 2005, writing a debugger for the Ruby programming language in 2006 and working on a detailed specification for the Ruby programming language in 2007) and it has always been a lot of fun and a great experience. One thing that was particularly helpful was getting into contact with the development communities so easily. What do you folks think? Would this be of benefit to the R community? Would it be a good candidate for this year's SoC installment? :) Also, if some thinking in this direction has already been done or if you have any other pointers, please don't hesitate to reply! Thanks a lot in advance! Kind regards, Florian Gross __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Could you please add "time<-" as a generic function in the 'stats' package ?
"JC" == John Chambers on Wed, 11 Mar 2009 19:10:29 -0700 JC> The problems are related to masking objects (in this case ) in JC> the search list, not especially related to methods. JC> JC> It was in order to get around such problems that NAMESPACE JC> was added to JC> R. You should use it, but it applies to evaluating calls JC> to functions JC> in the package, by avoiding the dependency on the order of JC> packages in JC> the search list. To ensure correct results, you need to call a JC> function from your package (i.e., one that is not masked). The JC> computations in the function will see what has been imported JC> into the JC> namespace. JC> JC> For example, if you do the following: JC> JC> 1. add a NAMESPACE file, for example containing: JC> JC> import(stats) JC> import(zoo) JC> exportPattern(^[a-zA-Z]) JC> JC> 2. Do the computations in a function in your package, JC> say doDemo(), JC> with a few show(time()) lines added to print things. JC> JC> 3. With the import(zoo), no need to define as an S3 generic. JC> JC> Then things behave with or without zoo attached, because the JC> computations are defined by your namespace. Thank you for your responses. 'timeSeries' and 'zoo' both have functionality for time series management. Although they have similar concepts, they are intrinsically different; the former package uses S4 classes and the latter S3 classes. Until now both packages have been able to coexist and have been independent from each other. As I mentioned in my previous post, both packages define methods to extract timestamps of their respective classes with the function 'time' . I agree with you that if we had used a function name and its assignment version defined in 'zoo', we should import it from their namespace. But in this case, 'time<-' is the natural extension of a function already present in a base package. 
Until now we defined the S3 generic 'time<-' so that both packages could coexist without needing to import the function from the namespace of the other. But this workaround won't work anymore if we define an S4 generic. We are thus asking the R developers if they could add 'time<-' as a generic in 'stats' because it is the natural extension of an existing function. This will ensure that packages can continue to coexist and remain independent. Best regards, Yohan -- PhD student Swiss Federal Institute of Technology Zurich www.ethz.ch
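For concreteness, the requested change could be as small as the following sketch (an assumption about what a patch might look like, not actual R sources; the default method shown is also hypothetical):

```r
## Hypothetical sketch of a `time<-` generic for stats; not R source code.
`time<-` <- function(x, value) UseMethod("time<-")

## An assumed default method: refuse classes that provide no method, so
## packages (timeSeries, zoo) must register methods for their own classes.
`time<-.default` <- function(x, value) {
  stop("no applicable 'time<-' method for class ", class(x)[1L])
}
```

With such a generic in stats, both 'timeSeries' and 'zoo' could register methods without either package importing from the other.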
[Rd] Assigning to factor[[i]]
I am a bit confused about the semantics of classes, [, and [[. For at least some important built-in classes (factors and dates), both the getter and the setter methods of [ operate on the class, but though the getter method of [[ operates on the class, the setter method operates on the underlying vector. Is this behavior documented? (I haven't found any documentation of it.) Is it intentional? (i.e. is it a bug or a feature?) There are also cases where invalid assignments don't signal an error. A simple example: > fact <- factor(2,levels=2:4)# master copy > f0 <- fact; f0; dput(f0) [1] 2 Levels: 2 3 4 structure(1L, .Label = c("2", "3", "4"), class = "factor") > f0 <- fact; f0[1] <- 3; f0; dput(f0) # use [ setter [1] 3 Levels: 2 3 4 structure(2L, .Label = c("2", "3", "4"), class = "factor") > f0 <- fact; f0[[1]] <- 3L; f0; dput(f0) # use [[ setter [1] 4# ? didn't convert 3 to factor Levels: 2 3 4 structure(3L, .Label = c("2", "3", "4"), class = "factor") # modified underlying vector > f0[1] [1] 4 Levels: 2 3 4 # but result is a valid factor > f0 <- fact; f0[[1]] <- 3; f0; dput(f0) # use [[ setter [1] 4 Levels: 2 3 4 structure(3, .Label = c("2", "3", "4"), class = "factor") # didn't convert to 3L > f0[1] Error in class(y) <- oldClass(x) : adding class "factor" to an invalid object I suppose f0[1] and f0[[1]] fail here because the underlying vector must be integer and not numeric? If so, why didn't assigning to f0[[1]] cause an error? And why didn't printing f0 cause the same error? Here are some more examples. 
Consider fac <- factor(c("b","a","c"),levels=c("b","c","a")) f <- fac; f[1] <- "c"; dput(f) # structure(c(2L, 3L, 2L), .Label = c("b", "c", "a"), class = "factor") OK, implicit conversion of "c" to factor(c) was performed f <- fac; f[1] <- 25; dput(f) # Warning message: # In `[<-.factor`(`*tmp*`, 1, value = 25) : # invalid factor level, NAs generated # structure(c(NA, 3L, 2L), .Label = c("b", "c", "a"), class = "factor") OK, error given for invalid value, which becomes an NA Same thing happens for f[1]<-"foo" So far, so good. Now compare to what happens with fac[[...]] <- ... f <- fac; f[[1]] <- 25; dput(f) # structure(c(25, 3, 2), .Label = c("b", "c", "a"), class = "factor") No error given, but invalid factor generated f <- fac; f[[1]] <- "c"; dput(f) # structure(c("c", "3", "2"), .Label = c("b", "c", "a"), class = "factor") No conversion performed; no error given; invalid factor generated f # [1] <NA> <NA> <NA> # Levels: b c a Prints as though it were factor(c(NA,NA,NA)) with no warning/error f[] # Error in class(y) <- oldClass(x) : # adding class "factor" to an invalid object But f[] gives an error Same error with f[1] and f[[1]] Another interesting case is f[1] <- list(NULL) -- which correctly gives an error -- versus f[[1]] <- list(), which gives no error but results in an f which is not a factor at all: f <- fac; f[[1]]<-list(); class(f); dput(f) [1] "list" list(list(), 3L, 2L) I can see that being able to modify the underlying vector of a classed object directly would be very valuable functionality, but there is an asymmetry here: f[[1]]<- modifies the underlying vector, but f[[1]] accesses the classed vector. Presumably you need to do unclass(f)[[1]] to see the underlying value. But on the other hand, unclass doesn't have a setter (`unclass<-`), so you can't say unclass(f)[[1]] <- ... I have not been able to find documentation of all this in the R Language Definition or in the man page for [/[[, but perhaps I'm looking in the wrong place? 
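Since `unclass<-` does not exist, one workaround (a sketch, not something proposed in the thread) is to unclass into a temporary, modify the underlying vector there, and reattach the class:

```r
fac <- factor(c("b", "a", "c"), levels = c("b", "c", "a"))
tmp <- unclass(fac)   # bare integer codes; the levels attribute is kept
tmp[[1]] <- 2L        # modify the underlying vector directly
f <- structure(tmp, class = "factor")
f[1]                  # a valid factor element again: "c"
```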
-s
[Rd] Conversion and rounding of POSIXct
POSIXct/lt supports fractional seconds (see Sub-second Accuracy section of man page), but there seem to be some inconsistencies in their handling. Converting to POSIXlt and back does not give back the same time for times before the origin: > t0 <- as.POSIXct('1934-01-05 23:59:59.1') > t0 [1] "1934-01-06 00:00:00 EST" # rounding issue, see below > as.POSIXlt(t0) [1] "1934-01-06 00:00:00 EST" > as.POSIXct(as.POSIXlt(t0)) [1] "1934-01-06 00:00:01 EST" # ??? > as.POSIXct(as.POSIXlt(t0)) - t0 Time difference of 1 secs Also, POSIXct always rounds up when printing for times before the origin: > as.POSIXct('1934-01-05 10:10:23') [1] "1934-01-05 10:10:23 EST" > as.POSIXct('1934-01-05 10:10:23.1') [1] "1934-01-05 10:10:24 EST" and always rounds down when printing times after the origin: as.POSIXct('2010-01-05 23:59:59.4') [1] "2010-01-05 23:59:59 EST" > as.POSIXct('2010-01-05 23:59:59.6') [1] "2010-01-05 23:59:59 EST" > as.POSIXct('2010-01-05 23:59:59.999') [1] "2010-01-05 23:59:59 EST" But the Description section says that POSIXct "represent[s] calendar dates and times (to the nearest second)". "Nearest" would seem to imply printing rounding-to-nearest, not rounding-up or rounding-down.
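A plausible mechanical explanation (an inference, not something stated in the thread): POSIXct stores seconds relative to the 1970-01-01 epoch, so pre-epoch times are negative numbers, and dropping the fractional part by truncation toward zero looks like rounding up for negative values and rounding down for positive ones:

```r
t_pre  <- as.POSIXct("1934-01-05 10:10:23.1", tz = "EST")
t_post <- as.POSIXct("2010-01-05 23:59:59.6", tz = "EST")
unclass(t_pre)   # negative: seconds before the 1970 epoch
unclass(t_post)  # positive: seconds after the epoch
## Truncation toward zero, as in a C integer conversion:
trunc(-3.1)      # -3, i.e. the apparent "round up" before the epoch
trunc(3.6)       #  3, i.e. the apparent "round down" after it
```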
Re: [Rd] Conversion and rounding of POSIXct
Stavros, Two really quick comments: a) you need to enable sub-second print formats b) AFAIK pre-epoch times are second-class citizens R> options("digits.secs"=6) ## print with 6 digits for microseconds R> t0 <- as.POSIXct('1974-01-05 23:59:59.1') R> t0 [1] "1974-01-05 23:59:59.1 CST" R> as.POSIXlt(t0) [1] "1974-01-05 23:59:59.1 CST" R> as.POSIXct(as.POSIXlt(t0)) - t0 Time difference of 0 secs All that said, POSIXt is still under-documented and rather mysterious so I won't / can't comment on all aspects of your post but the above should shed some light on the first few items. Hth, Dirk -- Three out of two people have difficulties with fractions.
Re: [Rd] Conversion and rounding of POSIXct
On Sun, Mar 15, 2009 at 1:04 PM, Dirk Eddelbuettel wrote: Dirk, Thanks for your reply. > a) you need to enable sub-second print formats Yes, if I want to display sub-second printing. But I was just looking at the rounding behavior. > b) AFAIK pre-epoch times are second-class citizens In what sense? That bugs in their handling won't be fixed? If so, it would be nice to document that. Thanks again, -s
Re: [Rd] Error compiling rgl package
On 12/03/2009 3:16 PM, Mohammad Nikseresht wrote: Hi, I receive the following error while I try to install the rgl package: CC -xtarget=native64 -I/opt/R-2.8.1/lib/R/include -I/opt/SUNWhpc/HPC8.1/sun/include -DHAVE_PNG_H -I/usr/include/libpng12 -DHAVE_FREETYPE -Iext/ftgl -I/usr/sfw/include/freetype2 -I/usr/sfw/include -Iext -I/opt/SUNWhpc/HPC8.1/sun/include -I/usr/sfw/include -I/opt/csw/include -KPIC -O -c Background.cpp -o Background.o "math.h", line 47: Error: modf is not a member of file level. "math.h", line 48: Error: modff is not a member of file level. "Shape.hpp", line 58: Error: The function "strncpy" must have a prototype. 3 Error(s) detected. I am using Sun Studio 12. I suspect that this is an incompatibility between g++ and Sun Studio CC. I would appreciate it if you could share your experience with me. Brian Ripley contributed some patches that should help with this. Could you check out the source from R-forge, and confirm that it now compiles on your system? (Or wait for the tarball there to be updated to 0.84-1 in a few hours, and download that.) Thanks Brian, for the patch. Duncan Murdoch
Re: [Rd] Could you please add "time<-" as a generic function in the 'stats' package ?
I understand the problem and wasn't voting either way on the S3 replacement function generic you want in stats. Prof. Ripley noted that it's odd to have stats doing that to solve the problems of two outside packages when it doesn't even have the function concerned, but others may have opinions either way. It's certainly not a good precedent. Every time a package writes an S3 generic version of a function in a base package (or, in this case, not in a base package), should the base package convert its function to an S3 generic? (This problem is one reason for the S4 implicit generic idea, so that methods can be written compatibly for existing functions.) Your request requires writing and documenting the new function, so at least you should provide a patch that can be inserted without adding more work for R-core. But that was not my main point. The point is that such problems with name conflicts arise in many ways--I agree that they arise especially easily when one package uses S4 and another S3 methods with the same named function. It can and does arise anyway, e.g., the two versions of gam() noted in my book (p. 26). The general solution is to have a namespace for your package and to ensure that it imports only what you want. Then the results are independent of packages attached, _provided_ the user is calling a function from your package. Users calling both packages from the global environment may have to be specific as to which version they want, say by using the "::" operator. This is a consequence (a deficiency if you like) of the classic S and R rule of using the first version of a function encountered. It's possible that the evaluator in the future could be more sophisticated and recognize the situation of compatible S3 and S4 functions, but it won't be for 2.9.0. The addition to stats won't help unless/until zoo and any other package with a replacement version of time() removes that function. 
John Yohan Chalabi wrote: > "JC" == John Chambers > on Wed, 11 Mar 2009 19:10:29 -0700 > > >JC> The problems are related to masking objects (in this case ) in >JC> the search list, not especially related to methods. >JC> >JC> It was in order to get around such problems that NAMESPACE >JC> was added to >JC> R. You should use it, but it applies to evaluating calls >JC> to functions >JC> in the package, by avoiding the dependency on the order of >JC> packages in >JC> the search list. To ensure correct results, you need to call a >JC> function from your package (i.e., one that is not masked). The >JC> computations in the function will see what has been imported >JC> into the >JC> namespace. >JC> >JC> For example, if you do the following: >JC> >JC> 1. add a NAMESPACE file, for example containing: >JC> >JC> import(stats) >JC> import(zoo) >JC> exportPattern(^[a-zA-Z]) >JC> >JC> 2. Do the computations in a function in your package, >JC> say doDemo(), >JC> with a few show(time()) lines added to print things. >JC> >JC> 3. With the import(zoo), no need to define as an S3 generic. >JC> >JC> Then things behave with or without zoo attached, because the >JC> computations are defined by your namespace. > > > Thank you for your responses. > > 'timeSeries' and 'zoo' both have functionality for time series > management. Although they have similar concepts, they are intrinsically > different; the former package uses S4 classes and the latter S3 classes. > > Until now both packages have been able to coexist and have been > independent from each other. > > As I mentioned in my previous post, both packages define methods to > extract timestamps of their respective classes with the function > 'time' . > > I agree with you that if we had used a function name and its > assignment version defined in 'zoo', we should import it from their > namespace. But in this case, 'time<-' is the natural extension of a > function already present in a base package. > That wasn't my point. 
It was only your demo that required importing zoo into your dummy package. > Until now we defined the S3 generic 'time<-' so that both packages > could coexist without needing to import the function from the > namespace of the other. But this workaround won't work anymore if we > define an S4 generic. > > We are thus asking the R developers if they could add 'time<-' as a > generic in 'stats' because it is the natural extension of an existing > function. This will ensure that packages can continue to coexist and > remain independent. > > Best regards, > Yohan
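The "::" disambiguation mentioned above can be illustrated with stats alone (zoo appears only in a comment, since it may not be installed):

```r
x <- stats::ts(1:4, start = 2000)
stats::time(x)   # explicitly stats' time(), whatever else is attached
## With zoo on the search path, zoo::time(z) would likewise force zoo's
## version, independent of search-path order (illustrative only).
```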
[Rd] Definition of [[
The semantics of [ and [[ don't seem to be fully specified in the Reference manual. In particular, I can't find where the following cases are covered: > cc <- c(1); ll <- list(1) > cc[3] [1] NA OK, RefMan says: If i is positive and exceeds length(x) then the corresponding selection is NA. > dput(ll[3]) list(NULL) ? i is positive and exceeds length(x); why isn't this list(NA)? > ll[[3]] Error in list(1)[[3]] : subscript out of bounds ? Why does this return NA for an atomic vector, but give an error for a generic vector? > cc[[3]] <- 34; dput(cc) c(1, NA, 34) OK ll[[3]] <- 34; dput(ll) list(1, NULL, 34) Why is second element NULL, not NA? And why is it OK to set an undefined ll[[3]], but not to get it? I assume that these are features, not bugs, but I can't find documentation for them. -s
Re: [Rd] surprising behaviour of names<-
Berwin A Turlach wrote: > > Obviously, assuming that R really executes > *tmp* <- x > x <- "names<-"('*tmp*', value=c("a","b")) > under the hood, in the C code, then *tmp* does not end up in the symbol > table and does not persist beyond the execution of > names(x) <- c("a","b") > > to prove that i take you seriously, i have peeked into the code, and found that indeed there is a temporary binding for *tmp* made behind the scenes -- sort of. unfortunately, it is not done carefully enough to avoid possible interference with the user's code: '*tmp*' = 0 `*tmp*` # 0 x = 1 names(x) = 'foo' `*tmp*` # error: object "*tmp*" not found given that `*tmp*` is a perfectly legal (though some would say 'non-standard') name, it would be good if somewhere here a warning were issued -- perhaps where i assign to `*tmp*`, because `*tmp*` is not just any non-standard name, but one that is 'obviously' used under the hood to perform black magic. it also appears that the explanation given in, e.g., the r language definition (draft, of course) sec. 3.4.4: " Assignment to subsets of a structure is a special case of a general mechanism for complex assignment: x[3:5] <- 13:15 The result of this command is as if the following had been executed ‘*tmp*‘ <- x x <- "[<-"(‘*tmp*‘, 3:5, value=13:15) " is incomplete (because the final result is not '*tmp*' having the value of x, as it might seem, but rather '*tmp*' having been unbound). so the suggestion for the documenters is to add to the end of the section (or wherever else it is appropriate) a warning to the effect that in the end '*tmp*' will be removed, even if the user has explicitly defined it earlier in the same scope. or maybe have the implementation not rely on a user-forgeable name? 
for example, the '.Last.value' name is automatically bound to the most recently returned value, but it resides in package:base and does not collide with bindings using it made by the user: .Last.value = 0 1 .Last.value # 0, not 1 1 base::.Last.value # 1, not 0 why could not '*tmp*' be bound and unbound outside of the user's namespace? (i guess it's easier to update the docs -- or just ignore the issue.) on the margin, trace('<-') will pick only one of the uses of '<-' suggested by the code above: x <- 1:10 trace('<-') x[3:5] <- 13:15 # trace: x[3:5] <- 13:15 # trace: x <- `[<-`(`*tmp*`, 3:5, value = 13:15) which is somewhat confusing, because then '*tmp*' appears in the trace somewhat ex machina. (again, the explanation is in the source code, but the traceback could have been more informative.) cheers, vQ
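The expansion quoted from the language definition can be reproduced by hand, with the removal step that the documentation omits made explicit:

```r
x <- 1:10
## Equivalent, step by step, to: x[3:5] <- 13:15
`*tmp*` <- x
x <- `[<-`(`*tmp*`, 3:5, value = 13:15)
rm(`*tmp*`)   # the final unbinding the current docs do not mention
x             # 1 2 13 14 15 6 7 8 9 10
```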
Re: [Rd] Definition of [[
On 15/03/2009 2:31 PM, Stavros Macrakis wrote: The semantics of [ and [[ don't seem to be fully specified in the Reference manual. In particular, I can't find where the following cases are covered: cc <- c(1); ll <- list(1) cc[3] [1] NA OK, RefMan says: If i is positive and exceeds length(x) then the corresponding selection is NA. dput(ll[3]) list(NULL) ? i is positive and exceeds length(x); why isn't this list(NA)? Because the sentence you read was talking about "simple vectors", and ll is presumably not a simple vector. So what is a simple vector? That is not explicitly defined, and it probably should be. I think it is "atomic vectors, except those with a class that has a method for [". ll[[3]] Error in list(1)[[3]] : subscript out of bounds ? Why does this return NA for an atomic vector, but give an error for a generic vector? cc[[3]] <- 34; dput(cc) c(1, NA, 34) OK ll[[3]] <- 34; dput(ll) list(1, NULL, 34) Why is second element NULL, not NA? NA is a length 1 atomic vector with a specific type matching the type of c. It makes more sense in this context to put in a NULL, and return a list(NULL) for ll[3]. And why is it OK to set an undefined ll[[3]], but not to get it? Lots of code grows vectors by setting elements beyond the end of them, so whether or not that's a good idea, it's not likely to change. I think an argument could be made that ll[[toobig]] should return NULL rather than trigger an error, but on the other hand, the current behaviour allows the programmer to choose: if you are assuming that a particular element exists, use ll[[element]], and R will tell you when your assumption is wrong. If you aren't sure, use ll[element] and you'll get NA or list(NULL) if the element isn't there. I assume that these are features, not bugs, but I can't find documentation for them. There is more documentation in the man page for Extract, but I think it is incomplete. 
The most complete documentation is of course the source code, but it may not answer the question of what's intentional and what's accidental. Duncan Murdoch
Re: [Rd] Definition of [[
Duncan, Thanks for the reply. On Sun, Mar 15, 2009 at 4:43 PM, Duncan Murdoch wrote: > On 15/03/2009 2:31 PM, Stavros Macrakis wrote: >> dput(ll[3]) >> list(NULL) >> ? i is positive and exceeds length(x); why isn't this list(NA)? > > Because the sentence you read was talking about "simple vectors", and ll is > presumably not a simple vector. So what is a simple vector? That is not > explicitly defined, and it probably should be. I think it is "atomic > vectors, except those with a class that has a method for [". The four subsections of 3.4 Indexing are 3.4.1 Indexing by vectors, 3.4.2 Indexing matrices and arrays, 3.4.3 Indexing other structures, and 3.4.4 Subset assignment, so the context seems to be saying that "simple vectors" are those which are not matrices or arrays, and those ("other structures") which do not overload [. Even if the definition of 'simple vector' were clarified to cover only atomic vectors, I still can't find any text specifying that list(3)[5] => list(NULL). For that matter, it would leave the subscripting of important built-ins such as factors and dates, etc. undefined. Obviously the intuition is that vectors of factors or vectors of dates would do the 'same thing' as vectors of integers or of strings, but 3.4.3 doesn't say what that thing is. >>> ll[[3]] >> >> Error in list(1)[[3]] : subscript out of bounds >> ? Why does this return NA for an atomic vector, but give an error for >> a generic vector? >> >>> cc[[3]] <- 34; dput(cc) >> >> c(1, NA, 34) >> OK >> >> ll[[3]] <- 34; dput(ll) >> list(1, NULL, 34) >> Why is second element NULL, not NA? > > NA is a length 1 atomic vector with a specific type matching the type of c. > It makes more sense in this context to put in a NULL, and return a > list(NULL) for ll[3]. Understood that that's the rationale, but where is it documented? 
Also, if that's the rationale, it seems to say that NULL is the equivalent of NA for list elements, but in fact NULL does not function like NA: > is.na(NULL) logical(0) Warning message: In is.na(NULL) : is.na() applied to non-(list or vector) of type 'NULL' > is.na(list(NULL)) [1] FALSE Indeed, NA seems to both up-convert and down-convert nicely to other forms of NA: > dput(as.integer(as.logical(c(TRUE,NA,TRUE)))) c(1L, NA, 1L) > dput(as.logical(as.integer(c(TRUE,NA,TRUE)))) c(TRUE, NA, TRUE) and are not converted to NULL when converted to generic vector: > dput(as.list(c(TRUE,NA,TRUE))) list(TRUE, NA, TRUE) and NA is preserved when downconverting: > dput(as.logical(as.list(c(TRUE,NA,23)))) c(TRUE, NA, TRUE) But if you try to downconvert NULL, you get an error > dput(as.integer(list(NULL))) Error in isS4(x) : (list) object cannot be coerced to type 'integer' So I don't see why NULL is the right way to represent NA, especially since NULL is a perfectly good list element, distinct from NA. >> And why is it OK to set an undefined ll[[3]], but not to get it? > > Lots of code grows vectors by setting elements beyond the end of them, so > whether or not that's a good idea, it's not likely to change. I wasn't suggesting changing this. > I think an argument could be made that ll[[toobig]] should return NULL > rather than trigger an error, but on the other hand, the current behaviour > allows the programmer to choose: if you are assuming that a particular > element exists, use ll[[element]], and R will tell you when your assumption > is wrong. If you aren't sure, use ll[element] and you'll get NA or > list(NULL) if the element isn't there. Yes, that could make sense, but why would it be true for ll[[toobig]] but not cc[[toobig]]? >> I assume that these are features, not bugs, but I can't find >> documentation for them. > There is more documentation in the man page for Extract, but I think it is > incomplete. 
Yes, I was looking at that man page, and I don't think it resolves any of the above questions. > The most complete documentation is of course the source code, > but it may not answer the question of what's intentional and what's > accidental. Well, that's one issue. But another is that there should be a specification addressed to users, who should not have to understand internals. -s
Re: [Rd] Definition of [[
Stavros Macrakis wrote: > > Well, that's one issue. But another is that there should be a > specification addressed to users, who should not have to understand > internals. > this should really be taken seriously. vQ
[Rd] miscomputation (PR#13594)
Full_Name: Majid Sarmad Version: 2.8.1 OS: Linux / Windows Submission from: (NULL) (194.225.128.135) With thanks to Alberto Viglione: in the HW.tests function of the homtest package, there is the following line V2 <- (sum(ni * ((ti - tauReg)^2 + (t3i - tau3Reg)^2))/sum(ni) )^0.5 which is a mistyping and leads to a miscomputation. It must be V2 <- sum(ni * ((ti - tauReg)^2 + (t3i - tau3Reg)^2)^0.5) /sum(ni) as it is in the help file of the function: V2 = sum[i from 1 to k] ni {(t^(i) - t^R)^2 + (t3^(i) - t3^R)^2}^(1/2) / sum[i from 1 to k] ni Similarly, in V2s[i] <- (sum(ni * ((ti.sim - tauReg.sim)^2 + (t3i.sim - tau3Reg.sim)^2))/sum(ni))^0.5
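The difference between the two formulas is easy to check numerically (variable names follow the report; the values below are made up purely for illustration):

```r
ni  <- c(10, 20, 30)                       # record lengths per site
ti  <- c(0.20, 0.25, 0.22); tauReg  <- 0.22
t3i <- c(0.10, 0.12, 0.11); tau3Reg <- 0.11

## As currently coded (square root taken outside the weighted mean):
V2_wrong <- (sum(ni * ((ti - tauReg)^2 + (t3i - tau3Reg)^2)) / sum(ni))^0.5
## As documented (square root applied to each site's term):
V2_right <- sum(ni * ((ti - tauReg)^2 + (t3i - tau3Reg)^2)^0.5) / sum(ni)
c(V2_wrong, V2_right)   # the two versions generally differ
```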
[Rd] Bug Report Fwd: MANOVA Data (PR#13595)
Hi. There appears to be a bug in R function manova. My friend and I both ran it the same way as shown below (his run) with the shown data set. His results are shown below; we both got the same results. I was running with R 2.3.1. I'm not sure what version he used. Thanks very much, David Booth Kent State University

-Original Message- From: dvdbo...@cs.com To: kb...@ilstu.edu Sent: Sun, 15 Mar 2009 7:01 pm Subject: Re: MANOVA Data Ken, Did you notice that Wilks, Roy, etc. p-values are all the same? Pillai is almost the SAS result. Can't figure it out. I'll submit a bug report. What's Velleman going to talk about? Thanks for looking at the R. Best, Dave

-Original Message- From: Ken Berk To: dvdbo...@cs.com Sent: Sun, 15 Mar 2009 3:45 pm Subject: Re: Fwd: MANOVA Data At 08:07 PM 3/5/2009, you wrote: Hi Ken, I've run the attached data set (a one-way MANOVA example from the SAS manual chapter on MANOVA) in both SAS and R and I don't get the same results. Do you have any suggestions about how I can find out what's going on? Thanks, Dave

-Original Message- From: dvdbo...@cs.com To: dvdbo...@aol.com Sent: Thu, 5 Mar 2009 5:06 pm Subject: MANOVA Data

Hello, David My R results are clearly crap, as shown below. The degrees of freedom are clearly wrong, as is apparent when looking at the univariate anovas. SAS gives the correct answers. I don't know what to do about R. Ken

COUNT REWGRP COMMIT SATIS STAY
 1 1 16 19 18
 2 1 18 15 17
 3 1 18 14 14
 4 1 16 20 10
 5 1 15 13 17
 6 1 12 15 11
 7 2 16 20 13
 8 2 18 14 16
 9 2 13 10 14
10 2 17 13 19
11 2 14 18 15
12 2 19 16 18
13 3 20 18 16
14 3 18 15 19
15 3 13 14 17
16 3 12 16 15
17 3 16 17 18
18 3 14 19 15

> attach(booth)
> Y <- cbind(COMMIT, SATIS, STAY)
> fit <- manova(Y ~ REWGRP)
> summary(fit, test="Pillai")
          Df  Pillai approx F num Df den Df Pr(>F)
REWGRP     1 0.22731  1.37283      3     14 0.2918
Residuals 16
> summary(fit, test="Wilks")
          Df   Wilks approx F num Df den Df Pr(>F)
REWGRP     1 0.77269  1.37283      3     14 0.2918
Residuals 16
> summary(fit, test="Hotelling-Lawley")
          Df Hotelling-Lawley approx F num Df den Df Pr(>F)
REWGRP     1          0.29418  1.37283      3     14 0.2918
Residuals 16
> summary(fit, test="Roy")
          Df     Roy approx F num Df den Df Pr(>F)
REWGRP     1 0.29418  1.37283      3     14 0.2918
Residuals 16
> summary(fit)
          Df  Pillai approx F num Df den Df Pr(>F)
REWGRP     1 0.22731  1.37283      3     14 0.2918
Residuals 16
> summary.aov(fit)
 Response COMMIT :
            Df  Sum Sq Mean Sq F value Pr(>F)
REWGRP       1   0.333   0.333  0.0532 0.8204
Residuals   16 100.167   6.260
 Response SATIS :
            Df  Sum Sq Mean Sq F value Pr(>F)
REWGRP       1   0.750   0.750  0.0945 0.7625
Residuals   16 127.028   7.939
 Response STAY :
            Df Sum Sq Mean Sq F value Pr(>F)
REWGRP       1 14.083  14.083  2.3013 0.1488
Residuals   16 97.917   6.120
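The 1 df reported for REWGRP (instead of the expected k - 1 = 2 for three groups) is the classic symptom of a grouping variable left as numeric; that diagnosis is an inference from the output, not something confirmed in the thread. Coercing REWGRP to a factor reproduces the SAS degrees of freedom:

```r
## Rebuild the data from the post (values transcribed from the table):
booth <- data.frame(
  REWGRP = rep(1:3, each = 6),
  COMMIT = c(16,18,18,16,15,12, 16,18,13,17,14,19, 20,18,13,12,16,14),
  SATIS  = c(19,15,14,20,13,15, 20,14,10,13,18,16, 18,15,14,16,17,19),
  STAY   = c(18,17,14,10,17,11, 13,16,14,19,15,18, 16,19,17,15,18,15)
)
Y <- with(booth, cbind(COMMIT, SATIS, STAY))
fit <- manova(Y ~ factor(REWGRP), data = booth)  # factor(), not numeric
summary(fit, test = "Wilks")  # REWGRP now has 2 df
```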
Re: [Rd] Definition of [[
Just a couple of inline comments down below: On 15/03/2009 5:30 PM, Stavros Macrakis wrote: Duncan, Thanks for the reply. On Sun, Mar 15, 2009 at 4:43 PM, Duncan Murdoch wrote: On 15/03/2009 2:31 PM, Stavros Macrakis wrote: dput(ll[3]) list(NULL) ? i is positive and exceeds length(x); why isn't this list(NA)? Because the sentence you read was talking about "simple vectors", and ll is presumably not a simple vector. So what is a simple vector? That is not explicitly defined, and it probably should be. I think it is "atomic vectors, except those with a class that has a method for [". The four subsections of 3.4 Indexing are 3.4.1 Indexing by vectors, 3.4.2 Indexing matrices and arrays, 3.4.3 Indexing other structures, and 3.4.4 Subset assignment, so the context seems to be saying that "simple vectors" are those which are not matrices or arrays, and those ("other structures") which do not overload [. Even if the definition of 'simple vector' were clarified to cover only atomic vectors, I still can't find any text specifying that list(3)[5] => list(NULL). For that matter, it would leave the subscripting of important built-ins such as factors and dates, etc. undefined. Obviously the intuition is that vectors of factors or vectors of dates would do the 'same thing' as vectors of integers or of strings, but 3.4.3 doesn't say what that thing is. ll[[3]] Error in list(1)[[3]] : subscript out of bounds ? Why does this return NA for an atomic vector, but give an error for a generic vector? cc[[3]] <- 34; dput(cc) c(1, NA, 34) OK ll[[3]] <- 34; dput(ll) list(1, NULL, 34) Why is second element NULL, not NA? NA is a length 1 atomic vector with a specific type matching the type of c. It makes more sense in this context to put in a NULL, and return a list(NULL) for ll[3]. Understood that that's the rationale, but where is it documented? 
Also, if that's the rationale, it seems to say that NULL is the equivalent of NA for list elements, but in fact NULL does not function like NA:

> is.na(NULL)
logical(0)
Warning message:
In is.na(NULL) : is.na() applied to non-(list or vector) of type 'NULL'
> is.na(list(NULL))
[1] FALSE

Indeed, NA seems to both up-convert and down-convert nicely to other forms of NA:

> dput(as.integer(as.logical(c(TRUE,NA,TRUE))))
c(1L, NA, 1L)
> dput(as.logical(as.integer(c(TRUE,NA,TRUE))))
c(TRUE, NA, TRUE)

and is not converted to NULL when converted to a generic vector:

> dput(as.list(c(TRUE,NA,TRUE)))
list(TRUE, NA, TRUE)

and NA is preserved when down-converting:

> dput(as.logical(as.list(c(TRUE,NA,23))))
c(TRUE, NA, TRUE)

But if you try to down-convert NULL, you get an error:

> dput(as.integer(list(NULL)))
Error in isS4(x) : (list) object cannot be coerced to type 'integer'

So I don't see why NULL is the right way to represent NA, especially since NULL is a perfectly good list element, distinct from NA.

And why is it OK to set an undefined ll[[3]], but not to get it?

Lots of code grows vectors by setting elements beyond the end of them, so whether or not that's a good idea, it's not likely to change.

I wasn't suggesting changing this.

I think an argument could be made that ll[[toobig]] should return NULL rather than trigger an error, but on the other hand, the current behaviour allows the programmer to choose: if you are assuming that a particular element exists, use ll[[element]], and R will tell you when your assumption is wrong. If you aren't sure, use ll[element] and you'll get NA or list(NULL) if the element isn't there.

Yes, that could make sense, but why would it be true for ll[[toobig]] but not cc[[toobig]]?

But it is:

> cc <- c(1)
> cc[[3]]
Error in cc[[3]] : subscript out of bounds

I assume that these are features, not bugs, but I can't find documentation for them.

There is more documentation in the man page for Extract, but I think it is incomplete.
Yes, I was looking at that man page, and I don't think it resolves any of the above questions.

The most complete documentation is of course the source code, but it may not answer the question of what's intentional and what's accidental.

Well, that's one issue. But another is that there should be a specification addressed to users, who should not have to understand internals.

I agree, but not so strongly that I will drop everything and write one.

Duncan Murdoch
[Rd] Using 'eval' and environments with active bindings
The following code produces an error in current R-devel:

f <- function(value) {
    if(!missing(value)) 100 else 2
}
e <- new.env()
makeActiveBinding("x", f, e)
eval(substitute(list(x)), e)

The error, after calling 'eval', is

Error in eval(expr, envir, enclos) :
  element 1 is empty;
   the part of the args list of 'list' being evaluated was:
   (x)

It has something to do with the change to R_isMissing in revision r48118, but I'm not quite knowledgeable enough to understand what the problem is. In R 2.8.1 the result was simply

> eval(substitute(list(x)), e)
[[1]]
[1] 2

I can't say I know what the output should be, but I'd like some clarification on whether this is a bug.

Thanks,
-roger

--
Roger D. Peng | http://www.biostat.jhsph.edu/~rpeng/
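For readers unfamiliar with active bindings, here is a sketch of their semantics in isolation (this is not the reported bug itself, which concerns how 'eval' and missing-argument detection interact with such bindings in R-devel):

```r
## An active binding ties a symbol to a function: reads call it with no
## argument, writes call it with the assigned value.
f <- function(value) if (!missing(value)) 100 else 2
e <- new.env()
makeActiveBinding("x", f, e)

stopifnot(get("x", envir = e) == 2)   # a read calls f() -> 2
assign("x", 5, envir = e)             # a write calls f(5); result is discarded
stopifnot(get("x", envir = e) == 2)   # reads re-evaluate f() every time
stopifnot(eval(quote(x), e) == 2)     # a plain symbol lookup via eval also works
```

The reported failure arises only when the symbol is evaluated inside a call, as in `eval(substitute(list(x)), e)`, where the argument-missingness check apparently misfires.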
[Rd] [OT] Debian now has a new section 'gnu-r'
Joerg Jaspert, one of the ftpmasters / archive maintainers within Debian, today posted a new list of 'Sections' to debian-devel-announce (see e.g. here: http://www.nabble.com/forum/ViewPost.jtp?post=22524830&framed=y ). This now includes a new Section:

    gnu-r    Everything about GNU R, a statistical computation and graphics system

which gives R just about the same footing Perl and Python had -- a new section in the archive (and Ruby, Java, Haskell, OCaml, PHP got the same treatment). I think none of the 'R-within-Debian' maintainers saw this coming.

For the record, the current list of R packages within Debian is included below.

Cheers, Dirk

r-base-dev            gnu-r
r-base-core           gnu-r
r-base-core-ra        gnu-r
r-base                gnu-r
r-cran-abind          gnu-r
r-cran-acepack        gnu-r
r-cran-adapt          gnu-r
r-cran-bayesm         gnu-r
r-cran-bitops         gnu-r
r-cran-boot           gnu-r
r-cran-cairodevice    gnu-r
r-cran-car            gnu-r
r-cran-catools        gnu-r
r-cran-chron          gnu-r
r-cran-cluster        gnu-r
r-cran-coda           gnu-r
r-cran-codetools      gnu-r
r-cran-combinat       gnu-r
r-cran-date           gnu-r
r-cran-dbi            gnu-r
r-cran-design         gnu-r
r-cran-eco            gnu-r
r-cran-effects        gnu-r
r-cran-farma          gnu-r
r-cran-fasianoptions  gnu-r
r-cran-fassets        gnu-r
r-cran-fbasics        gnu-r
r-cran-fbonds         gnu-r
r-cran-fcalendar      gnu-r
r-cran-fcopulae       gnu-r
r-cran-fecofin        gnu-r
r-cran-fexoticoptions gnu-r
r-cran-fextremes      gnu-r
r-cran-fgarch         gnu-r
r-cran-fimport        gnu-r
r-cran-fmultivar      gnu-r
r-cran-fnonlinear     gnu-r
r-cran-foptions       gnu-r
r-cran-foreign        gnu-r
r-cran-fportfolio     gnu-r
r-cran-fregression    gnu-r
r-cran-fseries        gnu-r
r-cran-ftrading       gnu-r
r-cran-funitroots     gnu-r
r-cran-futilities     gnu-r
r-cran-gdata          gnu-r
r-cran-getopt         gnu-r
r-cran-gmaps          gnu-r
r-cran-gmodels        gnu-r
r-cran-gplots         gnu-r
r-cran-gregmisc       gnu-r
r-cran-gtools         gnu-r
r-cran-hdf5           gnu-r
r-cran-hmisc          gnu-r
r-cran-its            gnu-r
r-cran-jit            gnu-r
r-cran-kernsmooth     gnu-r
r-cran-latticeextra   gnu-r
r-cran-lattice        gnu-r
r-cran-lme4           gnu-r
r-cran-lmtest         gnu-r
r-cran-lpsolve        gnu-r
r-cran-mapdata        gnu-r
r-cran-maps           gnu-r
r-cran-matchit        gnu-r
r-cran-matrix         gnu-r
r-cran-mcmcpack       gnu-r
r-cran-mgcv           gnu-r
r-cran-misc3d         gnu-r
r-cran-mnormt         gnu-r
r-cran-mnp            gnu-r
r-cran-multcomp       gnu-r
r-cran-mvtnorm        gnu-r
r-cran-nlme           gnu-r
r-cran-nws            gnu-r
r-cran-plotrix        gnu-r
r-cran-polspline      gnu-r
r-cran-pscl           gnu-r
r-cran-psy            gnu-r
r-cran-qtl            gnu-r
r-cran-quadprog       gnu-r
r-cran-rcmdr          gnu-r
r-cran-rcolorbrewer   gnu-r
r-cran-rcpp           gnu-r
r-cran-relimp         gnu-r
r-cran-rggobi         gnu-r
r-cran-rgl            gnu-r
r-cran-rglpk          gnu-r
r-cran-rgtk2          gnu-r
r-cran-rjava          gnu-r
r-cran-rmetrics       gnu-r
r-cran-rmpi           gnu-r
r-cran-rmysql         gnu-r
r-cran-robustbase     gnu-r
r-cran-rocr           gnu-r
r-cran-rodbc          gnu-r
r-cran-rpart          gnu-r
r-cran-rpvm           gnu-r
r-cran-rquantlib      gnu-r
r-cran-rserve         gnu-r
r-cran-rsprng         gnu-r
r-cran-runit          gnu-r
r-cran-sandwich       gnu-r
r-cran-sm             gnu-r
r-cran-sn             gnu-r
r-cran-snow           gnu-r
r-cran-strucchange    gnu-r
r-cran-survival       gnu-r
r-cran-timedate       gnu-r
r-cran-timeseries     gnu-r
r-cran-tkrplot        gnu-r
r-cran-tseries        gnu-r
r-cran-urca           gnu-r
Re: [Rd] surprising behaviour of names<-
G'day Wacek,

On Sun, 15 Mar 2009 21:01:33 +0100 Wacek Kusnierczyk wrote:

> Berwin A Turlach wrote:
> >
> > Obviously, assuming that R really executes
> >     *tmp* <- x
> >     x <- "names<-"('*tmp*', value=c("a","b"))
> > under the hood, in the C code, then *tmp* does not end up in the
> > symbol table and does not persist beyond the execution of
> >     names(x) <- c("a","b")
>
> to prove that i take you seriously, i have peeked into the code, and
> found that indeed there is a temporary binding for *tmp* made behind
> the scenes -- sort of. unfortunately, it is not done carefully enough
> to avoid possible interference with the user's code:
>
>     '*tmp*' = 0
>     `*tmp*`
>     # 0
>
>     x = 1
>     names(x) = 'foo'
>     `*tmp*`
>     # error: object "*tmp*" not found
>
> `*ugly*`

I agree, and I am a bit flabbergasted. I had not expected that something like this would happen, and I am indeed not aware of anything in the documentation that warns about this; but others may prove me wrong on this.

> given that `*tmp*` is a perfectly legal (though some would say
> 'non-standard') name, it would be good if somewhere here a warning
> were issued -- perhaps where i assign to `*tmp*`, because `*tmp*` is
> not just any non-standard name, but one that is 'obviously' used
> under the hood to perform black magic.

Now I wonder whether there are any other objects (with non-standard names) that can be nuked by operations performed under the hood. I guess the best thing is to stay away from non-standard names, if only to save the typing of back-ticks. :)

Thanks for letting me know, I have learned something new today.

Cheers,
Berwin
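The mechanism being discussed is the desugaring of replacement-function calls described in the R Language Definition. Written out by hand, and then demonstrating the interference reported in this thread (run at top level; behaviour may differ in later R versions):

```r
## The documented desugaring: names(x) <- value is evaluated roughly as
x <- 1
`*tmp*` <- x
x <- `names<-`(`*tmp*`, value = "foo")
rm(`*tmp*`)
stopifnot(identical(names(x), "foo"))

## Which is why a user-level `*tmp*` binding gets removed as a side effect:
assign("*tmp*", 0)
y <- 1
names(y) <- "bar"          # internally creates and then removes `*tmp*`
stopifnot(!exists("*tmp*", inherits = FALSE))   # the user's `*tmp*` is gone
```

This makes concrete why `*tmp*`, although a syntactically legal name, is effectively reserved for R's internal use in complex assignments.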