[Rd] Rmpi on Fedora 8
A yum update to lam 7.1.4 (from 7.1.2) broke Rmpi for me this last week,
and quite a few changes were needed to repair this, so I'm reporting here
in case it helps others. This was an x86_64 system - adjust 'lib64'
suitably for 32-bit systems. There seem to me to be major organizational
changes for a 'patchlevel' update to a setup that previously worked out
of the box. They almost certainly apply to Fedora 9 too.

- yum left some lam 7.1.2 RPMs behind, and I have been unable to remove
  them via yum. This causes some confusion.

- The lam libs are in /usr/lib64/lam/lib, and ldconfig needs to be told
  about this, so

      cat > /etc/ld.so.conf.d/lam.ld.conf
      /usr/lib64/lam/lib
      ^D
      /sbin/ldconfig

  (AFAIR, the previous version was in /usr/lib64/lam, and installed an
  ld.so.conf.d file. Make sure /usr/lib64/lam is not in the ldconfig
  path.)

- At this point Rmpi may load and then immediately terminate R as the lam
  helpfile is not found (which is not nice of the lam libs). You may need
  to export LAMHOME=/usr/lib64/lam (an R-level alternative is sketched
  after this message). Even if the helpfile is found, it still terminates
  R if lamd is not running. (As I recall, previous RPM installations had
  run lamboot at system boot.)

- The final step is to start a lam configuration. I was only able to do
  this by setting -prefix, e.g.

      /usr/lib64/lam/bin/lamboot -prefix /usr/lib64/lam

--
Brian D. Ripley, [EMAIL PROTECTED]
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford,            Tel: +44 1865 272861 (self)
1 South Parks Road,                   +44 1865 272866 (PA)
Oxford OX1 3TG, UK               Fax: +44 1865 272595
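If you would rather keep the LAMHOME tweak inside R, a minimal sketch
(assuming the /usr/lib64/lam layout above, and that lamboot has already
been run as in the last step):

    ## set LAMHOME before Rmpi is loaded; the path is the Fedora one above
    Sys.setenv(LAMHOME = "/usr/lib64/lam")
    library(Rmpi)
    mpi.universe.size()   # quick check that the lam universe is reachable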
[Rd] importing explicitly declared missing values in read.spss (foreign)
There is a problem when importing an SPSS file containing explicitly
declared missing values into R using the read.spss function from the
foreign package. I'm not sure these problems are the same in every
version of SPSS; I am using the latest version, 16.0.2. I included
http://www.nabble.com/file/p18776776/missingdata.sav missingdata.sav and
http://www.nabble.com/file/p18776776/frequencies.jpg frequencies.jpg as
an example.

The data contain 3 types of missing data: 2 are explicitly declared as
missing values ('8' = NA and '9' = NAP); the third type are the system
missings. When this file is imported into R, only the system missings are
recognized as missing values; the others are just imported as levels in
the nominal case, and as (labeled) real values 8 and 9 in the continuous
case. There are also no attributes in the object returned by read.spss
that contain information about which values/levels are the missing
values; their missingness seems to be completely ignored by the function.

Is there some way, or some other function, to import SPSS files with an
option that replaces all missing values with NA's in R? Of course this
comes with the trade-off of losing the meaning of the missingness when
there are multiple types of missingness, but I think this is far less
harmful than treating all missing values as normal values.

> mydata <- read.spss("c:/users/jeroen/desktop/missingdata.sav",
+                     to.data.frame=T)
Warning messages:
1: In read.spss("c:/users/jeroen/desktop/missingdata.sav", to.data.frame = T) :
   c:/users/jeroen/desktop/missingdata.sav: File-indicated character
   representation code (1252) looks like a Windows codepage
2: In read.spss("c:/users/jeroen/desktop/missingdata.sav", to.data.frame = T) :
   c:/users/jeroen/desktop/missingdata.sav: Unrecognized record type 7,
   subtype 16 encountered in system file
3: In read.spss("c:/users/jeroen/desktop/missingdata.sav", to.data.frame = T) :
   c:/users/jeroen/desktop/missingdata.sav: Unrecognized record type 7,
   subtype 20 encountered in system file
> mydata
   SUBJECT CATEGORI CONTINUO
1        1      yes     3.11
2        2      yes     2.10
3        3      yes     5.34
4        4      yes     1.54
5        5      yes     3.89
6        6       no     2.98
7        7       no     4.53
8        8       no     1.98
9        9       no     3.68
10      10       no     2.94
11      11       NA     8.00
12      12       NA     8.00
13      13       NA     8.00
14      14       NA     8.00
15      15       NA     8.00
16      16      NAP     9.00
17      17      NAP     9.00
18      18      NAP     9.00
19      19      NAP     9.00
20      20      NAP     9.00
21      21     <NA>       NA
22      22     <NA>       NA
23      23     <NA>       NA
24      24     <NA>       NA
25      25     <NA>       NA
> is.na(mydata$CONTINUO)
 [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
     FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE
      TRUE  TRUE  TRUE
> is.na(mydata$CATEGORI)
 [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
     FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE
      TRUE  TRUE  TRUE
> summary(mydata)
    SUBJECT   CATEGORI    CONTINUO
 Min.   : 1   yes :5   Min.   :1.540
 1st Qu.: 7   no  :5   1st Qu.:3.078
 Median :13   NA  :5   Median :6.670
 Mean   :13   NAP :5   Mean   :5.854
 3rd Qu.:19   NA's:5   3rd Qu.:8.250
 Max.   :25            Max.   :9.000
                       NA's   :5.000
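A workaround, pending direct support in read.spss: recode the declared
missing values after import. A minimal sketch, assuming the example file
above (the codes 8/9 and the level labels "NA"/"NAP" are specific to it):

    ## recode the explicitly declared missing values to real NAs
    mydata$CONTINUO[mydata$CONTINUO %in% c(8, 9)] <- NA
    mydata$CATEGORI[mydata$CATEGORI %in% c("NA", "NAP")] <- NA
    mydata$CATEGORI <- factor(mydata$CATEGORI)  # drop now-unused levels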
[Rd] 4-int indexing limit of R {Re: [R] allocMatrix limits}
[[Topic diverted from R-help]]

>>>>> "VK" == Vadim Kutsyy <[EMAIL PROTECTED]>
>>>>>     on Fri, 01 Aug 2008 07:35:01 -0700 writes:

    VK> Martin Maechler wrote:
    VK> The problem is in array.c, where allocMatrix checks for
    VK> "if ((double)nrow * (double)ncol > INT_MAX)". But why is int
    VK> used and not long int for indexing? (max int is 2147483647,
    VK> max long int is 9223372036854775807)

    >> Well, Brian gave you all the info:  ( ?Memory-limits )

    VK> exactly, and given that most modern systems used for
    VK> computations (i.e. 64-bit systems) have long int, which is much
    VK> larger than int, I am wondering why long int is not used for
    VK> indexing (I don't think that 4-byte vs 8-byte storage is an
    VK> issue).

Well, fortunately, reasonable compilers have indeed kept 'long' ==
'long int' to mean 32-bit integers ((less reasonable compiler writers
have not, AFAIK: which leads of course to code that no longer compiles
correctly when originally it did)).
But of course you are right that 64-bit integers (typically ==
'long long', and really == 'int64') are very natural on 64-bit
architectures. But see below.

    >> Did you really carefully read ?Memory-limits ??

    VK> Yes, it specifies that a 4-byte int is used for indexing in all
    VK> versions of R, but why? I think 2147483647 elements for a single
    VK> vector is OK, but not as the total number of elements of a
    VK> matrix. I am running out of indexing at a mere 10% memory
    VK> consumption.

If you have too large a numeric matrix, it would be larger than
2^31 * 8 bytes = 2^34 bytes, i.e. 2^34 / 2^20 ~= 16'000 megabytes.
If that is only 10% for you, you'd have around 160 GB of RAM. That's
quite impressive. I agree that it is at least in the "ball park" of what
is available today.

[...]

    VK> PS: I have no problem going and modifying the C code, but I am
    VK> just wondering what the reasons are for having such a
    VK> limitation.

Compatibility, for one: Note that R objects are (pointers to) C structs
that are "well-defined" platform-independently, and I'd say that this
should remain so. Consequently 64-bit ints (or another "longer int")
would have to be there "in R" also on 32-bit platforms. That may well be
feasible, but it would double the size of quite a few objects.

I think what you are implicitly proposing is that we'd want a 64-bit
integer as an R-level type, and that R would use it (and/or coerce to it
from 'int32') for indexing everywhere.

But more importantly, all (or very much) of the currently existing C and
Fortran code (called via .Call(), .C(), .Fortran()) would also have to
be able to deal with the "longer ints".

One of the last times this topic came up (within R-core), we found that
for all the matrix/vector operations, we really would need versions of
BLAS / LAPACK that would also work with these "big" matrices, i.e. such
a BLAS/LAPACK would also have to internally use "longer int" for
indexing. At that point in time, we had decided we would at least wait
to hear about the development of such BLAS/LAPACK libraries.

I am interested to hear other opinions on / get more info about this
topic. I do agree that it would be nice to get over this limit within a
few years.

Martin
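To spell out the arithmetic behind the 16'000-megabyte estimate (plain R,
assuming only the usual 8-byte double):

    2^31 * 8 / 2^20        # 16384 megabytes, i.e. ~16 GB at the index limit
    .Machine$integer.max   # 2147483647 == 2^31 - 1, the INT_MAX in array.c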
Re: [Rd] 4-int indexing limit of R {Re: [R] allocMatrix limits}
Martin Maechler wrote:
> [[Topic diverted from R-help]]
> Well, fortunately, reasonable compilers have indeed kept 'long' ==
> 'long int' to mean 32-bit integers ((less reasonable compiler writers
> have not, AFAIK: which leads of course to code that no longer compiles
> correctly when originally it did)) But of course you are right that
> 64-bit integers (typically == 'long long', and really == 'int64') are
> very natural on 64-bit architectures. But see below.

Well, in 64-bit Ubuntu, /usr/include/limits.h defines:

    /* Minimum and maximum values a `signed long int' can hold.  */
    #  if __WORDSIZE == 64
    #   define LONG_MAX     9223372036854775807L
    #  else
    #   define LONG_MAX     2147483647L
    #  endif
    #  define LONG_MIN      (-LONG_MAX - 1L)

and using simple code to test
(http://home.att.net/~jackklein/c/inttypes.html#int), my desktop, which
is a standard Intel computer, does show:

    Signed long min: -9223372036854775808 max: 9223372036854775807

> If you have too large a numeric matrix, it would be larger than
> 2^31 * 8 bytes = 2^34 bytes, i.e. 2^34 / 2^20 ~= 16'000 megabytes.
> If that is only 10% for you, you'd have around 160 GB of RAM. That's
> quite impressive.

> cat /proc/meminfo | grep MemTotal
MemTotal:     145169248 kB

We have a "smaller" SGI NUMAflex to play with, where the memory can be
increased to 512 GB (the "larger" version doesn't have this
"limitation"). But even with commodity hardware you can easily get
128 GB for a reasonable price (e.g. a Dell PowerEdge R900).

> Note that R objects are (pointers to) C structs that are
> "well-defined" platform-independently, and I'd say that this should
> remain so.

I forgot that R stores a two-dimensional array in a single-dimensional C
array. Now I understand why there is a limitation on the total number of
elements. But this is a big limitation.

> One of the last times this topic came up (within R-core), we found
> that for all the matrix/vector operations, we really would need
> versions of BLAS / LAPACK that would also work with these "big"
> matrices, i.e. such a BLAS/LAPACK would also have to internally use
> "longer int" for indexing. At that point in time, we had decided we
> would at least wait to hear about the development of such BLAS/LAPACK
> libraries.

BLAS supports a two-dimensional matrix definition, so if we stored a
matrix as a two-dimensional object, we would be fine. But then all R
code, as well as all packages, would have to be modified.
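For reference, the same word sizes can be read off from within R without
looking at limits.h; these .Machine fields are filled in when R is built:

    .Machine$sizeof.long      # 8 on LP64 systems (64-bit Linux), 4 on 32-bit
    .Machine$sizeof.longlong  # 8 on common platforms
    .Machine$sizeof.pointer   # 8 under a 64-bit build of R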
Re: [Rd] 4-int indexing limit of R {Re: [R] allocMatrix limits}
> "VK" == Vadim Kutsyy <[EMAIL PROTECTED]> > on Fri, 01 Aug 2008 10:22:43 -0700 writes: VK> Martin Maechler wrote: >> [[Topic diverted from R-help]] >> >> Well, fortunately, reasonable compilers have indeed kept >> 'long' == 'long int' to mean 32-bit integers ((less >> reasonable compiler writers have not, AFAIK: which leads >> of course to code that no longer compiles correctly when >> originally it did)) But of course you are right that >> 64-bit integers (typically == 'long long', and really == >> 'int64') are very natural on 64-bit architectures. But >> see below. ... I wrote complete rubbish, and I am embarrassed ... >> VK> well in 64bit Ubunty, /usr/include/limits.h defines: VK> /* Minimum and maximum values a `signed long int' can hold. */ VK> # if __WORDSIZE == 64 VK> # define LONG_MAX 9223372036854775807L VK> # else VK> # define LONG_MAX 2147483647L VK> # endif VK> # define LONG_MIN (-LONG_MAX - 1L) VK> and using simple code to test VK> (http://home.att.net/~jackklein/c/inttypes.html#int) my desktop, which VK> is standard Intel computer, does show. VK> Signed long min: -9223372036854775808 max: 9223372036854775807 yes. I am really embarrassed. What I was trying to say was that the definition of int / long /... should not change when going from 32bit architecture to 64bit and that the R internal structures consequently should also be the same on 32-bit and 64-bit platforms >> If you have too large a numeric matrix, it would be larger than >> 2^31 * 8 bytes ~= 2^34 / 2^20 ~= 16'000 Megabytes. >> If that is is 10% only for you, you'd have around 160 GB of >> RAM. That's quite a impressive. >> >> cat /proc/meminfo | grep MemTotal VK> MemTotal: 145169248 kB VK> We have "smaller" SGI NUMAflex to play with, where the memory can VK> increased to 512Gb ("larger" version doesn't have this "limitation"). VK> But with even commodity hardware you can easily get 128Gb for reasonable VK> price (i.e. Dell PowerEdge R900) >> Note that R objects are (pointers to) C structs that are >> "well-defined" platform independently, and I'd say that this >> should remain so. >> VK> I forgot that R stores two dimensional array in a single dimensional C VK> array. Now I understand why there is a limitation on total number of VK> elements. But this is a big limitations. Yes, maybe >> One of the last times this topic came up (within R-core), >> we found that for all the matrix/vector operations, >> we really would need versions of BLAS / LAPACK that would also >> work with these "big" matrices, ie. such a BLAS/Lapack would >> also have to internally use "longer int" for indexing. >> At that point in time, we had decied we would at least wait to >> hear about the development of such BLAS/LAPACK libraries VK> BLAS supports two dimensional metrics definition, so if we would store VK> matrix as two dimensional object, we would be fine. But than all R code VK> as well as all packages would have to be modified. exactly. And that was what I meant when I said "Compatibility". But rather than changing the "matrix = colmunwise stored as long vector" paradigm, should rather change from 32-bit indexing to longer one. 
The hope is that we eventually come up with a scheme which would
basically allow us to just recompile all packages. In
src/include/Rinternals.h, we have had the following three lines for
several years now:

    /* type for length of vectors etc */
    typedef int R_len_t; /* will be long later, LONG64 or ssize_t on Win64 */
    #define R_LEN_T_MAX INT_MAX

and you are right that it may be time to experiment a bit more with
replacing 'int' with 'long' (and also the corresponding _MAX setting)
there; and indeed, in the array.c code you cited, we should replace
INT_MAX by R_LEN_T_MAX.

This still does not solve the problem that we'd have to get to a BLAS /
LAPACK version that correctly works with "long indices"... which may (or
may not) be easier than I had thought.

Martin
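For the record, what the R_len_t cap looks like from the R level on a
current 2.x build. A sketch only: the exact error wording differs
between versions, and on a hypothetical future long-vector R these would
instead attempt enormous allocations, so beware:

    try(numeric(2^31))                        # length exceeds R_LEN_T_MAX
    try(matrix(0, nrow = 2^16, ncol = 2^16))  # 2^32 elements trips the
                                              # INT_MAX check in allocMatrix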
Re: [Rd] data.matrix (was sapply(Date, is.numeric))
I've committed a more liberal version to R-devel. (It even handles S4
classes with an as() method.)

On Thu, 31 Jul 2008, Martin Maechler wrote:

>>>>> "PBR" == Prof Brian Ripley <[EMAIL PROTECTED]>
>>>>>     on Thu, 31 Jul 2008 08:36:22 +0100 (BST) writes:

    PBR> I've now committed fixes in R-patched and R-devel.

    PBR> There is one consequence: data.matrix() was testing for numeric
    PBR> columns by unlist(lapply(x, is.numeric)) and so incorrectly
    PBR> treating Date and POSIXct columns as numeric (which we had
    PBR> decided they were not). This affects package gvlma.

    PBR> data.matrix() is now working as documented, but as we have an
    PBR> exception for factors, do we also want exceptions for Date and
    PBR> POSIXct?

Yes, that's a good idea, and much in the spirit of data.matrix() as I
have understood it. Note the following from help(data.matrix), where I
think the 'Title' and 'Description' are more liberal (rightly so) than
the 'Details':

>> Convert a Data Frame to a Numeric Matrix
>>
>> Description:
>>
>>      Return the matrix obtained by converting all the variables in a
>>      data frame to numeric mode and then binding them together as the
>>      columns of a matrix.  Factors and ordered factors are replaced by
>>      their internal codes.

[...]

>> Details:
>>
>>      Supplying a data frame with columns which are not numeric, factor
>>      or logical is an error.  A warning is given if any non-factor
>>      column has a class, as then information can be lost.

Do we really have good reasons to give an error if a column is not
numeric (nor of the "exception class")? Couldn't we just
lapply(., as.numeric), and if that doesn't give errors, just "be happy"?

Martin

    PBR> On Wed, 30 Jul 2008, Martin Maechler wrote:

    >>> "BDR" == Prof Brian Ripley <[EMAIL PROTECTED]>
    >>>     on Wed, 30 Jul 2008 13:29:38 +0100 (BST) writes:

    BDR> On Wed, 30 Jul 2008, Martin Maechler wrote:

    >>> "RobMcG" == McGehee, Robert <[EMAIL PROTECTED]>
    >>>     on Tue, 29 Jul 2008 15:40:37 -0400 writes:

    RobMcG> FYI,
    RobMcG> I've tried posting the below message twice to the bug
    RobMcG> tracking system,

    [... r-bugs problems discussed in a separate thread ...]

    RobMcG> R-developers,
    RobMcG> The results below are inconsistent. From the documentation
    RobMcG> for is.numeric, I expect FALSE in both cases.

    >> x <- data.frame(dt=Sys.Date())
    >> is.numeric(x$dt)
    RobMcG> [1] FALSE
    >> sapply(x, is.numeric)
    RobMcG>   dt
    RobMcG> TRUE

    RobMcG> ## Yet, sapply seems aware of the Date class
    >> sapply(x, class)
    RobMcG>     dt
    RobMcG> "Date"

    Yes, thanks a lot, Robert, for the report.

    That *is* a bug somewhere in the .Internal(lapply(...)) C code,
    when S3 dispatch of primitive functions should happen.

    BDR> The bug is in do_is, which uses CHAR(PRINTNAME(CAR(call))),
    BDR> and when called from lapply that gives "FUN", not
    BDR> "is.numeric". The root cause is the following comment

    BDR>     FUN = CADR(args); /* must be unevaluated for use in e.g. bquote */

    BDR> and hence that the function in the *call* passed to do_is can
    BDR> be unevaluated.

    aah! I see.
    >> Here's an R scriptlet exposing a 2nd example:

    >> ### lapply(list, FUN)
    >> ### -- seems to sometimes fail for
    >> ### .Primitive S3-generic functions

    >> (ds <- seq(from=Sys.Date(), by=1, length=4))
    >> ## [1] "2008-07-30" "2008-07-31" "2008-08-01" "2008-08-02"
    >> ll <- list(d=ds)
    >> lapply(list(d=ds), round)
    >> ## -> Error in lapply(list(d = ds), round) : dispatch error

    BDR> And that's a separate issue, in DispatchGroup, which states
    BDR> that arguments have been evaluated (true), but the 'call' from
    BDR> lapply gives the unevaluated arguments and so there is a
    BDR> mismatch.

    yes, I too found that this was a separate issue, the latter one
    being new since version 2.7.0

    BDR> I'm testing fixes for both.

    Excellent!
    Martin

    >> ## or -- related to bug report by Robert McGehee on R-devel,
    >> ## on 2008-07-29:
    >> sapply(list(d=ds), is.numeric)
    >> ## TRUE

    >> ## in spite of
    >> is.numeric(`[[`(ll,1)) ## FALSE, because of is.numeric.Date

    >> ## or
    >> round(`[[`(ll,1))
    >> ## [1] "2008-07-30" "2008-07-31" "2008-08-01" "2008-08-02"

    >> But I'm currently too much tied up with other duties to find and
    >> test a bug-fix.

    >> Martin Maechler, ETH Zurich and R-Core Team
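To make the lapply(., as.numeric) idea above concrete, a rough sketch
(a hypothetical helper, not the version actually committed to R-devel):

    dataMatrixLiberal <- function(x) {
        stopifnot(is.data.frame(x))
        ## factors -> internal codes, Date/POSIXct -> their numeric values
        cols <- lapply(x, as.numeric)
        matrix(unlist(cols), nrow = nrow(x),
               dimnames = list(row.names(x), names(x)))
    }

    ## a Date column is now coerced instead of raising an error:
    dataMatrixLiberal(data.frame(d = Sys.Date() + 0:1, y = c(1.5, 2.5)))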