Re: [Rd] source bug ? (PR#7929)
On Fri, 10 Jun 2005 [EMAIL PROTECTED] wrote: > hello bug fixers > > i think this bug is probably mentioned before but i couldn't find the > answer to it on google. > i have to build R from the source files ( on a mac os x system ) > because i need the python R interface to work. for this to happen R > needs to be configured in the following way > > ./configure --enable-R-shlib > The R for Mac OS X FAQ tells you how to configure. You do need the --with-blas='-framework vecLib' --with-lapack flags that it lists. -thomas __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Citation for R
On Mon, 13 Jun 2005, Gordon K Smyth wrote: > This is just a note that R would get a lot more citations if the > recommended citation was an article in a recognised journal or from a > recognised publisher. > This is unfortunately true, but R is *not* an article or a book, it is a piece of software. I don't think I'm the only person who thinks it is counterproductive in the long run to encourage users to cite an article that they probably haven't read instead of citing the software they actually used. Jan's suggestion of the Journal of Statistical Software might provide a solution, since JSS *does* publish software. -thomas __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] (no subject)
Online registration for DSC 2005 in Seattle, August 13-14 is now open. See the conference web page at http://depts.washington.edu//dsc2005/ -thomas Thomas Lumley Assoc. Professor, Biostatistics [EMAIL PROTECTED] University of Washington, Seattle __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] DSC 2005
Persons conversant with W3C standards would have noticed too many // in the URL quoted in my previous message. The correct conference page URL is http://depts.washington.edu/dsc2005 -thomas On Mon, 13 Jun 2005, Thomas Lumley wrote: > > Online registration for DSC 2005 in Seattle, August 13-14 is now open. See > the conference web page at > http://depts.washington.edu//dsc2005/ > Thomas Lumley Assoc. Professor, Biostatistics [EMAIL PROTECTED] University of Washington, Seattle __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Open device -> glibc 2.3.4 bug for Redhat Enterprise 4?
This was supposed to be fixed in 2.1.1 -- which version are you using? -thomas On Tue, 21 Jun 2005, Martin Maechler wrote: > We have been using Redhat Enterprise 4, on some of our Linux > clients for a while, > and Christoph has just found that opening an R device for a file > without write permission gives a bad glibc error and subsequent > seg.fault: > >> postscript("/blabla.ps") > *** glibc detected *** double free or corruption (!prev): 0x01505f10 > *** > > or > >> xfig("/blabla.fig") > *** glibc detected *** double free or corruption (!prev): 0x01505f10 > *** > > and similar for pdf(); > does not happen for jpeg() {which runs via x11}, > nor e.g. for > >> sink("/bla.txt") > > --- > > Happens both on 32-bit (Pentium) and 64-bit (AMD Athlon) > machines with the following libc : > > 32-bit: > -rwxr-xr-x 1 root root 1451681 May 13 00:17 /lib/tls/libc-2.3.4.so* > 64-bit: > -rwxr-xr-x 1 root root 1490956 May 12 23:26 /lib64/tls/libc-2.3.4.so* > > --- > > Can anyone reproduce this problem? > > Regards, > Martin > > __ > R-devel@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-devel > __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Open device -> glibc 2.3.4 bug for Redhat Enterprise 4?
On Tue, 21 Jun 2005, Martin Maechler wrote: >>>>>> "TL" == Thomas Lumley <[EMAIL PROTECTED]> >>>>>> on Tue, 21 Jun 2005 09:59:31 -0700 (PDT) writes: > >TL> This was supposed to be fixed in 2.1.1 -- which version are you using? > > 2.1.1 -- and 2.1.0 and 2.0.0 all showed the problem. > > But thanks, Thomas, looking in "NEWS" of R-devel showed that > there was a fix for this in R-devel only --- too bad it didn't > make it for R 2.1.1. > It's a double free(), so it produces undefined behaviour and anything can happen. On the Mac (where it was first reported), "anything" was a warning message from malloc, and on many systems "anything" is nothing. I thought I had added it to R-patched as well, but obviously not. -thomas __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] segmentation fault after max level of parenthesis (PR#7990)
Already fixed, I believe. -thomas On Mon, 4 Jul 2005 [EMAIL PROTECTED] wrote: > Full_Name: Matthias Laabs > Version: 2.1.0 (source compiled) > OS: debian linux (sarge) > Submission from: (NULL) (195.143.236.2) > > > Hi! > R crashes with a segmentation fault when I use more than 85 parentheses (it > actually happened by accidentally hitting a wrong shortcut in emacs ... ) > > code: > sum(() > > > cheers from Berlin, > > Matthias > > __ > R-devel@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-devel > Thomas Lumley Assoc. Professor, Biostatistics [EMAIL PROTECTED] University of Washington, Seattle __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] R_alloc problems in R v1.11
On Wed, 6 Jul 2005, [iso-8859-1] Marie-Hélène Ouellette wrote:

> Dear Dr. Ripley, or possibly other people on the list. I'm using the R v1.11 on Macintosh

Unlikely. There is no R v1.11, and R 1.1.1 (following the most common misspelling) wasn't available for the Mac.

> and I seem to have a problem with the function R_alloc. It crashes when using the following .C function (only an example):
>
> # include <R.h>
> void Hello(int *n)
> {
>   int i,x;
>   for(i=1;1< *n ; i++)

This is an infinite loop. You probably mean i<*n as the second expression, or perhaps i<=*n (did you want n or n-1 repeats?)

>   {
>     Rprintf('salut!!!\n');

This is invalid C, as your compiler should have told you. You need double quotes.

>   }
>   x = (int *) R_alloc(5,sizeof(int));

x was declared as int, not int *, so again this is invalid.

> }
>
> I call it in R with this line:
> .C('Hello',as.integer(5))
> Any idea why and how I can resolve this problem?

After fixing the C errors, and in a version of R that exists (2.0.1), I get

> dyn.load("hello.so")
> .C('Hello',as.integer(5))
salut!!!
salut!!!
salut!!!
salut!!!
[[1]]
[1] 5

-thomas

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
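For reference, here is hello.c with all three fixes applied (a minimal sketch; the loop is kept starting at i=1, so it prints n-1 times, matching the transcript above):

    #include <R.h>

    void Hello(int *n)
    {
        int i;
        int *x;                               /* was 'int x': an int cannot hold a pointer */

        for (i = 1; i < *n; i++) {            /* 'i < *n', not '1 < *n'; runs n-1 times */
            Rprintf("salut!!!\n");            /* C string literals need double quotes */
        }
        x = (int *) R_alloc(5, sizeof(int));  /* reclaimed by R when the .C call returns */
    }

Compile with R CMD SHLIB hello.c, then dyn.load() and .C() as in the transcript.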
Re: [Rd] Problem with dyn.load...or else...
On Wed, 6 Jul 2005, [iso-8859-1] Marie-Hélène Ouellette wrote: And try to use the function: > K_MEANSR(tab,centers=c(2,4)) [1] "AA" [1] "AAA" [1] "A" [1] "B" Error in .C("K_MEANSC", xrows = as.integer(xrows), xcols = as.integer(xcols), : "C" function name not in load table Hmm. Strange. R doesn't think your C function is called K_MEANSC (I assume that K_MEANSC *is* actually the name of your C function.) In a Terminal window you can use nm K_MEANSC.so to see all the names of externally visible objects in your DLL. This would narrow down whether the change of names is happening in R or in compilation. -thomas __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] R_AllocatePtr
On Tue, 19 Jul 2005, Paul Roebuck wrote: > Had been looking into Luke Tierney's R_AllocatePtr() and > was left with a question about exactly when does R reclaim > heap memory. Implication of 'simpleref.nw' is that one can > allocate C data on the R heap, and as long as pointer object > is alive, the data and pointer will remain valid. But it > calls allocString() which is implemented using R_alloc(). Um, no. allocString is

    SEXP allocString(int length)
    {
        return allocVector(CHARSXP, length);
    }

In fact the reverse is true: R_alloc is implemented using allocString. -thomas Thomas Lumley Assoc. Professor, Biostatistics [EMAIL PROTECTED] University of Washington, Seattle __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Follow-Up: R on FC4
On Mon, 1 Aug 2005, Gavin Simpson wrote: > R-devel compiles without error (same set of flags as above) but fails > make check-all in p-r-random-tests.R - the relevant section of p-r- > random-tests.Rout.fail is: > > ... >> dkwtest("norm") > norm() PASSED > [1] TRUE >> dkwtest("norm",mean = 5,sd = 3) > norm(mean = 5, sd = 3) PASSED > [1] TRUE >> >> dkwtest("gamma",shape = 0.1) > gamma(shape = 0.1) PASSED > [1] TRUE >> dkwtest("gamma",shape = 0.2) > gamma(shape = 0.2) PASSED > [1] TRUE >> dkwtest("gamma",shape = 10) > gamma(shape = 10) FAILED > Error in dkwtest("gamma", shape = 10) : dkwtest failed > Execution halted > > Is this a tolerance setting in R's tests not being met or is this > indicative of a bad compilation of R-Devel on FC4 with gfortran? > Try this again -- it is a random test. If it fails again then something is wrong. The tolerance on these tests is not very tight, certainly nowhere near rounding error. -thomas __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] bug? (PR#8074)
On Wed, 17 Aug 2005 [EMAIL PROTECTED] wrote: > I just don't understand this: > >> (2*2)==4 > [1] TRUE >> .2*.2 > [1] 0.04 >> (.2*.2)==.04 > [1] FALSE It's a FAQ, not a bug. Consider: > (.2*.2) - .04 [1] 6.938894e-18 and read the FAQ -thomas > or > >> x=.04 >> x > [1] 0.04 >> y=.2*.2 >> y > [1] 0.04 >> y==x > [1] FALSE > > ______ > R-devel@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-devel > Thomas Lumley Assoc. Professor, Biostatistics [EMAIL PROTECTED] University of Washington, Seattle __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
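The practical upshot, for anyone hitting the same surprise: compare floating point numbers with a tolerance rather than with ==, e.g.

    > .2 * .2 == .04
    [1] FALSE
    > isTRUE(all.equal(.2 * .2, .04))   # comparison within a numerical tolerance
    [1] TRUE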
Re: [Rd] bug? (PR#8074)
On Tue, 16 Aug 2005, Paul Mosquin wrote: > I guess that I expect R to act pretty much as C or C++ would do if I were to > program the same code. It's a bit of a surprise that assignment of > rationals, well within precision, followed by multiplication leading to a > result well within precision picks up those extra bits along the way. > Something to watch out for, to be sure. But those rationals are *not* well within precision. 0.2 is an infinite repeating binary fraction (in base 16 it is 0.3333...) so it is not stored precisely. 0.04 is also not stored precisely, and it so happens that the error in representing 0.04 is not the same as the error in representing 0.2*0.2. Of course this will still happen in C: R is written in C. For example, on my computer the following C program

---
    #include <stdio.h>

    int main(){
        double d=0.2;
        double dd;
        dd=d*d;
        if(dd==0.04)
            printf("Equal\n");
        else
            printf("Difference=%20.18f\n",0.04-dd);
    }
---

prints

    [al:~] thomas% ./a.out
    Difference=-0.000000000000000007

which happens to agree with the result R gives, though this isn't guaranteed. You simply cannot rely on floating point equality unless you know how the last bit rounding errors are handled. -thomas __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Memory leakage/violation?
I can't reproduce this on R2.2.0dev on Windows XP (in a few hundred tries), or running under Valgrind on AMD64 Linux (in four or five tries). -thomas On Fri, 26 Aug 2005, Henrik Bengtsson wrote: > Hi, > > I've spotted a possible memory leakage/violation in the latest R v2.1.1 > patched and R v2.2.0dev on Windows XP Pro SP2 Eng. > > I first caught it deep down in a nested svd algorithm when subtracting a > double 'c' from a integer vector 'a' where both had finite values but > when assigning 'a <- a - c' would report NaNs whereas (a-c) alone would > not. Different runs with the identical data would introduce NaNs at > random positions, but not all the time. > > Troubleshooting is after a couple of hours still at v0.5, but here is a > script that generates the strange behavior on the above R setups. I let > the script speak for itself. Note that both the script 'strange.R' and > the data 'strange.RData' is online too, see code below. > > People on other systems (but also on Windows), could you please try it > and see if you can reproduce what I get. > > Cheers > > Henrik > > > # The following was tested on: Windows XP Pro SP2 Eng with > # i) R Version 2.1.1 Patched (2005-08-25) > # ii) R 2.2.0 Under development (unstable) (2005-08-25 r35394M) > > # Start 'R --vanilla' and source() this script, i.e. > # source("http://www.maths.lth.se/help/R/strange.R";) > # If you do not get any errors, retry a few times. > > foo <- function(x) { > print(list( > name=as.character(substitute(x)), > storage.mode=storage.mode(x), > na=any(is.na(x)), > nan=any(is.nan(x)), > inf=any(is.infinite(x)), > ok=all(is.finite(a)) > )) > print(length(x)) > print(summary(x)) > } > > # Load data from a complicated "non-reproducible" algorithm. > # The below errors occur also when data is not > # saved and then reloaded from file. Data was generated in > # R v2.1.1 patched (see above). > if (file.exists("strange.RData")) { > load("strange.RData") > } else { > load(url("http://www.maths.lth.se/help/R/strange.RData";)) > } > > # First glance at data... > foo(a) > foo(c) > > ## $name > ## [1] "a" > ## > ## $storage.mode > ## [1] "integer" > ## > ## $na > ## [1] FALSE > ## > ## $nan > ## [1] FALSE > ## > ## $inf > ## [1] FALSE > ## > ## $ok > ## [1] TRUE > ## > ## [1] 15000 > ##Min. 1st Qu. MedianMean 3rd Qu.Max. > ##41.051.063.0 292.2 111.0 65170.0 > ## $name > ## [1] "c" > ## > ## $storage.mode > ## [1] "double" > ## > ## $na > ## [1] FALSE > ## > ## $nan > ## [1] FALSE > ## > ## $inf > ## [1] FALSE > ## > ## $ok > ## [1] TRUE > ## > ## [1] 1 > ##Min. 1st Qu. MedianMean 3rd Qu.Max. > ## 53.43 53.43 53.43 53.43 53.43 53.43 > ## > > # But, trying the following, will result in > # no-reproducible error messages. Sometimes > # it errors at kk==1, sometimes at kk >> 1. > # Also, look at the different output for > # different kk:s. > for (kk in 1:100) { > cat("kk=",kk, "\n") > print(summary(a-c)) > } > > ## kk= 1 > ## Min. 1st Qu. MedianMean 3rd Qu.Max. > ## -7.741e+307 -2.431e+00 9.569e+00 5.757e+01 > ## kk= 2 > ##Min. 1st Qu.Median Mean 3rd Qu. Max. > ## -12.430-2.431 9.569 238.70057.570 65120.000 > ## kk= 3 > ##Min. 1st Qu.Median Mean 3rd Qu. Max. > ## -12.430-2.431 9.569 57.570 65120.000 > ## kk= 4 > ##Min. 1st Qu.Median Mean 3rd Qu. Max. > ## -12.430-2.431 9.569 238.70057.570 65120.000 > ## kk= 5 > ##Min. 1st Qu.Median Mean 3rd Qu. Max. 
> ## -12.430-2.431 9.569 238.70057.570 65120.000 > ## kk= 6 > ## Error in quantile.default(object) : missing values and NaN's > ## not allowed if 'na.rm' is FALSE > > > ## Comments: If you shorten down 'a', the bug occurs less frequently. > > __ > R-devel@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-devel > Thomas Lumley Assoc. Professor, Biostatistics [EMAIL PROTECTED] University of Washington, Seattle __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Compile warning in unique.c
On Mon, 29 Aug 2005, Harris, Michael (NIH/NCI) [E] wrote: > > I am getting a compile warning when building R from source. I am building > on a AMD64 Opteron system with gcc (GCC) 3.3.3 (SuSE Linux) > > The warning is: > >unique.c: In function `cshash': > > unique.c:1146: warning: cast from pointer to integer of different size > The comment immediately above this suggests that it is deliberate /* Use hashing to improve object.size. Here we want equal CHARSXPs, not equal contents. This only uses the bottom 32 bits of the pointer, but for now that's almost certainly OK */ The warning is presumably because casting this int back to a pointer would fail (and is a common 32 to 64bit conversion error), but that's not what is happening here. -thomas __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
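For anyone who wants the same trick without the warning, the usual idiom is to go through uintptr_t (a sketch of the idea only, not the actual unique.c code):

    #include <stdint.h>

    /* Hash on the low 32 bits of a pointer.  The truncation is harmless
       here because the result is only ever used as a hash input and is
       never cast back to a pointer. */
    static unsigned int ptr_hash_bits(void *p)
    {
        return (unsigned int) (uintptr_t) p;
    }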
Re: [Rd] .Call and Segmentation Fault
On Tue, 31 Aug 2005, Peter Dalgaard wrote: > Well, did you try running under a debugger? > > R -d gdb > > then "bt" after the segfault. (Make sure that things are compiled with > -g option) It would also be worth renaming the function -- while I don't see exactly how it could be causing the problem, I would be nervous about having a C function called main() that wasn't the real main(). -thomas __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] 64 bit R for Windows
On Fri, 2 Sep 2005, Martin Maechler wrote: >> "PD" == Peter Dalgaard <[EMAIL PROTECTED]> >> on 02 Sep 2005 18:48:24 +0200 writes: > >PD> "Milton Lopez" <[EMAIL PROTECTED]> writes: > >>> I appreciate the update. We will consider using Linux, >>> which leads me to one more question: what is the maximum >>> RAM that R can use on each platform (Linux and Windows)? >>> >>> Thanks again for your prompt responses. > >PD> On Win32, something like 3GB. Maybe a little more on >PD> Linux32, but there's a physical limit at 4GB. > > for a *single* object, yes. However (and Peter knows this > probably better than me ..), R's workspace can be very much > larger which makes it realistically possible to start *using* R > functions on objects of around 4GB. No, no. On *Windows* there is an address space limit of about 3Gb (and on other 32bit systems) On a 64bit system the limit is that a vector can't have length greater than 2^31, but this would be 8Gb for integers or 16Gb for doubles and so represents larger objects than you would want to handle on most current 64-bit systems. -thomas __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
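The arithmetic behind those figures, for anyone checking: a maximal vector of 2^31 elements works out to

    > 2^31 * 4 / 2^30    # integer vector, 4 bytes per element
    [1] 8
    > 2^31 * 8 / 2^30    # double vector, 8 bytes per element
    [1] 16

i.e. 8Gb and 16Gb respectively.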
Re: [Rd] .Call with C and Fortran together (PR#8122)
> > In some machines I don't get the segmentation fault problem, but I don't get > the > message "Just a simple test" either (when using "cg" as the subroutine's > name). > I believe this is a bug in R because if I change my C interface again to return > a > 0 instead of a R_NilValue, and then use it with another C program which loads > the > dynamic library and call the function simple_program(), everything works > perfectly. > I don't think it is an R bug. I think it is because there is already a Fortran function called cg in R. The fact that changing the name matters suggests that you have a linking problem, and this turns out to be the case. When I try running your code under gdb in R as Peter Dalgaard suggested (after editing it to use R's macros for calling Fortran from C instead of "cfortran.h", which I don't have), I get > .Call("simple_program") Calling the function... Program received signal SIGSEGV, Segmentation fault. 0x081604e5 in cg_ (nm=0x9e5dda4, n=0xbfefccfc, ar=0xbfefcce8, ai=0x89a826, wr=0x9e5dda4, wi=0x9790cc0, matz=0x56090a0, zr=0x80992d4, zi=0x0, fv1=0x0, fv2=0x9e745f8, fv3=0x89a810, ierr=0x706d6973) at eigen.f:3416 3416 IERR = 10 * N Current language: auto; currently fortran That is, your program is calling the Fortran subroutine CG in eigen.f, rather than your CG. There should be some set of linker flags that makes sure your definition of CG is used, but I don't know what it would be (and it's probably very platform dependent). -thomas __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Issue tracking in packages [was: Re: [R] change in read.spss, package foreign?]
On Fri, 9 Sep 2005, Gabor Grothendieck wrote: > > I personally put NEWS, WISHLIST and THANKS files in the 'inst' > directory of all my source packages. This has the effect of copying them to > the > top level of the built version so that they are accessible from R via: > I'm not sure that WISHLIST and THANKS need to be available to people who haven't installed the package. NEWS, on the other hand, really does. One option (if it doesn't turn out to be too much work for the CRAN maintainers) would be to have an optional Changelog field in the DESCRIPTION file giving the relative path to the file. This would mean that maintainers would not all have to switch to the same format. eg for foreign Changelog: ChangeLog and for survey Changelog: inst/NEWS This might be enough to make it easy for CRAN to display these when the maintainer provides them. -thomas __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Issue tracking in packages [was: Re: [R] change in read.spss, package foreign?]
On Fri, 9 Sep 2005, Gabor Grothendieck wrote: > How about if there were just a standard location and name such as inst/NEWS, > inst/WISHLIST, inst/THANKS (which has the advantage that they are > automatically > made available in the built package under the current way packages are > built) The problem is that there *isn't* a standard location. As Robert Gentleman has pointed out, if you only maintain two or three packages it isn't too bad to change them to some new layout, but if you are the bioconductor project it gets painful quite quickly. Also, there are good reasons for having NEWS in the top level directory. Nearly everything that isn't an R package does this, because it's a useful standard. -thomas __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Issue tracking in packages [was: Re: [R] change in read.spss, package foreign?]
> > Standard location or a mechanism like the one you describe are both > a similar amount of work (and not much at all); the HTML pages are > generated by perl and I have the parsed DESCRIPTION file there, i.e., > using a fixed name or the value of the Changelog field is basically > the same. > In which case a Changelog entry in DESCRIPTION would be a very nice addition, and would have the advantage of not requiring changes to packages. -thomas __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Issue tracking in packages [was: Re: [R] change in read.spss, package foreign?]
On Sat, 10 Sep 2005, Gabor Grothendieck wrote: > > And one more comment. The DESCRIPTION file does not record the > location or existence of the various subdirectories such as R, man, > exec, etc. If NEWS is to be recorded as a meta data line item in > DESCRIPTION then surely all of these should be too so its symmetric > and they are all on an equal footing (or else none of them > should be, which in fact I think is preferable). > I don't see any advantage in symmetry. The locations of these subdirectories are fixed and I can't see why someone trying to decide whether to install an upgrade needs to know if it has an exec subdirectory before they download the package. I also don't see why THANKS and WISHLIST should need to be visible before you download the package. CRAN does display a URL if one is given, and if these are important they could be at that URL. The changelog, on the other hand, is one piece of information that is really valuable in deciding whether or not to update a package, so it would be worth having it visible on CRAN. Since other coding standards suggest different things for the name and location of this file, a path in DESCRIPTION seems a minimal change. -thomas __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Issue tracking in packages [was: Re: [R] change in, read.spss, package foreign?]
On Sat, 10 Sep 2005, Frank E Harrell Jr wrote: > I would vote for allowing a URL or external file name in in DESCRIPTION, > whose contents could be automatically displayed for the user when > needed. Our changelogs are automatically generated by CVS and are on > the web. Yes, this would be nice. However, a URL facility is already present (and you already use it, and link changelogs to the URL, as do I). -thomas __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Issue tracking in packages
On Sat, 10 Sep 2005, Seth Falcon wrote: > For what its worth, I don't like this idea of adding a ChangeLog field > to the DESCRIPTION file. > > Agreeing upon a standard location for NEWS or CHANGES or some such > seems a more simple solution. As long as the presence of such a file > is *optional*. And if the location really needs to be at the top, > then the build tools could grab it from there as they do the > DESCRIPTION file. We're certainly agreed on its being optional. -thomas __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Issue tracking in packages [was: Re: [R] change in read.spss, package foreign?]
On Sat, 10 Sep 2005, Gabor Grothendieck wrote: > On 9/10/05, Thomas Lumley <[EMAIL PROTECTED]> wrote: >> On Sat, 10 Sep 2005, Gabor Grothendieck wrote: >>> >>> And one more comment. The DESCRIPTION file does not record the >>> location or existence of the various subdirectories such as R, man, >>> exec, etc. If NEWS is to be recorded as a meta data line item in >>> DESCRIPTION then surely all of these should be too so it's symmetric >>> and they are all on an equal footing (or else none of them >>> should be, which in fact I think is preferable). >>> >> >> I don't see any advantage in symmetry. The locations of these > > The present discussion is where the change information may be located > but that is also true of the source and other information. We could > just as easily have a field in the DESCRIPTION that tells the build > where to find the R source. > It's really the same issue. > There are two important differences: 1/ No existing package has its source anywhere other than in the R subdirectory. Existing packages have their change logs in different places and different formats. 2/ Having source code where it will not be found must be an error -- making the source code available to R *cannot* be optional. Making a change log available *must* be optional. -thomas __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Typo [Was: Rd and guillemots]
On Fri, 16 Sep 2005, [EMAIL PROTECTED] wrote: The name of the "continental" quotation mark « is "guillemet". For anyone who is still confused:

Left pointing guillemet « (U+00AB): http://www.mathmlcentral.com/characters/glyphs/LeftGuillemet.html
Left pointing guillemot (Uria aalge): http://www.rspb.org.uk/scotland/action/disaster/index.asp
Right pointing guillemet » (U+00BB): http://www.mathmlcentral.com/characters/glyphs/RightGuillemet.html
Right pointing guillemot (Uria aalge): http://www.yptenc.org.uk/docs/factsheets/animal_facts/guillemot.html

-thomas __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Typo [Was: Rd and guillemots]
On Fri, 16 Sep 2005, Thomas Lumley wrote: On Fri, 16 Sep 2005, [EMAIL PROTECTED] wrote: The name of the "continental" quotation mark « is "guillemet". For anyone who is still confused: It should perhaps be noted that the PostScript name for the Unicode "Left pointing guillemet" is guillemotleft, which explains some of the confusion. There does not seem to be a PostScript name for "Left pointing guillemot". -thomas __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] as.data.frame segfaults on large lists (PR#8141)
Under Valgrind on x86_64 I get ==27405== Access not within mapped region at address 0x33FFEFD8 ==27405==at 0x447045: Rf_substituteList (coerce.c:2003) ==27405== Stack overflow in thread 1: can't grow stack to 0x33FFEF98 -thomas On Sun, 18 Sep 2005, Peter Dalgaard wrote: [EMAIL PROTECTED] writes: Full_Name: Ulrich Poetter Version: 2.1.1 OS: i686-pc-linux-gnu FC2 Submission from: (NULL) (134.147.95.187) as.data.frame() segfaults on lists with very many elements: dfn <- rep(list(rep(0,2)),198000) test <- as.data.frame.list(dfn) Process R segmentation fault at Sun Sep 18 17:06:02 2005 Not for me on FC4. The process size grows to about 180M and the system thrashes badly, but the calculation runs to completion. It's not unlikely that we are ignoring a failed allocation somewhere, but there's not much hope of finding it from the available information. You could try running under gdb and see where things go wrong for you. -- O__ Peter Dalgaard Øster Farimagsgade 5, Entr.B c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918 ~~ - ([EMAIL PROTECTED]) FAX: (+45) 35327907 __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel Thomas Lumley Assoc. Professor, Biostatistics [EMAIL PROTECTED] University of Washington, Seattle__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Subscripting fails if name of element is "" (PR#8161)
On Fri, 30 Sep 2005, "Jens Oehlschlägel" wrote: Dear all, The following shows cases where accessing elements via their name fails (if the name is a string of length zero). This looks deliberate (there is a function NonNullStringMatch that does the matching). I assume this is because there is no other way to indicate that an element has no name. If so, it is a documentation bug -- help(names) and FAQ 7.14 should specify this behaviour. Too late for 2.2.0, unfortunately. -thomas Best regards Jens Oehlschlägel p <- 1:3 names(p) <- c("a","", as.character(NA)) p a 123 for (i in names(p)) + print(p[[i]]) [1] 1 [1] 2 [1] 3 # error 1: vector subsripting with "" fails in second element for (i in names(p)) + print(p[i]) a 1 NA 3 # error 2: print method for list shows no name for second element p <- as.list(p) for (i in names(p)) + print(p[[i]]) [1] 1 [1] 2 [1] 3 # error 3: list subsripting with "" fails in second element for (i in names(p)) + print(p[i]) $a [1] 1 $"NA" NULL $"NA" [1] 3 version _ platform i386-pc-mingw32 arch i386 os mingw32 system i386, mingw32 status major2 minor1.1 year 2005 month06 day 20 language R # -- replication code -- p <- 1:3 names(p) <- c("a","", as.character(NA)) p for (i in names(p)) print(p[[i]]) # error 1: vector subsripting with "" fails in second element for (i in names(p)) print(p[i]) # error 2: print method for list shows no name for second element p <- as.list(p) for (i in names(p)) print(p[[i]]) # error 3: list subsripting with "" fails in second element for (i in names(p)) print(p[i]) -- __________ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel Thomas Lumley Assoc. Professor, Biostatistics [EMAIL PROTECTED] University of Washington, Seattle__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] access to R parse tree for Lisp-style macros?
On Mon, 3 Oct 2005, Duncan Murdoch wrote: > On 10/3/2005 3:25 AM, Andrew Piskorski wrote: >> R folks, I'm curious about possible support for Lisp-style macros in >> R. I'm aware of the "defmacro" support for S-Plus and R discussed >> here: >> >> http://www.biostat.wustl.edu/archives/html/s-news/2002-10/msg00064.html >> >> but that's really just a syntactic short-cut to the run-time use of >> substitute() and eval(), which you could manually put into a function >> yourself if you cared to. (AKA, not at all equivalent to Lisp >> macros.) Well, yes and no. It is a syntactic shortcut using functions, but what it does is manipulate and then evaluate pieces of parse tree. It doesn't have the efficiency under compilation that real macros would, but we don't have compilation. It doesn't have gensyms, but again, R fails to support these in a fairly fundamental way, so they have to be faked using variables with weird random names. I have a long-term plan to add real macros, but not until after Luke Tierney's byte-code compiler is finished. -thomas __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
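For the curious, the substitute()/eval() shortcut the linked post describes boils down to something like this trimmed sketch (the fake-source bookkeeping from the published version is omitted):

    defmacro <- function(..., expr) {
        expr <- substitute(expr)            # capture the macro body, unevaluated
        a <- substitute(list(...))[-1]      # capture the argument list
        nn <- names(a)
        if (is.null(nn)) nn <- rep("", length(a))
        for (i in seq(along = a)) {
            if (nn[i] == "") {              # bare name: turn it into a required formal
                nn[i] <- paste(a[[i]])
                a[[i]] <- substitute(stop(msg),
                                     list(msg = paste(a[[i]], "not supplied")))
            }
        }
        names(a) <- nn
        ff <- eval(substitute(function() {
            tmp <- substitute(body)         # splice the actual arguments into the body
            eval(tmp, parent.frame())       # ...and evaluate in the caller's frame
        }, list(body = expr)))
        formals(ff) <- as.list(a)           # attach the captured argument list
        ff
    }

    ## e.g. a "macro" that modifies its argument in the caller's frame,
    ## which an ordinary function could not do:
    setna <- defmacro(df, var, values,
                      expr = {df$var[df$var %in% values] <- NA})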
Re: [Rd] 8 char labels in read.spss
On Tue, 11 Oct 2005, Knut Krueger wrote: > I was wondering why it is possible to read long labels from the CVS > files but not from the SPSS files. The SPSS file formats are not documented, and so we rely on the code from PSPP. At the time, PSPP did not read long variable names. It now does, so it would be possible for someone update the SPSS-reading code to handle long variable names. This is much more complicated than just changing a #define; the long variable names are stored in a different part of the file. I don't expect anyone on R-core to get around to this any time soon. If you want to try, the current PSPP code is at http://savannah.gnu.org/projects/pspp -thomas > I did not have much time to search for the code but I found: > > in foreign_0.8-10 source file var.h.in > >> /* Definition of the max length of a short string value, generally >>eight characters. */ >> #define MAX_SHORT_STRING ((SIZEOF_DOUBLE)>=8 ? ((SIZEOF_DOUBLE)+1)/2*2 >> : 8) >> #define MIN_LONG_STRING (MAX_SHORT_STRING+1) >> >> /* FYI: It is a bad situation if sizeof(R_flt64) < MAX_SHORT_STRING: >>then short string missing values can be truncated in system files >>because there's only room for as many characters as can fit in a >>R_flt64. */ >> #if MAX_SHORT_STRING > 8 >> #error MAX_SHORT_STRING must be less than 8. >> #endif > > I am am right then there was a restriction in the year 2000 because the > files are from the year 2000. > > Now there are some questions: > Did I found the right code? > is it possible that anybody could recompile this with long value names > or where is the best manual for a quick start in compiling packages. > > > I found a couple of weeks before a tread where anybody wrote a complete > way for building packages. > He described all problems of him and there were a lot of hints for the > first steps, but I am not able to find it again - I don't know the > search terms which I used before :-( > > > with regards > Knut > > __ > R-devel@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-devel > Thomas Lumley Assoc. Professor, Biostatistics [EMAIL PROTECTED] University of Washington, Seattle __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] 8 char labels in read.spss
On Tue, 11 Oct 2005, Knut Krueger wrote: > > I found a definition of the SPSS files. > http://www.wotsit.org/download.asp?f=spssdata > but they recommend to use the spss input/output dll to ensure upward > compatibility > "Well, they would say that, wouldn't they" (Rice-Davies 1963) Unfortunately, that document describes an old version of the file format. It doesn't describe record 7 subtype 13, which is where the long variable names live. -thomas __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] SPSS I/O dll was: 8 char labels in read.spss
On Wed, 12 Oct 2005, Knut Krueger wrote: > Thomas Lumley schrieb: >> On Tue, 11 Oct 2005, Knut Krueger wrote: >>> I found a definition of the SPSS files. >>> http://www.wotsit.org/download.asp?f=spssdata >>> but they recommend to use the spss input/output dll to ensure upward >>> compatbility >>> >> "Well, they would say that, wouldn't they" (Rice-Davis 1963) >> >> Unfortunately, that document is an old file format. It doesn't describe >> record 7 subtype 13, which is where the long variable names live. >> > What about using the SPSS input dll for those R users which would like to use > their old SPSS files. Most universities here have SPSS licences and therefor > the dll is available. > I did not found yet any copyright notice on the developer part of the SPSS > CD. Maybe the DLL is usable with the own program without the SPSS Licence. I > will look for that if using the I/O dll is possible with R. > All necessary C codes are delivered with the developer kit on the SPSS CD. > Yes, but it can't be incorporated in the "foreign" package, which has a GPL license, and in any case wouldn't work off Windows. It's not clear that it would be much easier than using the PSPP code in any case. -thomas __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Why no .Machine$sizeof.double?
On Tue, 18 Oct 2005, Earl F. Glynn wrote: > Why is there a .Machine$sizeof.longdouble but no .Machine$sizeof.double? > sizeof(double) is always 8 and sizeof(int) is always 4, because R requires the IEEE/IEC standard arithmetic types. R will not compile with any other sizes. -thomas __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Why no .Machine$sizeof.double?
On Tue, 18 Oct 2005, Earl F. Glynn wrote: > "Thomas Lumley" <[EMAIL PROTECTED]> wrote in message > news:[EMAIL PROTECTED] >> On Tue, 18 Oct 2005, Earl F. Glynn wrote: >> >>> Why is there a .Machine$sizeof.longdouble but no > .Machine$sizeof.double? >>> >> >> sizeof(double) is always 8 and sizeof(int) is always 4, because R requires >> the IEEE/IEC standard arithmetic types. R will not compile with any other >> sizes. > > But it's a common, recommended software engineering practice to define > mnemonic, named constants. > > If I were to see code like .Machine$sizeof.double * N.COL * N.ROW, I know > that's the number of bytes in a matrix of doubles. If I see code that is 8 * > N.COL * N.ROW, I can guess what "8" means, but I could guess wrong. I wrote > code that looks just like this today because I couldn't find the defined > constant. Will someone else reading my code automatically know what the "8" > means? > But why would you ever want to write either .Machine$sizeof.double * N.COL * N.ROW or 8 * N.COL * N.ROW? If you are doing memory allocation in R then numeric() automatically allocates things of the correct size. If you are doing memory allocation in C then you should use sizeof(double) or, even better, sizeof(*yourpointer). In both cases, the language has the facility to let you work in the correct units without explicit constants, named or unnamed. You would have a stronger case for arguing that there should be typedefs in C so that you didn't need to remember that R's numeric type is double (rather than, er, float?) and its integer type is int (rather than long) -- and in fact we do provide typedef int Sint; in R.h. -thomas __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] [R] unvectorized option for outer()
On Sun, 30 Oct 2005, Jonathan Rougier wrote: > I'm not sure about this. Perhaps I am a dinosaur, but my feeling is > that if people are writing functions in R that might be subject to > simple operations like outer products, then they ought to be writing > vectorised functions! I would agree. How about an oapply() function that does multiway (rather than just two-way) outer products. Basing the name on "apply" would emphasize the similarity to other flexible, not particularly optimized second-order functions. -thomas > Maybe it's not possible to hold this line, and > maybe "outer" is not the right place to draw it, but I think we ought to > respect the "x is a vector" mindset as much as possible in the base > package. As Tony notes, the documentation does try to be clear about > what outer actually does, and how it can be used. > > So I am not a fan of the VECTORIZED argument, and definitely not a fan > of the VECTORIZED=FALSE default. > > Jonathan. > > Gabor Grothendieck wrote: >> If the default were changed to VECTORIZED=FALSE then it would >> still be functionally compatible with what we have now so all existing >> software would continue to run correctly yet would not cause >> problems for the unwary. Existing software would not have to be changed >> to add VECTORIZED=TRUE except for those, presumbly few, cases >> where outer performance is critical. One optimization might be to >> have the default be TRUE if the function is * or perhaps if its specified >> as a single character and FALSE otherwise. >> >> Having used APL I always regarded the original design of outer in R as >> permature performance optimization and this would be a chance to get >> it right. >> >> On 10/28/05, Tony Plate <[EMAIL PROTECTED]> wrote: >> >>> [following on from a thread on R-help, but my post here seems more >>> appropriate to R-devel] >>> >>> Would a patch to make outer() work with non-vectorized functions be >>> considered? It seems to come up moderately often on the list, which >>> probably indicates that many many people get bitten by the same >>> incorrect expectation, despite the documentation and the FAQ entry. It >>> looks pretty simple to modify outer() appropriately: one extra function >>> argument and an if-then-else clause to call mapply(FUN, ...) instead of >>> calling FUN directly. >>> >>> Here's a function demonstrating this: >>> >>> outer2 <- function (X, Y, FUN = "*", ..., VECTORIZED=TRUE) >>> { >>>no.nx <- is.null(nx <- dimnames(X <- as.array(X))) >>>dX <- dim(X) >>>no.ny <- is.null(ny <- dimnames(Y <- as.array(Y))) >>>dY <- dim(Y) >>>if (is.character(FUN) && FUN == "*") { >>>robj <- as.vector(X) %*% t(as.vector(Y)) >>>dim(robj) <- c(dX, dY) >>>} >>>else { >>>FUN <- match.fun(FUN) >>>Y <- rep(Y, rep.int(length(X), length(Y))) >>>if (length(X) > 0) >>>X <- rep(X, times = ceiling(length(Y)/length(X))) >>>if (VECTORIZED) >>>robj <- FUN(X, Y, ...) 
>>>else >>>robj <- mapply(FUN, X, Y, MoreArgs=list(...)) >>>dim(robj) <- c(dX, dY) >>>} >>>if (no.nx) >>>nx <- vector("list", length(dX)) >>>else if (no.ny) >>>ny <- vector("list", length(dY)) >>>if (!(no.nx && no.ny)) >>>dimnames(robj) <- c(nx, ny) >>>robj >>> } >>> # Some examples >>> f <- function(x, y, p=1) {cat("in f\n"); (x*y)^p} >>> outer2(1:2, 3:5, f, 2) >>> outer2(numeric(0), 3:5, f, 2) >>> outer2(1:2, numeric(0), f, 2) >>> outer2(1:2, 3:5, f, 2, VECTORIZED=F) >>> outer2(numeric(0), 3:5, f, 2, VECTORIZED=F) >>> outer2(1:2, numeric(0), f, 2, VECTORIZED=F) >>> >>> # Output on examples >>> >>>> f <- function(x, y, p=1) {cat("in f\n"); (x*y)^p} >>>> outer2(1:2, 3:5, f, 2) >>> >>> in f >>> [,1] [,2] [,3] >>> [1,]9 16 25 >>> [2,] 36 64 100 >>> >>>> outer2(numeric(0), 3:5, f, 2) >>> >>> in f >>> [,1] [,2] [,3] >>> >>>> outer2(1:2, numeric(0), f, 2) >>> >>> in f >>>
Re: [Rd] [R] unvectorized option for outer()
On Mon, 31 Oct 2005, Liaw, Andy wrote: >> From: Thomas Lumley >> >> On Sun, 30 Oct 2005, Jonathan Rougier wrote: >> >>> I'm not sure about this. Perhaps I am a dinosaur, but my feeling is >>> that if people are writing functions in R that might be subject to >>> simple operations like outer products, then they ought to be writing >>> vectorised functions! >> >> I would agree. How about an oapply() function that does >> multiway (rather >> than just two-way) outer products. Basing the name on "apply" would >> emphasize the similarity to other flexible, not particularly >> optimized >> second-order functions. >> >> -thomas > > I'll toss in my $0.02: The following is my attempt at creating a "general > outer" that works with more than two arguments. > > gouter <- function (x, FUN, ...) { >xgrid <- as.list(do.call("expand.grid", x)) >names(xgrid) <- NULL >array(do.call(deparse(substitute(FUN)), c(xgrid, list(...))), >dim = sapply(x, length), dimnames = x) > } > Yes, that's the sort of thing I had in mind. The name "gouter" isn't exactly to my taste -- the point was that outer() is the fast but restricted function and that the general but slow function should have a different name (eg xapply or oapply). -thomas __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
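A quick illustration of the behaviour that definition gives (assuming the gouter() above; the output is what the code implies):

    > g <- gouter(list(a = 1:2, b = 3:4, c = c(0, 10)),
    +             function(x, y, z) x * y + z)   # any vectorised function
    > g["2", "4", "10"]                          # one cell of the 2 x 2 x 2 result: 2*4 + 10
    [1] 18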
Re: [Rd] [R] unvectorized option for outer()
On Tue, 1 Nov 2005, Duncan Murdoch wrote: > The version I posted yesterday did indeed mess up when some arguments were > unspecified. Here's a revision that seems to work in all the tests I can > think of. I also added the SIMPLIFY and USE.NAMES args from mapply to it, > and a sanity check to the args. > > I did notice and work around one buglet in mapply: if you choose not to > vectorize any arguments, you don't get a call to the original function, > mapply returns "list()". > > For example, > >> mapply(function(x) x^2, MoreArgs = list(x=2)) > list() > > whereas I would think 4 is a more logical answer. > I don't agree at all. The answer should be the length of the longest vectorised argument, and it is. -thomas __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] A problem with glm() and possibly model-using functions in general?
On Fri, 18 Nov 2005, Byron Ellis wrote: > So, consider the following: > > > example(glm) > > g = function(model) { w = runif(9);glm(model,weights=w); } > > g(counts ~ outcome + treatment) > Error in eval(expr, envir, enclos) : object "w" not found > > Huh?! I suspect that somebody is lazily evaluating arguments in the > wrong environment (probably GlobalEnv in this case). I'm willing to > accept the fact that there's some mysterious reason you'd actually > want this behavior, but this looks like it should be filed as a bug > to me. Yes, there is a reason you'd actually want this behaviour, and it is documented. In help(model.frame) it says All the variables in 'formula', 'subset' and in '...' are looked for first in 'data' and then in the environment of 'formula' (see the help for 'formula()' for further details) and collected into a data frame. In your example the environment of 'formula' is the global environment, since that's where it was created. There isn't a set of scoping rules for formulas that will make everyone happy, but this lexical scope is what R has done for quite some time. -thomas __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
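One way to get the behaviour Byron expected is to re-point the formula's environment at the frame where the weights live (a sketch of one idiom; passing everything through a data frame given to glm() is the other common fix):

    g <- function(model) {
        w <- runif(9)
        environment(model) <- environment()  # the formula now sees this function's frame
        glm(model, weights = w)
    }
    g(counts ~ outcome + treatment)          # w is found, per the scoping rules above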
Re: [Rd] [R] computing the variance
> > Using the biased variance just because it is the MLE (if that is the > argument) seems confused to me. However, there's another point: > >> var(sample(1:3, 10, replace=TRUE)) > [1] 0.6680556 > > i.e. if we are considering x as the entire population, then the > variance when sampling from it is indeed 1/N*E(X-EX)^2, which is why > some presentations distinguish between the "population" and "sample" > variances. We might want to support this distinction somehow. > We might also consider that the purpose of computing the variance is often to take the square root, and that using 1/(n-1) as the divisor does not give any particular optimality as an estimator of the standard deviation. -thomas __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
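The two divisors, side by side: var() uses n-1, and the n-divisor "population" version is a one-line rescale.

    x <- sample(1:3, 10, replace = TRUE)
    n <- length(x)
    var(x)                 # sample variance: divisor n - 1
    var(x) * (n - 1) / n   # "population" variance (the MLE): divisor n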
Re: [Rd] 0/1 vector for indexing leads to funny behaviour (PR#8389) (maybe a documentation deficiency?)
On Wed, 14 Dec 2005, Tony Plate wrote: > > That's what I was trying to say: the whole truth is that numeric index > vectors that contain positive integral quantities can also contain > zeros. Upon rereading this passage yet again, I think it is more > misleading than merely incomplete: the phrasings "positive integral > quantities", and "*must* lie in the set ..." rule out the possibility of > the vector containing zeros. > "Someone told me that you can't run without bouncing the ball in basketball. I got a basketball and tried it and it worked fine. He must be wrong" -- a comp.lang.c standard It doesn't rule out the possibility of the vector containing zeros, it tells you that you should not put zeros in the vector. -thomas __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] 2 x 2 chisq.test (PR#8415)
This is the same as PR#8265, which was reported two months ago by someone else from syd.odn.ne.jp. It still isn't a bug. According to Brian Ripley's response at that time, "almost all" the sources he checked gave the correction that R uses. -thomas On Tue, 20 Dec 2005, [EMAIL PROTECTED] wrote: > Full_Name: nobody > Version: 2.2.0 > OS: any > Submission from: (NULL) (219.66.34.183) > > > 2 x 2 table, such as > >> x > [,1] [,2] > [1,] 10 12 > [2,] 11 13 > >> chisq.test(x) > > Pearson's Chi-squared test with Yates' > continuity correction > > data: x > X-squared = 0.0732, df = 1, p-value = 0.7868 > > but, X-squared = 0.0732 is over corrected. > > when abs(a*d-b*c) <= sum(a,b,c,d), chisq.value must be 0!, and P-value must be > 1! > > code of chisq.test must be as follows > > # if (correct && nrow(x) == 2 && ncol(x) == 2) { > # YATES <- 0.5 > # METHOD <- paste(METHOD, "with Yates' continuity correction") > # } > # else YATES <- 0 > # STATISTIC <- sum((abs(x - E) - YATES)^2/E) > ## replace begin > if (correct && nrow(x) == 2 && ncol(x) == 2) { > STATISTIC <- if (abs(x[1,1]*x[2,2]-x[1,2]*x[2,1]) < sum(x)/2) > 0 > > else sum((abs(x - E) - 0.5)^2/E) > METHOD <- paste(METHOD, "with Yates' continuity correction") > } > else STATISTIC <- sum((abs(x - E))^2/E) > ## replace end > > __ > R-devel@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-devel > Thomas Lumley Assoc. Professor, Biostatistics [EMAIL PROTECTED] University of Washington, Seattle __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Multiplication (PR#8466)
On Fri, 6 Jan 2006, [EMAIL PROTECTED] wrote: > hi - in version 2.1 the command > > >-2^2 > > gives > > -4 > > as the answer. (-2)^2 is evaluated correctly. So is -2^2. The precedence of ^ is higher than that of unary minus. It may be surprising, but it *is* documented and has been in S for a long time. -thomas Thomas Lumley Assoc. Professor, Biostatistics [EMAIL PROTECTED] University of Washington, Seattle __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] prod(numeric(0)) surprise
On Mon, 9 Jan 2006, Martin Morgan wrote: > I guess I have to say yes, I'd expect > > x <- 1:10 > sum(x[x>10]) ==> numeric(0) > > this would be reinforced by recognizing that numeric(0) is not zero, > but nothing. I guess the summation over an empty set is an empty set, > rather than a set containing the number 0. Certainly these > > exp(x[x>10]) ==> numeric(0) > numeric(0) + 1 ==> numeric(0) > There are some fairly simple rules in how R does it. You do need to distinguish between functions (binary operators) that map two vectors of length n to a vector of length n and functions such as prod and sum that map a vector of length n to a vector of length 1. The output of sum and prod is always of length 1, so sum(numeric(0)) and prod(numeric(0)) should be of length 1 (or give an error). It is convenient that sum(c(x,y)) is equal to sum(x)+sum(y) and that prod(c(x,y)) is equal to prod(x)*prod(y), which motivates making sum(numeric(0)) give 0 and prod(numeric(0)) give 1. Single argument functions such as exp(numeric(0)) seem fairly obvious: you have no numbers and you exponentiate them so you still have no numbers. You could also argue based on c() and exp() commuting. The rules for binary operators are a little less tidy [my fault]. They come from the idea that x+1 should always add 1 to each element of x. If you add 1 to each element of numeric(0) you get numeric(0). The usual recycling rule says that the shorter vector should be repeated to make it the same length as the longer vector, so this is a wart. On the other hand, you can't recycle a vector of length 0 to have length 1, so the usual recycling rule can't be applied here. This also makes matrix operations work, at least in the sense of getting matrices of the right dimension. -thomas __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
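The rules above, checked directly:

    > sum(numeric(0))     # 0, so that sum(c(x, y)) == sum(x) + sum(y)
    [1] 0
    > prod(numeric(0))    # 1, so that prod(c(x, y)) == prod(x) * prod(y)
    [1] 1
    > exp(numeric(0))     # one result per element, of which there are none
    numeric(0)
    > numeric(0) + 1      # the length-0 operand wins; no recycling up to length 1
    numeric(0)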
Re: [Rd] Issue with c++ .C call
On Tue, 10 Jan 2006, Dominick Samperi wrote: > Sean, > > prm in your function calcStepgram is NOT a vector of doubles, it is of type > SEXP, and you need to use R macros to fetch the value(s). This is done > automatically in the Rcpp package, and if you want to see how this is > done look at the definition of the class RcppVector in Rcpp.cpp Not at all. He is using .C, which passes a double * to the C function. You may be thinking of .Call -thomas > Dominick > > Sean Davis wrote: >> I am still having some difficulties with connecting R to a C++ function. I >> am able to call the function as expected after compiling the shared library >> and such. However, the call to the function is via .C; parameters from the >> .C call are not being passed correctly to the function. As an example, I >> have attached a GDB run of the code. I set a breakpoint on entry to the >> function I am calling from R. What is bothering me (and probably causing >> the segmentation faults I am seeing) is that the parameter >> prm=as.double(c(3.2,1.1)) is not 3.2,1.1 IMMEDIATELY after the call to .C. >> I am sure I am missing something very basic. >> >> Thanks, >> Sean >> >> >> >>> sessionInfo() >>> >> R version 2.2.0, 2005-08-11, powerpc-apple-darwin7.9.0 >> >> attached base packages: >> [1] "methods" "stats" "graphics" "grDevices" "utils" "datasets" >> [7] "base" >> >> >> >> >> holmes:~/Mercury/projects/R/StepGram sdavis$ R -d gdb >> GNU gdb 6.1-20040303 (Apple version gdb-413) (Wed May 18 10:17:02 GMT 2005) >> Copyright 2004 Free Software Foundation, Inc. >> GDB is free software, covered by the GNU General Public License, and you are >> welcome to change it and/or distribute copies of it under certain >> conditions. >> Type "show copying" to see the conditions. >> There is absolutely no warranty for GDB. Type "show warranty" for details. >> This GDB was configured as "powerpc-apple-darwin"...Reading symbols for >> shared libraries ... done >> >> (gdb) r >> Starting program: >> /Users/sdavis/R-devel2/R.framework/Versions/2.2.0/Resources/bin/exec/R >> Reading symbols for shared libraries ...+ done >> >> R : Copyright 2005, The R Foundation for Statistical Computing >> Version 2.2.0 Under development (unstable) (2005-08-11 r35256) >> ISBN 3-900051-07-0 >> >> R is free software and comes with ABSOLUTELY NO WARRANTY. >> You are welcome to redistribute it under certain conditions. >> Type 'license()' or 'licence()' for distribution details. >> >> R is a collaborative project with many contributors. >> Type 'contributors()' for more information and >> 'citation()' on how to cite R or R packages in publications. >> >> Type 'demo()' for some demos, 'help()' for on-line help, or >> 'help.start()' for a HTML browser interface to help. >> Type 'q()' to quit R. >> >> Reading symbols for shared libraries >> . done >> Reading symbols for shared libraries . done >> Reading symbols for shared libraries . done >> >>> dyn.load('StepGram/src/Stepgram.so') >>> >> Reading symbols for shared libraries .. done >> >>> mySG <- function(dat1,thresh,noise) { >>> >> vec <- c(thresh,noise) >> .C('calcStepgram', >> data=as.double(dat1), >> prm=as.double(vec), >> intervals=double(1*3+1), >> max=as.integer(1), >> n=as.integer(length(dat1)), >> plot=double(length(dat1)))} >> >>> dat <- c(0.01,0.1,-0.2, 0.1,-0.1,-1000,3.2,3.5,-1.3,3.1, >>> >> 3.2,3.1,-0.1,0.2,0.15,-0.05,-0.1,0.2,0.1,-0.1) >> >> Program received signal SIGINT, Interrupt. >> 0x9001f208 in select () >> (gdb) break calcStepgram >> Breakpoint 1 at 0x10bb418: file Stepgram.cpp, line 22. 
>> (gdb) c >> Continuing. >> >>> mySG(dat1=dat,thresh=3.2,noise=1.1) >>> >> >> Breakpoint 1, calcStepgram (data=0x1137048, prm=0x1c81eb0, >> intervals=0xbfffd7e0, max=0x2954840, n=0xbfffd6c0, plot=0x180d574) at >> Stepgram.cpp:22 >> (gdb) print prm[0] >> $1 = 1.7716149411915527e-303<<<<<--This should be 3.2! >> Current language: auto; currently c++ >> >> __ >> R-devel@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-devel >> > > __ > R-devel@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-devel > Thomas Lumley Assoc. Professor, Biostatistics [EMAIL PROTECTED] University of Washington, Seattle __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Issue with c++ .C call
On Tue, 10 Jan 2006, Sean Davis wrote: > and such. However, the call to the function is via .C; parameters from the > .C call are not being passed correctly to the function. As an example, I > have attached a GDB run of the code. I set a breakpoint on entry to the > function I am calling from R. What is bothering me (and probably causing > the segmentation faults I am seeing) is that the parameter > prm=as.double(c(3.2,1.1)) is not 3.2,1.1 IMMEDIATELY after the call to .C. Is this compiled with optimization? If so, you can't conclude much from the gdb info, as the code can be executed in a different order from how it's written. When I use this example

    extern "C" void calcStepgram(double *data, double *prm, double *intervals,
                                 int *max, int *n, double *plot)
    {
        prm[0] = data[0];
        return;
    }

if I compile with -g -O2 (the default) it looks as though there are problems with initialization like the ones you report, but in fact the function works correctly. If I compile without optimization the initialization looks fine. -thomas __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Issue with c++ .C call
On Tue, 10 Jan 2006, Sean Davis wrote: > > Thanks, Thomas. That did fix the initialization issue (or apparent one). > Unfortunately, the reason that I started debugging was for segmentation > faults, which have not gone away. However, it now looks like the problem is > internal to the C++ code and not with the way the arguments were being > passed. If you can get access to a Linux machine then it's worth trying Valgrind, which is very helpful for this sort of thing. -thomas __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
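The invocation (documented in the Writing R Extensions manual) is along the lines of

    R -d valgrind --vanilla < mytest.R

where mytest.R stands for whatever script exercises the crashing code.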
[Rd] [OT] GPL3 draft
Since someone is bound to point this out soon I will note that a) A discussion draft of the proposed GPL version 3 is up at http://gplv3.fsf.org/ b) If you have comments on the draft, send them to the FSF rather than to r-devel -thomas Thomas Lumley Assoc. Professor, Biostatistics [EMAIL PROTECTED] University of Washington, Seattle __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] (PR#8500) standard install on OS 10.3.9 crashes on start without useful diagnostics (PR#8500)
This won't actually help you at all, but I used the standard install on OS 10.3.9 (Powerbook G4) just last week without any problems. On the other hand, my install was on a machine that had previously had other versions of R. -thomas On Wed, 18 Jan 2006, [EMAIL PROTECTED] wrote: > Full_Name: Bob Donahue > Version: 2.2 > OS: Mac OS 10.3.9 > Submission from: (NULL) (204.152.13.26) > > > That's pretty much it. I did the most basic install possible, tried running > the > package through the GUI and on the command line, it crashed hard, immediately, > with absolutely NO useful information as to WHY it crashed. > > To reproduce: > 1) get the installer for OS X > 2) install in the default places > 3) run > 4) watch it crash with no useful diagnostics > > __ > R-devel@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-devel > Thomas Lumley Assoc. Professor, Biostatistics [EMAIL PROTECTED] University of Washington, Seattle __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Minumum memory requirements to run R.
On Mon, 23 Jan 2006, Hin-Tak Leung wrote: > Prof Brian Ripley wrote: >> We know: we even document it in the appropriate places. > > I went and had a look - it is the last section of R-admin (and of > course, for those who "read the source", R/include/Rinternals.h). It > would be good to mention this in the FAQ (where it doesn't appear, or maybe I > didn't look hard enough), or at the beginning of R-admin? > It's not in the FAQ because it isn't a FAQ (yet). If you use the PDF manual it is in the table of contents on page i. In the HTML manual it is admittedly less clear: there isn't a table of contents and there is nothing obvious in the index. To some extent this is a problem with all the manuals. The structure in the .texi file isn't translated well to HTML form by the makeinfo tools. -thomas __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Minumum memory requirements to run R.
On Mon, 23 Jan 2006, Hin-Tak Leung wrote: > > The 32-bit/64-bit issue affects purchasing or upgrading decisions > - whether one wants to spend the money on buying cheaper > 32-bit machines, versus more expensive 64-bit machines. That > decision would be based on information available while *not* having > an operational R installation... > Not necessarily. It's perfectly feasible to use a 32-bit build on a 64-bit machine, as it says in the manual, which is available from http://www.r-project.org whether or not you have an R installation. -thomas __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] R support for 64 bit integers
On Tue, 10 Aug 2010, Martin Maechler wrote: {Hijacking the thread from R-help to R-devel -- as I am consciously shifting the focus away from the original question ... } David Winsemius on Tue, 10 Aug 2010 08:42:12 -0400 writes: > On Aug 9, 2010, at 2:45 PM, Theo Tannen wrote: >> Are integers strictly a signed 32 bit number on R even if >> I am running a 64 bit version of R on a x86_64 bit >> machine? >> >> I ask because I have integers stored in a hdf5 file where >> some of the data is 64 bit integers. When I read that >> into R using the hdf5 library it seems any integer >> greater than 2**31 returns NA. > That's the limit. It's hard coded and not affected by the > memory pointer size. >> >> Any solutions? > I have heard of packages that handle "big numbers". A bit > of searching produces suggestions to look at gmp on CRAN > and Rmpfr on R-Forge. Note that Rmpfr has been on CRAN, too, for a while now. If you only need large integers (and rationals), 'gmp' is enough though. *However* note that the gmp or Rmpfr (or any other arbitrary precision) implementation will be considerably slower in usage than if there were native 64-bit integer support. Introducing 64-bit integers natively into "base R" is an "interesting" project, notably if we also allowed using them for indices, and changed the internal structures to use them instead of 32-bit. This would allow us to free ourselves from the increasingly relevant maximum-atomic-object-length = 2^31 problem. The latter is something we have planned to address, possibly for R 3.0. However, for that, using 64-bit integers is just one possibility, another being to use "double precision integers". Personally, I'd prefer the "long long" (64-bit) integers quite a bit, but there are other considerations; e.g., one big challenge will be to get there in a way such that not all R packages using compiled code will have to be patched extensively... another aspect is how the BLAS / LAPACK team will address the problem.

At the moment, all the following are the same type:
 - length of an R vector
 - R integer type
 - C int type
 - Fortran INTEGER type

The last two are fixed at 32 bits (in practice for C, by standard for Fortran), and we would like the first and perhaps the second to become 64-bit. If both the R length type and the R integer type become the same 64-bit type and replace the current integer type, then every compiled package has to change to declare the arguments as int64 (or long, on most 64-bit systems) and INTEGER*8. That should be all that is needed for most code, since C compilers nowadays already complain if you do unclean things like stuffing an int into a pointer. If the R length type changes to something /different/ from the integer type then any compiled code has to be checked to see if C int arguments are lengths or integers, which is more work and more error-prone. On the other hand, changing the integer type to 64-bit will presumably make integer code run noticeably more slowly on 32-bit systems. In both cases, the changes could be postponed by having an option to .C/.Call forcing lengths and integers to be passed as 32-bit. This would mean that the code couldn't use large integers or large vectors, but it would keep working indefinitely. -thomas Thomas Lumley Professor of Biostatistics University of Washington, Seattle __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
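The hard-coded limit in the original question is easy to see at the prompt; anything past .Machine$integer.max overflows to NA, while doubles carry on:

    > .Machine$integer.max
    [1] 2147483647
    > .Machine$integer.max + 1L
    [1] NA
    Warning message:
    In .Machine$integer.max + 1L : NAs produced by integer overflow
    > 2^31    # a double represents this exactly, as "double precision integers" would
    [1] 2147483648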
Re: [Rd] Non identical numerical results from R code vs C/C++ code?
On Fri, 10 Sep 2010, Duncan Murdoch wrote: On 10/09/2010 7:07 AM, Renaud Gaujoux wrote: Thank you Duncan for your reply. Currently I am using 'double' for the computations. What type should I use for extended real in my intermediate computations? I think it depends on compiler details. On some compilers "long double" will get it, but I don't think there's a standard type that works on all compilers. (In fact, the 80 bit type on Intel chips isn't necessarily supported on other hardware.) R defines LDOUBLE in its header files and it is probably best to use that if you want to duplicate R results. As a little more detail, 'long double' is in the C99 standard and seems to be fairly widely implemented, so code using it is likely to compile. The Standard, as usual, doesn't define exactly what type it is, and permits it to be a synonym for 'double', so you may not get any extra precision. On Intel chips it is likely to be the 80-bit type, but the Sparc architecture doesn't have any larger hardware type. Radford Neal has recently reported much slower results on Solaris with long double, consistent with Wikipedia's statement that long double is sometimes a software-implemented 128-bit type on these systems. The result will still be 'double' anyway right? Yes, you do need to return type double. Duncan Murdoch On 10/09/2010 13:00, Duncan Murdoch wrote: On 10/09/2010 6:46 AM, Renaud Gaujoux wrote: Hi, suppose you have two versions of the same algorithm: one in pure R, the other one in C/C++ called via .Call(). Assuming there is no bug in the implementations (i.e. they both do the same thing), is there any well known reason why the C/C++ implementation could return numerical results non identical to the one obtained from the pure R code? (e.g. could it be rounding errors? please explain.) Has anybody had a similar experience? R often uses extended reals (80 bit floating point values on Intel chips) for intermediate values. C compilers may or may not do that. By not identical, I mean very small differences (< 2.4 e-14), but enough to have identical() returning FALSE. Maybe I should not bother, but I want to be sure where the differences come from, at least by mere curiosity. Briefly the R code perform multiple matrix product; the C code is an optimization of those specific products via custom for loops, where entries are not computed in the same order, etc... which improves both memory usage and speed. The result is theoretically the same. Changing the order of operations will often affect rounding. For example, suppose epsilon is the smallest number such that 1 + epsilon is not equal to 1. Then 1 + (epsilon/2) + (epsilon/2) will evaluate to either 1 or 1 + epsilon, depending on the order of computing the additions. Duncan Murdoch Thank you, Renaud ### UNIVERSITY OF CAPE TOWN This e-mail is subject to the UCT ICT policies and e-mail disclaimer published on our website at http://www.uct.ac.za/about/policies/emaildisclaimer/ or obtainable from +27 21 650 4500. This e-mail is intended only for the person(s) to whom it is addressed. If the e-mail has reached you in error, please notify the author. If you are not the intended recipient of the e-mail you may not use, disclose, copy, redirect or print the content. If this e-mail is not related to the business of UCT it is sent by the sender in the sender's individual capacity. 
### ______ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel Thomas Lumley Professor of Biostatistics University of Washington, Seattle __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
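Duncan's rounding example above can be checked directly at the prompt, using the machine epsilon from .Machine; both lines compute 1 + epsilon/2 + epsilon/2 and only the grouping differs:

    > eps <- .Machine$double.eps
    > (1 + eps/2) + eps/2 == 1    # left to right: each half-epsilon rounds away
    [1] TRUE
    > 1 + (eps/2 + eps/2) == 1    # grouped first: the sum survives
    [1] FALSE

This is exactly why a C loop that accumulates in a different order can differ from the R result in the last few bits.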
Re: [Rd] Assignment to a slot in an S4 object in a list seems to violate copy rules?
On Thu, Sep 30, 2010 at 8:15 AM, peter dalgaard wrote: > > On Sep 30, 2010, at 16:19, Niels Richard Hansen wrote: > >> setClass("A", representation(a = "numeric")) >> B <- list() >> myA <- new("A", a = 1) >> B$otherA <- myA >> B$otherA@a <- 2 >> myA@a > > R version 2.12.0 Under development (unstable) (2010-09-13 r52905) > Platform: i386-apple-darwin9.8.0/i386 (32-bit) > > --- not anymore, it seems: --- >> setClass("A", representation(a = "numeric")) > [1] "A" >> B <- list() >> myA <- new("A", a = 1) >> B$otherA <- myA >> B$otherA@a <- 2 >> myA@a > [1] 1 >> sessionInfo() > R version 2.12.0 alpha (2010-09-29 r53067) > Platform: x86_64-apple-darwin10.4.0 (64-bit) > > So somewhere in the last 162 commits, this got caught. Probably r52914, but > it looks like it hasn't been recorded in NEWS (and it should be, as this was > apparently a live bug, not an obscure corner case): > > r52914 | luke | 2010-09-15 19:06:13 +0200 (Wed, 15 Sep 2010) | 4 lines > > Modified applydefine to duplicate if necessary to ensure that the > assignment target in calls to assignment functions via the complex > assignment mechanism always has NAMED == 1. > Yes, that was the one. It was reported as a bug back then too, and there was quite a bit of discussion that ended up with Luke's fix. -thomas -- Thomas Lumley Professor of Biostatistics University of Washington, Seattle __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] DESCRIPTION file and Rd examples
On Mon, Apr 18, 2011 at 2:00 PM, Dario Strbenac wrote: > It turns out what I needed was IRanges in the Imports: field. I assumed that > require(GenomicRanges) at the top of my function would, as a result of > loading GenomicRanges, implicitly already know about all the IRanges classes, > because GenomicRanges itself depends on them. If GenomicRanges just imports IRanges (rather than using library()/require()) then the IRanges functions won't be visible in your package. This is a Good Thing -- you only get the function redefinitions you ask for, rather than everything in the dependency tree. -thomas -- Thomas Lumley Professor of Biostatistics University of Auckland __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
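A hypothetical sketch of the fix Dario describes, for a package that uses IRanges classes directly (the package name is made up); depending on GenomicRanges alone is not enough, because its own imports of IRanges stay private to it:

    ## DESCRIPTION for package 'myranges'
    Imports: GenomicRanges, IRanges

    ## NAMESPACE
    import(GenomicRanges)
    import(IRanges)    # or importFrom(IRanges, ...) for just what you use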
Re: [Rd] Wishlist: write R's bin path to the PATH variable and remove the version string in the installation dir under Windows
On Wed, May 4, 2011 at 3:25 PM, Yihui Xie wrote: > 1. "Few Windows users use these commands" does not imply they are not > useful, and I have no idea how many Windows users really use them. How > do you run "R CMD build" when you build R packages under Windows? You > don't write "C:/Program Files/R/R-2.13.0/bin/i386/R.exe CMD build", do > you? > > I think the reason we have to mess with the PATH variable for each > single software package is that Windows is Not Unix, so you may hate > Windows instead of a package that modifies your PATH variable. > > For the choice of i386 and x64, you can let the user decide which bin > path to use. I believe the number of users who frequently switch back > and forth is fairly small. > > 2. Under most circumstances I just keep the latest version of R. To > maintain R code with old R versions will be more and more difficult > with new features and changes coming in. Disk space is cheap, but time > is not. > I keep old versions for basically the same reasons you don't -- that is, I have analyses that ran under the old versions, and I can be sure they will give the same answer a year later if I keep the old versions. This isn't so much because of changes in R as because of changes in packages. -thomas -- Thomas Lumley Professor of Biostatistics University of Auckland __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] RProfmem output format
On Sat, May 14, 2011 at 5:00 AM, Hadley Wickham wrote: > Hi all, > > When I run the example in RProfmem, I get: > > > Rprofmem("Rprofmem.out", threshold=1000) > example(glm) > Rprofmem(NULL) > noquote(readLines("Rprofmem.out", n=5)) > > ... > > [1] 1384 :5416 :5416 :1064 :1064 :"readRDS" "index.search" "example" > [2] 1064 :"readRDS" "index.search" "example" > [3] 4712 :"readRDS" "index.search" "example" > [4] 4712 :"readRDS" "index.search" "example" > [5] 1064 :"readRDS" "index.search" "example" > > What do the 5 numbers in the first line mean? > > In the subsequent lines I'm assuming the structure is bytes allocated : call. I think the first four numbers come from memory allocations made before example() is called, when the call stack is empty; the fifth goes with the readRDS call. Looking at the code in src/main/memory.c, it prints a newline only when the call stack is not empty. I don't see why this is done, and I may well be the person who did it (I don't have svn on this computer to check), but it is clearly deliberate. -thomas -- Thomas Lumley Professor of Biostatistics University of Auckland __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] RProfmem output format
On Mon, May 16, 2011 at 1:02 AM, Hadley Wickham wrote: > So what causes allocations when the call stack is empty? Something > internal? Does the garbage collector trigger allocations (i.e. could > it be caused by moving data to contiguous memory)? The garbage collector doesn't move anything, it just swaps pointers in a linked list. The lexer, parser, and evaluator all have to do some work before a function context is set up for the top-level function, so I assume that's where it is happening. > Any ideas what the correct thing to do with these memory allocations? > Ignore them because they're not really related to the function they're > attributed to? Sum them up? > >> I don't see why this is done, and I may well be the person who did it >> (I don't have svn on this computer to check), but it is clearly >> deliberate. > > It seems like it would be more consistent to always print a newline, > and then it would be obvious those allocations occurred when the call > stack was empty. This would make parsing the file a little bit > easier. Yes. It's so obviously better to always print a newline, and so clearly deliberate not to, that I suspect there may have been a good reason. If I can't work it out (after my grant deadline this week) I will just assume it's wrong. -thomas -- Thomas Lumley Professor of Biostatistics University of Auckland __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Possible bug in termplot function (stats package) ?
On Tue, Jun 7, 2011 at 7:30 AM, Joris Meys wrote: > On Mon, Jun 6, 2011 at 9:15 PM, peter dalgaard wrote: >> >> I'd say that the burden of proof is really on your side, but how hard can it >> be: >> >>> x <- 1:10 >>> y <- rnorm(10) >>> m <- lm(y~x) >>> m$call >> lm(formula = y ~ x) >>> m$call$data >> NULL >> > I see... indeed, thx for the answer and sorry for my stupidity. Should > have thought about that case. Still, I wonder why it is necessary to > go look for the data in a calling environment if it should be > contained in the model frame of the fitted object. Or am I again wrong > in assuming that's always the case? You are again wrong. Life would be much simpler if the data were always available like that, but there are at least two problems. 1) There need not be a model frame in the fitted object. (it's optional) 2) More importantly, if you have y~sin(x), the model frame will contain sin(x), not x. For what termplot() does, it has to be able to reconstruct 'x', which isn't possible without the original data. It's quite possible that termplot() could be rewritten to be simpler and more general, but I don't think minor editing will do it. -thomas -- Thomas Lumley Professor of Biostatistics University of Auckland __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
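Point (2) above is easy to see directly; the model frame stores the transformed term, not the underlying variable:

    > d <- data.frame(x = 1:10, y = rnorm(10))
    > m <- lm(y ~ sin(x), data = d)
    > names(model.frame(m))
    [1] "y"      "sin(x)"

Since only sin(x) is stored, termplot() must re-evaluate 'x' itself from the original data to lay out the horizontal axis.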
Re: [Rd] .C and .Call: convolve example not working
On Fri, Aug 12, 2011 at 1:06 AM, Lars Wißler wrote: > Dear R users, > > I want to call C code via the .C or .Call interface. This works fine > with integer values but using doubles the array received in C will be > set to zeros. > > I have tried the convolve examples (Writing R extensions, chapter 5.2) > and still the resulting array consists of zeros. > > My code (shortened for my purposes. Original did not work either): > > convolve.r > > a=array(1, c(4)) > b=array(3, c(4)) > > print(a) > print(b) > > .C("convolve", > as.double(a), > as.double(b)) > > > convolve.c > > void convolve(double* a, double* b){ > int i; > > for(i=0;i<4;i++) Rprintf("a: %d", a[i]); > for(i=0;i<4;i++) Rprintf("b: %d", b[i]); > } The C code here is wrong for two reasons. Firstly, there's no declaration for Rprintf(), because you don't include the header file. Secondly, you're using %d to print, which means you are telling the compiler you're passing ints to Rprintf(), but you are actually passing doubles. When I fix these problems the code works for me. -thomas Thomas Lumley Professor of Biostatistics University of Auckland __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
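With those two fixes applied -- include the header that declares Rprintf(), and use a double format -- one corrected version of the C side of the example is:

    #include <R.h>    /* declares Rprintf() */

    void convolve(double *a, double *b)
    {
        int i;
        for (i = 0; i < 4; i++) Rprintf("a: %f\n", a[i]);
        for (i = 0; i < 4; i++) Rprintf("b: %f\n", b[i]);
    }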
[Rd] readChar and blocking connections
If readChar() reads from a connection and the specified number of bytes are not all available, it returns with what it can find. That's as documented (further down the help page) and it's what the low-level recv() does. However, for a blocking connection, wouldn't it be more natural to retry until the requested data are available? -thomas -- Thomas Lumley Professor of Biostatistics University of Auckland [[alternative HTML version deleted]] __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
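In the meantime the retrying behaviour can be sketched at user level (a hypothetical helper, not part of R; it keeps asking until the requested count arrives or the input ends):

    readChar_fully <- function(con, nchars) {
        out <- readChar(con, nchars)
        if (length(out) == 0) return(out)    # immediate end of input
        while (nchar(out) < nchars) {
            more <- readChar(con, nchars - nchar(out))
            if (length(more) == 0 || nchar(more) == 0) break    # EOF or closed
            out <- paste0(out, more)
        }
        out
    }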
Re: [Rd] internal copying in R (soon to be released R-3.1.0
On Mon, Mar 3, 2014 at 2:50 PM, Simon Urbanek wrote: > Jens, > > On Mar 3, 2014, at 3:35 PM, Jens Oehlschlägel < > jens.oehlschlae...@truecluster.com> wrote: > > > > > Well, I did read, for example "Writing R Extensions" (Version 3.1.0 > Under development (2014-02-28)) chapter "5.9.10 Named objects and copying" > which says "Currently all arguments to a .Call call will have NAMED set to > 2, and so users must assume that they need to be duplicated before > alteration." > > Matthew pointed out that line and I cannot shed more light on it, since > it's not true - at least not currently. > > > > This is consistent with the observation of my test code: that NAMED() in > .Call always returns 2. > > It is not - you're not testing .Call() - you're testing the assignments in > frames which cause additional bumps of NAMED. If you actually test .Call() > you'll see what I have reported - .Call() itself does NOT affect NAMED. > > I think that's what R-ext is saying. The context is assignment functions, and the section is explaining that while you might think you can write assignment functions with .Call() that modify their arguments, by the time the object gets into .Call() it will have NAMED==2 because there's the object you're trying to modify and the reference to it inside the assignment function. If you write assignment functions that way, you need to make potentially unsafe assumptions. For example, that you can't get an error halfway up a chain of nested complex assignments when it's too late to back out of the expression. -thomas -- Thomas Lumley Professor of Biostatistics University of Auckland [[alternative HTML version deleted]] __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
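For reference, the complex assignment expansion that creates the second reference, shown schematically (this is the rewriting described in the R Language Definition):

    x <- 1:3
    attr(x, "myattr") <- 2
    ## is evaluated roughly as
    `*tmp*` <- x
    x <- `attr<-`(`*tmp*`, "myattr", value = 2)
    rm(`*tmp*`)
    ## inside attr<-() the value is reachable both as `*tmp*` and as x,
    ## so it arrives with NAMED==2 and modifying it in place would be unsafe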
Re: [Rd] Why did R 3.0's resolveNativeRoutine remove full-search ability?
On Sat, Apr 19, 2014 at 2:29 AM, Simon Urbanek wrote: > Andrew, > > On Apr 18, 2014, at 9:55 AM, Andrew Piskorski wrote: > >> In versions of R prior to 3.0, by default .C and .Call would find the >> requested C function regardless of which shared library it was located >> in. You could use the PACKAGE argument to restrict the search to a >> specific library, but doing so was not necessary for it to work. >> >> R 3.0 introduced a significant change to that behavior; from the NEWS >> file: >> >> CHANGES IN R 3.0.0: >> PERFORMANCE IMPROVEMENTS: >>* A foreign function call (.C() etc) in a package without a PACKAGE >> argument will only look in the first DLL specified in the >> NAMESPACE file of the package rather than searching all loaded >> DLLs. A few packages needed PACKAGE arguments added. >> >> That is not merely a performance improvement, it is a significant >> change in functionality. Now, when R code in my package foo tries to >> call C code located in bar.so, it fails with a "not resolved from >> current namespace (foo)" error. It works if I change all my uses of >> .C and .Call to pass a PACKAGE="bar" argument. Ok, I can make that >> change in my code, no big deal. >> >> What surprises me though, is that there appears to be no way to invoke >> the old (and very conventional Unix-style), "I don't want to specify >> where the function is located, just keep searching until you find it" >> behavior. Is there really no way to do that, and if so, why not? >> >> Comparing the R sources on the 3.1 vs. 2.15 branches, it looks as if >> this is due to some simple changes to resolveNativeRoutine in >> "src/main/dotcode.c". Specifically, the newer code adds this: >> >> errorcall(call, "\"%s\" not resolved from current namespace (%s)", >> buf, ns); >> >> And removes these lines: >> >> /* need to continue if the namespace search failed */ >> *fun = R_FindSymbol(buf, dll.DLLname, symbol); >> if (*fun) return args; >> >> Is that extra call to R_FindSymbol really all that's necessary to >> invoke the old "keep searching" behavior? Would it be a good idea to >> provide an optional way of finding a native routine regardless of >> where it's located, perhaps via an optional PACKAGE=NA argument to .C, >> .Call, etc.? >> >> And now I see that help(".Call") says: >> >> 'PACKAGE = ""' used to be accepted (but was undocumented): it is >>now an error. >> >> I assume passing PACKAGE="" used to invoke the same "keep searching" >> behavior as not passing any PACKAGE argument at all. So apparently >> the removal of functionality was intentional. I'd like to better >> understand why. Why should that be an error? Or said another way, >> why has traditional Unix-style symbol resolution been banned from use >> with .C and .Call ? >> > > I cannot speak for the author, but a very strong argument is to prevent > (symbol) namespace issues. If you cannot even say where the symbol comes > from, you have absolutely no way of knowing that the symbol you get has > anything to do with the symbol you intended to get, because you could get any > random symbol in any shared object that may or may not have anything to do > with your code. Note that even you as the author of the code have no control > over the namespace so although you intended this to work, loading some other > package can break your code - and in a fatal manner since this will typically > lead to a segfault. Do you have any strong use case for allowing this given > how dangerous it is? 
> Ever since symbol registration has been made easy, it's much more efficient and safe to use symbols directly instead. As a follow-up to this, note that with traditional Unix symbol resolution it was forbidden to have two different routines with the same name linked into an object. That just isn't an option for R because of the package system. This isn't theoretical: the PACKAGE= argument was introduced when resolution to the wrong symbol became a real problem late last century (http://marc.info/?l=r-devel&m=107151103308418&w=2), but there wasn't a good cross-package calling mechanism for quite a while. Now there is a cross-package mechanism that works, whereas the Unix-style approach cannot be made to work safely with packages from multiple authors. -thomas -- Thomas Lumley Professor of Biostatistics University of Auckland __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
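The registration mechanism referred to above looks like this in a package's init file; a sketch for a hypothetical package 'mypkg' with one .Call routine:

    #include <R.h>
    #include <Rinternals.h>
    #include <R_ext/Rdynload.h>

    SEXP mycall(SEXP x);    /* defined elsewhere in the package's C code */

    static const R_CallMethodDef CallEntries[] = {
        {"mycall", (DL_FUNC) &mycall, 1},
        {NULL, NULL, 0}
    };

    void R_init_mypkg(DllInfo *dll)
    {
        R_registerRoutines(dll, NULL, CallEntries, NULL, NULL);
        R_useDynamicSymbols(dll, FALSE);    /* refuse dynamic lookup entirely */
    }

R-side calls then go through the registered symbol, with no cross-DLL search at all.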
Re: [Rd] capture.output on S4 slot
On Fri, Jul 18, 2014 at 4:00 PM, Dario Strbenac wrote: > Hello, > > capture.output produces a different result if the S4 object was created with > a constructor than if the body of the constructor is copied and pasted. > > setClass("TransformParams", representation( > transform = "function", > otherParams = "list") > ) > > setGeneric("TransformParams", function(transform, ...) > {standardGeneric("TransformParams")}) > setMethod("TransformParams", character(0), function() > { > new("TransformParams", transform = function(){}, otherParams = list()) > }) > >> capture.output(TransformParams()@transform) > [1] "function () " "{" > [3] "}""" >> capture.output(new("TransformParams", transform = function(){}, otherParams >> = list())@transform) > [1] "function(){}" > > Why is the function split into three parts if a constructor is used ? > > -- > Dario Strbenac > PhD Student > University of Sydney > Camperdown NSW 2050 > Australia Dario, When you use the constructor, the environment of the function is the environment inside the constructor; when you use new() it is R_GlobalEnv. Functions print their environment when it isn't something special like R_GlobalEnv. Since capture.output() captures the printed output, that's what it sees. The function isn't split up; the printed output is. -thomas -- Thomas Lumley Professor of Biostatistics University of Auckland __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
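A stripped-down illustration of the same effect (the environment address will differ from run to run):

    > f <- function() function() {}
    > g <- f()
    > g
    function() {}
    <environment: 0x55d2a8c41e58>
    > environment(g) <- globalenv()
    > g
    function() {}
    > length(capture.output(f()))    # the environment line adds to the capture
    [1] 2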
Re: [Rd] Speed difference between df$a[1] and df[1,"a"]
On Wed, Oct 19, 2011 at 2:34 PM, Stavros Macrakis wrote: > I was surprised to find that df$a[1] is an order of magnitude faster than > df[1,"a"]: Yes. This treats a data frame as a list, and is much faster. > I thought this might be because df[,] builds a data frame before simplifying > it to a vector, but with drop=F, it is even slower, so that doesn't seem to > be the problem: drop=FALSE creates a data frame first, and then simplifies it to a vector, so this test isn't showing what you think it is. > I then wondered if it might be because '[' allows multiple columns and > handles rownames. Sure enough, '[[,]]', which allows only one column, and > does not handle rownames, is almost 3x faster: That's part of it, but if you look at [.data.frame you see there is also quite a bit of copying that could be avoided in simple cases but is hard to avoid in full generality. -thomas -- Thomas Lumley Professor of Biostatistics University of Auckland __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
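The comparison is easy to rerun; the list-style access avoids [.data.frame entirely (timings vary by machine, so none are shown here):

    df <- data.frame(a = 1:3, b = 4:6)
    system.time(for (i in 1:1e5) df$a[1])       # list access, then vector subset
    system.time(for (i in 1:1e5) df[1, "a"])    # full [.data.frame dispatch
    system.time(for (i in 1:1e5) df[[1]][1])    # single-column access, no rownames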
Re: [Rd] C function is wrong under Windows 7
On Mon, Oct 24, 2011 at 6:04 AM, Evarist Planet wrote: > Dear mailing list, > > I have a C function that gives me a wrong result when I run it under Windows > 7. The fact that you extract pointers to the contents of fchr and sign before coercing them to REALSXP is deeply suspicious to me, though I admit I don't see how it causes the problem, since you don't use the new versions, you don't overwrite them, and the old versions should still be protected as arguments to the function. > SEXP getEs(SEXP fchr, SEXP sign) > { > int i, nfchr, nsign; > double *rfchr = REAL(fchr), *res; > int *rsign = INTEGER(sign); > > PROTECT(fchr = coerceVector(fchr, REALSXP)); > PROTECT(sign = coerceVector(sign, REALSXP)); > > nfchr = length(fchr); > nsign = length(sign); > > SEXP es; > PROTECT(es = allocVector(REALSXP, nfchr)); > res = REAL(es); > > double nr = getNr(rfchr, rsign, nsign); > > SEXP phit; > PROTECT(phit = allocVector(REALSXP, nfchr)); > double *rphit; > rphit = REAL(phit); > for(i = 0; i < nfchr; i++) rphit[i] = 0.0; > getPhit(rfchr, rsign, nsign, nr, rphit); > cumsum(rphit, nfchr); > > SEXP pmiss; > PROTECT(pmiss = allocVector(REALSXP, nfchr)); > double *rpmiss; > rpmiss = REAL(pmiss); > for(i = 0; i < nfchr; i++) rpmiss[i] = 0.0; > getPmiss(rsign, nfchr, nsign, rpmiss); > cumsum(rpmiss, nfchr); > > for (i = 0; i < nfchr; ++i) { > res[i] = rphit[i] - rpmiss[i]; > } > > UNPROTECT(5); > return es; > } > > Could you please help me to find out what I am doing wrong? > > Many thanks in advance, > > -- > Evarist Planet > Research officer, Bioinformatics and Biostatistics unit > IRB Barcelona > Tel (+34) 93 402 0553 > Fax (+34) 93 402 0257 > > evarist.pla...@irbbarcelona.org > http://www.irbbarcelona.org/bioinformatics > > [[alternative HTML version deleted]] > > __ > R-devel@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-devel > -- Thomas Lumley Professor of Biostatistics University of Auckland __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
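The ordering flagged above can be written the safer way round; a compilable sketch of the pattern (a hypothetical function, not the poster's getEs):

    #include <Rinternals.h>

    /* coerce first, then take pointers into the protected, coerced objects */
    SEXP first_plus(SEXP fchr, SEXP sign)
    {
        PROTECT(fchr = coerceVector(fchr, REALSXP));
        PROTECT(sign = coerceVector(sign, INTSXP)); /* INTSXP, matching int * use */
        double *rfchr = REAL(fchr);
        int *rsign = INTEGER(sign);
        SEXP ans = PROTECT(ScalarReal(rfchr[0] + rsign[0]));
        UNPROTECT(3);
        return ans;
    }

Note that the original coerces sign to REALSXP while keeping an int * into the uncoerced object, which is at best confusing even when the argument protection keeps the old object alive.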
Re: [Rd] droplevels: drops contrasts as well
On Fri, Oct 21, 2011 at 5:57 AM, Thaler, Thorn, LAUSANNE, Applied Mathematics wrote: > Dear all, > > Today I figured out that there is a neat function called droplevels, > which, well, drops unused levels in a data frame. I tried the function > with some of my data sets and it turned out that not only the unused > levels were dropped but also the contrasts I set via "C". I had a look > into the code, and this behaviour arises from the fact that droplevels > uses simply factor to drop the unused levels, which uses the default > contrasts as set by options("contrasts"). > > I think this behaviour is annoying, because if one does not look > carefully enough, one loses the contrasts silently. Hence may I suggest > to change the code of droplevels to something like the following: This silently changes the contrasts -- e.g., if the first level of the factor is one of the empty levels, the reference level used by contr.treatment() will change. Also, if the contrasts are a matrix rather than specifying a contrast function, the matrix will be invalid for the new factor. I think just having a warning would be better -- in general it's not clear what (if anything) it means to have the same contrasts on factors with different numbers of levels. -thomas -- Thomas Lumley Professor of Biostatistics University of Auckland __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
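The silent change of reference level is easy to demonstrate:

    > f <- factor(c("b", "c"), levels = c("a", "b", "c"))
    > contrasts(f)                # treatment contrasts, reference level "a"
      b c
    a 0 0
    b 1 0
    c 0 1
    > contrasts(droplevels(f))    # reference level is now "b"
      c
    b 0
    c 1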
Re: [Rd] Moderating consequences of garbage collection when in C
On Tue, Nov 15, 2011 at 8:47 AM, wrote: > dhi...@sonic.net wrote: >> Martin Morgan wrote: >> > Allocating many small objects triggers numerous garbage collections as R >> > grows its memory, seriously degrading performance. The specific use case >> > is in creating a STRSXP of several 1,000,000's of elements of 60-100 >> > characters each; a simplified illustration understating the effects >> > (because there is initially little to garbage collect, in contrast to an >> > R session with several packages loaded) is below. > >> What a coincidence -- I was just going to post a question about why it >> is so slow to create a STRSXP of ~10,000,000 unique elements, each ~10 >> characters long. I had noticed that this seemed to show much worse >> than linear scaling. I had not thought of garbage collection as the >> culprit -- but indeed it is. By manipulating the GC trigger, I can >> make this operation take as little as 3 seconds (with no GC) or as >> long as 76 seconds (with 31 garbage collections). > > I had done some Google searches on this issue, since it seemed like it > should not be too uncommon, but the only other hit I could come up > with was a thread from 2006: > > https://stat.ethz.ch/pipermail/r-devel/2006-November/043446.html > > In any case, one issue with your suggested workaround is that it > requires knowing how much additional storage is needed, which may be > an expensive operation to determine. I've just tried implementing a > different approach, which is to define two new functions to either > disable or enable GC. The function to disable GC first invokes > R_gc_full() to shrink the heap as much as possible, then sets a flag. > Then in R_gc_internal(), I first check that flag, and if it is set, I > call AdjustHeapSize(size_needed) and exit immediately. > > These calls could be used to bracket any code section that expects to > make lots of calls to R's memory allocator. The down side is that > this approach requires that all paths out of such a code section > (including error handling) need to take care to unset the GC-disabled > flag. I think I would want to hear from someone on the R team about > whether they think this is a good idea. If .Call and .C re-enabled the GC on return from compiled code (and threw some sort of error) that would help contain the potential damage. You might also want to re-enable GC if malloc() returned NULL, rather than giving an out-of-memory error. -thomas -- Thomas Lumley Professor of Biostatistics University of Auckland __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
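To watch the collections that motivate all this, gcinfo() reports each one as it happens (a small illustration; the counts depend on heap settings):

    > gcinfo(TRUE)                # print a line at every garbage collection
    > x <- as.character(1:2e6)    # millions of small string allocations
    > gcinfo(FALSE)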
Re: [Rd] rowsum
On Wed, Mar 30, 2011 at 1:56 AM, Terry Therneau wrote: > > with the entirely different rowSums, but it has been around > > for a long time.) > A lot longer than rowSums ... > > Bill Dunlap > > Spotfire, TIBCO Software > --- > This made me smile. The rowsums function was originally an internal > part of the survival package, used for fast computation of certain sums > when there is a cluster() statement. It was Statistical Sciences > (S-Plus) who moved it to global status. That is, they used it in enough > other places that they decided to speed it up, took over the > maintenance and ownership of the function (with my blessing), and > ceased to label it as part of "survival" in the manual. > This "metabug" can't be laid at R's feet. The same process happened independently in R. I ported the 'survival' version of rowsum() to R, added it to base R in version 0.63.1, and later made it faster using hashing. So perhaps it isn't entirely StatSci's fault, although it's likely that R would eventually have added a rowsum() function for compatibility. -thomas -- Thomas Lumley Professor of Biostatistics University of Auckland __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] names<- appears to copy 3 times?
On Tue, Jan 17, 2012 at 9:11 PM, Matthew Dowle wrote: > Hi, > > $ R --vanilla > R version 2.14.1 (2011-12-22) > Platform: i686-pc-linux-gnu (32-bit) >> DF = data.frame(a=1:3,b=4:6) >> DF > a b > 1 1 4 > 2 2 5 > 3 3 6 >> tracemem(DF) > [1] "<0x8898098>" >> names(DF)[2]="B" > tracemem[0x8898098 -> 0x8763e18]: > tracemem[0x8763e18 -> 0x8766be8]: > tracemem[0x8766be8 -> 0x8766b68]: >> DF > a B > 1 1 4 > 2 2 5 > 3 3 6 >> > > Are those 3 copies really taking place? > tracemem() isn't likely to give false positives. Since you're on Linux, you could check by running under gdb and setting a breakpoint on memtrace_report, which is the function that prints the message. That would show where the duplicates are happening. - thomas -- Thomas Lumley Professor of Biostatistics University of Auckland __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
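Concretely, the session would look something like this (a sketch; gdb's own output is omitted):

    $ R -d gdb
    (gdb) break memtrace_report
    (gdb) run
    > DF <- data.frame(a = 1:3, b = 4:6)
    > tracemem(DF)
    > names(DF)[2] <- "B"

gdb then stops at each reported duplication; 'bt' shows the C call stack responsible, and 'continue' moves on to the next copy.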
Re: [Rd] read.spss issues
On Wed, Feb 15, 2012 at 7:05 PM, Jeroen Ooms wrote: > The second problem is that the SPSS data format allows specifying > 'duplicate labels', whereas this is not allowed for factors. read.spss > does not deal with this and creates a bad factor > > x <- read.spss("http://www.stat.ucla.edu/~jeroen/spss/duplicate_labels.sav", > use.value.labels=T); > levels(x$opinion); > > which causes issues downstream. I am not sure if this is an issue in > read.spss() or as.factor(), but I guess it might be wise to try to > detect duplicate levels and assign them all with one and the same > integer value when converting to a factor. I think this one would be better dealt with by giving an error. SPSS value labels are just labels, so they don't map very well onto R factors, which are enumerated types. Rather than force them and lose data, I would prefer to make the user decide what to do. -thomas -- Thomas Lumley Professor of Biostatistics University of Auckland __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
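If a user does want the duplicates merged, it can be done after reading, since factor() collapses equal strings into one level; a hypothetical sketch with made-up codes and labels:

    codes <- c(1, 2, 3)
    labs  <- c("agree", "agree", "disagree")    # duplicated SPSS value labels
    x     <- c(1, 3, 2, 1)                      # raw codes read without labels
    f     <- factor(labs[match(x, codes)])
    levels(f)                                   # "agree" "disagree"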
Re: [Rd] portable parallel seeds project: request for critiques
On Sat, Feb 18, 2012 at 4:33 PM, Paul Johnson wrote: > On Fri, Feb 17, 2012 at 5:06 PM, Petr Savicky wrote: >> On Fri, Feb 17, 2012 at 02:57:26PM -0600, Paul Johnson wrote: >> Hi. >> >> Some of the random number generators allow as a seed a vector, >> not only a single number. This can simplify generating the seeds. >> There can be one seed for each of the 1000 runs and then, >> the rows of the seed matrix can be >> >> c(seed1, 1), c(seed1, 2), ... >> c(seed2, 1), c(seed2, 2), ... >> c(seed3, 1), c(seed3, 2), ... >> ... >> > Yes, I understand. > > The seed things I'm using are the 6 integer values from the L'Ecuyer. > If you run the example script, the verbose option causes some to print > out. The first 3 seeds in a saved project seeds file looks like: > >> projSeeds[[1]] > [[1]] > [1] 407 376488316 1939487821 1433925148 -1040698333 579503880 > [7] -624878918 > > [[2]] > [1] 407 -1107332181 854177397 1773099324 1774170776 -266687360 > [7] 816955059 > > [[3]] > [1] 407 936506900 -1924631332 -1380363206 2109234517 1585239833 > [7] -1559304513 > > The 407 in the first position is an integer R uses to note the type of > stream for which the seed is intended, in this case R'Lecuyer. > > > > > >> There could be even only one seed and the matrix can be generated as >> >> c(seed, 1, 1), c(seed, 1, 2), ... >> c(seed, 2, 1), c(seed, 2, 2), ... >> c(seed, 3, 1), c(seed, 3, 2), ... >> >> If the initialization using the vector c(seed, i, j) is done >> with a good quality hash function, the runs will be independent. >> > I don't have any formal proof that a "good quality hash function" > would truly create seeds from which independent streams will be drawn. That is essentially the *definition* of a good quality hash function, at least in the cryptographic sense. It maps the inputs into numbers that are indistinguishable from uniform random except that the same input always gives the same output. What's harder is to prove that you *have* a good quality hash function, but for these (non-adversarial) purposes even something like MD4 would be fine, and certainly the SHA family. -thomas -- Thomas Lumley Professor of Biostatistics University of Auckland __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
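For comparison, base R's own L'Ecuyer streams (in the parallel package) already provide a vetted version of per-run seeds:

    > RNGkind("L'Ecuyer-CMRG")
    > set.seed(42)
    > s1 <- .Random.seed          # stream for run 1 (the 407 is the RNG kind)
    > library(parallel)
    > s2 <- nextRNGStream(s1)     # well-separated stream for run 2
    > s3 <- nextRNGStream(s2)     # run 3, and so on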
Re: [Rd] CRAN policies
On Thu, Mar 29, 2012 at 3:30 AM, Gabor Grothendieck wrote: > 2012/3/28 Uwe Ligges : >> >> >> On 27.03.2012 20:33, Jeffrey Ryan wrote: >>> >>> Thanks Uwe for the clarification on what goes and what stays. >>> >>> Still fuzzy on the notion of "significant" though. Do you have an example >>> or two for the list? >> >> >> >> We have to look at those notes again and again in order to find out whether something >> important is noted, hence please always try to avoid all notes unless the >> effect is really intended! >> >> >> Consider the Note "No visible binding for global variable". >> We cannot know if your code intends to use such a global variable (which is >> undesirable in most cases), hence would let it pass if it seems to be >> sensible. >> >> Another Note such as "empty section" or "partial argument match" can quickly >> be fixed, hence just do it and don't waste our time. >> >> Best, >> Uwe Ligges > > What is the point of notes vs warnings if you have to get rid of both > of them? Furthermore, if there are notes that you don't have to get > rid of, it's not fair that package developers should have to waste their > time on things that are actually acceptable. Finally, it makes the > whole system arbitrary since packages can be rejected based on > undefined rules. > The "notes" are precisely the things for which clear rules can't be written. They are reported by CMD check because they are usually signs of coding errors, but are not warnings because their use is sometimes justified. The 'No visible binding for global variable' note is a good example. This found some bugs in my 'survey' package, which I removed. There is still one note of this type, which arises when I have to handle two different versions of the hexbin package with different internal structures. The note is a false positive because the use is guarded by an if(), but CMD check can't tell this. So, it's a good idea to remove all Notes that can be removed without introducing other code problems, which is nearly all of them, but occasionally there may be a good reason for code that produces a Note. But if you want a simple, unambiguous, mechanical rule for *your* packages, just eliminate all Notes. -thomas -- Thomas Lumley Professor of Biostatistics University of Auckland __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
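For the specific 'no visible binding' false positives, later versions of R also offer a declared escape hatch (a sketch; the variable name here is made up):

    ## in the package's R code; see ?utils::globalVariables (added in R 2.15.1)
    if (getRversion() >= "2.15.1") utils::globalVariables(c("hexbinVersionFlag"))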
Re: [Rd] The constant part of the log-likelihood in StructTS
On Thu, May 3, 2012 at 3:36 AM, Mark Leeds wrote: > Hi Ravi: As far as I know (well, really read) and Bert et al can say > more, the AIC is not dependent on the models being nested as long as the > sample sizes used are the same when comparing. In some cases, say comparing > MA(2), AR(1), you have to be careful with sample size usage but there is no > nesting requirement for AIC at least, I'm pretty sure. This is only partly true. The expected value of the AIC will behave correctly even if models are non-nested, but there is no general guarantee that the standard deviation is small, so AIC need not even asymptotically lead to optimal model choice for prediction in arbitrary non-nested models. Having said that, 'nearly' nested models like these are probably ok. I believe it's sufficient that all your models are nested in a common model, with a bound on the degrees-of-freedom difference, but my copy of Claeskens & Hjort's book on model selection and model averaging is currently with a student so I can't be definitive. -thomas -- Thomas Lumley Professor of Biostatistics University of Auckland __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] confusion over S3/S4 importing
On Mon, Jul 30, 2012 at 7:12 AM, Ben Bolker wrote: > > Can anyone help me figure out the right way to import a method that is > defined as S3 in one package and S4 in another? > > Specifically: > > profile() is defined as an S3 method in the stats package: > > function (fitted, ...) > UseMethod("profile") > > > > In stats4 it is defined as an S4 method: > > stats4:::profile > standardGeneric for "profile" defined from package "stats" > > function (fitted, ...) > standardGeneric("profile") > > >In the bbmle package I want to define it as an S4 method: > > setMethod("profile", "mle2", function (...) {...}) > > In the NAMESPACE file for bbmle I have > > importFrom(stats4,profile) > > However, in R 2.14.2 (which is I am currently trying to fix: I would > like to try to allow for backward compatibility), I get > > library(bbmle) > example(mle2) ## to create some fitted objects > profile(fit) > Error in UseMethod("profile") : > no applicable method for 'profile' applied to an object of class "mle2" > > whereas stats4::profile(fit) works fine. > > -- a similar issue occurs for coef. > > I would have thought that importFrom() would allow package code to > find the 'stats4' version of profile (and coef) before it found the > 'stats' versions , but that doesn't seem to be the case. > > Just for kicks I tried > > import(stats4) > > which leads to warning messages but no improvements. > > If I add an explicit Depends: on stats4 that fixes the problem. I may be > forced to do that, but I thought it was supposed to be a last resort and > that I was *supposed* to be able to fix my problems by proper use of > imports. 'Imports' won't be enough -- the whole point of a generic is that it's visible to the user, which doesn't happen with imports. -thomas -- Thomas Lumley Professor of Biostatistics University of Auckland __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] error on uneven recycling?
Is there some reason why

> (1:2) + (1:3)
[1] 2 4 4
Warning message:
In (1:2) + (1:3) : longer object length is not a multiple of shorter object length

can't be made into an error? I realise it was there in S-PLUS, but since it produces a warning, there can't be many examples on CRAN or Bioconductor using it, and I can't think of any situation where it would be used deliberately. -thomas -- Thomas Lumley Professor of Biostatistics University of Auckland __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] How to build a list with missing values? What is missing, anyway?
On Fri, Oct 5, 2012 at 4:07 PM, Bert Gunter wrote: > On Thu, Oct 4, 2012 at 6:31 PM, Peter Meilstrup >> explain some things, such as under what circumstances one would get a >> "`...` used in incorrect context" error. > > How could it possibly know that? > > -- Bert By looking at the code that throws the error. The code is in src/main/eval.c, and the error is thrown in several places when the symbol ... appears in a call being evaluated, and the value of the symbol is neither the special DOTSXP type nor NULL. That is, if you use ... in code and there isn't a ... formal argument in scope. -thomas -- Thomas Lumley Professor of Biostatistics University of Auckland __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
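The simplest trigger, for reference:

    > f <- function() list(...)    # no '...' formal argument in scope
    > f()
    ## signals: '...' used in an incorrect context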
Re: [Rd] setting option in function
old_options <- options(na.action = na.fail)
on.exit(options(old_options))

You can also use this to define a wrapper that executes an expression using special options:

withOptions <- function(optlist, expr) {
    oldopt <- options(optlist)
    on.exit(options(oldopt))
    expr <- substitute(expr)
    eval.parent(expr)
}

-thomas On Sat, Oct 20, 2012 at 10:35 AM, Charles Geyer wrote: > is it possible to set an option inside a function (I want to set > na.action = na.fail) and have the previous state restored if there is > an error so that the function doesn't change the option behind the > user's back? > > Sorry if this has been answered before, but this subject is hard to Google. > > -- > Charles Geyer > Professor, School of Statistics > University of Minnesota > char...@stat.umn.edu > > __ > R-devel@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-devel -- Thomas Lumley Professor of Biostatistics University of Auckland __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
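Used like this, the option only ever applies inside the call:

    > d <- data.frame(x = 1:5, y = c(1, 2, NA, 4, 5))
    > withOptions(list(na.action = na.fail), lm(y ~ x, data = d))
    ## errors with "missing values in object" -- and options("na.action")
    ## is unchanged afterwards, even though the fit failed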
Re: [Rd] Crash - cause 'memory not mapped'
On Sat, Nov 10, 2012 at 12:33 PM, Peter Langfelder < peter.langfel...@gmail.com> wrote: > On Fri, Nov 9, 2012 at 3:05 PM, gianluca mastrantonio > wrote: > > the error "memory not mapped" happens also if I use > > void FiltroGaus(SEXP size1_r, SEXP size2_r, SEXP sigma_r) > > instead of > > SEXP FiltroGaus(SEXP size1_r, SEXP size2_r, SEXP sigma_r) { > > > > If you use .Call on a function, the function must return a SEXP; it > cannot have a void return value. > > And if one uses a reasonable warning level on the compiler, there should be a warning, such as gcc's "control reaches end of non-void function". Also, .C() doesn't pass SEXPs, so the declaration of the function is still wrong for .C(), though this one the compiler won't notice. --thomas -- Thomas Lumley Professor of Biostatistics University of Auckland [[alternative HTML version deleted]] __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] What is the INDEX file for?
On Fri, Nov 9, 2012 at 10:02 PM, Christophe Genolini wrote: > Hi the list, > > In WRE (or in Rdindex), we can find how the INDEX file is made and how to change > it, but I have not managed to find the purpose of this file. So what is the > INDEX file for? > > It's an index, or more precisely a table of contents, for the package. The INDEX is part of what is displayed by help(package="pkgname"). It was more important before package help pages and vignettes, especially for non-GUI systems, when it could be quite hard to work out which function help pages to look at in an unfamiliar package. -thomas -- Thomas Lumley Professor of Biostatistics University of Auckland [[alternative HTML version deleted]] __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] as.matrix.Surv -- R core question/opinions
I agree that it's cleaner to remove the extra attributes -- that's the point of as.matrix.Surv(), to produce a matrix that doesn't include any extra details of how Surv objects are implemented. -thomas On Fri, Dec 7, 2012 at 3:48 AM, Terry Therneau wrote: > 1. A Surv object is a matrix with some extra attributes. The > as.matrix.Surv function removes the extras but otherwise leaves it as is. > > 2. The last several versions of the survival library were accidentally > missing the S3method('as.matrix', 'Surv') line from their NAMESPACE file. > (Instead its position is held by a duplicate of the line just above it in > the NAMESPACE file, suggesting a copy/paste error.) As a consequence the > as.matrix.Surv function was effectively ignored, and the default method was > being used. > The as.matrix.default function leaves anything with a "dim" attribute > alone. > > 3. In my current about-to-submit-to-CRAN version of survival the missing > NAMESPACE line was restored. This breaks one function in one package (rms) > which calls "as.matrix(y)" on a Surv object but then later looks at the > "type" attribute of y. > > So now to the design question: should the as.matrix.Surv function > "sanitize" the result by removing the extra attributes, or should it leave > them alone? The first seems cleaner; my accidental multi-year test of > leaving them in, however, clearly shows that it does no harm. > > Terry T. > > __ > R-devel@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-devel > -- Thomas Lumley Professor of Biostatistics University of Auckland [[alternative HTML version deleted]] __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Understanding svd usage and its necessity in generalized inverse calculation
On Thu, Dec 6, 2012 at 12:58 PM, Paul Johnson wrote: > Dear R-devel: > > I could use some advice about matrix calculations and steps that might > make for faster computation of generalized inverses. It appears in > some projects there is a bottleneck at the use of svd in calculation > of generalized inverses. > > Here's some Rprof output I need to understand. >
> > summaryRprof("Amelia.out")
> $by.self
>          self.time self.pct total.time total.pct
> "La.svd"    150.34    27.66     164.82     30.32
> > [snip] > I *think* this means that a bottleneck here is svd, which is being A bottleneck to a limited extent -- it's still only 30% of computation time, so speeding it up can't save more than that. > called by this function that calculates generalized inverses: >
> ## Moore-Penrose Inverse function (aka Generalized Inverse)
> ## X: symmetric matrix
> ## tol: convergence requirement
> mpinv <- function(X, tol = sqrt(.Machine$double.eps)) {
>     s <- svd(X)
>     e <- s$d
>     e[e > tol] <- 1/e[e > tol]
>     s$v %*% diag(e, nrow = length(e)) %*% t(s$u)
> }
> > That is from the Amelia package, which we like to use very much. > > Basically, I wonder if I should use a customized generalized inverse > or svd calculator to make this faster. > > If your matrix is produced as a covariance matrix, so that it really is symmetric positive semidefinite up to rounding error, my understanding is that the Cholesky approach is stable in the sense that it will reliably produce an accurate inverse or a zero-to-within-tolerance pivot and error message. Now, if most of the matrices you are trying to invert are actually invertible (as I would hope), it may be quicker to use the Cholesky approach with a fallback to the SVD for semidefinite matrices. That is, something like

tryCatch(chol2inv(chol(xx)), error = function(e) ginv(xx))

Most of the time you will get the Cholesky branch, which is much faster (about five-fold for 10x10 matrices on my system). On my system and using a 10x10 matrix the overhead in the tryCatch() is much smaller than the time taken by either set of linear algebra, so there is a net gain as long as even a reasonable minority of the matrices are actually invertible. You can probably do slightly better by replacing the chol2inv() with backsolve(): solving just the systems of linear equations you need is usually preferable to constructing a matrix inverse. Note that this approach will give wrong answers without warning if the matrices are not symmetric. -thomas -- Thomas Lumley Professor of Biostatistics University of Auckland [[alternative HTML version deleted]] __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] R-forge, package dependencies
I have a project on R-forge (sqlsurvey.r-forge.r-project.org) with two packages, RMonetDB and sqlsurvey. At the moment, sqlsurvey is listed as failing to build. The error is on the Linux package check, which says that RMonetDB is not available: * checking package dependencies ... ERROR Package required but not available: RMonetDB RMonetDB has built successfully: r-forge lists its status as 'current', with Linux, Windows, and Mac packages available for download. The package check for sqlsurvey on Windows and Mac finds RMonetDB without any problems, it's just on Linux that it appears to be unavailable. Any suggestions for how to fix this? I've tried uploading a new version of RMonetDB, but the situation didn't change: it built successfully, but the Linux check of sqlsurvey still couldn't find it. -thomas -- Thomas Lumley Professor of Biostatistics University of Auckland [[alternative HTML version deleted]] __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] stopping finalizers
Is there some way to prevent finalizers running during a section of code? I have a package that includes R objects linked to database tables. To maintain the call-by-value semantics, tables are copied rather than modified, and the extra tables are removed by finalizers during garbage collection. However, if the garbage collection occurs in the middle of processing another SQL query (which is relatively likely, since that's where the memory allocations are) there are problems with the database interface. Since the guarantees for the finalizer are "at most once, not before the object is out of scope" it seems harmless to be able to prevent finalizers from running during a particular code block, but I can't see any way to do it. Suggestions? -thomas -- Thomas Lumley Professor of Biostatistics University of Auckland [[alternative HTML version deleted]] __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] stopping finalizers
Luke, We're actually adopting the first of your generic approaches. As a more concrete description: There are R objects representing survey data sets, with the data stored in a database table. The subset() method, when applied to these objects, creates a new table indicating which rows of the data table are in the subset -- we don't modify the original table, because that breaks the call-by-value semantics. When the subset object in R goes out of scope, we need to delete the extra database table. I have been doing this with a finalizer on an environment that's part of the subset object in R. This all worked fine with JDBC, but the native database interface requires that all communications with the database come in send/receive pairs. Since R is single-threaded, this would normally not be an issue. However, since garbage collection can happen at any time, it is possible that the send part of the finalizer query "drop table _sbs_whatever" comes between the send and receive of some other query, and the database connection then falls over. So, I'm happy for the finalizer to run at any time except during a small critical section of R code. In this particular case the finalizer only issues "drop table" queries, and it doesn't need to know if they succeed, so we can keep a lock in the database connection and just store any "drop table" queries that arrive during a database operation for later execution. More generally, though, the fact that no R operation is atomic with respect to garbage collection seems to make it a bit difficult to use finalizers -- if you need a finalizer, it will often be in order to access and free some external resource, which is when the race conditions can matter. What I was envisaging was something like without_gc(expr) to evaluate expr with the memory manager set to allocate memory (or attempt to do so) without garbage collection. Even better would be if gc could run, but weak references were temporarily treated as strong so that garbage without finalizers would be collected but finalizers didn't get triggered. Using this facility would be inefficient, because it would allocate more memory than necessary and would also mess with the tuning of the garbage collector, but when communicating with other programs it seems it would be very useful to have some way of running an R code block and knowing that no other R code block would run during it (user interrupts are another issue, but they can be caught, and in any case I'm happy to fail when the user presses CTRL-C). -thomas On Fri, Feb 15, 2013 at 12:53 AM, wrote: > It might help if you could be more specific about what the issue is -- > if they are out of scope why does it matter whether the finalizers > run? > > Generically, two approaches I can think of: > > you keep track of when it is safe to fully run your finalizers and have > your finalizers put the objects on a linked list if it isn't safe to > run the finalizer now and clear the list each time you make a new one > > keep track of your objects with a weak list and turn them into strong > references before your calls, then drop the list after. > > I'm pretty sure we don't have a mechanism for temporarily suspending > running the finalizers but it is probably fairly easy to add if that > is the only option. > > I might be able to think of other options with more details on the > issue. > > Best, > > luke > > On Tue, 12 Feb 2013, Thomas Lumley wrote: > > Is there some way to prevent finalizers running during a section of code?
>> >> I have a package that includes R objects linked to database tables. To >> maintain the call-by-value semantics, tables are copied rather than >> modified, and the extra tables are removed by finalizers during garbage >> collection. >> >> However, if the garbage collection occurs in the middle of processing >> another SQL query (which is relatively likely, since that's where the >> memory allocations are) there are problems with the database interface. >> >> Since the guarantees for the finalizer are "at most once, not before the >> object is out of scope" it seems harmless to be able to prevent finalizers >> from running during a particular code block, but I can't see any way to do >> it. >> >> Suggestions? >> >>-thomas >> >> >> >> > -- > Luke Tierney > Chair, Statistics and Actuarial Science > Ralph E. Wareham Professor of Mathematical Sciences > University of Iowa Phone: 319-335-3386 > Department of Statistics andFax: 319-335-3017 >Actuarial Science > 241 Schaeffer Hall email: luke-tier...@uiowa.edu > Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu > -- Thomas Lumley Professor of Biostatistics University of Auckland [[alternative HTML version deleted]] __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
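A hedged sketch of the deferral scheme just described: queue any "drop
table" that arrives while the connection is mid-query, and flush the queue
afterwards. connection_busy() and run_query() are invented names standing
in for the real database interface:

    .pending_drops <- character(0)

    safe_drop <- function(con, table) {
      if (connection_busy(con)) {
        ## finalizer fired between a send and its receive: defer the DROP
        .pending_drops <<- c(.pending_drops, table)
      } else {
        run_query(con, paste("drop table", table))
      }
    }

    flush_pending_drops <- function(con) {
      for (tab in .pending_drops) run_query(con, paste("drop table", tab))
      .pending_drops <<- character(0)
    }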
Re: [Rd] stopping finalizers
On Fri, Feb 15, 2013 at 3:35 PM, Hadley Wickham wrote:

>> There are R objects representing survey data sets, with the data stored
>> in a database table. The subset() method, when applied to these
>> objects, creates a new table indicating which rows of the data table
>> are in the subset -- we don't modify the original table, because that
>> breaks the call-by-value semantics. When the subset object in R goes
>> out of scope, we need to delete the extra database table.
>
> Isn't subset slightly too early to do this? It would be slightly more
> efficient for subset to return an object that creates the table when
> you first attempt to modify it.

The subset table isn't a copy of the subset, it contains the unique key
and an indicator column showing whether the element is in the subset. I
need this even if the subset is never modified, so that I can join it to
the main table and use it in SQL 'where' conditions to get computations
for the right subset of the data.

The whole point of this new sqlsurvey package is that most of the
aggregation operations happen in the database rather than in R, which is
faster for very large data tables. The use case is things like the
American Community Survey and the Nationwide Emergency Department
Subsample, with millions or tens of millions of records and quite a lot of
variables. At this scale, loading stuff into memory isn't feasible on
commodity desktops and laptops, and even on computers with enough memory,
the database (MonetDB) is faster.

   -thomas

--
Thomas Lumley
Professor of Biostatistics
University of Auckland

	[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
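To illustrate the key-plus-indicator scheme in rough outline -- all table
and column names here are invented for the example, not taken from
sqlsurvey itself:

    ## The subset table sbs_1 holds the unique key and an in/out flag,
    ## created once by something like
    ##   CREATE TABLE sbs_1 AS
    ##     SELECT key_, (riagendr = 2) AS in_subset FROM main_table
    ## Aggregations then join against it and filter in the WHERE clause:
    query <- paste(
      "SELECT SUM(bpxsar * wt) / SUM(wt) AS est",
      "FROM main_table JOIN sbs_1 USING (key_)",
      "WHERE sbs_1.in_subset"
    )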
Re: [Rd] stopping finalizers
On Sat, Feb 16, 2013 at 2:32 PM, Hadley Wickham wrote:

>> The subset table isn't a copy of the subset, it contains the unique key
>> and an indicator column showing whether the element is in the subset. I
>> need this even if the subset is never modified, so that I can join it
>> to the main table and use it in SQL 'where' conditions to get
>> computations for the right subset of the data.
>
> Cool - Is that faster than storing a column that just contains the
> include indices?

I haven't tried -- I'm writing this so it doesn't modify database tables,
partly for safety and partly to reduce the privileges it needs to run.

>> The whole point of this new sqlsurvey package is that most of the
>> aggregation operations happen in the database rather than in R, which
>> is faster for very large data tables. The use case is things like the
>> American Community Survey and the Nationwide Emergency Department
>> Subsample, with millions or tens of millions of records and quite a lot
>> of variables. At this scale, loading stuff into memory isn't feasible
>> on commodity desktops and laptops, and even on computers with enough
>> memory, the database (MonetDB) is faster.
>
> Have you done any comparisons of monetdb vs sqlite - I'm interested to
> know how much faster it is. I'm working on a package
> (https://github.com/hadley/dplyr) that compiles R data manipulation
> expressions into (e.g.) SQL, and have been wondering if it's worth
> considering a column-store like monetdb.

It's enormously faster than SQLite for databases slightly larger than
physical memory. I don't have measurements, but I started this project
using SQLite and it just wasn't fast enough to be worthwhile. My guess is
that for the JOIN and SELECT SUM() ... GROUP BY operations I'm using it's
perhaps ten times faster.

For moderate-sized databases it's competitive with analysis in memory even
if you ignore the data loading time. For example, using a data set already
in memory, with 18000 records and 96 variables:

   > system.time(svymean(~BPXSAR+BPXDAR,subset(dhanes,RIAGENDR==2)))
      user  system elapsed
      0.09    0.01    0.10

Using MonetDB:

   > system.time(svymean(~bpxsar+bpxdar,se=TRUE,subset(sqhanes,riagendr==2)))
      user  system elapsed
     0.020   0.001   0.108

   -thomas

--
Thomas Lumley
Professor of Biostatistics
University of Auckland

	[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] S3 generics in packages: default method
For a package providing 'virtual' data.frames backed by MonetDB database
tables, we want to make some functions generic, to implement versions that
work on single or multiple database columns. My approach was

   sd <- function(x, na.rm=TRUE, ...) UseMethod("sd")
   sd.default <- base::sd

I've done that before to make dotchart() generic in the 'survey' package.

With dotchart() (and even sd()) there still doesn't seem to be a problem
(based on the CRAN check results), but with var() we get a note

   Found .Internal call in the following function:
     var.default
   with calls to .Internal functions
     cov

because base::var contains calls to .Internal. This seems harmless to me
-- the package is only calling .Internal() through base functions in the
usual way.

We could use

   var.default <- function(x, y=NULL, na.rm=TRUE, ...) base::var(x, y, na.rm, ...)

but it seems that could have different lazy-evaluation behaviour.

   -thomas

--
Thomas Lumley
Professor of Biostatistics
University of Auckland

	[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] nobs() with glm(family="poisson")
On Thu, Feb 28, 2013 at 11:56 AM, Ravi Varadhan wrote:

> This is getting further away from typical R-devel issues, but let me add
> another perspective: the `n' in BIC reflects the rate at which the
> information in the log-likelihood grows.

But in the derivation of BIC, the log(n) term is kept and various terms of
order 1 are discarded. What we're arguing about is one of the O(1) terms.
If it makes an important difference then presumably we should also worry
about the other O(1) terms that got discarded.

   -thomas

--
Thomas Lumley
Professor of Biostatistics
University of Auckland

	[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
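For reference, the Laplace-approximation expansion behind this point,
written schematically (k is the number of parameters, \hat\theta the MLE):

    \log \int p(y \mid \theta)\,\pi(\theta)\,d\theta
        = \log p(y \mid \hat\theta) - \frac{k}{2}\log n + O(1)

BIC keeps only the first two terms (times -2). When two candidate choices
of n differ by a bounded factor, \log n changes by \log(n/n') = O(1), the
same order as the terms the derivation already throws away.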
Re: [Rd] Registering method for "t.test"
On Thu, Mar 14, 2013 at 3:31 AM, Lukas Lehnert wrote:

> Dear R-developers,
>
> in a new package I'm currently working on, I tried to add a new method
> to function "t.test". The new class, on which the method is based, is
> called "speclib", so I defined the function for the new method as
> "t.test.speclib".

You shouldn't have a problem if you tell R it is a method for t.test. In
the NAMESPACE, use

   S3method(t.test, speclib)

rather than exporting the function by name, and in the usage section of
the help page use

   \method{t.test}{speclib}

> R CMD check now always gives a warning that S3 methods are not
> consistent, because R does not recognize that the basic function name is
> "t.test" instead of "t":
>
>   t:
>     function(x)
>   t.test.speclib:
>     function(x, y, paired, ...)
>
> Is there any workaround or do I have to rename the t.test.speclib
> function to something like t_test.speclib?
>
> Thank you in advance
>
> Lukas
>
> 	[[alternative HTML version deleted]]
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

   -thomas

--
Thomas Lumley
Professor of Biostatistics
University of Auckland

	[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
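A skeleton of the arrangement being suggested; the class name comes from
the question, but the argument list and body are placeholders:

    ## In R/t.test.speclib.R
    t.test.speclib <- function(x, y = NULL, paired = FALSE, ...) {
      ## method for objects of class "speclib" goes here
    }

    ## In NAMESPACE, instead of export(t.test.speclib):
    ##   S3method(t.test, speclib)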
Re: [Rd] Why change data type when dropping to one-dimension?
On Fri, 29 May 2009, Jason Vertrees wrote:

> My question is: why does the paradigm of changing the type of a 1D
> return value to an unlisted array exist? This introduces boundary
> conditions where none need exist, thus making the coding harder and
> confusing. For example, consider:
>
>   d = data.frame(a=rnorm(10), b=rnorm(10));
>   typeof(d);            # OK;
>   typeof(d[,1]);        # Unexpected;
>   typeof(d[,1,drop=F]); # Oh, now I see.

It does make it harder for programmers, but it makes it easier for
non-programmers. In particular, it is convenient to be able to do d[1,1]
to extract a number from a matrix, rather than having to explicitly coerce
the result to stop it being a matrix.

At least the last two times this was discussed, there ended up being a
reasonable level of agreement that if someone's life had to be made harder
the programmers were better able to cope, and that dropping dimensions was
preferable.

   -thomas

Thomas Lumley			Assoc. Professor, Biostatistics
tlum...@u.washington.edu	University of Washington, Seattle

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
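The convenience in question, in miniature (plain base R, nothing assumed
beyond what the thread shows):

    m <- matrix(1:4, nrow = 2)
    m[1, 1]               # 1: a plain number, dimensions dropped
    m[1, , drop = FALSE]  # a 1x2 matrix, dimensions kept on request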
Re: [Rd] Why change data type when dropping to one-dimension?
On Fri, 29 May 2009, Stavros Macrakis wrote:

> This is another example of the general preference of the designers of R
> for convenience over consistency. In my opinion, this is a design flaw
> even for non-programmers, because I find that inconsistencies make the
> system harder to learn. Yes, the naive user may stumble over the
> difference between m[[1,1]] and m[1,1] a few times before getting it,
> but once he or she understands the principle, it is general.

I was on your side of this argument the first time it came up, but ended
up being convinced the other way.

In contrast to sample(n) or the non-standard evaluation of weights= and
subset= arguments to modelling functions, or various other conveniences
that I think we are stuck with despite them being a bad idea, I think
dropping dimensions is useful.

   -thomas

> -s
>
> On Fri, May 29, 2009 at 5:33 PM, Jason Vertrees wrote:
>
>> Hello,
>>
>> First, let me say I'm an avid fan of R--it's incredibly powerful and I
>> use it all the time. I appreciate all the hard work that the many
>> developers have undergone.
>>
>> My question is: why does the paradigm of changing the type of a 1D
>> return value to an unlisted array exist? This introduces boundary
>> conditions where none need exist, thus making the coding harder and
>> confusing. For example, consider:
>>
>>   d = data.frame(a=rnorm(10), b=rnorm(10));
>>   typeof(d);            # OK;
>>   typeof(d[,1]);        # Unexpected;
>>   typeof(d[,1,drop=F]); # Oh, now I see.
>>
>> This is indeed documented in the R Language specification, but why is
>> it there in the first place? It doesn't make sense to the average
>> programmer to change the return type based on dimension.
>>
>> Here it is again in 'sapply':
>>
>>   sapply
>>   function (X, FUN, ..., simplify = TRUE, USE.NAMES = TRUE)
>>   {
>>   [...snip...]
>>       if (common.len == 1)
>>           unlist(answer, recursive = FALSE)
>>       else if (common.len > 1)
>>           array(unlist(answer, recursive = FALSE),
>>                 dim = c(common.len, length(X)),
>>                 dimnames = if (!(is.null(n1 <- names(answer[[1]])) &
>>                                  is.null(n2 <- names(answer))))
>>                     list(n1, n2))
>>   [...snip...]
>>   }
>>
>> So, in 'sapply', if your return value is one-dimensional be careful,
>> because the return type will not be the same as if it were otherwise.
>>
>> Is this legacy or a valid, rational design decision which I'm not yet a
>> sophisticated enough R coder to enjoy?
>>
>> Thanks,
>>
>> -- Jason
>>
>> --
>> Jason Vertrees, PhD
>>
>> Dartmouth College : j...@cs.dartmouth.edu
>> Boston University : jas...@bu.edu
>>
>> PyMOLWiki : http://www.pymolwiki.org/
>>
>> __
>> R-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>
> 	[[alternative HTML version deleted]]
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

Thomas Lumley			Assoc. Professor, Biostatistics
tlum...@u.washington.edu	University of Washington, Seattle

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
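The sapply() behaviour being quoted, demonstrated directly (plain base R):

    sapply(1:3, function(i) i^2)        # length-1 results -> a vector: 1 4 9
    sapply(1:3, function(i) c(i, i^2))  # length-2 results -> a 2x3 matrix
    sapply(1:2, function(i) seq_len(i)) # ragged results   -> a list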
Re: [Rd] Wishlist: install.packages to look for the newest version (PR#13852)
Uwe,

I think Ulrike is making a different suggestion, that install.packages()
should fetch the binary that has been built for the current version of R.

This would be a bad idea for a different reason -- in general it is not
possible to be sure that the package works with an older version of R. The
R version dependency isn't enough, for two reasons. The first is that the
author may well not know that the package fails with an older version of R
and so would not have listed a dependency. The second is that the binary
versions may be incompatible even if the source versions are compatible.

You can always download a binary package from CRAN in a browser and use
the option to install from a local zip file. Or, as Uwe suggests, get a
new version of R.

What I think might be useful, if it's not too difficult, is a warning from
install.packages() that a newer version of the package you were installing
is available for the current version of R.

   -thomas

On Fri, 24 Jul 2009 lig...@statistik.tu-dortmund.de wrote:

> Ulrike,
>
> if you install from source, you always get the most recent version of
> the package given it does not depend on a newer version of R.
>
> If you want a binary package, you also get the newest version - that was
> newest at the time we stopped building binaries for that version of R.
> We (or better I, if we only talk about Windows, but similar for all
> other platforms) cannot build for each R version any more. In that case
> we'd have to build even 11 binary versions for Windows just for the
> R-2.x.y series now. Binary repositories are fixed at some time (for
> Windows, once the first patchlevel release of the next R version is out;
> e.g. at the time of the R-2.9.1 release the binary builds for R-2.8.x
> had been stopped).
>
> So please upgrade your version of R or compile yourself from sources for
> the R version you need the particular package for.
>
> Best wishes,
> Uwe Ligges
>
> groemp...@bht-berlin.de wrote:
>
>> Full_Name: Ulrike Groemping
>> Version: 2.9.0 (and older)
>> OS: Windows
>> Submission from: (NULL) (84.190.173.190)
>>
>> When using an older version of R, packages are not found although they
>> are available for newer versions of R and do work when installed with
>> the old version. For example, installing DoE.base on R 2.8.1 installs
>> version 0.2, while CRAN is at version 0.4-1 currently. It would be nice
>> if the install process would per default look for the newest version of
>> the package and install this one if its R-version request allows this.
>> (I recently found a help list entry by Uwe Ligges that explains how to
>> manually install from a repository for a newer CRAN version, but I did
>> not bookmark it and cannot find it any more. The documentation does not
>> enlighten me there.)
>>
>> Regards, Ulrike
>>
>> __
>> R-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel

Thomas Lumley			Assoc. Professor, Biostatistics
tlum...@u.washington.edu	University of Washington, Seattle

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Basic Question regarding PROTECT
On Mon, 24 Aug 2009, Saptarshi Guha wrote:

> Thank you. So the reason I wouldn't need to protect y had I returned to
> R, is because had I done something like
>
>   h <- .Call("boo", a)
>
> where "boo" contains y=foo(), the assignment "<-" to h would have a
> PROTECT somewhere, i.e. R's assignment is doing the protection for me.
> Had I not returned to R, I would have to do it myself.
>
> Yes, PROTECT is quite cheap. I had thought it to be costly, but 1MM
> PROTECT/UNPROTECT calls takes <1 second (on a macbook 2.16ghz)

That's not where the cost would be. PROTECT/UNPROTECT calls themselves are
very cheap, since they just push and pop pointers from a stack. Any real
impact of different strategies in using PROTECT would be seen in the
garbage collector.

   -thomas

Thomas Lumley			Assoc. Professor, Biostatistics
tlum...@u.washington.edu	University of Washington, Seattle

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] optional package dependency
On Fri, 15 Jan 2010, Seth Falcon wrote:

> There is a real need (of some kind) here. Not all packages work on all
> platforms. For example, the multicore package provides a mechanism for
> running parallel computations on a multi-cpu box, but it is not
> available on Windows. A package that _is_ available on all platforms
> should be able to optionally make use of multicore on non-Windows. I
> don't think there is a way to do that now and pass check without
> resorting to "tricks" as above. These tricks are bad as they make it
> harder to programmatically determine the true "suggests".
>
> And NAMESPACE brings up another issue, in that being able to do
> conditional imports would be very useful for these cases; otherwise you
> simply can't make proper use of name spaces for any optional
> functionality.
>
> I'm willing to help work on and test a solution if we can arrive at
> some consensus as to what the solution looks like.

Seth,

In the case of multicore it seems to work to put it in 'Suggests' and to
use require() to load it. That's what I did with the survey package, and
it didn't cause problems on CRAN. I didn't run CMD check on Windows
myself, only on Mac and Linux.

A more difficult issue is providing methods for a generic in another
package that might not be available. I wanted to provide methods on survey
objects for generics in odfWeave, and I couldn't find out how to do that
without making it required. I ended up creating a new odfWeave.survey
package that depends on odfWeave and survey, but this seems like the sort
of thing that should be able to be done with Enhances or Suggests.

   -thomas

Thomas Lumley			Assoc. Professor, Biostatistics
tlum...@u.washington.edu	University of Washington, Seattle

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
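The Suggests-plus-require() pattern described above, sketched in miniature
(multicore does provide mclapply(), but the task list and worker function
here are invented for the example):

    ## DESCRIPTION:  Suggests: multicore
    run_replicates <- function(tasks, one_replicate) {
      if (require("multicore", quietly = TRUE)) {
        mclapply(tasks, one_replicate)   # parallel where available
      } else {
        lapply(tasks, one_replicate)     # serial fallback (e.g. Windows)
      }
    }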
Re: [Rd] Copyright versus Licenses
On Mon, 18 Jan 2010, Bryan McLellan wrote:

> My company recently started using an R library from CRAN that is
> licensed under the LGPL Version 2 or greater per the DESCRIPTION file,
> but contains no copy of the LGPL notice, or any copyright notice. I've
> grown accustomed to paying attention to copyright and licensing as a
> Debian package maintainer, and sent the author of the package an email
> expressing my concern. The author believed that assigning themselves
> copyright was incompatible with licensing the library under the terms of
> the LGPL. I disagree, and further contend that without a copyright
> notice the [copyright] license loses a certain degree of applicability,
> as it becomes inconclusive who is licensing the software under the terms
> of the LGPL. Not knowing who I was, the library author asked me to start
> a discussion of the subject on this list, presumably so they could see
> the opinions of others that they trust.

Like Simon, I'm speaking only for myself here.

Not having an explicit copyright statement certainly makes it harder for
people downstream who have questions, so explicit statements seem like a
good thing.

> In the final section of the LGPL itself [1], entitled "How to Apply
> These Terms to Your New Libraries", the license provides a template to
> add to the top of each source code file, containing a copyright line, a
> general notice regarding the lack of warranty, and information on where
> to obtain a full copy of the license. The GPL HOWTO [2] gives similar
> instructions for the inclusion of a copyright line with the license.
>
> I know that R distributes copies of common licenses under
> 'share/licenses' in the R source. Debian does as well, in
> '/usr/share/common-licenses/', for the purpose of not having to include
> the full LICENSE and/or COPYING file with every package that uses a
> common open source license, allowing the use of verbiage such as "The
> Debian packaging is © 2010 [author] and is licensed under the Apache
> License version 2.0. On Debian and derived systems, see
> `/usr/share/common-licenses/Apache-2.0' for the complete text." The R
> manual for writing extensions suggests a similar approach to avoiding
> duplication in Section 1.1 [3].
>
> The R manual for writing extensions also mentions [4], in Section 1.1.1,
> the optional use of a Copyright field in the DESCRIPTION file, separate
> from the License field. As this section implies that the DESCRIPTION
> file format is based on the Debian control file format, I assume the
> goal is to keep these lines simple, generally under 80 characters due to
> average terminal width. As such, I don't assume this field is
> recommended for complete copyright information for a library with
> multiple contributors. The aforementioned article does not specify where
> a developer should alternately put copyright information, perhaps
> assuming one would add it to each source code file as is recommended by
> GNU.

Yes, in the source code is good.

In addition, if you look at the source code for R, the approach there has
been to have a file COPYRIGHTS that lists the original copyright holders
of bits of the code. This seems like a good idea if there are multiple
sources. It's especially useful if there are areas of doubt and
uncertainty (eg code published on ftp sites before people worried about
licenses), since it at least gives you rigidly defined areas of doubt and
uncertainty [5].

> In closing, do the R developers believe that including a Copyright
> notice is imperative with a Copyright License? If so, what advice do
> they have for those writing and contributing open source R libraries as
> to where this notice should go? Should that information perhaps be added
> to the R manual for extensions?
>
> Bryan McLellan
>
> [1] http://www.gnu.org/licenses/lgpl-2.1.txt
> [2] http://www.gnu.org/licenses/gpl-howto.html
> [3] http://cran.r-project.org/doc/manuals/R-exts.html#Package-structure
> [4] http://cran.r-project.org/doc/manuals/R-exts.html#The-DESCRIPTION-file
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

   -thomas

Thomas Lumley			Assoc. Professor, Biostatistics
tlum...@u.washington.edu	University of Washington, Seattle

[5] Adams, D. (1978) The Hitchhiker's Guide to the Galaxy. BBC Radio.

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] AD: advanced R course - June 28-30, Seattle.
The Summer Institute in Statistical Genetics at the University of
Washington will include a 2.5-day course in Advanced R Programming, on
June 28-30. The course will cover object-oriented programming, SQL and
netCDF data access, calling compiled code, creating R packages, and using
some of the software infrastructure from the Bioconductor project. The
instructors will be Thomas Lumley and Ken Rice.

Further details on this and the other 21 modules at the Summer Institute
are available from http://sisg.biostat.washington.edu/.

I will just note that this is the University of Washington, in Seattle
(not DC, or St Louis), and that June 28-30 is at the beginning of
Seattle's non-rainy season.

   -thomas

Thomas Lumley			Assoc. Professor, Biostatistics
tlum...@u.washington.edu	University of Washington, Seattle

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Package Directory Hierarchy: Recursive inclusion of *.R possible?
On Thu, 4 Feb 2010, Barry Rowlingson wrote:

> On Wed, Feb 3, 2010 at 10:40 AM, Johannes Graumann wrote:
>
>> Hello,
>>
>> I would like to organize the "R" directory in my home-grown package
>> into sub-directories, but "R CMD --build" doesn't seem to find *.R
>> files below the actual source directory. Is there any way around that?
>
> Not really.

You could always manage your files in a separate directory tree and then
use a Makefile to put them into the package format.

   -thomas

Thomas Lumley			Assoc. Professor, Biostatistics
tlum...@u.washington.edu	University of Washington, Seattle

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
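The copying step that such a Makefile (or an R script, as sketched here)
would automate; the directory names are invented for the example:

    ## Flatten a nested source tree into the package's R/ directory,
    ## then build as usual with "R CMD build mypkg".
    srcs <- list.files("source-tree", pattern = "\\.R$",
                       recursive = TRUE, full.names = TRUE)
    file.copy(srcs, "mypkg/R", overwrite = TRUE)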
Re: [Rd] Why is there no c.factor?
On Thu, 4 Feb 2010, Hadley Wickham wrote:

> Hi all,
>
> Is there a reason that there is no c.factor method? Analogous to c.Date,
> I'd expect something like the following to be useful:
>
>   c.factor <- function(...) {
>     factors <- list(...)
>     levels <- unique(unlist(lapply(factors, levels)))
>     char <- unlist(lapply(factors, as.character))
>     factor(char, levels = levels)
>   }
>
>   c(factor("a"), factor("b"), factor(c("c", "b", "a")), factor("d"))
>   # [1] a b c b a d
>   # Levels: a b c d

It's well established that different people have different views on what
factors should do, but this doesn't match mine. I think of factors as
enumerated data types where the factor levels already specify all the
valid values for the factor, so I wouldn't want to be able to combine two
factors with different sets of levels. For example:

   A <- factor("orange", levels=c("orange","yellow","red","purple"))
   B <- factor("orange", levels=c("orange","apple","mango","banananana"))

On the other hand, I think the current behaviour, which reduces them to
numbers, is just wrong.

   -thomas

Thomas Lumley			Assoc. Professor, Biostatistics
tlum...@u.washington.edu	University of Washington, Seattle

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
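The "reduces them to numbers" behaviour, demonstrated (this was how c()
treated factors at the time of this thread; R 4.1.0 later added a c.factor
method much like the one proposed above):

    c(factor("a"), factor("b"))
    ## [1] 1 1   -- only the underlying integer codes survive

    A <- factor("orange", levels=c("orange","yellow","red","purple"))
    B <- factor("orange", levels=c("orange","apple","mango","banananana"))
    c(A, B)
    ## [1] 1 1   -- 'orange' and 'orange', but levels and class are gone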
Re: [Rd] Is it valid to do x == Inf?
On Thu, 1 Apr 2010, Henrik Bengtsson wrote:

> Hi, I found in a bit of code the following test for infinity:
>
>   if (x == Inf) ...
>
> Is that valid

Yes, if you don't want to also include -Inf,

> or should it be (as I always thought):
>
>   if (is.infinite(x)) ...?

If you don't want to distinguish Inf and -Inf.

> Does it depend on whether 'x' is float or integer?

There isn't an integer infinity. Integer values larger than the maximum
representable give NA, eg

   > .Machine$integer.max+1L
   [1] NA

> My question is related to testing for missing values where is.na(x) is
> required.

NA is different, because NA by its nature can't compare equal to anything:
x==NA asks "Is x equal to some number I don't know?", to which the answer
is "Don't know". x==Inf asks "Is x positive infinite?", which is a
perfectly well-defined question.

   -thomas

Thomas Lumley			Assoc. Professor, Biostatistics
tlum...@u.washington.edu	University of Washington, Seattle

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
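A quick side-by-side of the two tests, including the NA propagation that
motivates is.na() (plain base R):

    x <- c(-Inf, 0, Inf, NA)
    x == Inf        # FALSE FALSE  TRUE    NA
    is.infinite(x)  #  TRUE FALSE  TRUE FALSE
    is.na(x)        # FALSE FALSE FALSE  TRUE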