[Rd] (PR#8192) [ subscripting sometimes loses names
This (tangential) discussion really should be a separate thread so I changed the subject line above.

On Fri, Jan 30, 2009 at 11:51:00AM -0500, Simon Urbanek wrote:
> Subject: Re: [Rd] (PR#13487) Segfault when mistakenly calling [.data.frame

> >My boss was debugging an issue in our R code. We have our own "["
> >functions, because stock R drops names when subscripting.
>
> ... if you tell it to do so, yes. If you tell it to not do that, it
> won't ... ever tried drop=FALSE ?

Simon, no, the drop=FALSE argument has nothing to do with what Christian was talking about. The kind of thing he meant is PR# 8192, "Subject: [ subscripting sometimes loses names":

http://bugs.r-project.org/cgi-bin/R/wishlist?id=8192

In R, subscripting with "[" USUALLY retains names, but R has various edge cases where it (IMNSHO) inappropriately discards them. This occurs with both .Primitive("[") and "[.data.frame". This has been known for years, but I have not yet tried digging into R's implementation to see where and how the names are actually getting lost.

Incidentally, versions of S-Plus since approximately S-Plus 6.0 back in 2001 show similar buggy edge case behavior. Older versions of S-Plus, c. S-Plus 3.3 and earlier, had the correct, name preserving behavior. I presume that the original Bell Labs S had correct name-preserving behavior, and then the S-Plus developers broke it sometime along the way.

--
Andrew Piskorski
http://www.piskorski.com/
__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] (PR#8192) [ subscripting sometimes loses names
On 31/01/2009 7:31 AM, Andrew Piskorski wrote:
> This (tangential) discussion really should be a separate thread so I
> changed the subject line above.
>
> On Fri, Jan 30, 2009 at 11:51:00AM -0500, Simon Urbanek wrote:
>> Subject: Re: [Rd] (PR#13487) Segfault when mistakenly calling [.data.frame
>
>>> My boss was debugging an issue in our R code. We have our own "["
>>> functions, because stock R drops names when subscripting.
>>
>> ... if you tell it to do so, yes. If you tell it to not do that, it
>> won't ... ever tried drop=FALSE ?
>
> Simon, no, the drop=FALSE argument has nothing to do with what
> Christian was talking about. The kind of thing he meant is PR# 8192,
> "Subject: [ subscripting sometimes loses names":
>
> http://bugs.r-project.org/cgi-bin/R/wishlist?id=8192

In that bug report you were asked to provide simple examples, and you didn't. I imagine that's why there was no action on it. It is not that easy for someone else to actually find the simple example that led you to print

$vec.1 BAD $vec.1[[1]] $vec.1[[2]] ac a c no 13 NA 1 3 NA

I just tracked this one down, and can put together this simple example:

    > (1:3)["no"]
    [1] NA

where I think you would want the name "no" attached to the output. (Or maybe your more complicated example is wanted? You don't explain.) But that looks like documented behaviour to me: according to my reading of "Indexing by vectors" in the R Language Definition manual, it should give the same answer as (1:3)[4], and it does. So it's not a bug, but a wishlist item.

And the other two cases where you list "BAD" behaviour? I didn't track them down.

I know you spent a lot of time putting together that bug report; it seems a shame that it is being ignored because you put in too much: you really should simplify it as you were asked to do.

Duncan Murdoch

> In R, subscripting with "[" USUALLY retains names, but R has various
> edge cases where it (IMNSHO) inappropriately discards them. This
> occurs with both .Primitive("[") and "[.data.frame". This has been
> known for years, but I have not yet tried digging into R's
> implementation to see where and how the names are actually getting
> lost.
>
> Incidentally, versions of S-Plus since approximately S-Plus 6.0 back
> in 2001 show similar buggy edge case behavior. Older versions of
> S-Plus, c. S-Plus 3.3 and earlier, had the correct, name preserving
> behavior. I presume that the original Bell Labs S had correct
> name-preserving behavior, and then the S-Plus developers broke it
> sometime along the way.
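For readers following along, the vector case discussed above can be reproduced with a few lines of R. This is only a sketch of the behaviour described in the message, not taken from the bug report itself; the variable names (x, y, by_name, and so on) are illustrative.

```r
# Unnamed vector: indexing by a name that does not exist gives an
# unnamed NA, the same result as indexing one past the end.
x <- 1:3
by_name  <- x["no"]
past_end <- x[4]
print(identical(by_name, past_end))

# Named vector: names that exist survive subsetting ...
y <- c(a = 1L, b = 2L, c = 3L)
kept <- y[c("a", "c")]
print(names(kept))

# ... but a name that does not exist comes back as NA in the names,
# rather than as the string that was asked for.
missing_name <- y[c("a", "c", "no")]
print(names(missing_name))
```

Whether the last result should carry the name "no" instead of NA is the wishlist question; the code only shows what stock R does.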
Re: [Rd] Side-effects of require() vs library() on x86_64 aka amd64
Hi Dirk,

* On 2009-01-30 at 22:38 -0600 Dirk Eddelbuettel wrote:
> Turns out, as so often, that there was a regular bug lurking which is now
> fixed in RDieHarder 0.1.1. But I still would like to understand exactly what
> is different so that --slave was able to trigger it when --vanilla,
> --no-save, ... did not.
>
> [ The library() vs require() issue may have been a red herring. ]

Without telling us any details about the nature of the bug you found, it is difficult to speculate. If the bug was in your C code and memory related, it could simply be that the two different run paths resulted in different allocation patterns, one of which triggered the bug.

+ seth

--
Seth Falcon | http://userprimary.net/user/
Re: [Rd] (PR#8192) [ subscripting sometimes loses names
Duncan Murdoch wrote:
> On 31/01/2009 7:31 AM, Andrew Piskorski wrote:
>> This (tangential) discussion really should be a separate thread so I
>> changed the subject line above.
>>
>> On Fri, Jan 30, 2009 at 11:51:00AM -0500, Simon Urbanek wrote:
>>> Subject: Re: [Rd] (PR#13487) Segfault when mistakenly calling [.data.frame
>>
>>>> My boss was debugging an issue in our R code. We have our own "["
>>>> functions, because stock R drops names when subscripting.
>>>
>>> ... if you tell it to do so, yes. If you tell it to not do that, it
>>> won't ... ever tried drop=FALSE ?
>>
>> Simon, no, the drop=FALSE argument has nothing to do with what
>> Christian was talking about. The kind of thing he meant is PR# 8192,
>> "Subject: [ subscripting sometimes loses names":
>>
>> http://bugs.r-project.org/cgi-bin/R/wishlist?id=8192
>
> In that bug report you were asked to provide simple examples, and you
> didn't. I imagine that's why there was no action on it. It is not that
> easy for someone else to actually find the simple example that led you
> to print
>
> $vec.1 BAD $vec.1[[1]] $vec.1[[2]] ac a c no 13 NA 1 3 NA
>
> I just tracked this one down, and can put together this simple example:
>
>     > (1:3)["no"]
>     [1] NA
>
> where I think you would want the name "no" attached to the output. (Or
> maybe your more complicated example is wanted? You don't explain.) But
> that looks like documented behaviour to me: according to my reading of
> "Indexing by vectors" in the R Language Definition manual, it should
> give the same answer as (1:3)[4], and it does. So it's not a bug, but a
> wishlist item.
>
> And the other two cases where you list "BAD" behaviour? I didn't track
> them down.

I did, and they boil down to variations of

    > data.frame(val=1:3,row.names=letters[1:3])[,1]
    [1] 1 2 3

but it's not obvious that the result should be named using the row.names and (in particular) whether or why it should differ from .[[1]] and $val. Given that for most purposes, extracting the relevant names would just be unnecessary red tape, I'd say that we can do without it.

--
   O__   Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~ - (p.dalga...@biostat.ku.dk)              FAX: (+45) 35327907
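The three extraction forms mentioned above can be compared directly. A minimal sketch, assuming a current R; the object names (df, col_bracket, col_double, col_dollar) are made up for illustration.

```r
# One-column data frame with row names a, b, c, as in the message above.
df <- data.frame(val = 1:3, row.names = letters[1:3])

col_bracket <- df[, 1]   # matrix-style indexing; drops to a plain vector
col_double  <- df[[1]]   # list-style extraction by position
col_dollar  <- df$val    # list-style extraction by name

# All three return the same unnamed integer vector.
print(col_bracket)
print(identical(col_bracket, col_double))
print(identical(col_bracket, col_dollar))
print(is.null(names(col_bracket)))
```

So naming the result of df[, 1] by the row names would make it differ from the other two forms, which is the consistency point raised in the message.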
Re: [Rd] Side-effects of require() vs library() on x86_64 aka amd64
Hi Seth,

Thanks for the follow-up.

On 31 January 2009 at 06:59, Seth Falcon wrote:
| * On 2009-01-30 at 22:38 -0600 Dirk Eddelbuettel wrote:
| > Turns out, as so often, that there was a regular bug lurking which is now
| > fixed in RDieHarder 0.1.1. But I still would like to understand exactly what
| > is different so that --slave was able to trigger it when --vanilla,
| > --no-save, ... did not.
| >
| > [ The library() vs require() issue may have been a red herring. ]
|
| Without telling us any details about the nature of the bug you found,
| it is difficult to speculate. If the bug was in your C code and
| memory related, it could simply be that the two different run paths
| resulted in different allocation patterns, one of which triggered the
| bug.

Yes yes and yes :) It was in C, and it was memory related and it dealt with getting results out of the library to which the package interfaces.

But short of looking at the source, is there any documentation on what --slave does differently?

Dirk

--
Three out of two people have difficulties with fractions.
Re: [Rd] Side-effects of require() vs library() on x86_64 aka amd64
Dirk Eddelbuettel wrote:
> Hi Seth,
>
> Thanks for the follow-up.
>
> On 31 January 2009 at 06:59, Seth Falcon wrote:
> | * On 2009-01-30 at 22:38 -0600 Dirk Eddelbuettel wrote:
> | > Turns out, as so often, that there was a regular bug lurking which is now
> | > fixed in RDieHarder 0.1.1. But I still would like to understand exactly what
> | > is different so that --slave was able to trigger it when --vanilla,
> | > --no-save, ... did not.
> | >
> | > [ The library() vs require() issue may have been a red herring. ]
> |
> | Without telling us any details about the nature of the bug you found,
> | it is difficult to speculate. If the bug was in your C code and
> | memory related, it could simply be that the two different run paths
> | resulted in different allocation patterns, one of which triggered the
> | bug.
>
> Yes yes and yes :) It was in C, and it was memory related and it dealt
> with getting results out of the library to which the package interfaces.
>
> But short of looking at the source, is there any documentation on what
> --slave does differently?
>
> Dirk

Not really (and you know where to find the sources...). But sometimes it only takes one memory allocation more or less to shift the effect of a memory bug to a completely different point in space and time.

--
   O__   Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~ - (p.dalga...@biostat.ku.dk)              FAX: (+45) 35327907
Re: [Rd] Side-effects of require() vs library() on x86_64 aka amd64
* On 2009-01-31 at 09:34 -0600 Dirk Eddelbuettel wrote:
> | Without telling us any details about the nature of the bug you found,
> | it is difficult to speculate. If the bug was in your C code and
> | memory related, it could simply be that the two different run paths
> | resulted in different allocation patterns, one of which triggered the
> | bug.
>
> Yes yes and yes :) It was in C, and it was memory related and it dealt
> with getting results out of the library to which the package interfaces.
>
> But short of looking at the source, is there any documentation on what
> --slave does differently?

The R-intro manual has a brief description:

    --slave
        Make R run as quietly as possible. This option is intended to
        support programs which use R to compute results for them. It
        implies --quiet and --no-save.

I suspect that for more detail than that, one would have to look at the sources. But the above helps explain the behavior you saw; a "--quiet" R will suppress some output and that will make a difference in terms of memory allocation.

+ seth

--
Seth Falcon | http://userprimary.net/user/
Re: [Rd] Side-effects of require() vs library() on x86_64 aka amd64
Hi Peter,

On 31 January 2009 at 16:55, Peter Dalgaard wrote:
| Not really (and you know where to find the sources...).

Yes, and I had dug through that in the past for littler and other embedding work. I was just wondering if I had missed any documentation, besides the few lines about --slave from the help page, --help switch and intro manual.

| But sometimes it
| only takes one memory allocation more or less to shift the effect of a
| memory bug to a completely different point in space and time.

That seems to have been the case. It also didn't help that x86 didn't trigger it, or I would have noticed sooner after my 0.1.0 release. Building on different architectures helps shake these things out.

Dirk

--
Three out of two people have difficulties with fractions.
Re: [Rd] (PR#8192) [ subscripting sometimes loses names
On Sat, Jan 31, 2009 at 10:13 AM, Peter Dalgaard wrote:
> Duncan Murdoch wrote:
>> On 31/01/2009 7:31 AM, Andrew Piskorski wrote:
>>> On Fri, Jan 30, 2009 at 11:51:00AM -0500, Simon Urbanek wrote:
>>>> Subject: Re: [Rd] (PR#13487) Segfault when mistakenly calling [.data.frame
>>>>
>>>> ever tried drop=FALSE ?
>>>
>>> Simon, no, the drop=FALSE argument has nothing to do with what
>>> Christian was talking about. The kind of thing he meant is PR# 8192,
>>> "Subject: [ subscripting sometimes loses names":
>>>
>>> http://bugs.r-project.org/cgi-bin/R/wishlist?id=8192
>>
>> In that bug report you were asked to provide simple examples, and you
>> didn't.
>> ...
>> I just tracked this one down, and can put together this simple example:
>>
>>     > (1:3)["no"]
>>     [1] NA
>>
>> where I think you would want the name "no" attached to the output.

No, it has nothing to do with indexing by name. It's about preserving existing names when subsetting.

>> And the other two cases where you list "BAD" behaviour? I didn't track
>> them down.
>
> I did, and they boil down to variations of
>
>     > data.frame(val=1:3,row.names=letters[1:3])[,1]
>     [1] 1 2 3
>
> but it's not obvious that the result should be named using the row.names
> and (in particular) whether or why it should differ from .[[1]] and
> $val. Given that for most purposes, extracting the relevant names would
> just be unnecessary red tape, I'd say that we can do without it.

Compare

    > data.frame(val=1:3,row.names=letters[1:3])[,1]
    [1] 1 2 3
    > as.matrix(data.frame(val=1:3,row.names=letters[1:3]))[,1]
    a b c
    1 2 3

X[,1] preserves row names if X is a matrix, and loses them if X is a data frame. To me, this is ugly and inconsistent.

One might argue that having names and dimnames at all is "red tape", and wastes memory and computational efficiency -- after all, Fortran arrays had no names. But R chose to drag along the names (sometimes), and it can be very helpful to us humans. Now R should do it consistently.

/Christian
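The comparison above, together with one possible user-level workaround, can be written out as follows. This is a sketch only: column_with_rownames is a hypothetical helper invented for this illustration, not part of R and not the "[" replacement functions mentioned earlier in the thread.

```r
df <- data.frame(val = 1:3, row.names = letters[1:3])
m  <- as.matrix(df)

from_df     <- df[, 1]   # plain vector, row names are gone
from_matrix <- m[, 1]    # vector named by the matrix row names

print(names(from_df))
print(names(from_matrix))

# Hypothetical helper: extract column j of a data frame and attach the
# row names, mimicking what the matrix method does.
column_with_rownames <- function(d, j) {
  setNames(d[[j]], rownames(d))
}

named <- column_with_rownames(df, 1)
print(named)
```

A wrapper like this leaves stock behaviour alone, at the cost of having to remember to call it; that trade-off is what the thread is arguing about.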
Re: [Rd] (PR#8192) [ subscripting sometimes loses names
Christian Brechbühler wrote:
>>     > data.frame(val=1:3,row.names=letters[1:3])[,1]
>>     [1] 1 2 3
>>
>> but it's not obvious that the result should be named using the row.names
>> and (in particular) whether or why it should differ from .[[1]] and
>> $val.

this might be a good argument, if not that [,1] returning a vector rather than a one-column data frame is already inconsistent (with [,1:2], for example). if [,1] were not dropping the data.frame class and were returning a data frame instead, it would be obvious the result should use row names.

data.frame(val=1:3,row.names=letters[1:3])[,1,drop=FALSE] will keep the class and row names, though ?'[' says "drop: For matrices and arrays.".

it doesn't mean that dropping row names (or dropping dimensions) isn't useful and handy in specific cases, but this makes it no less inconsistent.

>> Given that for most purposes, extracting the relevant names would
>> just be unnecessary red tape, I'd say that we can do without it.
>
> Compare
>
>     > data.frame(val=1:3,row.names=letters[1:3])[,1]
>     [1] 1 2 3
>     > as.matrix(data.frame(val=1:3,row.names=letters[1:3]))[,1]
>     a b c
>     1 2 3
>
> X[,1] preserves row names if X is a matrix, and loses them if X is a data
> frame. To me, this is ugly and inconsistent.
>
> One might argue that having names and dimnames at all is "red tape", and
> wastes memory and computational efficiency -- after all, Fortran arrays had
> no names. But R chose to drag along the names (sometimes), and it can be
> very helpful to us humans. Now R should do it consistently.

i support this opinion. whether to have or not to have row names is a design decision, and both options may be reasonably argued for and against. but lack of consistency is seldom any good; r consistently lacks consistency.

vQ
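The drop=FALSE point made above can be checked with a short snippet. A sketch under the assumption of a current R; df2 and the other names are illustrative and do not come from the thread.

```r
df <- data.frame(val = 1:3, row.names = letters[1:3])

# Default: selecting a single column drops to a bare vector.
dropped <- df[, 1]
print(is.data.frame(dropped))

# drop = FALSE keeps the data.frame class, and with it the row names.
kept <- df[, 1, drop = FALSE]
print(is.data.frame(kept))
print(rownames(kept))

# Selecting two columns never drops, which is the inconsistency with
# the single-column case noted in the message.
df2  <- data.frame(val = 1:3, sq = (1:3)^2, row.names = letters[1:3])
both <- df2[, 1:2]
print(is.data.frame(both))
```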
Re: [Rd] Inherited Methods in r-devel (for package maintainers mainly)
The revisions below have been re-committed (r47803), and appear to be compatible with the current Matrix package ('0.999375-19'). Thanks to Martin Maechler for help with Matrix.

John Chambers wrote:
> A recently committed revision of R-devel (47740) has introduced a new
> mechanism for ordering superclasses consistently, with related changes
> for selecting inherited methods.
>
> As part of the process, a function testInheritedMethods has been
> introduced that examines method selection for the relevant subclasses
> and reports ambiguities. Maintainers of packages that have methods
> involving multiple arguments are encouraged to run testInheritedMethods
> for the relevant generic functions (e.g., the binary operators). The
> new method selection is unambiguous for single-argument selection.
>
> It's preferable to find such ambiguities during package development or
> revision, rather than having users encounter ambiguous method selection
> later on. In that spirit, ambiguous method selection is no longer a
> warning, just a message.
>
> The new mechanism for class ordering and method selection is described
> in a draft paper at
> http://stat.stanford.edu/~jmc4/classInheritance.pdf
> (later likely to be part of a submission to the R Journal).
>
> John
Re: [Rd] (PR#8192) [ subscripting sometimes loses names
On 31/01/2009 3:26 PM, Christian Brechbühler wrote:
> On Sat, Jan 31, 2009 at 10:13 AM, Peter Dalgaard wrote:
>> Duncan Murdoch wrote:
>>> On 31/01/2009 7:31 AM, Andrew Piskorski wrote:
>>>> On Fri, Jan 30, 2009 at 11:51:00AM -0500, Simon Urbanek wrote:
>>>>> Subject: Re: [Rd] (PR#13487) Segfault when mistakenly calling [.data.frame
>>>>>
>>>>> ever tried drop=FALSE ?
>>>>
>>>> Simon, no, the drop=FALSE argument has nothing to do with what
>>>> Christian was talking about. The kind of thing he meant is PR# 8192,
>>>> "Subject: [ subscripting sometimes loses names":
>>>>
>>>> http://bugs.r-project.org/cgi-bin/R/wishlist?id=8192
>>>
>>> In that bug report you were asked to provide simple examples, and you
>>> didn't.
>>> ...
>>> I just tracked this one down, and can put together this simple example:
>>>
>>>     > (1:3)["no"]
>>>     [1] NA
>>>
>>> where I think you would want the name "no" attached to the output.
>
> No, it has nothing to do with indexing by name. It's about preserving
> existing names when subsetting.

I think you misread my message.

>>> And the other two cases where you list "BAD" behaviour? I didn't track
>>> them down.
>>
>> I did, and they boil down to variations of
>>
>>     > data.frame(val=1:3,row.names=letters[1:3])[,1]
>>     [1] 1 2 3
>>
>> but it's not obvious that the result should be named using the row.names
>> and (in particular) whether or why it should differ from .[[1]] and
>> $val. Given that for most purposes, extracting the relevant names would
>> just be unnecessary red tape, I'd say that we can do without it.
>
> Compare
>
>     > data.frame(val=1:3,row.names=letters[1:3])[,1]
>     [1] 1 2 3
>     > as.matrix(data.frame(val=1:3,row.names=letters[1:3]))[,1]
>     a b c
>     1 2 3
>
> X[,1] preserves row names if X is a matrix, and loses them if X is a data
> frame. To me, this is ugly and inconsistent.
>
> One might argue that having names and dimnames at all is "red tape", and
> wastes memory and computational efficiency -- after all, Fortran arrays had
> no names. But R chose to drag along the names (sometimes), and it can be
> very helpful to us humans. Now R should do it consistently.

In one case you're working with a matrix, and in the other, a dataframe. So perfect consistency is impossible: matrices and dataframes are not the same. So it's a matter of deciding how much consistency is worth pursuing. Now, it seems nobody thinks this is worth pursuing: so it won't get changed.

To get it changed, you should make the change, then investigate what would break if the change were adopted, and what would become slower, etc. Or convince someone else to do that. But the fact that you think it's ugly is probably not convincing.

Duncan Murdoch