[Rd] (PR#8192) [ subscripting sometimes loses names

2009-01-31 Thread Andrew Piskorski
This (tangential) discussion really should be a separate thread so I
changed the subject line above.

On Fri, Jan 30, 2009 at 11:51:00AM -0500, Simon Urbanek wrote:
> Subject: Re: [Rd] (PR#13487) Segfault when mistakenly calling [.data.frame

> >My boss was debugging an issue in our R code.  We have our own "["
> >functions, because stock R drops names when subscripting.
> 
> ... if you tell it to do so, yes. If you tell it to not do that, it  
> won't ... ever tried drop=FALSE ?

Simon, no, the drop=FALSE argument has nothing to do with what
Christian was talking about.  The kind of thing he meant is PR# 8192,
"Subject: [ subscripting sometimes loses names":

  http://bugs.r-project.org/cgi-bin/R/wishlist?id=8192

In R, subscripting with "[" USUALLY retains names, but R has various
edge cases where it (IMNSHO) inappropriately discards them.  This
occurs with both .Primitive("[") and "[.data.frame".  This has been
known for years, but I have not yet tried digging into R's
implementation to see where and how the names are actually getting
lost.

Incidentally, versions of S-Plus since approximately S-Plus 6.0 back
in 2001 show similar buggy edge case behavior.  Older versions of
S-Plus, c. S-Plus 3.3 and earlier, had the correct, name preserving
behavior.  I presume that the original Bell Labs S had correct
name-preserving behavior, and then the S-Plus developers broke it
sometime along the way.

-- 
Andrew Piskorski 
http://www.piskorski.com/

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] (PR#8192) [ subscripting sometimes loses names

2009-01-31 Thread Duncan Murdoch

On 31/01/2009 7:31 AM, Andrew Piskorski wrote:

> This (tangential) discussion really should be a separate thread so I
> changed the subject line above.
>
> On Fri, Jan 30, 2009 at 11:51:00AM -0500, Simon Urbanek wrote:
> > Subject: Re: [Rd] (PR#13487) Segfault when mistakenly calling [.data.frame
>
> > > My boss was debugging an issue in our R code.  We have our own "["
> > > functions, because stock R drops names when subscripting.
> >
> > ... if you tell it to do so, yes. If you tell it to not do that, it
> > won't ... ever tried drop=FALSE ?
>
> Simon, no, the drop=FALSE argument has nothing to do with what
> Christian was talking about.  The kind of thing he meant is PR# 8192,
> "Subject: [ subscripting sometimes loses names":
>
>   http://bugs.r-project.org/cgi-bin/R/wishlist?id=8192


In that bug report you were asked to provide simple examples, and you 
didn't.  I imagine that's why there was no action on it.  It is not that 
easy for someone else to actually find the simple example that led you 
to print


 $vec.1
BAD  $vec.1[[1]]   $vec.1[[2]]
ac  a  c no
13   NA 1  3 NA

I just tracked this one down, and can put together this simple example:

> (1:3)["no"]
[1] NA

where I think you would want the name "no" attached to the output.  (Or 
maybe your more complicated example is wanted?  You don't explain.)  But 
that looks like documented behaviour to me:  according to my reading of 
"Indexing by vectors" in the R Language Definition manual, it should 
give the same answer as (1:3)[4], and it does.  So it's not a bug, but a 
wishlist item.
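
A small sketch of the behaviour discussed above may help. The named vector x is an illustrative assumption and is not taken from the bug report; it shows what happens when the vector being indexed does carry names.

```r
# Unnamed vector: an unmatched name and an out-of-range position
# both give a bare NA with no names attached.
v <- 1:3
stopifnot(identical(v["no"], v[4]))
stopifnot(is.null(names(v["no"])))

# Named vector (hypothetical example): the result keeps a names
# attribute, but the unmatched entry is named NA rather than "no".
x <- c(a = 1L, b = 2L, c = 3L)
r <- x[c("a", "c", "no")]
print(r)
print(names(r))
```

Whether the unmatched entry should instead be labelled with the requested name is the wishlist question; the sketch only shows what current indexing does.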


And the other two cases where you list "BAD" behaviour?  I didn't track 
them down.


I know you spent a lot of time putting together that bug report; it 
seems a shame that it is being ignored because you put in too much:  you 
really should simplify it as you were asked to do.


Duncan Murdoch




> In R, subscripting with "[" USUALLY retains names, but R has various
> edge cases where it (IMNSHO) inappropriately discards them.  This
> occurs with both .Primitive("[") and "[.data.frame".  This has been
> known for years, but I have not yet tried digging into R's
> implementation to see where and how the names are actually getting
> lost.
>
> Incidentally, versions of S-Plus since approximately S-Plus 6.0 back
> in 2001 show similar buggy edge case behavior.  Older versions of
> S-Plus, c. S-Plus 3.3 and earlier, had the correct, name preserving
> behavior.  I presume that the original Bell Labs S had correct
> name-preserving behavior, and then the S-Plus developers broke it
> sometime along the way.





Re: [Rd] Side-effects of require() vs library() on x86_64 aka amd64

2009-01-31 Thread Seth Falcon
Hi Dirk,

* On 2009-01-30 at 22:38 -0600 Dirk Eddelbuettel wrote:
> Turns out, as so often, that there was a regular bug lurking which is now
> fixed in RDieHarder 0.1.1.  But I still would like to understand exactly what
> is different so that --slave was able to trigger it when --vanilla,
> --no-save, ... did not.  
> 
> [ The library() vs require() issue may have been a red herring. ]

Without telling us any details about the nature of the bug you found,
it is difficult to speculate.  If the bug was in your C code and
memory related, it could simply be that the two different run paths
resulted in different allocation patterns, one of which triggered the
bug.

+ seth

-- 
Seth Falcon | http://userprimary.net/user/



Re: [Rd] (PR#8192) [ subscripting sometimes loses names

2009-01-31 Thread Peter Dalgaard

Duncan Murdoch wrote:

> On 31/01/2009 7:31 AM, Andrew Piskorski wrote:
> > This (tangential) discussion really should be a separate thread so I
> > changed the subject line above.
> >
> > On Fri, Jan 30, 2009 at 11:51:00AM -0500, Simon Urbanek wrote:
> > > Subject: Re: [Rd] (PR#13487) Segfault when mistakenly calling [.data.frame
> >
> > > > My boss was debugging an issue in our R code.  We have our own "["
> > > > functions, because stock R drops names when subscripting.
> > >
> > > ... if you tell it to do so, yes. If you tell it to not do that, it
> > > won't ... ever tried drop=FALSE ?
> >
> > Simon, no, the drop=FALSE argument has nothing to do with what
> > Christian was talking about.  The kind of thing he meant is PR# 8192,
> > "Subject: [ subscripting sometimes loses names":
> >
> >   http://bugs.r-project.org/cgi-bin/R/wishlist?id=8192
>
> In that bug report you were asked to provide simple examples, and you
> didn't.  I imagine that's why there was no action on it.  It is not that
> easy for someone else to actually find the simple example that led you
> to print
>
>  $vec.1
> BAD  $vec.1[[1]]   $vec.1[[2]]
> ac  a  c no
> 13   NA 1  3 NA
>
> I just tracked this one down, and can put together this simple example:
>
>  > (1:3)["no"]
> [1] NA
>
> where I think you would want the name "no" attached to the output.  (Or
> maybe your more complicated example is wanted?  You don't explain.)  But
> that looks like documented behaviour to me:  according to my reading of
> "Indexing by vectors" in the R Language Definition manual, it should
> give the same answer as (1:3)[4], and it does.  So it's not a bug, but a
> wishlist item.
>
> And the other two cases where you list "BAD" behaviour?  I didn't track
> them down.


I did, and they boil down to variations of

> data.frame(val=1:3,row.names=letters[1:3])[,1]
[1] 1 2 3

but it's not obvious that the result should be named using the row.names 
and (in particular) whether or why it should differ from .[[1]] and 
$val. Given that for most purposes, extracting the relevant names 
would just be unnecessary red tape, I'd say that we can do without it.
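
For reference, a short sketch comparing the three extractors mentioned above, plus one way to attach the row names by hand when they are wanted. The variable names are illustrative; setNames() comes from the stats package.

```r
df <- data.frame(val = 1:3, row.names = letters[1:3])

# All three forms return the same plain, unnamed integer vector.
stopifnot(identical(df[, 1], df[[1]]),
          identical(df[, 1], df$val),
          is.null(names(df[, 1])))

# Attaching the row names explicitly is a one-liner when needed.
named <- setNames(df[[1]], rownames(df))
print(named)
```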




--
   O__   Peter Dalgaard Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark  Ph:  (+45) 35327918
~~ - (p.dalga...@biostat.ku.dk)  FAX: (+45) 35327907



Re: [Rd] Side-effects of require() vs library() on x86_64 aka amd64

2009-01-31 Thread Dirk Eddelbuettel

Hi Seth,

Thanks for the follow-up.

On 31 January 2009 at 06:59, Seth Falcon wrote:
| * On 2009-01-30 at 22:38 -0600 Dirk Eddelbuettel wrote:
| > Turns out, as so often, that there was a regular bug lurking which is now
| > fixed in RDieHarder 0.1.1.  But I still would like to understand exactly what
| > is different so that --slave was able to trigger it when --vanilla,
| > --no-save, ... did not.  
| > 
| > [ The library() vs require() issue may have been a red herring. ]
| 
| Without telling us any details about the nature of the bug you found,
| it is difficult to speculate.  If the bug was in your C code and
| memory related, it could simply be that the two different run paths
| resulted in different allocation patterns, one of which triggered the
| bug.

Yes yes and yes :)  It was in C, and it was memory related and it dealt
with getting results out of the library to which the package interfaces.

But short of looking at the source, is there any documentation on what
--slave does differently?

Dirk

-- 
Three out of two people have difficulties with fractions.



Re: [Rd] Side-effects of require() vs library() on x86_64 aka amd64

2009-01-31 Thread Peter Dalgaard

Dirk Eddelbuettel wrote:

> Hi Seth,
>
> Thanks for the follow-up.
>
> On 31 January 2009 at 06:59, Seth Falcon wrote:
> | * On 2009-01-30 at 22:38 -0600 Dirk Eddelbuettel wrote:
> | > Turns out, as so often, that there was a regular bug lurking which is now
> | > fixed in RDieHarder 0.1.1.  But I still would like to understand exactly what
> | > is different so that --slave was able to trigger it when --vanilla,
> | > --no-save, ... did not.
> | >
> | > [ The library() vs require() issue may have been a red herring. ]
> |
> | Without telling us any details about the nature of the bug you found,
> | it is difficult to speculate.  If the bug was in your C code and
> | memory related, it could simply be that the two different run paths
> | resulted in different allocation patterns, one of which triggered the
> | bug.
>
> Yes yes and yes :)  It was in C, and it was memory related and it dealt
> with getting results out of the library to which the package interfaces.
>
> But short of looking at the source, is there any documentation on what
> --slave does differently?
>
> Dirk



Not really (and you know where to find the sources...). But sometimes it 
only takes one memory allocation more or less to shift the effect of a 
memory bug to a completely different point in space and time.


--
   O__   Peter Dalgaard Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark  Ph:  (+45) 35327918
~~ - (p.dalga...@biostat.ku.dk)  FAX: (+45) 35327907



Re: [Rd] Side-effects of require() vs library() on x86_64 aka amd64

2009-01-31 Thread Seth Falcon
* On 2009-01-31 at 09:34 -0600 Dirk Eddelbuettel wrote:
> | Without telling us any details about the nature of the bug you found,
> | it is difficult to speculate.  If the bug was in your C code and
> | memory related, it could simply be that the two different run paths
> | resulted in different allocation patterns, one of which triggered the
> | bug.
> 
> Yes yes and yes :)  It was in C, and it was memory related and it dealt
> with getting results out of the library to which the package interfaces.
> 
> But short of looking at the source, is there any documentation on what
> --slave does differently?

The R-intro manual has a brief description:

--slave
Make R run as quietly as possible. This option is intended to
support programs which use R to compute results for them. It
implies --quiet and --no-save.

I suspect that for more detail than that, one would have to look at
the sources.  But the above helps explain the behavior you saw; a
"--quiet" R will suppress some output and that will make a difference
in terms of memory allocation.

+ seth

-- 
Seth Falcon | http://userprimary.net/user/



Re: [Rd] Side-effects of require() vs library() on x86_64 aka amd64

2009-01-31 Thread Dirk Eddelbuettel

Hi Peter,

On 31 January 2009 at 16:55, Peter Dalgaard wrote:
| Not really (and you know where to find the sources...).

Yes, and I had dug through that in the past for littler and other embedding
work.  I was just wondering if I had missed any documentation, besides the
few lines about --slave from the help page, --help switch and intro manual.

| But sometimes it
| only takes one memory allocation more or less to shift the effect of a
| memory bug to a completely different point in space and time.

That seems to have been the case. It also didn't help that x86 didn't trigger
it, or I would have noticed sooner after my 0.1.0 release.  Building on
different architectures helps shake these things out.

Dirk

-- 
Three out of two people have difficulties with fractions.



Re: [Rd] (PR#8192) [ subscripting sometimes loses names

2009-01-31 Thread Christian Brechbühler
On Sat, Jan 31, 2009 at 10:13 AM, Peter Dalgaard
wrote:

> Duncan Murdoch wrote:
>
>> On 31/01/2009 7:31 AM, Andrew Piskorski wrote:
>>
>>> On Fri, Jan 30, 2009 at 11:51:00AM -0500, Simon Urbanek wrote:
>>>
>>>> Subject: Re: [Rd] (PR#13487) Segfault when mistakenly calling
>>>> [.data.frame
>>>>
>>>> ever tried drop=FALSE ?
>>>
>>> Simon, no, the drop=FALSE argument has nothing to do with what
>>> Christian was talking about.  The kind of thing he meant is PR# 8192,
>>> "Subject: [ subscripting sometimes loses names":
>>>
>>>  http://bugs.r-project.org/cgi-bin/R/wishlist?id=8192
>>>
>>
>> In that bug report you were asked to provide simple examples, and you
>> didn't.
>> ...
>> I just tracked this one down, and can put together this simple example:
>>
>>  > (1:3)["no"]
>> [1] NA
>>
>> where I think you would want the name "no" attached to the output.
>
No, it has nothing to do with indexing by name.  It's about preserving
existing names when subsetting.

>> And the other two cases where you list "BAD" behaviour?  I didn't track them
>> down.
>>
>
> I did, and they boil down to variations of
>
> > data.frame(val=1:3,row.names=letters[1:3])[,1]
> [1] 1 2 3
>
> but it's not obvious that the result should be named using the row.names
> and (in particular) whether or why it should differ from .[[1]] and
> $val. Given that for most purposes, extracting the relevant names would
> just be unnecessary red tape, I'd say that we can do without it.


Compare

> data.frame(val=1:3,row.names=letters[1:3])[,1]
[1] 1 2 3
> as.matrix(data.frame(val=1:3,row.names=letters[1:3]))[,1]
a b c
1 2 3

X[,1] preserves row names if X is a matrix, and loses them if X is a data
frame.  To me, this is ugly and inconsistent.

One might argue that having names and dimnames at all is "red tape", and
wastes memory and computational efficiency -- after all, Fortran arrays had
no names.  But R chose to drag along the names (sometimes), and it can be
very helpful to us humans.  Now R should do it consistently.
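
As a rough illustration of what a consistent, name-preserving extractor could look like, here is a minimal sketch. The helper named_col is hypothetical (it is not part of R, and not the custom "[" functions mentioned earlier in the thread); it simply attaches row names when the input is a data frame and relies on the existing matrix behaviour otherwise.

```r
# Hypothetical helper (not part of R): extract column j as a vector that
# keeps the row names, whether x is a matrix or a data frame.
named_col <- function(x, j) {
  if (is.data.frame(x)) {
    stats::setNames(x[[j]], rownames(x))
  } else {
    x[, j]  # matrices already carry their rownames over as names
  }
}

df <- data.frame(val = 1:3, row.names = letters[1:3])
m  <- as.matrix(df)

print(named_col(df, 1))
print(named_col(m, 1))
```

Note that a data frame with automatic row names would get "1", "2", ... as names under this sketch, which may or may not be wanted.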

/Christian



Re: [Rd] (PR#8192) [ subscripting sometimes loses names

2009-01-31 Thread Wacek Kusnierczyk
Christian Brechbühler wrote:


>
>>> data.frame(val=1:3,row.names=letters[1:3])[,1]
>>>   
>> [1] 1 2 3
>>
>> but it's not obvious that the result should be named using the row.names
>> and (in particular) whether or why it should differ from .[[1]] and
>> $val. 

this might be a good argument, if not that [,1] returning a vector
rather than a one-column data frame is already inconsistent (with
[,1:2], for example).  if [,1] were not dropping the data.frame class
and were returning a data frame instead, it would be obvious the result
should use row names. 

data.frame(val=1:3,row.names=letters[1:3])[,1,drop=FALSE]

will keep the class and row names, though ?'[' says "drop: For matrices
and arrays.".

it doesn't mean that dropping row names (or dropping dimensions) isn't
useful and handy in specific cases, but this makes it no less
inconsistent. 
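
a quick sketch of the drop=FALSE behaviour described above, for both a data frame and a matrix.  the object names and the second column "other" are made up for illustration.

```r
df <- data.frame(val = 1:3, row.names = letters[1:3])
m  <- as.matrix(df)

one_col_df <- df[, 1, drop = FALSE]
one_col_m  <- m[, 1, drop = FALSE]

# The data frame stays a data frame and keeps its row names.
stopifnot(is.data.frame(one_col_df),
          identical(rownames(one_col_df), letters[1:3]))

# The matrix stays a 3 x 1 matrix and keeps its row names.
stopifnot(identical(dim(one_col_m), c(3L, 1L)),
          identical(rownames(one_col_m), letters[1:3]))

# For comparison, selecting two columns keeps the data frame class.
df2 <- data.frame(val = 1:3, other = 4:6, row.names = letters[1:3])
stopifnot(is.data.frame(df2[, 1:2]))
```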

>> Given that for most purposes, extracting the relevant names would
>> just be unnecessary red tape, I'd say that we can do without it.
>> 
>
>
> Compare
>
>   
>> data.frame(val=1:3,row.names=letters[1:3])[,1]
>> 
> [1] 1 2 3
>   
>> as.matrix(data.frame(val=1:3,row.names=letters[1:3]))[,1]
>> 
> a b c
> 1 2 3
>
> X[,1] preserves row names if X is a matrix, and loses them if X is a data
> frame.  To me, this is ugly and inconsistent.
>
> One might argue that having names and dimnames at all is "red tape", and
> wastes memory and computational efficiency -- after all, Fortran arrays had
> no names.  But R chose to drag along the names (sometimes), and it can be
> very helpful to us humans.  Now R should do it consistently.
>   

i support this opinion.  whether to have or not to have row names is a
design decision, and both options may be reasonably argued for and
against.  but lack of consistency is seldom any good;  r consistently
lacks consistency.

vQ



Re: [Rd] Inherited Methods in r-devel (for package maintainers mainly)

2009-01-31 Thread John Chambers
The revisions below have been re-committed (r47803), and appear to be 
compatible with the current Matrix package ('0.999375-19').  Thanks to 
Martin Maechler for help with Matrix.


John Chambers wrote:
> A recently committed revision of R-devel (47740) has introduced a new
> mechanism for ordering superclasses consistently, with related changes
> for selecting inherited methods.
>
> As part of the process, a function testInheritedMethods has been
> introduced that examines method selection for the relevant subclasses
> and reports ambiguities.
>
> Maintainers of packages that have methods involving multiple arguments
> are encouraged to run testInheritedMethods for the relevant generic
> functions (e.g., the binary operators).  The new method selection is
> unambiguous for single-argument selection.
>
> It's preferable to find such ambiguities during package development
> or revision, rather than having users encounter ambiguous method
> selection later on.  In that spirit, ambiguous method selection is no
> longer a warning, just a message.
>
> The new mechanism for class ordering and method selection is described
> in a draft paper at
> http://stat.stanford.edu/~jmc4/classInheritance.pdf (later likely to
> be part of a submission to the R Journal).
>
> John
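
As a rough sketch of the kind of check suggested above, the following sets up a deliberately ambiguous two-argument generic and runs testInheritedMethods on it. The classes A, B, C and the generic combine are hypothetical, invented only for illustration; the exact form of the report may differ between R versions.

```r
library(methods)

# Hypothetical classes: C inherits from both A and B.
setClass("A", representation(x = "numeric"))
setClass("B", representation(y = "numeric"))
setClass("C", contains = c("A", "B"))

# Hypothetical two-argument generic with two equally good candidates
# for the signature (C, C).
setGeneric("combine", function(e1, e2) standardGeneric("combine"))
setMethod("combine", signature("A", "B"), function(e1, e2) "A-B method")
setMethod("combine", signature("B", "A"), function(e1, e2) "B-A method")

# Examine inherited-method selection over the relevant classes and
# report any ambiguous cases.
report <- testInheritedMethods("combine")
print(report)
```

Running this during package development should surface the (C, C) ambiguity before a user trips over it at dispatch time.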



Re: [Rd] (PR#8192) [ subscripting sometimes loses names

2009-01-31 Thread Duncan Murdoch

On 31/01/2009 3:26 PM, Christian Brechbühler wrote:

> On Sat, Jan 31, 2009 at 10:13 AM, Peter Dalgaard wrote:
>
>> Duncan Murdoch wrote:
>>
>>> On 31/01/2009 7:31 AM, Andrew Piskorski wrote:
>>>
>>>> On Fri, Jan 30, 2009 at 11:51:00AM -0500, Simon Urbanek wrote:
>>>>
>>>>> Subject: Re: [Rd] (PR#13487) Segfault when mistakenly calling
>>>>> [.data.frame
>>>>>
>>>>> ever tried drop=FALSE ?
>>>>
>>>> Simon, no, the drop=FALSE argument has nothing to do with what
>>>> Christian was talking about.  The kind of thing he meant is PR# 8192,
>>>> "Subject: [ subscripting sometimes loses names":
>>>>
>>>>  http://bugs.r-project.org/cgi-bin/R/wishlist?id=8192
>>>
>>> In that bug report you were asked to provide simple examples, and you
>>> didn't.
>>> ...
>>> I just tracked this one down, and can put together this simple example:
>>>
>>>  > (1:3)["no"]
>>> [1] NA
>>>
>>> where I think you would want the name "no" attached to the output.
>
> No, it has nothing to do with indexing by name.  It's about preserving
> existing names when subsetting.

I think you misread my message.

>>> And the other two cases where you list "BAD" behaviour?  I didn't track them
>>> down.
>>
>> I did, and they boil down to variations of
>>
>> > data.frame(val=1:3,row.names=letters[1:3])[,1]
>> [1] 1 2 3
>>
>> but it's not obvious that the result should be named using the row.names
>> and (in particular) whether or why it should differ from .[[1]] and
>> $val. Given that for most purposes, extracting the relevant names would
>> just be unnecessary red tape, I'd say that we can do without it.
>
> Compare
>
> > data.frame(val=1:3,row.names=letters[1:3])[,1]
> [1] 1 2 3
> > as.matrix(data.frame(val=1:3,row.names=letters[1:3]))[,1]
> a b c
> 1 2 3
>
> X[,1] preserves row names if X is a matrix, and loses them if X is a data
> frame.  To me, this is ugly and inconsistent.
>
> One might argue that having names and dimnames at all is "red tape", and
> wastes memory and computational efficiency -- after all, Fortran arrays had
> no names.  But R chose to drag along the names (sometimes), and it can be
> very helpful to us humans.  Now R should do it consistently.


In one case you're working with a matrix, and in the other, a dataframe.
So perfect consistency is impossible:  matrices and dataframes are not
the same.  So it's a matter of deciding how much consistency is worth
pursuing.  Now, it seems nobody thinks this is worth pursuing:  so it
won't get changed.


To get it changed, you should make the change, then investigate what
would break if the change were adopted, and what would become slower, etc.
Or convince someone else to do that.  But the fact that you think it's
ugly is probably not convincing.


Duncan Murdoch
