Very good points. They closely match the current prototype I have written...

> Starting by working on an interface for such object(s) is probably the first 
> step toward a unified solution

Agree. Getting a good API is always the most important step.

> Dimension-level is what seems to be the most needed...

True, and that was Henrik's original suggestion. But I find that all three are 
closely related to the same topic (metadata) and as such deserve to be worked 
out together. If most people think otherwise, though, the direction is clear.

> - Object-level, if not linked to any dimension-attribute, is just saying that 
> one wants to attach anything to any object. That's what attr() is already 
> doing.

Except that plain attributes are dropped when subsetting. I've found myself 
dozens of times creating classes just to write a `[` method for them that 
preserves some attributes. This looks like such a common situation that having 
a mechanism that saves the user from programming the same stuff again and again 
would be handy.

> - Cell-level is maybe out of scope for a first trial (but maybe I missed 
> the use-cases for it)

Although I agree that cell-level is far less common, here are a couple of use 
cases I've hit recently:

1) the array represents time series in columns. The original data comes in a 
different frequency for each column, with some data missing. When I aligned to 
a common frequency and interpolated missing values, I needed a factor array of 
the same dimension as the data array identifying whether each observation 
corresponded to the actual original series or had been interpolated, and 
whether interpolation was due to missing data or to frequency alignment. Of 
course, I needed the factor array to be subsetted together with the array.

2) the array is a table representing data to be formatted by a reporting system 
(Sweave, R2HTML, etc), similar to the 'xtable' class. So I needed to associate 
formatting information with each individual "cell" (font, color, borders...), as 
well as with each dimension and with the whole table.

Anyway, it's far easier to add "cell-level" metadata on top of the other 
features with a new class: for `[` subscripting just call NextMethod() and then 
apply the same indexes to the object storing the cell-level metadata. But I 
still think it's useful to work out a data object's metadata at all possible 
levels with a unified interface.
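
A minimal sketch of that layering, under stated assumptions: the class name 
'cellmeta_array' and the "cellmeta" attribute are hypothetical names chosen for 
illustration, and a character matrix stands in for the factor array of use 
case 1. The `[` method subsets the data through NextMethod() and then applies 
the same indexes to the metadata.

```r
# Hypothetical class: an array that carries a same-shaped array of per-cell
# flags in a "cellmeta" attribute. Names are illustrative, not an existing API.
cellmeta_array <- function(data, cellmeta) {
  stopifnot(identical(dim(data), dim(cellmeta)))
  structure(data, cellmeta = cellmeta, class = "cellmeta_array")
}

`[.cellmeta_array` <- function(x, ..., drop = TRUE) {
  meta <- attr(x, "cellmeta")
  out <- NextMethod(drop = drop)                   # subset the data as usual
  attr(out, "cellmeta") <- meta[..., drop = drop]  # same indexes on the metadata
  class(out) <- oldClass(x)                        # restore the class
  out
}

# Use case 1: values plus a flag saying where each observation came from.
vals  <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 3)
flags <- matrix(c("original", "interp_missing", "original",
                  "interp_align", "original", "original"), nrow = 3)
x   <- cellmeta_array(vals, flags)
sub <- x[2:3, , drop = FALSE]
attr(sub, "cellmeta")   # flags for rows 2 and 3 only
```

The sketch only covers matrix-style indexing with one index per dimension; 
`[<-` is left alone.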


About the subscripting `[` methods, I don't see the need to modify `[<-` for 
arrays, as out-of-bound indexes generate errors with arrays (unlike vectors or 
data frames), so `[<-` would only replace data and leave metadata untouched. Am 
I missing something? 

> maybe a function called "dimmeta()" (for consistency with "dimnames()") ? 

I'm using 'dimdata' in my current prototype, and Henrik suggested 'dimattr', 
but I really like your proposal more. 

Wrappers for the first two elements of 'dimmeta' for 2-dim arrays could be added 
in the same vein as 'rownames' and 'colnames': 'rowmeta' and 'colmeta'.

> The signature could be dimmeta(x, i), with x the object, 

For consistency with 'dimnames', the 'i' argument could be dropped, and 
dimmeta(x)[[i]] used instead...
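
To make the dimnames-style variant concrete, here is a rough sketch. It 
assumes, purely for illustration, that the metadata lives in a plain "dimmeta" 
attribute holding a list with one element per dimension; none of these 
functions exist in base R.

```r
# Hypothetical accessors modelled on dimnames(): "dimmeta" is a list with one
# element (e.g. a data.frame, or NULL) per dimension of x.
dimmeta <- function(x) attr(x, "dimmeta")

`dimmeta<-` <- function(x, value) {
  if (!is.null(value)) {
    stopifnot(is.list(value), length(value) == length(dim(x)))
  }
  attr(x, "dimmeta") <- value
  x
}

# Convenience wrappers for 2-dim arrays, in the vein of rownames()/colnames().
rowmeta <- function(x) dimmeta(x)[[1L]]
colmeta <- function(x) dimmeta(x)[[2L]]

m <- matrix(1:6, nrow = 2)
dimmeta(m) <- list(data.frame(site = c("a", "b")),
                   data.frame(unit = c("kg", "m", "s")))
colmeta(m)        # same object as dimmeta(m)[[2]]
```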


Other standard generics to be affected would be:

 * rbind & cbind for 2-dim arrays/matrices: they should combine the metadata. 
For dimension-sensitive metadata this can be modelled on what is done with 
dimnames: in cbind (rbind), use the rowmeta (colmeta) of the first object that 
has it, and combine the colmeta (rowmeta) of all objects, filling with 
NAs/NULLs/.. for the objects being combined that carry no metadata. An issue of 
coercing dimmeta of different classes may arise.

 * `dim<-`, but this may raise the same problem of coercing dimmeta of 
different classes.
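
The cbind case could look roughly like the sketch below. It is a standalone 
helper rather than a cbind() method, 'cbind_meta' is a hypothetical name, and 
it assumes every input is a matrix whose metadata (if any) is a "dimmeta" 
attribute of the form list(rowmeta, colmeta) with data.frame elements. The 
coercion issue is sidestepped by assuming data.frames throughout.

```r
# Hypothetical helper (not a cbind() method). Assumes matrices whose optional
# "dimmeta" attribute is list(rowmeta, colmeta), both data.frames.
cbind_meta <- function(...) {
  objs  <- list(...)
  metas <- lapply(objs, attr, "dimmeta")
  out   <- do.call(cbind, lapply(objs, function(o) {
    attr(o, "dimmeta") <- NULL
    o
  }))
  has_meta <- !vapply(metas, is.null, logical(1))
  if (!any(has_meta)) return(out)

  # Row metadata: taken from the first object that has any.
  row_md <- metas[[which(has_meta)[1L]]][[1L]]

  # Column metadata: stacked over all objects, NA-filled where missing.
  cols <- unique(unlist(lapply(metas[has_meta], function(m) names(m[[2L]]))))
  pad <- function(md, n) {
    if (is.null(md)) md <- data.frame(row.names = seq_len(n))
    for (nm in setdiff(cols, names(md))) md[[nm]] <- rep(NA, n)
    md[cols]
  }
  col_md <- do.call(rbind,
                    Map(function(o, m) pad(m[[2L]], ncol(o)), objs, metas))
  attr(out, "dimmeta") <- list(row_md, col_md)
  out
}

a <- matrix(1:4, nrow = 2)
attr(a, "dimmeta") <- list(data.frame(id = c("r1", "r2")),
                           data.frame(unit = c("kg", "m")))
b <- matrix(5:6, nrow = 2)            # carries no metadata
res <- cbind_meta(a, b)
attr(res, "dimmeta")[[2L]]            # 3 rows; the last one is NA-filled
```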


...and I agree with the rest of your comments.

Best,

Enrique

-----Original Message-----
From: Laurent Gautier [mailto:lgaut...@gmail.com] 
Sent: Thursday, 09 July 2009 14:15
Cc: Heinz Tuechler; Bengoechea Bartolomé Enrique (SIES 73); Tony Plate; Henrik 
Bengtsson; r-devel@r-project.org
Subject: Re: [Rd] Suggestion: Dimension-sensitive attributes

Starting by working on an interface for such object(s) is probably the first 
step toward a unified solution, and this before worrying about if and how R 
attributes are used.

It would also help to ensure a smooth transition from the existing classes 
implementing a similar solution (first the interface is added to those classes, 
then after a grace period the classes are eventually refactored).

Dimension-level is what seems to be the most needed... but I am not convinced 
of the practicality of the object-level and cell-level schemes proposed:

- Object-level, if not linked to any dimension-attribute, is just saying that 
one wants to attach anything to any object. That's what attr() is already doing.

- Cell-level is maybe out of scope for a first trial (but maybe I missed 
the use-cases for it)



If starting with behaviour, it seems to boil down to having "["/"[<-" and 
"dimmeta()"/"dimmeta<-()":

- extract "[" / replace "[<-" :

   * keeps working the way it already does

   * extracts a subset of the object as well as a subset of the 
dimension-associated metadata.

   * departing too much from the way "[" is working and adding 
behind-the-curtain name matching will only compromise the chances of 
adoption.

   * forget about the bit about which metadata is kept and which one 
isn't when using "[". Make a function "unmeta()" (similar behavior to 
"unname()") to drop them all, or work it out with something like
 > dimmeta(x, 1) <- NULL # drop the metadata associated with dimension 1

- access the dimension-associated metadata:

   * maybe a function called "dimmeta()" (for consistency with 
"dimnames()") ? The signature could be dimmeta(x, i), with x the object 
and i the dimension requested. A replace function "dimmeta<-"(x, i, 
value) would be provided.
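
A rough sketch of the behaviour listed above, for the 2-dim case only. The 
class name "meta_matrix" and the storage layout (a "dimmeta" attribute holding 
list(row_metadata, col_metadata), each a data.frame or NULL) are assumptions 
made for illustration. Indexes are resolved to positions against the data 
itself, so the metadata is subset by position with no extra name matching.

```r
# Sketch only: "meta_matrix" and the "dimmeta" layout are assumptions.
dimmeta <- function(x, i) attr(x, "dimmeta")[[i]]

`dimmeta<-` <- function(x, i, value) {
  md <- attr(x, "dimmeta")
  if (is.null(md)) md <- vector("list", length(dim(x)))
  md[i] <- list(value)     # list() wrapper lets value = NULL clear one slot
  attr(x, "dimmeta") <- md
  x
}

unmeta <- function(x) {    # drop all dimension metadata, like unname()
  attr(x, "dimmeta") <- NULL
  x
}

`[.meta_matrix` <- function(x, i, j, drop = FALSE) {
  md   <- attr(x, "dimmeta")
  data <- unmeta(unclass(x))
  # Resolve the indexes to positions once, so metadata rows follow the data.
  rows <- seq_len(nrow(data)); names(rows) <- rownames(data)
  cols <- seq_len(ncol(data)); names(cols) <- colnames(data)
  if (!missing(i)) rows <- rows[i]
  if (!missing(j)) cols <- cols[j]
  out <- data[rows, cols, drop = FALSE]  # 'drop' is ignored in this sketch
  pick <- function(d, k) {
    if (is.null(d)) NULL else d[unname(k), , drop = FALSE]
  }
  attr(out, "dimmeta") <- list(pick(md[[1L]], rows), pick(md[[2L]], cols))
  class(out) <- oldClass(x)
  out
}

m <- structure(matrix(1:6, nrow = 2, dimnames = list(c("a", "b"), NULL)),
               class = "meta_matrix")
dimmeta(m, 2) <- data.frame(unit = c("kg", "m", "s"))
sub <- m[, c(3, 1)]
dimmeta(sub, 2)$unit       # units follow the selected columns
dimmeta(sub, 2) <- NULL    # drop the metadata associated with dimension 2
```

Single-index (vector-style) subsetting and "[<-" are not handled here.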


In the abstract, the "names" associated with a given dimension are just 
one possible kind of metadata, but I'd keep away from meddling with them 
for a start.


It would seem natural that metadata associated with one dimension 
would be a table-like object (data.frame seems natural in R, and 
unfortunately there is no data.frame-like structure in R).



L.

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
