On Wed, Jul 8, 2009 at 10:51 AM, Michael A. Miller <mmill...@iupui.edu> wrote:

>>>>>> Mark wrote:

> > Currently my data is one experiment per row, but that's
> > wasting space as most experiments only take 20% of the row
> > and 80% of the row is filled with 0's. I might want to make
> > the array narrower and have a flag somewhere in the 1st
> > 10 columns that says that this row is a continuation row
> > from the previous row. That way I could pack the array
> > better, use less memory, and when I do finally test for 0 I
> > have a short line to traverse?
>
> This may be a bit off track from the data manipulation you are
> working on, but I thought I'd point out that another way to
> handle this sort of data is to make a table with one measurement
> per row, rather than one experiment per row:
>
>   experiment measurement value
>   A          1           0.27
>   A          2           0.66
>   A          3           0.24
>   A          4           0.55
>   B          1           0.13
>   B          2           0.65
>   B          3           0.83
>   B          4           0.41
>   B          5           0.92
>   B          6           0.67
>   C          1           0.75
>   C          2           0.97
>   C          3           0.49
>   C          4           0.58
>   D          1           1.00
>   D          2           0.71
>   E          1           0.11
>   E          2           0.50
>   E          3           0.98
>   E          4           0.07
>   E          5           0.94
>   E          6           0.57
>   E          7           0.34
>   E          8           0.21
>
> If you write the output of your calculations this way, one
> value per line, it can easily be read into R as a data.frame and
> handled with less need for munging. There is no need to remove
> the zero-padding because the zeros aren't needed in the first
> place.
>
> You can subset the data with subset, as in
>
>   test <- read.table('test.dat', header=TRUE)
>   expA <- subset(test, experiment=='A')
>   expB <- subset(test, experiment=='B')
>
> so there is no need to deal with ragged/zero-padded arrays. Your
> plots can be grouped automatically with lattice:
>
>   require(lattice)
>   xyplot(value ~ measurement, data=test, groups=experiment, type='b')
>   xyplot(value ~ measurement | experiment, data=test, type='b')
>
> It is simple to do calculations by experiment using tapply. For
> example:
>
>   > with(test, tapply(value, experiment, mean))
>           A         B         C         D         E
>   0.4300000 0.6016667 0.6975000 0.8550000 0.4650000
>
>   > with(test, tapply(measurement, experiment, max))
>   A B C D E
>   4 6 4 2 8
>
> Mike
Mike,

It's not really that far off track, as I didn't have any background when I started this in R; this is the first time I've used it. I simply chose a format that I thought would work for me in both Excel and R. I do like your examples. My impression of reshape coupled with cast is that it's pretty capable of giving me more or less the same format you suggest, although it is a bit of work.

Currently in my files I save only the start and finish times of the experiments and planned on calculating all the times in the middle if necessary. With your format I'd just write them out on each line and save that work in R.

I suppose the files using this alternative format would be a lot larger on disk. I currently have 10 values + 500 observations per experiment, with an average experiment-tracking file containing maybe 500-1000 experiments. With this format, in the worst case I suppose I'd have (10+1) * 1000 values per experiment on disk, but on average it would be less than that because, as you say, I wouldn't write out any zeros. Once in R, in memory they'd be equivalent. Disk space doesn't matter, but reading and writing the files might be slower.

I suppose I don't really have to write the zeros out anyway; at this point it's just one additional subset after going through reshape. It might be an advantage to get to the subset commands immediately, but I've still got 10 independent variables and I suspect I'm going to be using reshape/cast more than once to get to my answers, so I haven't been against learning how to work with them.

Overall these are good suggestions and I appreciate them. Thanks!

Cheers,
Mark
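A minimal sketch of the melt-plus-subset step discussed above, using the reshape package's melt(). The column names (exp, start, finish, m1..m4) and the toy values are only illustrative, not taken from the actual files:

    library(reshape)   # provides melt() and cast()

    ## Toy stand-in for the wide, zero-padded layout: one experiment
    ## per row, id columns first, then one column per observation slot.
    ## (Hypothetical names and values, not the real data.)
    wide <- data.frame(exp    = c("A", "D"),
                       start  = c(0, 0),
                       finish = c(30, 10),
                       m1 = c(0.27, 1.00),
                       m2 = c(0.66, 0.71),
                       m3 = c(0.24, 0.00),   # D's row is zero-padded
                       m4 = c(0.55, 0.00))   # from m3 onwards

    ## melt() stacks the m* columns into one measurement per row,
    ## repeating the id columns alongside each value.
    long <- melt(wide, id = c("exp", "start", "finish"))

    ## One additional subset drops the zero padding.
    long <- subset(long, value != 0)

This yields the one-measurement-per-row layout from Mike's example, with the observation slot carried in the default variable column.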
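Continuing from the long data frame in the sketch above, recovering the in-between times from only the stored start and finish might look like this, assuming the measurements are equally spaced:

    ## For one experiment, interpolate the measurement times between
    ## start and finish, assuming equal spacing.
    a <- subset(long, exp == "A")
    a$time <- seq(from = a$start[1], to = a$finish[1], length.out = nrow(a))

In practice this would be done per experiment, e.g. by looping over split(long, long$exp).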