DISCCRS wrote:
Hi,

I have a dataset of historical monthly temperature data that is grouped by
weather station. I want to create z-scores of the monthly data using a base
period of a subset of years. I subset the dataset first to include only data
from the years (V2) that make up the base period so I could calculate the
appropriate means and standard deviations:

         V1   V2    V3   V12   V15   V16   V19
84    11084 1978 40.16 63.13 44.06 63.41 63.47
85    11084 1979 43.71 60.88 48.09 64.64 62.34
86    11084 1980 50.61 61.64 47.93 62.10 63.45
87    11084 1981 42.11 63.59 47.29 63.42 63.37
1583  18469 1978 30.78 56.93 34.62 56.40 57.39
1584  18469 1979 33.48 57.68 37.76 58.70 57.30
1585  18469 1980 40.83 54.48 39.27 56.14 57.42
1586  18469 1981 33.33 56.28 37.57 56.20 56.47
2688  25467 1978 52.61 75.51 55.02 68.20 70.70
2689  25467 1979 47.95 74.54 50.70 67.58 70.24
2690  25467 1980 55.12 72.51 56.59 66.49 71.21
2691  25467 1981 56.70 70.33 57.65 69.35 72.16

Then I split the data by group ID (V1) and got the means and std deviations:

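# Split by station ID; test$V1 avoids relying on an attached data frame
subsets <- split(test, test$V1)
# Per-station column means and standard deviations, ignoring NAs
sub.means <- data.frame(t(sapply(subsets, colMeans, na.rm = TRUE)))
sub.sds <- data.frame(t(sapply(subsets, function(d) sapply(d, sd, na.rm = TRUE))))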

Here are the means, for example:

           V1     V2      V3     V12     V15     V16     V19
11084   11084 1979.5 44.1475 62.3100 46.8425 63.3925 63.1575
18469   18469 1979.5 34.6050 56.3425 37.3050 56.8600 57.1450
25467   25467 1979.5 53.0950 73.2225 54.9900 67.9050 71.0775

How can I approach the next step -- applying the means and std deviations
from the two new arrays that I created to the original dataset (by station
and by month)? Or should I be using a different approach entirely? There are
NAs throughout the dataset.
Thanks very much in advance.

-Jennife

Playing the ball from where it landed, how about:

nm <- as.character(test$V1)
(test - sub.means[nm,])/sub.sds[nm,]
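
As a rough sketch of how the same row-matching idea might carry over to the original dataset, the block below looks up each row's station statistics by name and standardizes column by column. It uses only V3 and V12 from the posted rows. The data frame `full`, its 1982 row, and the result `z.full` are made up for illustration and are not part of the original post.

```r
# Base-period rows as posted (only V3 and V12 kept for brevity).
test <- data.frame(
  V1  = rep(c(11084, 18469, 25467), each = 4),
  V2  = rep(1978:1981, times = 3),
  V3  = c(40.16, 43.71, 50.61, 42.11,
          30.78, 33.48, 40.83, 33.33,
          52.61, 47.95, 55.12, 56.70),
  V12 = c(63.13, 60.88, 61.64, 63.59,
          56.93, 57.68, 54.48, 56.28,
          75.51, 74.54, 72.51, 70.33)
)
cols <- c("V3", "V12")

# Per-station base-period statistics: one row per station, one column per month.
subsets   <- split(test[cols], test$V1)
sub.means <- t(sapply(subsets, colMeans, na.rm = TRUE))
sub.sds   <- t(sapply(subsets, function(d) sapply(d, sd, na.rm = TRUE)))

# Hypothetical "original" dataset: the base period plus one invented later row.
full <- rbind(test, data.frame(V1 = 11084, V2 = 1982, V3 = 45.00, V12 = 62.00))

# Look up each row's station by name, then standardize column by column.
nm <- as.character(full$V1)
z.full <- full
for (v in cols) {
  z.full[[v]] <- (full[[v]] - sub.means[nm, v]) / sub.sds[nm, v]
}
```

Doing the arithmetic only on the data columns leaves V1 and V2 as they were, and NAs in the data simply stay NA in the result.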

However, there could be a neater solution by looping ave(V2, V1, FUN=scale)
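
One way the ave() loop might look, as a sketch only: each data column is passed through ave() with the station ID as the grouping variable. Note that ave() standardizes against whatever rows it is given, so this assumes it is run on the base-period subset. The wrapper around scale() and the names `cols` and `z` are illustrative choices, not from the original post.

```r
# Base-period rows as posted (only V3 and V12 kept for brevity).
test <- data.frame(
  V1  = rep(c(11084, 18469, 25467), each = 4),
  V2  = rep(1978:1981, times = 3),
  V3  = c(40.16, 43.71, 50.61, 42.11,
          30.78, 33.48, 40.83, 33.33,
          52.61, 47.95, 55.12, 56.70),
  V12 = c(63.13, 60.88, 61.64, 63.59,
          56.93, 57.68, 54.48, 56.28,
          75.51, 74.54, 72.51, 70.33)
)
cols <- c("V3", "V12")

# Loop over the data columns; ave() applies the function within each station.
# scale() returns a one-column matrix, so as.numeric() flattens it to a vector.
z <- test
z[cols] <- lapply(test[cols], function(x) {
  ave(x, test$V1, FUN = function(g) as.numeric(scale(g)))
})
```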

Or, you could apply scale() on each of your split() data and then unsplit(). Just beware that scale() turns things into matrices, so you need an as.data.frame step in between.
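
A minimal sketch of the split/scale/unsplit route, again on the posted base-period rows with only V3 and V12. The names `pieces`, `scaled`, and `z` are illustrative, and like the other within-group approaches this standardizes each station against the rows it is given.

```r
# Base-period rows as posted (only V3 and V12 kept for brevity).
test <- data.frame(
  V1  = rep(c(11084, 18469, 25467), each = 4),
  V2  = rep(1978:1981, times = 3),
  V3  = c(40.16, 43.71, 50.61, 42.11,
          30.78, 33.48, 40.83, 33.33,
          52.61, 47.95, 55.12, 56.70),
  V12 = c(63.13, 60.88, 61.64, 63.59,
          56.93, 57.68, 54.48, 56.28,
          75.51, 74.54, 72.51, 70.33)
)
cols <- c("V3", "V12")

# One data frame per station.
pieces <- split(test, test$V1)

# scale() returns a matrix, so convert back before writing into the piece.
scaled <- lapply(pieces, function(d) {
  d[cols] <- as.data.frame(scale(d[cols]))
  d
})

# Reassemble the pieces in the original row order.
z <- unsplit(scaled, test$V1)
```

Scaling only the data columns keeps the station ID and year intact, which would otherwise turn into NaN and year z-scores.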

--
  O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
 c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
(*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~~~~~~~~~ - ([EMAIL PROTECTED])              FAX: (+45) 35327907

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
