Hi, I have a dataset of historical monthly temperature data grouped by weather station. I want to convert the monthly values to z-scores, using a subset of years as the base period. To get the appropriate means and standard deviations, I first subset the dataset to include only the years (V2) in the base period.
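Here is a minimal sketch of that subsetting step in base R. It assumes the full data frame is called dat and the base period is 1978-1981; both are hypothetical, since the post doesn't name them. Only the station ID, year and first monthly column (V3) from the sample rows are included.

```r
# Sample rows from the post (V1 = station ID, V2 = year, V3 = one month).
# In real use, dat (a hypothetical name) would hold every year for every station.
dat <- data.frame(
  V1 = rep(c(11084, 18469, 25467), each = 4),
  V2 = rep(1978:1981, times = 3),
  V3 = c(40.16, 43.71, 50.61, 42.11,
         30.78, 33.48, 40.83, 33.33,
         52.61, 47.95, 55.12, 56.70)
)

base_years <- 1978:1981                # assumed base period
test <- dat[dat$V2 %in% base_years, ]  # keep only base-period rows
```

Using %in% instead of a comparison such as V2 >= 1978 & V2 <= 1981 means a row with a missing year is dropped rather than turned into a row of NAs, because NA %in% x is FALSE.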
Here is the base-period subset:

            V1   V2    V3   V12   V15   V16   V19
    84   11084 1978 40.16 63.13 44.06 63.41 63.47
    85   11084 1979 43.71 60.88 48.09 64.64 62.34
    86   11084 1980 50.61 61.64 47.93 62.10 63.45
    87   11084 1981 42.11 63.59 47.29 63.42 63.37
    1583 18469 1978 30.78 56.93 34.62 56.40 57.39
    1584 18469 1979 33.48 57.68 37.76 58.70 57.30
    1585 18469 1980 40.83 54.48 39.27 56.14 57.42
    1586 18469 1981 33.33 56.28 37.57 56.20 56.47
    2688 25467 1978 52.61 75.51 55.02 68.20 70.70
    2689 25467 1979 47.95 74.54 50.70 67.58 70.24
    2690 25467 1980 55.12 72.51 56.59 66.49 71.21
    2691 25467 1981 56.70 70.33 57.65 69.35 72.16

Then I split the data by station ID (V1) and computed the means and standard deviations:

    # mean() and sd() no longer work on a whole data frame in current R,
    # so the statistics are computed column by column
    subsets   <- split(test, test$V1)
    sub.means <- data.frame(t(sapply(subsets, colMeans, na.rm = TRUE)))
    sub.sds   <- data.frame(t(sapply(subsets, function(d) sapply(d, sd, na.rm = TRUE))))

Here are the means, for example:

             V1     V2      V3     V12     V15     V16     V19
    11084 11084 1979.5 44.1475 62.3100 46.8425 63.3925 63.1575
    18469 18469 1979.5 34.6050 56.3425 37.3050 56.8600 57.1450
    25467 25467 1979.5 53.0950 73.2225 54.9900 67.9050 71.0775

How should I approach the next step, i.e. applying the means and standard deviations from these two new data frames to the original dataset, matched by station and by month? Or should I be using a different approach entirely? There are NAs throughout the dataset.

Thanks very much in advance.

-Jennifer
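Here is one possible approach to the question above, sketched in base R under stated assumptions. Instead of building separate means and sds tables and joining them back, ave() computes each station's base-period mean and sd and repeats that value on every row for the station. The z-score then becomes plain vectorized arithmetic. The names dat, base_years and month_cols are hypothetical. Here, dat holds only the sample rows and two of the monthly columns (V3, V12), and the base period is assumed to be 1978-1981. Values outside the base period are masked to NA before the statistics are taken, and na.rm = TRUE lets missing months drop out. A missing value in dat itself just gives an NA z-score.

```r
# Sample rows from the post (V1 = station ID, V2 = year); only two of the
# monthly columns are included. In real use, dat would hold every year.
dat <- data.frame(
  V1  = rep(c(11084, 18469, 25467), each = 4),
  V2  = rep(1978:1981, times = 3),
  V3  = c(40.16, 43.71, 50.61, 42.11,
          30.78, 33.48, 40.83, 33.33,
          52.61, 47.95, 55.12, 56.70),
  V12 = c(63.13, 60.88, 61.64, 63.59,
          56.93, 57.68, 54.48, 56.28,
          75.51, 74.54, 72.51, 70.33)
)

base_years <- 1978:1981        # assumed base period
month_cols <- c("V3", "V12")   # monthly columns to standardize

in_base <- dat$V2 %in% base_years   # FALSE (never NA) for a missing year

for (col in month_cols) {
  x <- dat[[col]]
  x_base <- x
  x_base[!in_base] <- NA            # hide values outside the base period

  # Per-station base-period mean and sd, repeated on every row of that
  # station; na.rm = TRUE skips missing months.
  mu <- ave(x_base, dat$V1, FUN = function(v) mean(v, na.rm = TRUE))
  s  <- ave(x_base, dat$V1, FUN = function(v) sd(v, na.rm = TRUE))

  # z-score for every row, inside or outside the base period; NA stays NA
  dat[[paste0(col, "_z")]] <- (x - mu) / s
}

print(dat)
```

If you would rather keep the sub.means and sub.sds tables, you can do the same lookup with idx <- match(dat$V1, rownames(sub.means)). Then, for each month column col, compute (dat[[col]] - sub.means[idx, col]) / sub.sds[idx, col].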