DISCCRS wrote:
Hi,
I have a dataset of historical monthly temperature data that is grouped by
weather station. I want to create z-scores of the monthly data using a base
period of a subset of years. I subset the dataset first to include only data
from the years (V2) that make up the base period so I could calculate the
appropriate means and standard deviations:
        V1   V2    V3   V12   V15   V16   V19
84   11084 1978 40.16 63.13 44.06 63.41 63.47
85   11084 1979 43.71 60.88 48.09 64.64 62.34
86   11084 1980 50.61 61.64 47.93 62.10 63.45
87   11084 1981 42.11 63.59 47.29 63.42 63.37
1583 18469 1978 30.78 56.93 34.62 56.40 57.39
1584 18469 1979 33.48 57.68 37.76 58.70 57.30
1585 18469 1980 40.83 54.48 39.27 56.14 57.42
1586 18469 1981 33.33 56.28 37.57 56.20 56.47
2688 25467 1978 52.61 75.51 55.02 68.20 70.70
2689 25467 1979 47.95 74.54 50.70 67.58 70.24
2690 25467 1980 55.12 72.51 56.59 66.49 71.21
2691 25467 1981 56.70 70.33 57.65 69.35 72.16
Then I split the data by group ID (V1) and got the means and std deviations:
subsets <- split(test, test$V1)
sub.means <- data.frame(t(sapply(subsets, colMeans, na.rm=TRUE)))
sub.sds <- data.frame(t(sapply(subsets, function(d) sapply(d, sd, na.rm=TRUE))))
Here are the means, for example:
         V1     V2      V3     V12     V15     V16     V19
11084 11084 1979.5 44.1475 62.3100 46.8425 63.3925 63.1575
18469 18469 1979.5 34.6050 56.3425 37.3050 56.8600 57.1450
25467 25467 1979.5 53.0950 73.2225 54.9900 67.9050 71.0775
How can I approach the next step -- applying the means and std deviations
from the two new arrays that I created to the original dataset (by station
and by month)? Or should I be using a different approach entirely? There are
NAs throughout the dataset.
Thanks very much in advance.
-Jennife
Playing the ball from where it landed, how about
nm <- as.character(test$V1)
(test - sub.means[nm,])/sub.sds[nm,]
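To make the row-indexing idea concrete, here is a minimal sketch applied to the full dataset rather than to the base-period subset. The names full, base and months are hypothetical, the toy values are borrowed from the first two stations above plus made-up 1982 rows, and restricting the arithmetic to the monthly columns is my assumption (otherwise V1 and V2 get standardized too, and V1 has zero sd within a station).

```r
# Hypothetical stand-in for the original dataset: two stations,
# base period 1978-1981 plus one later year (made-up values, one NA).
full <- data.frame(
  V1  = rep(c(11084, 18469), each = 5),
  V2  = rep(1978:1982, times = 2),
  V3  = c(40.16, 43.71, 50.61, 42.11, 45.00,
          30.78, 33.48, 40.83, 33.33, NA),
  V12 = c(63.13, 60.88, 61.64, 63.59, 62.00,
          56.93, 57.68, 54.48, 56.28, 58.00)
)

base   <- full[full$V2 %in% 1978:1981, ]  # base-period rows only
months <- c("V3", "V12")                  # data columns to standardize

# Per-station base-period means and sds, one row per station
subsets   <- split(base[months], base$V1)
sub.means <- t(sapply(subsets, colMeans, na.rm = TRUE))
sub.sds   <- t(sapply(subsets, function(d) sapply(d, sd, na.rm = TRUE)))

# Look up each row's station by name, so the lookup tables are
# expanded to the same shape as the data before subtracting.
nm <- as.character(full$V1)
z  <- (as.matrix(full[months]) - sub.means[nm, , drop = FALSE]) /
      sub.sds[nm, , drop = FALSE]

# Put the z-scores back next to the station and year columns
full.z <- cbind(full[c("V1", "V2")], z)
```

NAs in the data simply stay NA in the result, and years outside the base period are scored against the base-period mean and sd of their own station.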
However, there could be a neater solution by looping ave(V2, V1, FUN=scale)
Or, you could apply scale() on each of your split() data and then
unsplit(). Just beware that scale() turns things into matrices so you
need an as.data.frame step in between.
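A rough sketch of both variants follows, on a hypothetical toy data frame called base (values taken from the first two stations above). One caveat that I am assuming matters here: scale() standardizes with the mean and sd of whatever rows it is handed, so these variants give base-period z-scores only for the base-period rows themselves.

```r
# Hypothetical base-period data: two stations, two monthly columns
base <- data.frame(
  V1  = rep(c(11084, 18469), each = 4),
  V2  = rep(1978:1981, times = 2),
  V3  = c(40.16, 43.71, 50.61, 42.11, 30.78, 33.48, 40.83, 33.33),
  V12 = c(63.13, 60.88, 61.64, 63.59, 56.93, 57.68, 54.48, 56.28)
)
months <- c("V3", "V12")

# Variant 1: loop ave() over the data columns.
# as.vector() drops the matrix shape that scale() returns.
z.ave <- base
for (m in months) {
  z.ave[[m]] <- ave(base[[m]], base$V1,
                    FUN = function(x) as.vector(scale(x)))
}

# Variant 2: scale() each split() piece, then unsplit().
# scale() returns a matrix, hence the as.data.frame() step.
pieces  <- split(base[months], base$V1)
scaled  <- lapply(pieces, function(d) as.data.frame(scale(d)))
z.split <- unsplit(scaled, base$V1)
```

Both variants should agree column by column. scale() ignores NAs when it computes the centre and spread, so missing months stay NA without affecting the rest.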
--
O__ ---- Peter Dalgaard Øster Farimagsgade 5, Entr.B
c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
(*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918
~~~~~~~~~~ - ([EMAIL PROTECTED]) FAX: (+45) 35327907
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help