Your question and your English are just fine!
If I were you, I would not mess around with the ccf() function but would attack the question "directly" using the cor.test() function, with sub-vectors of your x vector. Personally I find the notion of "lag" in acf() and ccf() highly confusing and I always make "parity errors", i.e. I get things backwards!

Moreover, the ccf() function is throwing information away; it truncates the x vector to have the same length as y, i.e. 21, and so never uses x[22:29], which have useful content in respect of lags less than 8. You haven't a lot of data, so it is prudent not to be wasteful.

What I would do:

    OP <- par(mfrow=c(3,3))   # 3 x 3 grid of plots; save the old settings
    for(i in 1:9) {
        # x[i:(20+i)] has 21 values; i = 9 lines up with y (lag 0), i = 1 is lag 8
        CT <- cor.test(x[i:(20+i)],y,alternative="less")   # one-sided: corr < 0
        PV <- CT$p.value
        cat("lag =",9-i,"p-value =",PV,"\n")
        COR <- sprintf("%1.3f",CT$estimate)
        plot(x[i:(20+i)],y,xlab="x",main=paste("lag =",9-i,"corr =",COR))
    }
    par(OP)                   # restore the old graphics settings

HTH

cheers,

Rolf Turner

On 01/29/2013 11:26 PM, Larissa Modica wrote:
Hello everybody,

I am sorry if my questions are too simple or not easily understandable. I’m not a native English speaker and this is my first analysis using this function.

I have a problem with a cross-correlation function and I would like to understand how I have to perform it in R. I have yearly data of an independent variable (x) from 1982 to 2010, and I also have yearly data of a variable (y) from 1990 to 2010. I think y could be influenced by the variable x with a delay of 6 years. When I plot the data of x from 1986 to 2006 against the data of y from 1990 to 2010, the graph shows an opposite trend, i.e. when the variable x was high in 1986, the variable y was low in 1990, and so on until the end of the time series. Consequently I expect the two time series to be negatively correlated. Namely: y(year) = f(x(year - lag)), and corr has a negative value.

Here is the script I ran in R.

a)

    x<-c(105.3381,126.2792,121.7298,110.35,133.1647,140.5724,183.8853,177.0154,
         181.2147,186.4154,209.6958,205.0292,184.9683,222.9683,219.8538,268.1029,
         249.1545,228.942,198.2119,171.0913,146.346,166.3192,163.5747,173.3394,
         180.7952,176.8276,159.7074,150.6029,110.9653)
    y<-c(32.93415,45.75486,29.36993,23.70824,21.30857,19.78977,16.88913,22.25963,
         19.32558,19.73704,22.62746,28.90173,27.66794,26.23163,28.69109,22.04674,
         26.47496,33.03602,41.62231,28.96627,31.80892)
    x<-ts(x)
    y<-ts(y)
    dumb<-ccf(x, y, ylab = "cross-correlation", xlab = "Time lag",
              main = "y influenced by x")
    dumb

    Autocorrelations of series ‘X’, by lag

       -10     -9     -8     -7     -6     -5     -4     -3     -2     -1
     0.083  0.133  0.253  0.323  0.386  0.515  0.544  0.609  0.448  0.118
         0      1      2      3      4      5      6      7      8      9
    -0.154 -0.283 -0.416 -0.326 -0.265 -0.217 -0.285 -0.340 -0.315 -0.254
        10
    -0.188

My question is: Is the script correct for the question I need to answer? Do x and y have to have the same length (i.e. do I have to consider the same number of years)? What does this result mean?
My interpretation is: the highest correlation was at a lag of -3 years. Does it mean that what happened to the “x” variable in 1987 influenced “y” in 1990?

Also, if that is not correct, is it correct to write:

b)

    x<-c(105.3381,126.2792,121.7298,110.35,133.1647,140.5724,183.8853,177.0154,
         181.2147,186.4154,209.6958,205.0292,184.9683,222.9683,219.8538,268.1029,
         249.1545,228.942,198.2119,171.0913,146.346,166.3192,163.5747,173.3394,
         180.7952,176.8276,159.7074,150.6029,110.9653)
    y<-c(32.93415,45.75486,29.36993,23.70824,21.30857,19.78977,16.88913,22.25963,
         19.32558,19.73704,22.62746,28.90173,27.66794,26.23163,28.69109,22.04674,
         26.47496,33.03602,41.62231,28.96627,31.80892)
    x<-ts(x)
    y<-ts(y)
    dumb<-ccf(x[3:23], y, ylab = "cross-correlation", xlab = "Time lag",
              main = "y influenced by x")
    dumb

    Autocorrelations of series ‘X’, by lag

       -10     -9     -8     -7     -6     -5     -4     -3     -2     -1
     0.104  0.221  0.257  0.393  0.478  0.601  0.517  0.406  0.087 -0.270
         0      1      2      3      4      5      6      7      8      9
    -0.481 -0.397 -0.344 -0.241 -0.284 -0.349 -0.337 -0.265 -0.198 -0.161
        10
     0.044

As I understand it, these results mean that the highest correlation is observed when the lag = 0. That corresponds to the difference of 6 years that I set up when I wrote x[3:23], which simply means working with the years from 1984 to 2004.

In summary I would like to know:

1) whether the analysis is correct in the way a) or in the way b);
2) whether there is another way to demonstrate that the variable x has an influence on the variable y with a delay of 6 years.

Thank you very much to anybody who could help me.
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
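
The sub-vector approach in the reply above can also be written so that the pairing is done by calendar year rather than by index, which may help avoid the "parity errors" it mentions. What follows is only a minimal sketch under stated assumptions: the data are copied from the original post, the value printed there as "205.029 2" is assumed to be 205.0292, and lagged_cor() is a hypothetical helper name, not a function from R or any package. Here a lag of k pairs y in year t with x in year t - k.

```r
# Data copied from the original post (x: 1982-2010, y: 1990-2010).
# Assumption: the value printed as "205.029 2" in the post is 205.0292.
x <- c(105.3381, 126.2792, 121.7298, 110.35, 133.1647, 140.5724, 183.8853,
       177.0154, 181.2147, 186.4154, 209.6958, 205.0292, 184.9683, 222.9683,
       219.8538, 268.1029, 249.1545, 228.942, 198.2119, 171.0913, 146.346,
       166.3192, 163.5747, 173.3394, 180.7952, 176.8276, 159.7074, 150.6029,
       110.9653)
y <- c(32.93415, 45.75486, 29.36993, 23.70824, 21.30857, 19.78977, 16.88913,
       22.25963, 19.32558, 19.73704, 22.62746, 28.90173, 27.66794, 26.23163,
       28.69109, 22.04674, 26.47496, 33.03602, 41.62231, 28.96627, 31.80892)

# Label each value with its calendar year so sub-vectors are picked by year.
names(x) <- 1982:2010
names(y) <- 1990:2010

# Hypothetical helper: Pearson correlation between y in year t
# and x in year t - lag, over all 21 years of y.
lagged_cor <- function(lag) {
  x_years <- as.character(as.integer(names(y)) - lag)
  cor(x[x_years], y)
}

# x starts 8 years before y, so lags 0 to 8 all use 21 complete pairs.
lags <- 0:8
res <- data.frame(lag = lags, corr = sapply(lags, lagged_cor))
print(res, digits = 3)
```

With this convention, lag 0 reproduces cor(x[9:29], y) and lag 8 reproduces cor(x[1:21], y), which correspond to the i = 9 and i = 1 iterations of the loop in the reply.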