Steve Murray wrote:
Dear all,
I am trying to validate a model by comparing simulated output values against
observed values. I have produced a simple x-y scatter plot with a 1:1 line, so
that the closer the points fall to this line, the better the 'fit' between the
modelled data and the observation data.
I am now attempting to quantify the strength of this fit by using a statistical
test in R. I am no statistics guru, but from my limited understanding, I
suspect that I need to use the Chi Squared test (I am more than happy to be
corrected on this though!).
However, this results in the following:
chisq.test(data$Simulation,data$Observation)
Pearson's Chi-squared test
data: data$Simulation and data$Observation
X-squared = 567, df = 550, p-value = 0.2989
Warning message:
In chisq.test(data$Simulation, data$Observation) :
Chi-squared approximation may be incorrect
The ?chisq.test document suggests that the objects should be of vector or
matrix format, so I tried the following, but still receive a warning message
(and different results):
chisq.test(as.matrix(data[,4:5]))
Pearson's Chi-squared test
data: as.matrix(data[, 4:5])
X-squared = 130.8284, df = 26, p-value = 6.095e-16
Warning message:
In chisq.test(as.matrix(data[, 4:5])) :
Chi-squared approximation may be incorrect
What am I doing wrong and how can I successfully measure how well the simulated
values fit the observed values?
If it's of any help, here is how my data are structured - note that I am only
using columns 4 and 5 (Observation and Simulation).
str(data)
'data.frame': 27 obs. of 5 variables:
$ Location : Factor w/ 27 levels "Australia","Brazil",..: 8 2 13 19 22 14 16 23 6 7 ...
$ Vegetation : Factor w/ 21 levels "Beech","Broadleaf evergreen laurel",..: 17 21 2 16 15 16 9 16 3 4 ...
$ Vegetation.Class: Factor w/ 4 levels "Boreal and Temperate Evergreen",..: 3 3 4 1 1 1 4 1 4 1 ...
$ Observation : num 24 8.9 14.7 26.7 42.4 31.7 30.8 7.5 14 22 ...
$ Simulation : num 33.9 7.8 9.74 7.6 11.8 10.7 12 28.1 1.7 1.7 ...
The chi-squared test is not the right thing here. You may have
been fooled by the "goodness-of-fit" phrase associated with
the test.
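To illustrate why the two calls above give such different degrees of freedom, here is a small sketch. It relies on the behaviour described in ?chisq.test: two vectors are coerced to factors and cross-tabulated, while a matrix is taken as a two-way contingency table of counts. The values are only the first ten pairs visible in the str(data) output, not the full 27 rows, and the names obs, sim, tab and m are made up for the example.

```r
# First ten pairs as printed by str(data); the full data set has 27 rows.
obs <- c(24, 8.9, 14.7, 26.7, 42.4, 31.7, 30.8, 7.5, 14, 22)
sim <- c(33.9, 7.8, 9.74, 7.6, 11.8, 10.7, 12, 28.1, 1.7, 1.7)

# Two vectors: every distinct value becomes a factor level, so the test
# works on a sparse table of 0/1 counts rather than on the measurements.
tab <- table(sim, obs)
df_vectors <- (nrow(tab) - 1) * (ncol(tab) - 1)

# A matrix: each row is treated as a category and each cell as a count.
m <- cbind(obs, sim)
df_matrix <- (nrow(m) - 1) * (ncol(m) - 1)

# The degrees of freedom reported by chisq.test() match the table shape.
res <- suppressWarnings(chisq.test(sim, obs))
c(vectors = df_vectors, matrix = df_matrix, reported = unname(res$parameter))
```

With all 27 rows the matrix form gives (27 - 1) * (2 - 1) = 26, which is the df shown in the second output above. In neither form does the test compare simulated and observed magnitudes.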
I would do a cor.test(). But if the above is the real data,
then there probably isn't much to test; you have very little
agreement for the first 10 pairs.
-Peter Ehlers
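A minimal sketch of the cor.test() suggestion, again using only the first ten pairs shown in str(data), so the numbers it prints are not results for the full data set. Bear in mind that correlation measures linear association, not closeness to the 1:1 line. The root-mean-square error and mean bias at the end are not part of the advice above; they are added here as an assumption about what "fit to the 1:1 line" might mean, and the names obs, sim, ct, rmse and bias are illustrative.

```r
# First ten pairs as printed by str(data); the full data set has 27 rows.
obs <- c(24, 8.9, 14.7, 26.7, 42.4, 31.7, 30.8, 7.5, 14, 22)
sim <- c(33.9, 7.8, 9.74, 7.6, 11.8, 10.7, 12, 28.1, 1.7, 1.7)

# Pearson correlation test (the default method of cor.test).
ct <- cor.test(sim, obs)
ct$estimate   # sample correlation coefficient
ct$p.value    # p-value for H0: true correlation is zero

# Assumption, not from the thread: summaries of departure from the 1:1 line.
rmse <- sqrt(mean((sim - obs)^2))  # typical size of the simulation error
bias <- mean(sim - obs)            # negative means the model underestimates
c(rmse = rmse, bias = bias)
```

On the real data this would be cor.test(data$Simulation, data$Observation). A high correlation alone would not show that the points lie near the 1:1 line, since a model that is consistently too low can still correlate well with the observations.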
I hope someone is able to point me in the right direction.
Many thanks,
Steve
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.