I would really appreciate any advice on how I can improve (or fix?) the following analysis. I hope I have provided completely runnable code; it doesn't produce any errors for me.
The plot at the end shows a pretty poor correlation (judging visually) between the predicted and observed MEAS_TC values in the test set. How can I improve the performance of the mapping and prediction?

Here are some of the data (continuous, numerical):

> head(somdata)
   MEAS_TC        SP        LN        SN       GR     NEUT
1 2.780000 59.181090  33.74364  19.75361 66.57665 257.0368
2 1.490000 49.047750 184.14598 139.07980 54.75052 326.8001
3 1.490000 49.128902 183.58853 138.02768 55.54114 327.4739
4 2.201276 18.240331  19.20386  10.74748 62.04492 494.4161
5 2.201276 18.215522  19.18009  10.72446 61.87448 494.7409
6 1.276476  9.337769  14.16061  19.06902 14.99612 363.0020

The complete data set is at the following link if you fancy it: https://gist.github.com/ottadini/6068259

The first variable, MEAS_TC, is the dependent variable. I wish to train a SOM on this data and then use it to predict MEAS_TC for a new data set in which MEAS_TC is missing. Below I'm simply splitting somdata into a training and a testing set for evaluation purposes.

# =====
library(kohonen)

somdata <- read.csv("somdata.csv")

# Create test and training sets from the data
# (any fixed seed makes the random split reproducible):
set.seed(1)
inTrain  <- sample(nrow(somdata), floor(nrow(somdata) * 2/3))
training <- somdata[inTrain, ]
testing  <- somdata[-inTrain, ]

# Supervised Kohonen map, where the dependent variable is MEAS_TC.
# Attempting to follow the examples in Wehrens and Buydens, 2007, 21(5), J Stat Soft.
# somdata[1] is the MEAS_TC variable
somX <- scale(training[-1])
somY <- training[[1]]   # [[1]] returns a vector, not a data frame

# Train the map (not sure this is how it should be done):
tc.xyf <- xyf(data = somX, Y = somY, xweight = 0.5,
              grid = somgrid(6, 6, "hexagonal"), contin = TRUE)

# Prediction with the test set, standardised with the training means and
# standard deviations so that it is on the same scale as somX:
testX <- scale(testing[-1],
               center = attr(somX, "scaled:center"),
               scale  = attr(somX, "scaled:scale"))
tc.xyf.prediction <- predict(tc.xyf, newdata = testX)

# Basic plot: observed (black) and predicted (red) MEAS_TC by test-set row
x <- seq_len(nrow(testing))
plot(x, testing[, "MEAS_TC"], type = "l", col = "black", ylim = c(0, 3.5))
lines(x, tc.xyf.prediction$prediction, col = "red")

# Wow, that's terrible. Do I have something wrong?
# =====
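One thing worth checking when predicting from a map trained on standardised inputs: calling scale() on the test set with no other arguments standardises it with the test set's own means and standard deviations, so the test inputs end up on a different scale from the training inputs the map was fitted to. A minimal base-R sketch of reusing the training statistics is below. scale_like_training() is a hypothetical helper name, and the toy data frames are made-up numbers for illustration, not the somdata set.

```r
# Hypothetical helper: standardise new data with the centre and scale that
# scale() stored as attributes on the scaled training matrix, so that the
# training and test inputs share one scale.
scale_like_training <- function(newdata, scaled_train) {
  scale(newdata,
        center = attr(scaled_train, "scaled:center"),
        scale  = attr(scaled_train, "scaled:scale"))
}

# Toy illustration (made-up numbers, not the somdata set):
train <- data.frame(a = c(1, 2, 3, 4), b = c(10, 20, 30, 40))
test  <- data.frame(a = c(5, 6),       b = c(50, 60))

train_scaled <- scale(train)
test_own     <- scale(test)                              # uses the test set's own mean/sd
test_shared  <- scale_like_training(test, train_scaled)  # uses the training mean/sd

print(test_own)     # both test rows forced to be symmetric around 0
print(test_shared)  # test rows expressed relative to the training distribution
```

In the toy case the test values lie well above the training range, which test_shared preserves, while test_own squeezes them back around zero, so a map trained on train_scaled would see them in the wrong place.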
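Rather than judging the fit visually from two overlaid line plots by row index, the agreement between predicted and observed MEAS_TC can be summarised with a few numbers and an observed-versus-predicted scatter plot. The sketch below uses only base R; evaluate_predictions() and plot_observed_vs_predicted() are hypothetical helper names, and the vectors in the toy call are made up.

```r
# Hypothetical helper: numeric summary of prediction accuracy.
evaluate_predictions <- function(observed, predicted) {
  observed  <- as.numeric(observed)
  predicted <- as.numeric(predicted)             # predictions may come back as a one-column matrix
  ok   <- complete.cases(observed, predicted)    # drop rows with NA in either vector
  obs  <- observed[ok]
  pred <- predicted[ok]
  c(rmse = sqrt(mean((obs - pred)^2)),           # root-mean-square error
    mae  = mean(abs(obs - pred)),                # mean absolute error
    # Pearson correlation; NA when the predictions are constant
    r    = if (isTRUE(sd(pred) > 0)) cor(obs, pred) else NA_real_)
}

# Hypothetical helper: observed-vs-predicted scatter with a 1:1 reference line.
plot_observed_vs_predicted <- function(observed, predicted, ...) {
  predicted <- as.numeric(predicted)
  lims <- range(c(observed, predicted), na.rm = TRUE)  # same limits on both axes
  plot(observed, predicted, xlim = lims, ylim = lims,
       xlab = "Observed", ylab = "Predicted", ...)
  abline(a = 0, b = 1, lty = 2)  # perfect predictions fall on this dashed line
  invisible(NULL)
}

# Toy example (made-up numbers):
print(evaluate_predictions(c(1, 2, 3, 4), c(1.5, 2, 2.5, 4)))
```

With the objects from the post, the calls would be evaluate_predictions(testing$MEAS_TC, tc.xyf.prediction$prediction) and plot_observed_vs_predicted(testing$MEAS_TC, tc.xyf.prediction$prediction). Passing rep(mean(training$MEAS_TC), nrow(testing)) as the prediction gives a naive baseline RMSE, a rough yardstick for whether the map captures anything beyond the mean.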