Try rescaling your data before splitting it into a training and a test set. 
Otherwise each set is scaled with its own means and standard deviations, so 
you end up with two different scalings.
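
A minimal sketch of what that could look like, using base R only. The data 
frame `dat` is a made-up stand-in for somdata (hypothetical values and only 
two predictors), so none of the numbers mean anything. The last part shows an 
alternative that is not from your code or from the advice above: reusing the 
training centre and scale on the test rows.

```r
# Synthetic stand-in for somdata (hypothetical values, not the real data)
set.seed(1)
dat <- data.frame(
  MEAS_TC = runif(90, 1, 3),
  SP      = rnorm(90, mean = 40, sd = 15),
  GR      = rnorm(90, mean = 55, sd = 20)
)

# Scale the predictors once, before splitting
X <- scale(dat[-1])

inTrain <- sample(nrow(dat), floor(nrow(dat) * 2 / 3))
trainX <- X[inTrain, , drop = FALSE]
testX  <- X[-inTrain, , drop = FALSE]

# For comparison: scaling the test rows on their own uses a different
# centre and spread, so the same raw value gets a different scaled value
testX.separate <- scale(dat[-inTrain, -1])
max(abs(testX - testX.separate))  # non-zero: the two scalings disagree

# Alternative (an assumption, not part of the advice above): scale the
# training rows, then apply their centre and scale to the test rows
trainX2 <- scale(dat[inTrain, -1])
testX2  <- scale(dat[-inTrain, -1],
                 center = attr(trainX2, "scaled:center"),
                 scale  = attr(trainX2, "scaled:scale"))
```

Either way, the matrix passed to xyf() and the one passed to predict() are on 
the same scale, which is the point of the suggestion.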

ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and 
Forest
team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
Kliniekstraat 25
1070 Anderlecht
Belgium
+ 32 2 525 02 51
+ 32 54 43 61 85
thierry.onkel...@inbo.be
www.inbo.be

To call in the statistician after the experiment is done may be no more than 
asking him to perform a post-mortem examination: he may be able to say what the 
experiment died of.
~ Sir Ronald Aylmer Fisher

The plural of anecdote is not data.
~ Roger Brinner

The combination of some data and an aching desire for an answer does not ensure 
that a reasonable answer can be extracted from a given body of data.
~ John Tukey

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of 
Ben Harrison
Sent: Wednesday 24 July 2013 11:05
To: r-help@r-project.org
Subject: [R] Help to improve prediction from supervised mapping using kohonen 
package

I would really like any advice on how I can improve (or fix??) the 
following analysis. I hope I have provided completely runnable code; it 
doesn't produce any errors for me.

The resulting plot at the end shows a pretty poor correlation (just speaking 
visually here) to the test set. How can I improve the performance of the 
mapping and prediction?

Here are some of the data (continuous, numerical):

> head(somdata)
   MEAS_TC        SP        LN        SN       GR     NEUT
1 2.780000 59.181090  33.74364  19.75361 66.57665 257.0368
2 1.490000 49.047750 184.14598 139.07980 54.75052 326.8001
3 1.490000 49.128902 183.58853 138.02768 55.54114 327.4739
4 2.201276 18.240331  19.20386  10.74748 62.04492 494.4161
5 2.201276 18.215522  19.18009  10.72446 61.87448 494.7409
6 1.276476  9.337769  14.16061  19.06902 14.99612 363.0020

Complete data set is at the following link if you fancy it:
https://gist.github.com/ottadini/6068259

The first variable is the dependent. I wish to train a som using this data, and 
then be able to predict MEAS_TC using a new set of data with missing values of 
MEAS_TC. Below I'm simply splitting the somdata into a training and a testing 
set for evaluation purposes.

# ===== #
library(kohonen)

somdata <- read.csv("somdata.csv")

# Create test and training sets from data:
inTrain <- sample(nrow(somdata), nrow(somdata)*(2/3))
training <- somdata[inTrain, ]
testing <- somdata[-inTrain, ]

# Supervised kohonen map, where the dependent variable is MEAS_TC.
# Attempting to follow the examples in Wehrens and Buydens, 2007, 21(5), J Stat Soft.
# somdata[1] is the MEAS_TC variable
somX <- scale(training[-1])
somY <- training[[1]]  # Needs to return a vector
# Train the map (not sure this is how it should be done):
tc.xyf <- xyf(data=somX, Y=somY, xweight=0.5, grid=somgrid(6, 6, "hexagonal"), 
contin=TRUE)

# Prediction with test set:
tc.xyf.prediction <- predict(tc.xyf, newdata = scale(testing[-1]))

# Basic plot:
x <- seq(nrow(testing))
plot(x, testing[, "MEAS_TC"], type="l", col="black", ylim=c(0, 3.5))
par(new=TRUE)
plot(x, tc.xyf.prediction$prediction, type="l", col="red", ylim=c(0, 3.5))

# Wow, that's terrible. Do I have something wrong?
# ===== #

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
