I am new to R, so I am sure I am making a simple mistake. I am including
complete information in the hope that someone can help me.
Basically, my data looks right inside R, but when I write it to a file, every
value in the file is 1 higher than the value R prints.
Here is my flow:
str(prediction)
Factor w/ 10 levels "0","1","2","3",..: 3 1 10 10 4 8 1 4 1 4 ...
- attr(*, "names")= chr [1:28000] "1" "2" "3" "4" ...
print(prediction)
 1  2  3  4  5  6  7  8  9 10 11 12 13
 2  0  9  9  3  7  0  3  0  3  5  7  4
14 15 16 17 18 19 20 21 22 23
 0  4  3  3  1  9  0  9  1  1
OK, so print() shows that my values are 2, 0, 9, 9, 3, etc.
# I write my file out
write(prediction, file="prediction.csv")
# look at the first 10 values
$ head -10 prediction.csv
3 1 10 10 4
8 1 4 1 4
6 8 5 1 5
4 4 2 10 1
10 2 2 6 8
5 3 8 5 8
8 6 5 3 7
3 6 6 2 7
8 8 5 10 9
8 9 3 7 8
The complete sequence of what I did is as follows:
# First I load a dataset and convert the label column to a factor
dataset <- read.csv('train.csv', header=TRUE)
dataset$label <- as.factor(dataset$label)
# it has 42000 obs. 785 variables
str(dataset)
'data.frame': 42000 obs. of 785 variables:
$ label : Factor w/ 10 levels "0","1","2","3",..: 2 1 2 5 1 1 8 4 6 4 ...
$ pixel0 : int 0 0 0 0 0 0 0 0 0 0 ...
$ pixel1 : int 0 0 0 0 0 0 0 0 0 0 ...
$ pixel2 : int 0 0 0 0 0 0 0 0 0 0 ...
[list output truncated]
# I randomly split it into a testset (30% of rows) and a trainset (the rest)
index <- 1:nrow(dataset)
testindex <- sample(index, trunc(length(index)*30/100))
testset <- dataset[testindex,]
trainset <- dataset[-testindex,]
# build model, predict, view (svm() as provided by the e1071 package)
library(e1071)
model <- svm(label~., data = trainset, type="C-classification",
             kernel="radial", gamma=0.0000001, cost=16)
prediction <- predict(model, testset)
tab <- table(pred = prediction, true = testset[,1])
tab
true
pred 0 1 2 3 4 5 6 7 8 9
0 1210 0 3 1 0 5 7 2 5 8
1 0 1415 2 0 2 1 0 7 5 0
2 0 2 1127 12 3 0 2 7 2 0
3 0 0 7 1296 0 10 0 2 15 6
4 1 1 8 2 1201 2 4 3 5 16
5 3 1 0 13 0 1100 3 1 2 3
6 3 0 3 0 5 9 1263 0 1 0
7 0 2 9 6 6 1 0 1296 1 13
8 3 5 7 11 1 2 0 2 1190 4
9 1 1 2 3 17 2 0 4 4 1190
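To put a single number on the table above, here is a minimal sketch, assuming overall accuracy (diagonal total divided by grand total) is the figure of interest. The matrix is just the counts above retyped by hand, and the names `counts`, `tab_copy` and `accuracy` are made up for illustration:

```r
# Counts retyped from the table above, one row per predicted digit (0 to 9)
counts <- c(
  1210,    0,    3,    1,    0,    5,    7,    2,    5,    8,
     0, 1415,    2,    0,    2,    1,    0,    7,    5,    0,
     0,    2, 1127,   12,    3,    0,    2,    7,    2,    0,
     0,    0,    7, 1296,    0,   10,    0,    2,   15,    6,
     1,    1,    8,    2, 1201,    2,    4,    3,    5,   16,
     3,    1,    0,   13,    0, 1100,    3,    1,    2,    3,
     3,    0,    3,    0,    5,    9, 1263,    0,    1,    0,
     0,    2,    9,    6,    6,    1,    0, 1296,    1,   13,
     3,    5,    7,   11,    1,    2,    0,    2, 1190,    4,
     1,    1,    2,    3,   17,    2,    0,    4,    4, 1190
)
digits_chr <- as.character(0:9)
tab_copy <- matrix(counts, nrow = 10, byrow = TRUE,
                   dimnames = list(pred = digits_chr, true = digits_chr))

# Correct predictions sit on the diagonal; everything else is an error
accuracy <- sum(diag(tab_copy)) / sum(tab_copy)
accuracy   # 12288 / 12600, i.e. 0.9752381
```

With the real objects, the same calculation would be sum(diag(tab)) / sum(tab).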
OK, everything looks fine up to this point, so next I apply my model to a
"real" testset. It has the same format as my previous dataset, except that it
does not have the label (factor) column, so it is 28000 obs. of 784 variables:
testset <- read.csv('test.csv', header=TRUE)
str(testset)
'data.frame': 28000 obs. of 784 variables:
$ pixel0 : int 0 0 0 0 0 0 0 0 0 0 ...
$ pixel1 : int 0 0 0 0 0 0 0 0 0 0 ...
$ pixel2 : int 0 0 0 0 0 0 0 0 0 0 ...
[list output truncated]
prediction <- predict(model, testset)
summary(prediction)
0 1 2 3 4 5 6 7 8 9
2780 3204 2824 2767 2771 2516 2744 2898 2736 2760
print(prediction)
 1  2  3  4  5  6  7  8  9 10 11 12 13
 2  0  9  9  3  7  0  3  0  3  5  7  4
14 15 16 17 18 19 20 21 22 23
 0  4  3  3  1  9  0  9  1  1
24 25 26 27 28 29 30 31 32 33 34 35 36
 5  7  4  2  7  4  7  7  5  4  2  6  2
37 38 39 40 41 42 43 44 45 46
 5  5  1  6  7  7  4  9  8  7
[list output truncated]
write(prediction, file="prediction.csv")
$ head -10 prediction.csv
3 1 10 10 4
8 1 4 1 4
6 8 5 1 5
4 4 2 10 1
10 2 2 6 8
5 3 8 5 8
8 6 5 3 7
3 6 6 2 7
8 8 5 10 9
8 9 3 7 8
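For what it is worth, the same pattern shows up on a toy factor without any model involved. This is only a minimal sketch, assuming that write() is emitting the factor's internal integer codes (positions in the level list) rather than the labels that print() shows. The values and the names `pred`, `codes`, `digits` and `out` are made up for illustration:

```r
# Toy stand-in for the real predictions (hypothetical values, not my data)
pred <- factor(c("2", "0", "9", "9", "3"), levels = as.character(0:9))

# Internal representation: 1-based positions in levels(pred)
codes <- as.integer(pred)                  # 3 1 10 10 4

# The labels that print() displays, converted back to numbers
digits <- as.numeric(as.character(pred))   # 2 0 9 9 3

# Writing the labels instead of the factor itself, one value per line
out <- tempfile(fileext = ".csv")
write(as.character(pred), file = out, ncolumns = 1)
readLines(out)                             # "2" "0" "9" "9" "3"
```

If that assumption holds, the numbers in prediction.csv would be the level positions, which are 1 higher than the labels only because the levels happen to be "0" through "9".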
I am obviously making a mistake. Every value in the file is exactly 1 higher
than the value print() shows (2 becomes 3, 0 becomes 1, 9 becomes 10).
Can someone tell me what I am doing wrong?
Brian
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help