Re: [R] data after write() is off by 1 ?

Rui Barradas Tue, 20 Nov 2012 11:54:57 -0800

Hello,

You are seeing the levels of a factor but saving its values. Internally, factors are coded as consecutive integers starting at 1, and those codes are what write() saves to the file. To get the levels "0", "1", etc. and not the corresponding codes 1, 2, etc., try


levels(prediction)[prediction]

or

as.integer(levels(prediction)[prediction])
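
As a minimal sketch of the difference, here is a small made-up factor standing in for your prediction. The toy values and the temporary file are assumptions for illustration only, not your actual model output:

```r
# Toy stand-in for `prediction` (hypothetical data, same ten levels "0".."9")
prediction <- factor(c("2", "0", "9", "9", "3"), levels = as.character(0:9))

# Internal integer codes: these are what write() emits for a factor
codes <- as.integer(prediction)

# Level labels, recovered by indexing the levels with the factor
labels_chr <- levels(prediction)[prediction]
labels_int <- as.integer(levels(prediction)[prediction])

# An equivalent conversion that goes through character
labels_int2 <- as.integer(as.character(prediction))

# Write the labels, one value per line, to a temporary file
out <- tempfile(fileext = ".csv")
write(labels_int, file = out, ncolumns = 1)
written <- readLines(out)

print(codes)
print(labels_int)
print(written)
```

Here codes holds the 1-based positions in levels(prediction), while labels_int holds the digits you actually want in the file.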


Hope this helps,

Rui Barradas
On 20-11-2012 19:30, Brian Feeny wrote:

I am new to R, so I am sure I am making a simple mistake.  I am including
complete information in the hope that someone can help me.

Basically, my data looks good in R, but when I write it to a file, every
value is off by 1.

Here is my flow:

str(prediction)

  Factor w/ 10 levels "0","1","2","3",..: 3 1 10 10 4 8 1 4 1 4 ...
  - attr(*, "names")= chr [1:28000] "1" "2" "3" "4" ...

print(prediction)

     1     2     3     4     5     6     7     8     9    10    11    12    13
     2     0     9     9     3     7     0     3     0     3     5     7     4
    14    15    16    17    18    19    20    21    22    23
     0     4     3     3     1     9     0     9     1     1

OK, so it shows my values are 2, 0, 9, 9, 3, etc.

# I write my file out
write(prediction, file="prediction.csv")

# look at the first 10 values
$ head -10 prediction.csv
3 1 10 10 4
8 1 4 1 4
6 8 5 1 5
4 4 2 10 1
10 2 2 6 8
5 3 8 5 8
8 6 5 3 7
3 6 6 2 7
8 8 5 10 9
8 9 3 7 8

The complete work of what I did was as follows:

# First I load a dataset and convert the label column (the first column) to a factor

dataset <- read.csv('train.csv', header=TRUE)
dataset$label <- as.factor(dataset$label)

# it has 42000 obs. 785 variables

str(dataset)

'data.frame':   42000 obs. of  785 variables:
  $ label   : Factor w/ 10 levels "0","1","2","3",..: 2 1 2 5 1 1 8 4 6 4 ...
  $ pixel0  : int  0 0 0 0 0 0 0 0 0 0 ...
  $ pixel1  : int  0 0 0 0 0 0 0 0 0 0 ...
  $ pixel2  : int  0 0 0 0 0 0 0 0 0 0 ...
   [list output truncated]

# I make a sampling testset and trainset

index <- 1:nrow(dataset)
testindex <- sample(index, trunc(length(index)*30/100))
testset <- dataset[testindex,]
trainset <- dataset[-testindex,]

# build model, predict, view

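library(e1071)  # provides svm()
model <- svm(label ~ ., data = trainset, type = "C-classification",
             kernel = "radial", gamma = 0.0000001, cost = 16)
prediction <- predict(model, testset)
# cross-tabulate predicted vs. true labels (label is column 1 of testset)
tab <- table(pred = prediction, true = testset[,1])
tab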

     true
pred    0    1    2    3    4    5    6    7    8    9
    0 1210    0    3    1    0    5    7    2    5    8
    1    0 1415    2    0    2    1    0    7    5    0
    2    0    2 1127   12    3    0    2    7    2    0
    3    0    0    7 1296    0   10    0    2   15    6
    4    1    1    8    2 1201    2    4    3    5   16
    5    3    1    0   13    0 1100    3    1    2    3
    6    3    0    3    0    5    9 1263    0    1    0
    7    0    2    9    6    6    1    0 1296    1   13
    8    3    5    7   11    1    2    0    2 1190    4
    9    1    1    2    3   17    2    0    4    4 1190
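
A table like the one above can be summarised as an overall accuracy: the diagonal holds the correct predictions. The following is a minimal sketch on a made-up three-class example (the vectors true and pred are hypothetical, not the data from this post):

```r
# Hypothetical true and predicted labels for a tiny three-class problem
true <- factor(c("0", "1", "1", "2", "2", "2"), levels = c("0", "1", "2"))
pred <- factor(c("0", "1", "2", "2", "2", "0"), levels = c("0", "1", "2"))

# Same layout as above: predictions in rows, true labels in columns
tab <- table(pred = pred, true = true)

# Overall accuracy: correct predictions (diagonal) over all predictions
accuracy <- sum(diag(tab)) / sum(tab)

# Per-class fraction of each true label that was predicted correctly
per_class <- diag(tab) / colSums(tab)

print(tab)
print(accuracy)
print(per_class)
```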


OK, everything looks great up to this point, so I try to apply my model to a
"real" test set, which is in the same format as my previous dataset, except
that it does not have the label/factor column, so it's 28000 obs. of 784
variables:

testset <- read.csv('test.csv', header=TRUE)
str(testset)

'data.frame':   28000 obs. of  784 variables:
  $ pixel0  : int  0 0 0 0 0 0 0 0 0 0 ...
  $ pixel1  : int  0 0 0 0 0 0 0 0 0 0 ...
  $ pixel2  : int  0 0 0 0 0 0 0 0 0 0 ...
   [list output truncated]

prediction <- predict(model, testset)
summary(prediction)

    0    1    2    3    4    5    6    7    8    9
2780 3204 2824 2767 2771 2516 2744 2898 2736 2760

print(prediction)

     1     2     3     4     5     6     7     8     9    10    11    12    13
     2     0     9     9     3     7     0     3     0     3     5     7     4
    14    15    16    17    18    19    20    21    22    23
     0     4     3     3     1     9     0     9     1     1
    24    25    26    27    28    29    30    31    32    33    34    35    36
     5     7     4     2     7     4     7     7     5     4     2     6     2
    37    38    39    40    41    42    43    44    45    46
     5     5     1     6     7     7     4     9     8     7
   [list output truncated]

write(prediction, file="prediction.csv")

$ head -10 prediction.csv
3 1 10 10 4
8 1 4 1 4
6 8 5 1 5
4 4 2 10 1
10 2 2 6 8
5 3 8 5 8
8 6 5 3 7
3 6 6 2 7
8 8 5 10 9
8 9 3 7 8


I am obviously making a mistake.  Everything is off by a value of 1.


Can someone tell me what I am doing wrong?

Brian




______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


