Hi,
The problem appears to be caused by an odd quirk of the scale() function.
Some of my data have NA entries, so I substitute 0 for every NA with:
rawdata[is.na(rawdata)] <- 0
I then scale the data with scale().
For some reason that I don't understand, some NA values reappear in the
data after the scale() call.
But issuing the same 0 substitution again AFTER the scale() call makes
everything work:
rawdata[is.na(rawdata)] <- 0
VERY strange behavior.
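
One possible explanation, sketched on made-up data: scale() centers each
column and then divides it by its standard deviation. If a column is constant
(for example, all 0 after the NA substitution), its standard deviation is 0,
the division is 0/0, and the result is NaN, which is.na() reports as TRUE.
Whether rawdata actually contains such columns is an assumption; the names
toy, scaled, keep, and scaled_ok below are hypothetical and not part of the
real data set.

```r
# Hypothetical toy data standing in for rawdata; column b is entirely NA.
toy <- data.frame(a = c(1, 2, NA, 4),
                  b = c(NA_real_, NA_real_, NA_real_, NA_real_))

# Same substitution as above: b becomes all zeros, so sd(b) is 0.
toy[is.na(toy)] <- 0

# scale() centers each column, then divides by its standard deviation.
# For the constant column b this is 0/0, which yields NaN.
scaled <- scale(toy)
print(colSums(is.na(scaled)))   # number of NA/NaN values per column

# One way to avoid this: scale only the columns that actually vary.
keep <- sapply(toy, sd) > 0
scaled_ok <- scale(toy[, keep, drop = FALSE])
print(anyNA(scaled_ok))
```

If this is the cause, checking sapply(rawdata, sd) on the numeric columns
before scaling should show which columns have zero variance. Replacing the
NaN values with 0 afterwards, as above, gives the same result for those
columns.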
-N
On 8/2/09 3:57 PM, J Dougherty wrote:
> On Sunday 02 August 2009 02:34:43 pm Noah Silverman wrote:
>
>> The column names have to be obfuscated, but here are 10 rows of the data.
>>
>> label c0 c1 c2 c3 c4 c5 c6 c7
>> c8 c9 c10 c11 c12 c13
>> c14 c15 c16 c17 c18 c19 c20 c21 c22 c23
>> c24 c25 c26 c27
>> c28 c29 c30 c31 c32 c33 c34 c35 c36 c37
>> c38 c39 c40 c41
>> c42 c43 c44 c45 c46 c47 c48 c49 c50 c51
>> c52 c53 c54 c55
>> c56 c57 c58 c59 c60 c61 c62 c63 c64 c65
>> c66
>> sick 2008-12-28_1 95.609 5 3.3 1.35 0 1
>> 35 9.6666 0 0
>> 0.0833 1 0.0833 1 0.1428 7 3 2.035714286
>> 6.5 94.8481
>> 53.846 12 -4.69 1.25 0.5062 0.0522 0.1808 3 0.5126
>> 0.0694
>> 0.2061 94.9288 8.3125 0.0247 7.5833 9.3 35 9.6666
>> 0 0
>> 0.0833 1 0.0833 1 0.1428 7 3 2.035714286
>> 6.5 94.8481
>> 53.846 12 -4.69 1.25 0.5062 0.0522 0.1808 3 0.5126
>> 0.0694
>> 0.2061 94.9288 8.3125 0.0247 7.5833 9.3
>> well 2008-12-28_1 95.338 1 11 3.2 3 2
>> 11 7.0277 0.0555 2
>> 0.1666 6 0.1666 5 0.238 18 11 2.541666667
>> 2.022727273 94.7733
>> 38.461 36 6.07 7.5555 0.5928 0.0955 0.2871 0 0.5434
>> 0.0679
>> 0.2283 95.9003 5.1736 0.0847 7.3333 28 11 7.0277
>> 0.0555 2
>> 0.1666 6 0.1666 5 0.238 18 11 2.541666667
>> 2.022727273 94.7733
>> 38.461 36 6.07 7.5555 0.5928 0.0955 0.2871 0 0.5434
>> 0.0679
>> 0.2283 95.9003 5.1736 0.0847 7.3333 28
>> well 2008-12-28_1 95.204 2 7.4 2.75 4 1
>> 22 8.4545 0 0
>> 0 0 0 0 0 6 4 2.791666667 2.5625
>> 94.8444 61.538 11 2.84
>> 3.0909 0.5693 0.0641 0.2738 0 0.5874 0.1011 0.2803 94.9769
>> 8.1363 0.0467 5.4545 10 22 8.4545 0 0 0
>> 0 0 0 0 6 4
>> 2.791666667 2.5625 94.8444 61.538 11 2.84 3.0909 0.5693
>> 0.0641
>> 0.2738 0 0.5874 0.1011 0.2803 94.9769 8.1363 0.0467
>> 5.4545 10
>> sick 2008-12-28_1 95.204 14 48
>> 0 3 25 8.7045 0.0909 4 0.2045 9 0.2045
>> 4 0.2666 11 8
>> 4.409090909 0 95.0006 15.384 44 1.76 7.409 0.4475
>> 0.0285
>> 0.1206 0 0.5094 0.058 0.1931 92.9455 7.2613 0.0532
>> 4.5227
>> 82 25 8.7045 0.0909 4 0.2045 9 0.2045 4 0.2666
>> 11 8
>> 4.409090909 0 95.0006 15.384 44 1.76 7.409 0.4475
>> 0.0285
>> 0.1206 0 0.5094 0.058 0.1931 92.9455 7.2613 0.0532
>> 4.5227 82
>> well 2008-12-28_1 95.07 13 26
>> 1 1 11 8.1 0.0666 2 0.1666 5 0.1666
>> 0 0 21 16
>> 2.571428571 1.984375 94.825 30.769 30 -4.69 -0.7999
>> 0.5166
>> 0.0624 0.2078 0 0.5306 0.0792 0.2398 95.2282 7.575
>> 0.0715
>> 3.4333 44 11 8.1 0.0666 2 0.1666 5 0.1666
>> 0 0 21 16
>> 2.571428571 1.984375 94.825 30.769 30 -4.69 -0.7999
>> 0.5166
>> 0.0624 0.2078 0 0.5306 0.0792 0.2398 95.2282 7.575
>> 0.0715
>> 3.4333 44
>> well 2008-12-28_1 95.07 9 16
>> 0 4 39 9.4117 0 0 0.0588 1 0.0588
>> 0 0 3 25 3.916666667
>> 2.96 94.8177 30.769 17 -20.84 -15.8234 0.8205
>> 0.3333 0.6666 0
>> 0.6054 0.1287 0.3292 95.3232 6.9117 0.076 2.647 16
>> 39
>> 9.4117 0 0 0.0588 1 0.0588 0 0 3
>> 25 3.916666667 2.96
>> 94.8177 30.769 17 -20.84 -15.8234 0.8205 0.3333 0.6666 0
>> 0.6054 0.1287 0.3292 95.3232 6.9117 0.076 2.647 16
>> sick 2008-12-28_1 94.936 6 11
>> 4 1 28 7.725 0.075 3 0.125 5 0.125
>> 0 0 6 2 4 1.75
>> 94.7815 46.153 40 6.07 12.5 0.5014 0.0621 0.1972 6
>> 0.523
>> 0.0742 0.2035 95.794 6.0625 0.046 7.25 12 28 7.725
>> 0.075 3
>> 0.125 5 0.125 0 0 6 2 4 1.75
>> 94.7815 46.153 40 6.07 12.5
>> 0.5014 0.0621 0.1972 6 0.523 0.0742 0.2035 95.794 6.0625
>> 0.046 7.25 12
>> well 2008-12-28_1 94.803 11 13
>> 0 5 35 7.125 0.0937 3 0.1562 5 0.1562
>> 5 0.2 18 17
>> 1.555555556 2.794117647 95.0398 38.461 32 10.38 8.4063
>> 0.5804
>> 0.0871 0.2627 1 0.558 0.0738 0.2324 92.4367 5.289
>> 0.0722
>> 9.125 16 35 7.125 0.0937 3 0.1562 5 0.1562
>> 5 0.2 18 17
>> 1.555555556 2.794117647 95.0398 38.461 32 10.38 8.4063
>> 0.5804
>> 0.0871 0.2627 1 0.558 0.0738 0.2324 92.4367 5.289
>> 0.0722 9.125 16
>> well 2008-12-28_1 94.67 4 38
>> 5 1 11 8.9642 0.0357 1 0.1428 4 0.1428
>> 4 0.2105 11 13
>> 3.772727273 4.307692308 94.8451 23.076 28 -5.76 -4
>> 0.3269 0
>> 0.0833 0 0.5222 0.0616 0.2079 94.9668 8.6696 0.0663
>> 4.6428
>> 14 11 8.9642 0.0357 1 0.1428 4 0.1428 4 0.2105
>> 11 13
>> 3.772727273 4.307692308 94.8451 23.076 28 -5.76 -4
>> 0.3269 0
>> 0.0833 0 0.5222 0.0616 0.2079 94.9668 8.6696 0.0663
>> 4.6428 14
>> well 2008-12-28_1 94.537 12 39
>> 0 1 35 9.4444 0 0 0 0 0
>> 0 0 2 7 2.5 2.892857143 94.878
>> 23.076 9 -12.23 -9.6666 0.4428 0 0.0857 0
>> 0.5411 0.0849 0.25
>> 94.54 8.9166 0.0296 6.1111 67 35 9.4444 0 0
>> 0 0 0 0 0
>> 2 7 2.5 2.892857143 94.878 23.076 9 -12.23 -9.6666
>> 0.4428 0
>> 0.0857 0 0.5411 0.0849 0.25 94.54 8.9166 0.0296 6.1111
>> 67
>>
>>
>>
> Your initial post mentions 70 columns in your data table, yet the example
> shows 68 counting the initial "label" term in the header. I would suggest
> adding "row.names = NULL" to force row numbers and see how that behaves, e.g.
>
> rawdata <- read.table("r_work/train_data.csv", header = TRUE, sep = ",",
>                       na.strings = 0, row.names = NULL)
>
> Otherwise, you might want to consult the R Manual where it states:
>
> header: a logical value indicating whether the file contains the names of
>   the variables as its first line. If missing, the value is determined
>   from the file format: header is set to TRUE if and only if the first
>   row contains one fewer field than the number of columns.
>
> So, you might also want to count up your column names in the header line.
>
> JWDougherty
>
> ______________________________________________
> [email protected] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.