Hi,

It seems as if the problem was caused by an odd quirk of the "scale" function.
Some of my data have NA entries. So, I substitute 0 for any NA with:

    rawdata[is.na(rawdata)] <- 0

I then scale the data. For some reason that I don't understand, some NA values reappear in the data after the scale command. But issuing the same 0 substitution AFTER the scale command makes everything work again:

    rawdata[is.na(rawdata)] <- 0

VERY strange behavior.

-N

On 8/2/09 3:57 PM, J Dougherty wrote:
> On Sunday 02 August 2009 02:34:43 pm Noah Silverman wrote:
>
>> The column names have to be obfuscated, but here are 10 rows of the data.
>>
>> label c0 c1 c2 c3 c4 c5 c6 c7 c8 c9 c10 c11 c12 c13 c14 c15 c16 c17 c18 c19 c20 c21 c22 c23 c24 c25 c26 c27 c28 c29 c30 c31 c32 c33 c34 c35 c36 c37 c38 c39 c40 c41 c42 c43 c44 c45 c46 c47 c48 c49 c50 c51 c52 c53 c54 c55 c56 c57 c58 c59 c60 c61 c62 c63 c64 c65 c66
>> sick 2008-12-28_1 95.609 5 3.3 1.35 0 1 35 9.6666 0 0 0.0833 1 0.0833 1 0.1428 7 3 2.035714286 6.5 94.8481 53.846 12 -4.69 1.25 0.5062 0.0522 0.1808 3 0.5126 0.0694 0.2061 94.9288 8.3125 0.0247 7.5833 9.3 35 9.6666 0 0 0.0833 1 0.0833 1 0.1428 7 3 2.035714286 6.5 94.8481 53.846 12 -4.69 1.25 0.5062 0.0522 0.1808 3 0.5126 0.0694 0.2061 94.9288 8.3125 0.0247 7.5833 9.3
>> well 2008-12-28_1 95.338 1 11 3.2 3 2 11 7.0277 0.0555 2 0.1666 6 0.1666 5 0.238 18 11 2.541666667 2.022727273 94.7733 38.461 36 6.07 7.5555 0.5928 0.0955 0.2871 0 0.5434 0.0679 0.2283 95.9003 5.1736 0.0847 7.3333 28 11 7.0277 0.0555 2 0.1666 6 0.1666 5 0.238 18 11 2.541666667 2.022727273 94.7733 38.461 36 6.07 7.5555 0.5928 0.0955 0.2871 0 0.5434 0.0679 0.2283 95.9003 5.1736 0.0847 7.3333 28
>> well 2008-12-28_1 95.204 2 7.4 2.75 4 1 22 8.4545 0 0 0 0 0 0 0 6 4 2.791666667 2.5625 94.8444 61.538 11 2.84 3.0909 0.5693 0.0641 0.2738 0 0.5874 0.1011 0.2803 94.9769 8.1363 0.0467 5.4545 10 22 8.4545 0 0 0 0 0 0 0 6 4 2.791666667 2.5625 94.8444 61.538 11 2.84 3.0909 0.5693 0.0641 0.2738 0 0.5874 0.1011 0.2803 94.9769 8.1363 0.0467 5.4545 10
>> sick 2008-12-28_1 95.204 14 48 0 3 25 8.7045 0.0909 4 0.2045 9 0.2045 4 0.2666 11 8 4.409090909 0 95.0006 15.384 44 1.76 7.409 0.4475 0.0285 0.1206 0 0.5094 0.058 0.1931 92.9455 7.2613 0.0532 4.5227 82 25 8.7045 0.0909 4 0.2045 9 0.2045 4 0.2666 11 8 4.409090909 0 95.0006 15.384 44 1.76 7.409 0.4475 0.0285 0.1206 0 0.5094 0.058 0.1931 92.9455 7.2613 0.0532 4.5227 82
>> well 2008-12-28_1 95.07 13 26 1 1 11 8.1 0.0666 2 0.1666 5 0.1666 0 0 21 16 2.571428571 1.984375 94.825 30.769 30 -4.69 -0.7999 0.5166 0.0624 0.2078 0 0.5306 0.0792 0.2398 95.2282 7.575 0.0715 3.4333 44 11 8.1 0.0666 2 0.1666 5 0.1666 0 0 21 16 2.571428571 1.984375 94.825 30.769 30 -4.69 -0.7999 0.5166 0.0624 0.2078 0 0.5306 0.0792 0.2398 95.2282 7.575 0.0715 3.4333 44
>> well 2008-12-28_1 95.07 9 16 0 4 39 9.4117 0 0 0.0588 1 0.0588 0 0 3 25 3.916666667 2.96 94.8177 30.769 17 -20.84 -15.8234 0.8205 0.3333 0.6666 0 0.6054 0.1287 0.3292 95.3232 6.9117 0.076 2.647 16 39 9.4117 0 0 0.0588 1 0.0588 0 0 3 25 3.916666667 2.96 94.8177 30.769 17 -20.84 -15.8234 0.8205 0.3333 0.6666 0 0.6054 0.1287 0.3292 95.3232 6.9117 0.076 2.647 16
>> sick 2008-12-28_1 94.936 6 11 4 1 28 7.725 0.075 3 0.125 5 0.125 0 0 6 2 4 1.75 94.7815 46.153 40 6.07 12.5 0.5014 0.0621 0.1972 6 0.523 0.0742 0.2035 95.794 6.0625 0.046 7.25 12 28 7.725 0.075 3 0.125 5 0.125 0 0 6 2 4 1.75 94.7815 46.153 40 6.07 12.5 0.5014 0.0621 0.1972 6 0.523 0.0742 0.2035 95.794 6.0625 0.046 7.25 12
>> well 2008-12-28_1 94.803 11 13 0 5 35 7.125 0.0937 3 0.1562 5 0.1562 5 0.2 18 17 1.555555556 2.794117647 95.0398 38.461 32 10.38 8.4063 0.5804 0.0871 0.2627 1 0.558 0.0738 0.2324 92.4367 5.289 0.0722 9.125 16 35 7.125 0.0937 3 0.1562 5 0.1562 5 0.2 18 17 1.555555556 2.794117647 95.0398 38.461 32 10.38 8.4063 0.5804 0.0871 0.2627 1 0.558 0.0738 0.2324 92.4367 5.289 0.0722 9.125 16
>> well 2008-12-28_1 94.67 4 38 5 1 11 8.9642 0.0357 1 0.1428 4 0.1428 4 0.2105 11 13 3.772727273 4.307692308 94.8451 23.076 28 -5.76 -4 0.3269 0 0.0833 0 0.5222 0.0616 0.2079 94.9668 8.6696 0.0663 4.6428 14 11 8.9642 0.0357 1 0.1428 4 0.1428 4 0.2105 11 13 3.772727273 4.307692308 94.8451 23.076 28 -5.76 -4 0.3269 0 0.0833 0 0.5222 0.0616 0.2079 94.9668 8.6696 0.0663 4.6428 14
>> well 2008-12-28_1 94.537 12 39 0 1 35 9.4444 0 0 0 0 0 0 0 2 7 2.5 2.892857143 94.878 23.076 9 -12.23 -9.6666 0.4428 0 0.0857 0 0.5411 0.0849 0.25 94.54 8.9166 0.0296 6.1111 67 35 9.4444 0 0 0 0 0 0 0 2 7 2.5 2.892857143 94.878 23.076 9 -12.23 -9.6666 0.4428 0 0.0857 0 0.5411 0.0849 0.25 94.54 8.9166 0.0296 6.1111 67
>>
> Your initial post mentions 70 columns in your data table, yet the example
> shows 67 counting the initial "labels" term in the header. I would suggest
> adding "row.names = NULL" to force row numbers and see how that behaves, e.g.
>
>     rawdata <- read.table("r_work/train_data.csv", header=T, sep=",",
>                           na.strings=0, row.names = NULL)
>
> Otherwise, you might want to consult the R Manual where it states:
>
>     header  a logical value indicating whether the file contains the names
>             of the variables as its first line. If missing, the value is
>             determined from the file format: header is set to TRUE if and
>             only if the first row contains one fewer field than the number
>             of columns.
>
> So, you might also want to count up your column names in the header line.
>
> JWDougherty
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
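
A possible explanation for the NA values that come back after scale(), offered as an assumption and not as a diagnosis of the actual data: if a column holds only NA, the 0 substitution turns it into a constant column. scale() divides each centered column by its standard deviation, and for a constant column that is 0/0, which R stores as NaN. is.na() reports TRUE for NaN, so the values look like NA again. The sketch below reproduces this on a small made-up data frame; `toy` and its column names are hypothetical and stand in for rawdata.

```r
# Hypothetical toy data standing in for rawdata (names are made up).
toy <- data.frame(
  a = c(1, 2, 3, 4),
  b = c(NA_real_, NA_real_, NA_real_, NA_real_),  # every entry missing
  c = c(5, NA, 7, 9)
)

# Step 1: replace missing values with 0, as in the message.
toy[is.na(toy)] <- 0

# Step 2: scale. Column b is now constant (all 0), so its sd is 0
# and every scaled entry in it is 0/0 = NaN.
scaled <- scale(toy)
has_na <- any(is.na(scaled))

# Find the zero-variance columns before scaling.
sds <- apply(toy, 2, sd)
constant <- sds == 0

# One option: scale only the columns that actually vary.
scaled_ok <- scale(toy[, !constant, drop = FALSE])
```

Under this assumption, re-running the 0 substitution after scale() works because it overwrites the NaN entries. Checking `names(which(constant))` first shows which columns carry no information and could be dropped instead.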