Hi,

It seems as if the problem was caused by an odd quirk of the "scale" 
function.

Some of my data have NA entries.

So, I substitute 0 for any NA with:
rawdata[is.na(rawdata)] <- 0
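For reference, here is a minimal sketch of that substitution on a small, hypothetical data frame. The columns c1/c2 and their values are made up for illustration; they are not taken from the real file.

```r
# Hypothetical stand-in for rawdata; values are made up for illustration
rawdata <- data.frame(
  c1 = c(1.5, NA, 3.0),
  c2 = c(5, 1, NA)
)

# is.na() on a data frame returns a logical matrix of the same shape,
# so this one assignment replaces every NA cell with 0
rawdata[is.na(rawdata)] <- 0
print(rawdata)
```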

I then scale the data.

For some reason that I don't understand, some NA values show up in the data
again after the scale command.
However, running the same 0 substitution again AFTER the scale command
makes everything work:
rawdata[is.na(rawdata)] <- 0


VERY strange behavior.
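One likely explanation, which this thread does not confirm: scale() centers each column and then divides it by its standard deviation. Any column that is constant after the NA-to-0 substitution (for example, one that held only 0 and NA) has a standard deviation of 0, so every entry becomes 0/0, which R stores as NaN. is.na() returns TRUE for NaN as well as for NA, so those cells look like NA entries. The sketch below uses a hypothetical data frame (columns a and b are made up) to show this, plus one possible way to find such columns before scaling.

```r
# Hypothetical data: column "b" holds only 0 and NA, so it becomes
# constant (all zeros) once NA is replaced with 0
rawdata <- data.frame(
  a = c(1, 2, NA, 4),
  b = c(0, NA, 0, NA)
)
rawdata[is.na(rawdata)] <- 0

# scale() centers each column and divides by its standard deviation;
# for the constant column "b" that is 0 / 0, which gives NaN
scaled <- scale(rawdata)
print(is.nan(scaled[, "b"]))  # TRUE for every entry of "b"
print(is.na(scaled[, "b"]))   # is.na() is also TRUE for NaN

# One option: find zero-variance columns first and leave them out of scale()
col_sd <- apply(rawdata, 2, sd)
print(names(col_sd)[col_sd == 0])
scaled_ok <- scale(rawdata[, col_sd > 0, drop = FALSE])
```

If this is the cause, replacing NaN with 0 after scaling (the second substitution above) simply sets each constant column to its centered value of 0, which would explain why that workaround appears to fix things.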

-N



On 8/2/09 3:57 PM, J Dougherty wrote:
> On Sunday 02 August 2009 02:34:43 pm Noah Silverman wrote:
>    
>> The column names have to be obfuscated, but here are 10 rows of the data.
>>
>> label        c0      c1      c2      c3      c4      c5      c6      c7      
>> c8      c9      c10     c11     c12     c13
>> c14  c15     c16     c17     c18     c19     c20     c21     c22     c23     
>> c24     c25     c26     c27
>> c28  c29     c30     c31     c32     c33     c34     c35     c36     c37     
>> c38     c39     c40     c41
>> c42  c43     c44     c45     c46     c47     c48     c49     c50     c51     
>> c52     c53     c54     c55
>> c56  c57     c58     c59     c60     c61     c62     c63     c64     c65     
>> c66
>> sick         2008-12-28_1    95.609  5       3.3     1.35    0       1       
>> 35      9.6666  0       0
>> 0.0833       1       0.0833  1       0.1428  7       3       2.035714286     
>> 6.5     94.8481
>> 53.846       12      -4.69   1.25    0.5062  0.0522  0.1808  3       0.5126  
>> 0.0694
>> 0.2061       94.9288         8.3125  0.0247  7.5833  9.3     35      9.6666  
>> 0       0
>> 0.0833       1       0.0833  1       0.1428  7       3       2.035714286     
>> 6.5     94.8481
>> 53.846       12      -4.69   1.25    0.5062  0.0522  0.1808  3       0.5126  
>> 0.0694
>> 0.2061       94.9288         8.3125  0.0247  7.5833  9.3
>> well         2008-12-28_1    95.338  1       11      3.2     3       2       
>> 11      7.0277  0.0555  2
>> 0.1666       6       0.1666  5       0.238   18      11      2.541666667     
>> 2.022727273     94.7733
>> 38.461       36      6.07    7.5555  0.5928  0.0955  0.2871  0       0.5434  
>> 0.0679
>> 0.2283       95.9003         5.1736  0.0847  7.3333  28      11      7.0277  
>> 0.0555  2
>> 0.1666       6       0.1666  5       0.238   18      11      2.541666667     
>> 2.022727273     94.7733
>> 38.461       36      6.07    7.5555  0.5928  0.0955  0.2871  0       0.5434  
>> 0.0679
>> 0.2283       95.9003         5.1736  0.0847  7.3333  28
>> well         2008-12-28_1    95.204  2       7.4     2.75    4       1       
>> 22      8.4545  0       0
>> 0    0       0       0       0       6       4       2.791666667     2.5625  
>> 94.8444         61.538  11      2.84
>> 3.0909       0.5693  0.0641  0.2738  0       0.5874  0.1011  0.2803  94.9769
>> 8.1363       0.0467  5.4545  10      22      8.4545  0       0       0       
>> 0       0       0       0       6       4
>> 2.791666667  2.5625  94.8444         61.538  11      2.84    3.0909  0.5693  
>> 0.0641
>> 0.2738       0       0.5874  0.1011  0.2803  94.9769         8.1363  0.0467  
>> 5.4545  10
>> sick         2008-12-28_1    95.204  14      48
>>      0       3       25      8.7045  0.0909  4       0.2045  9       0.2045  
>> 4       0.2666  11      8
>> 4.409090909  0       95.0006         15.384  44      1.76    7.409   0.4475  
>> 0.0285
>> 0.1206       0       0.5094  0.058   0.1931  92.9455         7.2613  0.0532  
>> 4.5227
>> 82   25      8.7045  0.0909  4       0.2045  9       0.2045  4       0.2666  
>> 11      8
>> 4.409090909  0       95.0006         15.384  44      1.76    7.409   0.4475  
>> 0.0285
>> 0.1206       0       0.5094  0.058   0.1931  92.9455         7.2613  0.0532  
>> 4.5227  82
>> well         2008-12-28_1    95.07   13      26
>>      1       1       11      8.1     0.0666  2       0.1666  5       0.1666  
>> 0       0       21      16
>> 2.571428571  1.984375        94.825  30.769  30      -4.69   -0.7999         
>> 0.5166
>> 0.0624       0.2078  0       0.5306  0.0792  0.2398  95.2282         7.575   
>> 0.0715
>> 3.4333       44      11      8.1     0.0666  2       0.1666  5       0.1666  
>> 0       0       21      16
>> 2.571428571  1.984375        94.825  30.769  30      -4.69   -0.7999         
>> 0.5166
>> 0.0624       0.2078  0       0.5306  0.0792  0.2398  95.2282         7.575   
>> 0.0715
>> 3.4333       44
>> well         2008-12-28_1    95.07   9       16
>>      0       4       39      9.4117  0       0       0.0588  1       0.0588  
>> 0       0       3       25      3.916666667
>> 2.96         94.8177         30.769  17      -20.84  -15.8234        0.8205  
>> 0.3333  0.6666  0
>> 0.6054       0.1287  0.3292  95.3232         6.9117  0.076   2.647   16      
>> 39
>> 9.4117       0       0       0.0588  1       0.0588  0       0       3       
>> 25      3.916666667     2.96
>> 94.8177      30.769  17      -20.84  -15.8234        0.8205  0.3333  0.6666  0
>> 0.6054       0.1287  0.3292  95.3232         6.9117  0.076   2.647   16
>> sick         2008-12-28_1    94.936  6       11
>>      4       1       28      7.725   0.075   3       0.125   5       0.125   
>> 0       0       6       2       4       1.75
>> 94.7815      46.153  40      6.07    12.5    0.5014  0.0621  0.1972  6       
>> 0.523
>> 0.0742       0.2035  95.794  6.0625  0.046   7.25    12      28      7.725   
>> 0.075   3
>> 0.125        5       0.125   0       0       6       2       4       1.75    
>> 94.7815         46.153  40      6.07    12.5
>> 0.5014       0.0621  0.1972  6       0.523   0.0742  0.2035  95.794  6.0625
>> 0.046        7.25    12
>> well         2008-12-28_1    94.803  11      13
>>      0       5       35      7.125   0.0937  3       0.1562  5       0.1562  
>> 5       0.2     18      17
>> 1.555555556  2.794117647     95.0398         38.461  32      10.38   8.4063  
>> 0.5804
>> 0.0871       0.2627  1       0.558   0.0738  0.2324  92.4367         5.289   
>> 0.0722
>> 9.125        16      35      7.125   0.0937  3       0.1562  5       0.1562  
>> 5       0.2     18      17
>> 1.555555556  2.794117647     95.0398         38.461  32      10.38   8.4063  
>> 0.5804
>> 0.0871       0.2627  1       0.558   0.0738  0.2324  92.4367         5.289   
>> 0.0722  9.125   16
>> well         2008-12-28_1    94.67   4       38
>>      5       1       11      8.9642  0.0357  1       0.1428  4       0.1428  
>> 4       0.2105  11      13
>> 3.772727273  4.307692308     94.8451         23.076  28      -5.76   -4      
>> 0.3269  0
>> 0.0833       0       0.5222  0.0616  0.2079  94.9668         8.6696  0.0663  
>> 4.6428
>> 14   11      8.9642  0.0357  1       0.1428  4       0.1428  4       0.2105  
>> 11      13
>> 3.772727273  4.307692308     94.8451         23.076  28      -5.76   -4      
>> 0.3269  0
>> 0.0833       0       0.5222  0.0616  0.2079  94.9668         8.6696  0.0663  
>> 4.6428  14
>> well         2008-12-28_1    94.537  12      39
>>      0       1       35      9.4444  0       0       0       0       0       
>> 0       0       2       7       2.5     2.892857143     94.878
>> 23.076       9       -12.23  -9.6666         0.4428  0       0.0857  0       
>> 0.5411  0.0849  0.25
>> 94.54        8.9166  0.0296  6.1111  67      35      9.4444  0       0       
>> 0       0       0       0       0
>> 2    7       2.5     2.892857143     94.878  23.076  9       -12.23  -9.6666 
>>         0.4428  0
>> 0.0857       0       0.5411  0.0849  0.25    94.54   8.9166  0.0296  6.1111  
>> 67
>>
>>
>>      
> Your initial post mentions 70 columns in your data table, yet the example
> header shows 68 fields counting the initial "label" term.  I would suggest
> adding "row.names = NULL" to force row numbers and see how that behaves, e.g.
>
> rawdata <- read.table("r_work/train_data.csv", header = TRUE, sep = ",",
>                       na.strings = 0, row.names = NULL)
>
> Otherwise, you might want to consult the R Manual where it states:
>
> header        a logical value indicating whether the file contains the names
>               of the variables as its first line. If missing, the value is
>               determined from the file format: header is set to TRUE if and
>               only if the first row contains one fewer field than the number
>               of columns.
>
> So, you might also want to count up your column names in the header line.
>
> JWDougherty
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>    
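To check the header issue raised in the quoted reply, here is a minimal sketch on a hypothetical whitespace-separated snippet (not the real file, which is comma-separated; the snippet and its values are made up). count.fields() reports the number of fields on each line; a header one field shorter than the data rows is what makes read.table() use the first column as row names, while row.names = NULL numbers the rows and keeps that column under the name "row.names".

```r
# Hypothetical snippet: the header line has one fewer field than the data rows
txt <- "c0 c1
sick 95.609 5
well 95.338 1"

# Count the fields on each line (whitespace-separated by default)
con <- textConnection(txt)
fields <- count.fields(con)
close(con)
print(fields)

# Header one field short: the first column becomes the row names
auto <- read.table(text = txt, header = TRUE)
print(auto)

# row.names = NULL: rows are numbered and the first column is kept
# as an ordinary column named "row.names"
numbered <- read.table(text = txt, header = TRUE, row.names = NULL)
print(numbered)
```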
