Good day,

Sometimes, sparse.model.matrix outputs a dgCMatrix whose column names contain
factor levels that were not in the original dataset. The first factor appears
to be transformed correctly, but the factors after it are not. For example:

> library(Matrix)
> diamonds <- as.data.frame(ggplot2::diamonds)
> colnames(sparse.model.matrix(~ . -1, diamonds))
 [1] "carat"        "cutFair"      "cutGood"      "cutVery Good" "cutPremium"   "cutIdeal"     "color.L"      "color.Q"      "color.C"      "color^4"      "color^5"
[12] "color^6"      "clarity.L"    "clarity.Q"    "clarity.C"    "clarity^4"    "clarity^5"    "clarity^6"    "clarity^7"    "depth"        "table"        "price"
[23] "x"            "y"            "z"
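For reference, the suffixes in this listing (.L, .Q, .C, ^4, ...) match the column names that stats::contr.poly gives its orthogonal polynomial contrasts, and contr.poly is the function R's default `contrasts` option assigns to ordered factors; cut, color and clarity are all ordered factors in ggplot2::diamonds. The snippet below uses only base R and the default options of a fresh session; it is a sketch for comparison only and says nothing about whether the values in those columns are right.

```r
# Default contrast functions, as a named vector:
# unordered factors -> "contr.treatment", ordered factors -> "contr.poly".
getOption("contrasts")

# Polynomial contrasts for a 7-level ordered factor (like color, D..J).
# Columns are named .L (linear), .Q (quadratic), .C (cubic), then ^4, ^5, ^6.
colnames(contr.poly(7))

# An 8-level ordered factor (like clarity, I1..IF) gets one more column, ^7.
colnames(contr.poly(8))
```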

For color and clarity, the factor levels have not been appended as suffixes to
the variable names in the transformed matrix, and the values in those columns
are also wrong. Converting the ordered factor (Ord.factor) columns into plain
factors fixes the problem:

> diamonds[, "cut"] <- factor(as.character(diamonds[, "cut"]))
> diamonds[, "color"] <- factor(as.character(diamonds[, "color"]))
> diamonds[, "clarity"] <- factor(as.character(diamonds[, "clarity"]))

> colnames(sparse.model.matrix(~ . -1, diamonds)) # No more invented factor levels.
 [1] "carat"        "cutFair"      "cutGood"      "cutIdeal"     "cutPremium"   "cutVery Good" "colorE"       "colorF"       "colorG"       "colorH"
[11] "colorI"       "colorJ"       "clarityIF"    "claritySI1"   "claritySI2"   "clarityVS1"   "clarityVS2"   "clarityVVS1"  "clarityVVS2"  "depth"
[21] "table"        "price"        "x"            "y"            "z"
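The per-column conversions above can also be written as one pass over the data frame. This is a minimal sketch in base R; `unorder_factors` is a hypothetical helper name (not a Matrix function) and the small data frame stands in for the diamonds data. It uses `factor(x, levels = levels(x), ordered = FALSE)`, which keeps the original level order, whereas `factor(as.character(x))` sorts the levels alphabetically, which is why cutIdeal precedes cutPremium in the listing above.

```r
# Hypothetical helper: turn every ordered-factor column of a data frame into a
# plain (unordered) factor, keeping all levels in their original order.
unorder_factors <- function(df) {
  df[] <- lapply(df, function(col) {
    if (is.ordered(col)) factor(col, levels = levels(col), ordered = FALSE) else col
  })
  df
}

# Small stand-in data (an assumption; any data frame works the same way)
dat <- data.frame(
  x = c(1.5, 2.0, 0.7, 3.1),
  grade = factor(c("low", "high", "mid", "low"),
                 levels = c("low", "mid", "high"), ordered = TRUE)
)

dat2 <- unorder_factors(dat)
is.ordered(dat2$grade)                 # no longer ordered
levels(dat2$grade)                     # "low" "mid" "high": original order kept
levels(factor(as.character(dat$grade)))  # "high" "low" "mid": sorted alphabetically
```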

Can it be made to work correctly for both plain and ordered factors?
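One way to make "work correctly" concrete is to compare sparse.model.matrix against the dense stats::model.matrix, which applies the same default contrasts. This is a minimal sketch, assuming model.matrix is the reference behaviour; the helper `matches_model_matrix` and the small data frame are hypothetical stand-ins for the diamonds data. It prints TRUE or FALSE for each case and makes no assumption about which Matrix version is installed.

```r
library(Matrix)

# Hypothetical helper: TRUE when sparse.model.matrix() gives the same column
# names and the same values as the dense stats::model.matrix() reference.
matches_model_matrix <- function(formula, data) {
  dense  <- model.matrix(formula, data)
  sparse <- sparse.model.matrix(formula, data)
  identical(colnames(dense), colnames(sparse)) &&
    isTRUE(all.equal(as.matrix(sparse), dense, check.attributes = FALSE))
}

# Stand-in data: one numeric column and two ordered factors, so that with
# "- 1" the second factor is contrast-coded (contr.poly by default).
dat <- data.frame(
  num   = c(0.5, 1.2, 2.3, 0.9, 1.7, 3.0),
  size  = factor(c("S", "M", "L", "M", "S", "L"),
                 levels = c("S", "M", "L"), ordered = TRUE),
  grade = factor(c("low", "mid", "high", "high", "mid", "low"),
                 levels = c("low", "mid", "high"), ordered = TRUE)
)

# Ordered factors as supplied
matches_model_matrix(~ . - 1, dat)

# Same data with the ordered factors converted to plain factors
dat_plain <- dat
dat_plain[c("size", "grade")] <- lapply(dat_plain[c("size", "grade")], factor, ordered = FALSE)
matches_model_matrix(~ . - 1, dat_plain)
```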

> sessionInfo()
R Under development (unstable) (2018-02-06 r74231)
Platform: i386-w64-mingw32/i386 (32-bit)

other attached packages:
[1] Matrix_1.2-12

loaded via a namespace (and not attached):
 [1] colorspace_1.3-2 scales_0.5.0     compiler_3.5.0   lazyeval_0.2.1  
 [5] plyr_1.8.4       pillar_1.1.0     gtable_0.2.0     tibble_1.4.2    
 [9] Rcpp_0.12.15     ggplot2_2.2.1    grid_3.5.0       rlang_0.1.6     
[13] munsell_0.4.3    lattice_0.20-35

--------------------------------------
Dario Strbenac
University of Sydney
Camperdown NSW 2050
Australia

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
