Hi,
While learning how to use XGBoost in R, I came across the case below and would 
like to know how to handle it.

Outcome variable: continuous
Independent features: mix of categorical and continuous
nrow(train_set): 8523

Since XGBoost natively supports only numeric features, I applied one-hot 
encoding to the training data set:

target <- train_set$Outlet_sales
sparsed_train_set <- sparse.model.matrix(~.-1, data=train_set)

nrow(sparsed_train_set)  # 4526; as expected, the row count is reduced.

Note: The target variable is continuous and has as many rows as train_set, 
i.e. 8523, the row count before one-hot encoding is applied.
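
To illustrate the row-count reduction, here is a minimal sketch on a made-up 
toy data frame (the names toy, size, and weight are hypothetical and not part 
of my real data). It assumes the Matrix package is installed and that R's 
default na.action (na.omit) is in effect, in which case rows containing NA 
are dropped while the model matrix is built:

```r
library(Matrix)

# Hypothetical toy data: rows 2 and 3 each contain a missing value.
toy <- data.frame(
  size   = factor(c("small", NA, "large", "small")),
  weight = c(1.5, 2.0, NA, 3.2)
)

# One-hot encode; with the default na.action (na.omit), incomplete rows
# are removed before the matrix is built.
sparse_toy <- sparse.model.matrix(~ . - 1, data = toy)

nrow(toy)         # 4
nrow(sparse_toy)  # 2
```

A label vector taken from the original data frame would still have 4 entries 
here, which is the same kind of mismatch I see with my real data.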

# To build the model:
bst <- xgboost(data = sparsed_train_set, label = target, max.depth = 4,
               eta = 1, nthread = 4, nround = 50, objective = "reg:linear")

# The call above fails because the label has 8523 values while the sparse
# matrix has only 4526 rows.

My questions:
- How should I handle the above disparity between the sparse training data and 
the label while building the model?
- How should I use XGBoost to perform regression where the outcome is 
continuous? Most of the web resources cover only classification cases.
  If anyone could point me to a source explaining this, I would appreciate it. 
I have gone through the documentation, but it did not make this case clear.

Regards,
Sandeep S. Rana

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.