Hello,

I would like to reinforce my ANOVA results using PCA, i.e. find which factors
are most important because they explain most of the variance (i.e. signal) of
my 2^k*r experiment. However, I get the following error when trying to run PCA:

> throughput.prcomp <- 
> prcomp(~No_databases+Partitioning+No_middlewares+Queue_size,data=throughput)
Error in prcomp.formula(~No_databases + Partitioning + No_middlewares +  : 
  PCA applies only to numerical variables

What is the most concise, R-like way to map those factor values to numerical
values suitable for PCA? My first attempt would be:

# C++-style pseudocode (not valid R)
throughput$No_databases_num <- (throughput$No_databases == 1) ? -1 : 1 
throughput$Partitioning_num <- (throughput$Partitioning == "sharding") ? -1 : 1 
etc.
How can I do this the R way?
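
To make the intent concrete, here is a minimal sketch of the mapping I am
after, assuming the vectorised ifelse() is an acceptable stand-in for the C++
ternary operator. The small data frame and the "replication" level are
hypothetical stand-ins for my real data:

```r
# Hypothetical stand-in for the real `throughput` data frame
throughput <- data.frame(
  No_databases = factor(c(1, 2, 1, 2)),
  Partitioning = factor(c("sharding", "replication", "replication", "sharding"))
)

# ifelse() is vectorised: -1 where the condition holds, +1 everywhere else
throughput$No_databases_num <- ifelse(throughput$No_databases == 1, -1, 1)
throughput$Partitioning_num <- ifelse(throughput$Partitioning == "sharding", -1, 1)
```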

Would coding the two levels as -1 and 1 be sensible for PCA, or does the choice
not matter? What about a factor with 3 levels: -1, 0 and 1?
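
For the three-level case, the two codings I can think of are sketched below.
The factor and its level names are made up for illustration, and I do not know
which of the two (if either) is appropriate for PCA:

```r
# Hypothetical three-level factor; the level names are invented
queue <- factor(c("small", "medium", "large", "medium"),
                levels = c("small", "medium", "large"))

# as.integer() returns the level codes 1, 2, 3; subtracting 2 gives -1, 0, 1
# (this treats the levels as ordered and equally spaced)
queue_num <- as.integer(queue) - 2

# Alternative: one 0/1 indicator column per level, with no implied ordering
queue_dummies <- model.matrix(~ queue - 1)
```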

Many thanks in advance,
Best regards,
Giovanni
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
