Input: a dataframe with 300+ columns for a regression. It consists of sets of 
factors whose names share the same structure. For example, aa1, aa2, aa3 could 
be one set of factors.

After reading in the dataframe, I would like to compute the density 
(the percentage of nonzero values) for certain groups of factors and delete 
the factors which fall below a density threshold. I would like to use regular 
expressions to specify the factor names.
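
For concreteness, here is a minimal sketch of the density step for a single 
group. The toy data, the 0.5 cutoff and the names density.threshold, 
group.cols, group.density and sparse.cols are all assumptions made up for 
illustration; it also assumes the columns contain no NA values.

```r
# Hypothetical toy data standing in for data.df (values are made up)
data.df <- data.frame(
  aaa1 = c(0, 0, 0, 1),
  aaa2 = c(1, 2, 0, 3),
  bbb1 = c(0, 0, 0, 0),
  ccc1 = c(0, 0, 0, 5)
)
density.threshold <- 0.5  # assumed cutoff: keep factors with >= 50% nonzeros

# Columns whose names match the group of interest
group.cols <- grep("^aaa", names(data.df))

# Density = fraction of nonzero entries in each matched column
group.density <- colMeans(data.df[, group.cols, drop = FALSE] != 0)

# Names of matched columns that fall below the threshold
sparse.cols <- names(group.density)[group.density < density.threshold]

# Drop by name, which stays safe even when sparse.cols is empty
data.df <- data.df[, !(names(data.df) %in% sparse.cols), drop = FALSE]
```

Only columns matched by the pattern are ever candidates for deletion here, so 
bbb1 survives even though it is all zeros.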

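# Regular expressions naming the groups of factors to check
density.factor = c("^aaa","^bbb")
# Collect the column indices matched by each pattern
density.faccol=c()
for(fac in density.factor){
    density.faccol=c(density.faccol,grep(fac,names(data.df)))
}
# Drop the matched columns; skip when nothing matched, because an empty
# negative index would select no columns at all
if(length(density.faccol)>0) data.df=data.df[,-density.faccol,drop=FALSE]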

Is there a way to avoid the for loop? The following seems to work:
  lapply(density.factor,grep,names(data.df))
However, that produces a list of index vectors, one per regular expression, 
which need to be merged. The example above has 2 regular expressions and 
therefore two vectors, but in the general case there will be many more.

Questions: (i) how do I merge the results into a single vector of column 
indices? (ii) is there a better way to achieve the "vectorized" grep?
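
For (ii), one candidate is to join the patterns into a single regular 
expression with "|" and call grepl once. This is only a sketch on made-up 
column names (col.names, combined and keep are hypothetical), and it assumes 
that no individual pattern relies on its own top-level alternation; such 
patterns would need to be wrapped in parentheses first.

```r
# Hypothetical column names standing in for names(data.df)
col.names <- c("aaa1", "aaa2", "bbb1", "ccc1", "bbb2")
density.factor <- c("^aaa", "^bbb")

# Join the patterns with "|" so that one call covers all of them
combined <- paste(density.factor, collapse = "|")

# Logical mask of the columns that match none of the patterns
keep <- !grepl(combined, col.names)

# With a real dataframe the mask would be applied as, for example:
# data.df <- data.df[, keep, drop = FALSE]
```

A logical mask also sidesteps the empty-index case: if nothing matches, keep 
is all TRUE and every column is retained.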

Thanks.

