Input: dataframe with 300+ columns for a regression. It consists of sets of factors whose names share the same structure. For example, aa1, aa2, aa3 could be one set of factors.
After reading in the dataframe, I would like to compute the density (% nonzeros) for certain groups of factors and delete the factors that fall below a density threshold. I would like to use regular expressions to specify the factor names.

    density.factor = c("^aaa","^bbb")
    density.faccol = c()
    for (fac in density.factor) {
      density.faccol = c(density.faccol, grep(fac, names(data.df)))
    }
    data.df = data.df[, -density.faccol]

Is there a way to avoid the for loop? The following seems to work:

    lapply(density.factor, grep, names(data.df))

However, that produces a list of index vectors, one per regular expression, which need to be merged. In the above example there are 2 regular expressions and so two list elements, but in the general case there will be many more.

Questions:
(i) how do I merge the list elements into a single vector of indices?
(ii) is there a better way to achieve the "vectorized" grep?

Thanks.
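For what it is worth, here is a minimal sketch of two ways the loop might be avoided, followed by one possible density filter. The toy data frame, the column names, the 0.5 threshold, and the helper names (idx.list, idx, dens, drop.names, result.df) are all made up for illustration; only density.factor and data.df come from the post. It assumes the columns are numeric, so that "nonzero" is well defined.

```r
# Toy data frame standing in for data.df (column names are hypothetical)
data.df <- data.frame(
  aaa1 = c(0, 0, 0, 1),
  aaa2 = c(1, 2, 0, 3),
  bbb1 = c(0, 0, 0, 0),
  ccc1 = c(5, 0, 0, 0)
)
density.factor <- c("^aaa", "^bbb")

# (i) Flatten the per-pattern index vectors into one vector; unique()
#     guards against a column matched by more than one pattern
idx.list <- unique(unlist(lapply(density.factor, grep, names(data.df))))

# (ii) Or join the patterns into a single alternation and call grep once
idx <- grep(paste(density.factor, collapse = "|"), names(data.df))

# Density = fraction of nonzero entries in each matched column
dens <- colMeans(data.df[, idx, drop = FALSE] != 0)

# Drop only the matched columns below the threshold (0.5 is arbitrary).
# Selecting by name with setdiff() still works when nothing is dropped,
# whereas data.df[, -integer(0)] would return zero columns.
threshold <- 0.5
drop.names <- names(dens)[dens < threshold]
result.df <- data.df[, setdiff(names(data.df), drop.names), drop = FALSE]
```

On the toy data this keeps aaa2 and ccc1: ccc1 is sparse but is not in any of the named groups, so it is never tested against the threshold. The single-alternation form in (ii) relies on each pattern being a valid regular expression on its own; patterns that themselves contain a top-level "|" would need to be wrapped in parentheses first.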