Hi,

Not sure whether this helps. If you take out the grep('", par.obj, "', obj) part, it works without any warning.

eval(parse(text=paste(
  "dt2 <- dt[",
  "grep('", par.fund, "', fund) & ",
  "grep('", par.func, "', func)",
  ", sum(amount), by=c('code', 'year')]",
  sep="")))

dt[grep('^1.E$',fund) & grep('^1.....$',func),sum(amount),by=c('code','year')]
#   code year     V1
#1: 1001 2011 185482
#2: 1001 2012 189367
#3: 1002 2011 238098
#4: 1002 2012 211499

aggregate(amount~code+year,data=df,sum)
#  code year amount
#1 1001 2011 185482
#2 1002 2011 238098
#3 1001 2012 189367
#4 1002 2012 211499
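One thing worth checking: the V1 totals above are identical to the aggregate() totals over the whole df, which suggests the combined greps are not actually restricting the rows. grep() returns integer positions of the matches, so `&` is combining index vectors rather than per-row conditions, whereas grepl() returns one TRUE/FALSE per row. Below is a minimal sketch of the grepl() approach in base R, assuming the df and parameters from the original message. The set.seed() call and the names `keep` and `res` are additions for illustration only.

```r
# Dummy data from the original post; the seed is an addition for reproducibility.
set.seed(1)
df <- data.frame(code = c(rep(1001, 8), rep(1002, 8)),
                 year = rep(c(rep(2011, 4), rep(2012, 4)), 2),
                 fund = rep(c("10E", "10E", "10E", "27E"), 4),
                 func = rep(c("110000", "122000", "214000", "158000"), 4),
                 obj = rep("100", 16),
                 amount = round(rnorm(16, 50000, 10000)))

# Patterns built the same way as in the original post.
par.fund <- utils::glob2rx(gsub("0", "?", "10E"))     # "^1.E$"
par.func <- utils::glob2rx(gsub("0", "?", "100000"))  # "^1.....$"
par.obj  <- utils::glob2rx(gsub("0", "?", "000"))

# grepl() yields a logical vector of length nrow(df), so the three
# conditions line up row by row and `&` needs no recycling.
keep <- grepl(par.fund, df$fund) &
        grepl(par.func, df$func) &
        grepl(par.obj,  df$obj)

# Sum amount by code and year over the matching rows only.
res <- aggregate(amount ~ code + year, data = df[keep, ], FUN = sum)
print(res)
```

With data.table, the same logical condition should work directly in i, for example dt[grepl(par.fund, fund) & grepl(par.func, func) & grepl(par.obj, obj), sum(amount), by = c('code', 'year')], which would also avoid the eval(parse()) construction.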
In the df you provided, there is only one value of obj.

levels(df$obj)
#[1] "100"

A.K.

----- Original Message -----
From: "Bush, Daniel P. DPI" <daniel.b...@dpi.wi.gov>
To: "'r-help@r-project.org'" <r-help@r-project.org>
Cc:
Sent: Thursday, March 14, 2013 5:43 PM
Subject: [R] Grep with wildcards across multiple columns

I have a fairly large data set with six variables set up like the following dummy:

# Create fake data
df <- data.frame(code = c(rep(1001, 8), rep(1002, 8)),
                 year = rep(c(rep(2011, 4), rep(2012, 4)), 2),
                 fund = rep(c("10E", "10E", "10E", "27E"), 4),
                 func = rep(c("110000", "122000", "214000", "158000"), 4),
                 obj = rep("100", 16),
                 amount = round(rnorm(16, 50000, 10000)))

What I would like to do is sum the amount variable by code and year, filtering rows using a different wildcard search in each of three columns: "1?E" in fund, "1?????" in func, and "???" in obj. I'm OK turning these into regular expressions:

# Set parameters
par.fund <- "10E"; par.func <- "100000"; par.obj <- "000"
par.fund <- glob2rx(gsub("0", "?", par.fund))
par.func <- glob2rx(gsub("0", "?", par.func))
par.obj <- glob2rx(gsub("0", "?", par.obj))

The problem occurs when I try to apply multiple greps across columns.
I'd prefer to use data.table since it's so much faster than plyr and I have 159 different sets of parameters to run through, but I get the same error setting it up either way:

# Doesn't work
library(data.table)
dt <- data.table(df)
eval(parse(text=paste(
  "dt2 <- dt[",
  "grep('", par.fund, "', fund) & ",
  "grep('", par.func, "', func) & grep('", par.obj, "', obj)",
  ", sum(amount), by=c('code', 'year')]",
  sep="")))
# Warning message:
# In grep("^1.E$", fund) & grep("^1.....$", func) :
#   longer object length is not a multiple of shorter object length

# Also doesn't work
library(plyr)
eval(parse(text=paste(
  "df2 <- ddply(df[grep('", par.fund, "', df$fund) & ",
  "grep('", par.func, "', df$func) & grep('", par.obj, "', df$obj), ]",
  ", .(code, year), summarize, amount = sum(amount))",
  sep="")))
# Warning message:
# In grep("^1.E$", df$fund) & grep("^1.....$", df$func) :
#   longer object length is not a multiple of shorter object length

Clearly, the problem is how I'm trying to combine greps in subsetting rows, but I haven't been able to find a solution that works. Any thoughts, preferably something that works with data.table?

DB

Daniel Bush
School Finance Consultant
School Financial Services
Wisconsin Department of Public Instruction
PO Box 7841 | Madison, WI 53707-7841
daniel.bush -at- dpi.wi.gov | sfs.dpi.wi.gov
Ph: 608-267-9212 | Fax: 608-266-2840

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.