I have a fairly large data set with six variables, set up like the following 
dummy data:

# Create fake data
df <- data.frame(code   = c(rep(1001, 8), rep(1002, 8)),
                 year   = rep(c(rep(2011, 4), rep(2012, 4)), 2),
                 fund   = rep(c("10E", "10E", "10E", "27E"), 4),
                 func   = rep(c("110000", "122000", "214000", "158000"), 4),
                 obj    = rep("100", 16),
                 amount = round(rnorm(16, 50000, 10000)))

What I would like to do is sum the amount variable by code and year, keeping 
only rows that match a wildcard pattern in each of three columns: "1?E" in 
fund, "1?????" in func, and "???" in obj. I can turn these into regular 
expressions without trouble:

# Set parameters
par.fund <- "10E"; par.func <- "100000"; par.obj <- "000"
par.fund <- glob2rx(gsub("0", "?", par.fund))
par.func <- glob2rx(gsub("0", "?", par.func))
par.obj <- glob2rx(gsub("0", "?", par.obj))
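
For reference, here is a small self-contained check of what that conversion 
produces for the three parameters above. It uses only base R (glob2rx is in 
the utils package); the pat.* names are made up for illustration and are not 
used elsewhere in this message.

```r
# Zeros in each parameter stand for "any single character"
pat.fund <- glob2rx(gsub("0", "?", "10E"))     # glob "1?E"
pat.func <- glob2rx(gsub("0", "?", "100000"))  # glob "1?????"
pat.obj  <- glob2rx(gsub("0", "?", "000"))     # glob "???"

# Show the anchored regular expressions that grep() will receive
print(c(fund = pat.fund, func = pat.func, obj = pat.obj))
```

The fund and func patterns are the ones that appear in the warning messages 
further down.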

The problem occurs when I try to apply multiple greps across columns. I'd 
prefer to use data.table since it's so much faster than plyr and I have 159 
different sets of parameters to run through, but I get the same error setting 
it up either way:

# Doesn't work
library(data.table)
dt <- data.table(df)
eval(parse(text=paste(
  "dt2 <- dt[", "grep('", par.fund, "', fund) & ",
  "grep('", par.func, "', func) & grep('", par.obj, "', obj)",
  ", sum(amount), by=c('code', 'year')]" , sep="")))
# Warning message:
#   In grep("^1.E$", fund) & grep("^1.....$", func) :
#   longer object length is not a multiple of shorter object length

# Also doesn't work
library(plyr)
eval(parse(text=paste(
  "df2 <- ddply(df[grep('", par.fund, "', df$fund) & ",
  "grep('", par.func, "', df$func) & grep('", par.obj, "', df$obj), ]",
  ", .(code, year), summarize, amount = sum(amount))" , sep="")))
# Warning message:
#   In grep("^1.E$", df$fund) & grep("^1.....$", df$func) :
#   longer object length is not a multiple of shorter object length

Clearly, the problem is how I'm trying to combine greps when subsetting rows, 
but I haven't been able to find a solution that works. Any thoughts, 
preferably something that works with data.table?
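
A possible explanation of the warning, offered as a sketch rather than a 
tested fix: grep() returns the integer positions of the matching elements, so 
the three calls return vectors whose lengths need not agree, and & recycles 
the shorter one. grepl() instead returns one TRUE/FALSE per row, and those 
vectors can be combined with &. The example below uses base R only, with fixed 
amounts in place of rnorm() so the result is reproducible; n.idx, keep, and 
res are names made up for illustration.

```r
# Same layout as the dummy data, with fixed amounts so results are reproducible
df <- data.frame(code   = c(rep(1001, 8), rep(1002, 8)),
                 year   = rep(c(rep(2011, 4), rep(2012, 4)), 2),
                 fund   = rep(c("10E", "10E", "10E", "27E"), 4),
                 func   = rep(c("110000", "122000", "214000", "158000"), 4),
                 obj    = rep("100", 16),
                 amount = rep(c(100, 200, 300, 400), 4))

# grep() gives positions of matches, so the lengths depend on the match counts
n.idx <- c(fund = length(grep("^1.E$", df$fund)),
           func = length(grep("^1.....$", df$func)),
           obj  = length(grep("^...$", df$obj)))
print(n.idx)

# grepl() gives one logical per row, so the vectors line up under &
keep <- grepl("^1.E$", df$fund) &
        grepl("^1.....$", df$func) &
        grepl("^...$", df$obj)

# Sum amount by code and year over the rows that pass all three filters
res <- aggregate(amount ~ code + year, data = df[keep, ], FUN = sum)
print(res)
```

If this reading is right, the same condition should also work directly as the 
row filter in data.table, along the lines of 
dt[grepl(par.fund, fund) & grepl(par.func, func) & grepl(par.obj, obj), 
sum(amount), by = c("code", "year")], without the eval(parse()) step. That 
form is an assumption and has not been run against the full data.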

DB

Daniel Bush
School Finance Consultant
School Financial Services
Wisconsin Department of Public Instruction
PO Box 7841 | Madison, WI 53707-7841
daniel.bush -at- dpi.wi.gov | sfs.dpi.wi.gov
Ph: 608-267-9212 | Fax: 608-266-2840





______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
