Hi:

Here's a data.table solution. After I read in your data as a data frame named dd, I used str() to check its contents:

> str(dd)
'data.frame':   19 obs. of  5 variables:
 $ RID     : int  43 95 230 230 235 235 247 247 321 321 ...
 $ SCRNO   : Factor w/ 6 levels "HBA0020036","HBA0020087",..: 1 2 3 3 4 4 5 5 6 6 ...
 $ VISCODE : Factor w/ 1 level "bl": 1 1 1 1 1 1 1 1 1 1 ...
 $ RECNO   : int  1 1 2 1 2 1 1 2 13 5 ...
 $ CONTTIME: int  9 3 3 28 5 6 5 4 2 13 ...
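For contrast, here is a minimal sketch of what can happen when str() reports CONTTIME as a Factor instead of int. The vector below is made up for illustration (it is not your data), and I'm assuming the factor was built from character strings. as.integer() applied to a factor returns the internal level codes, not the numbers printed on screen, so a sum of those codes can be far from the sum you expect:

```r
# Hypothetical values for illustration only; not the poster's data
conttime <- factor(c("9", "3", "3", "28", "5", "6"))

# as.integer() on a factor gives the internal level codes (1, 2, 3, ...),
# which depend on how the levels are sorted, not on the printed values
codes <- as.integer(conttime)

# Going through character first recovers the numbers that are displayed
values <- as.integer(as.character(conttime))

print(codes)
print(values)
print(sum(codes))
print(sum(values))
```

If that is what happened, converting with as.integer(as.character(CONTTIME)) before summing should give the intended totals.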
If you were getting CONTTIME as a factor, I'm guessing you put all of this into a matrix (cbind?) and then passed it to data.table(). If so, you need to spend a little time reading up on the differences between matrices and data frames. A data table is meant to be a generalization of a data frame. It's important that you know the classes of your objects and how to coerce them from one class to another if necessary. That aside,

> library(data.table)
data.table 1.6
  Quick start guide : vignette("datatable-intro")
  Homepage          : http://datatable.r-forge.r-project.org/
  Help              : help("data.table") or ?data.table (includes fast start examples)
> dt <- data.table(dd, key = 'SCRNO')
> dt[, list(csum = sum(CONTTIME)), by = SCRNO]
          SCRNO csum
[1,] HBA0020036    9
[2,] HBA0020087    3
[3,] HBA0020209   31
[4,] HBA0020213   11
[5,] HBA0020222    9
[6,] HBA0020292   70

Using the list() wrapper is useful, especially if you want to output multiple variables or if you want to assign a name to the derived summary variable.

HTH,
Dennis

On Thu, Jun 30, 2011 at 9:20 AM, Edgar Alminar <eaalmi...@ucsd.edu> wrote:
>>> I did this:
>>>
>>> library(data.table)
>>>
>>> dd <- data.table(bl)
>>> dd[,sum(as.integer(CONTTIME)), by = SCRNO]
>>>
>>> (I used as.integer because I got an error message: sum not meaningful for
>>> factors)
>>>
>>> And got this:
>>>
>>>            SCRNO  V1
>>>  [1,] HBA0020036 111
>>>  [2,] HBA0020087  71
>>>  [3,] HBA0020209 140
>>>  [4,] HBA0020213 189
>>>  [5,] HBA0020222 174
>>>  [6,] HBA0020292 747
>>>  [7,] HBA0020310  57
>>>  [8,] HBA0020317 291
>>>  [9,] HBA0020365 417
>>> [10,] HBA0020366 124
>>>
>>> All the sums are way too big. Is there something making it not add up
>>> correctly?
>>>
>>> Original dataset:
>>>
>      RID      SCRNO VISCODE RECNO CONTTIME
> 338   43 HBA0020036      bl     1        9
> 1187  95 HBA0020087      bl     1        3
> 3251 230 HBA0020209      bl     2        3
> 3258 230 HBA0020209      bl     1       28
> 3321 235 HBA0020213      bl     2        5
> 3351 235 HBA0020213      bl     1        6
> 3436 247 HBA0020222      bl     1        5
> 3456 247 HBA0020222      bl     2        4
> 4569 321 HBA0020292      bl    13        2
> 4572 321 HBA0020292      bl     5       13
> 4573 321 HBA0020292      bl     1       25
> 4576 321 HBA0020292      bl     7        5
> 4578 321 HBA0020292      bl     8        2
> 4581 321 HBA0020292      bl     4        4
> 4582 321 HBA0020292      bl     9        5
> 4586 321 HBA0020292      bl    12        2
> 4587 321 HBA0020292      bl     6        2
> 4590 321 HBA0020292      bl    10        3
> 4591 321 HBA0020292      bl    11        7
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.