On Jun 16, 2015, at 11:18 AM, Clint Bowman wrote:

> Thanks, Dimitri. Bert is the real wizard here--I'll bet he can conjure up an
> elegant solution.

This would be a base method:

> by( md[-4]==5, md[4], colSums)
device: 1
a b c
1 2 0
-----------------------------------------------------
device: 2
a b c
1 1 0
-----------------------------------------------------
device: 3
a b c
1 0 2

You could adapt that to use myvars:

> by(md[myvars]==5, md[!names(md) %in% myvars],colSums)
device: 1
a b c
1 2 0
-----------------------------------------------------
device: 2
a b c
1 1 0
-----------------------------------------------------
device: 3
a b c
1 0 2

And if you want them smushed into a matrix then use rbind:

> do.call( rbind, by(md[myvars]==5, md[!names(md) %in% myvars],colSums))
  a b c
1 1 2 0
2 1 1 0
3 1 0 2

>
> For me, just reaching a desired endpoint is enough<g>.
>
> Clint
>
> Clint Bowman INTERNET: cl...@ecy.wa.gov
> Air Quality Modeler INTERNET: cl...@math.utah.edu
> Department of Ecology VOICE: (360) 407-6815
> PO Box 47600 FAX: (360) 407-7534
> Olympia, WA 98504-7600
>
> USPS: PO Box 47600, Olympia, WA 98504-7600
> Parcels: 300 Desmond Drive, Lacey, WA 98503-1274
>
> On Tue, 16 Jun 2015, Dimitri Liakhovitski wrote:
>
>> Thank you, Clint.
>> That's the thing: it's relatively easy to do it in base, but the
>> resulting code is not THAT simple.
>> I thought dplyr would make it easy...
>>
>> On Tue, Jun 16, 2015 at 2:06 PM, Clint Bowman <cl...@ecy.wa.gov> wrote:
>>> May want to add headers but the following provides the device number with
>>> each set of sums:
>>>
>>> for (dev in (unique(md$device)))
>>> {cat(colSums(subset(md,md$device==dev)==5,na.rm=T),dev,"\n")}
>>>
>>> On Tue, 16 Jun 2015, Dimitri Liakhovitski wrote:
>>>
>>>> Except, of course, Bert, that you forgot that it had to be done by
>>>> device. Your solution ignores the device.
>>>>
>>>> md <- data.frame(a = c(3,5,4,5,3,5), b = c(5,5,5,4,4,1),
>>>> c = c(1,3,4,3,5,5),
>>>> device = c(1,1,2,2,3,3))
>>>> myvars = c("a", "b", "c")
>>>> md[2,3] <- NA
>>>> md[4,1] <- NA
>>>> md
>>>> vapply(md[myvars], function(x) sum(x==5,na.rm=TRUE),1L)
>>>>
>>>> But the result should be by device.
>>>>
>>>> On Tue, Jun 16, 2015 at 1:56 PM, Dimitri Liakhovitski
>>>> <dimitri.liakhovit...@gmail.com> wrote:
>>>>>
>>>>> Thank you, Bert.
>>>>> I'll be honest - I am just learning dplyr and was wondering if one
>>>>> could do it in dplyr.
>>>>> But of course your solution is perfect...
>>>>>
>>>>> On Tue, Jun 16, 2015 at 1:50 PM, Bert Gunter <bgunter.4...@gmail.com>
>>>>> wrote:
>>>>>>
>>>>>> Well, dplyr seems a bit of overkill as it's so simple with plain old
>>>>>> vapply() in base R :
>>>>>>
>>>>>> > dat <- data.frame (a=sample(1:5,10,rep=TRUE),
>>>>>> + b=sample(3:7,10,rep=TRUE),
>>>>>> + g = sample(7:9,10,rep=TRUE))
>>>>>>
>>>>>> > vapply(dat,function(x)sum(x==5,na.rm=TRUE),1L)
>>>>>>
>>>>>> a b g
>>>>>> 5 4 0
>>>>>>
>>>>>> Cheers,
>>>>>> Bert
>>>>>>
>>>>>> Bert Gunter
>>>>>>
>>>>>> "Data is not information. Information is not knowledge.
>>>>>> And knowledge is
>>>>>> certainly not wisdom."
>>>>>> -- Clifford Stoll
>>>>>>
>>>>>> On Tue, Jun 16, 2015 at 10:24 AM, Dimitri Liakhovitski
>>>>>> <dimitri.liakhovit...@gmail.com> wrote:
>>>>>>>
>>>>>>> Hello!
>>>>>>>
>>>>>>> I have a data frame:
>>>>>>>
>>>>>>> md <- data.frame(a = c(3,5,4,5,3,5), b = c(5,5,5,4,4,1),
>>>>>>> c = c(1,3,4,3,5,5),
>>>>>>> device = c(1,1,2,2,3,3))
>>>>>>> myvars = c("a", "b", "c")
>>>>>>> md[2,3] <- NA
>>>>>>> md[4,1] <- NA
>>>>>>> md
>>>>>>>
>>>>>>> I want to count the number of 5s in each column - by device. I can
>>>>>>> do it like this:
>>>>>>>
>>>>>>> library(dplyr)
>>>>>>> group_by(md, device) %>%
>>>>>>> summarise(counts.a = sum(a==5, na.rm = T),
>>>>>>> counts.b = sum(b==5, na.rm = T),
>>>>>>> counts.c = sum(c==5, na.rm = T))
>>>>>>>
>>>>>>> However, in real life I'll have tons of variables (the length of
>>>>>>> 'myvars' can be very large) - so I can't specify those counts.a,
>>>>>>> counts.b, etc. manually - dozens of times.
>>>>>>>
>>>>>>> Does dplyr allow running the count of 5s on all 'myvars' columns at
>>>>>>> once?
>>>>>>>
>>>>>>> --
>>>>>>> Dimitri Liakhovitski
>>>>>>>
>>>>>>> ______________________________________________
>>>>>>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>>> PLEASE do read the posting guide
>>>>>>> http://www.R-project.org/posting-guide.html
>>>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>>
>>>>> --
>>>>> Dimitri Liakhovitski
>>>>
>>>> --
>>>> Dimitri Liakhovitski
>>
>> --
>> Dimitri Liakhovitski

David Winsemius
Alameda, CA, USA
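
For anyone rerunning the thread's code: the by() output quoted above appears to come from md before the two NA assignments, since colSums() returns NA for any column containing an NA unless na.rm = TRUE is passed. The sketch below is one way to get per-device counts with the NAs in place. The first two approaches are base R. The third assumes dplyr 1.0.0 or later, which postdates this thread, so across() and all_of() are an assumption about the installed version and not something anyone in the discussion used. The names counts, counts_df and counts_dplyr are just illustrative.

```r
# Data frame from the original question, including the two NA assignments.
md <- data.frame(a = c(3, 5, 4, 5, 3, 5),
                 b = c(5, 5, 5, 4, 4, 1),
                 c = c(1, 3, 4, 3, 5, 5),
                 device = c(1, 1, 2, 2, 3, 3))
myvars <- c("a", "b", "c")
md[2, 3] <- NA
md[4, 1] <- NA

# Base R: by() forwards extra arguments to FUN, so na.rm = TRUE reaches
# colSums(); rbind then stacks the per-device vectors into a matrix.
counts <- do.call(rbind,
                  by(md[myvars] == 5, md["device"], colSums, na.rm = TRUE))
print(counts)

# Base R alternative: aggregate() returns a data frame with a device column.
counts_df <- aggregate(md[myvars] == 5, by = md["device"],
                       FUN = sum, na.rm = TRUE)
print(counts_df)

# dplyr variant (assumption: dplyr >= 1.0.0 is installed; skipped otherwise).
if (requireNamespace("dplyr", quietly = TRUE) &&
    utils::packageVersion("dplyr") >= "1.0.0") {
  counts_dplyr <- dplyr::summarise(
    dplyr::group_by(md, device),
    dplyr::across(dplyr::all_of(myvars), ~ sum(.x == 5, na.rm = TRUE))
  )
  print(counts_dplyr)
}
```

With the NA assignments applied, device 2 has no 5 left in column a (md[4,1] was the only one), so its count there is 0 and not the 1 shown in the output above. The aggregate() form may be the closest base equivalent to what the dplyr summarise() call returns, since both keep device as an ordinary column.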