Fast fingers; notice that there is still a problem in the counts: the first count, 8262, is the sum of the table values for 0 and 1 (4168 + 4094). I was only looking at the last few.
Happy New Year -- up too late.

On Sun, Jan 1, 2012 at 12:33 AM, jim holtman <jholt...@gmail.com> wrote:
> Here is a test I ran and looks fine, but then I created the data, so
> it might have something to do with your data:
>
> > x <- sample(0:23, 100000, TRUE)
> > a <- hist(x, breaks = 24)
> > a[1:5]
> $breaks
>  [1]  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
>
> $counts
>  [1] 8262 4114 4186 4106 4153 4234 4206 4155 4157 4203 4186 4158 4132 4139 4231 4216 4158 4054 4185 4153
> [21] 4281 4110 4221
>
> $intensities
>  [1] 0.08262 0.04114 0.04186 0.04106 0.04153 0.04234 0.04206 0.04155 0.04157 0.04203 0.04186 0.04158
> [13] 0.04132 0.04139 0.04231 0.04216 0.04158 0.04054 0.04185 0.04153 0.04281 0.04110 0.04221
>
> $density
>  [1] 0.08262 0.04114 0.04186 0.04106 0.04153 0.04234 0.04206 0.04155 0.04157 0.04203 0.04186 0.04158
> [13] 0.04132 0.04139 0.04231 0.04216 0.04158 0.04054 0.04185 0.04153 0.04281 0.04110 0.04221
>
> $mids
>  [1]  0.5  1.5  2.5  3.5  4.5  5.5  6.5  7.5  8.5  9.5 10.5 11.5 12.5 13.5 14.5 15.5 16.5 17.5 18.5 19.5
> [21] 20.5 21.5 22.5
>
> > table(x)
> x
>    0    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16   17   18   19   20
> 4168 4094 4114 4186 4106 4153 4234 4206 4155 4157 4203 4186 4158 4132 4139 4231 4216 4158 4054 4185 4153
>   21   22   23
> 4281 4110 4221
> >
>
>
> On Sat, Dec 31, 2011 at 11:20 AM, Sarah Goslee <sarah.gos...@gmail.com> wrote:
>> Hi,
>>
>> I think you're not understanding quite what's going on with hist. Reread the
>> help, and take a look at this small example. The solution I'd use is the last
>> item.
>>
>> > x <- rep(1:10, times=1:10)
>> > table(x)
>> x
>>  1  2  3  4  5  6  7  8  9 10
>>  1  2  3  4  5  6  7  8  9 10
>> >
>> >
>> > hist(x, plot=FALSE, right=TRUE)$counts
>> [1]  3  3  4  5  6  7  8  9 10
>> > hist(x, plot=FALSE, right=TRUE)$breaks
>>  [1]  1  2  3  4  5  6  7  8  9 10
>> > hist(x, plot=FALSE, right=TRUE)$mids
>> [1] 1.5 2.5 3.5 4.5 5.5 6.5 7.5 8.5 9.5
>> >
>> >
>> > hist(x, plot=FALSE, right=FALSE)$counts
>> [1]  1  2  3  4  5  6  7  8 19
>> > hist(x, plot=FALSE, right=FALSE)$breaks
>>  [1]  1  2  3  4  5  6  7  8  9 10
>> > hist(x, plot=FALSE, right=FALSE)$mids
>> [1] 1.5 2.5 3.5 4.5 5.5 6.5 7.5 8.5 9.5
>> >
>> >
>> > hist(x, plot=FALSE, breaks=seq(.5, 10.5, by=1))$counts
>>  [1]  1  2  3  4  5  6  7  8  9 10
>> > hist(x, plot=FALSE, breaks=seq(.5, 10.5, by=1))$breaks
>>  [1]  0.5  1.5  2.5  3.5  4.5  5.5  6.5  7.5  8.5  9.5 10.5
>> > hist(x, plot=FALSE, breaks=seq(.5, 10.5, by=1))$mids
>>  [1]  1  2  3  4  5  6  7  8  9 10
>>
>>
>> Sarah
>>
>> On Sat, Dec 31, 2011 at 10:25 AM, Aren Cambre <a...@arencambre.com> wrote:
>>> I have two large datasets (156K and 2.06M records). Each row has the
>>> hour that an event happened, represented by an integer from 0 to 23.
>>>
>>> R's histogram is combining some data.
>>>
>>> Here's the command I ran to get the histogram:
>>> > histinfo <- hist(crashes$hour, right=FALSE)
>>>
>>> Here's histinfo:
>>> > histinfo
>>> $breaks
>>>  [1]  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
>>>
>>> $counts
>>>  [1]  4755  4618  5959  3292  2378  2715  4592  6144  6860  5598  5601  6596  7152  7490  8166
>>> [16]  9758 11301 11745  9943  7494  6272  6220 11669
>>>
>>> $intensities
>>>  [1] 0.03041876 0.02954234 0.03812101 0.02105963 0.01521258 0.01736844 0.02937602 0.03930449
>>>  [9] 0.04388490 0.03581161 0.03583081 0.04219604 0.04575289 0.04791515 0.05223967 0.06242403
>>> [17] 0.07229494 0.07513530 0.06360752 0.04794074 0.04012334 0.03979068 0.07464911
>>>
>>> $density
>>>  [1] 0.03041876 0.02954234 0.03812101 0.02105963 0.01521258 0.01736844 0.02937602 0.03930449
>>>  [9] 0.04388490 0.03581161 0.03583081 0.04219604 0.04575289 0.04791515 0.05223967 0.06242403
>>> [17] 0.07229494 0.07513530 0.06360752 0.04794074 0.04012334 0.03979068 0.07464911
>>>
>>> $mids
>>>  [1]  0.5  1.5  2.5  3.5  4.5  5.5  6.5  7.5  8.5  9.5 10.5 11.5 12.5 13.5 14.5 15.5 16.5 17.5
>>> [19] 18.5 19.5 20.5 21.5 22.5
>>>
>>> $xname
>>> [1] "crashes$hour"
>>>
>>> $equidist
>>> [1] TRUE
>>>
>>> attr(,"class")
>>> [1] "histogram"
>>>
>>> Note how the last value in counts is 11669. It's relevant to the
>>> output of table(crashes$hour):
>>>     0     1     2     3     4     5     6     7     8     9    10    11    12    13    14
>>>  4755  4618  5959  3292  2378  2715  4592  6144  6860  5598  5601  6596  7152  7490  8166
>>>    15    16    17    18    19    20    21    22    23
>>>  9758 11301 11745  9943  7494  6272  6220  6000  5669
>>>
>>> Notice how the sum of 22 and 23 from table(crashes$hour) is 11669? Is
>>> that correct for the histogram to combine hours 22 and 23? Since I
>>> specified right = FALSE, I figured there's no way 23 would be combined
>>> with 22?
>>>
>>> Adding breaks=24 to the hist makes no difference; it's still stuck at
>>> 23 breaks.
>>> I also tried breaks=25 and 23 and several other values, in
>>> case I am misinterpreting breaks's meaning, but none of them make a
>>> difference.
>>>
>>> I imagine this is a n00b question, so my apologies if this is obvious.
>>>
>>> Aren
>>>
>>
>> --
>> Sarah Goslee
>> http://www.functionaldiversity.org
>>
>> ______________________________________________
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
>
>
> --
> Jim Holtman
> Data Munger Guru
>
> What is the problem that you are trying to solve?
> Tell me what you want to do, not how you want to do it.

--
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.
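
The bin behaviour discussed in this thread can be reproduced with a minimal sketch on made-up data. The data (hour h occurring h + 1 times) and the names `hour`, `h_left`, `h_right`, and `h_centered` are illustrative assumptions, not taken from the original posts. The integer breaks 0:23 are passed explicitly so they match the `$breaks` reported in the thread, and the half-integer breaks adapt Sarah's `seq(.5, 10.5, by=1)` suggestion to hours 0 through 23.

```r
# Made-up data for illustration: hour h occurs h + 1 times,
# so table(hour) is simply 1, 2, ..., 24.
hour <- rep(0:23, times = 1:24)

# Integer breaks 0:23 define only 23 bins for 24 distinct values.
# With right = FALSE the bins are [0,1), [1,2), ..., [22,23]; the last bin
# is closed on both sides (include.lowest = TRUE), so it holds 22 and 23.
h_left <- hist(hour, breaks = 0:23, right = FALSE, plot = FALSE)
h_left$counts[23]    # 47, i.e. 23 + 24

# With right = TRUE (the default) the first bin [0,1] holds 0 and 1 instead.
h_right <- hist(hour, breaks = 0:23, right = TRUE, plot = FALSE)
h_right$counts[1]    # 3, i.e. 1 + 2

# Half-integer breaks centre one bin on each hour, giving 24 bins.
h_centered <- hist(hour, breaks = seq(-0.5, 23.5, by = 1), plot = FALSE)
all(h_centered$counts == as.vector(table(hour)))    # TRUE
h_centered$mids                                     # 0, 1, ..., 23
```

Because the hours are discrete, `table(hour)` already gives the per-hour counts; the half-integer breaks simply make `hist` agree with it.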