Re: [R] Histogram omitting/collapsing groups

jim holtman Sat, 31 Dec 2011 22:21:33 -0800

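Fast fingers: there is still a problem in the counts; I was only looking
at the last bin. The first count (8262) is the sum of hours 0 and 1 from
table(x) (4168 + 4094), so two hours are still being combined.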
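
A minimal sketch of what went wrong in my test, on simulated data that
only stands in for crashes$hour (the seed and the names h_int, h_half and
tab are my own, not from the thread). With integer breaks 0:23 there are
23 bins for 24 distinct hours, so one end bin has to hold two hours.
Half-integer breaks, as in Sarah's last example, should give one bin per
hour:

```r
# Simulated hours 0..23 (an assumption standing in for crashes$hour)
set.seed(1)
x <- sample(0:23, 100000, replace = TRUE)

# Integer breaks 0, 1, ..., 23 give 23 bins for 24 distinct values.
# With the default right = TRUE, the first bin is [0, 1] and holds 0 and 1.
h_int <- hist(x, breaks = 0:23, plot = FALSE)

# Half-integer breaks centre one bin on each hour: 24 bins for 24 values.
h_half <- hist(x, breaks = seq(-0.5, 23.5, by = 1), plot = FALSE)

# Per-hour counts, forcing all 24 levels so positions line up with the bins
tab <- as.vector(table(factor(x, levels = 0:23)))

# First integer-break bin equals hours 0 and 1 combined
print(h_int$counts[1] == tab[1] + tab[2])

# Half-integer bins agree with table() hour by hour
print(all(h_half$counts == tab))
```

The same breaks vector should carry over to
hist(crashes$hour, breaks = seq(-0.5, 23.5, by = 1)), assuming the column
holds only the integers 0 to 23.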


Happy New Year -- up too late.
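
On the breaks = 24 attempt further down: per ?hist, a single number for
breaks is only a suggestion, and the break points are set to pretty()
values. A small sketch on made-up data (x here is a hypothetical stand-in,
not the crash data) to inspect what hist actually uses:

```r
# Hypothetical stand-in data: every hour 0..23 appears ten times
x <- rep(0:23, times = 10)

# hist() treats a single number as a hint and asks pretty() for "nice" breaks
suggested <- pretty(range(x), n = 24)
used <- hist(x, breaks = 24, plot = FALSE)$breaks

# Number of bins actually produced, and whether hist() followed pretty()
n_bins <- length(used) - 1
same_as_pretty <- isTRUE(all.equal(used, suggested))
print(n_bins)
print(same_as_pretty)
```

Passing a full vector of break points, rather than a single number, is
the way to get exactly the bins you ask for.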

On Sun, Jan 1, 2012 at 12:33 AM, jim holtman <jholt...@gmail.com> wrote:
> Here is a test I ran, and it looks fine, but then I created the data, so
> the problem might have something to do with your data:
>
>> x <- sample(0:23, 100000, TRUE)
>> a <- hist(x, breaks = 24)
>> a[1:5]
> $breaks
>  [1]  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
>
> $counts
>  [1] 8262 4114 4186 4106 4153 4234 4206 4155 4157 4203 4186 4158 4132 4139 4231 4216 4158 4054 4185 4153
> [21] 4281 4110 4221
>
> $intensities
>  [1] 0.08262 0.04114 0.04186 0.04106 0.04153 0.04234 0.04206 0.04155 0.04157 0.04203 0.04186 0.04158
> [13] 0.04132 0.04139 0.04231 0.04216 0.04158 0.04054 0.04185 0.04153 0.04281 0.04110 0.04221
>
> $density
>  [1] 0.08262 0.04114 0.04186 0.04106 0.04153 0.04234 0.04206 0.04155 0.04157 0.04203 0.04186 0.04158
> [13] 0.04132 0.04139 0.04231 0.04216 0.04158 0.04054 0.04185 0.04153 0.04281 0.04110 0.04221
>
> $mids
>  [1]  0.5  1.5  2.5  3.5  4.5  5.5  6.5  7.5  8.5  9.5 10.5 11.5 12.5 13.5 14.5 15.5 16.5 17.5 18.5 19.5
> [21] 20.5 21.5 22.5
>
>> table(x)
> x
>    0    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16   17   18   19   20
> 4168 4094 4114 4186 4106 4153 4234 4206 4155 4157 4203 4186 4158 4132 4139 4231 4216 4158 4054 4185 4153
>   21   22   23
> 4281 4110 4221
>>
>
>
> On Sat, Dec 31, 2011 at 11:20 AM, Sarah Goslee <sarah.gos...@gmail.com> wrote:
>> Hi,
>>
>> I think you're not understanding quite what's going on with hist. Reread the
>> help, and take a look at this small example. The solution I'd use is the last
>> item.
>>
>>> x <- rep(1:10, times=1:10)
>>> table(x)
>> x
>>  1 2 3 4 5 6 7 8 9 10
>>  1 2 3 4 5 6 7 8 9 10
>>>
>>>
>>> hist(x, plot=FALSE, right=TRUE)$counts
>> [1] 3 3 4 5 6 7 8 9 10
>>> hist(x, plot=FALSE, right=TRUE)$breaks
>>  [1] 1 2 3 4 5 6 7 8 9 10
>>> hist(x, plot=FALSE, right=TRUE)$mids
>> [1] 1.5 2.5 3.5 4.5 5.5 6.5 7.5 8.5 9.5
>>>
>>>
>>> hist(x, plot=FALSE, right=FALSE)$counts
>> [1]  1  2  3  4  5  6  7  8 19
>>> hist(x, plot=FALSE, right=FALSE)$breaks
>>  [1] 1 2 3 4 5 6 7 8 9 10
>>> hist(x, plot=FALSE, right=FALSE)$mids
>> [1] 1.5 2.5 3.5 4.5 5.5 6.5 7.5 8.5 9.5
>>>
>>>
>>> hist(x, plot=FALSE, breaks=seq(.5, 10.5, by=1))$counts
>>  [1] 1 2 3 4 5 6 7 8 9 10
>>> hist(x, plot=FALSE, breaks=seq(.5, 10.5, by=1))$breaks
>>  [1]  0.5  1.5  2.5  3.5  4.5  5.5  6.5  7.5  8.5  9.5 10.5
>>> hist(x, plot=FALSE, breaks=seq(.5, 10.5, by=1))$mids
>>  [1] 1 2 3 4 5 6 7 8 9 10
>>
>>
>> Sarah
>>
>> On Sat, Dec 31, 2011 at 10:25 AM, Aren Cambre <a...@arencambre.com> wrote:
>>> I have two large datasets (156K and 2.06M records). Each row has the
>>> hour that an event happened, represented by an integer from 0 to 23.
>>>
>>> R's histogram is combining some data.
>>>
>>> Here's the command I ran to get the histogram:
>>>> histinfo <- hist(crashes$hour, right=FALSE)
>>>
>>> Here's histinfo:
>>>> histinfo
>>> $breaks
>>>  [1]  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
>>>
>>> $counts
>>>  [1]  4755  4618  5959  3292  2378  2715  4592  6144  6860  5598  5601  6596  7152  7490  8166
>>> [16]  9758 11301 11745  9943  7494  6272  6220 11669
>>>
>>> $intensities
>>>  [1] 0.03041876 0.02954234 0.03812101 0.02105963 0.01521258 0.01736844 0.02937602 0.03930449
>>>  [9] 0.04388490 0.03581161 0.03583081 0.04219604 0.04575289 0.04791515 0.05223967 0.06242403
>>> [17] 0.07229494 0.07513530 0.06360752 0.04794074 0.04012334 0.03979068 0.07464911
>>>
>>> $density
>>>  [1] 0.03041876 0.02954234 0.03812101 0.02105963 0.01521258 0.01736844 0.02937602 0.03930449
>>>  [9] 0.04388490 0.03581161 0.03583081 0.04219604 0.04575289 0.04791515 0.05223967 0.06242403
>>> [17] 0.07229494 0.07513530 0.06360752 0.04794074 0.04012334 0.03979068 0.07464911
>>>
>>> $mids
>>>  [1]  0.5  1.5  2.5  3.5  4.5  5.5  6.5  7.5  8.5  9.5 10.5 11.5 12.5 13.5 14.5 15.5 16.5 17.5
>>> [19] 18.5 19.5 20.5 21.5 22.5
>>>
>>> $xname
>>> [1] "crashes$hour"
>>>
>>> $equidist
>>> [1] TRUE
>>>
>>> attr(,"class")
>>> [1] "histogram"
>>>
>>> Note that the last value in counts is 11669. Compare it with the
>>> output of table(crashes$hour):
>>>     0     1     2     3     4     5     6     7     8     9    10    11    12    13    14
>>>  4755  4618  5959  3292  2378  2715  4592  6144  6860  5598  5601  6596  7152  7490  8166
>>>    15    16    17    18    19    20    21    22    23
>>>  9758 11301 11745  9943  7494  6272  6220  6000  5669
>>>
>>> Notice how the sum of 22 and 23 from table(crashes$hour) is 11669? Is
>>> it correct for the histogram to combine hours 22 and 23? Since I
>>> specified right = FALSE, I figured there's no way 23 would be combined
>>> with 22.
>>>
>>> Adding breaks=24 to the hist call makes no difference; it's still stuck
>>> at 23 bins. I also tried breaks=25 and 23 and several other values, in
>>> case I am misinterpreting the meaning of breaks, but none of them makes
>>> a difference.
>>>
>>> I imagine this is a n00b question, so my apologies if this is obvious.
>>>
>>> Aren
>>>
>>
>> --
>> Sarah Goslee
>> http://www.functionaldiversity.org
>>
>> ______________________________________________
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
>
>
> --
> Jim Holtman
> Data Munger Guru
>
> What is the problem that you are trying to solve?
> Tell me what you want to do, not how you want to do it.



-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

