Ah yes. Duhhh... Thanks, Rui. So h$density * diff(h$breaks) * 100 will give the percentages. No need for arithmetic beyond that.
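As a quick check of that identity, here is a minimal sketch in R. The `amount` vector and the unequal break points are made up for illustration (they are not the thread's datasetregs data); the bin widths differ on purpose, so the raw densities are clearly not proportions.

```r
# Assumption: toy data and break points invented for this illustration.
amount <- c(15000, 15000, 15000, 16600, 20100, 25100, 35000, 40100, 67100, 101100)

# Unequal-width bins, to show the identity does not rely on equal widths.
h <- hist(amount, breaks = c(10000, 20000, 40000, 110000), plot = FALSE)

# Percentage per bin, computed two ways.
pct_from_density <- h$density * diff(h$breaks) * 100
pct_from_counts  <- h$counts / sum(h$counts) * 100

# Both routes agree, and the percentages add up to 100.
stopifnot(isTRUE(all.equal(pct_from_density, pct_from_counts)))
stopifnot(isTRUE(all.equal(sum(pct_from_density), 100)))

# The raw densities, by contrast, do not sum to 1 here (bin widths are not 1).
stopifnot(sum(h$density) < 1)

# Character labels of the kind plot.histogram() accepts via 'labels'.
labels_pct <- paste0(round(pct_from_density, 1), "%")
print(labels_pct)
```

A vector like labels_pct could then be passed as the labels argument of plot(h, ...), as in the plot.histogram examples quoted in this thread.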
Bert

On Tue, Aug 17, 2021 at 12:03 PM Rui Barradas <ruipbarra...@sapo.pt> wrote:
>
> Hello,
>
> At 19:28 on 17/08/21, Bert Gunter wrote:
> > Inline below.
> >
> > On Tue, Aug 17, 2021 at 4:09 AM Rui Barradas <ruipbarra...@sapo.pt> wrote:
> >>
> >> Hello,
> >>
> >> I had forgotten about plot.histogram, it does make everything simpler.
> >> To have percentages on the bars, in the code below I use package scales.
> >>
> >> Note that it seems to me that you do not want densities, to have
> >> percentages, the proportions of counts are given by any of
> >
> > Under the default of equal width bins -- which is what Sturges gives
>
> Right.
>
> > if I read the docs correctly -- since the densities sum to 1,
>
> The "densities" do not sum to 1. From ?hist, section Value:
>
> density
> values f^(x[i]), as estimated density values. If all(diff(breaks) == 1),
> they are the relative frequencies counts/n and in general satisfy
> sum[i; f^(x[i]) (b[i+1]-b[i])] = 1, where b[i] = breaks[i].
>
> If all(diff(breaks) == 1) is FALSE, the density list member must be
> multiplied by diff(.$breaks)
>
> h <- hist(datasetregs$Amount, plot = FALSE)
> sum(h$density)
> #[1] 1e-04
> diff(h$breaks)
> #[1] 10000 10000 10000 10000 10000 10000 10000 10000 10000 10000
> sum(h$density*diff(h$breaks))
> #[1] 1
>
> Hope this helps,
>
> Rui Barradas
>
> > they are
> > already the proportion of counts in each histogram bin, no?
> >
> > -- Bert
> >
> >>
> >> h$counts/sum(h$counts)
> >> h$density*diff(h$breaks)
> >>
> >> # One histogram for all dates
> >> h <- hist(datasetregs$Amount, plot = FALSE)
> >> plot(h, labels = scales::percent(h$counts/sum(h$counts)),
> >>      ylim = c(0, 1.1*max(h$counts)))
> >>
> >> # Histograms by date
> >> sp <- split(datasetregs, datasetregs$Date)
> >> old_par <- par(mfrow = c(1, 3))
> >> h_list <- lapply(seq_along(sp), function(i){
> >>   hist_title <- paste("Histogram of", names(sp)[i])
> >>   h <- hist(sp[[i]]$Amount, plot = FALSE)
> >>   plot(h, main = hist_title, xlab = "Amount",
> >>        labels = scales::percent(h$counts/sum(h$counts)),
> >>        ylim = c(0, 1.1*max(h$counts)))
> >> })
> >> par(old_par)
> >>
> >> Hope this helps,
> >>
> >> Rui Barradas
> >>
> >> At 01:49 on 17/08/21, Bert Gunter wrote:
> >>> I may well misunderstand, but proffered solutions seem more complicated
> >>> than necessary.
> >>> Note that the return of hist() can be saved as a list of class "histogram"
> >>> and then plotted with plot.histogram(), which already has a "labels"
> >>> argument that seems to be what you want. A simple example is:
> >>>
> >>> dat <- runif(50, 0, 10)
> >>> myhist <- hist(dat, freq = TRUE, breaks ="Sturges")
> >>>
> >>> plot(myhist, col = "darkgray",
> >>>      labels = as.character(round(myhist$density*100,1) ),
> >>>      ylim = c(0, 1.1*max(myhist$counts)))
> >>> ## note that this is plot.histogram because myhist has class "histogram"
> >>>
> >>> Note that I expanded the y axis a bit to be sure to include the labels.
> >>> You can, of course, plot your separate years as Rui has indicated or via
> >>> e.g. ?layout.
> >>>
> >>> Apologies if I have misunderstood. Just ignore this in that case.
> >>> Otherwise, I leave it to you to fill in details.
> >>>
> >>> Bert Gunter
> >>>
> >>> "The trouble with having an open mind is that people keep coming along and
> >>> sticking things into it."
> >>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
> >>>
> >>> On Mon, Aug 16, 2021 at 4:14 PM Paul Bernal <paulberna...@gmail.com>
> >>> wrote:
> >>>
> >>>> Dear Jim,
> >>>>
> >>>> Thank you so much for your kind reply. Yes, this is what I am looking for,
> >>>> however, can't see clearly how the bars correspond to the bins in the
> >>>> x-axis. Maybe there is a way to align the amounts so that they match the
> >>>> columns, sorry if I sound picky, but just want to learn if there is a way
> >>>> to accomplish this.
> >>>>
> >>>> Best regards,
> >>>>
> >>>> Paul
> >>>>
> >>>> On Mon, 16 Aug 2021 at 17:57, Jim Lemon (<drjimle...@gmail.com>) wrote:
> >>>>
> >>>>> Hi Paul,
> >>>>> I just worked out your first request:
> >>>>>
> >>>>> datasetregs<-structure(list(Date = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 2L,
> >>>>> 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
> >>>>> 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
> >>>>> 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
> >>>>> 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
> >>>>> 3L, 3L, 3L), .Label = c("AF 2017", "AF 2020", "AF 2021"), class = "factor"),
> >>>>>     Amount = c(40100, 101100, 35000, 40100, 15000, 45100, 40200,
> >>>>>     15000, 35000, 35100, 20300, 40100, 15000, 67100, 17100, 15000,
> >>>>>     15000, 50100, 35100, 15000, 15000, 15000, 15000, 15000, 15000,
> >>>>>     15000, 15000, 15000, 15000, 15000, 15000, 15000, 15000, 15000,
> >>>>>     15000, 15000, 20100, 15000, 15000, 15000, 15000, 15000, 15000,
> >>>>>     16600, 15000, 15000, 15700, 15000, 15000, 15000, 15000, 15000,
> >>>>>     15000, 15000, 15000, 15000, 20200, 21400, 25100, 15000, 15000,
> >>>>>     15000, 15000, 15000, 15000, 25600, 15000, 15000, 15000, 15000,
> >>>>>     15000, 15000, 15000, 15000)), row.names = c(NA, -74L), class = "data.frame")
> >>>>> histval<-with(datasetregs, hist(Amount, groups=Date, scale="frequency",
> >>>>> breaks="Sturges", col="darkgray"))
> >>>>> library(plotrix)
> >>>>> histpcts<-paste0(round(100*histval$counts/sum(histval$counts),1),"%")
> >>>>> barlabels(histval$mids,histval$counts,histpcts)
> >>>>>
> >>>>> I think that's what you asked for:
> >>>>>
> >>>>> Jim
> >>>>>
> >>>>> On Tue, Aug 17, 2021 at 8:44 AM Paul Bernal <paulberna...@gmail.com>
> >>>>> wrote:
> >>>>>>
> >>>>>> This is way better, now, how could I put the frequency labels in the
> >>>>>> columns as a percentage, instead of presenting them as counts?
> >>>>>>
> >>>>>> Thank you so much.
> >>>>>>
> >>>>>> Paul
> >>>>>>
> >>>>>> On Mon, 16 Aug 2021 at 17:33, Rui Barradas (<ruipbarra...@sapo.pt>) wrote:
> >>>>>>
> >>>>>>> Hello,
> >>>>>>>
> >>>>>>> You forgot to cc the list.
> >>>>>>>
> >>>>>>> Here are two ways, both of them apply hist() and text() to Amount split
> >>>>>>> by Date. The return value of hist is saved because it's a list with
> >>>>>>> members the histogram's bars midpoints and the counts. Those are used to
> >>>>>>> know where to put the text labels.
> >>>>>>> A vector lbls is created to get rid of counts of zero.
> >>>>>>>
> >>>>>>> The main difference between the two ways is the histogram's titles.
> >>>>>>>
> >>>>>>> old_par <- par(mfrow = c(1, 3))
> >>>>>>> h_list <- with(datasetregs, tapply(Amount, Date, function(x){
> >>>>>>>   h <- hist(x)
> >>>>>>>   lbls <- ifelse(h$counts == 0, NA_integer_, h$counts)
> >>>>>>>   text(h$mids, h$counts/2, labels = lbls)
> >>>>>>> }))
> >>>>>>> par(old_par)
> >>>>>>>
> >>>>>>> old_par <- par(mfrow = c(1, 3))
> >>>>>>> sp <- split(datasetregs, datasetregs$Date)
> >>>>>>> h_list <- lapply(seq_along(sp), function(i){
> >>>>>>>   hist_title <- paste("Histogram of", names(sp)[i])
> >>>>>>>   h <- hist(sp[[i]]$Amount, main = hist_title)
> >>>>>>>   lbls <- ifelse(h$counts == 0, NA_integer_, h$counts)
> >>>>>>>   text(h$mids, h$counts/2, labels = lbls)
> >>>>>>> })
> >>>>>>> par(old_par)
> >>>>>>>
> >>>>>>> Hope this helps,
> >>>>>>>
> >>>>>>> Rui Barradas
> >>>>>>>
> >>>>>>> At 23:16 on 16/08/21, Paul Bernal wrote:
> >>>>>>>> Dear Rui,
> >>>>>>>>
> >>>>>>>> The hist() function comes from the graphics package, from what I could
> >>>>>>>> see. The thing is that I want to divide the Amount column into several
> >>>>>>>> bins and then generate three different histograms, one for each AF
> >>>>>>>> period (AF refers to fiscal years). As you can see, the data contains
> >>>>>>>> three fiscal years (2017, 2020 and 2021). I want to see the percentage
> >>>>>>>> of cases that fall into different amount categories, from 15,000 and
> >>>>>>>> below, 16,000 to 17,000, from 18,000 to 19,000, and so on.
> >>>>>>>>
> >>>>>>>> Thanks for your kind help.
> >>>>>>>>
> >>>>>>>> Paul
> >>>>>>>>
> >>>>>>>> On Mon, 16 Aug 2021 at 17:07, Rui Barradas (<ruipbarra...@sapo.pt>) wrote:
> >>>>>>>>
> >>>>>>>> Hello,
> >>>>>>>>
> >>>>>>>> The function Hist comes from what package?
> >>>>>>>>
> >>>>>>>> Are you sure you don't want a bar plot?
> >>>>>>>>
> >>>>>>>> agg <- aggregate(Amount ~ Date, datasetregs, sum)
> >>>>>>>> bp <- barplot(Amount ~ Date, agg)
> >>>>>>>> with(agg, text(bp, Amount/2, labels = Amount))
> >>>>>>>>
> >>>>>>>> Hope this helps,
> >>>>>>>>
> >>>>>>>> Rui Barradas
> >>>>>>>>
> >>>>>>>> At 22:54 on 16/08/21, Paul Bernal wrote:
> >>>>>>>> > Hello everyone,
> >>>>>>>> >
> >>>>>>>> > I am currently working with R version 4.1.0 and I am trying to include
> >>>>>>>> > (inside the columns of the histogram), the percentage distribution and I
> >>>>>>>> > want to generate three histograms, one for each fiscal year (in the Date
> >>>>>>>> > column, there are three fiscal years: AF 2017, AF 2020 and AF 2021).
> >>>>>>>> > However, I can't seem to accomplish this.
> >>>>>>>> >
> >>>>>>>> > Here is my data:
> >>>>>>>> >
> >>>>>>>> > structure(list(Date = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 2L,
> >>>>>>>> > 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
> >>>>>>>> > 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
> >>>>>>>> > 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
> >>>>>>>> > 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
> >>>>>>>> > 3L, 3L, 3L), .Label = c("AF 2017", "AF 2020", "AF 2021"), class = "factor"),
> >>>>>>>> >     Amount = c(40100, 101100, 35000, 40100, 15000, 45100, 40200,
> >>>>>>>> >     15000, 35000, 35100, 20300, 40100, 15000, 67100, 17100, 15000,
> >>>>>>>> >     15000, 50100, 35100, 15000, 15000, 15000, 15000, 15000, 15000,
> >>>>>>>> >     15000, 15000, 15000, 15000, 15000, 15000, 15000, 15000, 15000,
> >>>>>>>> >     15000, 15000, 20100, 15000, 15000, 15000, 15000, 15000, 15000,
> >>>>>>>> >     16600, 15000, 15000, 15700, 15000, 15000, 15000, 15000, 15000,
> >>>>>>>> >     15000, 15000, 15000, 15000, 20200, 21400, 25100, 15000,
> >>>>>>>> >     15000, 15000, 15000, 15000, 15000, 25600, 15000, 15000, 15000, 15000,
> >>>>>>>> >     15000, 15000, 15000, 15000)), row.names = c(NA, -74L), class = "data.frame")
> >>>>>>>> >
> >>>>>>>> > I would like to modify the following script:
> >>>>>>>> >
> >>>>>>>> > with(datasetregs, Hist(Amount, groups=Date, scale="frequency",
> >>>>>>>> >      breaks="Sturges", col="darkgray"))
> >>>>>>>> >
> >>>>>>>> > #The only thing missing here are the percentages corresponding to each bin
> >>>>>>>> > (I would like to see the percentages inside each column, or on top outside
> >>>>>>>> > if possible)
> >>>>>>>> >
> >>>>>>>> > Any help will be greatly appreciated.
> >>>>>>>> >
> >>>>>>>> > Best regards,
> >>>>>>>> >
> >>>>>>>> > Paul.
> >>>>>>>> >
> >>>>>>>> > [[alternative HTML version deleted]]
> >>>>>>>> >
> >>>>>>>> > ______________________________________________
> >>>>>>>> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >>>>>>>> > https://stat.ethz.ch/mailman/listinfo/r-help
> >>>>>>>> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> >>>>>>>> > and provide commented, minimal, self-contained, reproducible code.