Thanks a lot! Could you please elaborate on this one?

"What I'd really do, if you had lots of data, would be to bin x into
small contiguous bins and to calculate quantiles for each of those
bins and to plot smoothers across the quantiles (using bin medians as
the x axis)"

On Fri, Mar 9, 2012 at 9:21 PM, R. Michael Weylandt
<michael.weyla...@gmail.com> wrote:

> Could you just add a log scale to the y dimension?
>
> DAT <- data.frame(x = runif(1000, 0, 20), y = rnorm(1000))
>
> plot(y ~ x, data = DAT, log = "y")
>
> That lessens large dispersion (in some circumstances) but I'm not
> really sure what that has to do with smoothing....do you mean
> "smoothing" in the technical sense (loess, splines, and friends) or in
> some graphical sense?
>
> Still not sure what this has to do with quantile plots: they are
> usually diagnostic tools for examining distributional shape/fit.
>
> Here's two (related) ideas:
>
> i) If you have categorical x data, boxplots:
> http://had.co.nz/ggplot2/geom_boxplot.html
>
> ii) If you have continuous x data, quantile "envelopes":
> http://had.co.nz/ggplot2/stat_quantile.html
>
> # In ggplot2
>
> DAT <- data.frame(x = runif(1000, 0, 20), y = rnorm(1000))
> DAT$xbin <- with(DAT, cut(x, seq(0, 20, 2)))
>
> p <- ggplot(DAT, aes(x = x, y = y)) + geom_point(alpha = 0.2) +
>   stat_quantile(aes(colour = ..quantile..),
>                 quantiles = seq(0.05, 0.95, by = 0.05)) +
>   facet_wrap(~ xbin, scales = "free")
> print(p)
>
> What I'd really do, if you had lots of data, would be to bin x into
> small contiguous bins and to calculate quantiles for each of those
> bins and to plot smoothers across the quantiles (using bin medians as
> the x axis) -- I'm sure that's doable in ggplot2 as well.
>
> Michael
>
> On Fri, Mar 9, 2012 at 10:00 PM, Michael <comtech....@gmail.com> wrote:
> > The origin of this problem was that a plain scatter plot with too many
> > points and high dispersion had points flying all over the place.
> >
> > We are trying to smooth the charts a bit...
> >
> > Any good recommendations?
> >
> > Thanks a lot!
> >
> > On Fri, Mar 9, 2012 at 8:59 PM, Michael <comtech....@gmail.com> wrote:
> >>
> >> Sorry for the confusion Michael.
> >>
> >> I myself am trying to figure out what my boss is requesting:
> >>
> >> I am certain that I need to "plot the quantiles of each bin." ...
> >>
> >> But how are the quantiles plotted? Shall I specify 50% quantile, etc?
> >>
> >> Being a diligent guy I am trying hard to do some homework and figure
> >> it out myself...
> >>
> >> I thought there is a standard statistical procedure that everybody
> >> knows...
> >>
> >> Any more thoughts?
> >>
> >> Thanks a lot!
> >>
> >>
> >> On Fri, Mar 9, 2012 at 8:51 PM, R. Michael Weylandt
> >> <michael.weyla...@gmail.com> wrote:
> >>>
> >>> On Fri, Mar 9, 2012 at 9:28 PM, Michael <comtech....@gmail.com> wrote:
> >>> > Thanks a lot Mike!
> >>> >
> >>>
> >>> Michael if you don't mind. (Though admittedly it leads to some degree
> >>> of confusion in a conversation like this)
> >>>
> >>> > Could you please explain your code a bit?
> >>>
> >>> Which part?
> >>>
> >>> >
> >>> > My imagination is that for each bin, I am plotting a line which is
> >>> > the quantile of the y-values in that bin?
> >>>
> >>> Oh, so you want a qqnorm()-esque line? How is that like a scatterplot?
> >>>
> >>> ....yes, that's something else entirely (and not clear from your first
> >>> post -- to my ear the "quantile" is a statistic tied to the [e]cdf)
> >>> This is actually much easier in ggplot (and certainly doable in base
> >>> as well)
> >>>
> >>> Try this,
> >>>
> >>> # Not so volatile this time
> >>> DAT <- data.frame(x = runif(1000, 0, 20), y = rnorm(1000))
> >>> DAT$xbin <- with(DAT, cut(x, seq(0, 20, 5)))
> >>>
> >>> library(ggplot2)
> >>> p <- ggplot(DAT) + facet_wrap( ~ xbin) + stat_qq(aes(sample = y))
> >>>
> >>> print(p)
> >>>
> >>> If this isn't what you want, please spend some time to show an example
> >>> of the sort of graph you desire (it can be a bit of code or a link to
> >>> a picture or even a hand sketch hosted somewhere online)
> >>>
> >>> Out on a limb, I think you might really be thinking of something more
> >>> like this:
> >>>
> >>> p <- ggplot(DAT) + facet_wrap( ~ xbin) +
> >>>   geom_step(aes(x = seq_along(y), y = sort(y)))
> >>>
> >>> and see this for more: http://had.co.nz/ggplot2/geom_step.html
> >>>
> >>> Michael Weylandt
> >>>
> >>> >
> >>> > I ran your program but couldn't figure out the meaning of the dots in
> >>> > your plot?
> >>> >
> >>> > Thanks again!
> >>> >
> >>> > On Fri, Mar 9, 2012 at 7:07 PM, R. Michael Weylandt
> >>> > <michael.weyla...@gmail.com> wrote:
> >>> >>
> >>> >> That doesn't really seem to make sense to me as a graphical
> >>> >> representation (transforming adjacent y values differently), but if
> >>> >> you really want to do so, here's what I'd do if I understand your
> >>> >> goal (the preprocessing is independent of the graphics engine):
> >>> >>
> >>> >> # Nice and volatile!
> >>> >> DAT <- data.frame(x = runif(1000, 0, 20), y = rcauchy(1000)^2)
> >>> >>
> >>> >> # split y based on some x binning and assign empirical quantiles
> >>> >> # of each group
> >>> >>
> >>> >> DAT$yquant <- with(DAT, ave(y, cut(x, seq(0, 20, 5)),
> >>> >>                             FUN = function(x) ecdf(x)(x)))
> >>> >>
> >>> >> # BASE
> >>> >> plot(yquant ~ x, data = DAT)
> >>> >>
> >>> >> # ggplot2
> >>> >> library(ggplot2)
> >>> >>
> >>> >> p <- ggplot(DAT, aes(x = x, y = yquant)) + geom_point()
> >>> >> print(p)
> >>> >>
> >>> >> Michael Weylandt
> >>> >>
> >>> >> PS -- I see Josh Wiley just responded pointing out your requirements
> >>> >> #1 and #2 are incompatible: I've used 1 here.
> >>> >>
> >>> >> On Fri, Mar 9, 2012 at 7:37 PM, Michael <comtech....@gmail.com> wrote:
> >>> >> > Hi all,
> >>> >> >
> >>> >> > I am trying hard to do the following and have already spent a few
> >>> >> > hours in vain:
> >>> >> >
> >>> >> > I wanted to do the scatter plot.
> >>> >> >
> >>> >> > But given the high dispersion on those dots, I would like to bin
> >>> >> > the x-axis and then for each bin of the x-axis, plot the quantiles
> >>> >> > of the y-values of the data points in each bin:
> >>> >> >
> >>> >> > 1. Uniform bin size on the x-axis;
> >>> >> > 2. Equal number of observations in each bin;
> >>> >> >
> >>> >> > How to do that in R? I guess for the sake of prettiness, I'd better
> >>> >> > do it in ggplot2?
> >>> >> >
> >>> >> > Thank you!
> >>> >> >
> >>> >> > [[alternative HTML version deleted]]
> >>> >> >
> >>> >> > ______________________________________________
> >>> >> > R-help@r-project.org mailing list
> >>> >> > https://stat.ethz.ch/mailman/listinfo/r-help
> >>> >> > PLEASE do read the posting guide
> >>> >> > http://www.R-project.org/posting-guide.html
> >>> >> > and provide commented, minimal, self-contained, reproducible code.
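
For reference, the idea quoted at the top of this message (bin x into small contiguous bins, compute quantiles of y within each bin, then smooth across the quantiles using the bin medians as the x axis) might look roughly like the following in base R. This is only a sketch under assumptions, not necessarily what the original poster had in mind: the data are simulated as in the earlier examples, the bin width of 0.5 and the set of quantiles are arbitrary choices, loess() is just one possible smoother, and the names probs, by_bin and Q are made up for illustration.

```r
set.seed(1)  # assumption: simulated data, as in the earlier examples
DAT <- data.frame(x = runif(10000, 0, 20), y = rnorm(10000))

# 1. Bin x into small contiguous bins (width 0.5 is an arbitrary choice)
DAT$xbin <- cut(DAT$x, breaks = seq(0, 20, by = 0.5), include.lowest = TRUE)

# 2. For each non-empty bin, record the median of x and a few quantiles of y
probs  <- c(0.1, 0.25, 0.5, 0.75, 0.9)   # hypothetical choice of quantiles
by_bin <- split(DAT, DAT$xbin, drop = TRUE)
Q <- do.call(rbind, lapply(by_bin, function(d) {
  data.frame(xmed = median(d$x),
             prob = probs,
             yq   = quantile(d$y, probs, names = FALSE))
}))

# 3. Plot the raw points faintly, then one smoother per quantile level,
#    fitted to the per-bin quantiles against the bin medians
plot(y ~ x, data = DAT, pch = ".", col = "grey70")
for (p in probs) {
  d   <- Q[Q$prob == p, ]          # one row per bin, in increasing x order
  fit <- loess(yq ~ xmed, data = d)
  lines(d$xmed, predict(fit), lwd = 2)
}
```

Each curve then traces how one quantile of y changes along x. For equal-count rather than equal-width bins, the breaks could presumably be taken from quantile(DAT$x, ...) instead of seq(); and Q is in long format, so it could also be handed to ggplot2, for example with geom_smooth() grouped by prob.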