Re: [R] How do I do a pretty scatter plot using ggplot2?

Michael Sat, 10 Mar 2012 10:05:11 -0800

Thanks a lot!

Could you please elaborate on this one?


"What I'd really do, if you had lots of data, would be to bin x into
small contiguous bins and to calculate quantiles for each of those
bins and to plot smoothers across the quantiles (using bin medians as
the x axis) "
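
One way to read the suggestion quoted above, as a minimal sketch rather than the author's actual code: cut x into narrow contiguous bins, compute a handful of quantiles of y within each bin, and then draw a smoother through each quantile level, using the median x of each bin as the horizontal position. The bin width (0.5), the quantile levels, the loess smoother, and the names QDAT, xmed, yq and prob are all assumptions made for illustration.

```r
# Sketch: per-bin quantiles of y, smoothed across the bin medians of x.
set.seed(1)
DAT <- data.frame(x = runif(5000, 0, 20), y = rnorm(5000))

# 40 contiguous bins of width 0.5 (the width is an arbitrary choice)
DAT$xbin <- cut(DAT$x, breaks = seq(0, 20, by = 0.5), include.lowest = TRUE)

# Quantile levels to trace (also an arbitrary choice)
probs <- c(0.05, 0.25, 0.50, 0.75, 0.95)

# One row per (bin, quantile level): median of x and quantile of y in the bin
per_bin <- lapply(split(DAT, DAT$xbin, drop = TRUE), function(d) {
  data.frame(xmed = median(d$x),
             prob = probs,
             yq   = unname(quantile(d$y, probs)))
})
QDAT <- do.call(rbind, per_bin)
rownames(QDAT) <- NULL

# Plot one smoothed curve per quantile level (skipped if ggplot2 is absent)
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(QDAT, aes(x = xmed, y = yq, colour = factor(prob))) +
    geom_point(size = 1) +
    geom_smooth(method = "loess", formula = y ~ x, se = FALSE) +
    labs(x = "bin median of x", y = "quantile of y", colour = "quantile")
  print(p)
}
```

Each coloured curve then traces one quantile of y as x varies, which is usually much easier to read than the raw point cloud.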

On Fri, Mar 9, 2012 at 9:21 PM, R. Michael Weylandt <
michael.weyla...@gmail.com> wrote:

> Could you just add a log scale to the y dimension?
>
> # y must be positive for a log axis, so draw it from a log-normal
> DAT <- data.frame(x = runif(1000, 0, 20), y = rlnorm(1000))
>
> plot(y ~ x, data = DAT, log = "y")
>
> That lessens large dispersion (in some circumstances) but I'm not
> really sure what that has to do with smoothing....do you mean
> "smoothing" in the technical sense (loess, splines, and friends) or in
> some graphical sense?
>
> Still not sure what this has to do with quantile plots: they are
> usually diagnostic tools for examining distributional shape/fit.
>
> Here's two (related) ideas:
>
> i) If you have categorical x data, boxplots:
> http://had.co.nz/ggplot2/geom_boxplot.html
>
> ii) If you have continuous x data, quantile "envelopes":
> http://had.co.nz/ggplot2/stat_quantile.html
>
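> # In ggplot2 (stat_quantile() also needs the quantreg package installed)
> library(ggplot2)
>
> DAT <- data.frame(x = runif(1000, 0, 20), y = rnorm(1000))
> DAT$xbin <- with(DAT, cut(x, seq(0, 20, 2)))
>
> # Points plus quantile-regression lines (5% to 95%), one panel per x bin
> p <- ggplot(DAT, aes(x = x, y = y)) + geom_point(alpha = 0.2) +
>   stat_quantile(aes(colour = after_stat(quantile)),
>                 quantiles = seq(0.05, 0.95, by = 0.05)) +
>   facet_wrap(~ xbin, scales = "free")
> print(p)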
>
> What I'd really do, if you had lots of data, would be to bin x into
> small contiguous bins and to calculate quantiles for each of those
> bins and to plot smoothers across the quantiles (using bin medians as
> the x axis) -- I'm sure that's doable in ggplot2 as well.
>
> Michael
>
> On Fri, Mar 9, 2012 at 10:00 PM, Michael <comtech....@gmail.com> wrote:
> > The origin of this problem was that a plain scatter plot with too many
> > points and high dispersion had points flying all over the place.
> >
> > We are trying to smooth the charts a bit...
> >
> > Any good recommendations?
> >
> > Thanks a lot!
> >
> > On Fri, Mar 9, 2012 at 8:59 PM, Michael <comtech....@gmail.com> wrote:
> >>
> >> Sorry for the confusion Michael.
> >>
> >> I myself am trying to figure out what my boss is requesting:
> >>
> >> I am certain that I need to "plot the quantiles of each bin.  " ...
> >>
> >> But how are the quantiles plotted? Shall I specify 50% quantile, etc?
> >>
> >> Being a diligent guy, I am trying hard to do some homework and figure it
> >> out myself...
> >>
> >> I thought there was a standard statistical procedure that everybody
> >> knows...
> >>
> >> Any more thoughts?
> >>
> >> Thanks a lot!
> >>
> >>
> >> On Fri, Mar 9, 2012 at 8:51 PM, R. Michael Weylandt
> >> <michael.weyla...@gmail.com> wrote:
> >>>
> >>> On Fri, Mar 9, 2012 at 9:28 PM, Michael <comtech....@gmail.com> wrote:
> >>> > Thanks a lot Mike!
> >>> >
> >>>
> >>> Michael if you don't mind. (Though admittedly it leads to some degree
> >>> of confusion in a conversation like this)
> >>>
> >>> > Could you please explain your code a bit?
> >>>
> >>> Which part?
> >>>
> >>> >
> >>> > My imagination is that for each bin, I am plotting a line which is the
> >>> > quantile of the y-values in that bin?
> >>>
> >>> Oh, so you want a qqnorm()-esque line? How is that like a scatterplot?
> >>>
> >>> ....yes, that's something else entirely (and not clear from your first
> >>> post -- to my ear the "quantile" is a statistic tied to the [e]cdf)
> >>> This is actually much easier in ggplot (and certainly doable in base
> >>> as well)
> >>>
> >>> Try this,
> >>>
> >>> # Not so volatile this time
> >>> DAT <- data.frame(x = runif(1000, 0, 20), y = rnorm(1000))
> >>> DAT$xbin <- with(DAT, cut(x, seq(0, 20, 5)))
> >>>
> >>> library(ggplot2)
> >>> p <- ggplot(DAT) + facet_wrap( ~ xbin) + stat_qq(aes(sample = y))
> >>>
> >>> print(p)
> >>>
> >>> If this isn't what you want, please spend some time to show an example
> >>> of the sort of graph you desire (it can be a bit of code or a link to
> >>> a picture or even a hand sketch hosted somewhere online)
> >>>
> >>> Out on a limb, I think you might really be thinking of something more
> >>> like this:
> >>>
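> >>> # Sort y and number the points within each bin before plotting
> >>> DAT <- DAT[order(DAT$xbin, DAT$y), ]
> >>> DAT$rank <- ave(DAT$y, DAT$xbin, FUN = seq_along)
> >>> p <- ggplot(DAT) + facet_wrap( ~ xbin) + geom_step(aes(x = rank, y = y))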
> >>>
> >>> and see this for more: http://had.co.nz/ggplot2/geom_step.html
> >>>
> >>> Michael Weylandt
> >>>
> >>> >
> >>> > I ran your program but couldn't figure out the meaning of the dots in
> >>> > your plot?
> >>> >
> >>> > Thanks again!
> >>> >
> >>> > On Fri, Mar 9, 2012 at 7:07 PM, R. Michael Weylandt
> >>> > <michael.weyla...@gmail.com> wrote:
> >>> >>
> >>> >> That doesn't really seem to make sense to me as a graphical
> >>> >> representation (transforming adjacent y values differently), but if
> >>> >> you really want to do so, here's what I'd do if I understand your goal
> >>> >> (the preprocessing is independent of the graphics engine):
> >>> >>
> >>> >> # Nice and volatile!
> >>> >> DAT <- data.frame(x = runif(1000, 0, 20), y = rcauchy(1000)^2)
> >>> >>
> >>> >> # Split y based on some x binning and assign the empirical quantiles
> >>> >> # of each group
> >>> >>
> >>> >> DAT$yquant <- with(DAT, ave(y, cut(x, seq(0, 20, 5)), FUN =
> >>> >> function(x) ecdf(x)(x)))
> >>> >>
> >>> >> # BASE
> >>> >> plot(yquant ~ x, data = DAT)
> >>> >>
> >>> >>  # ggplot2
> >>> >> library(ggplot2)
> >>> >>
> >>> >> p <- ggplot(DAT, aes(x = x, y = yquant)) + geom_point()
> >>> >> print(p)
> >>> >>
> >>> >> Michael Weylandt
> >>> >>
> >>> >> PS -- I see Josh Wiley just responded pointing out your requirements
> >>> >> #1 and #2 are incompatible: I've used 1 here.
> >>> >>
> >>> >> On Fri, Mar 9, 2012 at 7:37 PM, Michael <comtech....@gmail.com> wrote:
> >>> >> > Hi all,
> >>> >> >
> >>> >> > I am trying hard to do the following and have already spent a few
> >>> >> > hours in vain:
> >>> >> >
> >>> >> > I wanted to do a scatter plot.
> >>> >> >
> >>> >> > But given the high dispersion of those dots, I would like to bin the
> >>> >> > x-axis and then, for each bin, plot the quantiles of the y-values of
> >>> >> > the data points in that bin:
> >>> >> >
> >>> >> > 1. Uniform bin size on the x-axis;
> >>> >> > 2. Equal number of observations in each bin;
> >>> >> >
> >>> >> > How do I do that in R? I guess for the sake of prettiness, I'd better
> >>> >> > do it in ggplot2?
> >>> >> >
> >>> >> > Thank you!
> >>> >> >
> >>> >> >        [[alternative HTML version deleted]]
> >>> >> >
> >>> >> > ______________________________________________
> >>> >> > R-help@r-project.org mailing list
> >>> >> > https://stat.ethz.ch/mailman/listinfo/r-help
> >>> >> > PLEASE do read the posting guide
> >>> >> > http://www.R-project.org/posting-guide.html
> >>> >> > and provide commented, minimal, self-contained, reproducible code.
> >>> >
> >>> >
> >>
> >>
> >
>
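
For reference, the two binning rules listed in the original question at the bottom of the thread (uniform bin width versus an equal number of observations per bin) can be compared directly. The sketch below uses base R's cut() and quantile() on made-up skewed data; the choice of ten bins and the names width_bins and count_bins are arbitrary and purely illustrative.

```r
# Sketch: uniform-width bins versus equal-count bins on skewed data.
set.seed(2)
x <- rexp(1000)  # skewed on purpose, so the two schemes differ visibly

# 1. Uniform bin width: ten equal-length intervals, unequal counts
width_bins <- cut(x, breaks = 10)

# 2. Equal counts: breaks at the deciles of x, unequal widths
count_bins <- cut(x,
                  breaks = quantile(x, probs = seq(0, 1, by = 0.1)),
                  include.lowest = TRUE)

# Compare how many observations land in each bin under the two rules
print(table(width_bins))
print(table(count_bins))
```

Unless x happens to be evenly spread, a set of bins can satisfy only one of the two rules, so one has to be chosen. ggplot2 also provides cut_interval() and cut_number() for these two schemes.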
