On Sat, Jun 11, 2016 at 2:49 PM, Mark Gawron <gaw...@mail.sdsu.edu> wrote:
> Thanks, Josef. This is very helpful. And I will direct this
> to one of the other mailing lists, once I read the previous posts.
>
> Regarding your remark: Maybe I’m having a terminology problem. It seems
> to me once you do
>
>     osm = dist.ppf(osm_uniform)
>
> you’re back in the value space for the particular distribution. So this
> gives you known probability intervals, but not UNIFORM probability
> intervals (the interval between 0 and 1 STD covers a bigger prob interval
> than the interval between 1 and 2). And the idea of a quantile is
> that it’s a division point in a UNIFORM division of the probability axis.
>

Yes and no. Quantiles, i.e. what you get from ppf, are in units of the
random variable. So they are on the scale of the random variable, not on a
probability scale. The axis labels are in units of the random variable.

pp-plots have probabilities on the axes and are uniformly scaled in
probabilities but non-uniform in the values of the random variable.

The difficult part to follow is when the plot is drawn uniformly in one
scale but the axes are labeled non-uniformly in the other scale. That's
what Paul's probscale does and what you have in mind, AFAIU.

Josef

> Mark
>
> On Jun 11, 2016, at 10:03 AM, josef.p...@gmail.com wrote:
>
> On Sat, Jun 11, 2016 at 8:53 AM, Ralf Gommers <ralf.gomm...@gmail.com>
> wrote:
>
>> Hi Mark,
>>
>> Note that the scipy-dev or scipy-user mailing list would have been more
>> appropriate for this question.
>>
>> On Fri, Jun 10, 2016 at 9:06 AM, Mark Gawron <gaw...@mail.sdsu.edu>
>> wrote:
>>
>>> The scipy.stats.qqplot and scipy.stats.probplot functions plot expected
>>> values versus actual data values for visualization of fit to a
>>> distribution. First a one-D array of expected percentiles is generated for
>>> a sample of size N; then that is passed to dist.ppf, the percent point
>>> function for the chosen distribution, to return an array of expected
>>> values.
>>> The visualized data points are pairs of expected and actual
>>> values, and a linear regression is done on these to produce the line data
>>> points in this distribution should lie on.
>>>
>>> Where x is the input data array and dist the chosen distribution we have:
>>>
>>>     osr = np.sort(x)
>>>     osm_uniform = _calc_uniform_order_statistic_medians(len(x))
>>>     osm = dist.ppf(osm_uniform)
>>>     slope, intercept, r, prob, sterrest = stats.linregress(osm, osr)
>>>
>>> My question concerns the plot display.
>>>
>>>     ax.plot(osm, osr, 'bo', osm, slope*osm + intercept, 'r-')
>>>
>>> The x-axis of the resulting plot is labeled quantiles, but the xticks
>>> and xticklabels produced by qqplot and probplot do not seem
>>> correct for their intended interpretations. First the numbers on the
>>> x-axis do not represent quantiles; the intervals between them do not in
>>> general contain equal numbers of points. For a normal distribution with
>>> sigma=1, they represent standard deviations. Changing the label on the
>>> x-axis does not seem like a very good solution, because the interpretation
>>> of the values on the x-axis will be different for different distributions.
>>> Rather the right solution seems to be to actually show quantiles on the
>>> x-axis. The numbers on the x-axis can stay as they are, representing
>>> quantile indexes, but they need to be spaced so as to show the actual
>>> division points that carve the population up into groups of the same
>>> size. This can be done in something like the following way.
>>>
>>
>> The ticks are correct I think, but they're theoretical quantiles and not
>> sample quantiles. This was discussed in [1] and is consistent with R [2]
>> and statsmodels [3]. I see that we just forgot to add "theoretical" to the
>> x-axis label (mea culpa). Does adding that resolve your concern?
>>
>> [1] https://github.com/scipy/scipy/issues/1821
>> [2] http://data.library.virginia.edu/understanding-q-q-plots/
>> [3]
>> http://statsmodels.sourceforge.net/devel/generated/statsmodels.graphics.gofplots.qqplot.html?highlight=qqplot#statsmodels.graphics.gofplots.qqplot
>>
>> Ralf
>>
>
> as related link
> http://phobson.github.io/mpl-probscale/tutorial/closer_look_at_viz.html
>
> Paul Hobson has done a lot of work for getting different probability
> scales attached to pp-plots or generalized versions of probability plots. I
> think qqplots are less ambiguous because they are on the original or
> standardized scale.
>
> I haven't worked my way through the various interpretations of probability
> axes yet because I find it "not obvious". It might be easier for fields
> that have a tradition of using probability papers.
>
> It's planned to be added to the statsmodels probability plots so that
> there will be a large choice of axis labels and scales.
>
> Josef
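
For anyone following the thread, here is a minimal sketch of the distinction discussed above, assuming a standard normal distribution (the sigma=1 case Mark mentions). The names probs, quantiles, tick_positions and tick_labels are made up for illustration and are not part of scipy or probscale. It only uses dist.ppf and dist.cdf, and it is one way to look at the two scales, not what probplot does internally.

```python
import numpy as np
from scipy import stats

dist = stats.norm  # assumption: standard normal, loc=0, scale=1

# Probabilities chosen by hand; these live on the probability axis.
probs = np.array([0.1, 0.25, 0.5, 0.75, 0.9])

# ppf maps probabilities to quantiles, which are in units of the
# random variable (standard deviations for the standard normal).
quantiles = dist.ppf(probs)

# Equal steps in the value of the random variable cover unequal
# amounts of probability: 0..1 holds more mass than 1..2.
p_0_to_1 = dist.cdf(1.0) - dist.cdf(0.0)  # about 0.341
p_1_to_2 = dist.cdf(2.0) - dist.cdf(1.0)  # about 0.136

# Hypothetical tick setup for a plot drawn on the quantile scale but
# labeled in probabilities: positions are quantiles, labels are probs.
tick_positions = quantiles
tick_labels = [f"{p:.0%}" for p in probs]

print(quantiles)
print(p_0_to_1, p_1_to_2)
print(list(zip(tick_positions, tick_labels)))
```

With matplotlib one could pass tick_positions and tick_labels to ax.set_xticks and ax.set_xticklabels on an existing quantile-scale plot. The labels would then be non-uniformly spaced, which is the "uniform in one scale, labeled in the other" situation described above.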
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion