Thanks, Josef. This is very helpful, and I will direct this to one of the other mailing lists once I have read the previous posts.
Regarding your remark: Maybe I'm having a terminology problem. It seems to me that once you do >> osm = dist.ppf(osm_uniform) you're back in the value space for the particular distribution. So this gives you known probability intervals, but not UNIFORM probability intervals (the interval between 0 and 1 STD covers a bigger probability interval than the interval between 1 and 2). And the idea of a quantile is that it's a division point in a UNIFORM division of the probability axis.

Mark

On Jun 11, 2016, at 10:03 AM, josef.p...@gmail.com wrote:

> On Sat, Jun 11, 2016 at 8:53 AM, Ralf Gommers <ralf.gomm...@gmail.com> wrote:
>
> Hi Mark,
>
> Note that the scipy-dev or scipy-user mailing list would have been more
> appropriate for this question.
>
> On Fri, Jun 10, 2016 at 9:06 AM, Mark Gawron <gaw...@mail.sdsu.edu> wrote:
>
> The scipy.stats.qqplot and scipy.stats.probplot functions plot expected
> values versus actual data values for visualization of fit to a
> distribution. First a one-D array of expected percentiles is generated for
> a sample of size N; then that is passed to dist.ppf, the percent point
> function for the chosen distribution, to return an array of expected
> values. The visualized data points are pairs of expected and actual
> values, and a linear regression is done on these to produce the line data
> points in this distribution should lie on.
>
> Where x is the input data array and dist the chosen distribution we have:
>
> >> osr = np.sort(x)
> >> osm_uniform = _calc_uniform_order_statistic_medians(len(x))
> >> osm = dist.ppf(osm_uniform)
> >> slope, intercept, r, prob, sterrest = stats.linregress(osm, osr)
>
> My question concerns the plot display.
>
> >> ax.plot(osm, osr, 'bo', osm, slope*osm + intercept, 'r-')
>
> The x-axis of the resulting plot is labeled quantiles, but the xticks and
> xticklabels produced by qqplot and probplot do not seem correct for their
> intended interpretations. First, the numbers on the x-axis do not
> represent quantiles; the intervals between them do not in general contain
> equal numbers of points. For a normal distribution with sigma=1, they
> represent standard deviations. Changing the label on the x-axis does not
> seem like a very good solution, because the interpretation of the values
> on the x-axis will be different for different distributions. Rather the
> right solution seems to be to actually show quantiles on the x-axis. The
> numbers on the x-axis can stay as they are, representing quantile indexes,
> but they need to be spaced so as to show the actual division points that
> carve the population up into groups of the same size. This can be done in
> something like the following way.
>
> The ticks are correct I think, but they're theoretical quantiles and not
> sample quantiles. This was discussed in [1] and is consistent with R [2]
> and statsmodels [3]. I see that we just forgot to add "theoretical" to the
> x-axis label (mea culpa). Does adding that resolve your concern?
>
> [1] https://github.com/scipy/scipy/issues/1821
> [2] http://data.library.virginia.edu/understanding-q-q-plots/
> [3] http://statsmodels.sourceforge.net/devel/generated/statsmodels.graphics.gofplots.qqplot.html?highlight=qqplot#statsmodels.graphics.gofplots.qqplot
>
> Ralf
>
> as related link
> http://phobson.github.io/mpl-probscale/tutorial/closer_look_at_viz.html
>
> Paul Hobson has done a lot of work for getting different probability
> scales attached to pp-plots or generalized versions of probability plots.
> I think qqplots are less ambiguous because they are on the original or
> standardized scale.
>
> I haven't worked my way through the various interpretations of the
> probability axis yet because I find it "not obvious". It might be easier
> for fields that have a tradition of using probability papers.
>
> It's planned to be added to the statsmodels probability plots so that
> there will be a large choice of axis labels and scales.
>
> Josef
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
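
For anyone following the thread, here is a rough sketch of the distinction being discussed, using only the Python standard library. It rests on several assumptions: statistics.NormalDist().inv_cdf stands in for dist.ppf with a standard normal, and uniform_order_statistic_medians is a hypothetical reimplementation of Filliben's estimate as described in the probplot documentation, not SciPy's private helper. It shows that integer ticks in value space bracket unequal probability mass, and where ticks would have to sit to mark equal-probability divisions (deciles here).

```python
from statistics import NormalDist


def uniform_order_statistic_medians(n):
    """Filliben's estimate of the medians of the uniform order statistics.

    Hypothetical stand-in for SciPy's private helper of a similar name.
    """
    medians = [(i - 0.3175) / (n + 0.365) for i in range(1, n + 1)]
    medians[-1] = 0.5 ** (1.0 / n)   # largest order statistic
    medians[0] = 1.0 - medians[-1]   # smallest order statistic
    return medians


std_normal = NormalDist()  # assumption: plays the role of dist = scipy.stats.norm

# Probabilities for a sample of size 9, then the same points mapped into
# value space (the role dist.ppf plays in the quoted code).
osm_uniform = uniform_order_statistic_medians(9)
osm = [std_normal.inv_cdf(p) for p in osm_uniform]

# Probability mass between integer ticks in value space: about 0.34 between
# 0 and 1 standard deviations, about 0.14 between 1 and 2.
mass_0_to_1 = std_normal.cdf(1.0) - std_normal.cdf(0.0)
mass_1_to_2 = std_normal.cdf(2.0) - std_normal.cdf(1.0)

# Tick positions that divide the probability axis uniformly (deciles).
# Each adjacent pair brackets 10% of the population.
decile_ticks = [std_normal.inv_cdf(k / 10) for k in range(1, 10)]

print("mass between 0 and 1 SD:", round(mass_0_to_1, 4))
print("mass between 1 and 2 SD:", round(mass_1_to_2, 4))
print("decile tick positions:", [round(t, 3) for t in decile_ticks])
```

The decile positions are themselves values of the theoretical distribution, so both readings of the axis use the same scale; they differ only in where the ticks are placed.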