OK, our messages crossed. I understand now.
Thanks.

Mark

On Jun 11, 2016, at 12:24 PM, josef.p...@gmail.com wrote:

> On Sat, Jun 11, 2016 at 2:49 PM, Mark Gawron <gaw...@mail.sdsu.edu> wrote:
>
> Thanks, Josef. This is very helpful. And I will direct this
> to one of the other mailing lists, once I read the previous posts.
>
> Regarding your remark: Maybe I’m having a terminology problem. It seems to
> me once you do
>
> >>> osm = dist.ppf(osm_uniform)
>
> you’re back in the value space for the particular distribution. So this
> gives you known probability intervals, but not UNIFORM probability
> intervals (the interval between 0 and 1 STD covers a bigger prob interval
> than the interval between 1 and 2). And the idea of a quantile is
> that it’s a division point in a UNIFORM division of the probability axis.
>
>
> Yes and no: quantiles, i.e. what you get from ppf, are in units of the random
> variable. So they are on the scale of the random variable, not on a probability
> scale. The axis labels are in units of the random variable.
>
> pp-plots have probabilities on the axis and are uniformly scaled in
> probabilities but non-uniform in the values of the random variable.
>
> The difficult part to follow is if the plot is done uniform in one scale, but
> the axes are labeled non-uniform in the other scale. That's what Paul's
> probscale does and what you have in mind, AFAIU.
>
> Josef
>
>
> Mark
>
> On Jun 11, 2016, at 10:03 AM, josef.p...@gmail.com wrote:
>
>> On Sat, Jun 11, 2016 at 8:53 AM, Ralf Gommers <ralf.gomm...@gmail.com> wrote:
>> Hi Mark,
>>
>> Note that the scipy-dev or scipy-user mailing list would have been more
>> appropriate for this question.
>>
>>
>> On Fri, Jun 10, 2016 at 9:06 AM, Mark Gawron <gaw...@mail.sdsu.edu> wrote:
>>
>>
>> The scipy.stats.qqplot and scipy.stats.probplot functions plot expected
>> values versus actual data values for visualization of fit to a distribution.
>> First a one-D array of expected percentiles is generated for a sample of
>> size N; then that is passed to dist.ppf, the percent point function for
>> the chosen distribution, to return an array of expected values. The
>> visualized data points are pairs of expected and actual values, and a linear
>> regression is done on these to produce the line that data points from this
>> distribution should lie on.
>>
>> Where x is the input data array and dist the chosen distribution we have:
>>
>> >>> osr = np.sort(x)
>> >>> osm_uniform = _calc_uniform_order_statistic_medians(len(x))
>> >>> osm = dist.ppf(osm_uniform)
>> >>> slope, intercept, r, prob, sterrest = stats.linregress(osm, osr)
>>
>> My question concerns the plot display.
>>
>> >>> ax.plot(osm, osr, 'bo', osm, slope*osm + intercept, 'r-')
>>
>>
>> The x-axis of the resulting plot is labeled quantiles, but the xticks and
>> xticklabels produced by qqplot and probplot do not seem correct
>> for their intended interpretations. First, the numbers on the x-axis do
>> not represent quantiles; the intervals between them do not in general
>> contain equal numbers of points. For a normal distribution with sigma=1,
>> they represent standard deviations. Changing the label on the x-axis does
>> not seem like a very good solution, because the interpretation of the values
>> on the x-axis will be different for different distributions. Rather the
>> right solution seems to be to actually show quantiles on the x-axis. The
>> numbers on the x-axis can stay as they are, representing quantile indexes,
>> but they need to be spaced so as to show the actual division points that
>> carve the population up into groups of the same size. This can be done in
>> something like the following way.
>>
>> The ticks are correct I think, but they're theoretical quantiles and not
>> sample quantiles. This was discussed in [1] and is consistent with R [2] and
>> statsmodels [3]. I see that we just forgot to add "theoretical" to the
>> x-axis label (mea culpa). Does adding that resolve your concern?
>>
>> [1] https://github.com/scipy/scipy/issues/1821
>> [2] http://data.library.virginia.edu/understanding-q-q-plots/
>> [3]
>> http://statsmodels.sourceforge.net/devel/generated/statsmodels.graphics.gofplots.qqplot.html?highlight=qqplot#statsmodels.graphics.gofplots.qqplot
>>
>> Ralf
>>
>>
>> as related link
>> http://phobson.github.io/mpl-probscale/tutorial/closer_look_at_viz.html
>>
>> Paul Hobson has done a lot of work for getting different probability scales
>> attached to pp-plots or generalized versions of probability plots. I think
>> qqplots are less ambiguous because they are on the original or standardized
>> scale.
>>
>> I haven't worked my way through the various interpretations of probability
>> axis yet because I find it "not obvious". It might be easier for fields that
>> have a tradition of using probability papers.
>>
>> It's planned to be added to the statsmodels probability plots so that there
>> will be a large choice of axis labels and scales.
>>
>> Josef
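
For anyone reading this thread later, here is a minimal sketch of the point about ppf output being on the scale of the random variable. It uses only the Python standard library, so `statistics.NormalDist().inv_cdf` stands in for `dist.ppf`. The helper `uniform_order_statistic_medians` is a hypothetical reimplementation based on Filliben's approximation, which I assume is what the private SciPy helper quoted above computes. Nearly evenly spaced probabilities go in, and unevenly spaced theoretical quantiles come out.

```python
from statistics import NormalDist


def uniform_order_statistic_medians(n):
    """Approximate medians of the order statistics of n uniform(0, 1) draws.

    Hypothetical helper using Filliben's approximation; assumed (not verified
    here) to match the private SciPy helper named in the thread.
    """
    if n < 2:
        raise ValueError("need at least two points")
    last = 0.5 ** (1.0 / n)
    inner = [(i - 0.3175) / (n + 0.365) for i in range(2, n)]
    return [1.0 - last] + inner + [last]


def theoretical_quantiles(n, dist=NormalDist()):
    """Map the probability-scale medians to the value scale of `dist`."""
    return [dist.inv_cdf(p) for p in uniform_order_statistic_medians(n)]


n = 9
probs = uniform_order_statistic_medians(n)   # probability scale
quants = theoretical_quantiles(n)            # random-variable scale

# Spacing between neighbouring points on each scale.
prob_gaps = [b - a for a, b in zip(probs, probs[1:])]
quant_gaps = [b - a for a, b in zip(quants, quants[1:])]

for p, q in zip(probs, quants):
    print(f"p = {p:.3f}  ->  theoretical quantile = {q:+.3f}")

print("probability gaps:", [round(g, 3) for g in prob_gaps])
print("quantile gaps:   ", [round(g, 3) for g in quant_gaps])
```

The interior probability gaps are constant, while the quantile gaps widen toward the tails. That is why the tick values on a Q-Q plot are in units of the variable (standard deviations for a standard normal) rather than equal-count division points.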
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion