Hi Mark,

Note that the scipy-dev or scipy-user mailing list would have been more appropriate for this question.
On Fri, Jun 10, 2016 at 9:06 AM, Mark Gawron <gaw...@mail.sdsu.edu> wrote:

> The scipy.stats.qqplot and scipy.stats.probplot functions plot expected
> values versus actual data values for visualization of fit to a
> distribution. First, a 1-D array of expected percentiles is generated for
> a sample of size N; then that is passed to dist.ppf, the percent point
> function for the chosen distribution, to return an array of expected
> values. The visualized data points are pairs of expected and actual
> values, and a linear regression is done on these to produce the line
> that data points from this distribution should lie on.
>
> Where x is the input data array and dist the chosen distribution, we have:
>
>     osr = np.sort(x)
>     osm_uniform = _calc_uniform_order_statistic_medians(len(x))
>     osm = dist.ppf(osm_uniform)
>     slope, intercept, r, prob, sterrest = stats.linregress(osm, osr)
>
> My question concerns the plot display.
>
>     ax.plot(osm, osr, 'bo', osm, slope*osm + intercept, 'r-')
>
> The x-axis of the resulting plot is labeled quantiles, but the xticks and
> xticklabels produced by qqplot and probplot do not seem correct for their
> intended interpretations. First, the numbers on the x-axis do not
> represent quantiles; the intervals between them do not in general contain
> equal numbers of points. For a normal distribution with sigma=1, they
> represent standard deviations. Changing the label on the x-axis does not
> seem like a very good solution, because the interpretation of the values
> on the x-axis will be different for different distributions. Rather, the
> right solution seems to be to actually show quantiles on the x-axis. The
> numbers on the x-axis can stay as they are, representing quantile indexes,
> but they need to be spaced so as to show the actual division points that
> carve the population up into groups of the same size. This can be done in
> something like the following way.
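The computation quoted above can be reproduced with public SciPy calls. This is a minimal sketch, assuming NumPy and SciPy are installed. `filliben_medians` and `manual_probplot` are hypothetical helpers, not SciPy API: `filliben_medians` reimplements Filliben's estimate of the uniform order statistic medians (the formula the `scipy.stats.probplot` documentation gives) in place of the private `_calc_uniform_order_statistic_medians`. The result is then checked against `stats.probplot` itself, which returns `((osm, osr), (slope, intercept, r))` when fitting.

```python
import numpy as np
from scipy import stats


def filliben_medians(n):
    """Approximate medians of the n uniform order statistics (Filliben's estimate).

    Hypothetical helper for illustration; not part of SciPy's public API.
    """
    v = np.empty(n, dtype=float)
    v[-1] = 0.5 ** (1.0 / n)                # largest order statistic
    v[0] = 1.0 - v[-1]                      # smallest, by symmetry
    i = np.arange(2, n)                     # interior ranks 2 .. n-1
    v[1:-1] = (i - 0.3175) / (n + 0.365)
    return v


def manual_probplot(x, dist=stats.norm):
    """Hypothetical re-implementation of the probplot steps quoted above."""
    osr = np.sort(x)                             # ordered sample values (y-axis)
    osm = dist.ppf(filliben_medians(len(x)))     # theoretical quantiles (x-axis)
    slope, intercept, r, _, _ = stats.linregress(osm, osr)
    return (osm, osr), (slope, intercept, r)


rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=2.0, size=50)

(osm, osr), (slope, intercept, r) = manual_probplot(x)
(ref_osm, ref_osr), (ref_slope, ref_intercept, ref_r) = stats.probplot(x, dist="norm")

print(np.allclose(osm, ref_osm), np.isclose(slope, ref_slope))
```

Because `osm` is simply `dist.ppf` applied to those medians, the x-values are in the units of the reference distribution (standard deviations for `stats.norm`), and for a location-scale family the fitted slope and intercept roughly estimate the sample's scale and location.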
>

I think the ticks are correct, but they're theoretical quantiles, not sample quantiles. This was discussed in [1] and is consistent with R [2] and statsmodels [3]. I see that we just forgot to add "theoretical" to the x-axis label (mea culpa). Does adding that resolve your concern?

[1] https://github.com/scipy/scipy/issues/1821
[2] http://data.library.virginia.edu/understanding-q-q-plots/
[3] http://statsmodels.sourceforge.net/devel/generated/statsmodels.graphics.gofplots.qqplot.html?highlight=qqplot#statsmodels.graphics.gofplots.qqplot

Ralf
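To make the distinction between the two axis conventions concrete, here is a small sketch assuming SciPy's `stats.norm` (the standard normal). It maps evenly spaced ticks on the theoretical-quantile axis to cumulative probabilities with `dist.cdf`. Going the other way, it places ticks at equal-probability division points (deciles) with `dist.ppf`, which is roughly what the proposal in the quoted message would need. The specific tick values and the choice of deciles are illustrative assumptions.

```python
import numpy as np
from scipy import stats

dist = stats.norm  # standard normal: loc=0, scale=1

# Evenly spaced ticks on the theoretical-quantile axis...
ticks = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
# ...correspond to unevenly spaced cumulative probabilities.
probs_at_ticks = dist.cdf(ticks)
print(np.round(probs_at_ticks, 3))

# Conversely, ticks that split the population into ten equal-probability
# groups (deciles) are unevenly spaced on the quantile axis.
deciles = np.linspace(0.1, 0.9, 9)
decile_positions = dist.ppf(deciles)
print(np.round(decile_positions, 3))
```

The tick at 1.0 sits at about the 84th percentile, while the decile positions bunch toward the center. Since one axis is a monotone transform of the other, both carry the same information. The label "theoretical quantiles" tells the reader that the axis is in the reference distribution's units rather than in probability.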
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion