On Sat, Jun 11, 2016 at 2:53 PM, Ralf Gommers <ralf.gomm...@gmail.com> wrote:
> Hi Mark,
>
> Note that the scipy-dev or scipy-user mailing list would have been more
> appropriate for this question.
>
> On Fri, Jun 10, 2016 at 9:06 AM, Mark Gawron <gaw...@mail.sdsu.edu> wrote:
>
>> The scipy.stats.qqplot and scipy.stats.probplot functions plot expected
>> values versus actual data values for visualization of fit to a
>> distribution. First a 1-D array of expected percentiles is generated for
>> a sample of size N; then that is passed to dist.ppf, the percent point
>> function for the chosen distribution, to return an array of expected
>> values. The visualized data points are pairs of expected and actual
>> values, and a linear regression is done on these to produce the line
>> that data points from this distribution should lie on.
>>
>> Where x is the input data array and dist the chosen distribution we have:
>>
>>     osr = np.sort(x)
>>     osm_uniform = _calc_uniform_order_statistic_medians(len(x))
>>     osm = dist.ppf(osm_uniform)
>>     slope, intercept, r, prob, sterrest = stats.linregress(osm, osr)
>>
>> My question concerns the plot display.
>>
>>     ax.plot(osm, osr, 'bo', osm, slope*osm + intercept, 'r-')
>>
>> The x-axis of the resulting plot is labeled quantiles, but the xticks and
>> xticklabels produced by qqplot and probplot do not seem correct for
>> their intended interpretations. First, the numbers on the x-axis do
>> not represent quantiles; the intervals between them do not in general
>> contain equal numbers of points. For a normal distribution with sigma=1,
>> they represent standard deviations. Changing the label on the x-axis does
>> not seem like a very good solution, because the interpretation of the
>> values on the x-axis will be different for different distributions. Rather,
>> the right solution seems to be to actually show quantiles on the x-axis.
>> The numbers on the x-axis can stay as they are, representing quantile
>> indexes, but they need to be spaced so as to show the actual division
>> points that carve the population up into groups of the same size. This
>> can be done in something like the following way.
>
> The ticks are correct I think, but they're theoretical quantiles and not
> sample quantiles. This was discussed in [1] and is consistent with R [2]
> and statsmodels [3]. I see that we just forgot to add "theoretical" to the
> x-axis label (mea culpa). Does adding that resolve your concern?

Sent a PR for this: https://github.com/scipy/scipy/pull/6249

Ralf

> [1] https://github.com/scipy/scipy/issues/1821
> [2] http://data.library.virginia.edu/understanding-q-q-plots/
> [3] http://statsmodels.sourceforge.net/devel/generated/statsmodels.graphics.gofplots.qqplot.html?highlight=qqplot#statsmodels.graphics.gofplots.qqplot
>
> Ralf
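
For anyone following the thread, the computation Mark describes can be sketched with the standard library alone. This is only an illustration under stated assumptions, not scipy's code: it assumes the private helper _calc_uniform_order_statistic_medians uses Filliben's estimate, it substitutes statistics.NormalDist.inv_cdf for dist.ppf (so it covers the normal case only), and it replaces stats.linregress with a hand-written least-squares fit. The function names uniform_order_statistic_medians and normal_probplot_points are hypothetical.

```python
from statistics import NormalDist, fmean


def uniform_order_statistic_medians(n):
    """Filliben's estimate of the medians of the n uniform order statistics.

    Assumption: this mirrors what scipy's private helper computes.
    """
    if n < 2:
        raise ValueError("need at least two observations")
    last = 0.5 ** (1.0 / n)
    inner = [(i - 0.3175) / (n + 0.365) for i in range(2, n)]
    return [1.0 - last] + inner + [last]


def normal_probplot_points(x):
    """Return (osm, osr, slope, intercept) for a normal probability plot."""
    std_normal = NormalDist()
    osr = sorted(x)  # ordered sample values (y-axis)
    # Theoretical quantiles (x-axis): standard normal inverse CDF of the medians.
    osm = [std_normal.inv_cdf(p)
           for p in uniform_order_statistic_medians(len(osr))]
    # Ordinary least-squares fit of osr on osm (stand-in for stats.linregress).
    mean_m, mean_r = fmean(osm), fmean(osr)
    sxx = sum((m - mean_m) ** 2 for m in osm)
    sxy = sum((m - mean_m) * (r - mean_r) for m, r in zip(osm, osr))
    slope = sxy / sxx
    intercept = mean_r - slope * mean_m
    return osm, osr, slope, intercept


# Synthetic data lying exactly on a normal with mean 10 and sigma 2.
probs = uniform_order_statistic_medians(50)
sample = [NormalDist(10.0, 2.0).inv_cdf(p) for p in probs]
osm, osr, slope, intercept = normal_probplot_points(sample)
print(f"slope={slope:.3f}, intercept={intercept:.3f}")
print("x-axis range:", round(osm[0], 3), "to", round(osm[-1], 3))
```

The values in osm are what the x-axis ticks refer to: theoretical quantiles of the reference distribution, which for a standard normal are z-scores. That is why they read like standard deviations rather than equal-count division points, and why the fitted slope and intercept estimate the sample's scale and location.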