Re: [Numpy-discussion] scipy.stats.qqplot and scipy.stats.probplot axis labeling

Mark Gawron Sat, 11 Jun 2016 11:49:59 -0700

Thanks, Josef.  This is very helpful.  And I will direct this
to one of the other mailing lists, once I read the previous posts.


Regarding your remark:  Maybe I'm having a terminology problem.  It seems to me 
once you do

>> osm = dist.ppf(osm_uniform)

you’re back in the value space for the particular distribution. So this
gives you known probability intervals, but not UNIFORM probability
intervals (the interval between 0 and 1 STD covers a bigger prob interval
than the interval between 1 and 2).  And the idea of a quantile is
that it’s a division point in a UNIFORM division of the probability axis.
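
To make the distinction concrete, here is a minimal sketch using only the
Python standard library. It is not the scipy code: NormalDist.inv_cdf stands
in for dist.ppf on a standard normal, and uniform_order_statistic_medians is
a hypothetical helper that assumes Filliben's estimate is what the private
_calc_uniform_order_statistic_medians computes.

```python
from statistics import NormalDist


def uniform_order_statistic_medians(n):
    """Hypothetical helper: Filliben's estimate of the medians of the
    order statistics of n uniform(0, 1) draws (assumes n >= 2)."""
    last = 0.5 ** (1.0 / n)
    middle = [(i - 0.3175) / (n + 0.365) for i in range(2, n)]
    return [1.0 - last] + middle + [last]


# Stand-in for the chosen distribution `dist` (standard normal assumed).
std_normal = NormalDist(mu=0.0, sigma=1.0)

n = 9
osm_uniform = uniform_order_statistic_medians(n)    # points on the probability axis
osm = [std_normal.inv_cdf(p) for p in osm_uniform]  # same points in value space

# Equal-width intervals in value space hold unequal probability mass.
mass_0_to_1 = std_normal.cdf(1.0) - std_normal.cdf(0.0)  # roughly 0.34
mass_1_to_2 = std_normal.cdf(2.0) - std_normal.cdf(1.0)  # roughly 0.14

# Division points of a UNIFORM split of the probability axis (quartiles),
# mapped to value space: these are not evenly spaced multiples of sigma.
quartile_probs = [0.25, 0.5, 0.75]
quartile_values = [std_normal.inv_cdf(p) for p in quartile_probs]

print("theoretical quantiles (value space):", [round(v, 3) for v in osm])
print("P(0 < Z < 1) =", round(mass_0_to_1, 4))
print("P(1 < Z < 2) =", round(mass_1_to_2, 4))
print("quartile division points:", [round(v, 3) for v in quartile_values])
```

The values in osm are what end up on the x-axis; the two masses show why
whole-number ticks on that axis do not carve the population into groups of
equal size.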

Mark
On Jun 11, 2016, at 10:03 AM, josef.p...@gmail.com wrote:

> 
> 
> On Sat, Jun 11, 2016 at 8:53 AM, Ralf Gommers <ralf.gomm...@gmail.com> wrote:
> Hi Mark,
> 
> Note that the scipy-dev or scipy-user mailing list would have been more 
> appropriate for this question. 
> 
> 
> On Fri, Jun 10, 2016 at 9:06 AM, Mark Gawron <gaw...@mail.sdsu.edu> wrote:
> 
> 
> The scipy.stats.qqplot and scipy.stats.probplot functions plot expected 
> values versus actual data values for visualization of fit to a distribution.  
> First a one-D array of expected percentiles is generated for a sample of 
> size N; then that is passed to dist.ppf, the percent point function for the 
> chosen distribution, to return an array of expected values.  The visualized 
> data points are pairs of expected and actual values, and a linear regression 
> is done on these to produce the line that data points from this distribution 
> should lie on.
> 
> Where x is the input data array and dist the chosen distribution we have:
> 
>> osr = np.sort(x)
>> osm_uniform = _calc_uniform_order_statistic_medians(len(x))
>> osm = dist.ppf(osm_uniform)
>> slope, intercept, r, prob, sterrest = stats.linregress(osm, osr)
> 
> My question concerns the plot display.  
> 
>> ax.plot(osm, osr, 'bo', osm, slope*osm + intercept, 'r-')
> 
> 
> The x-axis of the resulting plot is labeled quantiles, but the xticks and 
> xticklabels produced by qqplot and probplot do not seem correct for 
> their intended interpretations.  First, the numbers on the x-axis do not 
> represent quantiles; the intervals between them do not in general contain 
> equal numbers of points.  For a normal distribution with sigma=1, they 
> represent standard deviations.  Changing the label on the x-axis does not 
> seem like a very good solution, because the interpretation of the values on 
> the x-axis will be different for different distributions.  Rather the right 
> solution seems to be to actually show quantiles on the x-axis. The numbers on 
> the x-axis can stay as they are, representing quantile indexes, but they need 
> to be spaced so as to show the actual division points that carve the 
> population up into  groups of the same size.  This can be done in something 
> like the following way. 
> 
> The ticks are correct I think, but they're theoretical quantiles and not 
> sample quantiles. This was discussed in [1] and is consistent with R [2] and 
> statsmodels [3]. I see that we just forgot to add "theoretical" to the x-axis 
> label (mea culpa). Does adding that resolve your concern?
> 
> [1] https://github.com/scipy/scipy/issues/1821
> [2] http://data.library.virginia.edu/understanding-q-q-plots/
> [3] 
> http://statsmodels.sourceforge.net/devel/generated/statsmodels.graphics.gofplots.qqplot.html?highlight=qqplot#statsmodels.graphics.gofplots.qqplot
> 
> Ralf
> 
> 
> as related link 
> http://phobson.github.io/mpl-probscale/tutorial/closer_look_at_viz.html
> 
> Paul Hobson has done a lot of work for getting different probability scales 
> attached to pp-plots or generalized versions of probability plots. I think 
> qqplots are less ambiguous because they are on the original or standardized 
> scale.
> 
> I haven't worked my way through the various interpretations of the probability 
> axis yet because I find it "not obvious". It might be easier for fields that 
> have a tradition of using probability papers.
> 
> It's planned to be added to the statsmodels probability plots so that there 
> will be a large choice of axis labels and scales.
> 
> Josef
>  
>  
> 
> 
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
> 
