Re: [Numpy-discussion] scipy.stats.qqplot and scipy.stats.probplot axis labeling

josef . pktd Sat, 11 Jun 2016 10:03:51 -0700

On Sat, Jun 11, 2016 at 8:53 AM, Ralf Gommers <ralf.gomm...@gmail.com>
wrote:


> Hi Mark,
>
> Note that the scipy-dev or scipy-user mailing list would have been more
> appropriate for this question.
>
>
> On Fri, Jun 10, 2016 at 9:06 AM, Mark Gawron <gaw...@mail.sdsu.edu> wrote:
>
>>
>>
>> The scipy.stats.qqplot and scipy.stats.probplot functions plot expected
>> values against actual data values to visualize how well the data fit a
>> distribution. First, a 1-D array of expected percentiles is generated for
>> a sample of size N; this array is then passed to dist.ppf, the percent point
>> function of the chosen distribution, which returns an array of expected
>> values. The plotted points are pairs of expected and actual values, and a
>> linear regression on these pairs produces the line that data from this
>> distribution should lie on.
>>
>> With x the input data array and dist the chosen distribution, we have:
>>
>> osr = np.sort(x)
>> osm_uniform = _calc_uniform_order_statistic_medians(len(x))
>> osm = dist.ppf(osm_uniform)
>> slope, intercept, r, prob, sterrest = stats.linregress(osm, osr)
>>
>>
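The four quoted lines can be reproduced with public calls only. Below is a minimal sketch; uniform_order_statistic_medians and probplot_data are hypothetical names, and the first one assumes that scipy's private _calc_uniform_order_statistic_medians uses Filliben's estimate (the estimate the probplot documentation names). The result is cross-checked against scipy.stats.probplot, which returns (osm, osr), (slope, intercept, r) when no plot object is passed.

```python
import numpy as np
from scipy import stats


def uniform_order_statistic_medians(n):
    """Filliben's estimate of the medians of the n uniform order statistics.

    Hypothetical stand-in for scipy's private helper
    _calc_uniform_order_statistic_medians (assumed to use this formula).
    """
    v = np.empty(n, dtype=np.float64)
    v[-1] = 0.5 ** (1.0 / n)      # median of the largest order statistic
    v[0] = 1.0 - v[-1]            # smallest one, by symmetry
    i = np.arange(2, n)           # interior ranks 2 .. n-1 (1-based)
    v[1:-1] = (i - 0.3175) / (n + 0.365)
    return v


def probplot_data(x, dist=stats.norm):
    """Hypothetical helper: build (osm, osr) and the fitted line as in the quoted code."""
    osr = np.sort(x)                                          # ordered responses
    osm = dist.ppf(uniform_order_statistic_medians(len(x)))   # theoretical quantiles
    fit = stats.linregress(osm, osr)                          # least-squares line
    return osm, osr, fit.slope, fit.intercept, fit.rvalue


rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=2.0, size=50)
osm, osr, slope, intercept, r = probplot_data(x)

# Cross-check against the public function (no plot object passed).
(ref_osm, ref_osr), (ref_slope, ref_intercept, ref_r) = stats.probplot(x, dist="norm")
print(np.allclose(osm, ref_osm), np.isclose(slope, ref_slope))
```

For a normal sample, the fitted slope and intercept estimate the standard deviation and the mean, which is why the regression line doubles as a quick check on the fit.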
>> My question concerns the plot display.
>>
>> ax.plot(osm, osr, 'bo', osm, slope*osm + intercept, 'r-')
>>
>>
>> The x-axis of the resulting plot is labeled quantiles, but the xticks and
>> xticklabels produced by qqplot and probplot do not seem correct
>> for their intended interpretation. First, the numbers on the x-axis do
>> not represent quantiles; the intervals between them do not in general
>> contain equal numbers of points. For a normal distribution with sigma=1,
>> they represent standard deviations. Changing the label on the x-axis does
>> not seem like a very good solution, because the interpretation of the
>> values on the x-axis will differ from one distribution to another. Rather,
>> the right solution seems to be to actually show quantiles on the x-axis.
>> The numbers on the x-axis can stay as they are, representing quantile
>> indexes, but they need to be spaced so as to show the actual division
>> points that carve the population up into groups of the same size. This
>> can be done in something like the following way.
>>
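The proposal can be sketched as follows. This is not Mark's code, which did not survive the quoting; share_between_ticks, quantile_ticks and apply_quantile_ticks are hypothetical helpers, and the sketch assumes a scipy.stats distribution object (norm by default) plus matplotlib for display. The first call illustrates the complaint: equally spaced ticks on the standard normal scale enclose unequal shares of the population. quantile_ticks places ticks at dist.ppf(p) and labels them with p, so equally spaced probabilities mark equal-sized groups.

```python
import numpy as np
from scipy import stats


def share_between_ticks(ticks, dist=stats.norm):
    """Fraction of the population that falls between consecutive x-axis ticks."""
    return np.diff(dist.cdf(np.asarray(ticks, dtype=float)))


def quantile_ticks(probs, dist=stats.norm):
    """Hypothetical helper: tick positions dist.ppf(p), labeled with p.

    Equally spaced probabilities give ticks that split the population
    into groups of equal size.
    """
    probs = np.asarray(probs, dtype=float)
    positions = dist.ppf(probs)
    labels = [f"{p:g}" for p in probs]
    return positions, labels


def apply_quantile_ticks(ax, probs, dist=stats.norm):
    """Hypothetical helper: set the ticks on a matplotlib Axes."""
    positions, labels = quantile_ticks(probs, dist)
    ax.set_xticks(positions)
    ax.set_xticklabels(labels)


# Equally spaced ticks on the standard normal scale hold unequal shares:
print(share_between_ticks([-2, -1, 0, 1, 2]))   # approximately 0.136, 0.341, 0.341, 0.136

# Ticks at the deciles split the population into ten equal shares:
pos, labels = quantile_ticks(np.arange(1, 10) / 10)
```

With matplotlib, one could pass an Axes to scipy.stats.probplot(x, plot=ax) and then call apply_quantile_ticks(ax, [0.1, 0.25, 0.5, 0.75, 0.9]); the points stay where probplot put them, and only the tick positions and labels change.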
>
> The ticks are correct, I think, but they're theoretical quantiles, not
> sample quantiles. This was discussed in [1] and is consistent with R [2]
> and statsmodels [3]. I see that we just forgot to add "theoretical" to the
> x-axis label (mea culpa). Does adding that resolve your concern?
>
> [1] https://github.com/scipy/scipy/issues/1821
> [2] http://data.library.virginia.edu/understanding-q-q-plots/
> [3] http://statsmodels.sourceforge.net/devel/generated/statsmodels.graphics.gofplots.qqplot.html?highlight=qqplot#statsmodels.graphics.gofplots.qqplot
>
> Ralf
>
>
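Ralf's reading can be checked numerically: each x-coordinate returned by probplot is a quantile of the reference distribution, so passing it back through that distribution's CDF recovers the probability assigned to the corresponding rank. Below is a minimal sketch with an exponential reference distribution (the sample and seed are arbitrary). It also shows that the x values live on that distribution's own scale rather than in standard-deviation units.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.exponential(scale=3.0, size=40)   # arbitrary illustrative sample

# x-coordinates: theoretical quantiles of the standard exponential;
# y-coordinates: the sorted sample, i.e. the sample quantiles.
(osm, osr), (slope, intercept, r) = stats.probplot(x, dist="expon")

# Map the theoretical quantiles back to probabilities.
probs = stats.expon.cdf(osm)

print(osm.min() > 0)          # exponential quantiles are positive here
print(probs[0], probs[-1])    # smallest and largest plotting positions
```

Labeling this axis "Theoretical quantiles" therefore describes the numbers accurately; Mark's proposal instead relabels the same positions by their probabilities.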
As a related link:
http://phobson.github.io/mpl-probscale/tutorial/closer_look_at_viz.html

Paul Hobson has done a lot of work on attaching different probability
scales to pp-plots or generalized versions of probability plots. I
think qqplots are less ambiguous because they are on the original or
standardized scale.
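To make the scale distinction concrete, here is a sketch computing both sets of coordinates for one sample. Assumptions: the reference distribution is a two-parameter scipy.stats distribution fitted with its fit method, and the plotting positions use the simple (i - 0.5)/n rule rather than probplot's Filliben estimate; qq_and_pp_coordinates is a hypothetical helper. The Q-Q pair stays on the standardized data scale, while both P-P coordinates are probabilities in [0, 1], which is where the choice of probability axis comes in.

```python
import numpy as np
from scipy import stats


def qq_and_pp_coordinates(x, dist=stats.norm):
    """Hypothetical helper: Q-Q and P-P plot coordinates for sample x against dist.

    Assumes a two-parameter (loc, scale) distribution and the (i - 0.5) / n
    plotting positions, not probplot's Filliben estimate.
    """
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    p = (np.arange(1, n + 1) - 0.5) / n       # plotting positions in (0, 1)
    loc, scale = dist.fit(x)                  # fitted location and scale
    z = (x - loc) / scale                     # standardized sample
    qq = (dist.ppf(p), z)                     # quantile scale: theoretical vs sample
    pp = (p, dist.cdf(z))                     # probability scale: both in [0, 1]
    return qq, pp


rng = np.random.default_rng(2)
sample = rng.normal(10.0, 3.0, size=30)
(qq_x, qq_y), (pp_x, pp_y) = qq_and_pp_coordinates(sample)
```

The Q-Q coordinates can be read directly in (standardized) data units; the P-P coordinates need a probability axis to be interpreted, which is the ambiguity discussed here.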

I haven't worked my way through the various interpretations of probability
axes yet because I find them "not obvious". They might be easier for fields
that have a tradition of using probability paper.

The plan is to add this to the statsmodels probability plots, so that there
will be a large choice of axis labels and scales.

Josef


> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion