Re: [Numpy-discussion] scipy.stats.qqplot and scipy.stats.probplot axis labeling

josef . pktd Sat, 11 Jun 2016 12:24:29 -0700

On Sat, Jun 11, 2016 at 2:49 PM, Mark Gawron <gaw...@mail.sdsu.edu> wrote:


> Thanks, Josef.  This is very helpful.  And I will direct this
> to one of the other mailing lists, once I read the previous posts.
>
> Regarding your remark:  Maybe I’m having a terminology problem.  It seems
> to me once you do
>
> osm = dist.ppf(osm_uniform)
>
> you’re back in the value space for the particular distribution. So this
> gives you known probability intervals, but not UNIFORM probability
> intervals (the interval between 0 and 1 STD covers a bigger prob interval
> than the interval between 1 and 2).  And the idea of a quantile is
> that it’s a division point in a UNIFORM division of the probability axis.
>


Yes and no. Quantiles, i.e. what you get from ppf, are in units of the random
variable. So they are on the scale of the random variable, not on a
probability scale. The axis labels are in units of the random variable.
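
A small sketch of this point, not scipy code: it assumes a standard normal
and uses the standard library's statistics.NormalDist.inv_cdf as a stand-in
for dist.ppf so that it runs without dependencies. Equally spaced
probabilities go in, and what comes out are values of the random variable
that are not equally spaced.

```python
from statistics import NormalDist

# Assumption: a standard normal stands in for the chosen distribution.
std_normal = NormalDist(mu=0.0, sigma=1.0)

# Equally spaced probabilities (uniform on the probability scale).
probs = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]

# inv_cdf plays the role of dist.ppf: probabilities in,
# values of the random variable out.
quantiles = [std_normal.inv_cdf(p) for p in probs]

# Gaps between consecutive quantiles, in units of the random variable.
gaps = [b - a for a, b in zip(quantiles, quantiles[1:])]

for p, q in zip(probs, quantiles):
    print(f"p = {p:.1f}  ->  quantile = {q:+.4f}")
```

The gaps are widest in the tails and narrowest around the median, which is
why the ticks on the quantile axis do not mark equal-probability intervals.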

pp-plots have probabilities on the axes and are uniformly scaled in
probabilities, but non-uniform in the values of the random variable.
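
To make the difference concrete, here is a rough sketch of the coordinates
each kind of plot would use. The sample is made up, the plotting positions
(i - 0.5) / n are a simple assumed choice (not necessarily what probplot
uses), and statistics.NormalDist again stands in for the scipy distribution.

```python
from statistics import NormalDist

# Assumption: standard normal as the reference distribution.
dist = NormalDist(0.0, 1.0)

# A small made-up sample, sorted (the order statistics).
sample = sorted([-1.3, 0.2, 0.9, -0.4, 1.7, -0.1, 0.5])
n = len(sample)

# Plotting positions (i - 0.5) / n: an assumed, simple choice.
probs = [(i - 0.5) / n for i in range(1, n + 1)]

# qq-plot: both coordinates are in units of the random variable.
qq_points = [(dist.inv_cdf(p), x) for p, x in zip(probs, sample)]

# pp-plot: both coordinates are probabilities in [0, 1].
pp_points = [(p, dist.cdf(x)) for p, x in zip(probs, sample)]
```

The x-coordinates of pp_points are equally spaced, while those of qq_points
are not.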

The part that is difficult to follow is when the plot is drawn uniformly in
one scale, but the axis is labeled non-uniformly in the other scale. That's
what Paul's probscale does and what you have in mind, AFAIU.
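
A minimal sketch of that idea, under the assumption of a standard normal
and not taken from probscale's implementation: the axis stays linear in the
random variable, but the ticks are placed at the quantiles of chosen
probabilities and labeled with those probabilities.

```python
from statistics import NormalDist

# Assumption: standard normal as the reference distribution.
dist = NormalDist(0.0, 1.0)

# Probabilities we want printed on the axis.
tick_probs = [0.01, 0.10, 0.25, 0.50, 0.75, 0.90, 0.99]

# Where each tick sits on an axis that is linear in the random variable.
tick_positions = [dist.inv_cdf(p) for p in tick_probs]

# The text shown at each tick: a probability, not a value of the variable.
tick_labels = [f"{100 * p:g}%" for p in tick_probs]

for pos, label in zip(tick_positions, tick_labels):
    print(f"{label:>4} at x = {pos:+.3f}")
```

With matplotlib, positions and labels like these could be passed to
ax.set_xticks and ax.set_xticklabels on an existing quantile plot.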

Josef


>
> Mark
>
> On Jun 11, 2016, at 10:03 AM, josef.p...@gmail.com wrote:
>
>
>
> On Sat, Jun 11, 2016 at 8:53 AM, Ralf Gommers <ralf.gomm...@gmail.com>
> wrote:
>
>> Hi Mark,
>>
>> Note that the scipy-dev or scipy-user mailing list would have been more
>> appropriate for this question.
>>
>>
>> On Fri, Jun 10, 2016 at 9:06 AM, Mark Gawron <gaw...@mail.sdsu.edu>
>> wrote:
>>
>>>
>>>
>>> The scipy.stats.qqplot and scipy.stats.probplot functions plot expected
>>> values versus actual data values for visualization of fit to a
>>> distribution.  First a 1-D array of expected percentiles is generated for
>>> a sample of size N; then that is passed to dist.ppf, the percent point
>>> function for the chosen distribution, to return an array of expected
>>> values.  The visualized data points are pairs of expected and actual
>>> values, and a linear regression is done on these to produce the line that
>>> data points from this distribution should lie on.
>>>
>>> Where x is the input data array and dist the chosen distribution we have:
>>>
>>> osr = np.sort(x)
>>> osm_uniform = _calc_uniform_order_statistic_medians(len(x))
>>> osm = dist.ppf(osm_uniform)
>>> slope, intercept, r, prob, sterrest = stats.linregress(osm, osr)
>>>
>>>
>>> My question concerns the plot display.
>>>
>>> ax.plot(osm, osr, 'bo', osm, slope*osm + intercept, 'r-')
>>>
>>>
>>> The x-axis of the resulting plot is labeled quantiles, but the xticks
>>> and xticklabels produced by qqplot and probplot do not seem
>>> correct for their intended interpretations.  First, the numbers on the
>>> x-axis do not represent quantiles; the intervals between them do not in
>>> general contain equal numbers of points.  For a normal distribution with
>>> sigma=1, they represent standard deviations.  Changing the label on the
>>> x-axis does not seem like a very good solution, because the interpretation
>>> of the values on the x-axis will be different for different distributions.
>>> Rather the right solution seems to be to actually show quantiles on the
>>> x-axis. The numbers on the x-axis can stay as they are, representing
>>> quantile indexes, but they need to be spaced so as to show the actual
>>> division points that carve the population up into groups of the same
>>> size.  This can be done in something like the following way.
>>>
>>
>> The ticks are correct I think, but they're theoretical quantiles and not
>> sample quantiles. This was discussed in [1] and is consistent with R [2]
>> and statsmodels [3]. I see that we just forgot to add "theoretical" to the
>> x-axis label (mea culpa). Does adding that resolve your concern?
>>
>> [1] https://github.com/scipy/scipy/issues/1821
>> [2] http://data.library.virginia.edu/understanding-q-q-plots/
>> [3]
>> http://statsmodels.sourceforge.net/devel/generated/statsmodels.graphics.gofplots.qqplot.html?highlight=qqplot#statsmodels.graphics.gofplots.qqplot
>>
>> Ralf
>>
>>
> As a related link:
> http://phobson.github.io/mpl-probscale/tutorial/closer_look_at_viz.html
>
> Paul Hobson has done a lot of work on getting different probability
> scales attached to pp-plots or generalized versions of probability plots. I
> think qqplots are less ambiguous because they are on the original or
> standardized scale.
>
> I haven't worked my way through the various interpretations of probability
> axes yet because I find it "not obvious". It might be easier for fields
> that have a tradition of using probability papers.
>
> It's planned to be added to the statsmodels probability plots so that
> there will be a large choice of axis labels and scales.
>
> Josef
>
>
>>
>>
>>
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion@scipy.org
>> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>>
>>

