Re: [Numpy-discussion] scipy.stats.qqplot and scipy.stats.probplot axis labeling

Mark Gawron Sat, 11 Jun 2016 12:33:07 -0700

Ok,

Our messages crossed.  I understand now.


Thanks.

Mark
On Jun 11, 2016, at 12:24 PM, josef.p...@gmail.com wrote:

> 
> 
> On Sat, Jun 11, 2016 at 2:49 PM, Mark Gawron <gaw...@mail.sdsu.edu> wrote:
> Thanks, Josef.  This is very helpful.  And I will direct this
> to one of the other mailing lists, once I read the previous posts.
> 
> Regarding your remark:  Maybe I'm having a terminology problem.  It seems to 
> me that once you do
> 
>>> osm = dist.ppf(osm_uniform)
> 
> you’re back in the value space for the particular distribution. So this
> gives you known probability intervals, but not UNIFORM probability
> intervals (the interval between 0 and 1 STD covers a bigger prob interval
> than the interval between 1 and 2).  And the idea of a quantile is
> that it’s a division point in a UNIFORM division of the probability axis.
> 
> 
> Yes and no. Quantiles, i.e. what you get from ppf, are in units of the random 
> variable, so they are on the scale of the random variable, not on a probability 
> scale. The axis labels are in units of the random variable.
> 
> pp-plots have probabilities on the axes and are uniformly scaled in 
> probabilities but non-uniform in the values of the random variable.
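
The two scales can be compared numerically. The following is only a minimal sketch, assuming the standard normal distribution from scipy.stats; the variable names are illustrative and not taken from the scipy source. It shows that equally spaced probabilities map to unequally spaced values, and that equal-width value intervals hold unequal probability mass.

```python
import numpy as np
from scipy import stats

dist = stats.norm  # assumption: standard normal, chosen only for illustration

# Equally spaced points on the probability scale (what a pp-plot axis shows).
probs = np.linspace(0.1, 0.9, 9)

# ppf maps them to the value scale of the random variable (what a qq-plot
# axis shows). The spacing is no longer uniform: it widens toward the tails.
quantiles = dist.ppf(probs)
gaps = np.diff(quantiles)

# Going the other way, equal-width intervals on the value scale cover
# unequal amounts of probability.
mass_0_to_1 = dist.cdf(1.0) - dist.cdf(0.0)  # about 0.3413
mass_1_to_2 = dist.cdf(2.0) - dist.cdf(1.0)  # about 0.1359

# cdf undoes ppf, so either scale can be recovered from the other.
roundtrip = dist.cdf(quantiles)

print(quantiles)
print(gaps)
print(mass_0_to_1, mass_1_to_2)
```

So the numbers on a qq-plot axis are values of the random variable, and the probabilities they correspond to are recovered with dist.cdf.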
> 
> The difficult part to follow is when the plot is drawn uniformly in one scale, but 
> the axes are labeled non-uniformly in the other scale. That's what Paul's 
> probscale does and what you have in mind, AFAIU.
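
One way to picture such an axis is to place ticks at the theoretical quantiles and label them with the probabilities they correspond to. The sketch below is hypothetical: probability_ticks is a helper made up for illustration, not part of scipy or mpl-probscale, and it only computes the tick positions and labels.

```python
import numpy as np
from scipy import stats

def probability_ticks(dist, probs):
    """Return tick positions on the value scale and probability labels.

    Hypothetical helper for illustration only. `dist` is assumed to be a
    scipy.stats distribution (anything with a ppf method).
    """
    probs = np.asarray(probs, dtype=float)
    positions = dist.ppf(probs)                  # where each tick is drawn
    labels = [f"{100 * p:g}%" for p in probs]    # what each tick is called
    return positions, labels

positions, labels = probability_ticks(stats.norm, [0.01, 0.1, 0.5, 0.9, 0.99])
print(positions)
print(labels)

# With an existing matplotlib Axes object `ax` (not created here), the ticks
# could be applied with:
#     ax.set_xticks(positions)
#     ax.set_xticklabels(labels)
```

The plotted points stay where they are on the value scale; only the labels change, which is why the tick spacing looks non-uniform when read as probabilities.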
> 
> Josef
>  
> 
> Mark
> 
> On Jun 11, 2016, at 10:03 AM, josef.p...@gmail.com wrote:
> 
>> 
>> 
>> On Sat, Jun 11, 2016 at 8:53 AM, Ralf Gommers <ralf.gomm...@gmail.com> wrote:
>> Hi Mark,
>> 
>> Note that the scipy-dev or scipy-user mailing list would have been more 
>> appropriate for this question. 
>> 
>> 
>> On Fri, Jun 10, 2016 at 9:06 AM, Mark Gawron <gaw...@mail.sdsu.edu> wrote:
>> 
>> 
>> The scipy.stats.qqplot and scipy.stats.probplot functions plot expected 
>> values versus actual data values to visualize the fit to a distribution. 
>> First, a 1-D array of expected percentiles is generated for a sample of 
>> size N; then that is passed to dist.ppf, the percent point function for 
>> the chosen distribution, to return an array of expected values.  The 
>> visualized data points are pairs of expected and actual values, and a linear 
>> regression is done on these to produce the line that data points from this 
>> distribution should lie on.
>> 
>> Where x is the input data array and dist the chosen distribution we have:
>> 
>>> osr = np.sort(x)
>>> osm_uniform = _calc_uniform_order_statistic_medians(len(x))
>>> osm = dist.ppf(osm_uniform)
>>> slope, intercept, r, prob, sterrest = stats.linregress(osm, osr)
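
The quoted steps can be reproduced with public functions only. This is a sketch under stated assumptions: uniform_order_statistic_medians is a stand-in for the private scipy helper and uses Filliben's estimate as given in the probplot documentation, and the sample x is synthetic data made up for the example.

```python
import numpy as np
from scipy import stats

def uniform_order_statistic_medians(n):
    """Filliben's estimate of the medians of the uniform order statistics.

    Stand-in written for this example; the private scipy helper is not imported.
    """
    m = np.empty(n)
    m[-1] = 0.5 ** (1.0 / n)
    m[0] = 1.0 - m[-1]
    i = np.arange(2, n)
    m[1:-1] = (i - 0.3175) / (n + 0.365)
    return m

# Synthetic sample, assumed only for illustration.
rng = np.random.default_rng(0)
x = rng.normal(loc=10.0, scale=2.0, size=50)
dist = stats.norm

osr = np.sort(x)                                      # ordered sample values
osm_uniform = uniform_order_statistic_medians(len(x)) # probabilities in (0, 1)
osm = dist.ppf(osm_uniform)                           # theoretical quantiles
slope, intercept, r, prob, sterrest = stats.linregress(osm, osr)

# Cross-check against the public function.
(osm_ref, osr_ref), (slope_ref, intercept_ref, r_ref) = stats.probplot(x, dist="norm")
print(np.allclose(osm, osm_ref), np.allclose(osr, osr_ref))
```

Here osm_uniform lives on the probability scale, while osm and osr are both on the value scale of the random variable, which is what the x-axis ticks show.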
>> 
>> My question concerns the plot display.  
>> 
>>> ax.plot(osm, osr, 'bo', osm, slope*osm + intercept, 'r-')
>> 
>> 
>> The x-axis of the resulting plot is labeled quantiles, but the xticks and 
>> xticklabels produced by qqplot and probplot do not seem correct 
>> for their intended interpretations.  First, the numbers on the x-axis do 
>> not represent quantiles; the intervals between them do not in general 
>> contain equal numbers of points.  For a normal distribution with sigma=1, 
>> they represent standard deviations.  Changing the label on the x-axis does 
>> not seem like a very good solution, because the interpretation of the values 
>> on the x-axis will be different for different distributions.  Rather the 
>> right solution seems to be to actually show quantiles on the x-axis. The 
>> numbers on the x-axis can stay as they are, representing quantile indexes, 
>> but they need to be spaced so as to show the actual division points that 
>> carve the population up into  groups of the same size.  This can be done in 
>> something like the following way. 
>> 
>> The ticks are correct I think, but they're theoretical quantiles and not 
>> sample quantiles. This was discussed in [1] and is consistent with R [2] and 
>> statsmodels [3]. I see that we just forgot to add "theoretical" to the 
>> x-axis label (mea culpa). Does adding that resolve your concern?
>> 
>> [1] https://github.com/scipy/scipy/issues/1821
>> [2] http://data.library.virginia.edu/understanding-q-q-plots/
>> [3] 
>> http://statsmodels.sourceforge.net/devel/generated/statsmodels.graphics.gofplots.qqplot.html?highlight=qqplot#statsmodels.graphics.gofplots.qqplot
>> 
>> Ralf
>> 
>> 
>> As a related link:
>> http://phobson.github.io/mpl-probscale/tutorial/closer_look_at_viz.html
>> 
>> Paul Hobson has done a lot of work on getting different probability scales 
>> attached to pp-plots or generalized versions of probability plots. I think 
>> qqplots are less ambiguous because they are on the original or standardized 
>> scale.
>> 
>> I haven't worked my way through the various interpretations of the probability 
>> axis yet because I find it "not obvious". It might be easier for fields that 
>> have a tradition of using probability papers.
>> 
>> It's planned to be added to the statsmodels probability plots so that there 
>> will be a large choice of axis labels and scales.
>> 
>> Josef
>>  
>>  
>> 
>> 
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion@scipy.org
>> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>> 
>> 
