On 07/23/2014 03:36 AM, LN A-go-go wrote:
>
> with your help.  I have been working the last few days, I am sorry to
> say, unsuccessfully, to calculate the mean (that's easy), split the data
> into sub-groups or secondary means - which are the break values between
> 4 classes.  Create data-sets with incursive values.  I can do it with
> brute force (copy and paste) but need to rise to the pythonic way and
> use a while loop and a nested if-else structure.  My attempts have been
> lame enough that I don't even want to put them here.

A while loop with an if inside is indeed a very plausible solution, so it would be interesting to see your attempts.

> int_list
> [36, 39, 39, 45, 61, 54, 61, 93, 62, 51, 47, 72, 54, 36, 62, 50, 41, 41,
> 40, 62, 62, 58, 57, 54, 49, 43, 47, 50, 45, 41, 54, 57, 57, 55, 62, 51,
> 34, 57, 55, 63, 45, 45, 42, 44, 34, 53, 67, 58, 56, 43, 33]
>>>> int_list.sort()
>>>> int_list
> [33, 34, 34, 36, 36, 39, 39, 40, 41, 41, 41, 42, 43, 43, 44, 45, 45, 45,
> 45, 47, 47, 49, 50, 50, 51, 51, 53, 54, 54, 54, 54, 55, 55, 56, 57, 57,
> 57, 57, 58, 58, 61, 61, 62, 62, 62, 62, 62, 63, 67, 72, 93]
>>>> flo_list = [float(integral) for integral in int_list]

This last line shows that you've started using list comprehensions, which is a good thing, but converting your data to floating point is not a good idea. It is completely unnecessary and (though probably not relevant here) can compromise the accuracy of calculations because of inherent rounding errors. I guess you are doing this to keep sum(int_list)/len(int_list) from being rounded down to an integer. That is a Python 2-specific issue and, personally, I think that as a beginner you should use Python 3, where (among other things) this is not a problem.
If you want to stick to Python 2 for whatever reason, then do:

from __future__ import division

after which dividing two integers with / returns a float, just as in Python 3.

>>> sum(int_list)/len(int_list)
51.31372549019608
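
To make the difference concrete, here is a minimal sketch, assuming Python 3 and the sorted values from your post. The names true_mean, floor_mean, big and lost_precision are only made up for this illustration.

```python
# Sorted sample values from the original post.
int_list = [33, 34, 34, 36, 36, 39, 39, 40, 41, 41, 41, 42, 43, 43, 44, 45, 45, 45,
            45, 47, 47, 49, 50, 50, 51, 51, 53, 54, 54, 54, 54, 55, 55, 56, 57, 57,
            57, 57, 58, 58, 61, 61, 62, 62, 62, 62, 62, 63, 67, 72, 93]

# True division: the default meaning of / in Python 3.
true_mean = sum(int_list) / len(int_list)

# Floor division: what a plain / does with two ints in Python 2.
floor_mean = sum(int_list) // len(int_list)

# Floats carry 53 bits of precision, so a very large integer can change
# when it is converted to float. Plain ints never lose digits.
big = 2**53 + 1
lost_precision = float(big) == float(2**53)

print(true_mean, floor_mean, lost_precision)
```

The integers stay exact all the way through; only the final division produces a float.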

>>>> flo_list
> [33.0, 34.0, 34.0, 36.0, 36.0, 39.0, 39.0, 40.0, 41.0, 41.0, 41.0, 42.0,
> 43.0, 43.0, 44.0, 45.0, 45.0, 45.0, 45.0, 47.0, 47.0, 49.0, 50.0, 50.0,
> 51.0, 51.0, 53.0, 54.0, 54.0, 54.0, 54.0, 55.0, 55.0, 56.0, 57.0, 57.0,
> 57.0, 57.0, 58.0, 58.0, 61.0, 61.0, 62.0, 62.0, 62.0, 62.0, 62.0, 63.0,
> 67.0, 72.0, 93.0]
>>>> sum(flo_list)
> 2617.0
>>>>  totalnum = sum(flo_list)

Stop creating references if you're not going to use them later!
It confuses you and others.

>>>> len(flo_list)
> 51
>>>> mean = sum(flo_list)/len(flo_list)
>>>> mean
> 51.31372549019608

So, you know how to calculate the total mean. For the means of subsamples you apply that same logic to subsamples of the data, which you have to generate first. If you want to avoid going through the list of values several times, however, I cannot think of any simple implementation that does not involve plenty of novel concepts. One fairly simple approach would be a while loop as you suggested but, as said before, for loops are often more elegant in Python. I guess the following code is roughly what you had in mind?

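breakpoints = [your_list_of_breakpoints]  # break values in ascending order
large_value_buffer = []  # holds the one value that ended the previous bin
int_list_iter = iter(int_list) # see comment below
for breakpoint in breakpoints:
    if large_value_buffer and large_value_buffer[0] >= breakpoint:
        continue  # the buffered value lies beyond this bin, so the bin is empty
    sublist = large_value_buffer  # start the bin with the buffered value, if any
    large_value_buffer = []
    for value in int_list_iter:
        if value < breakpoint:
            sublist.append(value)
        else:
            large_value_buffer.append(value)  # belongs to a later bin
            break
    if sublist:
        print(sum(sublist)/len(sublist))
# whatever is left at or above the last breakpoint forms the final class
sublist = large_value_buffer + list(int_list_iter)
if sublist:
    print(sum(sublist)/len(sublist))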

You should know all elements of this small program except iter(int_list). It gives you a one-time iterator, which cannot be reused or reset, for use in the inner for loop. This prevents the inner loop from starting at the beginning of the list every time.
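
A small sketch of that one-time behaviour, assuming Python 3. The data and the names values, values_iter, small and remaining are made up for this illustration.

```python
values = [1, 2, 3, 4, 5]        # made-up data, already sorted
values_iter = iter(values)      # one-time iterator over the list

small = []
for value in values_iter:
    if value >= 3:
        break                   # the 3 has already been taken from the iterator
    small.append(value)

# A second pass over the same iterator resumes after the consumed 3
# instead of starting again at the beginning of the list.
remaining = list(values_iter)

print(small, remaining)
```

Note that the 3 which triggered the break ends up in neither list. That is the kind of gap the large_value_buffer in the program above is there to close.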

Since this is probably too complicated for you to work out by yourself at this stage, I decided to give you the complete code. Make sure you understand what it does, though, and especially think about what the large_value_buffer is doing.

One problem with this code is that it silently skips empty bins. Maybe that's something for you to work on?
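
If you get stuck, here is one possible direction, sketched under a few assumptions: the values are already sorted, the breakpoints are in ascending order, and an empty bin is reported as None. The function name bin_means and the sample numbers are hypothetical. This is a different technique from the loop above: it uses bisect_left from the standard library's bisect module to find where each bin ends.

```python
from bisect import bisect_left

def bin_means(sorted_values, breakpoints):
    """Return one mean per bin, or None where a bin has no values.

    A value belongs to the first bin whose breakpoint is larger than it;
    values at or above the last breakpoint go into one final bin.
    """
    means = []
    start = 0
    # None marks the open-ended final bin after the last breakpoint.
    for breakpoint in list(breakpoints) + [None]:
        if breakpoint is None:
            end = len(sorted_values)
        else:
            # Index of the first value >= breakpoint, searching from start.
            end = bisect_left(sorted_values, breakpoint, start)
        chunk = sorted_values[start:end]
        means.append(sum(chunk) / len(chunk) if chunk else None)
        start = end
    return means

# Made-up example: nothing falls between 10 and 20, or at 30 and above.
result = bin_means([5, 25, 26], [10, 20, 30])
print(result)
```

Returning a list with a None placeholder keeps the position of every bin visible, so the caller can decide how to display the empty ones.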

Best,
Wolfgang

_______________________________________________
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor
