Hi all, I'd like to ask for some hints or advice regarding the usage of numpy.array and especially slicing.
I only recently tried numpy and was impressed by the speedup in some parts of the code, so I suspect that I might be missing other opportunities in this area.

I currently use the following code for a simple visualisation of the search matches within a text. The arrays are generally much larger than in the sample (the text size is typically hundreds of kilobytes up to a few MB), with one index position for each character.

First there is a list of spans (obtained from the regex match objects); the character indices covered by each span should be set to 1:

>>> import numpy
>>> characters_matches = numpy.zeros(10)
>>> matches_spans = numpy.array([[2,4], [5,9]])
>>> for start, stop in matches_spans:
...     characters_matches[start:stop] = 1
...
>>> characters_matches
array([ 0.,  0.,  1.,  1.,  0.,  1.,  1.,  1.,  1.,  0.])

Is there maybe a way to achieve this in a numpy-only way, without the python loop? (I got the impression that the powerful slicing capabilities could make it possible, but I haven't found this kind of solution.)

In the next piece of code, every character position is evaluated together with its "neighbourhood", and a kind of running proportion of the matched text parts is computed (check_distance could in general be up to the order of half the text length, usually less):

>>> check_distance = 1
>>> floating_checks_proportions = []
>>> for i in numpy.arange(len(characters_matches)):
...     lo = i - check_distance
...     if lo < 0:
...         lo = None
...     hi = i + check_distance + 1
...     checked_sublist = characters_matches[lo:hi]
...     proportion = (checked_sublist.sum() / (check_distance * 2 + 1.0))
...     floating_checks_proportions.append(proportion)
...
>>> floating_checks_proportions
[0.0, 0.33333333333333331, 0.66666666666666663, 0.66666666666666663,
 0.66666666666666663, 0.66666666666666663, 1.0, 1.0, 0.66666666666666663,
 0.33333333333333331]

I'd like to ask about possible better approaches, as this doesn't look very elegant to me, and I obviously don't know the implications or possible drawbacks of numpy arrays in some scenarios.

The pattern
for i in range(len(...)):
is usually considered inadequate in python, but what should be used in this case, where the indices are what is primarily needed?

Is there something to be gained or lost by using (x)range or np.arange, as the python loop is (probably?) inevitable anyway?

Is there some more elegant way to check for the "underflowing" lower bound "lo" and replace it with None?

Is it significant which container is used to collect the results of the computation in the python loop, i.e. a python list or a numpy array? (Could matplotlib possibly cooperate better with either container?)

And of course, are there maybe other things which should be done better or differently?

(using Numpy 1.6.2, python 2.7.3, win XP)

Thanks in advance for any hints or suggestions,
regards,
Vlastimil Brom

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
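For reference, here is one possible loop-free sketch of both steps from the post above; it is not necessarily the best or the only approach. The function names mark_spans and running_proportion are hypothetical and introduced only for illustration. The first function assumes a NumPy version in which np.bincount accepts minlength, and that every stop index is at most the array length. The second replaces the per-position slicing with differences of a cumulative sum and clips the window at both ends, which should give the same values as the original loop (the divisor stays 2 * check_distance + 1 even at the edges).

```python
import numpy as np


def mark_spans(length, spans):
    """Return a float array of `length` with 1.0 inside every [start, stop) span.

    Hypothetical helper: counts +1 at each start and -1 at each stop,
    then a cumulative sum tells how many spans cover each position.
    """
    spans = np.asarray(spans, dtype=np.intp).reshape(-1, 2)
    starts = np.bincount(spans[:, 0], minlength=length + 1)
    stops = np.bincount(spans[:, 1], minlength=length + 1)
    coverage = np.cumsum(starts - stops)[:length]
    return (coverage > 0).astype(float)


def running_proportion(flags, check_distance):
    """Proportion of marked positions in a window of +/- check_distance.

    Hypothetical helper: window sums are taken as differences of a
    cumulative sum, with the window bounds clipped to the array.
    """
    flags = np.asarray(flags, dtype=float)
    n = len(flags)
    # csum[k] holds the sum of flags[:k], so flags[lo:hi].sum() == csum[hi] - csum[lo]
    csum = np.concatenate(([0.0], np.cumsum(flags)))
    idx = np.arange(n)
    lo = np.maximum(idx - check_distance, 0)      # clip the "underflowing" lower bound
    hi = np.minimum(idx + check_distance + 1, n)  # clip the upper bound
    return (csum[hi] - csum[lo]) / (2 * check_distance + 1.0)


characters_matches = mark_spans(10, [[2, 4], [5, 9]])
floating_checks_proportions = running_proportion(characters_matches, 1)
print(characters_matches)
print(floating_checks_proportions)
```

The cumulative-sum form does a fixed amount of work per position regardless of check_distance, which may matter when the window approaches half the text length. The result is already a numpy array, which matplotlib plotting functions accept directly, as they do a python list.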