Hi,

A partial answer to your questions:

On Mon, Jul 30, 2012 at 10:33 PM, Vlastimil Brom <vlastimil.b...@gmail.com> wrote:

> Hi all,
> I'd like to ask for some hints or advice regarding the usage of
> numpy.array and especially slicing.
>
> I only recently tried numpy and was impressed by the speedup in some
> parts of the code, hence I suspect that I might be missing some other
> opportunities in this area.
>
> I currently use the following code for a simple visualisation of the
> search matches within the text. The arrays are generally much larger
> than the sample - the text size is generally hundreds of kilobytes up
> to a few MB - with an index position for each character.
> First there is a list of spans (obtained from the regex match objects);
> the respective character indices in between these slices should be set
> to 1:
>
> >>> import numpy
> >>> characters_matches = numpy.zeros(10)
> >>> matches_spans = numpy.array([[2,4], [5,9]])
> >>> for start, stop in matches_spans:
> ...     characters_matches[start:stop] = 1
> ...
> >>> characters_matches
> array([ 0., 0., 1., 1., 0., 1., 1., 1., 1., 0.])
>
> Is there maybe a way to achieve this in a numpy-only way - without the
> python loop?
> (I got the impression the powerful slicing capabilities could make it
> possible, but haven't found this kind of solution.)
>
>
> In the next piece of code all the character positions are evaluated
> with their "neighbourhood" and a kind of running proportions of the
> matched text parts are computed (the check_distance could be
> generally up to the order of half the text length, usually less):
>
> >>>
> >>> check_distance = 1
> >>> floating_checks_proportions = []
> >>> for i in numpy.arange(len(characters_matches)):
> ...     lo = i - check_distance
> ...     if lo < 0:
> ...         lo = None
> ...     hi = i + check_distance + 1
> ...     checked_sublist = characters_matches[lo:hi]
> ...     proportion = (checked_sublist.sum() / (check_distance * 2 + 1.0))
> ...     floating_checks_proportions.append(proportion)
> ...
> >>> floating_checks_proportions
> [0.0, 0.33333333333333331, 0.66666666666666663, 0.66666666666666663,
> 0.66666666666666663, 0.66666666666666663, 1.0, 1.0,
> 0.66666666666666663, 0.33333333333333331]
> >>>
>

Define a function for proportions:

    from numpy import r_
    from numpy.lib.stride_tricks import as_strided as ast

    def proportions(matches, distance=1):
        cd, cd2p1 = distance, 2 * distance + 1
        # pad with cd zeros on each side
        m = r_[[0.] * cd, matches, [0.] * cd]
        # stride of the padded (contiguous) array, not of the input
        s = m.strides[0]
        # create a suitable view: one row per original position,
        # each row being that position's window of 2*cd + 1 items
        m = ast(m, shape=(len(matches), cd2p1), strides=(s, s))
        # average
        return m.sum(1) / float(cd2p1)

and use it like:

    In []: matches
    Out[]: array([ 0., 0., 1., 1., 0., 1., 1., 1., 1., 0.])

    In []: proportions(matches).round(2)
    Out[]: array([ 0. , 0.33, 0.67, 0.67, 0.67, 0.67, 1. , 1. , 0.67, 0.33])

    In []: proportions(matches, 5).round(2)
    Out[]: array([ 0.27, 0.36, 0.45, 0.55, 0.55, 0.55, 0.55, 0.55, 0.45, 0.36])

> I'd like to ask about the possible better approaches, as it doesn't
> look very elegant to me, and I obviously don't know the implications
> or possible drawbacks of numpy arrays in some scenarios.
>
> the pattern
> for i in range(len(...)): is usually considered inadequate in python,
> but what should be used in this case as the indices are primarily
> needed?
> is something to be gained or lost using (x)range or np.arange as the
> python loop is (probably?) inevitable anyway?
>

Here np.arange(.) creates a new array, potentially wasting memory if
it's not otherwise used. IMO there is nothing wrong with looping over
xrange(.) (if you really need to loop ;).

> Is there some more elegant way to check for the "underflowing" lower
> bound "lo" to replace with None?
>
> Is it significant which container is used to collect the results of
> the computation in the python loop - i.e. python list or a numpy
> array?
> (Could possibly matplotlib cooperate better with either container?)
>
> And of course, are there maybe other things, which should be made
> better/differently?
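
On the lower bound and the container: if you do keep the loop, a small sketch along these lines might be enough. The name proportions_loop is just mine for illustration. It clamps the lower bound with max() instead of substituting None (an upper bound past the end is already handled by slicing), and it writes into a preallocated array rather than appending to a list. As far as I know matplotlib accepts either a list or an array, so that choice shouldn't matter for plotting.

```python
import numpy as np

def proportions_loop(matches, distance=1):
    """Loop version of the running proportions (illustrative sketch)."""
    n = len(matches)
    window = 2 * distance + 1.0
    # Preallocate the result instead of growing a Python list.
    result = np.empty(n)
    for i in range(n):
        # max() clamps the lower bound at 0, so no None special case is needed.
        lo = max(i - distance, 0)
        hi = i + distance + 1
        result[i] = matches[lo:hi].sum() / window
    return result

matches = np.array([0., 0., 1., 1., 0., 1., 1., 1., 1., 0.])
print(proportions_loop(matches).round(2))
```

This should give the same numbers as your original loop; it is only tidier, not faster.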
>
> (using Numpy 1.6.2, python 2.7.3, win XP)
>

My 2 cents,
-eat

> Thanks in advance for any hints or suggestions,
> regards,
> Vlastimil Brom
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
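
P.S. Regarding the first question (setting the spans without a Python loop), one possible approach, offered as a sketch rather than a tested recipe for your data: count +1 at every start and -1 at every stop, then take the running sum. The helper name mark_spans is hypothetical, and I am assuming half-open [start, stop) spans that lie within the text, which is what regex match spans should give you.

```python
import numpy as np

def mark_spans(length, spans):
    """Return a 0/1 float array with 1 inside every half-open [start, stop) span."""
    spans = np.asarray(spans, dtype=int).reshape(-1, 2)
    # Count span starts and stops per position; one extra slot allows stop == length.
    starts = np.bincount(spans[:, 0], minlength=length + 1)
    stops = np.bincount(spans[:, 1], minlength=length + 1)
    # The running sum of (starts - stops) is the number of spans covering each position.
    covered = np.cumsum(starts - stops)[:length]
    return (covered > 0).astype(float)

characters_matches = mark_spans(10, [[2, 4], [5, 9]])
print(characters_matches)
```

Whether this beats the plain loop probably depends on how many spans there are relative to the text length, so it would be worth timing on real input.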