On Jul 17, 2010, at 7:19 PM, "Richard D. Moores" <rdmoo...@gmail.com> wrote:
> I don't fully understand your "you always have two [chunks] in memory
> at once". Does that take care of the overlap I think I need? I don't
> want you or anyone to give me the the whole answer, but could you
> clarify a bit about keeping 2 chunks in memory? Let's say the chunks,
> in order, and without overlap, are A, B, C, D, E, ... So I'd have AB
> to search, then BC, then CD, then DE, ... ? But then what if I find an
> instance of, say, my father's birthdate in B when searching AB. Then
> I'd find the same instance again in BC. How do I prevent that from
> counting as 2 instances?
Imagine you have a series of Buildings in a block, and you are checking to see
if a series of 5 windows in a row is on ( even between buildings). You start at
one end of the block and you check the first five windows. If they are all on,
you have found a match. If not, you move forward one window. Now repeat. After
a few repetitions you will end up at the border between two buildings, and
after a few more you will be looking exclusively at the 2nd building. Then
after a few more you will be looking at the 3rd building and so on.
But are you ever looking at more than one building at a time? Yea but only
adjacent buildings and only for 4 steps. Once, when you have 4 windows from the
first, 1 from the second, again for 3 windows from the first, then 2 and then 1.
The way this translates into an algorithm is that instead of discarding your
old set of data as soon as you need to load another set, you hold onto it long
enough to get all the way engaged with that second set, and from there you can
free up the previous block's memory and move forward. And since you are using a
"sliding window" approach you shouldn't ever find one result twice. Although
note that if you overlap your searches as I said earlier, you can run into some
matching situations that are interesting. Say you search for 55 and your string
is 1827355513. You'll find 55 twice even though there are only 3 fives in the
search string. Just something to keep in mind.
Hope that helps!
-luke
_______________________________________________
Tutor maillist - Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor