Re: How can I speed up a script that iterates over a large range (600 billion)?

2011-06-23 Thread Slaunger
As a general note concerning the use of Python on Project Euler, and
the one minute guideline.

For problems 1-100, each problem is easily solved in less than 1
minute of processing time *if* the algorithms and the math are done
"right" and with thought.

My Project Euler scripts solve the first 100 problems with an average
of 0.91 secs/problem on a 4-year-old standard business laptop running
32-bit Windows XP. Of these, one problem takes 18 secs.

For some of the later problems it certainly becomes very difficult to
solve each problem within 1 minute if you use Python on an ordinary
processing platform. There you may need to resort to a compiled
language like C or C++, or to dedicated mathematical software
packages, which implement complex mathematical functions using highly
efficient native libraries.

Kim
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Skipping bytes while reading a binary file?

2009-02-06 Thread Slaunger

You might also want to have a look at a numpy memmap viewed as a
recarray.

from numpy import dtype, memmap, recarray
# Define your record in the file, 4 bytes for the real value,
# and 4 bytes for the imaginary (assuming Little Endian repr)
descriptor = dtype([("r", "<f4"), ("i", "<f4")])
# Map the file as an array of such records and view it as a recarray,
# so the fields can be accessed by name without reading the whole file
data = memmap("yourfile.bin", dtype=descriptor, mode="r").view(recarray)
# data.r and data.i now give the real and imaginary parts
--
http://mail.python.org/mailman/listinfo/python-list


How to instantiate in a lazy way?

2008-12-01 Thread Slaunger
Hi comp.lang.python,

I am a novice Python programmer working on a project where I deal with
large binary files (>50 GB each)
consisting of a series of variable sized data packets.

Each packet consists of a small header with size and other information
and a much larger payload containing the actual data.

Using Python 2.5, struct and numpy arrays I am capable of parsing such
a file quite efficiently into Header and Payload objects which I then
manipulate in various ways.

The most time consuming part of the parsing is the conversion of a
proprietary form of 32 bit floats into the IEEE floats used internally
in Python in the payloads.

For many use cases I am actually not interested in parsing the payload
right when I pass through it, as I may want to use the attributes of
the header to select the roughly 1 in 1000 payloads whose data I
actually have to look into, and only do the costly float conversion
for those.

I would therefore like to have two variants of a Payload class. One
which is instantiated right away with the payload parsed up into
float arrays available as instance attributes, and another variant,
where the Payload object at the time of instantiation only contains a
pointer to the place (f.tell()) in the file where the payload begins.
Only when the not-yet-existing attribute of such a lazily parsed
payload is actually accessed should the data be read, parsed up and
the attribute created.

In pseudocode:

class PayloadInstant(object):
    """
    This is a normal Payload, where the data are parsed up when
    instantiated.
    """

    @classmethod
    def read_from_file(cls, f, size):
        """
        Returns a PayloadInstant instance with float data parsed up
        and immediately accessible in the data attribute.
        Instantiation is slow, but after instantiation access is fast.
        """

    def __init__(self, the_data):
        self.data = the_data

class PayloadOnDemand(object):
    """
    Behaves as a PayloadInstant object, but instantiation is faster
    as only the position of the payload in the file is stored
    initially in the object.
    Only when accessing the initially non-existing data attribute
    are the data actually read and the attribute created and bound to
    the instance.
    That first access will actually be a little slower than in
    PayloadInstant, as the correct file position has to be sought out
    first.
    On later calls the object has as efficient attribute access as
    PayloadInstant.
    """

    @classmethod
    def read_from_file(cls, f, size):
        pos = f.tell()
        f.seek(pos + size) # Skip to end of payload
        return cls(pos)

    # I probably need some __getattr__ or __getattribute__ magic
    # here...??

    def __init__(self, a_file_position):
        self.file_position = a_file_position

My question is: is this a Pythonic way to do it? If so, I would like a
hint as to how to make the hook inside the PayloadOnDemand class, such
that the lazy creation of the attribute is completely hidden from the
outside.

I guess I could also just make a single class, and let an OnDemand
attribute decide how it should behave.

My real application is considerably more complicated than this, but I
think the example captures the problem in a nutshell.

-- Slaunger
--
http://mail.python.org/mailman/listinfo/python-list


Re: How to instantiate in a lazy way?

2008-12-01 Thread Slaunger
Slaunger wrote:
>
> class PayloadOnDemand(object):
>     """
>     Behaves as a PayloadInstant object, but instantiation is faster
>     as only the position of the payload in the file is stored
> initially in the object.
>     Only when acessing the initially non-existing data attribute
>     are the data actually read and the attribure created and bound to
> the instance.
>     This will actually be a little slower than in PayloadInstant as
> the correct file position
>     has to be seeked out first.
>     On later calls the object has as efficient attribute access as
> PayloadInstant
>     """
>
>     @classmethod
>     def read_from_file(cls, f, size):
>         pos = f.tell()
>         f.seek(pos + size) #Skip to end of payload
>         return cls(pos)

Extend with ref to file instead:
  return cls(f, pos)
>
>     # I probably need some __getattr__ or __getattribute__ magic
> # there...??

To answer my own rhetorical question, I guess I should do something
like this:

    def __getattr__(self, attr_name):
        """
        Only called if attr_name is not in the __dict__ for the
        instance
        """
        if attr_name == 'data':
            self.__dict__[attr_name] = read_data(self.f,
                                                 self.file_position)
            return self.__dict__[attr_name]

>
>     def __init__(self, a_file_position):
>         self.file_position = a_file_position
>
and then I need to also store a reference to the file in the
constructor...

    def __init__(self, a_file, a_file_position):
        self.f = a_file
        self.file_position = a_file_position

Have I understood correctly how to do it the on-demand way?

-- Slaunger
--
http://mail.python.org/mailman/listinfo/python-list


Re: Emacs vs. Eclipse vs. Vim

2008-12-01 Thread Slaunger
On 29 Nov., 21:44, Josh <[EMAIL PROTECTED]> wrote:
> If you were a beginning programmer and willing to make an investment in
> steep learning curve for best returns down the road, which would you pick?
>
> I know this topic has been smashed around a bit already, but 'learning
> curve' always seems to be an arguement. If you feel that one is easier
> or harder than the others to learn feel free to tell, but let's not make
> that the deciding factor. Which one will be most empowering down the
> road as a development tool?
>
> Thanks in advance,
>
> JR

Many have written that they have no experience with using Eclipse.

Well, I have a little, and I just want to add my experience.

I am a novice Python programmer and I use Eclipse with the PyDev and
Subclipse extensions, which give me a Python environment and
integration with Subversion, the version control system I use. My OS
is Windows XP and Server 2003.

Other people are working on the same project using either Eclipse on a
Linux box or another editor of choice.

Eclipse works very well for me. The facilities I like are:
* A handy object browser which lets me jump into the part of the code
  I am interested in (I work with several, quite large modules).
* Autocompletion: When I write "." it gives me a suggestion of the
  methods/attributes available, and the doc string (if available) is
  shown as a tool tip. It can autogenerate a generic signature for a
  method call with the argument names prefilled. Very handy, as I
  quite often forget the order of arguments.
* Unit tests: It is quite convenient to write and run unit tests in
  the environment (unittest run targets).
* Refactoring: Intelligent rename, for instance, is handy for renaming
  methods and attributes across modules.
* Debugger: A debugger environment which works well for me, with
  watch lists, step-into and step-over.
* Colour-coded syntax.
* Auto-indentation.
* Macros for block commenting, indenting and unindenting.
* Nice integrated diff tool which integrates well with Subversion.
* And tons of other things which I have not explored yet, like a
  coverage run target for instance.
* Some syntax checking.

I am personally satisfied with the startup time and overall
responsiveness of Eclipse, but users of Emacs/Vim may have a different
bar for responsiveness than I do.

Sometimes I experience problems with cascading, inexplicable errors
occurring in the IDE when running unit test suites. This is annoying,
and they do not occur when I run the tests stand-alone on the command
line.

This IDE works well for me. I do not have an opinion about how it
compares with Emacs and Vim; I just wanted to give my opinion on
Eclipse and Python as this had not been discussed so much.

On the prestige level it is certainly not considered as "cool" to use
Eclipse as Emacs/Vim where I am working. I often hear the opinion that
you are not a proper coder/hacker if you do not master any of these
classic editors.

I also think it depends much upon your coding style. Personally I
spend much more time thinking about "how" to implement this-and-that
than actually coding. That may reflect that I am still a novice Python
programmer.

-- Slaunger
--
http://mail.python.org/mailman/listinfo/python-list


Re: How to instantiate in a lazy way?

2008-12-01 Thread Slaunger
On 1 Dec., 16:30, Nick Craig-Wood <[EMAIL PROTECTED]> wrote:
>
> I wouldn't use __getattr__ unless you've got lots of attributes to
> overload.  __getattr__ is a recipe for getting yourself into trouble
> in my experience ;-)
>
> Just do it like this...
>
> class PayloadOnDemand(object):
>       def __init__(self, a_file, a_file_position):
>           self._data = None
>           self.f = a_file
>           self.file_position = a_file_position
>
>       @property
>       def data(self):
>           if self._data is None:
>               self._data = self.really_read_the_data()
>           return self._data
>
> then you'll have a .data attribute which when you read it for the
> first time it will populate itself.
>
> If None is a valid value for data then make a sentinel, eg
>
> class PayloadOnDemand(object):
>       sentinel = object()
>
>       def __init__(self, a_file, a_file_position):
>           self._data = self.sentinel
>           self.f = a_file
>           self.file_position = a_file_position
>
>       @property
>       def data(self):
>           if self._data is self.sentinel:
>               self._data = self.really_read_the_data()
>           return self._data
>
> --
> Nick Craig-Wood <[EMAIL PROTECTED]> -- http://www.craig-wood.com/nick

OK, I get it. In my case I have four attributes to create when one of
them is accessed; I do not know if that is a lot of attributes ;-) One
thing I like about __getattr__ is that it is only called that one
single time, when an attempt to read a data attribute fails because
the attribute name is not defined in the __dict__ of the object.

With the property methodology you do the value check on each get,
which does not look as "clean". The property methodology is, I guess,
also a little less arcane for less experienced Python programmers to
understand when re-reading the code.

What kind of trouble are you referring to in __getattr__? Is it
recursive calls to the method on accessing object attributes in that
method itself or other complications?

On a related issue, thank you for showing me how to use @property as a
decorator - I was not aware of that possibility. I just have to
understand how to decorate a setter and deleter method as well, but I
should be able to look that up by myself...
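
For my own notes, this is roughly what I think the non-decorator
spelling looks like for getting a setter and a deleter too (an
untested sketch with made-up names; the @data.setter decorator form
only exists from Python 2.6 onwards):

class Example(object):

    def __init__(self):
        self._data = None

    # Decorator form: read-only property, getter only
    @property
    def data(self):
        return self._data

    # Classic form: property() called directly with get/set/delete
    def _get_value(self):
        return self._value
    def _set_value(self, value):
        self._value = value
    def _del_value(self):
        del self._value
    value = property(_get_value, _set_value, _del_value,
                     "a read/write/deletable property")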

-- Slaunger
--
http://mail.python.org/mailman/listinfo/python-list


Re: How to instantiate in a lazy way?

2008-12-02 Thread Slaunger
On 2 Dec., 11:30, Nick Craig-Wood <[EMAIL PROTECTED]> wrote:

>
> For 4 attributes I'd probably go with the __getattr__.
>
OK, I'll do that!

> Or you could easily write your own decorator to cache the result...
>
> E.g. http://code.activestate.com/recipes/363602/

Cool. I never realized I could write my own decorators!
So far I have only used them for @classmethod, @staticmethod and stuff
like that. User-defined decorators are nice and fun to do as well.
I just hope it will still be understandable in four years...
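
Just to check my understanding, something along these lines, I suppose
(my own untested sketch of such a caching attribute decorator, not the
recipe itself; really_read_the_data is a made-up stand-in for the real
parsing function):

class cached_attribute(object):
    """
    Descriptor that computes the value on first access and then caches
    it in the instance __dict__, so later lookups bypass the
    descriptor completely.
    """
    def __init__(self, func):
        self.func = func
        self.__doc__ = func.__doc__

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        value = self.func(obj)
        obj.__dict__[self.func.__name__] = value
        return value

class PayloadOnDemand(object):
    def __init__(self, a_file, a_file_position):
        self.f = a_file
        self.file_position = a_file_position

    @cached_attribute
    def data(self):
        return really_read_the_data(self.f, self.file_position)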

>
> >  With the property methology you do the value check on each get, which
> >  does not look as "clean". The property methology is also a little less
> >  arcane I guess for less experienced Python programmers to understand
> >  when re-reading the code.
>
> Less magic is how I would put it.  Magic is fun to write, but a pain
> to come back to.  Over the years I find I try to avoid magic more and
> more in python.
>
Ah, I see. I hope you do not consider user defined decorators
"magic" then? ;-)

> >  What kind of trouble are you referring to in __getattr__? Is it
> >  recursive calls to the method on accessing object attributes in that
> >  method itself or other complications?
>
> Every time I write a __getattr__ I get tripped up by infinite
> recursion!  It is probably just me ;-)
>
And I will probably end up having the same initial problems, but I
found an example
here, which I may try to be inspired from.

http://western-skies.blogspot.com/2008/02/complete-example-of-getattr-in-python.html
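
As far as I understand it, the trap is roughly this (my own little
illustration, not taken from the link; read_data is just a made-up
helper):

class Broken(object):
    def __getattr__(self, name):
        # BUG: self.data is itself not in the instance __dict__, so
        # this lookup calls __getattr__ again -> infinite recursion
        return self.data

class Safer(object):
    def __init__(self, f):
        self.f = f
    def __getattr__(self, name):
        # Only synthesize the attributes we really mean to handle and
        # raise AttributeError for everything else
        if name != 'data':
            raise AttributeError(name)
        self.data = read_data(self.f)
        return self.data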

> >  On a related issue, thank you for showing me how to use @property as a
> >  decorator - I was not aware of that possibility, just gotta understand
> >  how to decorate a setter and delete method as well, but I should be
> >  able to look that up by myself...
>
> I'm sure you will!
>
> http://www.python.org/doc/2.5.2/lib/built-in-funcs.html
>
Yeah, I just visited that page yesterday!

Again, thank you for your assistance, Nick!

-- Slaunger
--
http://mail.python.org/mailman/listinfo/python-list


Re: How to instantiate in a lazy way?

2008-12-02 Thread Slaunger
Just wanted to show the end result in its actual implementation!

I ended up *not* making a decorator, as I already had a good idea
about how to do it using __getattr__.

class PayloadDualFrqIQOnDemand(PayloadDualFrqIQ):
    """
    This class has the same interface as its parent,
    but unlike its parent, it is instantiated without
    its payload parsed up in its instance attributes
    Q1, I1, Q2 and I2. Instead it stores a reference to
    the file object in which the Payload data can be
    read, the file position and the version of the payload data.

    On accessing one of the data attributes, the actual
    payload data are read from the file, and the reference to
    the file object is unbound.
    The constructor signature is therefore different from its
    parent as it takes the file object, position and version
    as arguments instead of the actual data.
    """

    @classmethod
    def _unpack_from_file(cls, f, samples, ver):
        bytes = samples * cls.bytes_per_sample
        initial_pos = f.tell()
        f.seek(initial_pos + bytes) # Skip over the payload
        return cls(f, initial_pos, samples, ver)

    @classmethod
    def unpack_from_ver3_file(cls, f, samples):
        return cls._unpack_from_file(f, samples, ver=3)

    @classmethod
    def unpack_from_ver4_file(cls, f, samples):
        return cls._unpack_from_file(f, samples, ver=4)

    data_attr_names = frozenset(["Q1", "I1", "Q2", "I2"])

    def __init__(self, a_file, a_file_position, samples, a_version):
        """
        Returns an instance where the object knows where to
        look for the payload, but the payload will only be loaded on
        the first attempt to read one of the data attributes found
        in a "normal" PayloadDualFrqIQ object.
        """
        self.f = a_file
        self.file_position = a_file_position
        self.samples = samples
        self.ver = a_version

    def __getattr__(self, attr_name):
        """
        Checks if a request to read a non-existing attribute
        corresponds to one of the data attributes of a normal
        PayloadDualFrqIQ object.

        If true, the data attributes are created and bound to the
        object using the file object instance, the file position
        and the version.

        It is a prerequisite that the file object is still open.
        The function leaves the file object at the file position
        it had when the method was entered.
        """
        cls = self.__class__
        if attr_name in cls.data_attr_names:
            initial_pos = self.f.tell()
            try:
                bytes = self.samples * cls.bytes_per_sample
                self.f.seek(self.file_position)
                buf = self.f.read(bytes)
                if self.ver == 3:
                    bytes_to_data = cls._v3_byte_str_to_data
                elif self.ver == 4:
                    bytes_to_data = cls._v4_byte_str_to_data
                else:
                    raise TermaNotImplemented, \
                        "Support for ver. %d not implemented." % self.ver
                I1, Q1, I2, Q2 = bytes_to_data(buf)
                self.__dict__["I1"] = I1
                self.__dict__["Q1"] = Q1
                self.__dict__["I2"] = I2
                self.__dict__["Q2"] = Q2
                return self.__dict__[attr_name]
            finally:
                # Restore file position
                self.f.seek(initial_pos)
                # Unbind lazy attributes
                del self.f
                del self.ver
                del self.file_position
                del self.samples
This seems to work out well. No infinite loops in __getattr__!

At least it passes the unit test cases I have come up with so far.

No guarantees though, as I may simply not have been smart enough to
bring forth unit test cases which make it crash.

Comments on the code are still appreciated though.

I am still a novice Python programmer, and I may have overlooked
more Pythonic ways to do it.

-- Slaunger
--
http://mail.python.org/mailman/listinfo/python-list


Re: How to instantiate in a lazy way?

2008-12-02 Thread Slaunger
On 2 Dec., 17:50, George Sakkis <[EMAIL PROTECTED]> wrote:
>
> >                 I1, Q1, I2, Q2 = bytes_to_data(buf)
> >                 self.__dict__["I1"] = I1
> >                 self.__dict__["Q1"] = Q1
> >                 self.__dict__["I2"] = I2
> >                 self.__dict__["Q2"] = Q2
>
> with:
>
>     self.__dict__.update(zip(self.data_attr_names, bytes_to_data
> (buf)))
>
> where data_attr_names = ("I1", "Q1", "I2", "Q2") instead of a
> frozenset. A linear search in a size-4 tuple is unlikely to be the
> bottleneck with much I/O anyway.

Thank you for this little hint, George.
I have never used update on a dict or the zip function before.
This is a nice application of these functions.

And I agree, performance is not an issue in choosing a tuple instead
of a frozenset. The bytes_to_data function is the performance
bottleneck in the actual application (implemented in the parent
class).

-- Slaunger

--
http://mail.python.org/mailman/listinfo/python-list


Re: How to instantiate in a lazy way?

2008-12-03 Thread Slaunger
On 3 Dec., 11:30, Nick Craig-Wood <[EMAIL PROTECTED]> wrote:
> >          cls = self.__class__
> >          if attr_name in cls.data_attr_names:
>
> self.data_attr_names should do instead of cls.data_attr_names unless
> you are overriding it in the instance (which you don't appear to be).

Yeah, I know. I just like the cls notation for code readability
because it tells you that it is a class attribute, which is not
instance-dependent.

That may be legacy from my Java past, where I used to do it that way.
I know perfectly well that self. would do it; I just find that
notation a little misleading.

> >                  I1, Q1, I2, Q2 = bytes_to_data(buf)
> >                  self.__dict__["I1"] = I1
> >                  self.__dict__["Q1"] = Q1
> >                  self.__dict__["I2"] = I2
> >                  self.__dict__["Q2"] = Q2
> >                  return self.__dict__[attr_name]
>
> I think you want setattr() here - __dict__ is an implemetation detail
> - classes with __slots__ for instance don't have a __dict__.  I'd
> probably do this

Oh my, I did not know that. __slots__?? Something new I have got to
understand. But you are right. Thanks!

>
>                    for k, v in zip(("I1", "Q1", "I2", "Q2"), 
> bytes_to_data(buf)):
>                        setattr(self, k, v)
>                    return object.__getattr__(self, attr_name)
>
And perhaps even more readable (how I do it now, no need to call
__getattr__ for an attribute which is already there):
...
for attr, value in zip(cls.data_attr_names,
bytes_to_data(buf)):
setattr(self, attr, value)

return getattr(self, attr_name)


> :-)
>
> I would probably factor out the contents of the if statement into a
> seperate method, but that is a matter of taste!

Agreed. I thought about that myself for better code readability.

As a final comment, I have actually refactored the code quite a bit,
as I have to do this ...OnDemand trick for several classes, which have
different data attributes with different names.

In this process I have actually managed to isolate all the ...OnDemand
stuff in an abstract PayloadOnDemand class.

I can now use this "decorator-like"/helper class to very easily make
an ...OnDemand variant of a class by just doing multiple inheritance -
no implementation:

class PayloadBaconAndEggsOnDemand(PayloadOnDemand,
                                  PayloadBaconAndEggs): pass

I guess this somewhat resembles the decorator approach - I just could
not figure out how to make a general purpose decorator.

For this to actually work, the "instant" PayloadBaconAndEggs class
simply has to define and implement a few class attributes and static
utility functions for the unpacking, as sketched below.
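
Schematically it now looks something like this (a much simplified,
untested sketch with made-up payload fields, just to show the shape of
it; the real classes carry a lot more machinery):

import struct

class PayloadOnDemand(object):
    """
    Generic lazy-loading helper. A concrete payload class must provide
    data_attr_names, payload_bytes and a _byte_str_to_data() helper.
    """
    def __init__(self, a_file, a_file_position):
        self.f = a_file
        self.file_position = a_file_position

    def __getattr__(self, attr_name):
        cls = self.__class__
        if attr_name not in cls.data_attr_names:
            raise AttributeError(attr_name)
        self.f.seek(self.file_position)
        buf = self.f.read(cls.payload_bytes)
        for attr, value in zip(cls.data_attr_names,
                               cls._byte_str_to_data(buf)):
            setattr(self, attr, value)
        return getattr(self, attr_name)

class PayloadBaconAndEggs(object):
    """The 'instant' class: defines the fields and how to unpack them."""
    data_attr_names = ("bacon", "eggs")
    payload_bytes = 8

    @staticmethod
    def _byte_str_to_data(buf):
        return struct.unpack("<2f", buf)

class PayloadBaconAndEggsOnDemand(PayloadOnDemand, PayloadBaconAndEggs):
    pass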

-- Slaunger
--
http://mail.python.org/mailman/listinfo/python-list


Re: How to instantiate in a lazy way?

2008-12-03 Thread Slaunger
On 3 Dec., 15:30, Nick Craig-Wood <[EMAIL PROTECTED]> wrote:
> Slaunger <[EMAIL PROTECTED]> wrote:
> >  On 3 Dec., 11:30, Nick Craig-Wood <[EMAIL PROTECTED]> wrote:
> > > >          cls = self.__class__
> > > >          if attr_name in cls.data_attr_names:
>
> > > self.data_attr_names should do instead of cls.data_attr_names unless
> > > you are overriding it in the instance (which you don't appear to be).
>
> >  Yeah, I know. I just like the cls notation for code readability
> >  because it tells you that it is a class attribute, which is not
> >  instance- dependent.
>
> >  That may be legacy from my Java past, where I used to do it that
> >  way.  I know perfectly well that self. would do it. i just find
> >  that notation a little misleading
>
> I quite like it... It looks in the instance, then in the class which I
> find to be very elegant - you can set a default in the class and
> override it on a per object or per subclass basis.
>
In principle yes.

In the particular case in which it is used I happen to know that it
would not make sense to have a different attribute at the instance
level.

That is, however, quite hard to see for outside reviewers based on
the small snippets I have revealed here. So, I certainly understand
your viewpoint.

The cls notation sort of emphasizes (for me at least) that instances
are not supposed to override it, and if they did, it would be ignored.
In other applications, I would use self. too.


>
> > >                    for k, v in zip(("I1", "Q1", "I2", "Q2"),
> > >                                    bytes_to_data(buf)):
> > >                        setattr(self, k, v)
> > >                    return object.__getattr__(self, attr_name)
>
> >  And perhaps even more readable (how I do it now, no need to call
> >  __getattr__ for an attribute, whcih is already there):
> >                  ...
> >                  for attr, value in zip(cls.data_attr_names,
> >  bytes_to_data(buf)):
> >                      setattr(self, attr, value)
>
> >                  return getattr(self, attr_name)
>
> I wrote the object.__getattr__ call to stop recursion troubles.  If
> you are sure you've set the attribute then plain getattr() is OK I
> guess...

Ah, Ok. I am sure and my unit tests verify my assurance.
>
> >  In this process I have actaully managed to isolate all the
> >  ...OnDemand stuff in an abstract PayloadOnDemand class
>
> >  I can now use this "decorator-like"/helper class to very easily
> >  make an ...OnDemand variant of a class by just doing multiple
> >  inheritance - no implementation:
>
> >  class PayloadBaconAndEggsOnDemand(PayloadOnDemand, PayloadBaconAndEggs): 
> > pass
>
> You've reinvented a Mixin class!
>
>  http://en.wikipedia.org/wiki/Mixin
>
> It is a good technique.
>

Wow, there is a name for it! I did not know that.

Hmm... I never really took the time to study those GoF design
patterns.

(I am a physicist after all... and not really a programmer.)

I guess I could save a lot of the time I spend constantly re-inventing
the wheel.

Are there any good design pattern books focused on applications in
Python?
(Actually, I will post that question in a separate thread)

Once again, I am extremely pleased with your very thoughtful comments,
Nick. Thanks!

-- Slaunger
--
http://mail.python.org/mailman/listinfo/python-list


Pythonic design patterns

2008-12-04 Thread Slaunger
Hi comp.lang.python

I am this novice Python programmer, who is not educated as a computer
scientist (I am a physicist), and who (regrettably) has never read the
GOF on design patterns.

I find myself spending a lot of time in Python making designs to
solve some task, which in the end turn out to be closely related to
well-established design patterns / programming idioms, as other
users in this forum have been kind enough to point out. Only my
implementations are often not that clean, and I may call things
something different from the normal convention, which is a source of
confusion for myself and for others trying to communicate with me.

I guess I could boost my productivity by learning these well-proven
and well-established design patterns by heart.

I was therefore wondering if you could recommend a book or a resource
concerning design patterns with special focus on the possibilities in
Python?

In that manner I may be able to both learn programming more pythonic
AND learn the design patterns.

-- Slaunger
--
http://mail.python.org/mailman/listinfo/python-list


Re: Pythonic design patterns

2008-12-04 Thread Slaunger
Thank you all for sharing your views, links and suggestions on my
question. I see where this is getting, and I have extracted the
following points:

1. Many classic design patterns, especially the creational ones
(Factory, etc.), aren't really that useful in Python, as the built-in
features of the language and new-style objects have ways of doing
these things far more naturally than statically typed languages.
2. Don't be religious about design patterns or apply them too
frantically. They may look cool, but there is a great danger of over-
engineering and subsequently lower code readability, debuggability,
and maintainability.
3. Seek inspiration from well-written sample code.
4. It is good to know them, but be bold sometimes and do something
simpler.

I think I will buy the Cookbook, not for its design patterns, but more
for seeing good examples of Pythonic code and commonly used Python
programming idioms. I already have "Python in a Nutshell", which I
like very much as a no-nonsense presentation of the language and its
batteries (there are a few condensed pages concerning new-style
objects, which I read over and over again), and the Cookbook would
probably be a valuable addition to my limited collection. Later, if I
still have the appetite for it and feel the need, I might dive into
some of the other resources mentioned. As a matter of fact, I have
visited all the links now and gotten some valuable inspiration.

-- 
--
http://mail.python.org/mailman/listinfo/python-list


Best way to report progress at fixed intervals

2008-12-09 Thread Slaunger
Hi comp.lang.python

I am a novice Python 2.5 programmer, who writes some command-line
scripts for processing large amounts of data.

I would like to have the possibility to regularly print out the
progress made during the processing, say every 1.0 seconds, and I am
wondering what a proper generic way to do this is.

I have created this test example to show the general problem. Running
the script gives me the output:

Work through all 20 steps reporting progress every 1.0 secs...
Work step 0
Work step 1
Work step 2
Work step 3
Work step 4
Processed 4 of 20
Work step 5
Work step 6
Work step 7
Work step 8
Processed 8 of 20
Work step 9
Work step 10
Work step 11
Work step 12
Work step 13
Processed 13 of 20
Work step 14
Work step 15
Work step 16
Work step 17
Processed 17 of 20
Work step 18
Work step 19
Finished working through 20 steps

The script that does this is as follows:

testregularprogress.py:

"""
Test module for testing generic ways of displaying progress
information
at regular intervals.
"""
import random
import threading
import time

def work(i):
    """
    Dummy process function, which takes a random time in the interval
    0.0-0.5 secs to execute
    """
    print "Work step %d" % i
    time.sleep(0.5 * random.random())


def workAll(verbose=True, max_iter=20, progress_interval=1.0):

    class _Progress(object):

        def __init__(self):
            self.no = 0
            self.max = max_iter
            self.start_timer = verbose

        def __str__(self):
            self.start_timer = True # I do not like this approach
            return "Processed %d of %d" % (self.no, self.max)

    p = _Progress()

    def report_progress():
        print p

    if verbose:
        print "Work through all %d steps reporting progress every " \
              "%3.1f secs..." % (max_iter, progress_interval)

    for i in xrange(max_iter):
        if p.start_timer:
            p.start_timer = False # Let the progress instance set the flag
            timer = threading.Timer(progress_interval,
                                    report_progress)
            timer.start()
        work(i)
        p.no = i + 1

    # Kill the last timer, which is still active at this time
    timer.cancel()

    if verbose:
        print "Finished working through %d steps" % max_iter

if __name__ == "__main__":
    workAll()

Quite frankly, I do not like what I have made! It is a mess,
responsibilities are mixed, and it seems overly complicated. But I
can't figure out how to do this right.

I would therefore like some feedback on this proposed generic "report
progress at regular intervals" approach presented here. What could I
do better?

-- Slaunger
--
http://mail.python.org/mailman/listinfo/python-list


Re: When (and why) to use del?

2008-12-09 Thread Slaunger
On 9 Dec., 17:35, Albert Hopkins <[EMAIL PROTECTED]> wrote:
> I'm looking at a person's code and I see a lot of stuff like this:
>
>         def myfunction():
>             # do some stuff stuff
>             my_string = function_that_returns_string()
>             # do some stuff with my_string
>             del my_string
>             # do some other stuff
>             return
>
> and also
>
>         def otherfunction():
>             try:
>                 # some stuff
>             except SomeException, e:
>                 # more stuff
>                 del e
>             return
>
> I think this looks ugly, but also does it not hurt performance by
> preempting the gc?  My feeling is that this is a misuse of 'del'. Am I
> wrong?  Is there any advantage of doing the above?

I agree with you. In my mind there is no reason for such kinds of
deletes. The code seems to have been made by a person who usually
programs in a language which does not have a garbage collector. I do
not know if it has any noticeable impact on the performance.

-- Slaunger
--
http://mail.python.org/mailman/listinfo/python-list


Re: Best way to report progress at fixed intervals

2008-12-09 Thread Slaunger
On 9 Dec., 19:35, [EMAIL PROTECTED] wrote:
>
> I felt like a little lunchtime challenge, so I wrote something that
> I think matches your spec, based on your sample code.  This is not
> necessarily the best implementation, but I think it is simpler and
> clearer than yours.  The biggest change is that the work is being
> done in the subthread, while the main thread does the monitoring.
>
Well, thank you for spending your lunch time break on my little
problem.

> It would be fairly simple to enhance this so that you could pass
> arbitrary arguments in to the worker function, in addition to
> or instead of the loop counter.
>
Yes, I agree

> ---
> """
> Test module for testing generic ways of displaying progress
> information at regular intervals.
> """
> import random
> import threading
> import time
>
> def work(i):
>      """
>      Dummy process function, which takes a random time in the interval
>      0.0-0.5 secs to execute
>      """
>      print "Work step %d" % i
>      time.sleep(0.5 * random.random())
>
> class Monitor(object):
>      """
>      This class creates an object that will execute a worker function
>      in a loop and at regular intervals emit a progress report on
>      how many times the function has been called.
>      """
>
>      def dowork(self):
>          """
>          Call the worker function in a loop, keeping track of how
>          many times it was called in self.no
>          """
>          for self.no in xrange(self.max_iter):
>              self.func(self.no)
>
>      def __call__(self, func, verbose=True, max_iter=20, 
> progress_interval=1.0):
I had to look up the meaning of __call__ to grasp this, but I get
your methodology.
>          """
>          Repeatedly call 'func', passing it the loop count, for max_iter
>          iterations, and every progress_interval seconds report how
>          many times the function has been called.
>          """
>          # Not all of these need to be instance variables, but they might
>          # as well be in case we want to reference them in an enhanced
>          # dowork function.
>          self.func = func
>          self.verbose = verbose
>          self.max_iter=max_iter
>          self.progress_interval=progress_interval
>
>          if self.verbose:
>              print ("Work through all %d steps reporting progress every "
>                  "%3.1f secs...") % (self.max_iter, self.progress_interval)
>
>          # Create a thread to run the loop, and start it going.
>          worker = threading.Thread(target=self.dowork)
>          worker.start()
>
>          # Monitoring loop.
>          loops = 0
>          # We're going to loop ten times per second using an integer count,
>          # so multiply the seconds parameter by 10 to give it the same
>          # magnitude.
>          intint = int(self.progress_interval*10)
Is this not an unnecessary complication?
>          # isAlive will be false after dowork returns
>          while worker.isAlive():
>              loops += 1
>              # Wait 0.1 seconds between checks so that we aren't chewing
>              # CPU in a spin loop.
>              time.sleep(0.1)
Why not just call this with progress_interval directly?
>              # when the modulus (second element of divmod tuple) is zero,
>              # then we have hit a new progress_interval, so emit the report.
And then avoid this if expression?
>              if not divmod(loops, intint)[1]:
>                  print "Processed %d of %d" % (self.no, self.max_iter)
>
>          if verbose:
>              print "Finished working through %d steps" % max_iter
>
> if __name__ == "__main__":
>      #Create the monitor.
>      monitor = Monitor()
>      #Run the work function under monitoring.
>      monitor(work)
I was unfamiliar with this notation, but now I understand it simply
invokes __call__. Thank you for showing me that!
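
For my own notes, the notation boils down to something like this
(a trivial, untested sketch of my own):

class Greeter(object):
    def __call__(self, name):
        return "Hello, %s" % name

greet = Greeter()
print greet("world")   # same as greet.__call__("world")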

OK. I agree this is a more elegant implementation, although in my mind,
I find it more natural if the reporting goes on in a subthread, but
that is a matter of taste, I guess. Anyway: Thank you again for
spending your lunch break on this!

-- Slaunger

--
http://mail.python.org/mailman/listinfo/python-list


Re: Best way to report progress at fixed intervals

2008-12-09 Thread Slaunger
On 10 Dec., 00:11, [EMAIL PROTECTED] wrote:
> >>          # Monitoring loop.
> >>          loops = 0
> >>          # We're going to loop ten times per second using an integer count,
> >>          # so multiply the seconds parameter by 10 to give it the same
> >>          # magnitude.
> >>          intint = int(self.progress_interval*10)
> > Is this not an unnecessary complication?
> >>          # isAlive will be false after dowork returns
> >>          while worker.isAlive():
> >>              loops += 1
> >>              # Wait 0.1 seconds between checks so that we aren't chewing
> >>              # CPU in a spin loop.
> >>              time.sleep(0.1)
> > Why not just call this with progress_interval directly?
>
> Because then the program make take up to progress_interval seconds to
> complete even after all the work is done.  For a long running program
> and a short progress_interval that might not matter, so yes, that would
> be a reasonable simplification depending on your requirements.
>
Ah, OK. With my timer.cancel() statement in my original proposal I
avoided that.
>
> > OK. I agree this is a more elegant implementation, although I my mind,
> > I find it more natural if the reporting goes on in a subthread, but
>
> You could pretty easily rewrite it to put the reporter in the subthread,
> it was just more natural to _me_ to put the worker in the subthread,
> so that's how I coded it.  Note, however, that if you were to write a
> GUI front end it might be important to put the worker in the background
> because on some OSes it is hard to update GUI windows from anything
> other than the main thread.  (I ran into this in a Windows GUI ap I
> wrote using wxPython).
>
Ah, yes, you are right. For GUIs this is often quite important. I
don't do much GUI work, so this is not something I had strongly in
mind.

Br,

-- Slaunger
--
http://mail.python.org/mailman/listinfo/python-list


Re: Best way to report progress at fixed intervals

2008-12-09 Thread Slaunger
On 10 Dec., 03:44, George Sakkis <[EMAIL PROTECTED]> wrote:
> On Dec 9, 11:40 am, Slaunger <[EMAIL PROTECTED]> wrote:
>
> > I would therefore like some feedback on this proposed generic "report
> > progress at regular intervals" approach presented here. What could I
> > do better?
>
> There is a pypi package that might do what you're looking for (haven't
> used it though): http://pypi.python.org/pypi/progressbar/
>
> HTH,
> George

Thank you. I will keep that in mind, if I ever get to doing GUI-based
progress.

-- Slaunger
--
http://mail.python.org/mailman/listinfo/python-list


Re: Best way to report progress at fixed intervals

2008-12-09 Thread Slaunger
On 10 Dec., 08:03, Arnaud Delobelle <[EMAIL PROTECTED]> wrote:
> Slaunger <[EMAIL PROTECTED]> writes:
> > On 10 Dec., 03:44, George Sakkis <[EMAIL PROTECTED]> wrote:
> >> On Dec 9, 11:40 am, Slaunger <[EMAIL PROTECTED]> wrote:
>
> >> > I would therefore like some feedback on this proposed generic "report
> >> > progress at regular intervals" approach presented here. What could I
> >> > do better?
>
> >> There is a pypi package that might do what you're looking for (haven't
> >> used it though):http://pypi.python.org/pypi/progressbar/.
>
> >> HTH,
> >> George
>
> > Thank you. I will keep that in mind, if I ever get to doing GUI-based
> > progress.
>
> > -- Slaunger
>
> It's a text progress bar
>
> --
> Arnaud

Sorry, apparently I did not realize that at first sight. Anyway, I'd
rather avoid using further external modules besides the standard
batteries, as I would have to update several workstations with
different OSes (some of which I do not have admin access to) to use
the new module.

-- Slaunger
--
http://mail.python.org/mailman/listinfo/python-list


Re: Best way to report progress at fixed intervals

2008-12-10 Thread Slaunger
On 10 Dec., 12:08, eric <[EMAIL PROTECTED]> wrote:
> Don't mind if I give my shot ?
>
> def work(i):
>     """
>     Dummy process function, which takes a random time in the interval
>     0.0-0.5 secs to execute
>     """
>     print "Work step %d" % i
>     time.sleep(0.5 * random.random())
>
> def workAll(work, verbose=True, max_iter=20, progress_interval=1.0):
>     '''
>     pass the real job as a callable
>     '''
>     progress = time.time()
>     for i in range(max_iter): # do the requested loop
>         work(i)
>         if verbose:
>             print "Work through all %d steps reporting progress every
> %3.1f secs..." %(max_iter, progress_interval)
>         interval = time.time()-progress
>         if interval>progress_interval:
>             print "Processed %d of %d at pace %s" % (i, max_iter,
> interval)
>             progress +=interval
>
> if __name__=="__main__":
>     workAll(work, False)
>
> It's works fine, and the "pace" is 'almost' the required one. You earn
> a no-thread-mess, and cleaner alg.
>
> But the loop is controlled by the caller (the WorkAll function) this
> is also called ass-backward algorithm, and you cannot expect
> algorithms to be assbackward (even if it's the best way to implement
> them).
>
> You can use the yield statement, to turn  easilly your alg into a
> nice, stopable assbackward algo:
>
> def work():
>     """
>     Dummy process function, which takes a random time in the interval
>     0.0-0.5 secs to execute
>     """
>     for i in range(50):
>         print "Work step %d" % i
>         time.sleep(0.5 * random.random())
>         yield i # kind-of "publish it and let the caller do whatever
> it want s (good practice anyway)
>
> def workAll(work, verbose=True, max_iter=20, progress_interval=1.0):
>     '''
>     pass the real job as a generator
>     '''
>     progress = time.time()
>     i = 0
>     for w in work: # do the requested loop
>         if verbose:
>             print "Work through all %d steps reporting progress every
> %3.1f secs..." %(max_iter, progress_interval)
>         interval = time.time()-progress
>         if interval>progress_interval:
>             print "Processed %d at pace %s" % (w, interval)
>             progress +=interval
>         if i>=max_iter:
>             work.close()
>         i+=1
>
> if __name__=="__main__":
>     workAll(work(), False)     # note the calling difference
>
> hope it helps.

Hi eric,

No, I certainly don't mind you giving it a try ;-)

I actually started out doing something like your first version here,
but I am a little annoyed by the fact that the progress report
interval is not a sure thing. For instance, in my real applications I
have seldom-occurring work steps, which may take significantly longer
than the progress_interval, and I'd like to let it keep reporting
that, oh, I am still working, albeit on the same work step, to
maintain a sense of the script being alive.

I like your generator approach though.

Anyway, I have now given my own proposal another iteration based on
what I have seen here (and my personal preferences), and I have come
up with this:

=== src ===
"""
Test module for testing generic ways of displaying progress
information at regular intervals.
"""
import random
import threading
import time

def work(i):
    """
    Dummy process function, which takes a random time in the interval
    0.0-0.5 secs to execute
    """
    print "Work step %d" % i
    time.sleep(0.5 * random.random())


def workAll(verbose=True, max_iter=20, progress_interval=1.0):

    class ProgressReporter(threading.Thread):

        def __init__(self):
            threading.Thread.__init__(self)
            self.setDaemon(True)
            self.i = 0
            self.max = max_iter
            self.start_timer = verbose
            self.progress_interval = progress_interval

        def run(self):
            while self.start_timer:
                print "Processed %d of %d." % (self.i + 1, self.max)
                time.sleep(self.progress_interval)

    p = ProgressReporter()

    if verbose:
        print "Work through all %d steps reporting every " \
              "%3.1f secs..." % (max_iter, progress_interval)
        p.start()

    for i in xrange(max_iter):
        work(i)
        p.i = i

    if verbose:
        print "Finished working through %d steps" % max_iter

if __name__ == "__main__":
    workAll()

=== end src ===

I like this much better than my own first attempt in my initial post
on this thread.

-- Slaunger

--
http://mail.python.org/mailman/listinfo/python-list


How to "reduce" a numpy array using a costum binary function

2008-11-13 Thread Slaunger
I know there must be a simple method to do this.

I have implemented this function for calculating a checksum based on a
ones complement addition:

def complement_ones_checksum(ints):
    """
    Returns a ones' complement checksum based
    on a specified numpy.array of dtype=uint16
    """
    result = 0x0
    for i in ints:
        result += i
        result = (result + (result >> 16)) & 0xFFFF
    return result

It works, but it is of course inefficient. My profiler says this is
the 99.9% bottleneck in my application.

What is the efficient numpy way to do this?

No need to delve into fast inlining of C code or Fortran and stuff
like that, although that may give further performance improvements.
--
http://mail.python.org/mailman/listinfo/python-list


Re: How to "reduce" a numpy array using a costum binary function

2008-11-13 Thread Slaunger
It is always good to ask yourself a question.
I had forgotten about the reduce function.

I guess this implementation

from numpy import *

def compl_add_uint16(a, b):
    c = a + b
    c += c >> 16
    return c & 0xFFFF

def compl_one_checksum(uint16s):
    return reduce(compl_add_uint16, uint16s, 0x0000)

is somewhat better?

But is it the best way to do it with numpy?

In [2]: hex(compl_add_uint16(0xF0F0, 0x0F0F))
Out[2]: '0xffff'

In [3]: hex(compl_add_uint16(0xFFFF, 0x0001))
Out[3]: '0x1'

In [5]: hex(compl_one_checksum(array([], dtype=uint16)))
Out[5]: '0x0'

In [6]: hex(compl_one_checksum(array([0xF0F0, 0x0F0F, 0x0001],
dtype=uint16)))
Out[6]: '0x1L'
--
http://mail.python.org/mailman/listinfo/python-list


Re: How to "reduce" a numpy array using a costum binary function

2008-11-13 Thread Slaunger
On 13 Nov., 22:48, Robert Kern <[EMAIL PROTECTED]> wrote:
> Slaunger wrote:
> > It is always good to ask yourself a question.
> > I had forgooten about the reduce function
>
> > I guess this implementation
>
> > from numpy import *
>
> > def compl_add_uint16(a, b):
> >     c = a + b
> >     c += c >> 16
> >     return c & 0x
>
> > def compl_one_checksum(uint16s):
> >     return reduce(compl_add_uint16, uint16s, 0x)
>
> > is somewhat better?
>
> > But is it the best way to do it with numpy?
>
> It's not too bad, if you only have 1D arrays to worry about (or you are only
> concerned with reducing down the first axis). With a Python-implemented
> function, there isn't much that will get you faster.

Yes, I only have 1D arrays in this particular problem.

>
> My coworker Ilan Schnell came up with a neat way to use PyPy's RPython->C
> translation scheme and scipy.weave's ad-hoc extension module-building
> capabilities to generate new numpy ufuncs (which have a .reduce() method)
> implemented in pure RPython.
>
>    http://conference.scipy.org/proceedings/SciPy2008/paper_16/full_text.pdf
>    http://svn.scipy.org/svn/scipy/branches/fast_vectorize/
>

OK. Thanks. I am still a rather inexperienced SciPy and Python
programmer, and I must admit that right now this seems to be at the
advanced end for me. But now that you mention weave, I have given some
thought to reimplementing my binary complement-add function shown
above using weave - if my profiler says that is where I should be
spending my time optimizing.

> If you have more numpy questions, please join us on the numpy mailing list.
>
>    http://www.scipy.org/Mailing_Lists
>

Thank you for directing me to that numpy specific mailing list.
I have been on scipy.org many times, but apparently overlooked
that very prominent link to mailing lists.

Slaunger

> --
> Robert Kern
>
> "I have come to believe that the whole world is an enigma, a harmless enigma
>   that is made terrible by our own mad attempt to interpret it as though it 
> had
>   an underlying truth."
>    -- Umberto Eco

--
http://mail.python.org/mailman/listinfo/python-list


Best practise hierarchy for user-defined exceptions

2008-11-17 Thread Slaunger
Hi there,

I am a newcomer to Python coming from Java, working on a relatively
large Python project with several packages and modules. To improve
exception handling I would like to introduce some user-defined
exceptions to distinguish exceptions raised in self-written code from
those raised in the standard library modules used therein.

Say, for instance, I would like to define a MyParseError exception to
indicate that something has gone wrong while parsing a byte string or
a file, or while packing into or unpacking from a struct.Struct
instance.

Here is my stub-implemented idea on how to do it so far, which is
inspired by how I would have done it in Java (but which may not be
very Pythonic??):

class MyException(Exception):
    pass

class MyStandardError(MyException, StandardError):
    pass

class MyParseError(MyStandardError, ValueError):
    pass

Some comments and questions

1. The hierarchy is deliberately deep and maps to the std library such
that it is easier to extend
2. The implementations are empty but can be extended with hook for
logging, statistics, etc.
3. I use multiple inheritance in the two sub-classes. I have not tried
that before. Is this A Good Thing or A Bad Thing to do?
4. Which __xx__ methods would you normally implement for the user-
defined exception classes? I was thinking of __str__, for example? Is
there a recommended __str__ idiom to use for that?

-- Slaunger
--
http://mail.python.org/mailman/listinfo/python-list


Re: Best practise hierarchy for user-defined exceptions

2008-11-17 Thread Slaunger
On 17 Nov., 13:05, "Chris Rebert" <[EMAIL PROTECTED]> wrote:
> On Mon, Nov 17, 2008 at 3:47 AM, Slaunger <[EMAIL PROTECTED]> wrote:
>
> > .
> > Here is my stub-implemented idea on how to do it so far, which is
> > inspired by how I would have done it in Java (but which may not be
> > very Pythonic??):
>
> > class MyException(Exception):
>
> >    pass
>
> The above class probably isn't necessary. You don't need to
> overgeneralize this much.
>
Seems reasonable.
> > class MyStandardError(MyException, StandardError):
>
> >    pass
>
> Rename the previous class to just MyError in light of removing the other 
> class.
>
>
OK.
>
> > class MyParseError(MyStandardError, ValueError):
>
> >   pass
>
> This technique is very Pythonic. Subclass from both the exception
> class for your module and from a built-in one if there's an applicable
> one.
>
>
OK. One thing I *like* about it is that I inherit properties of
ValueError.

One thing I feel *a little uneasy about* is that if I do something
like this

try:
    do_something_which_raises_MyParseError_which_I_have_overlooked()
    do_something_where_I_know_a_ValueError_can_be_raised()
except ValueError:
    handle_the_anticipated_ValueError_from_std_lib()
finally:
    ...

I will not notice that it was an unanticipated condition in my own
code which caused the ValueError to be raised. If I had just inherited
from MyError, it would fall through (which I would prefer).

Once I had seen a traceback to the unanticipated MyParseError I would
of course correct the code to

try:
    do_something_where_I_now_know_MyParseError_can_be_raised()
    do_something_where_I_know_a_ValueError_can_be_raised()
except MyParseError:
    handle_the_anticipated_MyParseError_generated_in_own_code()
except ValueError:
    handle_the_anticipated_ValueError_from_std_lib()
finally:
    ...

Is the above use of multiple except clauses Pythonic as well?

On the other hand, with multiple inheritance this works:

try:
    do_something_where_I_know_a_MyParseError_can_be_raised()
    do_something_where_I_know_a_ValueError_can_be_raised()
except ValueError:
    handle_a_MyParseError_or_a_ValueError_in_the_same_manner()
finally:
    ...

which is nice if you are very aware that MyParseError descends from
ValueError (which may not be self-evident).


>
> > Some comments and questions
>
> > 1. The hierarchy is deliberately deep and maps to the std library such
> > that it is easier to extend
>
> Zen of Python: Flat is better than nested.
> This is part of the reason I recommend removing the one class.

Ah, OK, I have to adjust my mindset a little. Willing to accommodate
though... ;-)
>
> > 3. I use multiple inheritance in the two sub-classes. I have not tried
> > that before. Is this A Good Thing or A Bad Thing to do?
>
> Good in this case, just be careful to use super() if you override any methods.

I will!

>
> > 4. Which __xx__ methods would you normally implement for the user-
> > defined exception classes? I was thinking of __str__, for example? Is
> > there a recommended __str__ idiom to use for that?
>
> You might override __init__ if you want to store additional
> information about the cause of the exception (e.g. location of parse
> error, name of the rule the error occurred in, etc).
> The __str__ inherited from Exception is usually sufficient, but you
> can override it if you want to. it's a judgement call IMHO.
>

OK. Seems pretty straight-forward. I like Python more and more.
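
Something like this, I imagine (just a quick, untested sketch to check
my own understanding; the extra attributes are made up for
illustration):

class MyError(StandardError):
    pass

class MyParseError(MyError, ValueError):

    def __init__(self, message, file_name=None, offset=None):
        # use super() as recommended with multiple inheritance
        super(MyParseError, self).__init__(message)
        self.file_name = file_name
        self.offset = offset

    def __str__(self):
        s = super(MyParseError, self).__str__()
        if self.file_name is not None:
            s += " (in %s at byte offset %s)" % (self.file_name,
                                                 self.offset)
        return s

try:
    raise MyParseError("bad sync word", file_name="dump.bin",
                       offset=1024)
except ValueError, e:
    # caught both as MyParseError and as ValueError
    print e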

> Cheers,
> Chris

That was a very useful answer. Thank you, Chris, for taking the time
to reply to my questions.

Cheers,
-- Slaunger
--
http://mail.python.org/mailman/listinfo/python-list


Best strategy for finding a pattern in a sequence of integers

2008-11-21 Thread Slaunger
Hi all,

I am a Python novice, and I have run into a problem in a project I am
working on, which boils down to identifying the patterns in a sequence
of integers, for example

 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 9 3 3 0 3 3 0 3 3 0 3 3 0 10 6 6
1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 9 3 3 0 3 3 0 3 3 0 3 3 0 10 6 6
1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 9 3 3 0 3 3 0 3 3 0 3 3 0 10 6 6
1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 9 3 3 0 3 3 0 3 3 0 3 3 0 10 6 6
1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 ...

I want to process this such that I get out two patterns, like:
(9, 3, 3, 0, 3, 3, 0, 3, 3, 0, 3, 3, 0)
and
(10, 6, 6, 1, 6, 6, 1, 6, 6, 1, 6, 6, 1, 6, 6, 1, 6, 6, 1, 6, 6, 1)

I am pretty sure I can figure out how to do that, but I would like to
have some guidance on the most pythonic approach to this.

Two paths I have considered are:
1. Convert the sequence of integers to a hex string, i.e., "...
16616616616616619330330330330A66166..." and use the re module to find
the patterns. Use the string positions to go back to the sequence.
2. Put them in a list or an array and manually look for the patterns
by iterating and filtering the elements and comparing with sets.

I am not looking for a "solution" to this specific problem, just some
guidance

The rules for the sequence are:
1. The sequence may start in the middle of a pattern
2. There are one or two patterns, Pattern A and Pattern B in the
sequence
3. Pattern A only consists of the numbers 0, 3, and 9. 3, 3 is always
followed by 0
4. Pattern B only consists of the numbers 1, 6, and 10. 6, 6, is
always followed by 1
5. There may be other numbers interspersed within the sequence, but
they can be ignored
6. The relative position of 9 or 10 in the patterns varies from case
to case, but is consistent throughout a sequence.
7. There is always one 9 or one 10 in a pattern
7. The beginning of a pattern is marked by the transition from one
pattern to the other.
8. If there is only one pattern in the sequence, the pattern beginning
is marked by the first occurrence of either 9 or 10.
9. The pattern is repetitive in the sequence,
e.g., ...ABABABAB..., ...AAA..., or ...BBB...

Thank you,
-- Slaunger

--
http://mail.python.org/mailman/listinfo/python-list


Re: Best strategy for finding a pattern in a sequence of integers

2008-11-21 Thread Slaunger
>
> > I am pretty sure I can figure out how to do that, but I would like to
> > have some guidance on the most pythonic approach to this.
>
> Then it would be a good starting point to write some code. Then you
> could post it and ask how it can be made more 'pythonic'.
>
That is actually a good point. I will do that.
-- 
--
http://mail.python.org/mailman/listinfo/python-list


Re: Best strategy for finding a pattern in a sequence of integers

2008-11-21 Thread Slaunger
On 21 Nov., 23:36, Mensanator <[EMAIL PROTECTED]> wrote:
> Your rules appear to be incomplete and inconsistent.
OK. Let me try to clarify then...

> > 3. Pattern A only consists of the numbers 0, 3, and 9. 3, 3 is always
> > followed by 0
>
> But does a 3 always follow a 3? Can you have 3, 0, 3, 0?
> Can 0's occur without 3's, such as 0, 0, 0?
Yes, 3s always come in pairs. So, 3, 0, 3, 0 is not allowed.
And of the numbers 0, 3, and 9, 0 will always be the first after the
pair of 3s.

>
> > 4. Pattern B only consists of the numbers 1, 6, and 10. 6, 6, is
> > always followed by 1
> > 5. There may be other numbers interspersed within the sequence, but
> > they can be ignored
>
> So, I can have 3, 3, 0, 7, 3, 3, 0?
Yes, there is a point I did not mention properly in my first
description:
the number 7, for instance, could appear in that position, but it
would not be repetitive; as a matter of fact these other numbers can
be filtered away before looking for the pattern, so let us just forget
about those.

>
> What if the 7 occurs after the pair of 3's? Is the number following
> the 7 forced to be 0, i.e., is 3, 3, 7, 3, 3, 0 legal?
No, it would have to be 3, 3, 0, 7, 3, 3, 0; the 7 is squeezed in -
but as mentioned they can be prefiltered out of the problem.
>
> > 7. The beginning of a pattern is marked by the transition from oner
> > pattern to the other.
>
> Can there be an ignored number between the patterns? Is
> 9,3,3,0,7,10,6,6,1
> legal? If NO, you violate Rule 5. If YES, you violate the second Rule
> 7.
Yes, you are right. This complication is again eliminated by
prefiltering "other" numbers out.
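
Just to make the prefiltering explicit, what I have in mind is
something like this (untested; raw_sequence stands for the unfiltered
input):

# Rule 5: anything outside the two pattern alphabets is noise and can
# be dropped before looking for the patterns themselves.
ALPHABET = set([0, 3, 9, 1, 6, 10])
filtered = [n for n in raw_sequence if n in ALPHABET]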

-- Slaunger
--
http://mail.python.org/mailman/listinfo/python-list


Re: Best strategy for finding a pattern in a sequence of integers

2008-11-21 Thread Slaunger
On 21 Nov., 18:10, Gerard flanagan <[EMAIL PROTECTED]> wrote:
> Slaunger wrote:
> > Hi all,
>
> > I am a Python novice, and I have run into a problem in a project I am
> > working on, which boils down to identifying the patterns in a sequence
> > of integers, for example
>
> >  1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 9 3 3 0 3 3 0 3 3 0 3 3 0 10 6 6
> > 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 9 3 3 0 3 3 0 3 3 0 3 3 0 10 6 6
> > 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 9 3 3 0 3 3 0 3 3 0 3 3 0 10 6 6
> > 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 9 3 3 0 3 3 0 3 3 0 3 3 0 10 6 6
> > 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 ...
>
> > I want to process this such that I get out two patterns, like:
> > (9, 3, 3, 0, 3, 3, 0, 3, 3, 0, 3, 3, 0)
> > and
> > (10, 6, 6, 1, 6, 6, 1, 6, 6, 1, 6, 6, 1, 6, 6, 1, 6, 6, 1, 6, 6, 1)
>
> Maybe:
>
> #-
> data = '''
> 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 9 3 3 0 3 3 0 3 3 0 3 3 0 10 6 6
> 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 9 3 3 0 3 3 0 3 3 0 3 3 0 10 6 6
> 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 9 3 3 0 3 3 0 3 3 0 3 3 0 10 6 6
> 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 9 3 3 0 3 3 0 3 3 0 3 3 0 10 6 6
> 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1'''
>
> data = [int(x) for x in data.split()]
>
> from itertools import groupby
>
> S1 = [0, 3, 9]
>
> s = set()
> for k, g in groupby(data, lambda x: x in S1):
>      seq = tuple(g)
>      # maybe the next line should be 'if 9 in seq or 10 in seq'?
>      if seq[0] in [9, 10]:
>          s.add(seq)
>
> print s
> #--
> set(
> [(9, 3, 3, 0, 3, 3, 0, 3, 3, 0, 3, 3, 0),
> (10, 6, 6, 1, 6, 6, 1, 6, 6, 1, 6, 6, 1, 6, 6, 1, 6, 6, 1, 6, 6, 1)])
>
> hth
>
> G.
Hi Gerard,
This definitely looks like a path to walk along, and I think your code
does the trick, although I have to play around a little with the
groupby method, of which I had no prior knowledge. I think I will
write some unit test cases to stress test your concept (on Monday, when
I am back at work). I appreciate your almost full implementation - it
would have sufficed to point me to the itertools module, and then I
think I would have figured it out.
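
To check that I read the groupby call correctly, this is the mental
model I have of it (small hand-made sequence, untested):

from itertools import groupby

data = [1, 6, 6, 1, 9, 3, 3, 0, 3, 3, 0, 10, 6, 6, 1]
# groupby collects *consecutive* elements for which the key function
# returns the same value, here: membership of the pattern A alphabet.
for key, group in groupby(data, lambda x: x in (0, 3, 9)):
    print key, list(group)
# Expected output:
# False [1, 6, 6, 1]
# True [9, 3, 3, 0, 3, 3, 0]
# False [10, 6, 6, 1]

If that is indeed how it behaves, the seq[0] in [9, 10] test makes
good sense to me.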
-- 
--
http://mail.python.org/mailman/listinfo/python-list


Best practise implementation for equal by value objects

2008-08-06 Thread Slaunger
Hi,

I am new here and relatively new to Python, so be gentle:

Is there a recommended generic implementation of __repr__ for objects
equal by value to assure that eval(repr(x)) == x independent of which
module the call is made from?

Example:

class Age:

    def __init__(self, an_age):
        self.age = an_age

    def __eq__(self, obj):
        return self.age == obj.age

    def __repr__(self):
        return self.__class__.__name__ + \
               "(%r)" % self.age

age_ten = Age(10)
print repr(age_ten)
print eval(repr(age_ten))
print eval(repr(age_ten)).age

Running this gives

Age(10)
Age(10)
10

Exactly as I want to.

The problem arises when the Age class is imported into another module
in another package, as then there is a package prefix and the above
implementation of __repr__ does not work.

I have then experimented with doing something like

    def __repr__(self):
        return self.__module__ + '.' + self.__class__.__name__ + \
               "(%r)" % self.age

This seems to work when called from the outside, but not from the
inside of the module. That is, if I rerun the script above with the
module name prefixed to the representation, I get the following error

Traceback (most recent call last):
  File "valuetest.py", line 15, in <module>
    print eval(repr(age_ten))
__main__.Age(10)
  File "<string>", line 1, in <module>
NameError: name '__main__' is not defined

This is pretty annoying.

My question is: Is there a robust generic type of implementation of
__repr__ which I can use instead?

This is something I plan to reuse for many different Value classes, so
I would like to get it robust.

Thanks,
Slaunger
--
http://mail.python.org/mailman/listinfo/python-list


Re: Best practise implementation for equal by value objects

2008-08-06 Thread Slaunger
On 6 Aug., 21:36, Terry Reedy <[EMAIL PROTECTED]> wrote:
> Slaunger wrote:
> > Hi,
>
> > I am new here and relatively new to Python, so be gentle:
>
> > Is there a recommended generic implementation of __repr__ for objects
> > equal by value to assure that eval(repr(x)) == x independet of which
> > module the call is made from?
>
> The CPython implementation gives up on that goal and simply prints
> <SomeClass object at 0x...> for at least two reasons ;-).
>
> 1. In general, it require fairly sophisticated analysis of __init__ to
> decide what representation of what attributes to include and decide if
> the goal is even possible.  If an attribute is an instance of a user
> class, then *its* __init__ needs to be analyzed.  If an attribute is a
> module, class, or function, there is no generic evaluable representation.

OK, the situation is more complicated than that then. In the case here
though, the attributes would always be simple built-in types, where
eval(repr(x))==x holds, or a user-defined equal-by-value class that I
have control over.

The classes I am making are struct-type classes with some added
functionality for human-readable string representation, and for
packing into or unpacking from a stream using a "private" class
Struct.
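
To make that concrete, here is a stripped-down sketch of the kind of
class I mean (the field names and the struct format are made up for
the example):

import struct

class Header(object):

    _struct = struct.Struct(">IHH")   # the "private" packer/unpacker

    def __init__(self, frame_id, size, flags):
        self.frame_id = frame_id
        self.size = size
        self.flags = flags

    def __eq__(self, other):
        return (self.frame_id, self.size, self.flags) == \
               (other.frame_id, other.size, other.flags)

    def __repr__(self):
        return "Header(%r, %r, %r)" % (self.frame_id, self.size,
                                       self.flags)

    def pack(self):
        return self._struct.pack(self.frame_id, self.size, self.flags)

    @classmethod
    def from_stream(cls, f):
        # Read exactly one packed record from the stream and build an
        # instance from it.
        return cls(*cls._struct.unpack(f.read(cls._struct.size)))

The real classes have more fields, but this is the general shape.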

I come from a Java and JUnit world, where I am used to always
overriding the default reference-based implementations of the
equals(), toString(), and hashCode() methods for "equals-by-value"
objects such that they work well and efficiently in, e.g., hash maps.

With my switch-over to Python, I looked for equivalent features and
stumbled over the eval(repr(x))==x recommendation. It is not that I
actually need the repr implementations (yet); it is mostly because I
find the condition very useful in PyUnit to check in a test that I
have remembered to initialize all instance fields in __init__ and that
I have remembered to include all relevant attributes in the __eq__
implementation.

Whereas this worked fine in a unit test module dedicated to testing
only that specific module, the test failed when called from other
test-package modules that wrap the unit tests from several unit test
modules.
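
For concreteness, the kind of PyUnit check I have in mind looks
roughly like this (module and class names are just placeholders):

import unittest
from mymod import Age   # hypothetical module holding the value class

class AgeRoundTripTest(unittest.TestCase):

    def test_init_eq_repr_consistency(self):
        x = Age(10)
        # Catches attributes forgotten in __init__, __eq__ or __repr__,
        # but only works as long as the name 'Age' is bound in the
        # namespace where eval runs.
        self.assertEqual(eval(repr(x)), x)

if __name__ == "__main__":
    unittest.main()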

>
> 2. Whether eval(repr(x)) even works (returns an answer) depends on
> whether the name bindings in the globals and locals passed to eval
> (which by default are the globals and locals of the context of the eval
> call) match the names used in the repr.  You discovered that to a first
> approximation, this depends on whether the call to repr comes from
> within or without the module containing the class definition.  But the
> situation is far worse.  Consider 'import somemod as m'.  Even if you
> were able to introspect the call and determine that it did not come from
> somemod**, prepending 'somemod.' to the repr *still* would not work.
> Or, the call to repr could come from one context, the result saved and
> passed to another context with different name bindings, and the eval
> call made there.  So an repr that can be eval'ed in any context is hopeless.
>
Ok, nasty stuff

> If this is a practical rather than theoretical question, then use your
> first repr version that uses the classes definition name and only eval
> the result in a context that has that name bound to the class object.
>
> from mymod import Age
> #or
> import mymod
> Age = mymod.Age
>
> #in either case
> eval(repr(Age(10))) == Age(10)
>
> > class Age:
>
> >     def __init__(self, an_age):
> >         self.age = an_age
>
> >     def __eq__(self, obj):
> >         self.age == obj.age
>
> >     def __repr__(self):
> >         return self.__class__.__name__ + \
> >                "(%r)" % self.age
>
Yes, it is mostly from a practical point of view, although I was
surprised that I could not find more material on it in the Python
documentation or mailing lists, and I might just do what you suggest
in the unit test modules to at least make it robust in that context.

Hmm... a bit of a disappointment for me that this cannot be done more
cleanly.
> **
> While such introspection is not part of the language, I believe one
> could do it in CPython, but I forgot the details.  There have been
> threads like 'How do I determine the caller function' with answers to
> that question, and I presume the module of the caller is available also.
OK, I think CPython internals are too much new stuff to dig into
right now. Just grasping the possibilities in the API, and how to do
things the right way, is giving me enough challenges for now...

>
> Terry Jan Reedy

Again, thank you for your thorough answer,

Slaunger
--
http://mail.python.org/mailman/listinfo/python-list


Re: Best practise implementation for equal by value objects

2008-08-06 Thread Slaunger
On 6 Aug., 21:46, John Krukoff <[EMAIL PROTECTED]> wrote:
> On Wed, 2008-08-06 at 05:50 -0700, Slaunger wrote:
> > Hi,
>
> > I am new here and relatively new to Python, so be gentle:
>
> > Is there a recommended generic implementation of __repr__ for objects
> > equal by value to assure that eval(repr(x)) == x independet of which
> > module the call is made from?
>
> > Example:
>
> > class Age:
>
> >     def __init__(self, an_age):
> >         self.age = an_age
>
> >     def __eq__(self, obj):
> >         self.age == obj.age
>
> >     def __repr__(self):
> >         return self.__class__.__name__ + \
> >                "(%r)" % self.age
>
> > age_ten = Age(10)
> > print repr(age_ten)
> > print eval(repr(age_ten))
> > print eval(repr(age_ten)).age
>
> > Running this gives
>
> > Age(10)
> > Age(10)
> > 10
>
> > Exactly as I want to.
>
> > The problem arises when the Age class is iomported into another module
> > in another package as then there is a package prefix and the above
> > implementation of __repr__ does not work.
>
> > I have then experimented with doing somthing like
>
> >     def __repr__(self):
> >         return self.__module__ + '.' + self.__class__.__name__ +
> > "(%r)" % self.age
>
> > This seems to work when called from the outside, but not from the
> > inside of the module. That is, if I rerun the script above the the
> > module name prefixed to the representation I get the following error
>
> > Traceback (most recent call last):
> >   File "valuetest.py", line 15, in <module>
> >     print eval(repr(age_ten))
> > __main__.Age(10)
> >   File "<string>", line 1, in <module>
> > NameError: name '__main__' is not defined
>
> > This is pretty annoying.
>
> > My question is: Is there a robust generic type of implementation of
> > __repr__ which I can use instead?
>
> > This is something I plan to reuse for many different Value classes, so
> > I would like to get it robust.
>
> > Thanks,
> > Slaunger
> > --
> >http://mail.python.org/mailman/listinfo/python-list
>
> Are you really sure this is what you want to do, and that a less tricky
> serialization format such as that provided by the pickle module wouldn't
> work for you?

Well, it is not so much for serialization yet (although I have not yet
fully understood the implications); it is more because I think the
eval(repr(x))==x condition is a nice unit test to make sure my
constructor and equals method are implemented correctly (that I have
remembered all attributes in their implementations).

As mentioned above, I may go for a more pragmatic approach, where I
only rely on repr if the class is imported in the "standard" way.

Cheers,
Slaunger

>
> --
> John Krukoff <[EMAIL PROTECTED]>
> Land Title Guarantee Company

--
http://mail.python.org/mailman/listinfo/python-list


Re: Best practise implementation for equal by value objects

2008-08-06 Thread Slaunger
On 7 Aug., 04:34, Steven D'Aprano <[EMAIL PROTECTED]
cybersource.com.au> wrote:
> On Wed, 06 Aug 2008 05:50:35 -0700, Slaunger wrote:
> > Hi,
>
> > I am new here and relatively new to Python, so be gentle:
>
> > Is there a recommended generic implementation of __repr__ for objects
> > equal by value to assure that eval(repr(x)) == x independet of which
> > module the call is made from?
>
> In general, no.
>
> ...
>
OK.

> > My question is: Is there a robust generic type of implementation of
> > __repr__ which I can use instead?
>
> > This is something I plan to reuse for many different Value classes, so I
> > would like to get it robust.
>
> I doubt you could get it that robust, nor is it intended to be.
>
> eval(repr(obj)) giving obj is meant as a guideline, not an invariant --
> there are many things that can break it. For example, here's a module
> with a simple class:

OK, I had not fully understood the implications of 'not' implementing
__repr__ such that eval(repr(x)) == x, so I just tried to make it work
to make sure life would be easy for me and my objects as I went
further into the Python jungle.

As mentioned above, I also find the eval(repr(x))==x condition
convenient from a unit test point of view.

>
> # Parrot module
> class Parrot(object):
>     def __repr__(self):
>         return "parrot.Parrot()"
>     def __eq__(self, other):
>         # all parrots are equal
>         return isinstance(other, Parrot)
>
> Now let's use it:
>
> >>> import parrot
> >>> p = parrot.Parrot()
> >>> s = repr(p)
> >>> assert eval(s) == p
> >>> del parrot
> >>> assert eval(s) == p
>
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "<string>", line 1, in <module>
> NameError: name 'parrot' is not defined
>

OK, I see, but this isn't exactly eval(repr(x))==x but

s = repr(x)
eval(s) == x

so, of course, if the name binding is deleted in between, it won't
work.

In my implementation I only expect this to work as a one-liner.

> If you look at classes in the standard library, they often have reprs
> like this:
>
> >>> repr(timeit.Timer())
>
> '<timeit.Timer instance at 0x...>'
>

Yes, I noticed that. But the example here is also an object, which is
equal by reference, not value. And for these
it does not make so much sense to evaluate the representation.

> Certainly you can't expect to successfully eval that!
>
> I believe the recommendation for eval(repr(obj)) to give obj again is
> meant as a convenience for the interactive interpreter, and even there
> only for simple types like int or list. If you can do it, great, but if
> it doesn't work, so be it. You're not supposed to rely on it, and it's
> not meant as a general way to serialize classes.
>
> --
> Steven

OK, I will put less emphasis on it in the future.

Thank you for taking your time to answer.

Slaunger
--
http://mail.python.org/mailman/listinfo/python-list


Re: Best practise implementation for equal by value objects

2008-08-08 Thread Slaunger
On 7 Aug., 21:25, Paul Rubin  wrote:
> Terry Reedy <[EMAIL PROTECTED]> writes:
> > So when the initializers for instances are all 'nice' (as for range),
> > go for it (as in 'Age(10)').  And test it as you are by eval'ing the
> > rep. Just accept that the eval will only work in contexts with the
> > class name bound to the class.  For built-in like range, it always is,
> > by default -- unless masked by another assignment!
>
> Eval is extremely dangerous.  Think of data from untrusted sources,
> then ask yourself how well you really know where ALL your data came
> from.  It's preferable to avoid using it that way.  There have been a
> few "safe eval" recipes posted here and at ASPN.  It would be good if
> one of them made it into the standard library.  Note that pickle
> (which would otherwise be an obious choice for this) has the same
> problems, though not as severely as flat-out evalling something.

Thank you for pointing out the dangers of eval. I think you are right
to caution about it. In my particular case it is a closed-loop system,
so no danger there, but that certainly could have been an issue.

That caution should perhaps be mentioned in
http://docs.python.org/lib/built-in-funcs.html
--
http://mail.python.org/mailman/listinfo/python-list


Re: Best practise implementation for equal by value objects

2008-08-08 Thread Slaunger
On 7 Aug., 21:19, Terry Reedy <[EMAIL PROTECTED]> wrote:
> Slaunger wrote:
> > On 6 Aug., 21:36, Terry Reedy <[EMAIL PROTECTED]> wrote:
> > OK, the situation is more complicated than that then. In the case here
> > though,
> > the attributes would always be sinmple bulit-in types, where
> > eval(repr(x))==x
> > or, where the attribute is a user-defined equal-by-value class, that I
> > have
> > control over.
>
> I think most would agree that a more accurate and informative
> representation is better than a general representation like Pythons
> default.  For instance,
>  >>> a=range(2,10,2) # 3.0
>  >>> a
> range(2, 10, 2)
>
> is nicer than <range object at 0x...>.
>
> So when the initializers for instances are all 'nice' (as for range), go
> for it (as in 'Age(10)').  And test it as you are by eval'ing the rep.
> Just accept that the eval will only work in contexts with the class name
>   bound to the class.  For built-in like range, it always is, by default
> -- unless masked by another assignment!
>
OK, I am encouraged to carry on my quest with the eval(repr(x)) == x
condition for my 'nice' classes.
I just revisited the documentation for eval and noticed that there are
optional globals and locals namespace arguments that one could specify:

http://docs.python.org/lib/built-in-funcs.html

Quite frankly I do not understand how to make use of these parameters,
but it is my feeling that if I enforce a convention of always
specifying the globals/locals parameters in a specific manner:

assert eval(repr(x), globals, locals) == x

would work independently of how I have imported the module under test.

Now, I just need to figure out if this is right and how to specify the
globals and locals if that is not too cumbersome...
or maybe I am just over-engineering...
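
What I am imagining is roughly this (untested sketch; 'mymod' and
'Age' are placeholder names again):

from mymod import Age   # placeholder import

def repr_roundtrips(obj, namespace):
    # Evaluate the repr in an explicitly supplied globals dict, so the
    # check does not depend on how the calling module imported the class.
    return eval(repr(obj), namespace) == obj

assert repr_roundtrips(Age(10), {"Age": Age})

That way the assert should pass no matter how the test module imported
the class.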


--
http://mail.python.org/mailman/listinfo/python-list


mmap 2GB allocation limit on Win XP, 32-bits, Python 2.5.4

2009-07-24 Thread Slaunger
OS: Win XP SP3, 32 bit
Python 2.5.4

Hi, I have run into some problems with allocating numpy.memmaps
exceeding an accumulated size of about 2 GB. I have found out that
the real problem relates to numpy.memmap using mmap.mmap.

I've written a small test program to illustrate it:

import itertools
import mmap
import os

files = []
mmaps = []
file_names = []
mmap_cap = 0
bytes_per_mmap = 100 * 1024 ** 2
try:
    for i in itertools.count(1):
        file_name = "d:/%d.tst" % i
        file_names.append(file_name)
        f = open(file_name, "w+b")
        files.append(f)
        mm = mmap.mmap(f.fileno(), bytes_per_mmap)
        mmaps.append(mm)
        mmap_cap += bytes_per_mmap
        print "Created %d writeable mmaps containing %d MB" % (i,
            mmap_cap / (1024 ** 2))

# Clean up
finally:
    print "Removing mmaps..."
    for mm, f, file_name in zip(mmaps, files, file_names):
        mm.close()
        f.close()
        os.remove(file_name)
    print "Done..."


which creates this output

Created 1 writeable mmaps containing 100 MB
Created 2 writeable mmaps containing 200 MB

Created 17 writeable mmaps containing 1700 MB
Created 18 writeable mmaps containing 1800 MB
Removing mmaps...
Done...
Traceback (most recent call last):
  File "C:\svn-sandbox\research\scipy\scipy\src\com\terma\kha\mmaptest.py", line 16, in <module>
    mm = mmap.mmap(f.fileno(), bytes_per_mmap)
WindowsError: [Error 8] Not enough storage is available to process
this command

There is more than 25 GB of free space on drive d: at this stage.

Is it a bug or a "feature" of the 32 bit OS?

I am surprised about it as I have not found any notes about these
kinds of limitations in the documentation.

I am in dire need of these large memmaps for my task, and it is not an
option to change OS due to other constraints in the system.

Is there anything I can do about it?

Best wishes,
Kim
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: mmap 2GB allocation limit on Win XP, 32-bits, Python 2.5.4

2009-07-27 Thread Slaunger
On 27 Jul., 13:21, Dave Angel  wrote:
> (forwarding this message, as the reply was off-list)
>
>
>
> Kim Hansen wrote:
> > 2009/7/24 Dave Angel :
>
> >> It's not a question of how much disk space there is, but how much virtual
> >> space 32 bits can address.  2**32 is about 4 gig, and Windows XP reserves
> >> about half of that for system use.  Presumably a 64 bit OS would have a 
> >> much
> >> larger limit.
>
> >> Years ago I worked on Sun Sparc system which had much more limited shared
> >> memory access, due to hardware limitations.  So 2gig seems pretty good to
> >> me.
>
> >> There is supposed to be a way to tell the Windows OS to only use 1 gb of
> >> virtual space, leaving 3gb for application use.  But there are some
> >> limitations, and I don't recall what they are.  I believe it has to be done
> >> globally (probably in Boot.ini), rather than per process.  And some things
> >> didn't work in that configuration.
>
> >> DaveA
>
> > Hi Dave,
>
> > In the related post I did on the numpy discussions:
>
> >http://article.gmane.org/gmane.comp.python.numeric.general/31748
>
> > another user was kind enough to run my test program on both 32 bit and
> > 64 bit machines. On the 64 bit machine, there was no such limit, very
> > much in line with what you wrote. Adding the /3GB option in boot.ini
> > did not increase the available memory as well. Apparently, Python
> > needs to have been compiled in a way, which makes it possible to take
> > advantage of that switch and that is either not the case or I did
> > something else wrong as well.
>
> > I acknowledge the explanation concerning the address space available.
> > Being an ignorant of the inner details of the implementation of mmap,
> > it seems like somewhat an "implementation detail" to me that such an
> > address wall is hit. There may be some good arguments from a
> > programming point of view and it may be a relative high limit as
> > compared to other systems but it is certainly at the low side for my
> > application: I work with data files typically 200 GB in size
> > consisting of datapackets each having a fixed size frame and a
> > variable size payload. To handle these large files, I generate an
> > "index" file consisting of just the frames (which has all the metadata
> > I need for finding the payloads I am interested in) and "pointers" to
> > where in the large data file each payload begins. This index file can
> > be up to 1 GB in size and at times I need to have access to two of
> > those at the same time (and then i hit the address wall). I would
> > really really like to be able to access these index files in a
> > read-only manner as an array of records on a file for which I use
> > numpy.memmap (which wraps mmap.mmap) such that I can pick a single
> > element, extract, e.g., every thousand value of a specific field in
> > the record using the convenient indexing available in Python/numpy.
> > Now it seems like I have to resort to making my own encapsulation
> > layer, which seeks to the relevant place in the file, reads sections
> > as bytestrings into recarrays, etc. Well, I must just get on with
> > it...
>
> > I think it would be worthwhile specifying this 32 bit OS limitation in
> > the documentation of mmap.mmap, as I doubt I am the only one being
> > surprised about this address space limitation.
>
> > Cheers,
> > Kim
>
> I agree that some description of system limitations should be included
> in a system-specific document.  There probably is one, I haven't looked
> recently.  But I don't think it belongs in mmap documentation.
>
> Perhaps you still don't recognize what the limit is.  32 bits can only
> address 4 gigabytes of things as first-class addresses.  So roughly the
> same limit that's on mmap is also on list, dict, bytearray, or anything
> else.  If you had 20 lists taking 100 meg each, you would fill up
> memory.  If you had 10 of them, you might have enough room for a 1gb
> mmap area.  And your code takes up some of that space, as well as the
> Python interpreter, the standard library, and all the data structures
> that are normally ignored by the application developer.
>
> BTW,  there is one difference between mmap and most of the other
> allocations.  Most data is allocated out of the swapfile, while mmap is
> allocated from the specified file (unless you use -1 for fileno).  
> Consequently, if the swapfile is already clogged with all the other
> running applications, you can still take your 1.8gb or whatever of your
> virtual space, when much less than that might be available for other
> kinds of allocations.
>
> Executables and dlls are also (mostly) mapped into memory just the same
> as mmap.  So they tend not to take up much space from the swapfile.  In
> fact, with planning, a DLL needn't take up any swapfile space (well, a
> few K is always needed, realistically)..  But that's a linking issue for
> compiled languages.
>
> DaveA

I do understand t