sorted() erratically fails to sort string numbers

2009-04-28 Thread uuid
I would be very interested in a logical explanation of why this happens on
Python 2.5.1:


In order to sort an etree by the .text value of one child, I adapted 
this snippet from effbot.org:



import xml.etree.ElementTree as ET

tree = ET.parse("data.xml")

def getkey(elem):
    return elem.findtext("number")

container = tree.find("entries")

container[:] = sorted(container, key=getkey)

tree.write("new-data.xml")


While working with a moderately sized xml file (2500 entries to sort 
by), I found that a few elements were not in order. It seems that 
numbers with seven digits were sorted correctly, while those with six 
digits were just added at the end.


I fixed the problem by converting the numbers to int in the callback:


def getkey(elem):
    return int(elem.findtext("number"))
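
For anyone who wants to try this without a data.xml at hand, here is a minimal self-contained sketch of the same fix. The inline document and its values are made up for illustration; only the "entries" and "number" names come from the snippet above.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for data.xml, mixing six- and seven-digit numbers.
xml_data = """<root><entries>
<entry><number>1000000</number></entry>
<entry><number>999999</number></entry>
<entry><number>250000</number></entry>
</entries></root>"""

root = ET.fromstring(xml_data)
container = root.find("entries")

def getkey(elem):
    # Compare the numbers as integers rather than as text.
    return int(elem.findtext("number"))

# Slice assignment replaces the children with the sorted sequence.
container[:] = sorted(container, key=getkey)

ordered = [elem.findtext("number") for elem in container]
print(ordered)
```

Note that int() raises an exception for an entry whose "number" child is missing or not numeric, so real data may need a fallback in getkey.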


So to my naive mind, it seems as if there was some error with the 
sorted() function. Would anyone be as kind as to explain why it could 
be happening? Thanks in advance!


--
http://mail.python.org/mailman/listinfo/python-list


Re: sorted() erratically fails to sort string numbers

2009-04-28 Thread uuid
I am at the same time impressed with the concise answer and 
disheartened by my inability to see this myself.

My heartfelt thanks!


On 2009-04-28 10:06:24 +0200, Andre Engels  said:


> When sorting strings, including strings that represent numbers,
> sorting is done alphabetically. In this alphabetical order the numbers
> are all ordered the normal way, so two numbers with the same number of
> digits will be sorted the same way, but any number starting with "1"
> will come before any number starting with "2", whether they denote
> units, tens, hundreds or millions. Thus:
>
> "1" < "15999" < "16" < "2"

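To make this concrete, here is a small sketch showing why six-digit strings can end up after seven-digit ones, and how a numeric key changes the result. The sample values are invented for illustration.

```python
# Sample values are made up; any mix of six- and seven-digit strings behaves alike.
numbers = ["2", "16", "15999", "1", "1000000", "999999"]

# Strings compare character by character, so "1000000" sorts before "999999".
as_strings = sorted(numbers)

# Converting each key to int restores numeric order.
as_ints = sorted(numbers, key=int)

print(as_strings)
print(as_ints)
```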




Re: sorted() erratically fails to sort string numbers

2009-04-28 Thread uuid

On 2009-04-28 16:18:43 +0200, John Posner  said:

> Don't be disheartened! Many people -- myself included, absolutely! --
> occasionally let a blind spot show in their messages to this list.

Thanks for the encouragement :)

> BTW:
>
> container[:] = sorted(container, key=getkey)
>
> ... is equivalent to:
>
> container.sort(key=getkey)
>
> (unless I'm showing *my* blind spot here)

I don't think etree element objects support the .sort method.
At least in lxml they don't
(http://codespeak.net/lxml/api/elementtree.ElementTree.Element-class.html)
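
A quick way to check both points, as a rough sketch: for a plain list the two spellings give the same result, while an Element (here the standard-library one, as far as I know; other implementations or versions may differ) has no sort attribute, so the slice assignment is the form that works. The element names and values are made up.

```python
import xml.etree.ElementTree as ET

# For a plain list, both spellings reorder the same object in place.
a = [3, 1, 2]
b = [3, 1, 2]
a[:] = sorted(a)
b.sort()
lists_agree = (a == b)

# Build a small container with made-up values.
container = ET.Element("entries")
for value in ("30", "4", "200"):
    ET.SubElement(container, "entry").text = value

# Check for a sort method instead of assuming one exists.
has_sort = hasattr(container, "sort")

# Slice assignment works on elements even without a sort method.
container[:] = sorted(container, key=lambda elem: int(elem.text))
texts = [elem.text for elem in container]

print(lists_agree, has_sort, texts)
```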





Re: Python 2.6 Install on OSX Server 10.5: Which flag to use in "configure" to Change the Install location?

2009-04-29 Thread uuid
My first intuition would be that - even if it works - this would break 
future OS X updates, since you're probably not fixing the receipt files.



On 2009-04-29 23:43:34 +0200, Omita  said:


> However, as I am using OSX Server I would ideally like the install
> location to be here:
>
> /System/Library/Frameworks/Python.framework/





Get multiprocessing.Queue to do priorities

2009-05-09 Thread uuid

Hello,
I was wondering whether there was a way to make multiprocessing.Queue 
behave in a priority queue-like fashion. Subclassing with heappush and 
heappop for put and get doesn't work the old way (multiprocessing.Queue 
seems to use different data structures than Queue.Queue?)


Could one create a heapq within the producer as a proxy, and then feed 
a proper queue from that? Does anyone have an idea on how to deal with 
the queue item flow control? (Since all of the sorting has to happen 
within the heapq, it should only pass items to the real Queue if it's 
empty?)


Thanks in advance!



Re: Get multiprocessing.Queue to do priorities

2009-05-09 Thread uuid
I just read up and it seems that no matter the approach, it's futile to 
use multiprocessing.Queue, since there is a bug that prevents true 
FIFO. (http://bugs.python.org/issue4999)


Any recommendation on an alternate way to build a priority queue to use 
with a "one producer, many consumers" type multiprocessing setup would 
be welcomed!


:|

On 2009-05-09 18:42:34 +0200, uuid  said:


> Hello,
> I was wondering whether there was a way to make multiprocessing.Queue
> behave in a priority queue-like fashion. Subclassing with heappush and
> heappop for put and get doesn't work the old way (multiprocessing.Queue
> seems to use different data structures than Queue.Queue?)
>
> Could one create a heapq within the producer as a proxy, and then feed
> a proper queue from that? Does anyone have an idea on how to deal with
> the queue item flow control? (Since all of the sorting has to happen
> within the heapq, it should only pass items to the real Queue if it's
> empty?)
>
> Thanks in advance!





Re: Get multiprocessing.Queue to do priorities

2009-05-09 Thread uuid
Scott David Daniels  wrote:
> 
> ? "one producer, many consumers" ?
> What would the priority queue do?  Choose a consumer?

Sorry, I should have provided a little more detail. There is one
producer thread, reading urls from multiple files and external input.
These urls have a certain priority, and are fed to multiple consumer
threads for fetching and further processing (XML parsing and such CPU
intensive stuff). Since the content of the urls is changing over time,
it is crucial to have a certain amount of control over the order in
which the requests occur. So, to answer the question: 
The priority queue would make sure that out of a number of
asynchronously added items, those with a high priority are fetched first
by the worker threads. Sounds like a perfect case for a heap, if only I
could :)


Re: Get multiprocessing.Queue to do priorities

2009-05-09 Thread uuid
The Queue module, apparently, is thread safe, but *not* process safe. 
If you try to use an ordinary Queue, it appears inaccessible to the 
worker process. (Which, after all, is quite logical, since methods for 
moving items between the threads of the same process are quite 
different from inter-process communication.) It appears that creating a 
manager that holds a shared queue might be an option 
(http://stackoverflow.com/questions/342556/python-2-6-multiprocessing-queue-compatible-with-threads).


Just for illustration, this shows that Queue.Queue doesn't work with processes:


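import Queue
import multiprocessing
import time

def worker(queue):
    # Runs in the child process, which only sees its own copy of the queue.
    while True:
        item = queue.get()
        print item
        queue.task_done()

queue_queue = Queue.Queue()

worker_thread = multiprocessing.Process(target=worker, args=(queue_queue,))
worker_thread.start()
for i in range(10):
    queue_queue.put(str(i))
time.sleep(10)
# Drain whatever the worker did not consume.
while True:
    try:
        print 'still on queue: ' + queue_queue.get(False)
    except Queue.Empty:
        break
# The worker blocks forever on its empty copy, so join() would never return.
worker_thread.terminate()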


This yields:

still on queue: 0
still on queue: 1
still on queue: 2
still on queue: 3
still on queue: 4
still on queue: 5
still on queue: 6
still on queue: 7
still on queue: 8
still on queue: 9

So no queue item ever arrives at the worker process.



On 2009-05-09 22:00:36 +0200, Scott David Daniels  said:


> 2.6 has a PriorityQueue in the Queue module.
> If you aren't using 2.6, you could copy the code for your own version.
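
For the thread-only case, that PriorityQueue can be used roughly as in the sketch below. It is written for Python 3, where the module is called queue instead of Queue. The URLs, the priorities and the None sentinel are made up for illustration, and a single consumer is used so that the order is easy to see.

```python
import queue  # this module is called Queue in Python 2
import threading

tasks = queue.PriorityQueue()
fetched = []

def worker():
    # Lowest priority number comes out first; a None url is the stop signal.
    while True:
        priority, url = tasks.get()
        if url is None:
            break
        fetched.append(url)

# Hypothetical work items, added in arbitrary order.
for priority, url in [(3, "http://example.com/low"),
                      (1, "http://example.com/urgent"),
                      (2, "http://example.com/normal")]:
    tasks.put((priority, url))

# The sentinel gets the largest possible priority value so it is handled last.
tasks.put((float("inf"), None))

consumer = threading.Thread(target=worker)
consumer.start()
consumer.join()
print(fetched)
```

This works between threads of one process only. As shown above, handing such a queue to a multiprocessing.Process does not share it.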





Re: xml in python

2009-05-10 Thread uuid

On 2009-05-10 09:24:36 +0200, Piet van Oostrum  said:




> These days ElementTree is considered the most pythonic way.
> http://docs.python.org/library/xml.etree.elementtree.html
>
> There is also a reimplementation of the ElementTree API based on libxml2
> and libxslt, which has more features but requires a separate install. It
> is largely compatible with ElementTree, however.


Indeed - ElementTree and its blazing C implementation cElementTree are
included in the standard install.


However, I would also recommend checking out the latter option - lxml.
It's very fast, has an ElementTree-compatible API and sports some
tricks like XPath and XSLT. There's comprehensive documentation as well
as tutorials, hints and tips here: http://codespeak.net/lxml/
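
Because the two APIs overlap, simple code can often run on either. A minimal sketch, assuming lxml may or may not be installed, with a made-up document:

```python
# Prefer lxml when it is installed, otherwise fall back to the standard library.
try:
    from lxml import etree as ET
except ImportError:
    import xml.etree.ElementTree as ET

# Made-up document for illustration.
root = ET.fromstring("<root><entry><number>7</number></entry></root>")

# findall() with a simple path is part of the API both libraries share.
numbers = [elem.text for elem in root.findall("entry/number")]
print(numbers)
```

Features such as full XPath and XSLT are lxml additions, so code that relies on them will not run with the fallback import.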




Re: Get multiprocessing.Queue to do priorities

2009-05-10 Thread uuid

Dear Jesse,
thanks for the hint.

I see you are already assigned to the FIFO bug 
(http://bugs.python.org/issue4999), so I won't burden you even more. 
Clearly, a reliable FIFO behavior of multiprocessing.Queue helps more 
than a priority queue, since it can be used to build one, so that 
should really be the first thing to fix.


In the meantime, I think I'll whip up a hack that uses a sort of
bucket strategy: fill up a prioritized heapq and then, at regular
intervals, unload its contents into a size-limited multiprocessing
queue.


I'll post this as soon as it works.
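
A rough sketch of that bucket idea, not the finished hack: the class name PriorityFeeder and its methods are hypothetical. The demo uses a bounded queue.Queue as a stand-in so that it runs in a single process. The assumption is that a size-limited multiprocessing.Queue could be passed in instead, since its put_nowait() also raises queue.Full when the queue is full.

```python
import heapq
import queue  # this module is called Queue in Python 2


class PriorityFeeder:
    """Buffer items in a heap and move the most urgent ones to a bounded queue."""

    def __init__(self, target):
        self._heap = []
        self._count = 0          # tie-breaker that keeps insertion order stable
        self._target = target    # any queue whose put_nowait() raises queue.Full

    def add(self, priority, item):
        heapq.heappush(self._heap, (priority, self._count, item))
        self._count += 1

    def flush(self):
        """Move items, lowest priority number first, until the target is full."""
        moved = 0
        while self._heap:
            try:
                self._target.put_nowait(self._heap[0][2])
            except queue.Full:
                break
            heapq.heappop(self._heap)
            moved += 1
        return moved


# Stand-in target with room for two items.
target = queue.Queue(maxsize=2)
feeder = PriorityFeeder(target)
feeder.add(5, "low")
feeder.add(1, "urgent")
feeder.add(3, "normal")

moved_first = feeder.flush()
first, second = target.get(), target.get()
moved_second = feeder.flush()
third = target.get()
print(moved_first, first, second, moved_second, third)
```

Keeping the real queue small means a late but urgent item only has to wait behind a few already queued ones. With multiprocessing.Queue the ordering still depends on the FIFO issue mentioned above.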

-u

On 2009-05-10 15:35:03 +0200, Jesse Noller  said:


> Using a manager, or submitting a patch which adds priority queue to
> the multiprocessing.queue module is the correct solution for this.
>
> You can file an enhancement in the tracker, and assign/add me to it,
> but without a patch it may take me a bit (wicked busy right now).
>
> jesse
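
The manager route could look roughly like the sketch below. It is untested against the setup described in this thread and written for Python 3 (the queue module is Queue in Python 2). QueueManager and demo are hypothetical names. BaseManager.register() and the proxy it hands out are the documented multiprocessing.managers API.

```python
import queue  # this module is called Queue in Python 2
from multiprocessing.managers import BaseManager


class QueueManager(BaseManager):
    """Manager subclass (hypothetical name) that serves a shared priority queue."""


# Expose queue.PriorityQueue through the manager; callers receive a proxy object.
QueueManager.register("PriorityQueue", queue.PriorityQueue)


def demo():
    manager = QueueManager()
    manager.start()  # starts the server process that owns the real queue
    try:
        shared = manager.PriorityQueue()
        shared.put((2, "later"))
        shared.put((1, "sooner"))
        # Each get() goes through the proxy to the queue in the server process.
        return [shared.get(), shared.get()]
    finally:
        manager.shutdown()


if __name__ == "__main__":
    print(demo())
```

The proxy returned by manager.PriorityQueue() can be passed to worker processes as an argument, which an ordinary Queue.Queue cannot. Every put() and get() is a round trip to the manager process, so this trades speed for shared ordering.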


