sorted() erratically fails to sort string numbers
I would be very interested in a logical explanation of why this happens on
Python 2.5.1:
In order to sort an etree by the .text value of one child, I adapted
this snippet from effbot.org:
import xml.etree.ElementTree as ET
tree = ET.parse("data.xml")
def getkey(elem):
    return elem.findtext("number")
container = tree.find("entries")
container[:] = sorted(container, key=getkey)
tree.write("new-data.xml")
While working with a moderately sized XML file (2500 entries to sort
by), I found that a few elements were not in order. It seems that
numbers with seven digits were sorted correctly, while those with six
digits were simply appended at the end.
I fixed the problem by converting the numbers to int in the callback:
def getkey(elem):
    return int(elem.findtext("number"))
So to my naive mind, it looks as if there is some error in the
sorted() function. Would anyone be so kind as to explain why this
could be happening? Thanks in advance!
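A minimal, self-contained reproduction of the problem and the fix, written for Python 3. The inline `DATA` document stands in for data.xml, and the names `DATA`, `text_key`, `int_key` and `sort_entries` are hypothetical; the sorting itself uses the same slice assignment as the snippet above.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for data.xml: a mix of six- and seven-digit numbers.
DATA = """<root><entries>
<entry><number>1000000</number></entry>
<entry><number>999999</number></entry>
<entry><number>2000000</number></entry>
<entry><number>123456</number></entry>
</entries></root>"""


def text_key(elem):
    # Original callback: the strings are compared character by character.
    return elem.findtext("number")


def int_key(elem):
    # Fixed callback: the numeric values are compared.
    return int(elem.findtext("number"))


def sort_entries(root, key):
    """Hypothetical helper: sort the <entries> children in place, return their numbers."""
    container = root.find("entries")
    # Slice assignment replaces all children with the sorted sequence.
    container[:] = sorted(container, key=key)
    return [entry.findtext("number") for entry in container]


print(sort_entries(ET.fromstring(DATA), text_key))
# ['1000000', '123456', '2000000', '999999']
print(sort_entries(ET.fromstring(DATA), int_key))
# ['123456', '999999', '1000000', '2000000']
```

With `text_key`, "999999" lands last, after both seven-digit values, because "9" sorts after "1" and "2"; `int_key` restores numeric order.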
--
http://mail.python.org/mailman/listinfo/python-list
Re: sorted() erratically fails to sort string numbers
I am at the same time impressed with the concise answer and disheartened by my inability to see this myself. My heartfelt thanks!

On 2009-04-28 10:06:24 +0200, Andre Engels said:
> When sorting strings, including strings that represent numbers, sorting is done alphabetically (character by character). In this alphabetical order the digits are ordered the normal way, so two numbers with the same number of digits will be sorted in their numeric order, but any number starting with "1" will come before any number starting with "2", whether they denote units, tens, hundreds or millions. Thus:
> "1" < "15999" < "16" < "2"
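A quick check of Andre's explanation using the values from his example (Python 3). Padding with `str.zfill` is an extra illustration of the "same number of digits" remark, not a fix proposed in the thread.

```python
values = ["2", "16", "1", "15999"]

# Strings compare character by character, so "15999" < "16" < "2".
print(sorted(values))           # ['1', '15999', '16', '2']

# Converting to int compares the numeric values instead.
print(sorted(values, key=int))  # ['1', '2', '16', '15999']

# Strings of equal width sort numerically even as text, which is why
# numbers with the same number of digits were already in order.
padded = [v.zfill(5) for v in values]
print(sorted(padded))           # ['00001', '00002', '00016', '15999']
```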
Re: sorted() erratically fails to sort string numbers
On 2009-04-28 16:18:43 +0200, John Posner said:
> Don't be disheartened! Many people -- myself included, absolutely! -- occasionally let a blind spot show in their messages to this list.

Thanks for the encouragement :)

> BTW:
>     container[:] = sorted(container, key=getkey)
> ... is equivalent to:
>     container.sort(key=getkey)
> (unless I'm showing *my* blind spot here)

I don't think etree element objects support the .sort method. At least in lxml they don't (http://codespeak.net/lxml/api/elementtree.ElementTree.Element-class.html).
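For a plain list, the two spellings John mentions do the same thing. ElementTree elements, as far as I can tell, have no `.sort()` method, so only the slice-assignment form applies to them. A sketch of a helper covering both cases (Python 3); `sort_in_place` is a hypothetical name, and the duck-typed `hasattr` check is just one way to choose between the two forms.

```python
import xml.etree.ElementTree as ET


def sort_in_place(seq, key=None):
    """Hypothetical helper: sort any mutable sequence in place."""
    if hasattr(seq, "sort"):
        # Lists: seq.sort(key=...) is the idiomatic form.
        seq.sort(key=key)
    else:
        # Objects that support slice assignment but have no .sort(),
        # such as ElementTree elements, use sorted() + [:] instead.
        seq[:] = sorted(seq, key=key)


numbers = ["30", "4", "100"]
sort_in_place(numbers, key=int)
print(numbers)  # ['4', '30', '100']

parent = ET.fromstring("<p><n>30</n><n>4</n><n>100</n></p>")
sort_in_place(parent, key=lambda e: int(e.text))
print([child.text for child in parent])  # ['4', '30', '100']
```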
Re: Python 2.6 Install on OSX Server 10.5: Which flag to use in "configure" to Change the Install location?
My first intuition would be that, even if it works, this would break future OS X updates, since you're probably not fixing the receipt files.

On 2009-04-29 23:43:34 +0200, Omita said:
> However, as I am using OSX Server I would ideally like the install location to be here:
> /System/Library/Frameworks/Python.framework/
Get multiprocessing.Queue to do priorities
Hello,

I was wondering whether there was a way to make multiprocessing.Queue behave in a priority-queue-like fashion. Subclassing with heappush and heappop for put and get doesn't work the old way (multiprocessing.Queue seems to use different data structures than Queue.Queue?).

Could one create a heapq within the producer as a proxy, and then feed a proper queue from that? Does anyone have an idea of how to deal with the queue item flow control? (Since all of the sorting has to happen within the heapq, it should only pass items to the real Queue if it's empty?)

Thanks in advance!
Re: Get multiprocessing.Queue to do priorities
I just read up, and it seems that no matter the approach, it's futile to use multiprocessing.Queue, since there is a bug that prevents true FIFO behavior (http://bugs.python.org/issue4999).

Any recommendation for an alternative way to build a priority queue for a "one producer, many consumers" multiprocessing setup would be welcome! :|

On 2009-05-09 18:42:34 +0200, uuid said:
> Hello, I was wondering whether there was a way to make multiprocessing.Queue behave in a priority queue-like fashion. [...]
Re: Get multiprocessing.Queue to do priorities
Scott David Daniels wrote:
> > "one producer, many consumers"
> What would the priority queue do? Choose a consumer?

Sorry, I should have provided a little more detail. There is one producer thread, reading URLs from multiple files and external input. These URLs have a certain priority, and are fed to multiple consumer threads for fetching and further processing (XML parsing and similar CPU-intensive work). Since the content behind the URLs changes over time, it is crucial to have some control over the order in which the requests occur.

So, to answer the question: the priority queue would make sure that, out of a number of asynchronously added items, those with a high priority are fetched first by the worker threads. Sounds like a perfect case for a heap, if only I could :)
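The heap part of this can be sketched with the standard heapq module alone, before any queue, thread or process is involved (Python 3). The URLs and priority numbers are hypothetical, as are `push_url` and `pop_url`; heapq is a min-heap, so in this sketch a smaller number means "fetch sooner" (negate the priority to get highest-number-first).

```python
import heapq
import itertools

# Entries are (priority, sequence number, url). The sequence number breaks
# ties so equal priorities come out in insertion order and two URLs are
# never compared with each other.
heap = []
counter = itertools.count()


def push_url(priority, url):
    heapq.heappush(heap, (priority, next(counter), url))


def pop_url():
    priority, _, url = heapq.heappop(heap)
    return url


# Hypothetical URLs, added in arbitrary order.
push_url(2, "http://example.com/b")
push_url(0, "http://example.com/urgent")
push_url(2, "http://example.com/c")
push_url(1, "http://example.com/a")

order = [pop_url() for _ in range(len(heap))]
print(order)
# ['http://example.com/urgent', 'http://example.com/a',
#  'http://example.com/b', 'http://example.com/c']
```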
Re: Get multiprocessing.Queue to do priorities
The Queue module, apparently, is thread safe, but *not* process safe. If you try to use an ordinary Queue, it appears inaccessible to the worker process. (Which, after all, is quite logical, since methods for moving items between the threads of the same process are quite different from inter-process communication.) It appears that creating a manager that holds a shared queue might be an option (http://stackoverflow.com/questions/342556/python-2-6-multiprocessing-queue-compatible-with-threads).

Just for illustration, this shows that Queue.Queue doesn't work with processes:

import multiprocessing
import Queue
import time

def worker(queue):
    # Runs in the child process, which only sees its own separate copy of the queue
    while True:
        item = queue.get()
        print item
        queue.task_done()

if __name__ == '__main__':
    queue_queue = Queue.Queue()
    worker_process = multiprocessing.Process(target=worker, args=(queue_queue,))
    worker_process.start()
    for i in range(10):
        queue_queue.put(str(i))
    time.sleep(10)
    # Everything put above is still sitting in the parent's queue
    while True:
        try:
            print 'still on queue: ' + queue_queue.get(False)
        except Queue.Empty:
            break
    # The worker blocks forever on its empty copy, so stop it before joining
    worker_process.terminate()
    worker_process.join()

This yields:

still on queue: 0
still on queue: 1
still on queue: 2
still on queue: 3
still on queue: 4
still on queue: 5
still on queue: 6
still on queue: 7
still on queue: 8
still on queue: 9

So no queue item ever arrives at the worker process.

On 2009-05-09 22:00:36 +0200, Scott David Daniels said:
> 2.6 has a PriorityQueue in the Queue module.
> If you aren't using 2.6, you could copy the code for your own version.
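What "copying the code" amounts to can be sketched as a subclass of the thread-safe queue that swaps its internal FIFO storage for a heap. This relies on the `_init`/`_qsize`/`_put`/`_get` hooks that the standard library's own `PriorityQueue` and `LifoQueue` override; they are implementation details rather than documented API. `HeapQueue`, `consumer` and the `(99, None)` stop sentinel are hypothetical, and the module is named `queue` here (Python 3) rather than `Queue`. Like `Queue.Queue`, this only works between threads, not processes.

```python
import heapq
import queue  # named Queue in Python 2
import threading


class HeapQueue(queue.Queue):
    """Priority queue built on Queue's internal storage hooks (sketch).

    queue.Queue calls these hooks while holding its own lock, so
    overriding them is enough to change the ordering.
    """

    def _init(self, maxsize):
        self.queue = []

    def _qsize(self):
        return len(self.queue)

    def _put(self, item):
        heapq.heappush(self.queue, item)

    def _get(self):
        return heapq.heappop(self.queue)


def consumer(q, results):
    while True:
        priority, url = q.get()
        if url is None:        # sentinel: stop this worker
            q.task_done()
            break
        results.append(url)    # stand-in for fetching and parsing
        q.task_done()


q = HeapQueue()
for priority, url in [(2, "b"), (0, "urgent"), (1, "a")]:
    q.put((priority, url))
q.put((99, None))  # the sentinel sorts after all real work

results = []
worker = threading.Thread(target=consumer, args=(q, results))
worker.start()
worker.join()
print(results)  # ['urgent', 'a', 'b']
```

With several consumer threads, put one sentinel per thread; each `get()` still returns the most urgent remaining item, although the order in which the threads finish their work is not fixed.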
Re: xml in python
On 2009-05-10 09:24:36 +0200, Piet van Oostrum said:
> These days ElementTree is considered the most pythonic way.
> http://docs.python.org/library/xml.etree.elementtree.html
> There is also a reimplementation of the ElementTree API based on libxml2 and libxslt, which has more features but requires a separate install. It is largely compatible with ElementTree, however.

Indeed: ElementTree and its blazingly fast C implementation, cElementTree, are included in the standard install. However, I would also recommend checking out the latter option, lxml. It's very fast, has an ElementTree-compatible API and supports extras such as XPath and XSLT. There's comprehensive documentation as well as tutorials, hints and tips here: http://codespeak.net/lxml/
Re: Get multiprocessing.Queue to do priorities
Dear Jesse,

thanks for the hint. I see you are already assigned to the FIFO bug (http://bugs.python.org/issue4999), so I won't burden you even more. Clearly, reliable FIFO behavior of multiprocessing.Queue helps more than a priority queue, since it can be used to build one, so that should really be the first thing to fix.

In the meantime, I think I'll whip up a hack that uses a sort of bucket strategy: fill up a prioritized heapq and then, at regular intervals, unload its contents into a size-limited multiprocessing queue. I'll post this as soon as it works.

-u

On 2009-05-10 15:35:03 +0200, Jesse Noller said:
> Using a manager, or submitting a patch which adds priority queue to the multiprocessing.queue module is the correct solution for this.
> You can file an enhancement in the tracker, and assign/add me to it, but without a patch it may take me a bit (wicked busy right now).
> jesse
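The bucket strategy described above might look roughly like this on the producer side (Python 3). `refill`, `pending` and `outbox` are hypothetical names, and `queue.Queue(maxsize=2)` stands in for a size-limited `multiprocessing.Queue`; both raise `queue.Full` from `put_nowait()` once the limit is reached, so the function should accept either. The producer would call `refill()` at regular intervals.

```python
import heapq
import queue  # named Queue in Python 2


def refill(heap, bounded_queue):
    """Move the most urgent items from a local heap into a bounded queue.

    Stops when the heap is empty or the queue is full; whatever is left
    stays in the heap for the next refill. Returns the number moved.
    """
    moved = 0
    while heap:
        try:
            # put_nowait raises queue.Full instead of blocking.
            bounded_queue.put_nowait(heap[0])
        except queue.Full:
            break
        heapq.heappop(heap)  # only remove the item once it was accepted
        moved += 1
    return moved


# Producer side: a local heap of (priority, url) pairs, lower = more urgent.
pending = []
for item in [(3, "d"), (1, "b"), (0, "a"), (2, "c")]:
    heapq.heappush(pending, item)

outbox = queue.Queue(maxsize=2)  # stand-in for multiprocessing.Queue(2)

print(refill(pending, outbox))  # 2
print(outbox.get_nowait())      # (0, 'a')  (a consumer takes one item)
print(refill(pending, outbox))  # 1
```

Keeping the bounded queue small matters: items already handed to it can no longer be reordered, so a newly arrived urgent URL waits behind at most `maxsize` of them.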
