Excerpts from Austin Clements's message of Fri May 27 03:41:44 +0100 2011:
> >> > > Have you tried simply calling list() on your thread
> >> > > iterator to see how expensive it is?  My bet is that it's quite cheap,
> >> > > both memory-wise and CPU-wise.
> >> > Funny thing:
> >> >  q=Database().create_query('*')
> >> >  time tlist = list(q.search_threads())
> >> > raises a NotmuchError(STATUS.NOT_INITIALIZED) exception. For some reason
> >> > the list constructor must read more than once from the iterator.
> >> > So this is not an option, but even if it worked, it would show
> >> > the same behaviour as my above test.
> >>
> >> Interesting.  Looks like the Threads class implements __len__ and that
> >> its implementation exhausts the iterator.  Which isn't a great idea in
> >> itself, but it turns out that Python's implementation of list() calls
> >> __len__ if it's available (presumably to pre-size the list) before
> >> iterating over the object, so it exhausts the iterator before even
> >> using it.
> >>
> >> That said, if list(q.search_threads()) did work, it wouldn't give you
> >> better performance than your experiment above.
true. Nevertheless I think that list(q.search_threads())
should be equivalent to [t for t in q.search_threads()], which is
something to be fixed in the bindings. Should I file an issue somehow?
Or is it enough to state this as a TODO here on the list?

> >> > would it be very hard to implement a Query.search_thread_ids() ?
> >> > This name is a bit off because it had to be done on a lower level.
> >>
> >> Lazily fetching the thread metadata on the C side would probably
> >> address your problem automatically.  But what are you doing that
> >> doesn't require any information about the threads you're manipulating?
> > Agreed. Unfortunately, there seems to be no way to get a list of thread
> > ids or a reliable iterator thereof by using the current python bindings.
> > It would be enough for me to have the ids because then I could
> > search for the few threads I actually need individually on demand.
>
> There's no way to do that from the C API either, so don't feel left
> out.  ]:--8)  It seems to me that the right solution to your problem
> is to make thread information lazy (effectively, everything gathered
> in lib/thread.cc:_thread_add_message).  Then you could probably
> materialize that iterator cheaply.
Alright. I'll put this on my mental notmuch wish list and
hope that someone will have addressed this before I run out of
ideas for how to improve my UI and have time to look at this myself.
For now, I'll go with the [t.get_thread_id() for t in q.search_threads()]
approach to cache the thread ids myself and live with the fact that
this takes time for large result sets.

> In fact, it's probably worth
> trying a hack where you put dummy information in the thread object
> from _thread_add_message and see how long it takes just to walk the
> iterator (unfortunately I don't think profiling will help much here
> because much of your time is probably spent waiting for I/O).
I don't think I understand what you mean by dummy info in a thread object.

> I don't think there would be any downside to doing this for eager
> consumers like the CLI.
one should think so, yes.
/p
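
PS: the list()/__len__ interaction discussed above can be reproduced
without notmuch. What follows is only a minimal sketch, assuming CPython:
ExhaustingThreads is a hypothetical stand-in, not the bindings' Threads
class, and where the real bindings raise NotmuchError this toy simply
runs dry. It shows why list(...) and the list comprehension differ when
__len__ counts by consuming the iterator.

```python
class ExhaustingThreads:
    """Hypothetical stand-in for an iterator whose __len__ consumes it."""

    def __init__(self, thread_ids):
        self._it = iter(thread_ids)
        self.len_calls = 0  # how often __len__ was invoked

    def __iter__(self):
        return self

    def __next__(self):
        return next(self._it)

    def __len__(self):
        # Counting by walking the underlying iterator leaves nothing
        # behind for a later iteration.
        self.len_calls += 1
        return sum(1 for _ in self._it)


ids = ["0001", "0002", "0003"]

# A list comprehension only uses the iterator protocol.
comp_src = ExhaustingThreads(ids)
via_comprehension = [t for t in comp_src]

# list() asks for a length hint first, which calls __len__ when present.
list_src = ExhaustingThreads(ids)
via_list = list(list_src)

print("comprehension:", via_comprehension, "len calls:", comp_src.len_calls)
print("list():       ", via_list, "len calls:", list_src.len_calls)
```

Dropping __len__ from such a class, or having it count without touching
the iterator, would make both spellings agree.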
