Suggestions for List Archive

Devin Reade Tue, 23 Jan 2001 12:00:54 -0800
It's good that this list has been archived; however, in its
current form it's not very usable.  With over 8000 messages
in one folder, it's not feasible to find anything without
server-side search, and that seems to be disabled.  (I tried
searching for text that I knew to be in the mailbox, but the
server reported no matches.)

So I tried the next best thing: a local search, done by treating
the folder as disconnected and syncing it to local disk.  After
wasting both CMU's bandwidth and my own, I found that my IMAP
client doesn't handle 8000 messages per mailbox very well.

At one point I saw a reference to a third-party, web-based
search engine for this archive, but it couldn't find text
known to be in the archive either.  Sorry, I didn't record
whose site the search engine was on.

At any rate, may I suggest the following:
        1.  Break the archive up into one subfolder per year,
            so that we have:
                archive.info-cyrus.1999
                archive.info-cyrus.2000
                archive.info-cyrus.2001
            etc.
        2.  Enable server-side search capabilities (are they
            disabled, or just broken?).  Sure, it takes more
            CPU, but that's got to be better than people
            sucking down the whole archive to try to find
            something.
        3.  Implement a working web-based search engine for
            the archive.  If one is already available,
            publish it on the Cyrus IMAPD web site.  (The
            web search interface should be able to search
            multiple mailboxes so that you don't need to
            restrict your search to a particular year.)
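As a rough illustration of suggestion 1, the split only needs the year from
each message's RFC 2822 Date header.  This is a minimal sketch, not how
Cyrus itself would do it; the function names partition_by_year and
folder_for are hypothetical, and it assumes every message has a parseable
Date header.

```python
from collections import defaultdict
from email.utils import parsedate_to_datetime


def folder_for(date_header: str, prefix: str = "archive.info-cyrus") -> str:
    """Map an RFC 2822 Date header to a per-year folder name (hypothetical scheme)."""
    # Uses the year as written in the header's own time zone.
    year = parsedate_to_datetime(date_header).year
    return f"{prefix}.{year}"


def partition_by_year(date_headers):
    """Group message indices by the per-year folder each would be filed into."""
    folders = defaultdict(list)
    for index, header in enumerate(date_headers):
        folders[folder_for(header)].append(index)
    return dict(folders)


if __name__ == "__main__":
    headers = [
        "Tue, 23 Jan 2001 12:00:54 -0800",
        "Wed, 15 Dec 1999 08:00:00 +0000",
    ]
    print(partition_by_year(headers))
```

Messages near midnight on New Year's Eve may land in different years
depending on the sender's time zone; for an archive split that is probably
acceptable.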

The same arguments could be made for the other mailing lists
at CMU that I see available via anon IMAP, but it is only this
one (and to a lesser extent the sasl list) that I care about.

Don't forget that with the current 8000-message mailbox scheme,
you're talking about 8000 files in one directory.  On filesystems
that scan directories linearly, you are almost certainly taking a
significant performance hit with that many files in one directory,
not only on queries but on every inbound message.  Dividing the
archive by year gives you a rudimentary form of hashing that brings
the file count per directory down to a reasonable level.
-- 
        Devin Reade             [EMAIL PROTECTED]