I have indexed a mailing list archive.  My next goal is to update that 
index nightly by indexing the entire month's archive and then merging 
the result into the main database.  At present, the archive holds about 
4 years of data.  I'm seeking comments on my approach.

FYI, this is the main index:

-rw-r--r--  1 dan  dan  70930432 Nov  9 16:58 adsl.docdb
-rw-r--r--  1 dan  dan   1939456 Nov  9 16:58 adsl.docs.index
-rw-r--r--  1 dan  dan  80090252 Nov  9 16:58 adsl.wordlist
-rw-r--r--  1 dan  dan  66713600 Nov  9 16:58 adsl.words.db

My first step is to create the merge database:

-rw-r--r--  1 dan  dan     39936 Nov  9 16:52 adsl-merge.docdb
-rw-r--r--  1 dan  dan      2048 Nov  9 16:50 adsl-merge.docs.index
-rw-r--r--  1 dan  dan     33705 Nov  9 16:50 adsl-merge.wordlist
-rw-r--r--  1 dan  dan     54272 Nov  9 16:50 adsl-merge.words.db

Here is the command I use to merge the two databases above:

htmerge -s -a -c adsl.conf -m adsl.merge.conf

Before running that command, I first need to do the following:

cp adsl-merge.docdb.work      adsl-merge.docdb
cp adsl-merge.docs.index.work adsl-merge.docs.index
cp adsl-merge.wordlist.work   adsl-merge.wordlist
cp adsl-merge.words.db.work   adsl-merge.words.db

cp adsl.docdb      adsl.docdb.work
cp adsl.docs.index adsl.docs.index.work
cp adsl.wordlist   adsl.wordlist.work
cp adsl.words.db   adsl.words.db.work

After the merge, this moves the new search data into production:

mv adsl.docdb.work      adsl.docdb
mv adsl.docs.index.work adsl.docs.index
mv adsl.wordlist.work   adsl.wordlist
mv adsl.words.db.work   adsl.words.db
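
Put together, the steps above could be scripted roughly as follows.  This 
is only a sketch: the function names, the directory argument, and the 
HTMERGE override variable are hypothetical, and it assumes the config 
files live in the database directory and that htmerge exits non-zero 
when the merge fails.

```shell
#!/bin/sh
# Sketch of the nightly merge. Function names, the directory argument and
# the HTMERGE variable are illustrative, not part of ht://Dig.

SUFFIXES="docdb docs.index wordlist words.db"

# Copy the merge database's .work files to their plain names, and copy the
# production database to .work names so the merge runs on a copy.
stage_databases() {
    dir=$1
    for s in $SUFFIXES; do
        cp "$dir/adsl-merge.$s.work" "$dir/adsl-merge.$s" || return 1
        cp "$dir/adsl.$s" "$dir/adsl.$s.work" || return 1
    done
}

# Move the merged .work files into production.
promote_databases() {
    dir=$1
    for s in $SUFFIXES; do
        mv "$dir/adsl.$s.work" "$dir/adsl.$s" || return 1
    done
}

# Full run: stage, merge, and promote only if the merge command succeeded.
# Assumption: adsl.conf and adsl.merge.conf are found from inside $dir.
nightly_merge() {
    dir=$1
    stage_databases "$dir" || return 1
    ( cd "$dir" && "${HTMERGE:-htmerge}" -s -a -c adsl.conf -m adsl.merge.conf ) || return 1
    promote_databases "$dir"
}
```

A cron job could then source this file and call nightly_merge with the 
database directory.  Promoting only after a successful merge means a 
failed run leaves the production files as they were.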

It all seems to work.  Any comments?

Thanks.
-- 
Dan Langille : http://www.langille.org/



_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a 
subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html
