On Wed, 23 Mar 2011 13:58:32 +0100, Thomas Koch <tho...@koch.ro> wrote:
Esteban Manchado Velázquez:
On Mon, 28 Feb 2011 22:31:42 +0100, Thomas Koch <tho...@koch.ro> wrote:
> [...]
> I monitored /var/lib/dhelp and saw that the file documents.index
> (~150MB) is rewritten for each invocation of index++. The Swish search
> engine should have some support to merge index files instead of
> rewriting the index every time.
Re-reading your previous mail, I just realised you say your index is
around 150 MiB. I just had a look at mine, and it's only 4.5 MiB. Do you
have any idea what documentation takes so much space?
I wonder what's the average size of people's indices.
However, I know from Lucene that indexers can handle incremental updates
in other ways. Lucene writes its index as so-called segment files. Every
time one commits a number of documents to the index, a new segment file
is added, but no old file is changed. Occasionally some smaller segment
files are merged into one bigger segment file to keep the total number of
files low.
If Swish-e is not capable of this incremental update and merge pattern,
then you should rather use another indexer. Besides Lucene (which also
has an implementation in C) there are also Xapian and Sphinx, but I don't
know whether they support merging segments.
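For illustration, the segment-and-merge pattern described above could be
sketched roughly as follows. This is only a toy in-memory model, not
Lucene's or Swish-e's actual API: the class name SegmentedIndex, the
max_segments threshold and the "merge the two smallest segments" policy
are all assumptions made up for the example.

```python
from collections import defaultdict


class SegmentedIndex:
    """Toy inverted index that only appends segments (hypothetical, not Lucene's API)."""

    def __init__(self, max_segments=4):
        self.max_segments = max_segments  # assumed merge threshold
        self.segments = []                # each segment maps term -> set of doc ids

    def commit(self, docs):
        """Index a batch of {doc_id: text} as a new segment; older segments stay untouched."""
        segment = defaultdict(set)
        for doc_id, text in docs.items():
            for term in text.lower().split():
                segment[term].add(doc_id)
        self.segments.append(dict(segment))
        if len(self.segments) > self.max_segments:
            self._merge_smallest()

    def _merge_smallest(self):
        """Fold the two smallest segments into one to keep the segment count low."""
        self.segments.sort(key=len)
        merged = defaultdict(set)
        for seg in self.segments[:2]:
            for term, ids in seg.items():
                merged[term] |= ids
        self.segments[:2] = [dict(merged)]

    def search(self, term):
        """Query every segment and return the union of matching doc ids."""
        hits = set()
        for seg in self.segments:
            hits |= seg.get(term.lower(), set())
        return hits


if __name__ == "__main__":
    idx = SegmentedIndex(max_segments=2)
    idx.commit({1: "dhelp index"})
    idx.commit({2: "swish index"})
    idx.commit({3: "lucene segment merge"})  # third commit triggers a merge
    print(len(idx.segments), sorted(idx.search("index")))
```

The point of the sketch is that a commit costs roughly as much as the
new documents, not as much as the whole index; the price is that a
search has to consult every segment, which is why the merges matter.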
The main problem for me is that I don't have much time for this, I'm
not sure how big an issue it is (e.g. how many people it affects), and
changing the indexer is kind of a big deal: I'd have to check that
everything keeps working in the same way, that it supports the same input
formats, that there aren't encoding problems, etc. So it's a lot of work,
and I don't even know how the performance would compare to the current
one :-(
Of course, patches and benchmarks are welcome, but I don't think I'll
work on this anytime soon... Sorry.
--
Esteban