Esteban Manchado Velázquez:
> On Mon, 28 Feb 2011 22:31:42 +0100, Thomas Koch <tho...@koch.ro> wrote:
> > [...]
> > I monitored /var/lib/dhelp and saw that the file documents.index
> > (~150MB) is rewritten for each invocation of index++. The Swish search
> > engine should have some support to merge index files instead of
> > rewriting the index every time.
>
> I'm not sure I understand your point. IIRC the index file is written
> "from scratch" in a temporary file and then moved to its final location,
> but that doesn't mean that all content is re-indexed. Only the new files
> are indexed, while the rest is, I assume, just taken from the existing
> index. The --incremental switch does that.
>
> So if that's what you meant, I don't think I can do much about it :-(
> Unless I change to another indexer, that is :-)

Hi Esteban,
You describe the process as I encountered it. The problem is that copying the already indexed part of the index into the new file causes a heavy write load: the full ~150MB index is written again on every run. This may be especially annoying for users of solid state disks. Therefore I consider this bug to be very important and would kindly ask you to mark and handle it as such.

However, I know from Lucene that indexers can handle incremental updates in other ways. Lucene writes indexes in so-called segment files. Every time one commits a number of documents to the index, a new segment file is added to the index, but no existing file is changed. Occasionally, some smaller segment files are merged into one bigger segment file to keep the total number of files low.

If Swish-e is not capable of this incremental update and merge pattern, then you should consider switching to another indexer. Besides Lucene (which also has an implementation in C), there are Xapian and Sphinx, but I don't know whether they support merging segments.

Thomas Koch, http://www.koch.ro
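To make the write-load argument concrete, here is a toy Python model (not Swish-e or Lucene code) comparing the two strategies: rewriting the whole index on every run versus appending immutable segments and occasionally merging the smallest ones. The class names, the threshold of 10 segments and the merge factor of 4 are hypothetical choices for illustration only; Lucene's real merge policies are more elaborate. Sizes are abstract units, read here as MB to match the ~150MB index mentioned above.

```python
from dataclasses import dataclass, field


@dataclass
class RewriteIndex:
    """Toy model: every update writes the whole index to a new file."""
    size: int = 0
    bytes_written: int = 0

    def add(self, new_units: int) -> None:
        self.size += new_units
        # Old entries are copied along with the new ones, so the
        # full index size is written on every update.
        self.bytes_written += self.size


@dataclass
class SegmentIndex:
    """Toy model of segment files: each commit adds a new immutable
    segment; when too many accumulate, the smallest few are merged."""
    max_segments: int = 10  # hypothetical threshold
    merge_factor: int = 4   # hypothetical number of segments merged at once
    segments: list[int] = field(default_factory=list)
    bytes_written: int = 0

    def add(self, new_units: int) -> None:
        # Only the new documents are written; existing segments are untouched.
        self.segments.append(new_units)
        self.bytes_written += new_units
        if len(self.segments) > self.max_segments:
            self._merge_smallest()

    def _merge_smallest(self) -> None:
        self.segments.sort()
        merged = sum(self.segments[: self.merge_factor])
        # A merge rewrites only the selected small segments into one file.
        self.segments = self.segments[self.merge_factor :] + [merged]
        self.bytes_written += merged


def simulate(base: int, updates: int, update_size: int) -> tuple[int, int]:
    """Total units written by each strategy after an initial build of
    `base` units followed by `updates` incremental runs."""
    rewrite, segmented = RewriteIndex(), SegmentIndex()
    for index in (rewrite, segmented):
        index.add(base)
        for _ in range(updates):
            index.add(update_size)
    return rewrite.bytes_written, segmented.bytes_written


if __name__ == "__main__":
    full, seg = simulate(base=150, updates=20, update_size=1)
    print(f"full rewrite: {full} MB written, segments: {seg} MB written")
```

With a 150 MB initial index followed by twenty 1 MB updates, the full-rewrite model writes 3360 units while the segment model writes 186, because each run only appends the new data and merges touch just the small segments.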