Hmm -

Have you tested search speed (without optimizing) using a merge factor of 2? If the speed is acceptable (it should be much faster than with a merge factor of 10), try a merge factor of 3. Using a merge factor of 2 or 3 and never optimizing should keep searches relatively fast while leaving most of the index files unchanged between updates. You would still pay a larger hit whenever a big merge occurs, but most of the time you should only have to rsync the new segments (and any shared segment files that changed). A merge factor of 2 would not allow many additions before the large segment is rewritten, but perhaps 3 would be a good compromise?
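
To make the rsync trade-off concrete, here is a minimal Python sketch of a toy merge model. It is not Lucene's actual merge policy: it assumes every flush produces a segment of size 1 and that the newest `merge_factor` segments are merged whenever they are all the same size. The name `commit_costs` is made up for this illustration. For each commit it reports how much newly written segment data rsync would have to copy.

```python
def commit_costs(merge_factor, flushes):
    """Toy model of a log-style merge policy (an assumption, not Lucene code).

    Returns (costs, segments): costs[i] is the size of the segment data
    written by commit i (what rsync would need to copy), and segments is
    the final list of segment sizes, oldest first.
    """
    segments = []  # segment sizes, oldest first
    costs = []
    for _ in range(flushes):
        segments.append(1)  # a flush writes one new small segment
        new_data = 1
        # Cascade: merge while the newest merge_factor segments are equal in size.
        while (len(segments) >= merge_factor
               and len(set(segments[-merge_factor:])) == 1):
            merged = sum(segments[-merge_factor:])
            del segments[-merge_factor:]
            segments.append(merged)
            new_data = merged  # only the final merged segment survives the commit
        costs.append(new_data)
    return costs, segments


for mf in (2, 3):
    costs, segments = commit_costs(mf, 8)
    print(f"mergeFactor={mf}: per-commit cost {costs}, segments {segments}")
```

In this model a merge factor of 2 gives per-commit costs of 1, 2, 1, 4, 1, 2, 1, 8 over eight flushes, while a merge factor of 3 gives 1, 1, 3, 1, 1, 3, 1, 1. The larger factor rewrites big segments less often, at the price of keeping more segments around.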

--
- Mark

http://www.lucidimagination.com



Marc Sturlese wrote:
Thanks Mark, that really did the job! The speed loss in update time is more
than compensated for at optimize time!

Now I am trying another test... but I am not sure whether Lucene has this
option. I am using Lucene 2.9-dev.

I am working with a 3G index and always have to optimize (as I said before,
I tried not optimizing so that rsync could send my index faster, but the
speed loss when serving requests on the slaves was huge). I wonder if it's
possible to do "block optimizing" (I have just invented the term). For
example: I have an optimized 3G index and start executing updates against
it. Would it be possible to keep optimizing only the newly created segments?
I would still have the 3G index and would be building another big index from
the segments created by the updates. This way I would only have to rsync the
new "block" to the slaves (supposing the slaves already had the 3G index
because I had sent it before).
Is there any way to do something like that?
This came to mind because I have to ship the index to the slaves as often
as possible, and optimizing the index into a single "block" makes the rsync
job take a long time.

Thanks in advance


markrmiller wrote:
Marc Sturlese wrote:
Hey there,
I am creating an index of 3G. Indexing is fast, but optimization takes
about 10 min. I need to optimize every time I update because, if I don't,
search requests are much slower.
Which parameter configuration would make optimize as fast as possible? (I
don't mind using a lot of memory, at least for testing, if it speeds up
the process.)
Currently I am using these IndexWriter settings:

    <ramBufferSizeMB>1024</ramBufferSizeMB>
    <maxMergeDocs>2147483647</maxMergeDocs>
    <maxFieldLength>10000</maxFieldLength>
    <writeLockTimeout>1000</writeLockTimeout>
    <commitLockTimeout>10000</commitLockTimeout>
    <mergeFactor>10</mergeFactor>
Am I missing any important parameter for this?
Thanks in advance

How about using a merge factor of 2? This way you are pretty much always optimized (old large segment, new small segment at most) - you pay a bit in update speed, but I've found it to be very reasonable for many applications.
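
As a rough illustration of why a low merge factor keeps the index close to optimized, the Python sketch below counts segments in a toy model (not Lucene's real merge policy): each flush adds a size-1 segment, and the newest `merge_factor` equal-sized segments are merged. `max_segments` is a hypothetical helper name.

```python
def max_segments(merge_factor, flushes):
    """Return the largest number of live segments seen after any flush.

    Toy model only (an assumption): segments merge when the newest
    merge_factor segments all have the same size.
    """
    segments = []  # segment sizes, oldest first
    worst = 0
    for _ in range(flushes):
        segments.append(1)  # a flush writes one new small segment
        # Merge while the newest merge_factor segments have equal size.
        while (len(segments) >= merge_factor
               and len(set(segments[-merge_factor:])) == 1):
            merged = sum(segments[-merge_factor:])
            del segments[-merge_factor:]
            segments.append(merged)
        worst = max(worst, len(segments))
    return worst


for mf in (2, 10):
    print(f"mergeFactor={mf}: at most {max_segments(mf, 100)} segments")
```

In this toy model the segment count over 100 flushes peaks at 6 for a merge factor of 2 and at 18 for a merge factor of 10. The index is not strictly two segments, but it stays much closer to an optimized one.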

--
- Mark

http://www.lucidimagination.com








