Actually this isn't quite right.

Lucene flushes a new segment whenever its RAM buffer is full, not every
5 docs just because mergeFactor is 5.

mergeFactor, by contrast, decides how many segments of roughly the same
size are merged at once.

So, e.g., if you index 42 docs, that will create 1 segment, unless the
docs are immense or are not indexed in a single session.
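
To make the difference concrete, here is a toy Python model. It is only
a sketch, not Lucene's actual merge policy code: simulate() and its
docs_per_flush parameter are made up for illustration, with
docs_per_flush standing in for "how many docs fit in RAM before a
flush". Real merge policies group segments by approximate size, which
this ignores.

```python
def simulate(num_docs, docs_per_flush, merge_factor):
    """Return segment sizes (in docs), largest level first.

    Toy model only: docs_per_flush is a hypothetical stand-in for the
    RAM buffer filling up; it is not a real Lucene setting.
    """
    if merge_factor < 2:
        raise ValueError("merge_factor must be at least 2")
    if docs_per_flush < 1:
        raise ValueError("docs_per_flush must be at least 1")

    levels = [[]]  # levels[i] holds the sizes of the segments at level i
    remaining = num_docs
    while remaining > 0:
        # A flush writes one new lowest-level segment.
        flushed = min(docs_per_flush, remaining)
        remaining -= flushed
        levels[0].append(flushed)

        # Once a level holds merge_factor segments, merge them into a
        # single segment one level up, and repeat upward if needed.
        level = 0
        while len(levels[level]) == merge_factor:
            merged = sum(levels[level])
            levels[level] = []
            if level + 1 == len(levels):
                levels.append([])
            levels[level + 1].append(merged)
            level += 1

    return [size for lvl in reversed(levels) for size in lvl]


# 42 small docs that all fit in RAM: a single flush, a single segment.
print(simulate(42, 1000, 5))

# Only if every doc were flushed on its own would the "levels" appear.
print(simulate(42, 1, 5))
```

In this model mergeFactor only matters once enough flushes have
happened; how many docs land in each flushed segment is decided by the
buffer, not by mergeFactor.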

Mike

On Mon, Apr 5, 2010 at 6:21 PM, Lance Norskog <goks...@gmail.com> wrote:
> mergeFactor=5 means that if there are 42 documents, there will be 5 index
> files:
>
> 1 with 25 documents,
> 3 with 5 documents, and
> 1 with 2 documents
>
> Imagine making change with coins of 1 document, 5 documents, 5^2
> documents, 5^3 documents, etc.
>
> On Mon, Apr 5, 2010 at 10:59 AM, Chris Hostetter
> <hossman_luc...@fucit.org> wrote:
>>
>> This sounds completely normal from what I remember about mergeFactor.
>>
>> Segments are merged "by level", meaning that with a mergeFactor of 5, once
>> 5 "level 1" segments are formed they are merged into a single "level 2"
>> segment.  Then 5 more "level 1" segments are allowed to form before the
>> next merge (resulting in 2 "level 2" segments).  Once you have 5 "level 2"
>> segments, they are all merged into a single "level 3" segment, etc...
>>
>> : I had my mergeFactor set to 5,
>> : but when I loaded data with some 1,00,000 I got some 12 .cfs files in my
>> : data/index folder.
>> :
>> : How is this possible?
>> : In what context can we have more .cfs files?
>>
>>
>> -Hoss
>>
>>
>
>
>
> --
> Lance Norskog
> goks...@gmail.com
>
