Thanks, Otis.

I did set maxMergeDocs to 10M, but I still see a couple of index
files over 30G, which does not match the maximum number of documents.
Here are some numbers:

1) Total index size = 66GB
2) Total number of documents = 200M
3) 1M docs = ~300MB
4) 10M docs should therefore be roughly 3-4GB.
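
The arithmetic behind these numbers can be sketched as follows. This is
only a rough estimate: it assumes every document takes about the same
space in the index and treats 1GB as 10^9 bytes. The variable names are
illustrative, not Solr or Lucene settings.

```python
# Rough estimate only: assumes document size is uniform across the index.
GB = 10**9

total_index_bytes = 66 * GB     # total index size reported above
total_docs = 200_000_000        # total number of documents
max_merge_docs = 10_000_000     # the value set for maxMergeDocs

# Average on-disk footprint of one document.
bytes_per_doc = total_index_bytes / total_docs

# Largest segment expected if merging really stopped at max_merge_docs.
expected_max_segment_bytes = bytes_per_doc * max_merge_docs

print(f"average bytes per document: {bytes_per_doc:.0f}")
print(f"expected largest segment: {expected_max_segment_bytes / GB:.1f} GB")
```

Under these assumptions a 10M-document segment comes out at a little
over 3GB, which is why anything above 30GB looks wrong.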

Under the index directory I see:

-rw-r--r--   1 dssearch  staff  31771545312 May  6 14:15 _2tp.cfs
-rw-r--r--   1 dssearch  staff  31932190573 May  7 08:13 _5ne.cfs
-rw-r--r--   1 dssearch  staff    543118747 May  7 08:32 _5p2.cfs
-rw-r--r--   1 dssearch  staff    543124452 May  7 08:53 _5qr.cfs
-rw-r--r--   1 dssearch  staff    543100201 May  7 09:18 _5sg.cfs
..
..
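
To check a listing like this against a size ceiling, a minimal sketch
along these lines may help. It parses `ls -l` style lines, flags files
above a limit, and estimates how many documents each one holds. The
330 bytes per document figure is an assumption derived from 66GB / 200M
docs, and the helper names are hypothetical.

```python
# Sketch: flag segment files above a size limit, given `ls -l` style lines.
LISTING = """\
-rw-r--r--   1 dssearch  staff  31771545312 May  6 14:15 _2tp.cfs
-rw-r--r--   1 dssearch  staff  31932190573 May  7 08:13 _5ne.cfs
-rw-r--r--   1 dssearch  staff    543118747 May  7 08:32 _5p2.cfs
-rw-r--r--   1 dssearch  staff    543124452 May  7 08:53 _5qr.cfs
-rw-r--r--   1 dssearch  staff    543100201 May  7 09:18 _5sg.cfs
"""

LIMIT_BYTES = 10 * 10**9   # desired ceiling of 10GB per file
BYTES_PER_DOC = 330        # assumption: 66GB / 200M docs, uniform doc size


def parse_listing(text):
    """Return (file_name, size_in_bytes) pairs from `ls -l` output lines."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 9:
            continue  # skip blank or truncated lines
        entries.append((fields[8], int(fields[4])))
    return entries


def oversized(entries, limit=LIMIT_BYTES):
    """Keep only the entries whose size exceeds the limit."""
    return [(name, size) for name, size in entries if size > limit]


for name, size in oversized(parse_listing(LISTING)):
    est_docs_millions = size / BYTES_PER_DOC / 1e6
    print(f"{name}: {size / 10**9:.1f} GB, "
          f"roughly {est_docs_millions:.0f}M docs (estimated)")
```

With the listing above, only _2tp.cfs and _5ne.cfs are flagged, and the
estimate puts each of them at well over 90M documents, far beyond the
10M limit.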

As you can see, a couple of the files are huge. Are those documents or
index files? How can I control the file size so that no single file
grows beyond 10GB?

Thanks,
-vivek



On Thu, Apr 23, 2009 at 10:26 AM, Otis Gospodnetic
<otis_gospodne...@yahoo.com> wrote:
>
> Hi,
>
> You are looking for maxMergeDocs, I believe.
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
>
>
> ----- Original Message ----
>> From: vivek sar <vivex...@gmail.com>
>> To: solr-user@lucene.apache.org
>> Sent: Thursday, April 23, 2009 1:08:20 PM
>> Subject: Control segment size
>>
>> Hi,
>>
>>   Is there any configuration to control the segments' file size in
>> Solr? Currently, I have an index (70G) with 80 segment files, and one
>> of the files is 24G. We noticed that in some cases a commit takes over
>> 2 hours to complete (committing 50K records), whereas usually it
>> finishes in 20 seconds. After further investigation it turns out the
>> system was doing a lot of paging - the file system buffer was trying
>> to write the big segment back to disk. I have 20G of memory on the
>> system with 6G assigned to the Solr instance (running 2 instances).
>>
>> It seems that if I can limit the segment size to a maximum of 4-5 GB
>> I'll be OK. Is there any way to do so?
>>
>> I have a merging factor of 100 - does that impact the size too? Why
>> do different segments have different sizes?
>>
>> Thanks,
>> -vivek
>
>
