Ah! I did not notice the 'too many open files' part. That means your
mergeFactor setting allows more index segments, and therefore more
open files, than your operating system permits. The default
mergeFactor is 10, and since every segment on disk is several files
and segments accumulate across merge levels, that can translate into
thousands of open file descriptors. You should lower this number.
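
A sketch against the stock example solrconfig.xml (tune the numbers
for your machine):

    <indexDefaults>
      <!-- fewer segments per merge level = fewer files open at once -->
      <mergeFactor>4</mergeFactor>
      <!-- the compound format packs each segment into a single file -->
      <useCompoundFile>true</useCompoundFile>
    </indexDefaults>

useCompoundFile trades a little indexing speed for a much smaller
open-file count.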

On Tue, Jul 6, 2010 at 1:14 PM, Jim Blomo <jim.bl...@pbworks.com> wrote:
> On Sat, Jul 3, 2010 at 1:10 PM, Lance Norskog <goks...@gmail.com> wrote:
>> You don't need to optimize, only commit.
>
> OK, thanks for the tip, Lance.  I thought the "too many open files"
> problem was because I wasn't optimizing/merging frequently enough.  My
> understanding of your suggestion is that commit also does merging, and
> since I am only building the index, not querying or updating it, I
> don't need to optimize.
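>
> If I follow, a plain add-plus-commit should be enough, e.g. (assuming
> the stock example port and the XML update format):
>
>     curl 'http://localhost:8983/solr/update?commit=true' \
>       -H 'Content-Type: text/xml' \
>       --data-binary '<add><doc><field name="id">doc1</field></doc></add>'
>
> with no optimize=true anywhere in the loop.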
>
>> This means that the JVM spends 98% of its time doing garbage
>> collection, which in turn means there is not enough heap memory.
>
> I'll increase the memory to 4G, decrease the documentCache to 5 and try again.
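>
> Concretely, something like this (assuming the stock Jetty example
> setup; flags and paths may differ on your box):
>
>     java -Xmx4g -jar start.jar
>
> with the cache shrunk in solrconfig.xml:
>
>     <documentCache class="solr.LRUCache"
>                    size="5" initialSize="5" autowarmCount="0"/>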
>
>> I made a mistake - the bug in Lucene is not about PDFs - it happens
>> with every field in every document you index in any way - so doing
>> this in Tika outside Solr does not help. The only trick I can think
>> of is to alternate between indexing large and small documents, so
>> that the bug never needs memory for two giant documents in a row.
>
> I've checked out and built solr from branch_3x with the
> tika-0.8-SNAPSHOT patch.  (Earlier I was having trouble with Tika
> crashing too frequently.)  I've confirmed that LUCENE-2387 is fixed in
> this branch so hopefully I won't run into that this time.
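>
> (For anyone following along, the build was roughly:
>
>     svn co https://svn.apache.org/repos/asf/lucene/dev/branches/branch_3x
>     cd branch_3x/solr && ant dist
>
> with the tika-0.8-SNAPSHOT patch applied on top.)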
>
>> Also, do not query the indexer at all. If you must, don't do sorting
>> or faceting requests. These eat up a lot of memory that is only freed
>> with the next commit (index reload).
>
> Good to know, though I have not been querying the index and definitely
> haven't ventured into faceted requests yet.
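>
> (If I understand the failure mode, the requests to avoid while bulk
> loading would look like
>
>     http://localhost:8983/solr/select?q=*:*&sort=price+desc
>     http://localhost:8983/solr/select?q=*:*&facet=true&facet.field=category
>
> with "price" and "category" as hypothetical field names here - since
> sort and facet fill caches that are only released when the searcher
> reopens at commit.)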
>
> The advice is much appreciated,
>
> Jim
>



-- 
Lance Norskog
goks...@gmail.com
