Mark,

Another question to ask is: do you *really* need to be calling commit every 300 
docs?  Unless you really need searchers to see your 300 new docs, you don't 
need to commit.  Just optimize + commit at the end of your whole batch.  
Lowering the mergeFactor is the right thing to do.  Out of curiosity, are you 
using a single instance of Solr for both indexing and searching?
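
For example, the indexing loop could look like this (a rough sketch
using the SolrJ Java client, which is in trunk but not part of the 1.2
release; TOTAL_DOCS and buildDoc() are placeholders, and the URL is
illustrative):

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class BatchIndexer {
      private static final int BATCH_SIZE = 300;
      private static final int TOTAL_DOCS = 100000;  // placeholder

      public static void main(String[] args) throws Exception {
        SolrServer server =
            new CommonsHttpSolrServer("http://localhost:8983/solr");
        List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
        for (int i = 0; i < TOTAL_DOCS; i++) {
          batch.add(buildDoc(i));         // hypothetical doc builder
          if (batch.size() == BATCH_SIZE) {
            server.add(batch);            // send the batch, no commit
            batch.clear();
          }
        }
        if (!batch.isEmpty()) server.add(batch);
        server.optimize();                // merge segments down
        server.commit();                  // one commit at the very end
      }

      private static SolrInputDocument buildDoc(int i) {
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", String.valueOf(i));
        return doc;
      }
    }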

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
From: Mark Baird <[EMAIL PROTECTED]>
To: Solr Mailing List <solr-user@lucene.apache.org>
Sent: Monday, December 24, 2007 7:25:00 PM
Subject: An observation on the "Too Many Files Open" problem

Running our Solr server (latest 1.2 Solr release) on a Linux machine we ran
into the "Too Many Open Files" issue quite a bit.  We've since changed the
ulimit max filehandle setting, as well as the Solr mergeFactor setting, and
haven't been running into the problem anymore.  However, we are seeing some
behavior from Solr that seems a little odd to me.  When we are in the middle
of our batch index process and we run the lsof command, we see a lot of open
file handles hanging around that reference Solr index files that have been
deleted by Solr and no longer exist on the system.
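
For reference, the settings we changed live in the <mainIndex> section of
solrconfig.xml; the values below are illustrative, not our exact ones:

    <!-- Fewer segments on disk means fewer open files. -->
    <mainIndex>
      <useCompoundFile>true</useCompoundFile> <!-- pack each segment
                                                   into fewer files -->
      <mergeFactor>4</mergeFactor>            <!-- default is 10 -->
    </mainIndex>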

The documents we are indexing are potentially very large, so due to various
memory constraints we only send 300 docs to Solr at a time, with a commit
between each set of 300 documents.  Now, one of the things I read that may
cause old file handles to hang around is an old IndexReader still open and
pointing to those old files.  However, whenever you issue a commit to the
server, it is supposed to close the old IndexReader and open a new one.
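
As I understand it, the swap should look roughly like this (my own sketch
of the pattern, not Solr's actual code):

    import org.apache.lucene.index.IndexReader;

    public class ReaderSwap {
      private IndexReader current;
      private final String indexDir;

      public ReaderSwap(String indexDir) throws Exception {
        this.indexDir = indexDir;
        this.current = IndexReader.open(indexDir);
      }

      public synchronized void onCommit() throws Exception {
        IndexReader old = current;
        current = IndexReader.open(indexDir); // sees post-commit segments
        old.close();  // until this runs, the old (possibly deleted)
                      // segment files stay open at the OS level
      }
    }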

So my question is, when the Reader is being closed due to a commit, what
exactly is happening?  Is it just being set to null and a new instance being
created?  I'm thinking the reader may be sitting around in memory for a while
before the garbage collector finally gets to it, and in that time it is still
holding those files open.  Perhaps an explicit method call that closes any
open file handles should occur before setting the reference to null?
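
To illustrate what I mean in plain Lucene terms (a toy example with a
placeholder index path):

    import org.apache.lucene.index.IndexReader;

    public class NullVsClose {
      public static void main(String[] args) throws Exception {
        IndexReader reader = IndexReader.open("/path/to/index"); // placeholder

        // reader = null;  // dropping the reference does NOT release the
        //                 // descriptors; they stay open until close()
        //                 // runs, or until the GC eventually finalizes
        //                 // the underlying streams, which can take a while

        reader.close();    // releases the file handles immediately
      }
    }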

After looking at the code, it looks like reader.close() is explicitly called
as long as the closeReader property in SolrIndexSearcher is set to true, but
I'm not sure how to verify that it is always set to true.  There is one
constructor of SolrIndexSearcher that sets it to false.
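
If I'm reading it right, the flag encodes the usual ownership convention,
something like this simplified sketch (not the real SolrIndexSearcher):

    import org.apache.lucene.index.IndexReader;

    public class Searcher {
      private final IndexReader reader;
      private final boolean closeReader;

      public Searcher(String indexDir) throws Exception {
        this.reader = IndexReader.open(indexDir);
        this.closeReader = true;   // we opened it, so we close it
      }

      public Searcher(IndexReader existing) {
        this.reader = existing;
        this.closeReader = false;  // caller owns the reader
      }

      public void close() throws Exception {
        if (closeReader) reader.close();
      }
    }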

Any insight here would be appreciated.  Are stale file handles something I
should just expect from the JVM?  I've never run into the "Too Many Files
Open" exception before, so this is my first time looking at the lsof command.
Perhaps I'm reading too much into the data it's showing me.
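
For anyone who wants to reproduce the observation, this is the check I've
been running, where <PID> is the servlet container's process id:

    lsof -p <PID> | grep deleted

Linux marks files that have been unlinked but are still held open by the
process with "(deleted)", which is exactly what I'm seeing for old segment
files.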


Mark Baird


