Mark,
Another question to ask is: do you *really* need to be calling commit every 300 docs? Unless you really need searchers to see your 300 new docs, you don't need to commit. Just optimize + commit at the end of your whole batch. Lowering the mergeFactor is the right thing to do.

Out of curiosity, are you using a single instance of Solr for both indexing and searching?
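For reference, a sketch of where those knobs live in a Solr 1.2 solrconfig.xml (values here are illustrative, not recommendations for your setup):

```xml
<indexDefaults>
  <!-- fewer segments allowed before a merge = fewer files open at once -->
  <mergeFactor>10</mergeFactor>
  <!-- the compound file format also cuts the number of files per segment -->
  <useCompoundFile>true</useCompoundFile>
</indexDefaults>
```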
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
----- Original Message ----
From: Mark Baird <[EMAIL PROTECTED]>
To: Solr Mailing List <solr-user@lucene.apache.org>
Sent: Monday, December 24, 2007 7:25:00 PM
Subject: An observation on the "Too Many Files Open" problem
Running our Solr server (latest 1.2 Solr release) on a Linux machine, we ran into the "Too Many Open Files" issue quite a bit. We've since changed the ulimit max filehandle setting, as well as the Solr mergeFactor setting, and haven't been running into the problem anymore. However, we are seeing some behavior from Solr that seems a little odd to me. When we are in the middle of our batch index process and run the lsof command, we see a lot of open file handles hanging around that reference Solr index files that have been deleted by Solr and no longer exist on the system.
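That lsof behavior is standard POSIX filesystem semantics rather than anything Solr-specific: an unlinked file's data stays on disk until the last open handle to it is closed, and lsof reports such entries as "(deleted)". A tiny standalone demo of this on Linux, using only the plain JDK (no Solr involved):

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class StaleHandleDemo {
    public static void main(String[] args) throws Exception {
        Path p = Files.createTempFile("segment", ".bin");
        Files.write(p, "old index data".getBytes());

        // Open a handle, then delete the file while the handle is live.
        try (InputStream in = Files.newInputStream(p)) {
            Files.delete(p);
            // The directory entry is gone (lsof would show "(deleted)"),
            // but the inode and its data survive until this stream closes.
            System.out.println(Files.exists(p));
            System.out.println(new String(in.readAllBytes()));
        }
    }
}
```

So a reader that merely hasn't been closed yet will keep every segment file it references alive on disk, even after a merge has "deleted" them.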
The documents we are indexing are potentially very large, so due to various memory constraints we only send 300 docs to Solr at a time, with a commit between each set of 300 documents. Now, one of the things I read that may cause old file handles to hang around is an old IndexReader still open and pointing to those old files. However, whenever you issue a commit to the server, it is supposed to close the old IndexReader and open a new one.
So my question is: when the Reader is being closed due to a commit, what exactly is happening? Is it just being set to null and a new instance being created? I'm thinking the reader may be sitting around in memory for a while before the garbage collector finally gets to it, and in that time it is still holding those files open. Perhaps an explicit method call that closes any open file handles should occur before setting the reference to null?
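For what it's worth, Lucene's reader lifecycle is built around an explicit close() that releases the OS-level file handles (and, in later versions, around reference counting), not around garbage collection. A minimal sketch of that explicit-release pattern (a hypothetical class for illustration, not the actual Solr/Lucene code):

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration: file handles are freed when the reference
// count reaches zero via an explicit call, not when the GC collects.
class RefCountedReader {
    private final AtomicInteger refCount = new AtomicInteger(1);
    private volatile boolean filesReleased = false;

    void incRef() { refCount.incrementAndGet(); }

    void decRef() {
        if (refCount.decrementAndGet() == 0) {
            // This is where a real reader would close its index files.
            filesReleased = true;
        }
    }

    boolean isReleased() { return filesReleased; }
}

public class ReaderLifecycleDemo {
    public static void main(String[] args) {
        RefCountedReader reader = new RefCountedReader();
        reader.incRef();   // e.g. an in-flight search still using it
        reader.decRef();   // commit swaps in a new reader
        System.out.println(reader.isReleased()); // search still live
        reader.decRef();   // last user finishes
        System.out.println(reader.isReleased()); // files now closed
    }
}
```

If something holds a reference past the swap (an in-flight search, or a searcher whose close was skipped), the old files stay open exactly as described above, until the last release happens.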
After looking at the code, it looks like reader.close() is explicitly being called as long as the closeReader property in SolrIndexSearcher is set to true, but I'm not sure how to check whether that is always getting set to true or not. There is one constructor of SolrIndexSearcher that sets it to false.

Any insight here would be appreciated. Are stale file handles something I should just expect from the JVM? I've never run into the "Too Many Files Open" exception before, so this is my first time looking at the lsof command. Perhaps I'm reading too much into the data it's showing me.
Mark Baird