Setting maxBufferedDocs to something smaller (say, 300) might be a better way of limiting your memory usage. I have had difficulties with the odd huge document when using the default maxBufferedDocs=1000 (in the next Solr version, there should be an option to limit indexing based on memory usage rather than the number of buffered docs). There is also an option in trunk to flush pending deletes every X docs, which might make a difference if you are overwriting millions of docs.
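As a sketch, that setting goes in the <mainIndex> (or <indexDefaults>) section of solrconfig.xml; 300 here is just an illustrative value, tune it for your document sizes:

    <mainIndex>
      <!-- flush buffered documents to disk after this many, instead of the default 1000 -->
      <maxBufferedDocs>300</maxBufferedDocs>
      <!-- other index settings left as they are -->
    </mainIndex>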

The autocommit problem seems odd to me: I definitely used it in 1.2 (maxDocs only). A few autocommit bugs have been fixed in trunk, though, if running trunk is an option for you.
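For what it's worth, the maxDocs-only setup I used looked roughly like the following; the threshold itself is just an example, not a recommendation:

    <updateHandler class="solr.DirectUpdateHandler2">
      <autoCommit>
        <!-- commit automatically once this many documents are pending -->
        <maxDocs>10000</maxDocs>
      </autoCommit>
    </updateHandler>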

-Mike

On 26-Dec-07, at 5:48 AM, Mark Baird wrote:

Well, when I wasn't sending regular commits I was getting out-of-memory exceptions from Solr fairly often, which I assume is due to the size of the documents I'm sending. I'd love to set the autocommit in solrconfig.xml and not worry about sending commits on the client side, but autocommit doesn't seem to work at all for me. I'm aware of the maxTime bug, but I've tried setting maxTime to 0 and leaving it out of solrconfig.xml altogether, and no matter what I try my Solr server never does an autocommit.
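For reference, the sort of thing I've been trying looks roughly like this; the maxDocs value here is only illustrative:

    <updateHandler class="solr.DirectUpdateHandler2">
      <autoCommit>
        <maxDocs>300</maxDocs>
        <!-- tried with maxTime set to 0, and with the maxTime element removed entirely -->
        <maxTime>0</maxTime>
      </autoCommit>
    </updateHandler>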

Yes, I'm just using one Solr instance for both indexing and searching currently. However, this is still just a development environment. We only started looking at Solr about a month ago and haven't gone to production with anything yet. We will probably have separate instances for indexing and searching by the time we go live.


Mark

On Dec 24, 2007 4:20 PM, Otis Gospodnetic <[EMAIL PROTECTED]> wrote:

Mark,

Another question to ask is: do you *really* need to be calling commit every 300 docs? Unless you really need searchers to see your 300 new docs, you don't need to commit. Just optimize + commit at the end of your whole batch. Lowering the mergeFactor is the right thing to do. Out of curiosity, are you using a single instance of Solr for both indexing and searching?
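For reference, mergeFactor lives in the <mainIndex> section of solrconfig.xml. The default is 10; a lower value means fewer segments on disk, and thus fewer open index files, at the cost of more frequent merging. The number below is just an example:

    <mainIndex>
      <!-- fewer segments before a merge => fewer index files held open at once -->
      <mergeFactor>4</mergeFactor>
    </mainIndex>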

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
From: Mark Baird <[EMAIL PROTECTED]>
To: Solr Mailing List <solr-user@lucene.apache.org>
Sent: Monday, December 24, 2007 7:25:00 PM
Subject: An observation on the "Too Many Files Open" problem

Running our Solr server (latest 1.2 Solr release) on a Linux machine, we ran into the "Too Many Open Files" issue quite a bit. We've since changed the ulimit max filehandle setting, as well as the Solr mergeFactor setting, and haven't been running into the problem anymore. However, we are seeing some behavior from Solr that seems a little odd to me. When we are in the middle of our batch index process and we run the lsof command, we see a lot of open file handles hanging around that reference Solr index files that have been deleted by Solr and no longer exist on the system.

The documents we are indexing are potentially very large, so due to various memory constraints we only send 300 docs to Solr at a time, with a commit between each set of 300 documents. Now, one of the things I read that may cause old file handles to hang around is having an old IndexReader still open, pointing to those old files. However, whenever you issue a commit to the server it is supposed to close the old IndexReader and open a new one.

So my question is, when the Reader is being closed due to a commit, what exactly is happening? Is it just being set to null and a new instance being created? I'm thinking the reader may be sitting around in memory for a while before the garbage collector finally gets to it, and in that time it is still holding those files open. Perhaps an explicit method call that closes any open file handles should occur before setting the reference to null?

After looking at the code, it looks like reader.close() is explicitly being called as long as the closeReader property in SolrIndexSearcher is set to true, but I'm not sure how to check if that is always getting set to true or not. There is one constructor of SolrIndexSearcher that sets it to false.

Any insight here would be appreciated. Are stale file handles something I should just expect from the JVM? I've never run into the "Too Many Files Open" exception before, so this is my first time looking at the lsof command. Perhaps I'm reading too much into the data it's showing me.


Mark Baird




