Setting maxBufferedDocs to something smaller (say, 300) might be a better way of limiting your memory usage. I have had difficulties with the odd huge document when using the default maxBufferedDocs=1000 (in the next Solr version, there should be an option to limit indexing based on memory usage rather than the number of buffered docs). There is also an option in trunk to flush pending deletes every X docs, which might make a difference if you are overwriting millions of docs.
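As a sketch, that setting goes in the <mainIndex> (or <indexDefaults>) section of solrconfig.xml; 300 here is just an illustrative value, tune it for your document sizes:

    <mainIndex>
      <!-- flush buffered documents to disk after this many, instead of the default 1000 -->
      <maxBufferedDocs>300</maxBufferedDocs>
      <!-- other index settings left as they are -->
    </mainIndex>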

The autocommit problem seems odd to me: I definitely used it in 1.2 (maxDocs only). A few autocommit bugs have been fixed in trunk, though, if running trunk is an option for you.
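For what it's worth, the maxDocs-only setup I used looked roughly like the following; the threshold itself is just an example, not a recommendation:

    <updateHandler class="solr.DirectUpdateHandler2">
      <autoCommit>
        <!-- commit automatically once this many documents are pending -->
        <maxDocs>10000</maxDocs>
      </autoCommit>
    </updateHandler>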

-Mike

On 26-Dec-07, at 5:48 AM, Mark Baird wrote:

Well, when I wasn't sending regular commits I was getting out-of-memory exceptions from Solr fairly often, which I assume is due to the size of the documents I'm sending. I'd love to set the autocommit in solrconfig.xml and not worry about sending commits on the client side, but autocommit doesn't seem to work at all for me. I'm aware of the maxTime bug, but I've tried setting maxTime to 0 and leaving it out of solrconfig.xml altogether, and no matter what I try my Solr server never does an autocommit.
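For reference, the sort of thing I've been trying looks roughly like this; the maxDocs value here is only illustrative:

    <updateHandler class="solr.DirectUpdateHandler2">
      <autoCommit>
        <maxDocs>300</maxDocs>
        <!-- tried with maxTime set to 0, and with the maxTime element removed entirely -->
        <maxTime>0</maxTime>
      </autoCommit>
    </updateHandler>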

Yes, I'm just using one Solr instance for both indexing and searching currently. However, this is still just a development environment. We only started looking at Solr about a month ago and haven't gone to production with anything yet. We will probably have separate instances for indexing and searching by the time we go live.


Mark

On Dec 24, 2007 4:20 PM, Otis Gospodnetic <[EMAIL PROTECTED]> wrote:

Mark,

Another question to ask is: do you *really* need to be calling commit every 300 docs? Unless you really need searchers to see your 300 new docs, you don't need to commit. Just optimize + commit at the end of your whole batch. Lowering the mergeFactor is the right thing to do. Out of curiosity, are you using a single instance of Solr for both indexing and searching?
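For reference, mergeFactor lives in the <mainIndex> section of solrconfig.xml. The default is 10; a lower value means fewer segments on disk, and thus fewer open index files, at the cost of more frequent merging. The number below is just an example:

    <mainIndex>
      <!-- fewer segments before a merge => fewer index files held open at once -->
      <mergeFactor>4</mergeFactor>
    </mainIndex>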

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
From: Mark Baird <[EMAIL PROTECTED]>
To: Solr Mailing List <solr-user@lucene.apache.org>
Sent: Monday, December 24, 2007 7:25:00 PM
Subject: An observation on the "Too Many Files Open" problem

Running our Solr server (latest 1.2 Solr release) on a Linux machine, we ran into the "Too Many Open Files" issue quite a bit. We've since changed the ulimit max filehandle setting, as well as the Solr mergeFactor setting, and haven't been running into the problem anymore. However, we are seeing some behavior from Solr that seems a little odd to me. When we are in the middle of our batch index process and we run the lsof command, we see a lot of open file handles hanging around that reference Solr index files that have been deleted by Solr and no longer exist on the system.

The documents we are indexing are potentially very large, so due to various memory constraints we only send 300 docs to Solr at a time, with a commit between each set of 300 documents. Now, one of the things I read that may cause old file handles to hang around is having an old IndexReader still open, pointing to those old files. However, whenever you issue a commit to the server it is supposed to close the old IndexReader and open a new one.

So my question is, when the Reader is being closed due to a commit, what exactly is happening? Is it just being set to null and a new instance being created? I'm thinking the reader may be sitting around in memory for a while before the garbage collector finally gets to it, and in that time it is still holding those files open. Perhaps an explicit method call that closes any open file handles should occur before setting the reference to null?

After looking at the code, it looks like reader.close() is explicitly being called as long as the closeReader property in SolrIndexSearcher is set to true, but I'm not sure how to check if that is always getting set to true or not. There is one constructor of SolrIndexSearcher that sets it to false.

Any insight here would be appreciated. Are stale file handles something I should just expect from the JVM? I've never run into the "Too Many Files Open" exception before, so this is my first time looking at the lsof command. Perhaps I'm reading too much into the data it's showing me.


Mark Baird




