There seem to be quite a few places throughout the code where a RecentUpdates instance is acquired but not reliably closed...
For example, in RecoveryStrategy it does this correctly:

    UpdateLog.RecentUpdates recentUpdates = null;
    try {
      recentUpdates = ulog.getRecentUpdates();
      recentVersions = recentUpdates.getVersions(ulog.numRecordsToKeep);
    } catch (Throwable t) {
      SolrException.log(log, "Corrupt tlog - ignoring. core=" + coreName, t);
      recentVersions = new ArrayList<Long>(0);
    } finally {
      if (recentUpdates != null) {
        recentUpdates.close();
      }
    }

But in a number of other places it's used more like this:

    UpdateLog.RecentUpdates recentUpdates = ulog.getRecentUpdates();
    try {
      // ... some code ...
    } finally {
      recentUpdates.close();
    }

The problem, it would seem, is that UpdateLog.getRecentUpdates() can fail when it calls update(), since it does IO on the log itself. If it throws after it has already registered a reference internally, the caller never gets a handle to close, so you'll get orphaned references to the log no matter which calling pattern is used. I'm not 100% sure this is my problem; I'm scouring the logs to see if this codepath was triggered.

steve

On Wed, May 15, 2013 at 5:26 PM, Walter Underwood <wun...@wunderwood.org> wrote:

> Maybe we need a flag in the update handler to ignore commit requests.
>
> I just enabled a similar thing for our JVM, because something, somewhere
> was calling System.gc(). You can completely ignore explicit GC calls, or
> you can turn them into requests for a concurrent GC.
>
> A similar setting for Solr might map commit requests to hard commit
> (default), soft commit, or none.
>
> wunder
>
> On May 15, 2013, at 2:20 PM, Steven Bower wrote:
>
> > Most definitely understand the "don't commit after each record"...
> > unfortunately the data is being fed by another team which I cannot
> > control...
> >
> > Limiting the number of potential tlog files is good, but I think there
> > is also an issue in that when the TransactionLog objects are
> > dereferenced, their RandomAccessFile object is not closed, thus
> > delaying release of the descriptor until the object is GC'd...
> >
> > I'm hunting through the UpdateHandler code to try and find where this
> > happens now..
> >
> > steve
> >
> > On Wed, May 15, 2013 at 5:13 PM, Yonik Seeley <yo...@lucidworks.com> wrote:
> >
> >> Hmmm, we keep open a number of tlog files based on the number of
> >> records in each file (so we always have a certain amount of history),
> >> but IIRC, the number of tlog files is also capped. Perhaps there is a
> >> bug when the limit on tlog files is reached (as opposed to the number
> >> of documents in the tlog files).
> >>
> >> I'll see if I can create a test case to reproduce this.
> >>
> >> Separately, you'll get a lot better performance if you don't commit
> >> per update, of course (or at least use something like commitWithin).
> >>
> >> -Yonik
> >> http://lucidworks.com
> >>
> >> On Wed, May 15, 2013 at 5:06 PM, Steven Bower <sbo...@alcyon.net> wrote:
> >>> We have a system in which a client is sending 1 record at a time
> >>> (via REST) followed by a commit. This has produced ~65k tlog files
> >>> and the JVM has run out of file descriptors... I grabbed a heap dump
> >>> from the JVM and I can see ~52k "unreachable" FileDescriptors...
> >>> This leads me to believe that the TransactionLog is not properly
> >>> closing all of its files before getting rid of the object...
> >>>
> >>> I've verified with lsof that indeed there are ~60k tlog files that
> >>> are open currently..
> >>>
> >>> This is Solr 4.3.0
> >>>
> >>> Thanks,
> >>>
> >>> steve
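To make the failure mode from the top of the thread concrete, here is a minimal, self-contained Java sketch. Log, Snapshot, getSnapshotLeaky, and getSnapshotSafe are hypothetical stand-ins for UpdateLog and UpdateLog.RecentUpdates, not the actual Solr API; the point is that when the acquiring method itself can throw after taking a reference, the caller-side try/finally pattern (even the careful RecoveryStrategy one) cannot release it — the acquiring method has to clean up on its own failure path.

```java
// Sketch only: simplified stand-ins for UpdateLog / RecentUpdates,
// not the real Solr classes.
public class SnapshotLeakDemo {
    /** Stand-in for the log; counts outstanding snapshot references. */
    static class Log {
        int openRefs = 0;

        /** Leaky variant: takes a reference, then fails doing "IO". */
        Snapshot getSnapshotLeaky() {
            Snapshot s = new Snapshot(this);        // reference registered here
            throw new RuntimeException("IO error"); // s is orphaned: caller never sees it
        }

        /** Safe variant: releases the reference before propagating failure. */
        Snapshot getSnapshotSafe() {
            Snapshot s = new Snapshot(this);
            try {
                throw new RuntimeException("IO error"); // simulated failure in update()
            } catch (RuntimeException e) {
                s.close();                              // release on failure, then rethrow
                throw e;
            }
        }
    }

    static class Snapshot implements AutoCloseable {
        final Log log;
        Snapshot(Log log) { this.log = log; log.openRefs++; }
        @Override public void close() { log.openRefs--; }
    }

    public static void main(String[] args) {
        // Caller uses the careful pattern from RecoveryStrategy:
        // acquire inside try, null-check in finally.
        Log log = new Log();
        Snapshot snap = null;
        try {
            snap = log.getSnapshotLeaky();
        } catch (RuntimeException expected) {
            // "corrupt tlog - ignoring"
        } finally {
            if (snap != null) snap.close(); // never runs: snap is still null
        }
        System.out.println("refs after leaky acquire: " + log.openRefs); // 1: leaked

        log = new Log();
        snap = null;
        try {
            snap = log.getSnapshotSafe();
        } catch (RuntimeException expected) {
        } finally {
            if (snap != null) snap.close();
        }
        System.out.println("refs after safe acquire: " + log.openRefs);  // 0
    }
}
```

If this reading of the code is right, it suggests two complementary fixes: have getRecentUpdates() release its internal reference when update() throws, and keep the acquire-inside-try / null-checked-finally pattern at every call site for failures that do reach the caller.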