Re: Transaction Logs Leaking FileDescriptors

2013-05-16 Thread Steven Bower
Created https://issues.apache.org/jira/browse/SOLR-4831 to capture this issue On Thu, May 16, 2013 at 10:10 AM, Steven Bower wrote: > Looking at the timestamps on the tlog files they seem to have all been > created around the same time (04:55).. starting around this time I start > seeing the ex

Re: Transaction Logs Leaking FileDescriptors

2013-05-16 Thread Steven Bower
Looking at the timestamps on the tlog files they seem to have all been created around the same time (04:55).. starting around this time I start seeing the exception below (there were 1628).. in fact it's getting tons of these (200k+) but most of the time inside regular commits... 2013-15-05 04:55:0

Re: Transaction Logs Leaking FileDescriptors

2013-05-16 Thread Yonik Seeley
See https://issues.apache.org/jira/browse/SOLR-3939 Do you see these log messages from this in your logs? log.info("I may be the new leader - try and sync"); How reproducible is this bug for you? It would be great to know if the patch in the issue fixes things. -Yonik http://lucidworks.co

Re: Transaction Logs Leaking FileDescriptors

2013-05-15 Thread Yonik Seeley
On Wed, May 15, 2013 at 5:06 PM, Steven Bower wrote: > This leads me to believe that the > TransactionLog is not properly closing all of its files before getting rid > of the object... I tried some ad hoc tests, and I can't reproduce this behavior yet. There must be some other code path that inc

Re: Transaction Logs Leaking FileDescriptors

2013-05-15 Thread Steven Bower
They are visible to ls... On Wed, May 15, 2013 at 5:49 PM, Yonik Seeley wrote: > On Wed, May 15, 2013 at 5:20 PM, Steven Bower wrote: > > when the TransactionLog objects are dereferenced > > their RandomAccessFile object is not closed.. > > Have the files been deleted (unlinked from the direct

Re: Transaction Logs Leaking FileDescriptors

2013-05-15 Thread Steven Bower
There seem to be quite a few places where the RecentUpdates class is used but is not properly created/closed throughout the code... For example in RecoveryStrategy it does this correctly: UpdateLog.RecentUpdates recentUpdates = null; try { recentUpdates = ulog.getRecentUpdates();
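
The acquire-then-close-in-finally pattern described in this message can be sketched in isolation. This is a minimal illustration, not Solr code: `TlogCloseSketch`, `LogHandle`, `OPEN_HANDLES`, and `readLength` are hypothetical names, and `LogHandle` only stands in for any object that owns a `RandomAccessFile`. The counter is there so a leak shows up as a non-zero value.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.concurrent.atomic.AtomicInteger;

public class TlogCloseSketch {
    // Counts handles that were opened and not yet closed (hypothetical leak detector).
    static final AtomicInteger OPEN_HANDLES = new AtomicInteger();

    /** Hypothetical stand-in for an object that owns one file descriptor. */
    static final class LogHandle implements AutoCloseable {
        private final RandomAccessFile raf;
        private boolean closed;

        LogHandle(File file) throws IOException {
            this.raf = new RandomAccessFile(file, "rw");
            OPEN_HANDLES.incrementAndGet();
        }

        long length() throws IOException {
            return raf.length();
        }

        @Override
        public void close() throws IOException {
            if (closed) {
                return; // closing twice is harmless
            }
            closed = true;
            raf.close(); // releases the underlying file descriptor
            OPEN_HANDLES.decrementAndGet();
        }
    }

    /** Acquire inside try, release in finally, so every exit path closes the handle. */
    static long readLength(File file) throws IOException {
        LogHandle handle = null;
        try {
            handle = new LogHandle(file);
            return handle.length();
        } finally {
            if (handle != null) {
                handle.close();
            }
        }
    }
}
```

Because `LogHandle` implements `AutoCloseable`, a try-with-resources statement would give the same guarantee with less code; the explicit `finally` form is shown only because it mirrors the shape quoted in the message.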

Re: Transaction Logs Leaking FileDescriptors

2013-05-15 Thread Yonik Seeley
On Wed, May 15, 2013 at 5:20 PM, Steven Bower wrote: > when the TransactionLog objects are dereferenced > their RandomAccessFile object is not closed.. Have the files been deleted (unlinked from the directory), or are they still visible via "ls"? -Yonik http://lucidworks.com

Re: Transaction Logs Leaking FileDescriptors

2013-05-15 Thread Walter Underwood
Maybe we need a flag in the update handler to ignore commit requests. I just enabled a similar thing for our JVM, because something, somewhere was calling System.gc(). You can completely ignore explicit GC calls or you can turn them into requests for a concurrent GC. A similar setting for Solr

Re: Transaction Logs Leaking FileDescriptors

2013-05-15 Thread Yonik Seeley
On Wed, May 15, 2013 at 5:20 PM, Steven Bower wrote: > I'm hunting through the UpdateHandler code to try and find where this > happens now.. UpdateLog.addOldLog() -Yonik http://lucidworks.com

Re: Transaction Logs Leaking FileDescriptors

2013-05-15 Thread Steven Bower
Most definitely understand the don't commit after each record... unfortunately the data is being fed by another team which I cannot control... Limiting the number of potential tlog files is good but I think there is also an issue in that when the TransactionLog objects are dereferenced their Random

Re: Transaction Logs Leaking FileDescriptors

2013-05-15 Thread Yonik Seeley
Hmmm, we keep open a number of tlog files based on the number of records in each file (so we always have a certain amount of history), but IIRC, the number of tlog files is also capped. Perhaps there is a bug when the limit to tlog files is reached (as opposed to the number of documents in the tlo
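
The retention rule described in this message (keep enough old logs to cover a minimum number of records, but never more than a fixed number of files) can be sketched as follows. This is not Solr's actual `UpdateLog.addOldLog()` implementation: the class, the `recordsToKeep` and `maxLogs` parameters, and the `Log` type are all hypothetical, and the values used are arbitrary. The point it illustrates is that with one record per log, the record threshold is never reached, so only the file cap evicts old logs, and every evicted log must be closed.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class TlogRetentionSketch {
    /** Hypothetical stand-in for a transaction log file. */
    static final class Log {
        final int records;
        boolean closed;

        Log(int records) {
            this.records = records;
        }

        void close() {
            closed = true; // a real log would release its file descriptor here
        }
    }

    private final Deque<Log> logs = new ArrayDeque<>(); // newest first, oldest last
    private final int recordsToKeep; // minimum history, counted in records (assumed parameter)
    private final int maxLogs;       // hard cap on retained log files (assumed parameter)

    TlogRetentionSketch(int recordsToKeep, int maxLogs) {
        this.recordsToKeep = recordsToKeep;
        this.maxLogs = maxLogs;
    }

    /** Retain a finished log, then evict and close the oldest logs that are no longer needed. */
    void addOldLog(Log log) {
        logs.addFirst(log);
        while (logs.size() > 1) {
            Log oldest = logs.peekLast();
            boolean overCap = logs.size() > maxLogs;
            boolean enoughHistory = totalRecords() - oldest.records >= recordsToKeep;
            if (!overCap && !enoughHistory) {
                break;
            }
            logs.removeLast().close(); // eviction without close() would leak the descriptor
        }
    }

    int retained() {
        return logs.size();
    }

    private int totalRecords() {
        int sum = 0;
        for (Log l : logs) {
            sum += l.records;
        }
        return sum;
    }
}
```

In this sketch, a commit-per-record workload with `maxLogs` set to 3 keeps exactly three one-record logs open, however many are added.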

Transaction Logs Leaking FileDescriptors

2013-05-15 Thread Steven Bower
We have a system in which a client is sending 1 record at a time (via REST) followed by a commit. This has produced ~65k tlog files and the JVM has run out of file descriptors... I grabbed a heap dump from the JVM and I can see ~52k "unreachable" FileDescriptors... This leads me to believe that the