On Apr 16, 2012, at 5:29 AM, Willmer, Alex (PTS) wrote:

> Hello,
> 
> I've been working on a full-text search plugin for Trac. At initial setup 
> this indexes the entire Subversion repository by reading every node of every 
> version. During testing we discovered that the indexer was running out of 
> file handles, due to a file handle leak. As far as I can tell each 
> core.Stream(fs.file_contents(.)) instance that was created and not 
> subsequently .read() left an unclosed file handle. To work around this I have 
> monkey patched a Stream.close() method that calls svn_stream_close, which is 
> used in a try/finally block. 

Any chance you could post more of your code?  I'm interested in your main loop 
and all Subversion binding calls -- feel free to omit the full-text indexing 
logic details.  If it's company code that you can't share, uh, feel free to 
accidentally send it to me privately or whip up some pseudo code that shows the 
binding calls ;-)
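
For reference, here's roughly what I imagine the work-around looks like. This is a guess at the shape, not your actual code: ``add_close_method``, ``index_one``, ``FakeStream`` and the ``_closed`` flag are names I made up, and the fake stream exists only so the sketch runs without the bindings installed.

```python
def add_close_method(stream_cls, close_func):
    """Monkey patch a close() method onto stream_cls that calls close_func."""
    def close(self):
        # `_closed` is a hypothetical flag that makes close() safe to call twice.
        if not getattr(self, "_closed", False):
            close_func(self)
            self._closed = True
    stream_cls.close = close
    return stream_cls


class FakeStream(object):
    """Stand-in for core.Stream so the sketch runs without the bindings."""
    def __init__(self):
        self.handle_open = True


def fake_close(stream):
    # Stand-in for the svn_stream_close call.
    stream.handle_open = False


add_close_method(FakeStream, fake_close)


def index_one(stream, indexer):
    """Index one stream and always close it, even if the indexer raises."""
    try:
        indexer(stream)
    finally:
        stream.close()
```

With the real bindings, the patch would presumably be applied to ``core.Stream`` with a ``close_func`` that hands the wrapped stream to ``svn_stream_close``. How the wrapper exposes that stream is an assumption on my part, so check the bindings source.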

Side bar: I've found the ``psutil`` module helpful in the past for tracking 
down file handle leaks:

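        import os
        import psutil
        import svn.fs

        def dump_open_files(p):
                # Lists every file this process currently has open.
                print "open files: ", p.get_open_files()

        def main():
                p = psutil.Process(os.getpid())
                # `repo` is assumed to be an already-opened filesystem object.
                # youngest_rev() is inclusive, hence the + 1.
                for i in xrange(0, svn.fs.youngest_rev(repo) + 1):
                        dump_open_files(p)
                        ... open stream
                        ... index stream
                        dump_open_files(p)
                        ... close stream
                        dump_open_files(p)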
                
        
You get the idea.  If there's a leak you should see a direct correlation 
between whatever stream operations you're doing and what's getting reported by 
p.get_open_files().
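
If ``psutil`` isn't available, a rough version of the same check can be done with the standard library alone. This is a minimal sketch that assumes a platform exposing ``/proc/self/fd`` (Linux) or ``/dev/fd`` (macOS, BSD); ``count_open_fds`` and ``fd_delta`` are names I made up for illustration.

```python
import os


def count_open_fds():
    """Return the number of file descriptors this process has open."""
    # Assumption: one of these directories exists on the platform.
    for fd_dir in ("/proc/self/fd", "/dev/fd"):
        if os.path.isdir(fd_dir):
            # The listing includes the descriptor used to read the directory
            # itself; that offset is the same on every call, so differences
            # between two counts are still meaningful.
            return len(os.listdir(fd_dir))
    raise OSError("no file descriptor directory on this platform")


def fd_delta(operation):
    """Run operation() and report how many descriptors it left open.

    The result is returned too, so the caller keeps it alive and the
    garbage collector cannot quietly close it before the second count.
    """
    before = count_open_fds()
    result = operation()
    return count_open_fds() - before, result
```

Wrapping the "open stream, index stream, close stream" steps of one revision in ``fd_delta`` should report 0 if nothing leaks, and a positive number otherwise.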


Can I just clarify you're using the SWIG bindings and not the ctypes-based 
ones, too?

(I ran into a peculiar memory leak a few months back; I had similar code that 
essentially analyzed every revision in a repository.  So, it's not unfathomable 
that there could be a leak.)

> 
> The work-around has fixed our file-handle leak, but I believe it points to a 
> bug in the Subversion bindings, for which I'll try to provide a patch. 
> Before I file a bug I'd like to check I haven't misunderstood anything:
> 1. In the Python bindings core.Stream doesn't have a .close() method [a]. Is 
> there any reason this might be intentional?

It's bedtime so I'm not going to look at the source right now -- but, if 
streams are wrapped in the apr_pool weakref black magic, then yeah, .close() 
could be happening behind the scenes when an object gets garbage collected.
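
To show what I mean by "behind the scenes", here's a toy illustration in plain Python. It is not how the bindings are implemented (I haven't looked); ``Handle`` is a made-up stand-in for a wrapped native resource, and ``weakref.finalize`` needs a reasonably modern Python.

```python
import gc
import weakref


class Handle(object):
    """Hypothetical wrapper around a native resource."""
    open_count = 0  # how many handles are currently "open"

    def __init__(self):
        Handle.open_count += 1
        # The finalizer must not reference self, or the object would
        # never become collectable.
        self._finalizer = weakref.finalize(self, Handle._release)

    @staticmethod
    def _release():
        Handle.open_count -= 1

    def close(self):
        # Explicit, deterministic release; the finalizer runs at most once.
        self._finalizer()


h = Handle()
del h          # last reference gone, so the finalizer can fire
gc.collect()   # not needed on CPython, but harmless elsewhere
print(Handle.open_count)
```

The catch is that the implicit close only happens once nothing references the object any more. If something in the loop keeps the wrappers alive (a list, a reference cycle, a long-lived parent object), the handles stay open until that something goes away, which from the outside looks exactly like a leak.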

Are you noticing constant memory usage whilst analyzing the repo or does that 
grow as well?


> 2. Disregarding Python, in the Subversion library is it required that every 
> svn_stream_t created (by eg a call to svn_fs_file_contents) is explicitly 
> closed, or is there some automatic clean-up/closure provided by the pool 
> system?
> 

Again, not looking at the code, I'd be inclined to blame the Python bindings, 
not the Subversion libraries.  I'd wager that Subversion takes care of cleaning 
itself up when the stream's pool gets destroyed.  It's pretty good like that.
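
Here's the mental model I have of pool cleanup, as a toy in plain Python. It is not the APR or Subversion API: ``ToyPool`` and ``ToyStream`` are invented for illustration, and whether real streams register themselves like this is exactly the thing I haven't checked.

```python
class ToyPool(object):
    """Toy model of a memory pool that runs cleanups when destroyed."""

    def __init__(self):
        self._cleanups = []

    def register_cleanup(self, func):
        self._cleanups.append(func)

    def destroy(self):
        # Run cleanups in reverse registration order, newest first.
        while self._cleanups:
            self._cleanups.pop()()


class ToyStream(object):
    """Toy stream that is closed by its pool rather than by the caller."""

    def __init__(self, pool, log, name):
        self.closed = False
        self._log = log
        self._name = name
        pool.register_cleanup(self._close)

    def _close(self):
        self.closed = True
        self._log.append(self._name)


log = []
pool = ToyPool()
first = ToyStream(pool, log, "first")
second = ToyStream(pool, log, "second")
pool.destroy()   # closes both streams without any explicit close call
```

Under this model nothing is closed until ``destroy()`` runs. So if every stream in your loop ended up in one pool that lives for the whole run, the handles would pile up until the very end even though nothing is strictly leaked. That's a guess about what might be going on, not a diagnosis.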



        Trent.





