Hi all,
I just ran into an interesting "feature" while using svndbadmin. The
subprocess.Popen call in fs.py is not buffered, and if you run strace on it you
can see thousands of system calls, each reading a single character. I'm not
sure whether this was a deliberate design decision, but it certainly hurts
performance when I run this command. On a small test repo, with the ViewVC
database fully purged and no buffering:
  $ time /usr/lib/viewvc/bin/svndbadmin -v update /usr/local/vault/
  real    3m47.653s
  user    1m6.610s
  sys     2m15.869s

Same command with buffering at 4096 bytes:

  real    1m50.753s
  user    0m25.862s
  sys     1m1.334s
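For scale, a quick calculation on the timings above (plain arithmetic on the
numbers already quoted, nothing new measured) shows the buffered run is
roughly twice as fast in wall-clock time:

```python
def to_seconds(minutes, seconds):
    """Convert a `time` reading such as 3m47.653s into seconds."""
    return minutes * 60 + seconds


# Timings quoted above: (unbuffered, buffered at 4096 bytes)
timings = {
    "real": (to_seconds(3, 47.653), to_seconds(1, 50.753)),
    "user": (to_seconds(1, 6.610), to_seconds(0, 25.862)),
    "sys": (to_seconds(2, 15.869), to_seconds(1, 1.334)),
}

for name, (unbuffered, buffered) in timings.items():
    # Ratio > 1 means the buffered run was faster.
    print(f"{name}: {unbuffered:.3f}s -> {buffered:.3f}s "
          f"({unbuffered / buffered:.2f}x faster)")
```

That works out to about 2.1x on real time, 2.6x on user time and 2.2x on
system time.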
Note that this is a very small svn repo with only 36 revisions. The following
diff against
subversion/bindings/swig/python/svn/fs.py
shows the simple change needed to use the system's default buffer size
(4096 in my case):
117c117
< p = _subprocess.Popen(cmd, stdout=_subprocess.PIPE,
---
> p = _subprocess.Popen(cmd, bufsize=-1, stdout=_subprocess.PIPE,
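A minimal standalone sketch of the effect, assuming Python 3 (where Popen has
defaulted to bufsize=-1 since 3.3.1; earlier versions defaulted to 0, i.e.
unbuffered). The helper read_lines is hypothetical and not part of the svn
bindings, and a Python child process stands in for diff. Running it under
strace should show one-byte read() calls for the bufsize=0 run, while both runs
return exactly the same lines:

```python
import subprocess
import sys


def read_lines(cmd, bufsize):
    """Hypothetical helper: run cmd and return its stdout as a list of byte lines.

    With bufsize=0 the pipe is a raw, unbuffered file, so in CPython each
    readline() issues one read() system call per byte. With bufsize=-1 the
    pipe is wrapped in a buffered reader of the default size, so it is read
    in larger chunks.
    """
    p = subprocess.Popen(cmd, bufsize=bufsize, stdout=subprocess.PIPE)
    try:
        lines = list(p.stdout)  # iterating calls readline() until EOF
    finally:
        p.stdout.close()
        p.wait()
    return lines


if __name__ == "__main__":
    # Stand-in for `diff`: a child process that prints many short lines.
    child = [sys.executable, "-c", "for i in range(5000): print('line', i)"]
    unbuffered = read_lines(child, bufsize=0)
    buffered = read_lines(child, bufsize=-1)
    # Buffering changes how the pipe is read, not what is read.
    print(len(unbuffered), unbuffered == buffered)
```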
Initially, the only reason I could think of for leaving it unbuffered was large
binary files with no newline character, i.e. the risk of slurping a large file
into RAM and blowing something up. But this code runs "diff", and for binary
files diff only reports whether they differ. For example, two 1 MB binary files
that differ give me:
$ diff test.img test2.img
Binary files test.img and test2.img differ
So is it safe to add buffering here?
--
Harry