Re: Git and GCC

2007-12-05 Thread Jeff King
On Thu, Dec 06, 2007 at 01:47:54AM -0500, Jon Smirl wrote:

> The key to converting repositories of this size is RAM. 4GB minimum,
> more would be better. git-repack is not multi-threaded. There were a
> few attempts at making it multi-threaded but none were too successful.
> If I remember right, with loads of RAM, a repack on a 450MB repository
> was taking about five hours on a 2.8Ghz Core2. But this is something
> you only have to do once for the import. Later repacks will reuse the
> original deltas.

Actually, Nicolas put quite a bit of work into multi-threading the
repack process; the results have been in master for some time, and will
be in the soon-to-be-released v1.5.4.

The downside is that the threading partitions the object space, so the
resulting size is not necessarily as small (but I don't know that
anybody has done testing on large repos to find out how large the
difference is).
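
To illustrate why the partitioning costs pack size, here is a toy sketch
(illustrative only, not git's actual code): each thread searches for delta
bases only within its own slice of the object list, so similar objects that
land in different slices are never deltified against each other.

```shell
# Hypothetical objects from two similar "families" (a* and b*), in the
# order a delta search might visit them.  With 2 threads, the list is
# split in half, and bases are only considered within a slice.
objects="a1 a2 b1 b2 a3 b3"
set -- $objects
echo "thread 1 slice: $1 $2 $3"   # a3's best bases (a1, a2) live here...
echo "thread 2 slice: $4 $5 $6"   # ...but a3 is only searched here
```

Any delta a3 could have made against a1 or a2 is lost, which is where the
small size increase comes from.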

-Peff


Re: Git and GCC

2007-12-06 Thread Jeff King
On Thu, Dec 06, 2007 at 09:18:39AM -0500, Nicolas Pitre wrote:

> > The downside is that the threading partitions the object space, so the
> > resulting size is not necessarily as small (but I don't know that
> > anybody has done testing on large repos to find out how large the
> > difference is).
> 
> Quick guesstimate is in the 1% ballpark.

Fortunately, we now have numbers. Harvey Harrison reported repacking the
gcc repo and getting these results:

> /usr/bin/time git repack -a -d -f --window=250 --depth=250
>
> 23266.37user 581.04system 7:41:25elapsed 86%CPU (0avgtext+0avgdata 0maxresident)k
> 0inputs+0outputs (419835major+123275804minor)pagefaults 0swaps
>
> -r--r--r-- 1 hharrison hharrison  29091872 2007-12-06 07:26 pack-1d46ca030c3d6d6b95ad316deb922be06b167a3d.idx
> -r--r--r-- 1 hharrison hharrison 324094684 2007-12-06 07:26 pack-1d46ca030c3d6d6b95ad316deb922be06b167a3d.pack

I tried the threaded repack with pack.threads = 3 on a dual-processor
machine, and got:

  time git repack -a -d -f --window=250 --depth=250

  real    309m59.849s
  user    377m43.948s
  sys     8m23.319s

  -r--r--r-- 1 peff peff  28570088 2007-12-06 10:11 pack-1fa336f33126d762988ed6fc3f44ecbe0209da3c.idx
  -r--r--r-- 1 peff peff 339922573 2007-12-06 10:11 pack-1fa336f33126d762988ed6fc3f44ecbe0209da3c.pack

So it is about 5% bigger. What is really disappointing is that we saved
only about 20% of the time. I didn't sit around watching the stages, but
my guess is that we spent a long time in the single threaded "writing
objects" stage with a thrashing delta cache.
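
For what it's worth, both figures can be checked directly from the numbers
above:

```shell
# Size increase: threaded pack vs. Harvey's single-threaded pack.
# Time saved: wall-clock time as a fraction of total CPU time consumed
# (user + sys) in the threaded run.
awk 'BEGIN {
  printf "size increase: %.1f%%\n", (339922573 / 324094684 - 1) * 100
  real = 309*60 + 59.849
  cpu  = (377*60 + 43.948) + (8*60 + 23.319)
  printf "wall clock / CPU time: %.0f%%\n", real / cpu * 100
}'
```

That prints about a 4.9% size increase and a wall-clock time around 80% of
the CPU time, i.e. the "about 5% bigger" and "saved only about 20%" above.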

-Peff


Re: Git and GCC

2007-12-06 Thread Jeff King
On Thu, Dec 06, 2007 at 01:02:58PM -0500, Nicolas Pitre wrote:

> > What is really disappointing is that we saved
> > only about 20% of the time. I didn't sit around watching the stages, but
> > my guess is that we spent a long time in the single threaded "writing
> > objects" stage with a thrashing delta cache.
> 
> Maybe you should run the non threaded repack on the same machine to have 
> a good comparison.

Sorry, I should have been more clear. By "saved" I meant "we needed N
minutes of CPU time, but took only M minutes of real time to use it."
IOW, if we assume that the threading had zero overhead and that we were
completely CPU bound, then the task would have taken N minutes of real
time. And obviously those assumptions aren't true, but I was attempting
to say "it would have been at most N minutes of real time to do it
single-threaded."

> And if you have only 2 CPUs, you will have better performances with
> pack.threads = 2, otherwise there'll be wasteful task switching going
> on.

Yes, but that is balanced by one thread running out of data far earlier
than the other, leaving the end of the task running on only one CPU. I
am doing a 4-thread test on a quad-CPU machine right now, and I will
also try threads=1 and threads=6 for comparison.

> And of course, if the delta cache is being thrashed, that might be due to 
> the way the existing pack was previously packed.  Hence the current pack 
> might impact object _access_ when repacking them.  So for a really 
> really fair performance comparison, you'd have to preserve the original 
> pack and swap it back before each repack attempt.

I am working each time from the pack generated by fetching from
git://git.infradead.org/gcc.git.

-Peff


Re: Git and GCC

2007-12-06 Thread Jeff King
On Thu, Dec 06, 2007 at 07:31:21PM -0800, David Miller wrote:

> > So it is about 5% bigger. What is really disappointing is that we saved
> > only about 20% of the time. I didn't sit around watching the stages, but
> > my guess is that we spent a long time in the single threaded "writing
> > objects" stage with a thrashing delta cache.
> 
> If someone can give me a good way to run this test case I can
> have my 64-cpu Niagara-2 box crunch on this and see how fast
> it goes and how much larger the resulting pack file is.

That would be fun to see. The procedure I am using is this:

# compile recent git master with threaded delta
cd git
echo THREADED_DELTA_SEARCH = 1 >>config.mak
make install

# get the gcc pack
mkdir gcc && cd gcc
git --bare init
git config remote.gcc.url git://git.infradead.org/gcc.git
git config remote.gcc.fetch \
  '+refs/remotes/gcc.gnu.org/*:refs/remotes/gcc.gnu.org/*'
git remote update

# make a copy, so we can run further tests from a known point
cd ..
cp -a gcc test

# and test multithreaded large depth/window repacking
cd test
git config pack.threads 4
time git repack -a -d -f --window=250 --depth=250

-Peff


Re: Git and GCC

2007-12-06 Thread Jeff King
On Thu, Dec 06, 2007 at 10:35:22AM -0800, Linus Torvalds wrote:

> > What is really disappointing is that we saved only about 20% of the 
> > time. I didn't sit around watching the stages, but my guess is that we 
> > spent a long time in the single threaded "writing objects" stage with a 
> > thrashing delta cache.
> 
> I don't think you spent all that much time writing the objects. That part 
> isn't very intensive, it's mostly about the IO.

It can get nasty with super-long deltas thrashing the cache, I think.
But in this case, I think it ended up being just a poor division of
labor caused by the chunk_size parameter using the quite large window
size (see elsewhere in the thread for discussion).
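
A toy illustration of that division-of-labor problem (the per-chunk costs
here are hypothetical, not measurements):

```shell
# If the work is handed out in a few large chunks and the chunks are
# uneven, the wall clock is set by the slowest thread, not by
# cpu_time / nr_threads.
awk 'BEGIN {
  t1 = 100; t2 = 180    # assumed per-chunk work (minutes) for 2 threads
  real  = (t1 > t2 ? t1 : t2)
  cpu   = t1 + t2
  ideal = cpu / 2
  printf "real=%d cpu=%d ideal=%d\n", real, cpu, ideal
}'
```

Here the job takes 180 minutes of real time instead of the ideal 140, even
though both threads together burn 280 CPU-minutes; smaller chunks would
even out the finish times.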

> I suspect you may simply be dominated by memory-throughput issues. The 
> delta matching doesn't cache all that well, and using two or more cores 
> isn't going to help all that much if they are largely waiting for memory 
> (and quite possibly also perhaps fighting each other for a shared cache? 
> Is this a Core 2 with the shared L2?)

I think the chunk_size more or less explains it. I have had reasonable
success keeping both CPUs busy on similar tasks in the past (but with
smaller window sizes).

For reference, it was a Core 2 Duo; do they all share L2, or is there
something I can look for in /proc/cpuinfo?
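
(For the record, one way to check on Linux: the per-CPU "cache size" line
in /proc/cpuinfo reports the L2 size, and on a Core 2 Duo both cores report
the same shared L2. Field names vary by kernel version, and newer kernels
also expose the cache topology under /sys/devices/system/cpu/.)

```shell
# Print the L2 cache line for each logical CPU; fall back gracefully on
# kernels that do not list it.
grep 'cache size' /proc/cpuinfo || echo 'cache size not listed here'
```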

-Peff


Re: Git and GCC

2007-12-06 Thread Jeff King
On Fri, Dec 07, 2007 at 01:50:47AM -0500, Jeff King wrote:

> Yes, but balanced by one thread running out of data way earlier than the
> other, and completing the task with only one CPU. I am doing a 4-thread
> test on a quad-CPU right now, and I will also try it with threads=1 and
> threads=6 for comparison.

Hmm. As this has been running, I read the rest of the thread, and it
looks like Jon Smirl has already posted the interesting numbers. So
never mind, unless there is something in particular you would like to
see.

-Peff