Re: {standard input}:1174: Error: inappropriate arguments for opcode 'mpydu'

2020-09-28 Thread Nicolas Pitre
On Sun, 27 Sep 2020, Rong Chen wrote:

> Hi Nicolas,
> 
> Thanks for the feedback, the error still remains with gcc 10.2.0:

I've created the simplest possible test case. You won't believe it.

Test case:

$ cat test.c
unsigned int test(unsigned int x, unsigned long long y)
{
	y /= 0x2000;
	if (x > 1)
		y *= x;
	return y;
}
$ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:~/0day/gcc-9.3.0-nolibc/arc-elf/libexec/gcc/arc-elf/9.3.0
$ ~/0day/gcc-9.3.0-nolibc/arc-elf/bin/arc-elf-gcc -mcpu=hs38 -mbig-endian -O2 -c test.c
/tmp/cc0GAomh.s: Assembler messages:
/tmp/cc0GAomh.s:21: Error: inappropriate arguments for opcode 'mpydu'
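
To see the offending instruction in context without running the assembler,
something along these lines should work:

$ ~/0day/gcc-9.3.0-nolibc/arc-elf/bin/arc-elf-gcc -mcpu=hs38 -mbig-endian -O2 \
      -S test.c -o - | grep -n -C2 mpydu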

I know nothing about ARC. Please anyone take it over from here.


Nicolas


Re: Git and GCC

2007-12-06 Thread Nicolas Pitre
On Wed, 5 Dec 2007, Harvey Harrison wrote:

> 
> > git repack -a -d --depth=250 --window=250
> > 
> 
> Since I have the whole gcc repo locally I'll give this a shot overnight
> just to see what can be done at the extreme end of things.

Don't forget to add -f as well.
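
In other words, the full invocation would be something like:

	git repack -a -d -f --depth=250 --window=250

(the -f makes git recompute all deltas instead of reusing existing ones).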


Nicolas


Re: Git and GCC

2007-12-06 Thread Nicolas Pitre
On Thu, 6 Dec 2007, Jeff King wrote:

> On Thu, Dec 06, 2007 at 01:47:54AM -0500, Jon Smirl wrote:
> 
> > The key to converting repositories of this size is RAM. 4GB minimum,
> > more would be better. git-repack is not multi-threaded. There were a
> > few attempts at making it multi-threaded but none were too successful.
> > If I remember right, with loads of RAM, a repack on a 450MB repository
> > was taking about five hours on a 2.8Ghz Core2. But this is something
> > you only have to do once for the import. Later repacks will reuse the
> > original deltas.
> 
> Actually, Nicolas put quite a bit of work into multi-threading the
> repack process; the results have been in master for some time, and will
> be in the soon-to-be-released v1.5.4.
> 
> The downside is that the threading partitions the object space, so the
> resulting size is not necessarily as small (but I don't know that
> anybody has done testing on large repos to find out how large the
> difference is).

Quick guesstimate is in the 1% ballpark.


Nicolas


Re: [PATCH] gc --aggressive: make it really aggressive

2007-12-06 Thread Nicolas Pitre
On Thu, 6 Dec 2007, Theodore Tso wrote:

> Linus later pointed out that what we *really* should do is at some
> point was to change repack -f to potentially retry to find a better
> delta, but to reuse the existing delta if it was no worse.  That
> automatically does the right thing in the case where you had
> previously done a repack with --window= --depth=,
> but then later try using "gc --aggressive", which ends up doing a worse
> job and throwing away the information from the previous repack with
> large window and depth sizes.  Unfortunately no one ever got around to
> implementing that.

I did start looking at it, but there are subtle issues to consider, such 
as making sure not to create delta loops.  Currently this is avoided by 
never involving already reused deltas in new delta chains, except for 
edge base objects.

IOW, this requires some head scratching which I haven't had the time for 
so far.


Nicolas


Re: Git and GCC

2007-12-06 Thread Nicolas Pitre
On Thu, 6 Dec 2007, Jeff King wrote:

> On Thu, Dec 06, 2007 at 09:18:39AM -0500, Nicolas Pitre wrote:
> 
> > > The downside is that the threading partitions the object space, so the
> > > resulting size is not necessarily as small (but I don't know that
> > > anybody has done testing on large repos to find out how large the
> > > difference is).
> > 
> > Quick guesstimate is in the 1% ballpark.
> 
> Fortunately, we now have numbers. Harvey Harrison reported repacking the
> gcc repo and getting these results:
> 
> > /usr/bin/time git repack -a -d -f --window=250 --depth=250
> >
> > 23266.37user 581.04system 7:41:25elapsed 86%CPU (0avgtext+0avgdata 
> > 0maxresident)k
> > 0inputs+0outputs (419835major+123275804minor)pagefaults 0swaps
> >
> > -r--r--r-- 1 hharrison hharrison  29091872 2007-12-06 07:26 
> > pack-1d46ca030c3d6d6b95ad316deb922be06b167a3d.idx
> > -r--r--r-- 1 hharrison hharrison 324094684 2007-12-06 07:26 
> > pack-1d46ca030c3d6d6b95ad316deb922be06b167a3d.pack
> 
> I tried the threaded repack with pack.threads = 3 on a dual-processor
> machine, and got:
> 
>   time git repack -a -d -f --window=250 --depth=250
> 
>   real    309m59.849s
>   user    377m43.948s
>   sys     8m23.319s
> 
>   -r--r--r-- 1 peff peff  28570088 2007-12-06 10:11 
> pack-1fa336f33126d762988ed6fc3f44ecbe0209da3c.idx
>   -r--r--r-- 1 peff peff 339922573 2007-12-06 10:11 
> pack-1fa336f33126d762988ed6fc3f44ecbe0209da3c.pack
> 
> So it is about 5% bigger.

Right.  I should probably revisit that idea of finding deltas across 
partition boundaries to mitigate that loss.  And those partitions could 
be made coarser as well to reduce the number of such partition gaps 
(just increase the value of chunk_size on line 1648 in 
builtin-pack-objects.c).

> What is really disappointing is that we saved
> only about 20% of the time. I didn't sit around watching the stages, but
> my guess is that we spent a long time in the single threaded "writing
> objects" stage with a thrashing delta cache.

Maybe you should run the non-threaded repack on the same machine to have 
a good comparison.  And if you have only 2 CPUs, you will get better 
performance with pack.threads = 2; otherwise there'll be wasteful task 
switching going on.
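
That is, something along the lines of:

	git config pack.threads 2
	git repack -a -d -f --window=250 --depth=250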

And of course, if the delta cache is being thrashed, that might be due to 
the way the existing pack was previously packed.  Hence the current pack 
might impact object _access_ when repacking them.  So for a really 
really fair performance comparison, you'd have to preserve the original 
pack and swap it back before each repack attempt.


Nicolas


Re: Git and GCC

2007-12-06 Thread Nicolas Pitre
On Thu, 6 Dec 2007, Jon Smirl wrote:

> On 12/6/07, Linus Torvalds <[EMAIL PROTECTED]> wrote:
> >
> >
> > On Thu, 6 Dec 2007, Jeff King wrote:
> > >
> > > What is really disappointing is that we saved only about 20% of the
> > > time. I didn't sit around watching the stages, but my guess is that we
> > > spent a long time in the single threaded "writing objects" stage with a
> > > thrashing delta cache.
> >
> > I don't think you spent all that much time writing the objects. That part
> > isn't very intensive, it's mostly about the IO.
> >
> > I suspect you may simply be dominated by memory-throughput issues. The
> > delta matching doesn't cache all that well, and using two or more cores
> > isn't going to help all that much if they are largely waiting for memory
> > (and quite possibly also perhaps fighting each other for a shared cache?
> > Is this a Core 2 with the shared L2?)
> 
> When I last looked at the code, the problem was in evenly dividing
> the work. I was using a four core machine and most of the time one
> core would end up with 3-5x the work of the lightest loaded core.
> Setting pack.threads up to 20 fixed the problem. With a high number of
> threads I was able to get a 4hr pack to finish in something like
> 1:15.

But as far as I know you didn't try my latest incarnation which has been
available in Git's master branch for a few months already.


Nicolas


Re: Git and GCC

2007-12-06 Thread Nicolas Pitre
On Thu, 6 Dec 2007, Jon Smirl wrote:

> On 12/6/07, Nicolas Pitre <[EMAIL PROTECTED]> wrote:
> > > When I last looked at the code, the problem was in evenly dividing
> > > the work. I was using a four core machine and most of the time one
> > > core would end up with 3-5x the work of the lightest loaded core.
> > > Setting pack.threads up to 20 fixed the problem. With a high number of
> > > threads I was able to get a 4hr pack to finish in something like
> > > 1:15.
> >
> > But as far as I know you didn't try my latest incarnation which has been
> > available in Git's master branch for a few months already.
> 
> I've deleted all my giant packs. Using the kernel pack:
> 4GB Q6600
> 
> Using the current thread pack code I get these results.
> 
> The interesting case is the last one. I set it to 15 threads and
> monitored with 'top'.
> For 0-60% compression I was at 300% CPU, 60-74% was 200% CPU and
> 74-100% was 100% CPU. It never used all four cores. The only other
> things running were top and my desktop. This is the same load
> balancing problem I observed earlier.

Well, that's possible with a window 25 times larger than the default.

The load balancing is solved with a master thread serving relatively 
small object list segments to any work thread that finished with its 
previous segment.  But the size for those segments is currently fixed to 
window * 1000 which is way too large when window == 250.

I have to find a way to auto-tune that segment size somehow.

But with the default window size there should not be any such noticeable 
load balancing problem.

Note that threading only happens in the compression phase.  The counting 
and writing phases are hardly parallelized at all.


Nicolas


Re: Git and GCC

2007-12-06 Thread Nicolas Pitre
On Thu, 6 Dec 2007, Jon Smirl wrote:

> On 12/6/07, Nicolas Pitre <[EMAIL PROTECTED]> wrote:
> > On Thu, 6 Dec 2007, Jon Smirl wrote:
> >
> > > On 12/6/07, Nicolas Pitre <[EMAIL PROTECTED]> wrote:
> > > > > When I last looked at the code, the problem was in evenly dividing
> > > > > the work. I was using a four core machine and most of the time one
> > > > > core would end up with 3-5x the work of the lightest loaded core.
> > > > > Setting pack.threads up to 20 fixed the problem. With a high number of
> > > > > threads I was able to get a 4hr pack to finish in something like
> > > > > 1:15.
> > > >
> > > > But as far as I know you didn't try my latest incarnation which has been
> > > > available in Git's master branch for a few months already.
> > >
> > > I've deleted all my giant packs. Using the kernel pack:
> > > 4GB Q6600
> > >
> > > Using the current thread pack code I get these results.
> > >
> > > The interesting case is the last one. I set it to 15 threads and
> > > monitored with 'top'.
> > > For 0-60% compression I was at 300% CPU, 60-74% was 200% CPU and
> > > 74-100% was 100% CPU. It never used all four cores. The only other
> > > things running were top and my desktop. This is the same load
> > > balancing problem I observed earlier.
> >
> > Well, that's possible with a window 25 times larger than the default.
> 
> Why did it never use more than three cores?

You have 648366 objects total, and only 647457 of them are subject to 
delta compression.

With a window size of 250 and a default thread segment of window * 1000 
that means only 3 segments will be distributed to threads, hence only 3 
threads with work to do.
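
As a quick sanity check on that arithmetic (segments of window * 1000 = 250000 objects):

	$ echo $(( (647457 + 249999) / 250000 ))
	3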


Nicolas


Re: Git and GCC

2007-12-06 Thread Nicolas Pitre
On Thu, 6 Dec 2007, Jon Smirl wrote:

> I have a 4.8GB git process with 4GB of physical memory. Everything
> started slowing down a lot when the process got that big. Does git
> really need 4.8GB to repack? I could only keep 3.4GB resident. Luckily
> this happened at 95% completion. With 8GB of memory you should be able
> to do this repack in under 20 minutes.

Probably you have too many cached delta results.  By default, every 
delta smaller than 1000 bytes is kept in memory until the write phase.  
Try using pack.deltacachesize = 256M or lower, or try disabling this 
caching entirely with pack.deltacachelimit = 0.
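
In config-command form, that is roughly:

	git config pack.deltaCacheSize 256m
	# or, to disable the caching of delta results entirely:
	git config pack.deltaCacheLimit 0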


Nicolas


Re: Git and GCC

2007-12-07 Thread Nicolas Pitre
On Fri, 7 Dec 2007, Jon Smirl wrote:

> On 12/7/07, Linus Torvalds <[EMAIL PROTECTED]> wrote:
> >
> >
> > On Thu, 6 Dec 2007, Jon Smirl wrote:
> > > >
> > > > time git blame -C gcc/regclass.c > /dev/null
> > >
> > > [EMAIL PROTECTED]:/video/gcc$ time git blame -C gcc/regclass.c > /dev/null
> > >
> > > real    1m21.967s
> > > user    1m21.329s
> >
> > Well, I was also hoping for a "compared to not-so-aggressive packing"
> > number on the same machine.. IOW, what I was wondering is whether there is
> > a visible performance downside to the deeper delta chains in the 300MB
> > pack vs the (less aggressive) 500MB pack.
> 
> Same machine with a default pack
> 
> [EMAIL PROTECTED]:/video/gcc/.git/objects/pack$ ls -l
> total 2145716
> -r--r--r-- 1 jonsmirl jonsmirl   23667932 2007-12-07 02:03
> pack-bd163555ea9240a7fdd07d2708a293872665f48b.idx
> -r--r--r-- 1 jonsmirl jonsmirl 2171385413 2007-12-07 02:03
> pack-bd163555ea9240a7fdd07d2708a293872665f48b.pack
> [EMAIL PROTECTED]:/video/gcc/.git/objects/pack$
> 
> Delta lengths have virtually no impact. 

I can confirm this.

I just did a repack keeping the default depth of 50 but with window=100 
instead of the default of 10, and the pack shrunk from 2171385413 bytes 
down to 410607140 bytes.
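
That repack would look something like:

	git repack -a -d -f --window=100 --depth=50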

So our default window size is definitely not adequate for the gcc repo.

OTOH, I recall tytso mentioning something about not getting much return 
on a bigger window size in his tests when he proposed to increase the 
default delta depth to 50.  So there is definitely some threshold beyond 
which a larger window stops being worth the extra cycles, and we should 
find a way to correlate it to the data set to get a better default window 
size than the current fixed one.


Nicolas


Re: Git and GCC

2007-12-10 Thread Nicolas Pitre
On Mon, 10 Dec 2007, Gabriel Paubert wrote:

> On Fri, Dec 07, 2007 at 04:47:19PM -0800, Harvey Harrison wrote:
> > Some interesting stats from the highly packed gcc repo.  The long chain
> > lengths very quickly tail off.  Over 60% of the objects have a chain
> > length of 20 or less.  If anyone wants the full list let me know.  I
> > also have included a few other interesting points, the git default
> > depth of 50, my initial guess of 100 and every 10% in the cumulative
> > distribution from 60-100%.
> > 
> > This shows the git default of 50 really isn't that bad, and after
> > about 100 it really starts to get sparse.  
> 
> Do you have a way to know which files have the longest chains?

With 'git verify-pack -v' you get the delta depth for each object.
Then you can use 'git show' with the object SHA1 to see its content.
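
For example, a rough sketch (delta depth is the sixth field of verify-pack's
output for deltified objects; <object-sha1> is a placeholder):

	$ git verify-pack -v .git/objects/pack/pack-*.idx | sort -k6,6 -n -r | head
	$ git show <object-sha1>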

> I have a suspicion that the ChangeLog* files are among them,
> not only because they are, almost without exception, only modified
> by prepending text to the previous version (and a fairly small amount
> compared to the size of the file), and therefore the diff is simple
> (a single hunk) so that the limit on chain depth is probably what
> causes a new copy to be created. 

My gcc repo is currently repacked with a max delta depth of 50, and 
a quick sample of those objects at the depth limit does indeed show the 
content of the ChangeLog file.  But I have occurrences of the root 
directory tree object too, and the "GCC machine description for IA-32" 
content as well.

But yes, the really deep delta chains are most certainly going to 
contain those ChangeLog files.

> Besides that these files grow quite large and become some of the 
> largest files in the tree, and at least one of them is changed 
> for every commit. This leads again to many versions of fairly 
> large files.
> 
> If this guess is right, this implies that most of the size gains
> from longer chains comes from having less copies of the ChangeLog*
> files. From a performance point of view, it is rather favourable
> since the differences are simple. This would also explain why
> the window parameter has little effect.

Well, actually the window parameter does have big effects.  For instance 
the default of 10 is completely inadequate for the gcc repo, since 
changing the window size from 10 to 100 made the corresponding pack 
shrink from 2.1GB down to 400MB, with the same max delta depth.


Nicolas


Re: Something is broken in repack

2007-12-11 Thread Nicolas Pitre
On Tue, 11 Dec 2007, Jon Smirl wrote:

> I added the gcc people to the CC, it's their repository. Maybe they
> can help up sort this out.

Unless there is a Git expert amongst the gcc crowd, I somehow doubt it. 
And gcc people with an interest in Git internals are probably already on 
the Git mailing list.


Nicolas


Re: Something is broken in repack

2007-12-11 Thread Nicolas Pitre
On Tue, 11 Dec 2007, Jon Smirl wrote:

> Switching to the Google perftools malloc
> http://goog-perftools.sourceforge.net/
> 
> 10%   30  828M
> 20%   15  831M
> 30%   10  834M
> 40%   50  1014M
> 50%   80  1086M
> 60%   80  1500M
> 70% 200  1.53G
> 80% 200  1.85G
> 90% 260  1.87G
> 95% 520  1.97G
> 100% 1335 2.24G
> 
> Google allocator knocked 600MB off from memory use.
> Memory consumption did not fall during the write out phase like it did with 
> gcc.
> 
> Since all of this is with the same code except for changing the
> threading split, those runs where memory consumption went to 4.5GB
> with the gcc allocator must have triggered an extreme problem with
> fragmentation.

Did you mean the glibc allocator?

> Total CPU time 196 CPU minutes vs 190 for gcc. Google's claims of
> being faster are not true.
> 
> So why does our threaded code take 20 CPU minutes longer (12%) to run
> than the same code with a single thread? Clock time is obviously
> faster. Are the threads working too close to each other in memory and
> bouncing cache lines between the cores? Q6600 is just two E6600s in
> the same package, the caches are not shared.

Of course there'll always be a certain amount of wasted cycles when 
threaded.  The locking overhead, the extra contention for IO, etc.  So 
12% overhead (3% per thread) when using 4 threads is not that bad I 
would say.

> Why does the threaded code need 2.24GB (google allocator, 2.85GB gcc)
> with 4 threads? But only need 950MB with one thread? Where's the extra
> gigabyte going?

I really don't know.

Did you try with pack.deltacachesize set to 1 ?

And yet, this is still missing the actual issue.  The issue being that 
the 2.1GB pack as a _source_ doesn't cause as much memory to be 
allocated even if the _result_ pack ends up being the same.

I was able to repack the 2.1GB pack on my machine which has 1GB of RAM. 
Now that it has been repacked, I can't repack it anymore, even when 
single-threaded, as it starts crawling into swap fairly quickly.  It is 
really non-intuitive and actually senseless that Git would require twice 
as much RAM to deal with a pack that is 7 times smaller.


Nicolas (still puzzled)


Re: Something is broken in repack

2007-12-11 Thread Nicolas Pitre
On Tue, 11 Dec 2007, Nicolas Pitre wrote:

> And yet, this is still missing the actual issue.  The issue being that 
> the 2.1GB pack as a _source_ doesn't cause as much memory to be 
> allocated even if the _result_ pack ends up being the same.
> 
> I was able to repack the 2.1GB pack on my machine which has 1GB of RAM. 
> Now that it has been repacked, I can't repack it anymore, even when 
> single-threaded, as it starts crawling into swap fairly quickly.  It is 
> really non-intuitive and actually senseless that Git would require twice 
> as much RAM to deal with a pack that is 7 times smaller.

OK, here's something else for you to try:

core.deltabasecachelimit=0
pack.threads=2
pack.deltacachesize=1

With that I'm able to repack the small gcc pack on my machine with 1GB 
of ram using:

git repack -a -f -d --window=250 --depth=250

and top reports a ~700m virt and ~500m res without hitting swap at all.
It is only at 25% so far, but I was unable to get that far before.
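
Expressed as repository-local config commands, that recipe is roughly:

	git config core.deltaBaseCacheLimit 0
	git config pack.threads 2
	git config pack.deltaCacheSize 1
	git repack -a -f -d --window=250 --depth=250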

Would be curious to know what you get with 4 threads on your machine.


Nicolas


Re: Something is broken in repack

2007-12-11 Thread Nicolas Pitre
On Tue, 11 Dec 2007, Nicolas Pitre wrote:

> OK, here's something else for you to try:
> 
>   core.deltabasecachelimit=0
>   pack.threads=2
>   pack.deltacachesize=1
> 
> With that I'm able to repack the small gcc pack on my machine with 1GB 
> of ram using:
> 
>   git repack -a -f -d --window=250 --depth=250
> 
> and top reports a ~700m virt and ~500m res without hitting swap at all.
> It is only at 25% so far, but I was unable to get that far before.

Well, at around 55%, memory usage skyrocketed to 1.6GB and the system went 
deep into swap.  So I restarted it with no threads.

Nicolas (even more puzzled)


Re: Something is broken in repack

2007-12-11 Thread Nicolas Pitre
On Tue, 11 Dec 2007, Linus Torvalds wrote:

> That said, I suspect there are a few things fighting you:
> 
>  - threading is hard. I haven't looked a lot at the changes Nico did to do 
>a threaded object packer, but what I've seen does not convince me it is 
>correct. The "trg_entry" accesses are *mostly* protected with 
>"cache_lock", but nothing else really seems to be, so quite frankly, I 
>wouldn't trust the threaded version very much. It's off by default, and 
>for a good reason, I think.

I beg to differ (of course, since I always know precisely what I do, and 
like you, my code never has bugs).

Seriously though, the trg_entry does not have to be protected at all.  Why? 
Simply because each thread has its own exclusive set of objects which no 
other threads ever mess with.  They never overlap.

>For example: the packing code does this:
> 
>   if (!src->data) {
>   read_lock();
>   src->data = read_sha1_file(src_entry->idx.sha1, &type, &sz);
>   read_unlock();
>   ...
> 
>and that's racy. If two threads come in at roughly the same time and 
>see a NULL src->data, they'll both get the lock, and they'll both 
>(serially) try to fill it in. It will all *work*, but one of them will 
>have done unnecessary work, and one of them will have their result 
>thrown away and leaked.

No.  Once again, it is impossible for two threads to ever see the same 
src->data at all.  The lock is there simply because read_sha1_file() is 
not reentrant.

>Are you hitting issues like this? I dunno. The object sorting means 
>that different threads normally shouldn't look at the same objects (not 
>even the sources), so probably not, but basically, I wouldn't trust the 
>threading 100%. It needs work, and it needs to stay off by default.

For now it is, but I wouldn't say it really needs significant work at 
this point.  The latest thread patches were more about tuning than 
correctness.

What the threading could be doing, though, is uncovering some other 
bugs, like in the pack mmap windowing code for example.  Although that 
code is serialized by the read lock above, the fact that multiple 
threads are hammering on it in turns means that the mmap window is 
possibly seeking back and forth much more often than otherwise, possibly 
leaking something in the process.

>  - you're working on a problem that isn't really even worth optimizing 
>that much. The *normal* case is to re-use old deltas, which makes all 
>of the issues you are fighting basically go away (because you only have 
>a few _incremental_ objects that need deltaing). 
> 
>In other words: the _real_ optimizations have already been done, and 
>are done elsewhere, and are much smarter (the best way to optimize X is 
>not to make X run fast, but to avoid doing X in the first place!). The 
>thing you are trying to work with is the one-time-only case where you 
>explicitly disable that big and important optimization, and then you 
>complain about the end result being slow!
> 
>It's like saying that you're compiling with extreme debugging and no
>optimizations, and then complaining that the end result doesn't run as 
>fast as if you used -O2. Except this is a hundred times worse, because 
>you literally asked git to do the really expensive thing that it really 
>really doesn't want to do ;)

Linus, please pay attention to the _actual_ important issue here.

Sure I've been tuning the threading code in parallel to the attempt to 
debug this memory usage issue.

BUT.  The point is that repacking the gcc repo using "git repack -a -f 
--window=250" has a radically different memory usage profile whether you 
do the repack on the earlier 2.1GB pack or the later 300MB pack.  
_That_ is the issue.  Ironically, it is the 300MB pack that causes the 
repack to blow memory usage out of proportion.

And in both cases, the threading code has to do the same work whether or 
not the original pack was densely packed, since -f throws away all 
existing deltas anyway.

So something is fishy elsewhere than in the packing code.


Nicolas


Re: Something is broken in repack

2007-12-11 Thread Nicolas Pitre
On Tue, 11 Dec 2007, David Miller wrote:

> From: Nicolas Pitre <[EMAIL PROTECTED]>
> Date: Tue, 11 Dec 2007 12:21:11 -0500 (EST)
> 
> > BUT.  The point is that repacking the gcc repo using "git repack -a -f 
> > --window=250" has a radically different memory usage profile whether you 
> > do the repack on the earlier 2.1GB pack or the later 300MB pack.  
> 
> If you repack on the smaller pack file, git has to expand more stuff
> internally in order to search the deltas, whereas with the larger pack
> file I bet git has to less often undelta'ify to get base objects blobs
> for delta search.

Of course.  I came to that conclusion two days ago.  And despite being 
pretty familiar with the involved code (I wrote part of it myself) I 
just can't spot anything wrong with it so far.

But somehow the threading code keeps distracting people from that issue, 
even though it has to do the same work whether or not the source pack is 
densely packed.

Nicolas 
(who wishes he had access to a much faster machine to investigate this issue)


Re: Something is broken in repack

2007-12-11 Thread Nicolas Pitre
On Tue, 11 Dec 2007, Jon Smirl wrote:

> This makes sense. Those runs that blew up to 4.5GB were a combination
> of this effect and fragmentation in the gcc allocator.

I disagree.  This is insane.

> Google allocator appears to be much better at controlling fragmentation.

Indeed.  And if fragmentation is indeed wasting half of Git's memory 
usage, then we'll have to come up with a custom memory allocator.

> Is there a reasonable scheme to force the chains to only be loaded
> once and then shared between worker threads? The memory blow up
> appears to be directly correlated with chain length.

No.  That would be the equivalent of holding each revision of all files 
uncompressed all at once in memory.

> > That said, I suspect there are a few things fighting you:
> >
> >  - threading is hard. I haven't looked a lot at the changes Nico did to do
> >a threaded object packer, but what I've seen does not convince me it is
> >correct. The "trg_entry" accesses are *mostly* protected with
> >"cache_lock", but nothing else really seems to be, so quite frankly, I
> >wouldn't trust the threaded version very much. It's off by default, and
> >for a good reason, I think.
> >
> >For example: the packing code does this:
> >
> > if (!src->data) {
> > read_lock();
> > src->data = read_sha1_file(src_entry->idx.sha1, &type, &sz);
> > read_unlock();
> > ...
> >
> >and that's racy. If two threads come in at roughly the same time and
> >see a NULL src->data, they'll both get the lock, and they'll both
> >(serially) try to fill it in. It will all *work*, but one of them will
> >have done unnecessary work, and one of them will have their result
> >thrown away and leaked.
> 
> That may account for the threaded version needing an extra 20 minutes
> CPU time.  An extra 12% of CPU seems like too much overhead for
> threading. Just letting a couple of those long chain compressions be
> done twice

No it may not.  This theory is wrong as explained before.

> >
> >Are you hitting issues like this? I dunno. The object sorting means
> >that different threads normally shouldn't look at the same objects (not
> >even the sources), so probably not, but basically, I wouldn't trust the
> >threading 100%. It needs work, and it needs to stay off by default.
> >
> >  - you're working on a problem that isn't really even worth optimizing
> >that much. The *normal* case is to re-use old deltas, which makes all
> >of the issues you are fighting basically go away (because you only have
> >a few _incremental_ objects that need deltaing).
> 
> I agree, this problem only occurs when people import giant
> repositories. But every time someone hits these problems they declare
> git to be screwed up and proceed to thrash it in their blogs.

It's not only for repack.  Someone just reported git-blame being 
unusable too due to insane memory usage, which I suspect is due to the 
same issue.


Nicolas


Re: Something is broken in repack

2007-12-11 Thread Nicolas Pitre
On Tue, 11 Dec 2007, Jon Smirl wrote:

> On 12/11/07, Nicolas Pitre <[EMAIL PROTECTED]> wrote:
> > On Tue, 11 Dec 2007, Nicolas Pitre wrote:
> >
> > > OK, here's something else for you to try:
> > >
> > >   core.deltabasecachelimit=0
> > >   pack.threads=2
> > >   pack.deltacachesize=1
> > >
> > > With that I'm able to repack the small gcc pack on my machine with 1GB
> > > of ram using:
> > >
> > >   git repack -a -f -d --window=250 --depth=250
> > >
> > > and top reports a ~700m virt and ~500m res without hitting swap at all.
> > > It is only at 25% so far, but I was unable to get that far before.
> >
> > Well, at around 55%, memory usage skyrocketed to 1.6GB and the system went
> > deep into swap.  So I restarted it with no threads.
> >
> > Nicolas (even more puzzled)
> 
> On the plus side you are seeing what I see, so it proves I am not imagining 
> it.

Well... This is weird.

It seems that memory fragmentation is really really killing us here.  
The fact that the Google allocator managed to waste quite a bit less 
memory is a good indicator already.

I did modify the progress display to show accounted memory that was 
allocated vs memory that was freed but still not released to the system.  
At least that gives you an idea of memory allocation and fragmentation 
with glibc in real time:

diff --git a/progress.c b/progress.c
index d19f80c..46ac9ef 100644
--- a/progress.c
+++ b/progress.c
@@ -8,6 +8,7 @@
  * published by the Free Software Foundation.
  */
 
+#include <malloc.h>
 #include "git-compat-util.h"
 #include "progress.h"
 
@@ -94,10 +95,12 @@ static int display(struct progress *progress, unsigned n, const char *done)
 	if (progress->total) {
 		unsigned percent = n * 100 / progress->total;
 		if (percent != progress->last_percent || progress_update) {
+			struct mallinfo m = mallinfo();
 			progress->last_percent = percent;
-			fprintf(stderr, "%s: %3u%% (%u/%u)%s%s",
-				progress->title, percent, n,
-				progress->total, tp, eol);
+			fprintf(stderr, "%s: %3u%% (%u/%u) %u/%uMB%s%s",
+				progress->title, percent, n, progress->total,
+				m.uordblks >> 18, m.fordblks >> 18,
+				tp, eol);
 			fflush(stderr);
 			progress_update = 0;
 			return 1;

This shows that at some point the repack goes into a big memory surge.  
I don't have enough RAM to see how fragmented memory gets though, since 
it starts swapping around 50% done with 2 threads.

With only 1 thread, memory usage grows significantly at around 11% with 
a pretty noticeable slowdown in the progress rate.

So I think the theory goes like this:

There is a block of big objects together in the list somewhere.  
Initially, all those big objects are assigned to thread #1 out of 4.  
Because those objects are big, they get really slow to delta compress, 
and storing them all in a window with 250 slots takes significant 
memory.

Threads 2, 3, and 4 have "easy" work loads, so they complete fairly 
quickly compared to thread #1.  But since the progress display is global, 
you won't notice that one thread is actually crawling slowly.

To keep all threads busy until the end, those threads that are done with 
their work load will steal some work from another thread, choosing the 
one with the largest remaining work.  That is most likely thread #1.  So 
as threads 2, 3, and 4 complete, they will steal from thread 1 and 
populate their own window with those big objects too, and get slow too.

And because all threads get to work on those big objects towards the 
end, the progress display will then show a significant slowdown, and 
memory usage will almost quadruple.

Add memory fragmentation to that and you have a clogged system.

Solution: 

pack.deltacachesize=1
pack.windowmemory=16M

Limiting the window memory to 16MB will automatically shrink the window 
size when big objects are encountered, therefore keeping much fewer of 
those objects at the same time in memory, which in turn means they will 
be processed much more quickly.  And somehow that must help with memory 
fragmentation as well.

Setting pack.deltacachesize to 1 is simply to disable the caching of 
delta results entirely which will only slow down the writing phase, but 
I wanted to keep it out of the picture for now.
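
In command form, those two settings are roughly:

	git config pack.deltaCacheSize 1
	git config pack.windowMemory 16m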

With the above settings, I'm currently repacking the gcc repo with 2 
threads, and memory allocation never exceeded 700m virt and 400m res, 
while the mallinfo shows about 350MB, and progress has reached 90% which 
has never occurred on this machine with the 300MB source pack so far.


Nicolas


Re: Something is broken in repack

2007-12-12 Thread Nicolas Pitre
On Wed, 12 Dec 2007, Nicolas Pitre wrote:

> Add memory fragmentation to that and you have a clogged system.
> 
> Solution: 
> 
>   pack.deltacachesize=1
>   pack.windowmemory=16M
> 
> Limiting the window memory to 16MB will automatically shrink the window 
> size when big objects are encountered, therefore keeping much fewer of 
> those objects at the same time in memory, which in turn means they will 
> be processed much more quickly.  And somehow that must help with memory 
> fragmentation as well.

OK scrap that.

When I returned to the computer this morning, the repack was 
completed... with a 1.3GB pack instead.

So... The gcc repo apparently really needs a large window to efficiently 
compress those large objects.

But when those large objects are already well deltified and you repack 
again with a large window, somehow the memory allocator is way more 
involved, probably even 
more so when there are several threads in parallel amplifying the issue, 
and things probably get to a point of no return with regard to memory 
fragmentation after a while.

So... my conclusion is that the glibc allocator has fragmentation issues 
with this work load, given the notable difference with the Google 
allocator, which itself might not be completely immune to fragmentation 
issues of its own.  And because the gcc repo requires a large window of 
big objects to get good compression, you're better off not using 4 
threads to repack it with -a -f.  The fact that the size of the source 
pack has such an influence is probably only because the increased usage 
of the delta base object cache is playing a role in the global memory 
allocation pattern, allowing for the bad fragmentation issue to occur.

If you could run one last test with the mallinfo patch I posted, without 
the pack.windowmemory setting, and adding the reported values along with 
those from top, then we could formally conclude that this is a memory 
fragmentation issue.

So I don't think Git itself is actually bad.  The gcc repo most 
certainly constitutes a nasty use case for memory allocators, but I don't 
think there is much we can do about it besides possibly implementing our 
own memory allocator with active defragmentation where possible (read: 
memcpy) at some point to give glibc's allocator some chance to breathe a 
bit more.

In the meantime you might have to use only one thread and lots of 
memory to repack the gcc repo, or find the perfect memory allocator to 
be used with Git.  After all, packing the whole gcc history to around 
230MB is quite a stunt but it requires sufficient resources to 
achieve it. Fortunately, like Linus said, such a wholesale repack is not 
something that most users have to do anyway.


Nicolas


Re: Something is broken in repack

2007-12-12 Thread Nicolas Pitre
On Wed, 12 Dec 2007, Nicolas Pitre wrote:

> I did modify the progress display to show accounted memory that was 
> allocated vs memory that was freed but still not released to the system.  
> At least that gives you an idea of memory allocation and fragmentation 
> with glibc in real time:
> 
> diff --git a/progress.c b/progress.c
> index d19f80c..46ac9ef 100644
> --- a/progress.c
> +++ b/progress.c
> @@ -8,6 +8,7 @@
>   * published by the Free Software Foundation.
>   */
>  
> +#include <malloc.h>
>  #include "git-compat-util.h"
>  #include "progress.h"
>  
> @@ -94,10 +95,12 @@ static int display(struct progress *progress, unsigned n, const char *done)
>  	if (progress->total) {
>  		unsigned percent = n * 100 / progress->total;
>  		if (percent != progress->last_percent || progress_update) {
> +			struct mallinfo m = mallinfo();
>  			progress->last_percent = percent;
> -			fprintf(stderr, "%s: %3u%% (%u/%u)%s%s",
> -				progress->title, percent, n,
> -				progress->total, tp, eol);
> +			fprintf(stderr, "%s: %3u%% (%u/%u) %u/%uMB%s%s",
> +				progress->title, percent, n, progress->total,
> +				m.uordblks >> 18, m.fordblks >> 18,
> +				tp, eol);

Note: I didn't know what unit of memory those fields represent, so the 
shift is most probably wrong.


Nicolas


Re: Something is broken in repack

2007-12-14 Thread Nicolas Pitre
On Fri, 14 Dec 2007, Paolo Bonzini wrote:

> > Hmmm... it is even documented in git-gc(1)... and git-index-pack(1) of
> > all things.
> 
> I found that the .keep file is not transmitted over the network (at least I
> tried with git+ssh:// and http:// protocols), however.

That is a local policy.


Nicolas