Re: {standard input}:1174: Error: inappropriate arguments for opcode 'mpydu'
On Sun, 27 Sep 2020, Rong Chen wrote:

> Hi Nicolas,
>
> Thanks for the feedback, the error still remains with gcc 10.2.0:

I've created the simplest test case that can be. You won't believe it.

Test case:

$ cat test.c
unsigned int test(unsigned int x, unsigned long long y)
{
	y /= 0x2000;
	if (x > 1)
		y *= x;
	return y;
}
$ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:~/0day/gcc-9.3.0-nolibc/arc-elf/libexec/gcc/arc-elf/9.3.0
$ ~/0day/gcc-9.3.0-nolibc/arc-elf/bin/arc-elf-gcc -mcpu=hs38 -mbig-endian -O2 -c test.c
/tmp/cc0GAomh.s: Assembler messages:
/tmp/cc0GAomh.s:21: Error: inappropriate arguments for opcode 'mpydu'

I know nothing about ARC. Please anyone take it over from here.

Nicolas
Re: Git and GCC
On Wed, 5 Dec 2007, Harvey Harrison wrote:

> > > git repack -a -d --depth=250 --window=250
> >
> Since I have the whole gcc repo locally I'll give this a shot overnight
> just to see what can be done at the extreme end or things.

Don't forget to add -f as well.

Nicolas
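Putting the two suggestions together, the full invocation would be something along these lines:

    git repack -a -d -f --depth=250 --window=250

The -f flag makes git recompute all deltas instead of reusing the ones already stored in the existing pack, which is what allows the larger window and depth to actually pay off.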
Re: Git and GCC
On Thu, 6 Dec 2007, Jeff King wrote:

> On Thu, Dec 06, 2007 at 01:47:54AM -0500, Jon Smirl wrote:
>
> > The key to converting repositories of this size is RAM. 4GB minimum,
> > more would be better. git-repack is not multi-threaded. There were a
> > few attempts at making it multi-threaded but none were too successful.
> > If I remember right, with loads of RAM, a repack on a 450MB repository
> > was taking about five hours on a 2.8Ghz Core2. But this is something
> > you only have to do once for the import. Later repacks will reuse the
> > original deltas.
>
> Actually, Nicolas put quite a bit of work into multi-threading the
> repack process; the results have been in master for some time, and will
> be in the soon-to-be-released v1.5.4.
>
> The downside is that the threading partitions the object space, so the
> resulting size is not necessarily as small (but I don't know that
> anybody has done testing on large repos to find out how large the
> difference is).

Quick guesstimate is in the 1% ballpark.

Nicolas
Re: [PATCH] gc --aggressive: make it really aggressive
On Thu, 6 Dec 2007, Theodore Tso wrote:

> Linus later pointed out that what we *really* should do is at some
> point was to change repack -f to potentially retry to find a better
> delta, but to reuse the existing delta if it was no worse. That
> automatically does the right thing in the case where you had
> previously done a repack with --window= --depth=,
> but then later try using "gc --agressive", which ends up doing a worse
> job and throwing away the information from the previous repack with
> large window and depth sizes. Unfortunately no one ever got around to
> implementing that.

I did start looking at it, but there are subtle issues to consider, such as making sure not to create delta loops. Currently this is avoided by never involving already reused deltas in new delta chains, except for edge base objects.

IOW, this requires some head scratching which I haven't had the time for so far.

Nicolas
Re: Git and GCC
On Thu, 6 Dec 2007, Jeff King wrote:

> On Thu, Dec 06, 2007 at 09:18:39AM -0500, Nicolas Pitre wrote:
>
> > > The downside is that the threading partitions the object space, so the
> > > resulting size is not necessarily as small (but I don't know that
> > > anybody has done testing on large repos to find out how large the
> > > difference is).
> >
> > Quick guesstimate is in the 1% ballpark.
>
> Fortunately, we now have numbers. Harvey Harrison reported repacking the
> gcc repo and getting these results:
>
> > /usr/bin/time git repack -a -d -f --window=250 --depth=250
> >
> > 23266.37user 581.04system 7:41:25elapsed 86%CPU (0avgtext+0avgdata
> > 0maxresident)k
> > 0inputs+0outputs (419835major+123275804minor)pagefaults 0swaps
> >
> > -r--r--r-- 1 hharrison hharrison 29091872 2007-12-06 07:26
> > pack-1d46ca030c3d6d6b95ad316deb922be06b167a3d.idx
> > -r--r--r-- 1 hharrison hharrison 324094684 2007-12-06 07:26
> > pack-1d46ca030c3d6d6b95ad316deb922be06b167a3d.pack
>
> I tried the threaded repack with pack.threads = 3 on a dual-processor
> machine, and got:
>
> time git repack -a -d -f --window=250 --depth=250
>
> real    309m59.849s
> user    377m43.948s
> sys     8m23.319s
>
> -r--r--r-- 1 peff peff 28570088 2007-12-06 10:11
> pack-1fa336f33126d762988ed6fc3f44ecbe0209da3c.idx
> -r--r--r-- 1 peff peff 339922573 2007-12-06 10:11
> pack-1fa336f33126d762988ed6fc3f44ecbe0209da3c.pack
>
> So it is about 5% bigger.

Right. I should probably revisit that idea of finding deltas across partition boundaries to mitigate that loss. And those partitions could be made coarser as well to reduce the number of such partition gaps (just increase the value of chunk_size on line 1648 in builtin-pack-objects.c).

> What is really disappointing is that we saved
> only about 20% of the time. I didn't sit around watching the stages, but
> my guess is that we spent a long time in the single threaded "writing
> objects" stage with a thrashing delta cache.

Maybe you should run the non-threaded repack on the same machine to have a good comparison. And if you have only 2 CPUs, you will have better performance with pack.threads = 2, otherwise there'll be wasteful task switching going on.

And of course, if the delta cache is being thrashed, that might be due to the way the existing pack was previously packed. Hence the current pack might impact object _access_ when repacking them. So for a really really fair performance comparison, you'd have to preserve the original pack and swap it back before each repack attempt.

Nicolas
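For anyone reproducing these runs, the thread count mentioned above is an ordinary configuration variable, so something like:

    git config pack.threads 2

in the repository being repacked (matching the number of available CPUs, as suggested above) is enough to enable it for subsequent repacks, assuming git was built with thread support.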
Re: Git and GCC
On Thu, 6 Dec 2007, Jon Smirl wrote: > On 12/6/07, Linus Torvalds <[EMAIL PROTECTED]> wrote: > > > > > > On Thu, 6 Dec 2007, Jeff King wrote: > > > > > > What is really disappointing is that we saved only about 20% of the > > > time. I didn't sit around watching the stages, but my guess is that we > > > spent a long time in the single threaded "writing objects" stage with a > > > thrashing delta cache. > > > > I don't think you spent all that much time writing the objects. That part > > isn't very intensive, it's mostly about the IO. > > > > I suspect you may simply be dominated by memory-throughput issues. The > > delta matching doesn't cache all that well, and using two or more cores > > isn't going to help all that much if they are largely waiting for memory > > (and quite possibly also perhaps fighting each other for a shared cache? > > Is this a Core 2 with the shared L2?) > > When I lasted looked at the code, the problem was in evenly dividing > the work. I was using a four core machine and most of the time one > core would end up with 3-5x the work of the lightest loaded core. > Setting pack.threads up to 20 fixed the problem. With a high number of > threads I was able to get a 4hr pack to finished in something like > 1:15. But as far as I know you didn't try my latest incarnation which has been available in Git's master branch for a few months already. Nicolas
Re: Git and GCC
On Thu, 6 Dec 2007, Jon Smirl wrote:

> On 12/6/07, Nicolas Pitre <[EMAIL PROTECTED]> wrote:
> > > When I lasted looked at the code, the problem was in evenly dividing
> > > the work. I was using a four core machine and most of the time one
> > > core would end up with 3-5x the work of the lightest loaded core.
> > > Setting pack.threads up to 20 fixed the problem. With a high number of
> > > threads I was able to get a 4hr pack to finished in something like
> > > 1:15.
> >
> > But as far as I know you didn't try my latest incarnation which has been
> > available in Git's master branch for a few months already.
>
> I've deleted all my giant packs. Using the kernel pack:
> 4GB Q6600
>
> Using the current thread pack code I get these results.
>
> The interesting case is the last one. I set it to 15 threads and
> monitored with 'top'.
> For 0-60% compression I was at 300% CPU, 60-74% was 200% CPU and
> 74-100% was 100% CPU. It never used all for cores. The only other
> things running were top and my desktop. This is the same load
> balancing problem I observed earlier.

Well, that's possible with a window 25 times larger than the default.

The load balancing is solved with a master thread serving relatively small object list segments to any work thread that finished with its previous segment. But the size for those segments is currently fixed to window * 1000, which is way too large when window == 250. I have to find a way to auto-tune that segment size somehow.

But with the default window size there should not be any such noticeable load balancing problem.

Note that threading only happens in the compression phase. The counting and writing phases are hardly parallelized.

Nicolas
Re: Git and GCC
On Thu, 6 Dec 2007, Jon Smirl wrote: > On 12/6/07, Nicolas Pitre <[EMAIL PROTECTED]> wrote: > > On Thu, 6 Dec 2007, Jon Smirl wrote: > > > > > On 12/6/07, Nicolas Pitre <[EMAIL PROTECTED]> wrote: > > > > > When I lasted looked at the code, the problem was in evenly dividing > > > > > the work. I was using a four core machine and most of the time one > > > > > core would end up with 3-5x the work of the lightest loaded core. > > > > > Setting pack.threads up to 20 fixed the problem. With a high number of > > > > > threads I was able to get a 4hr pack to finished in something like > > > > > 1:15. > > > > > > > > But as far as I know you didn't try my latest incarnation which has been > > > > available in Git's master branch for a few months already. > > > > > > I've deleted all my giant packs. Using the kernel pack: > > > 4GB Q6600 > > > > > > Using the current thread pack code I get these results. > > > > > > The interesting case is the last one. I set it to 15 threads and > > > monitored with 'top'. > > > For 0-60% compression I was at 300% CPU, 60-74% was 200% CPU and > > > 74-100% was 100% CPU. It never used all for cores. The only other > > > things running were top and my desktop. This is the same load > > > balancing problem I observed earlier. > > > > Well, that's possible with a window 25 times larger than the default. > > Why did it never use more than three cores? You have 648366 objects total, and only 647457 of them are subject to delta compression. With a window size of 250 and a default thread segment of window * 1000 that means only 3 segments will be distributed to threads, hence only 3 threads with work to do. Nicolas
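To spell out the arithmetic behind that answer: the per-thread segment size is window * 1000 = 250 * 1000 = 250,000 objects, and 647457 / 250000 ≈ 2.6, which rounds up to just 3 segments — hence only 3 of the 15 configured threads ever receive any work.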
Re: Git and GCC
On Thu, 6 Dec 2007, Jon Smirl wrote:

> I have a 4.8GB git process with 4GB of physical memory. Everything
> started slowing down a lot when the process got that big. Does git
> really need 4.8GB to repack? I could only keep 3.4GB resident. Luckily
> this happen at 95% completion. With 8GB of memory you should be able
> to do this repack in under 20 minutes.

Probably you have too many cached delta results. By default, every delta smaller than 1000 bytes is kept in memory until the write phase. Try using pack.deltacachesize = 256M or lower, or try disabling this caching entirely with pack.deltacachelimit = 0.

Nicolas
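In configuration terms, the two knobs mentioned above would be set with something like:

    git config pack.deltacachesize 256m
    git config pack.deltacachelimit 0

the first capping the total memory the delta-result cache may use, the second (per the suggestion above) preventing any delta result from being cached at all.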
Re: Git and GCC
On Fri, 7 Dec 2007, Jon Smirl wrote:

> On 12/7/07, Linus Torvalds <[EMAIL PROTECTED]> wrote:
> >
> > On Thu, 6 Dec 2007, Jon Smirl wrote:
> > >
> > > time git blame -C gcc/regclass.c > /dev/null
> > >
> > > [EMAIL PROTECTED]:/video/gcc$ time git blame -C gcc/regclass.c > /dev/null
> > >
> > > real    1m21.967s
> > > user    1m21.329s
> >
> > Well, I was also hoping for a "compared to not-so-aggressive packing"
> > number on the same machine.. IOW, what I was wondering is whether there is
> > a visible performance downside to the deeper delta chains in the 300MB
> > pack vs the (less aggressive) 500MB pack.
>
> Same machine with a default pack
>
> [EMAIL PROTECTED]:/video/gcc/.git/objects/pack$ ls -l
> total 2145716
> -r--r--r-- 1 jonsmirl jonsmirl 23667932 2007-12-07 02:03
> pack-bd163555ea9240a7fdd07d2708a293872665f48b.idx
> -r--r--r-- 1 jonsmirl jonsmirl 2171385413 2007-12-07 02:03
> pack-bd163555ea9240a7fdd07d2708a293872665f48b.pack
> [EMAIL PROTECTED]:/video/gcc/.git/objects/pack$
>
> Delta lengths have virtually no impact.

I can confirm this. I just did a repack keeping the default depth of 50 but with window=100 instead of the default of 10, and the pack shrunk from 2171385413 bytes down to 410607140 bytes. So our default window size is definitely not adequate for the gcc repo.

OTOH, I recall tytso mentioning something about not having much return on a bigger window size in his tests when he proposed to increase the default delta depth to 50. So there is definitely some kind of threshold at which point the increased window size stops being advantageous wrt the number of cycles involved, and we should find a way to correlate it to the data set to have a better default window size than the current fixed default.

Nicolas
Re: Git and GCC
On Mon, 10 Dec 2007, Gabriel Paubert wrote: > On Fri, Dec 07, 2007 at 04:47:19PM -0800, Harvey Harrison wrote: > > Some interesting stats from the highly packed gcc repo. The long chain > > lengths very quickly tail off. Over 60% of the objects have a chain > > length of 20 or less. If anyone wants the full list let me know. I > > also have included a few other interesting points, the git default > > depth of 50, my initial guess of 100 and every 10% in the cumulative > > distribution from 60-100%. > > > > This shows the git default of 50 really isn't that bad, and after > > about 100 it really starts to get sparse. > > Do you have a way to know which files have the longest chains? With 'git verify-pack -v' you get the delta depth for each object. Then you can use 'git show' with the object SHA1 to see its content. > I have a suspiscion that the ChangeLog* files are among them, > not only because they are, almost without exception, only modified > by prepending text to the previous version (and a fairly small amount > compared to the size of the file), and therefore the diff is simple > (a single hunk) so that the limit on chain depth is probably what > causes a new copy to be created. My gcc repo is currently repacked with a max delta depth of 50, and a quick sample of those objects at the depth limit does indeed show the content of the ChangeLog file. But I have occurrences of the root directory tree object too, and the "GCC machine description for IA-32" content as well. But yes, the really deep delta chains are most certainly going to contain those ChangeLog files. > Besides that these files grow quite large and become some of the > largest files in the tree, and at least one of them is changed > for every commit. This leads again to many versions of fairly > large files. > > If this guess is right, this implies that most of the size gains > from longer chains comes from having less copies of the ChangeLog* > files. From a performance point of view, it is rather favourable > since the differences are simple. This would also explain why > the window parameter has little effect. Well, actually the window parameter does have big effects. For instance the default of 10 is completely inadequate for the gcc repo, since changing the window size from 10 to 100 made the corresponding pack shrink from 2.1GB down to 400MB, with the same max delta depth. Nicolas
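A quick way to sample the deepest chains, assuming the usual 'git verify-pack -v' column layout (delta depth in the sixth field of deltified entries), is something like:

    git verify-pack -v .git/objects/pack/pack-*.idx | sort -k6 -n | tail -20

and then 'git show <sha1>' on the reported object IDs — or 'git rev-list --objects --all | grep <sha1>' to map a blob back to a file name.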
Re: Something is broken in repack
On Tue, 11 Dec 2007, Jon Smirl wrote:

> I added the gcc people to the CC, it's their repository. Maybe they
> can help up sort this out.

Unless there is a Git expert amongst the gcc crowd, I somehow doubt it. And gcc people with an interest in Git internals are probably already on the Git mailing list.

Nicolas
Re: Something is broken in repack
On Tue, 11 Dec 2007, Jon Smirl wrote:

> Switching to the Google perftools malloc
> http://goog-perftools.sourceforge.net/
>
>  10%   30   828M
>  20%   15   831M
>  30%   10   834M
>  40%   50  1014M
>  50%   80  1086M
>  60%   80  1500M
>  70%  200  1.53G
>  80%  200  1.85G
>  90%  260  1.87G
>  95%  520  1.97G
> 100% 1335  2.24G
>
> Google allocator knocked 600MB off from memory use.
> Memory consumption did not fall during the write out phase like it did with
> gcc.
>
> Since all of this is with the same code except for changing the
> threading split, those runs where memory consumption went to 4.5GB
> with the gcc allocator must have triggered an extreme problem with
> fragmentation.

Did you mean the glibc allocator?

> Total CPU time 196 CPU minutes vs 190 for gcc. Google's claims of
> being faster are not true.
>
> So why does our threaded code take 20 CPU minutes longer (12%) to run
> than the same code with a single thread? Clock time is obviously
> faster. Are the threads working too close to each other in memory and
> bouncing cache lines between the cores? Q6600 is just two E6600s in
> the same package, the caches are not shared.

Of course there'll always be a certain amount of wasted cycles when threaded. The locking overhead, the extra contention for IO, etc. So 12% overhead (3% per thread) when using 4 threads is not that bad I would say.

> Why does the threaded code need 2.24GB (google allocator, 2.85GB gcc)
> with 4 threads? But only need 950MB with one thread? Where's the extra
> gigabyte going?

I really don't know. Did you try with pack.deltacachesize set to 1?

And yet, this is still missing the actual issue. The issue being that the 2.1GB pack as a _source_ doesn't cause as much memory to be allocated even if the _result_ pack ends up being the same.

I was able to repack the 2.1GB pack on my machine which has 1GB of RAM. Now that it has been repacked, I can't repack it anymore, even when single threaded, as it starts crawling into swap fairly quickly. It is really non-intuitive and actually senseless that Git would require twice as much RAM to deal with a pack that is 7 times smaller.

Nicolas (still puzzled)
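For anyone wanting to repeat the allocator comparison, the Google allocator can usually be swapped in without rebuilding git at all by preloading it (the library path below is just an example and varies per system):

    LD_PRELOAD=/usr/lib/libtcmalloc.so git repack -a -d -f --window=250 --depth=250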
Re: Something is broken in repack
On Tue, 11 Dec 2007, Nicolas Pitre wrote:

> And yet, this is still missing the actual issue. The issue being that
> the 2.1GB pack as a _source_ doesn't cause as much memory to be
> allocated even if the _result_ pack ends up being the same.
>
> I was able to repack the 2.1GB pack on my machine which has 1GB of ram.
> Now that it has been repacked, I can't repack it anymore, even when
> single threaded, as it start crowling into swap fairly quickly. It is
> really non intuitive and actually senseless that Git would require twice
> as much RAM to deal with a pack that is 7 times smaller.

OK, here's something else for you to try:

core.deltabasecachelimit=0
pack.threads=2
pack.deltacachesize=1

With that I'm able to repack the small gcc pack on my machine with 1GB of ram using:

git repack -a -f -d --window=250 --depth=250

and top reports a ~700m virt and ~500m res without hitting swap at all. It is only at 25% so far, but I was unable to get that far before.

Would be curious to know what you get with 4 threads on your machine.

Nicolas
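Spelled out as commands, the recipe above is simply:

    git config core.deltabasecachelimit 0
    git config pack.threads 2
    git config pack.deltacachesize 1
    git repack -a -f -d --window=250 --depth=250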
Re: Something is broken in repack
On Tue, 11 Dec 2007, Nicolas Pitre wrote:

> OK, here's something else for you to try:
>
> core.deltabasecachelimit=0
> pack.threads=2
> pack.deltacachesize=1
>
> With that I'm able to repack the small gcc pack on my machine with 1GB
> of ram using:
>
> git repack -a -f -d --window=250 --depth=250
>
> and top reports a ~700m virt and ~500m res without hitting swap at all.
> It is only at 25% so far, but I was unable to get that far before.

Well, at around 55%, memory usage skyrocketed to 1.6GB and the system went deep into swap. So I restarted it with no threads.

Nicolas (even more puzzled)
Re: Something is broken in repack
On Tue, 11 Dec 2007, Linus Torvalds wrote:

> That said, I suspect there are a few things fighting you:
>
>  - threading is hard. I haven't looked a lot at the changes Nico did to do
>    a threaded object packer, but what I've seen does not convince me it is
>    correct. The "trg_entry" accesses are *mostly* protected with
>    "cache_lock", but nothing else really seems to be, so quite frankly, I
>    wouldn't trust the threaded version very much. It's off by default, and
>    for a good reason, I think.

I beg to differ (of course, since I always know precisely what I do, and like you, my code never has bugs).

Seriously though, the trg_entry does not need to be protected at all. Why? Simply because each thread has its own exclusive set of objects which no other threads ever mess with. They never overlap.

>    For example: the packing code does this:
>
>        if (!src->data) {
>                read_lock();
>                src->data = read_sha1_file(src_entry->idx.sha1, &type, &sz);
>                read_unlock();
>                ...
>
>    and that's racy. If two threads come in at roughly the same time and
>    see a NULL src->data, they'll both get the lock, and they'll both
>    (serially) try to fill it in. It will all *work*, but one of them will
>    have done unnecessary work, and one of them will have their result
>    thrown away and leaked.

No. Once again, it is impossible for two threads to ever see the same src->data at all. The lock is there simply because read_sha1_file() is not reentrant.

>    Are you hitting issues like this? I dunno. The object sorting means
>    that different threads normally shouldn't look at the same objects (not
>    even the sources), so probably not, but basically, I wouldn't trust the
>    threading 100%. It needs work, and it needs to stay off by default.

For now it is, but I wouldn't say it really needs significant work at this point. The latest thread patches were more about tuning than correctness.

What the threading could be doing, though, is uncovering some other bugs, like in the pack mmap windowing code for example. Although that code is serialized by the read lock above, the fact that multiple threads are hammering on it in turns means that the mmap window is possibly seeking back and forth much more often than otherwise, possibly leaking something in the process.

>  - you're working on a problem that isn't really even worth optimizing
>    that much. The *normal* case is to re-use old deltas, which makes all
>    of the issues you are fighting basically go away (because you only have
>    a few _incremental_ objects that need deltaing).
>
>    In other words: the _real_ optimizations have already been done, and
>    are done elsewhere, and are much smarter (the best way to optimize X is
>    not to make X run fast, but to avoid doing X in the first place!). The
>    thing you are trying to work with is the one-time-only case where you
>    explicitly disable that big and important optimization, and then you
>    complain about the end result being slow!
>
>    It's like saying that you're compiling with extreme debugging and no
>    optimizations, and then complaining that the end result doesn't run as
>    fast as if you used -O2. Except this is a hundred times worse, because
>    you literally asked git to do the really expensive thing that it really
>    really doesn't want to do ;)

Linus, please pay attention to the _actual_ important issue here.

Sure I've been tuning the threading code in parallel to the attempt to debug this memory usage issue.

BUT.
The point is that repacking the gcc repo using "git repack -a -f --window=250" has a radically different memory usage profile depending on whether you do the repack on the earlier 2.1GB pack or the later 300MB pack. _That_ is the issue. Ironically, it is the 300MB pack that causes the repack to blow memory usage out of proportion.

And in both cases, the threading code has to do the same work whether or not the original pack was densely packed, since -f throws away all existing deltas anyway.

So something is fishy somewhere other than in the packing code.

Nicolas
Re: Something is broken in repack
On Tue, 11 Dec 2007, David Miller wrote:

> From: Nicolas Pitre <[EMAIL PROTECTED]>
> Date: Tue, 11 Dec 2007 12:21:11 -0500 (EST)
>
> > BUT. The point is that repacking the gcc repo using "git repack -a -f
> > --window=250" has a radically different memory usage profile whether you
> > do the repack on the earlier 2.1GB pack or the later 300MB pack.
>
> If you repack on the smaller pack file, git has to expand more stuff
> internally in order to search the deltas, whereas with the larger pack
> file I bet git has to less often undelta'ify to get base objects blobs
> for delta search.

Of course. I came to that conclusion two days ago. And despite being pretty familiar with the involved code (I wrote part of it myself) I just can't spot anything wrong with it so far.

But somehow the threading code keeps distracting people from that issue, since it gets to do the same work whether or not the source pack is densely packed.

Nicolas (who wishes he had access to a much faster machine to investigate this issue)
Re: Something is broken in repack
On Tue, 11 Dec 2007, Jon Smirl wrote:

> This makes sense. Those runs that blew up to 4.5GB were a combination
> of this effect and fragmentation in the gcc allocator.

I disagree. This is insane.

> Google allocator appears to be much better at controlling fragmentation.

Indeed. And if fragmentation is indeed wasting half of Git's memory usage then we'll have to come up with a custom memory allocator.

> Is there a reasonable scheme to force the chains to only be loaded
> once and then shared between worker threads? The memory blow up
> appears to be directly correlated with chain length.

No. That would be the equivalent of holding each revision of all files uncompressed all at once in memory.

> > That said, I suspect there are a few things fighting you:
> >
> >  - threading is hard. I haven't looked a lot at the changes Nico did to do
> >    a threaded object packer, but what I've seen does not convince me it is
> >    correct. The "trg_entry" accesses are *mostly* protected with
> >    "cache_lock", but nothing else really seems to be, so quite frankly, I
> >    wouldn't trust the threaded version very much. It's off by default, and
> >    for a good reason, I think.
> >
> >    For example: the packing code does this:
> >
> >        if (!src->data) {
> >                read_lock();
> >                src->data = read_sha1_file(src_entry->idx.sha1, &type, &sz);
> >                read_unlock();
> >                ...
> >
> >    and that's racy. If two threads come in at roughly the same time and
> >    see a NULL src->data, they'll both get the lock, and they'll both
> >    (serially) try to fill it in. It will all *work*, but one of them will
> >    have done unnecessary work, and one of them will have their result
> >    thrown away and leaked.
>
> That may account for the threaded version needing an extra 20 minutes
> CPU time. An extra 12% of CPU seems like too much overhead for
> threading. Just letting a couple of those long chain compressions be
> done twice

No, it may not. This theory is wrong as explained before.

> >    Are you hitting issues like this? I dunno. The object sorting means
> >    that different threads normally shouldn't look at the same objects (not
> >    even the sources), so probably not, but basically, I wouldn't trust the
> >    threading 100%. It needs work, and it needs to stay off by default.
> >
> >  - you're working on a problem that isn't really even worth optimizing
> >    that much. The *normal* case is to re-use old deltas, which makes all
> >    of the issues you are fighting basically go away (because you only have
> >    a few _incremental_ objects that need deltaing).
>
> I agree, this problem only occurs when people import giant
> repositories. But every time someone hits these problems they declare
> git to be screwed up and proceed to thrash it in their blogs.

It's not only for repack. Someone just reported git-blame being unusable too due to insane memory usage, which I suspect is due to the same issue.

Nicolas
Re: Something is broken in repack
On Tue, 11 Dec 2007, Jon Smirl wrote:

> On 12/11/07, Nicolas Pitre <[EMAIL PROTECTED]> wrote:
> > On Tue, 11 Dec 2007, Nicolas Pitre wrote:
> >
> > > OK, here's something else for you to try:
> > >
> > > core.deltabasecachelimit=0
> > > pack.threads=2
> > > pack.deltacachesize=1
> > >
> > > With that I'm able to repack the small gcc pack on my machine with 1GB
> > > of ram using:
> > >
> > > git repack -a -f -d --window=250 --depth=250
> > >
> > > and top reports a ~700m virt and ~500m res without hitting swap at all.
> > > It is only at 25% so far, but I was unable to get that far before.
> >
> > Well, around 55% memory usage skyrocketed to 1.6GB and the system went
> > deep into swap. So I restarted it with no threads.
> >
> > Nicolas (even more puzzled)
>
> On the plus side you are seeing what I see, so it proves I am not imagining
> it.

Well... This is weird.

It seems that memory fragmentation is really really killing us here. The fact that the Google allocator did manage to waste quite a bit less memory is a good indicator already.

I did modify the progress display to show accounted memory that was allocated vs memory that was freed but still not released to the system. At least that gives you an idea of memory allocation and fragmentation with glibc in real time:

diff --git a/progress.c b/progress.c
index d19f80c..46ac9ef 100644
--- a/progress.c
+++ b/progress.c
@@ -8,6 +8,7 @@
  * published by the Free Software Foundation.
  */
 
+#include <malloc.h>
 #include "git-compat-util.h"
 #include "progress.h"
 
@@ -94,10 +95,12 @@ static int display(struct progress *progress, unsigned n, const char *done)
 	if (progress->total) {
 		unsigned percent = n * 100 / progress->total;
 		if (percent != progress->last_percent || progress_update) {
+			struct mallinfo m = mallinfo();
 			progress->last_percent = percent;
-			fprintf(stderr, "%s: %3u%% (%u/%u)%s%s",
-				progress->title, percent, n,
-				progress->total, tp, eol);
+			fprintf(stderr, "%s: %3u%% (%u/%u) %u/%uMB%s%s",
+				progress->title, percent, n, progress->total,
+				m.uordblks >> 18, m.fordblks >> 18,
+				tp, eol);
 			fflush(stderr);
 			progress_update = 0;
 			return 1;

This shows that at some point the repack goes into a big memory surge. I don't have enough RAM to see how fragmented memory gets though, since it starts swapping around 50% done with 2 threads. With only 1 thread, memory usage grows significantly at around 11% with a pretty noticeable slowdown in the progress rate.

So I think the theory goes like this:

There is a block of big objects together in the list somewhere. Initially, all those big objects are assigned to thread #1 out of 4. Because those objects are big, they get really slow to delta compress, and storing them all in a window with 250 slots takes significant memory.

Threads 2, 3, and 4 have "easy" work loads, so they complete fairly quickly compared to thread #1. But since the progress display is global, you won't notice that one thread is actually crawling slowly.

To keep all threads busy until the end, those threads that are done with their work load will steal some work from another thread, choosing the one with the largest remaining work. That is most likely thread #1. So as threads 2, 3, and 4 complete, they will steal from thread 1 and populate their own window with those big objects too, and get slow too. And because all threads get to work on those big objects towards the end, the progress display will then show a significant slowdown, and memory usage will almost quadruple.

Add memory fragmentation to that and you have a clogged system.
Solution:

pack.deltacachesize=1
pack.windowmemory=16M

Limiting the window memory to 16MB will automatically shrink the window size when big objects are encountered, therefore keeping much fewer of those objects in memory at the same time, which in turn means they will be processed much more quickly. And somehow that must help with memory fragmentation as well.

Setting pack.deltacachesize to 1 is simply to disable the caching of delta results entirely, which will only slow down the writing phase, but I wanted to keep it out of the picture for now.

With the above settings, I'm currently repacking the gcc repo with 2 threads, and memory allocation never exceeded 700m virt and 400m res, while mallinfo shows about 350MB, and progress has reached 90%, which has never occurred on this machine with the 300MB source pack so far.

Nicolas
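Again in command form, the settings being tested here would be something like:

    git config pack.deltacachesize 1
    git config pack.windowmemory 16m

with pack.windowmemory capping the memory taken by the objects held in the delta search window, so the window automatically gets narrower whenever big objects show up.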
Re: Something is broken in repack
On Wed, 12 Dec 2007, Nicolas Pitre wrote:

> Add memory fragmentation to that and you have a clogged system.
>
> Solution:
>
> pack.deltacachesize=1
> pack.windowmemory=16M
>
> Limiting the window memory to 16MB will automatically shrink the window
> size when big objects are encountered, therefore keeping much fewer of
> those objects at the same time in memory, which in turn means they will
> be processed much more quickly. And somehow that must help with memory
> fragmentation as well.

OK, scrap that. When I returned to the computer this morning, the repack was completed... with a 1.3GB pack instead.

So... The gcc repo apparently really needs a large window to efficiently compress those large objects. But when those large objects are already well deltified and you repack again with a large window, somehow the memory allocator is way more involved, probably even more so when there are several threads in parallel amplifying the issue, and things probably get to a point of no return with regard to memory fragmentation after a while.

So... my conclusion is that the glibc allocator has fragmentation issues with this workload, given the notable difference with the Google allocator, which itself might not be completely immune to fragmentation issues of its own. And because the gcc repo requires a large window of big objects to get good compression, you're better off not using 4 threads to repack it with -a -f.

The fact that the size of the source pack has such an influence is probably only because the increased usage of the delta base object cache is playing a role in the global memory allocation pattern, allowing for the bad fragmentation issue to occur.

If you could run one last test with the mallinfo patch I posted, without the pack.windowmemory setting, and adding the reported values along with those from top, then we could formally confirm the memory fragmentation issue.

So I don't think Git itself is actually bad. The gcc repo most certainly constitutes a nasty use case for memory allocators, but I don't think there is much we can do about it besides possibly implementing our own memory allocator with active defragmentation where possible (read memcpy) at some point to give glibc's allocator some chance to breathe a bit more.

In the meantime you might have to use only one thread and lots of memory to repack the gcc repo, or find the perfect memory allocator to be used with Git. After all, packing the whole gcc history to around 230MB is quite a stunt, but it requires sufficient resources to achieve it.

Fortunately, like Linus said, such a wholesale repack is not something that most users have to do anyway.

Nicolas
Re: Something is broken in repack
On Wed, 12 Dec 2007, Nicolas Pitre wrote:

> I did modify the progress display to show accounted memory that was
> allocated vs memory that was freed but still not released to the system.
> At least that gives you an idea of memory allocation and fragmentation
> with glibc in real time:
>
> diff --git a/progress.c b/progress.c
> index d19f80c..46ac9ef 100644
> --- a/progress.c
> +++ b/progress.c
> @@ -8,6 +8,7 @@
>   * published by the Free Software Foundation.
>   */
>
> +#include <malloc.h>
>  #include "git-compat-util.h"
>  #include "progress.h"
>
> @@ -94,10 +95,12 @@ static int display(struct progress *progress, unsigned n, const char *done)
>  	if (progress->total) {
>  		unsigned percent = n * 100 / progress->total;
>  		if (percent != progress->last_percent || progress_update) {
> +			struct mallinfo m = mallinfo();
>  			progress->last_percent = percent;
> -			fprintf(stderr, "%s: %3u%% (%u/%u)%s%s",
> -				progress->title, percent, n,
> -				progress->total, tp, eol);
> +			fprintf(stderr, "%s: %3u%% (%u/%u) %u/%uMB%s%s",
> +				progress->title, percent, n, progress->total,
> +				m.uordblks >> 18, m.fordblks >> 18,
> +				tp, eol);

Note: I didn't know what unit of memory those blocks represent, so the shift is most probably wrong.

Nicolas
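For what it's worth, mallinfo() reports uordblks (in-use allocations) and fordblks (free heap space) in plain bytes, so a shift of 20 rather than 18 would give the intended megabytes, i.e. something like:

    fprintf(stderr, "%s: %3u%% (%u/%u) %u/%uMB%s%s",
            progress->title, percent, n, progress->total,
            m.uordblks >> 20, m.fordblks >> 20,
            tp, eol);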
Re: Something is broken in repack
On Fri, 14 Dec 2007, Paolo Bonzini wrote:

> > Hmmm... it is even documented in git-gc(1)... and git-index-pack(1) of
> > all things.
>
> I found that the .keep file is not transmitted over the network (at least I
> tried with git+ssh:// and http:// protocols), however.

That is a local policy.

Nicolas
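For anyone unfamiliar with the mechanism being discussed: a .keep file is just a marker placed next to a pack locally, e.g. (reusing one of the pack names from earlier in this thread):

    touch .git/objects/pack/pack-1d46ca030c3d6d6b95ad316deb922be06b167a3d.keep

and as long as it is present, git repack and git gc will leave that pack alone — which is exactly why it is a purely local policy and not something that gets transmitted to clones.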