Re: Surprising use of memory and time when repacking mozilla's gecko repository

2019-07-05 Thread Mike Hommey
On Fri, Jul 05, 2019 at 02:45:16PM +0900, Mike Hommey wrote:
> On Fri, Jul 05, 2019 at 01:09:55AM -0400, Jeff King wrote:
> > On Thu, Jul 04, 2019 at 07:05:30PM +0900, Mike Hommey wrote:
> > > Finally, with 1 thread, the picture changes greatly. The overall process
> > > takes 2.5h:
> > > - 50 seco

Re: Surprising use of memory and time when repacking mozilla's gecko repository

2019-07-05 Thread Jakub Narebski
Mike Hommey writes:
> On Fri, Jul 05, 2019 at 01:14:13AM -0400, Jeff King wrote:
>> On Thu, Jul 04, 2019 at 10:13:20PM +0900, Mike Hommey wrote:
[...]
>> I think I explained all of the memory-usage questions in my earlier
>> response, but just for reference: if you have access to it, valgrind's
>

Re: Surprising use of memory and time when repacking mozilla's gecko repository

2019-07-04 Thread Mike Hommey
On Fri, Jul 05, 2019 at 01:14:13AM -0400, Jeff King wrote:
> On Thu, Jul 04, 2019 at 10:13:20PM +0900, Mike Hommey wrote:
>
> > > "public-inbox-index" (reading from git, writing to Xapian+SQLite)
> > > on a dev machine got slow because core count exceeded what SATA
> > > could handle and had to ca

Re: Surprising use of memory and time when repacking mozilla's gecko repository

2019-07-04 Thread Mike Hommey
On Fri, Jul 05, 2019 at 01:09:55AM -0400, Jeff King wrote:
> On Thu, Jul 04, 2019 at 07:05:30PM +0900, Mike Hommey wrote:
> > Finally, with 1 thread, the picture changes greatly. The overall process
> > takes 2.5h:
> > - 50 seconds enumerating and counting objects.
> > - ~2.5h compressing objects.

Re: Surprising use of memory and time when repacking mozilla's gecko repository

2019-07-04 Thread Jeff King
On Thu, Jul 04, 2019 at 10:13:20PM +0900, Mike Hommey wrote:
> > "public-inbox-index" (reading from git, writing to Xapian+SQLite)
> > on a dev machine got slow because core count exceeded what SATA
> > could handle and had to cap the default Xapian shard count to 3
> > by default for v2 inboxes.

Re: Surprising use of memory and time when repacking mozilla's gecko repository

2019-07-04 Thread Jeff King
On Thu, Jul 04, 2019 at 07:05:30PM +0900, Mike Hommey wrote:
> With 36 threads, the overall process takes 45 minutes:
> - 50 seconds enumerating and counting objects.
> - ~22 minutes compressing objects
> - ~22 minutes writing objects

I noticed the long writing phase when I repacked as well. The

Re: Surprising use of memory and time when repacking mozilla's gecko repository

2019-07-04 Thread Mike Hommey
On Thu, Jul 04, 2019 at 07:05:30PM +0900, Mike Hommey wrote:
> My guess is all those stalls are happening when processing the files I
> already had problems with in the past[3], except there are more of them
> now (thankfully, they were removed, so there won't be more, but that
> doesn't make the e

Re: Surprising use of memory and time when repacking mozilla's gecko repository

2019-07-04 Thread Mike Hommey
On Thu, Jul 04, 2019 at 07:05:30PM +0900, Mike Hommey wrote:
> Hi,
>
> I was looking at the disk size of the gecko repository on github[1],
> which started at 4.7GB, and `git gc --aggressive`'d it, which made that
> into 2.0G. But to achieve that required quite some resources.
>
> My first attemp

Re: Surprising use of memory and time when repacking mozilla's gecko repository

2019-07-04 Thread Mike Hommey
On Thu, Jul 04, 2019 at 12:04:11PM +, Eric Wong wrote:
> Mike Hommey wrote:
> > I'm puzzled by the fact writing objects is so much faster with 1 thread.
>
> I/O contention in the multi-threaded cases?
>
> "public-inbox-index" (reading from git, writing to Xapian+SQLite)
> on a dev machine go

Re: Surprising use of memory and time when repacking mozilla's gecko repository

2019-07-04 Thread Eric Wong
Mike Hommey wrote:
> I'm puzzled by the fact writing objects is so much faster with 1 thread.

I/O contention in the multi-threaded cases?

"public-inbox-index" (reading from git, writing to Xapian+SQLite)
on a dev machine got slow because core count exceeded what SATA
could handle and had to cap

Surprising use of memory and time when repacking mozilla's gecko repository

2019-07-04 Thread Mike Hommey
Hi,

I was looking at the disk size of the gecko repository on github[1],
which started at 4.7GB, and `git gc --aggressive`'d it, which made that
into 2.0G. But to achieve that required quite some resources.

My first attempt failed with OOM, on an AWS instance with 16 cores and
32GB RAM. I then we