Jason Wong wrote on Mon, Feb 27, 2012 at 07:36:39 -0800:
> On Thu, Feb 16, 2012 at 12:14 PM, Daniel Shahaf <danie...@elego.de> wrote:
> 
> >
> > The output from these two tells me two things:
> >
> > 1. The minfo-cnt value is reasonable (within a typical ballpark).
> > That's relevant since minfo-cnt abnormalities were seen in another
> > instance of the bug.
> >
> > 2. Everything else looks correct: the 'id:'/'pred:' headers are accurate,
> > and the 'count:' header was incremented correctly.  The 'count:' header
> > does, however, indicate that your repository has _in the past_ triggered
> > an instance of the bug.
> 
> This is true. We have seen the bug happen before. The first occurrence
> of this that we had seen was Dec. 7th, 2011, a few days after we went
> from 1.6.16 to 1.7.1. That was the first time we had seen that happen.
> At the time, we did not know about the cause and the developer who
> had encountered the error didn't report it and was able to work

Well, install fail2ban and have it mail you when that string appears
in the logs?  I'll do so too...

> around it. From the Apache logs we have:
> 
>       [Wed Dec 07 15:16:36 2011] [error] [client 10.2.3.1] predecessor
>               count for the root node-revision is wrong: found 59444,
>               committing r59478  [409, #160004]
>       [Wed Dec 07 15:33:47 2011] [error] [client 10.2.3.2] predecessor
>               count for the root node-revision is wrong: found 59482,
>               committing r59516  [409, #160004]
>       [Wed Dec 07 15:35:19 2011] [error] [client 10.2.3.3] predecessor
>               count for the root node-revision is wrong: found 59488,
>               committing r59522  [409, #160004]

As Stefan mentioned, these represent commit attempts that were rejected
in order to prevent a new instance of the bug from entering the history.
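
If fail2ban is more than you want to set up, a few lines of Python can
pull these entries out of the error log. This is only a sketch: it
assumes each entry is a single line in the real log (the quoted ones are
wrapped by the mail client), and find_rejections and the "gap" field are
names made up here for illustration, not anything Subversion provides.

```python
import re

# Matches the httpd error-log entry quoted in this thread. Whitespace is
# normalised before matching, so single spaces are enough in the pattern.
PATTERN = re.compile(
    r"\[(?P<when>[^\]]+)\] \[error\] \[client (?P<client>[^\]]+)\] "
    r"predecessor count for the root node-revision is wrong: "
    r"found (?P<found>\d+), committing r(?P<rev>\d+)"
)


def find_rejections(lines):
    """Return one dict per rejected commit found in the given log lines."""
    hits = []
    for line in lines:
        normalised = " ".join(line.split())
        match = PATTERN.search(normalised)
        if match is None:
            continue
        found = int(match.group("found"))
        rev = int(match.group("rev"))
        hits.append({
            "when": match.group("when"),
            "client": match.group("client"),
            "found": found,
            "rev": rev,
            # Difference between the revision being committed and the
            # predecessor count that was found (hypothetical field name).
            "gap": rev - found,
        })
    return hits
```

Feeding it an open log file, e.g. find_rejections(open(path)), gives one
record per rejected commit. For the four entries quoted in this thread
the gap works out to 34 every time.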

>       [Wed Dec 07 15:44:10 2011] [error] [client 10.2.3.4] predecessor
>               count for the root node-revision is wrong: found 59505,
>               committing r59539  [409, #160004]
> 
> Of the IPs above, the last line is from the build machine. The others
> were from developer workstations. I mentioned the most recent two
> times first as we were actually aware of the issue at that time and
> it was recent so we knew to start looking into it. Between Dec. 7 and
> Jan. 31, the bug has occurred 12 times, 3 of those times from the
> build server. The rest are from workstations. This month, it has only
> occurred once and it was from the build server.
> 

What percentage of your commits are from the build server?

Is there anything noteworthy about commits that were in progress around
the time the bug occurred?  (Their svn:date values would be close to the
timestamp in the httpd log.)
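
One way to find those commits is to turn the httpd timestamp into a date
range for 'svn log'. A rough sketch, with some assumptions: the
ten-minute window is arbitrary, REPO_URL is a placeholder, and
svn_log_window is a made-up helper name. Dates without a time zone are
read as local time, so run this where the clock matches the server's.

```python
from datetime import datetime, timedelta


def svn_log_window(httpd_stamp, minutes=10, repo_url="REPO_URL"):
    """Build an 'svn log' command covering +/- `minutes` around a stamp.

    `httpd_stamp` is the text inside the first brackets of the error-log
    entry, e.g. "Wed Dec 07 15:16:36 2011" (English day/month names).
    """
    when = datetime.strptime(httpd_stamp, "%a %b %d %H:%M:%S %Y")
    delta = timedelta(minutes=minutes)
    start = (when - delta).strftime("%Y-%m-%dT%H:%M")
    end = (when + delta).strftime("%Y-%m-%dT%H:%M")
    # -r {DATE}:{DATE} selects revisions by date; -v lists changed paths.
    return ["svn", "log", "-v", "-r", "{%s}:{%s}" % (start, end), repo_url]
```

The returned list can be passed to subprocess.run(). Since a date
resolves to the latest revision as of that moment, the output may also
include one revision from just before the window.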

> Each of these times, the error has occurred in different parts of
> the repository.
> 
> Replies above. Sorry about the delay in replying. I have been really
> busy of late. I will try and get the results this week, if not, it
> will most likely be next week.
> 

No problem.
