Re: RV-Predict bugs

Yilong Li Tue, 15 Sep 2015 10:00:09 -0700

On Tue, Sep 15, 2015 at 7:19 AM, Mark Thomas <ma...@apache.org> wrote:

> On 13/09/2015 21:59, Yilong Li wrote:
> > Sorry about the vague explanation. But the actual reasons are not the
> point
> > here.
>
> No, that is exactly the point. When you claim that something that
> appears to be working without issue for many, many users then you need
> to provide a justification for why there is an issue to back up the
> claims you are making.
>
> > The only thing matters is the Java memory model. And the article I
> > refer to explain exactly why and how this can happen.
>
> No, it didn't. I read that article (and parts of the JMM). While the
> article did state there was a problem it did not explain why there was a
> problem.
>
> Long experience has lead us to be sceptical of bugs reported by
> automated tools. When looking at such bugs and we can't see what the
> problem is and the person raising the bug can't provide a clear
> explanation - and continues not to provide a clear explanation when
> asked for one several times - then that is the point where we start to
> think the problem is with the tool.
>

Sorry, I should not assume that the concepts such as happens-before order
and memory model are well understood. Let's talk about how this is allowed
to happen under JMM. Consider the this example again:

if (st_200 == null ) {
    st_200 = sm.getString("sc.200");
}
return st_200;

The following is a valid execution trace consists of 5 events:
            T1                       T2
1   READ  null
2   WRITE s
3                                 READ s
4                                 READ ???
5   READ  s

, where s is the result of sm.getString("sc.200").

T1 sees null in field st_200, initializes it, and return the initialized
value, while T2 sees a non-null value in st_200 and skips the
initialization. The question is what value T2 can read and then return in
event 4. This is addressed in JLS $17.4.5 Happens-before Order (
https://docs.oracle.com/javase/specs/jls/se8/html/jls-17.html#jls-17.4.5):

We say that a read *r* of a variable *v* is allowed to observe a write *w*
 to *v* if, in the *happens-before* partial order of the execution trace:

   -

   *r* is not ordered before *w* (i.e., it is not the case that *hb(r, w)*),
   and
   -

   there is no intervening write *w*' to *v* (i.e. no write *w*' to *v* such
   that *hb(w, w')* and *hb(w', r)*).

Informally, a read *r* is allowed to see the result of a write *w* if there
is no *happens-before* ordering to prevent that read.

The question boils down to: does the write in event 2 prevent event 4 from
reading the initial null value of st_200? No, because there is no
happens-before order involved here. So what kind of constructs introduce
happens-before order? This question is also addressed in the same section
of JLS:

It follows from the above definitions that:

   -

   An unlock on a monitor *happens-before* every subsequent lock on that
   monitor.
   -

   A write to a volatile field (§8.3.1.4
   <https://docs.oracle.com/javase/specs/jls/se8/html/jls-8.html#jls-8.3.1.4>
   ) *happens-before* every subsequent read of that field.
   -

   A call to start() on a thread *happens-before* any actions in the
   started thread.
   -

   All actions in a thread *happen-before* any other thread successfully
   returns from a join() on that thread.
   -

   The default initialization of any object *happens-before* any other
   actions (other than default-writes) of a program.

> I agree that such non-thread-safe lazy initialization rarely causes
> > problem. But, guys,
>
> 'guys' is not a gender neutral word. Do you really mean to suggest that
> only the male committers look at the other issues? I tend to avoid such
> phraseology where I can and use something like 'all' or 'folks' if
> necessary.
>

I apologize but 'guys' can be gender neutral which is what I meant.

> > please take a look at other issues I reported. There
> > are much more interesting concurrency bugs.
>
> Yes, there are a few more interesting / likely to have an impact bugs.
> The tribes issues I've fixed today look like they were reasonably likely
> to cause issues.
>
> > I am reporting these minor
> > issues only because they are clearly bugs.
>
> They are only clearly bugs once a clear explanation of the concurrency
> issue that underlies many of them has been explained clearly and
> understood.
>
> Opening a large number of bugs (even valid ones) without a clear
> justification is not the way to win friends and influence people. It
> would be far better to open a single, well explained bug, get that
> accepted and fixed and then tell the community that the tool has
> identified more bugs and ask the community how they would like to handle
> them.
>

Sorry, I wasn't aware that the community might not like that. However, I
would argue that many of the bug reports are clear enough for a Tomcat
developer to pinpoint the problem relatively easily. It has the stack
traces of the two conflicting accesses, the lock acquired by each thread,
and the locations where the two threads are created. I file the report in a
format that the tool generates doesn't mean I just blindly copy and paste
whatever the tool outputs. I can assure you that I have done my homework to
examine the source code at my best from an outsider's perspective. But
anyway, the point is very well taken.

> I'm still not completely happy that I understand why this is a bug. The
> article Chunk provided a link to is a good start and if I make a few
> assumptions about what I think the JMM means in a few places then it
> does make sense. However, I'd still like to see a step-by-step guide to
> how this goes wrong with references to the JMM to explain why each step
> is legal.
>
> > Mark, if you have a specific concurrency issue you would like to
> > investigate, the best way is to run RV-Predict against the code that once
> > revealed the problem.
>
> I don't have any particular issues in mind, just a suspicion that
> concurrency issues lie behind some of the hard to reproduce bugs that
> get reported from time to time. If I could produce the bug I would have
> fixed it already.
>

Technically speaking, RV-Predict doesn't require a developer to be able to
produce the bug or data race in order to detect it. One can run RV-Predict
on code that fails only intermittently and RV-Predict may still catch the
bug. Perhaps it can be helpful to ask the bug reporter to run their code
against RV-Predict if it's just too hard to extract a test case that can
reproduce the failure consistently enough.

It will be interesting to see - once all these bugs have been fixed -
> whether the stability of the CI test runs improves. There have been
> issues around some of the async tests for example that might have a root
> cause in one of these bugs.
>
> > Even if the failure occurs rarely, RV-Predict may be
> > able to identify the cause for you. I get all these race reports by
> running
> > RV-Predict against the test suite so if the small unit test doesn't
> > exercise the problematic code enough you may not get the specific answer
> > you are looking for.
>
> Once we have a test case and if concurrency issues are suspected then
> I'd certainly try RV-Predict before trying to debug it manually.
>

Thanks again for the prompt responses and the quick fixes of the reported
bugs. This has also helped RV-Predict to get into a better shape.

Cheers,
Yilong

>
> Mark
>
> >
> > Yilong
> >
> > On Sun, Sep 13, 2015 at 1:30 PM, Mark Thomas <ma...@apache.org> wrote:
> >
> >> On 13/09/2015 19:02, Rémy Maucherat wrote:
> >>> 2015-09-13 18:45 GMT+02:00 Mark Thomas <ma...@apache.org>:
> >>>
> >>>> Yilong,
> >>>>
> >>>> You need to be subscribed to the dev list in order to post to it.
> >>>>
> >>>> On 13/09/2015 14:08, Yilong Li wrote:
> >>>>> Hi Mark,
> >>>>>
> >>>>> On Sun, Sep 13, 2015 at 4:08 AM, Mark Thomas <ma...@apache.org
> >>>>> <mailto:ma...@apache.org>> wrote:
> >>>>>
> >>>>>     Please do not add any more RV-Predict bug reports to Bugzilla
> until
> >>>> the
> >>>>>     current set have been evaluated. To be honest it would have been
> >>>> better
> >>>>>     if you had paused after adding 58367-58379.
> >>>>>
> >>>>>
> >>>>> I am going to stop here because these are all the bugs reported by
> >>>>> RV-Predict in one run of the entire test suite.
> >>>>
> >>>> OK. Lets resolve these and then see where we stand.
> >>>>
> >>>>
> >>> As far as I am concerned, I would prefer batch closing these "issues"
> >>> nobody ever had.
> >>
> >> That is very tempting.
> >>
> >> Looking at the articles by the folks that are the experts in the field
> >> (i.e. the authors of JSR-133) it looks like the issues raised are valid
> >> - even if we don't see then very often / at all. Despite that, I'd be a
> >> lot happier with a clearer explanation of what can go wrong and why/how.
> >>
> >> On the plus side, the issues I've looked at so far have provided
> >> opportunities for clean-up in trunk.
> >>
> >> I'm hoping (but haven't look at all of them yet) that at least one of
> >> these reports is going to identify the root cause of one of those hard
> >> to reproduce / hard to track down concurrency issues.
> >>
> >> Mark
> >>
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: dev-unsubscr...@tomcat.apache.org
> >> For additional commands, e-mail: dev-h...@tomcat.apache.org
> >>
> >>
> >
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@tomcat.apache.org
> For additional commands, e-mail: dev-h...@tomcat.apache.org
>
>

Re: RV-Predict bugs

Reply via email to