On Tue, Sep 15, 2015 at 7:19 AM, Mark Thomas <ma...@apache.org> wrote:
> On 13/09/2015 21:59, Yilong Li wrote: > > Sorry about the vague explanation. But the actual reasons are not the > point > > here. > > No, that is exactly the point. When you claim that something that > appears to be working without issue for many, many users then you need > to provide a justification for why there is an issue to back up the > claims you are making. > > > The only thing matters is the Java memory model. And the article I > > refer to explain exactly why and how this can happen. > > No, it didn't. I read that article (and parts of the JMM). While the > article did state there was a problem it did not explain why there was a > problem. > > Long experience has lead us to be sceptical of bugs reported by > automated tools. When looking at such bugs and we can't see what the > problem is and the person raising the bug can't provide a clear > explanation - and continues not to provide a clear explanation when > asked for one several times - then that is the point where we start to > think the problem is with the tool. > Sorry, I should not assume that the concepts such as happens-before order and memory model are well understood. Let's talk about how this is allowed to happen under JMM. Consider the this example again: if (st_200 == null ) { st_200 = sm.getString("sc.200"); } return st_200; The following is a valid execution trace consists of 5 events: T1 T2 1 READ null 2 WRITE s 3 READ s 4 READ ??? 5 READ s , where s is the result of sm.getString("sc.200"). T1 sees null in field st_200, initializes it, and return the initialized value, while T2 sees a non-null value in st_200 and skips the initialization. The question is what value T2 can read and then return in event 4. This is addressed in JLS $17.4.5 Happens-before Order ( https://docs.oracle.com/javase/specs/jls/se8/html/jls-17.html#jls-17.4.5): We say that a read *r* of a variable *v* is allowed to observe a write *w* to *v* if, in the *happens-before* partial order of the execution trace: - *r* is not ordered before *w* (i.e., it is not the case that *hb(r, w)*), and - there is no intervening write *w*' to *v* (i.e. no write *w*' to *v* such that *hb(w, w')* and *hb(w', r)*). Informally, a read *r* is allowed to see the result of a write *w* if there is no *happens-before* ordering to prevent that read. The question boils down to: does the write in event 2 prevent event 4 from reading the initial null value of st_200? No, because there is no happens-before order involved here. So what kind of constructs introduce happens-before order? This question is also addressed in the same section of JLS: It follows from the above definitions that: - An unlock on a monitor *happens-before* every subsequent lock on that monitor. - A write to a volatile field (§8.3.1.4 <https://docs.oracle.com/javase/specs/jls/se8/html/jls-8.html#jls-8.3.1.4> ) *happens-before* every subsequent read of that field. - A call to start() on a thread *happens-before* any actions in the started thread. - All actions in a thread *happen-before* any other thread successfully returns from a join() on that thread. - The default initialization of any object *happens-before* any other actions (other than default-writes) of a program. > I agree that such non-thread-safe lazy initialization rarely causes > > problem. But, guys, > > 'guys' is not a gender neutral word. Do you really mean to suggest that > only the male committers look at the other issues? I tend to avoid such > phraseology where I can and use something like 'all' or 'folks' if > necessary. > I apologize but 'guys' can be gender neutral which is what I meant. > > please take a look at other issues I reported. There > > are much more interesting concurrency bugs. > > Yes, there are a few more interesting / likely to have an impact bugs. > The tribes issues I've fixed today look like they were reasonably likely > to cause issues. > > > I am reporting these minor > > issues only because they are clearly bugs. > > They are only clearly bugs once a clear explanation of the concurrency > issue that underlies many of them has been explained clearly and > understood. > > Opening a large number of bugs (even valid ones) without a clear > justification is not the way to win friends and influence people. It > would be far better to open a single, well explained bug, get that > accepted and fixed and then tell the community that the tool has > identified more bugs and ask the community how they would like to handle > them. > Sorry, I wasn't aware that the community might not like that. However, I would argue that many of the bug reports are clear enough for a Tomcat developer to pinpoint the problem relatively easily. It has the stack traces of the two conflicting accesses, the lock acquired by each thread, and the locations where the two threads are created. I file the report in a format that the tool generates doesn't mean I just blindly copy and paste whatever the tool outputs. I can assure you that I have done my homework to examine the source code at my best from an outsider's perspective. But anyway, the point is very well taken. > I'm still not completely happy that I understand why this is a bug. The > article Chunk provided a link to is a good start and if I make a few > assumptions about what I think the JMM means in a few places then it > does make sense. However, I'd still like to see a step-by-step guide to > how this goes wrong with references to the JMM to explain why each step > is legal. > > > Mark, if you have a specific concurrency issue you would like to > > investigate, the best way is to run RV-Predict against the code that once > > revealed the problem. > > I don't have any particular issues in mind, just a suspicion that > concurrency issues lie behind some of the hard to reproduce bugs that > get reported from time to time. If I could produce the bug I would have > fixed it already. > Technically speaking, RV-Predict doesn't require a developer to be able to produce the bug or data race in order to detect it. One can run RV-Predict on code that fails only intermittently and RV-Predict may still catch the bug. Perhaps it can be helpful to ask the bug reporter to run their code against RV-Predict if it's just too hard to extract a test case that can reproduce the failure consistently enough. It will be interesting to see - once all these bugs have been fixed - > whether the stability of the CI test runs improves. There have been > issues around some of the async tests for example that might have a root > cause in one of these bugs. > > > Even if the failure occurs rarely, RV-Predict may be > > able to identify the cause for you. I get all these race reports by > running > > RV-Predict against the test suite so if the small unit test doesn't > > exercise the problematic code enough you may not get the specific answer > > you are looking for. > > Once we have a test case and if concurrency issues are suspected then > I'd certainly try RV-Predict before trying to debug it manually. > Thanks again for the prompt responses and the quick fixes of the reported bugs. This has also helped RV-Predict to get into a better shape. Cheers, Yilong > > Mark > > > > > Yilong > > > > On Sun, Sep 13, 2015 at 1:30 PM, Mark Thomas <ma...@apache.org> wrote: > > > >> On 13/09/2015 19:02, Rémy Maucherat wrote: > >>> 2015-09-13 18:45 GMT+02:00 Mark Thomas <ma...@apache.org>: > >>> > >>>> Yilong, > >>>> > >>>> You need to be subscribed to the dev list in order to post to it. > >>>> > >>>> On 13/09/2015 14:08, Yilong Li wrote: > >>>>> Hi Mark, > >>>>> > >>>>> On Sun, Sep 13, 2015 at 4:08 AM, Mark Thomas <ma...@apache.org > >>>>> <mailto:ma...@apache.org>> wrote: > >>>>> > >>>>> Please do not add any more RV-Predict bug reports to Bugzilla > until > >>>> the > >>>>> current set have been evaluated. To be honest it would have been > >>>> better > >>>>> if you had paused after adding 58367-58379. > >>>>> > >>>>> > >>>>> I am going to stop here because these are all the bugs reported by > >>>>> RV-Predict in one run of the entire test suite. > >>>> > >>>> OK. Lets resolve these and then see where we stand. > >>>> > >>>> > >>> As far as I am concerned, I would prefer batch closing these "issues" > >>> nobody ever had. > >> > >> That is very tempting. > >> > >> Looking at the articles by the folks that are the experts in the field > >> (i.e. the authors of JSR-133) it looks like the issues raised are valid > >> - even if we don't see then very often / at all. Despite that, I'd be a > >> lot happier with a clearer explanation of what can go wrong and why/how. > >> > >> On the plus side, the issues I've looked at so far have provided > >> opportunities for clean-up in trunk. > >> > >> I'm hoping (but haven't look at all of them yet) that at least one of > >> these reports is going to identify the root cause of one of those hard > >> to reproduce / hard to track down concurrency issues. > >> > >> Mark > >> > >> > >> --------------------------------------------------------------------- > >> To unsubscribe, e-mail: dev-unsubscr...@tomcat.apache.org > >> For additional commands, e-mail: dev-h...@tomcat.apache.org > >> > >> > > > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: dev-unsubscr...@tomcat.apache.org > For additional commands, e-mail: dev-h...@tomcat.apache.org > >