We do Talos testing on in-house machines (iX machines with 4 cores). I'm not sure whether that would trigger some of the issues you are hoping to catch.
In the future, we should be able to have some jobs run on different EC2 instance types. See https://bugzilla.mozilla.org/show_bug.cgi?id=985650
It will require lots of work, but it is possible.

cheers,
Armen

On 14-04-08 03:45 AM, ishikawa wrote:
> On (2014-04-08 15:20), Gabriele Svelto wrote:
>> On 07/04/2014 23:13, Dave Hylands wrote:
>>> Personally, I think that the more ways we can test for threading issues the
>>> better.
>>> It seems to me that we should do some amount of testing on single core and
>>> multi-core.
>>>
>>> Then I suppose the question becomes how many cores? 2? 4? 8?
>>>
>>> Maybe we can cycle through some different number of cores so that we get
>>> coverage without duplicating everything?
>>
>> One configuration that is particularly good at catching threading errors
>> (especially narrow races) is constraining the software to run on two
>> hardware threads on the same SMT-enabled core. This effectively forces
>> the threads to share the L1 D$ which in turn can reveal some otherwise
>> very-hard-to-find data synchronization issues.
>>
>> I don't know if we have that level of control on our testing hardware
>> but if we do then that's a scenario we might want to include.
>>
>> Gabriele
>
> I run Thunderbird under valgrind from time to time.
>
> Valgrind slows down CPU execution by a very large factor, and
> it seems to open many windows for thread races.
> (Sometimes a very short window is prolonged enough that events caused by,
> say, I/O can fall inside this prolonged, usually short, window.)
>
> During valgrind execution, I have seen errors that were not reported
> anywhere, and many have happened only once :-(
>
> If a VM (such as VirtualBox, VMplayer or something) can artificially
> change the execution speed of the CPU, or even of different cores, slightly
> (maybe 1/2, 1/3, 1/4), I am sure many thread-race issues will be caught.
>
> I agree that this is a brute-force approach, but please recall that the
> first space shuttle launch needed to be aborted due to a software glitch.
> It was a timing issue and, according to the analysis at the time,
> it could happen once in 72 (or was it 74) cases.
> Even NASA, with its deep pockets, and its subcontractor could not catch
> it before launch.
>
> I am afraid that the situation has not changed much (unless we use a
> computer language well suited to avoiding these thread-race issues).
> We need all the help we can get to track down visible and dormant thread races.
> If artificial CPU execution tweaking (by changing the # of cores or even
> more advanced tweaking methods if available) can help, it is worth a try.
> Maybe not always, if such work costs extra money, but
> a prolonged (say a week) test run from time to time (each quarter or half a
> year, or maybe just prior to beta testing of a major release?).
>
> TIA
>
_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform
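
For what it's worth, the configuration Gabriele describes (two hardware threads on the same SMT-enabled core) can probably be approximated on a Linux test machine without special hardware control. The following is only a rough sketch under several assumptions: the host is Linux, it exposes the usual sysfs topology files, and SMT is enabled. The helper names (parse_cpu_list, smt_siblings, run_on_sibling_pair) are made up for illustration and are not part of any existing test harness.

```python
import os
import subprocess
from pathlib import Path


def parse_cpu_list(text):
    """Parse a Linux CPU list such as "0,4" or "0-1,8" into sorted CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def smt_siblings(cpu=0):
    """Return the hardware threads that share a physical core with `cpu`.

    Assumes a Linux host that exposes the sysfs CPU topology files.
    """
    path = Path(
        f"/sys/devices/system/cpu/cpu{cpu}/topology/thread_siblings_list"
    )
    return parse_cpu_list(path.read_text())


def run_on_sibling_pair(argv, cpu=0):
    """Run `argv` restricted to two hardware threads of the same core."""
    siblings = smt_siblings(cpu)
    if len(siblings) < 2:
        raise RuntimeError(f"cpu{cpu} has no SMT sibling on this machine")
    pair = set(siblings[:2])
    # The affinity is set in the child just before exec, so only the
    # program under test (and its threads) is confined to the pair.
    return subprocess.call(
        argv, preexec_fn=lambda: os.sched_setaffinity(0, pair)
    )
```

Something like run_on_sibling_pair(["./run-tests.sh"]) would then launch a (hypothetical) test script confined to one core's two hardware threads. On machines without SMT the helper raises instead of silently running unconstrained.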