Re: Enabling CCache on LLVM bots

David Spickett Fri, 22 Jul 2022 09:19:48 -0700

> Right: but you're using flang and mlir HEAD as a test suite for clang as I 
> understand it, I don't think that MLIR/flang developers should get their 
> patches flagged here (which is what buildbots are doing).
> We (google) are doing similar tests, but in a silent infrastructure and 
> investigate issues that we report asynchronously:


Point taken, and moving to silent is a lot simpler way to achieve that
than filtering changes.

I'll discuss this with the team next week, thanks again for the feedback!

On Fri, 22 Jul 2022 at 16:21, Mehdi AMINI <joker....@gmail.com> wrote:
>
>
>
> On Fri, Jul 22, 2022 at 3:39 PM David Spickett <david.spick...@linaro.org> 
> wrote:
>>
>> > OK! But how is testing flang in stage2 helpful for this? This is just 
>> > testing that clang from stage-1 is properly functioning right? The 
>> > test-suite is there for this I think?
>>
>> Yes, it is checking that the clang in stage 1 generates correct SVE
>> code. So that when that clang becomes the next release, flang works
>> out of the box.
>>
>> The test suite (meaning https://github.com/llvm/llvm-test-suite) does
>> include some fortran programs but they are run in a mode where flang
>> is used to "parse unparse" basically go source to AST to source. Then
>> compiled with gofrtran. So you have some coverage there but not a lot.
>>
>> > That is: no change to MLIR or Flang should affect the behavior of 
>> > clang-stage-1? (I'm trying to figure out if these commits should trigger 
>> > testing)
>>
>> If SVE codegen were perfect then I'd agree. Given that it isn't I
>> wouldn't want to miss:
>> * some change to flang/mlir comes in
>> * this change hits some particular codegen situation
>> * clang built in stage 1 emits incorrect code
>
>
> Right: but you're using flang and mlir HEAD as a test suite for clang as I 
> understand it, I don't think that MLIR/flang developers should get their 
> patches flagged here (which is what buildbots are doing).
> We (google) are doing similar tests, but in a silent infrastructure and 
> investigate issues that we report asynchronously: that is the difference is 
> that the community does not bear the weight of the bot maintenance.
> This is exacerbated by the fact that this bot runs too slowly and batches too 
> many commits together to provide a good signal, you likely should look into 
> making it silent (that is only you gets notified automatically).
>
>
>> Similarly if there is a change to clang then you compile the same
>> flang but get some of it wrong, hopefully the tests pick this up. We
>> want to check the runtime behaviour of the code stage 1 clang
>> generates.
>>
>> Granted it can be frustrating to be on a blame list just because your
>> perfectly valid code hit a pre-existing bug. It's something we try to
>> mitigate by proactively telling people "hey we see your commit broke
>> this but don't worry we know it's something else". We could reason
>> that if a flang/mlir change made it through our other 2 stage bots or
>> the 1 stage SVE bots, it cannot be the root of the issue. If that
>> makes sense.
>>
>> We'd still want to trigger a new 2 stage SVE build on that change
>> though. So it would need buildbot to be able to take the blame list
>> and filter out flang/mlir changes (and if stage 1 of the 2 stage
>> broke, the stage 1 bot already reported that so it's fine to not
>> notify again). I'm not sure if it can handle that.
>>
>> > By the way, your method of having ccache enabled globally mean it is 
>> > enabled implicitly during stage-2 as well? This won't have cache hit 
>> > (stage-2...) but it'll take cache space unfortunately (and there is a 
>> > slight overhead to going through the cache all the time).
>>
>> Stage 2 bypasses the cache. Stage 1 uses /usr/bin/cc which invokes
>> ccache then stage 2 has the compilers set directly to the clang
>> executable built in stage 1.
>
>
> Ah of course, it all WAI :)
>
>>
>>
>> On Fri, 22 Jul 2022 at 12:28, Mehdi AMINI <joker....@gmail.com> wrote:
>> >
>> >
>> >
>> > On Fri, Jul 22, 2022 at 1:18 PM David Spickett <david.spick...@linaro.org> 
>> > wrote:
>> >>
>> >> > I didn't know this builder would test flang as well: which stage of the 
>> >> > build is doing so?
>> >>
>> >> It tests it only for stage 2, and the 1 stage bot checks stage 1. So
>> >> we do have some of that focus you talked about between
>> >> clang-aarch64-sve-vla (which is the 1 stage) and
>> >> clang-aarch64-sve-vla-2stage. We also have plain AArch64 bots checking
>> >> flang, but the point here is to exercise SVE codegen.
>> >
>> >
>> > OK! But how is testing flang in stage2 helpful for this? This is just 
>> > testing that clang from stage-1 is properly functioning right? The 
>> > test-suite is there for this I think?
>> >
>> > That is: no change to MLIR or Flang should affect the behavior of 
>> > clang-stage-1? (I'm trying to figure out if these commits should trigger 
>> > testing)
>> >
>> >
>> >>
>> >> You are right that the 2 stage bot could skip building flang in stage
>> >> 1 because it's not going to test it. In theory our ccaching means this
>> >> isn't a big deal (it does now, thanks again!) but we could explicitly
>> >> disable it and save a lot of linking at least.
>> >>
>> >> We will do our best to improve the build times but ultimately we are
>> >> limited by hardware availability which is a more difficult problem to
>> >> fix.
>> >
>> >
>> > Right, and stage-2 can't use any ccache anyway: this is tricky...
>> >
>> > By the way, your method of having ccache enabled globally mean it is 
>> > enabled implicitly during stage-2 as well? This won't have cache hit 
>> > (stage-2...) but it'll take cache space unfortunately (and there is a 
>> > slight overhead to going through the cache all the time).
>> >
>> >
>> >>
>> >>
>> >> On Fri, 22 Jul 2022 at 10:16, Mehdi AMINI <joker....@gmail.com> wrote:
>> >> >
>> >> > On Fri, Jul 22, 2022 at 5:46 AM Thiago Jung Bauermann <
>> >> > thiago.bauerm...@linaro.org> wrote:
>> >> >
>> >> > >
>> >> > > Hello Mehdi,
>> >> > >
>> >> > > Mehdi AMINI <joker....@gmail.com> writes:
>> >> > >
>> >> > > > I don't know if you are maintaining also 
>> >> > > > clang-aarch64-sve-vla-2stage
>> >> > > > but it takes far too long right now (>10 hours).
>> >> > >
>> >> > > Yes, I actually increased its ccache size on July 6th but only from 5 
>> >> > > GB
>> >> > > to 20 GB because the machine running that builder is also a 
>> >> > > development
>> >> > > box. Looking at the build times graph it looks like it helped a bit 
>> >> > > but
>> >> > > not a lot.
>> >> > >
>> >> > > I now increased the size again to 40 GB. I'll monitor to see if 
>> >> > > there's
>> >> > > an impact.
>> >> > >
>> >> > > > It notifies for large number of people which is overly noisy right 
>> >> > > > now.
>> >> > > > See: https://lab.llvm.org/buildbot/#/builders/198/builds/1234
>> >> > >
>> >> > > Does adding a “depends_on_projects” argument help reduce the number of
>> >> > > notified people? Or is there something else that could/should be done
>> >> > > about that?
>> >> > >
>> >> >
>> >> > It will at least eliminate people who commit in unrelated projects.
>> >> >
>> >> >
>> >> > >
>> >> > > > One thing I noticed is that it seems to be missing the
>> >> > > > `depends_on_projects` for the buildbot configuration: that means 
>> >> > > > it'll
>> >> > > > include commits that touches part of the codebase totally unrelated 
>> >> > > > to
>> >> > > > what it is testing (like flang and mlir). That would be a first easy
>> >> > > > step to reduce the number of unrelated changes that get flagged
>> >> > > > incorrectly.
>> >> > >
>> >> > > Thank you for the suggestion, I see that many builders running on 
>> >> > > Linaro
>> >> > > workers don't have it. I'll prepare a patch to add 
>> >> > > “depends_on_projects”
>> >> > > arguments to them.
>> >> > >
>> >> > > Though specifically for clang-aarch64-sve-vla-2stage IIUC its value
>> >> > > would be ["llvm", "mlir", "clang", "flang"] (since it's meant to test
>> >> > > flang as well), so perhaps it wouldn't change much in practice?
>> >> > >
>> >> >
>> >> > I didn't know this builder would test flang as well: which stage of the
>> >> > build is doing so?
>> >> > In general I try to keep builder configs more focused to avoid that bug 
>> >> > in
>> >> > one components hides regression in another one (for example MLIR 
>> >> > breaking
>> >> > stage1 for this bot for a day and in the meantime you get a stage-2
>> >> > regression that won't be detected)
>> >> >
>> >> > In any case, this will still help people contributing outside this list
>> >> > (libc++, lldb, lld, compiler-rt), unless you also need these.
>> >> >
>> >> > Cheers,
>> >> >
>> >> > --
>> >> > Mehdi
>> >> > _______________________________________________
>> >> > linaro-toolchain mailing list -- linaro-toolchain@lists.linaro.org
>> >> > To unsubscribe send an email to linaro-toolchain-le...@lists.linaro.org
_______________________________________________
linaro-toolchain mailing list -- linaro-toolchain@lists.linaro.org
To unsubscribe send an email to linaro-toolchain-le...@lists.linaro.org

Re: Enabling CCache on LLVM bots

Reply via email to