The GH CI build also fails randomly compared to the green builds that I push in
the first place. Occasionally, this might be a platform-specific issue (which
is what I’d ideally be discovering from CI), but the failing tests are
generally fragile ones in the first place. While I have ideas on how to make
those areas of the codebase more testable, some of them can’t be more readily
tested without drastic measures. Specifically, our tests around rolling file
appender things and anything else relying on scheduled actions.
Besides using an actual file system in the tests (these could run faster if
using a virtual file system instead like [jimfs][0]), the largest limitation
here is that we rely on a ScheduledThreadPoolExecutor for scheduling actions
like rollovers, and this class relies directly on System::nanoTime for
scheduling decisions, so the only way to mock the clock here is to either run
the test in a modified runtime where the system calls behind nanoTime are
mocked or to find a similar OSS implementation (or adapt from the JDK here if
the classes are public domain like the Doug Lea ones usually are) to use
instead which allows for controlling the clock. The latter option is something
I considered before, but it’s not something to be taken lightly since we
already expect the platform version of ScheduledThreadPoolExecutor to be stable
and correct (or hopefully correct; this implementation is more likely to be
thoroughly reviewed compared to something custom). In fact, this alternative
path is basically porting or reimplementing the core of Quartz, so there is
some reference code to adapt there come to think of it.
On the file system side, this would require either more direct adoption of the
NIO2 API rather than the original IO API, or we need a similar abstraction to
make mocking a file system easier to accomplish. Then we can have separate
integration tests that use a real filesystem that could be more predictable if
necessary.
Suffice to say, there’s usually a reason why I get into refactoring in this
project, and most of the time, it involves making it easier to test and extend
things both in the project and as a downstream user. The rolling appender area
is something I’d like to get to sometime, but we’ve got about ten years worth
of development to untangle (which we’ve largely accomplished in the main branch
already with the modules there, but we can still break out more modules that
involve more complicated refactoring).
[0]: https://github.com/google/jimfs
—
Matt Sicker
> On Nov 20, 2023, at 08:40, Gary Gregory wrote:
>
> The GH CI builds on every push (as opposed to commi) IIRC.
>
> Gary
>
> On Mon, Nov 20, 2023, 9:34 AM Ralph Goers
> wrote:
>
>> Gary uses Windows as his development OS. He is probably the only one of us
>> who does. So he inevitably finds these issues before the rest of us.
>>
>> I don’t know if the CI has a build that runs on Windows for every commit.
>>
>> Ralph
>>
>>> On Nov 20, 2023, at 6:12 AM, Robert Middleton
>> wrote:
>>>
>>> Are the tests run on Windows through Github workflows? It doesn't
>>> look like it to me.
>>>
>>> If you need access to a Windows machine, you can download a
>>> development VM straight from Microsoft:
>>>
>> https://developer.microsoft.com/en-us/windows/downloads/virtual-machines/
>>>
>>> -Robert Middleton
>>>
>>> On Mon, Nov 20, 2023 at 7:40 AM Apache
>> wrote:
In my experience they never get fixed. To be honest, when I was doing
>> the releases I would have these failures investigated to determine if it
>> was a trait problem vs a problem in the code being released. If it was the
>> latter I would cancel the vote. The only time tests should be disabled is
>> if we know it is a problem in the test but can’t figure out how to fix it.
I also don’t ever recall Gary ever having more than one or two tests
>> fail in a run.
Ralph
> On Nov 20, 2023, at 5:00 AM, Volkan Yazıcı wrote:
>
> I am not asking to disable Windows tests. I am asking to disable tests
> and only those tests that have a failure rate on Windows higher than,
> say, 30%. To be precise, I think there are 2-3 of them dealing with
> network sockets and rolling file appenders. I am not talking about
> dozens or such.
>
> After disabling them, we can create a ticket referencing them. So that
> interested parties can fix them.
>
>> On Mon, Nov 20, 2023 at 12:25 PM Piotr P. Karwasz
>> wrote:
>>
>> Hi Volkan,
>>
>>> On Mon, 20 Nov 2023 at 09:36, Volkan Yazıcı wrote:
>>>
>>> As Gary (the only Windows user among the active Log4j maintainers,
>>> AFAIK) has noticed several times, Log4j tests on Windows are pretty
>>> unstable. It not only fails on Gary's laptop, but Piotr and I need to
>>> give Windows tests in CI a kick on a regular basis. Approximately one
>>> out of three CI runs fails on Windows. Piotr