On 12/5/20 11:22 pm, Joel Sherrill wrote:
On Tue, May 12, 2020 at 4:11 AM Chris Johns <chr...@rtems.org> wrote:

    On 12/5/20 5:15 pm, Sebastian Huber wrote:
     > Hello,
     >
     > On 09/05/2020 03:30, Gedare Bloom wrote:
     >>>>> Without these tests being tagged this way the user would have
     >>>>> no idea where they stand after a build and test run, and that
     >>>>> would mean we would have to make sure a release has no
     >>>>> failures. I consider that not practical or realistic.
     >>>> Maybe we need another state, e.g. something-is-broken-please-fix-it.
     >>> I do not think so; it is implicit in the failure or the test is
     >>> broken. The only change is to add unexpected-pass, which will be
     >>> on master after the 5 branch.
     >>>
     >> I disagree with this in principle, and it should be reverted after
     >> we branch 5. It's fine for now to get the release state sync'd, but
     >> we should find a long-term solution that distinguishes the cases:
     >> 1. we don't expect this test to pass on this bsp
     >> 2. we expect this test to pass, but know it doesn't currently
     >>
     >> They are two very different things, and I don't like conflating
     >> them into one "expected-fail" case.
     > Originally, I had the same point of view. What I didn't take into
     > account was the perspective of the tester. Now, I think it is
     > perfectly fine to flag these tests as expected failure test states,
     > because right now, due to some known bugs such as
     > https://devel.rtems.org/ticket/3982 and probably also some more
     > issues, these tests fail. On this BSP and this RTEMS version, they
     > will always fail. This is not some sort of random failure. When we
     > change test states to expected failure, I think we should make sure
     > that a ticket exists which captures that there are some test results
     > which indicate issues (expected failure test state). The ticket
     > system is the better place to manage this. We should not use the
     > test states for this. The test states should be used to figure out
     > changes between different test runs. They should also make it
     > possible to quickly check whether the outcome of a test run yields
     > the expected results for a certain RTEMS version and BSP.

    Thanks. It is clear to me we lack documentation on this topic and this
    is an oversight on my part which I will attempt to correct.

    I have reviewed DejaGnu and considered other things like the withdrawn
    IEEE 1003.3 standard, and while there are states we have that need to
    change, I think the original intent is the right path.

    The DejaGnu states are documented here:

    https://www.gnu.org/software/dejagnu/manual/A-POSIX-Conforming-Test-Framework.html#A-POSIX-Conforming-Test-Framework

    And the exit codes are:

    https://www.gnu.org/software/dejagnu/manual/Runtest.html#Runtest

    For me they define the goal and intent.

    The test states are metadata for the tester so it can determine the
    result of any given set of tests in relation to the expected state of
    the test when it was built. You need to detach yourself from being a
    developer and put yourself in the position of a tester whose task is to
    give an overall pass or fail for a specific build of RTEMS without
    needing to consider the specifics of any test, bug or feature.

    The primary requirement is to allow machine check of the results to
    determine regressions. A regression is a failure, pass or unresolved
    result that was not expected.

    My current thinking for the test states is:

    PASS:
    The test has succeeded and passed without a failure.

    UNEXPECTED-PASS:
    The test has succeeded when it was expected to fail.

    FAIL:
    The test has not succeeded and has failed when it was expected to pass.
    The failure can be a failed assert, unhandled exception, resource
    constraint, or a faulty test.

    EXPECTED-FAIL:
    The test has not succeeded; it has failed and this is expected.

    UNRESOLVED:
    The test has not completed and the result cannot be determined. The
    result can be unresolved because the test did not start or end, because
    of a test harness failure, or because there were insufficient computing
    resources for the test harness to function correctly.

    EXPECTED-UNRESOLVED:
    The test has not completed, the result cannot be determined, and this
    is expected.

    INDETERMINATE:
    The test has succeeded, has failed, or is unresolved. The test is an
    edge case where it can pass, fail, or be unresolved, and this is
    expected.

    USER-INPUT:
    The test has not completed and the result is unresolved because it
    requires user intervention that cannot be provided.

    BENCHMARK:
    The test is a performance test. These are currently not supported.

    UNTESTED:
    The test has not run and is a placeholder for a real test that is not
    yet provided.

    UNSUPPORTED:
    The test is not supported for this build of RTEMS, BSP or architecture.

    Note:

    1. Any expected-fail, expected-unresolved, or indeterminate test results
    are considered faults and require fixing.

    2. The nature of a failure cannot be inferred from the test's metadata
    state.

    3. The timeout and invalid states will be merged into UNRESOLVED.

    4. The excluded state will be changed to UNSUPPORTED.

    5. The metadata is placed in each test because it is an effective way
    to capture the state. Tests can be run as a group, stand-alone or at a
    different location, and the test results can still determine a
    regression. The version of the test harness does not need to match the
    RTEMS build.
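
    To make the machine check concrete, here is a minimal sketch (in
    Python, illustrative only and not the rtems-test implementation) of
    how a tester could compare the state recorded when a test was built
    with the result observed when it was run. The state names follow the
    list above; the table is incomplete and the function names are only
    placeholders.

      # Which observed results are allowed for each build-time (metadata)
      # state. USER-INPUT and BENCHMARK are omitted for brevity.
      ALLOWED = {
          'PASS':                {'PASS'},
          'EXPECTED-FAIL':       {'FAIL'},
          'EXPECTED-UNRESOLVED': {'UNRESOLVED'},
          'INDETERMINATE':       {'PASS', 'FAIL', 'UNRESOLVED'},
          'UNTESTED':            {'UNTESTED'},
          'UNSUPPORTED':         {'UNSUPPORTED'},
      }

      def is_regression(built_state, run_state):
          # A failure, pass or unresolved result that was not expected is
          # a regression; this includes an unexpected pass of an
          # expected-fail test.
          return run_state not in ALLOWED.get(built_state, set())

      def overall(results):
          # results: an iterable of (name, built_state, run_state) tuples.
          regressions = [name for name, built, run in results
                         if is_regression(built, run)]
          return ('FAIL' if regressions else 'PASS', regressions)

      # Example (test names are placeholders):
      #   overall([('ticker', 'PASS', 'PASS'),
      #            ('spfatal31', 'EXPECTED-FAIL', 'PASS')])
      #   -> ('FAIL', ['spfatal31'])

    A run is then an overall pass only if no test produced a result outside
    what its build-time metadata allows; expected failures remain faults to
    be fixed via tickets, but they do not block the machine check.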


Not to be dense but into what state do tests go which fail but have not been
investigated yet? GCC just leaves those as FAIL and releases happen with those
on secondary targets. I know FAILs are undesirable for primary targets with
GCC.

I think the answer is in what you said. They need to be investigated and not left, or we cannot make a release. I raised #2962 at the end of March 2017 (over 3 years ago). The tests should not have extra states, and I think this is important.

I don't want to see a test that fails, but we don't know why, binned somewhere it will never get investigated.

Then someone needs to investigate them and move them to expected-fail or we cannot release.

    This list of test states accounts for some missing states. It also adds
    some states I do not see being available until we move to a new build
    system. For UNTESTED and UNSUPPORTED I see a template test being built
    and run that does nothing. This is important because it means we get a
    set of test results that is complete and consistent for all BSPs.

    I can attend to this change before releasing 5.1, or it can be done on
    master and we can determine if it is backported to 5.2[34..].


I have previously stated that this is a good goal but it is moving the goal line
for the 5.x releases. I would propose we be happy with the fact that, for the
first time, a release is happening with reported test results at all. Let's not
let the perfect be the enemy of the good.

This is not about being perfect, no one is asking for these issues to be fixed. They just need to be investigated, then what ever you decide, a ticket raised, the outcome detailed in the ticket, etc. I feel it is not fair on our users to have to know, investigate or figure out if their BSP is OK after they have built it and run the tests.

In this case, the good is quite a bit better than previous releases. We need to
be more conscious of this.

I'm also concerned this task is bigger than you think, based solely on the
number of BSPs we have and the number on which we can execute tests on
simulators.

I understand the nature and size of the problem. I suggest you review comment 7 in #2962.

My build sweep has at least 21 BSPs (hand count) that it is testing on
simulators, and I didn't count the handful of qemu-based ones.

Working through the BSPs is a good thing.

To get an accurate assessment, I think you would have to temporarily
let all tests build for a BSP so you would know which are disabled because
they don't fit.

Huh? No tests are being disabled; the only ones are the excluded tests, and I have done nothing with those. All expected-fail tests are run, and they are expected to fail.

Then the rest of the tests in the .tcfg which are not ld overflow
issues would have to be examined and categorized.  I don't think the
current list of "don't build" tests fits nicely into one of the new categories.

It is `UNSUPPORTED`. It means the test cannot be supported on that BSP, e.g. there is not enough memory.
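
For example (illustrative only; the directive names should be checked against the existing testsuite .tcfg files and the rtems-test documentation, and the test and file names here are placeholders), a BSP test configuration could carry entries like:

  # Hypothetical BSP .tcfg fragment.
  include: testdata/small-memory.tcfg
  # Does not fit in this BSP's memory, so it is reported as UNSUPPORTED.
  exclude: dl06
  # Fails on this BSP due to a known, ticketed issue.
  expected-fail: block08

With something like this in place the tester still gets a result for every test on every BSP, and the excluded-because-it-does-not-fit case maps onto UNSUPPORTED rather than silently disappearing.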

    The change will come with documentation to explain things a little
    better.

+1

    I hope this addresses the issues we have and I am sorry for creating a
    disturbance so close to a release.


It's a good goal but I think the timing is wrong.


Yes, it is important, but the timing was defined long ago. I raised the issue (#2962) over 3 years ago and the ticket has been tagged as a blocker since November 2018. I have been left wedged between these patches and not releasing.

Chris