[ 
https://issues.apache.org/jira/browse/LUCENE-9054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16977691#comment-16977691
 ] 

Chris M. Hostetter commented on LUCENE-9054:
--------------------------------------------

Background...

Yesterday, something caught my eye that made me question the jenkins reports 
I've been generating.

When skimming jenkins build failure emails, I remembered seeing a 
"Lucene-Solr-repro" email that mentioned some failures, particularly in 
SpellCheckCollatorTest...

[https://builds.apache.org/job/Lucene-Solr-repro/3760/]
{noformat}
[repro] Failures:
[repro]   0/5 failed: org.apache.solr.cloud.CollectionsAPISolrJTest
[repro]   5/5 failed: org.apache.solr.cloud.MoveReplicaHDFSTest
[repro]   5/5 failed: org.apache.solr.spelling.SpellCheckCollatorTest
{noformat}
...this caught my eye, because while I was expecting the MoveReplicaHDFSTest 
failures (and had already AwaitsFixed that test in another jira), I didn't 
remember seeing any SpellCheckCollatorTest failures in my own aggregated 
jenkins reports recently: [http://fucit.org/solr-jenkins-reports/failure-report.html]

I thought maybe the seed being reproduced was from more than a week ago (our 
builds, particularly the repro builds, can get fairly behind) and the results 
of _this_ (repro) build may not have been picked up by my aggregation crons yet.

But today, my reports still didn't list these failures. After investigating I 
realized the problem isn't in how my reports are fetching & aggregating the 
data from our jenkins jobs, but in how the {{reproduceJenkinsFailures.py}} 
script works in conjunction with the (default) way jenkins jobs collect the 
test-report XML files for each test....
----
{{reproduceJenkinsFailures.py}} calls the {{runTests}} function multiple 
times: (1) at the original SHA with the original seed, as run by the build 
being reproduced; (2) at the tip of the current branch with the original seed; 
(3) at the tip of the branch w/o the original seed.

The problem is that each time the {{runTests}} function is called, junit 
outputs the results to the same 
{{./build/__MODULE__/test/TEST-__FQN_TEST_NAME__-__DUPS__.xml}} file (where 
"DUPS" corresponds to the {{tests.dups=N}} test param), for example...
{noformat}
./build/solr-core/test/TEST-org.apache.solr.spelling.SpellCheckCollatorTest.xml
./build/solr-core/test/TEST-org.apache.solr.spelling.SpellCheckCollatorTest-4.xml
./build/solr-core/test/TEST-org.apache.solr.spelling.SpellCheckCollatorTest-5.xml
./build/solr-core/test/TEST-org.apache.solr.spelling.SpellCheckCollatorTest-3.xml
./build/solr-core/test/TEST-org.apache.solr.spelling.SpellCheckCollatorTest-2.xml
{noformat}
These 5 files will be (over)written a total of 3 times.

So once the {{reproduceJenkinsFailures.py}} script is completely done, the only 
test results included in the jenkins results, and the only contributor to 
the "success/failure" of the jenkins job, is how the tests behaved on the tip 
of the branch, w/o the problematic seed.

The results from trying to reproduce the exact seed at the exact SHA, and 
trying to reproduce the exact seed on the tip of the branch are overwritten.
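
The overwriting described above can be illustrated with a small, self-contained 
simulation. This is only a sketch: it does not call junit or 
{{reproduceJenkinsFailures.py}}, the pass names and the {{simulate_passes}} 
helper are hypothetical, and the XML content is a stand-in for a real report.

```python
import tempfile
from pathlib import Path

# Hypothetical names for the three reproduction passes described above.
PASSES = [
    "exact seed at original SHA",
    "exact seed at branch tip",
    "branch tip without seed",
]


def simulate_passes(report_path, passes):
    """Write one stand-in report per pass to the same path, then read it back."""
    for name in passes:
        # write_text truncates the file, so each pass replaces the previous one.
        report_path.write_text(f"<testsuite pass='{name}'/>")
    return report_path.read_text()


with tempfile.TemporaryDirectory() as tmp:
    report = Path(tmp) / "TEST-org.apache.solr.spelling.SpellCheckCollatorTest.xml"
    surviving = simulate_passes(report, PASSES)
    # Only the report from the final pass is left for a test reporter to find.
    print(surviving)
```

Because all three passes share one output path, a reporter that scans the 
directory after the script finishes can only ever see the last pass.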
----
I think we should modify either the {{runTests}} or {{printReport}} functions in 
{{reproduceJenkinsFailures.py}} to _move_ all of the {{TEST-*.xml}} files 
produced by each run into a subdir (perhaps named after the style of 
reproduction tested: {{repro_raw}}, {{repro_branch_tip}}, 
{{repro_branch_tip_no_seed}}) before continuing on to retry the test, and 
then ensure that the jenkins job's test reporter plugin is correctly configured 
to search for those junit output files in all subdirs (pretty sure it already 
is, just because of how we use a build dir per module).
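
A minimal sketch of the "move into a subdir" idea, assuming the 
{{./build/__MODULE__/test/}} layout described earlier. The 
{{archive_test_reports}} helper and its parameters are hypothetical; this is 
not existing code from {{reproduceJenkinsFailures.py}}, just one way the step 
could look if it were called after each run.

```python
import shutil
from pathlib import Path

# Subdir names suggested above, one per style of reproduction.
REPRO_LABELS = ("repro_raw", "repro_branch_tip", "repro_branch_tip_no_seed")


def archive_test_reports(build_root, label):
    """Move each <build_root>/<module>/test/TEST-*.xml into a <label>/ subdir.

    Hypothetical helper: returns the list of new report paths.
    """
    moved = []
    # The pattern only matches reports directly inside a module's test dir,
    # so reports already archived into a label subdir are left alone.
    # sorted() materializes the matches before any file is moved.
    for report in sorted(Path(build_root).glob("*/test/TEST-*.xml")):
        dest_dir = report.parent / label
        dest_dir.mkdir(exist_ok=True)
        dest = dest_dir / report.name
        shutil.move(str(report), str(dest))
        moved.append(dest)
    return moved
```

Called once after each of the three runs with the matching label, this would 
leave three separate copies of every report for the jenkins test reporter to 
pick up, provided its file pattern also searches subdirs.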

> reproduceJenkinsFailures.py usage in the Lucene-Solr-repro jenkins job under 
> reports number of failures
> -------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-9054
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9054
>             Project: Lucene - Core
>          Issue Type: Test
>            Reporter: Chris M. Hostetter
>            Priority: Major
>
> Our {{reproduceJenkinsFailures.py}} script, as used by the 
> [https://builds.apache.org/job/Lucene-Solr-repro/] job, runs the tests 
> multiple times, overwriting the same junit {{TEST-*.xml}} test result files 
> each time, causing the jenkins job to under-report how many times the various 
> test(s) fail.



