gortiz opened a new issue, #11382:
URL: https://github.com/apache/pinot/issues/11382

   # The issue
   
   There are some maven module (like 
[pinot-query-runtime](https://app.codecov.io/gh/apache/pinot/tree/master/pinot-query-runtime))
 that look uncovered in Codecov while having tests (in fact 
`pinot-query-runtime` actual coverage is around 90%).
   
   # The reason
   
   The reason is that we are using jacoco goal `report-aggregate` to create the 
coverage xml files, which does not count the lines covered by tests in the 
current maven module but only lines covered by tests in their dependencies.
   
   # The actual problem
   ## How Jacoco works
   Jacoco actually generates a `jacoco.exec` file. After running surefire tests 
with the jacoco agent, a `jacoco.exec` file is created on each project 
(specifically in `target/jacoco.exec`). 
   
   Given that `jacoco.exec` is a binary file, the goal `report` is offered. 
This goal reads the `jacoco.exec` file and create reports in different formats 
(in Pinot we use the default value, which generates html, csv and xml reports).
   
   But `report` does only read the `jacoco.exec` file stored in 
`target/jacoco.exec` of the current project. Therefore the lines covered by 
other maven modules are not counted. In complex projects with several maven 
modules (like Pinot) it is normal to have tests in module A actually testing 
code in module B. That is specially useful when having a module with the 
integration tests (like we do in Pinot).
   
   In order to support that pattern, jacoco provides a `report-aggregate` goal. 
This goal read all the `jacoco.exec` files of all maven dependencies and merge 
their coverage. But this task *does not* reads the `jacoco.exec` file of the 
current project. For example, if we have the following projects:
   
   ```mermaid
   stateDiagram-v2
       B --> A : depends on
       C --> A : depends on
       C --> B : depends on
   ```
   
   And run `report` or `report-aggregate`:
   
   
   |    | report             | report-aggregate         |
   |----|--------------------|--------------------------|
   | A  | lines covered by A | is empty                 |
   | B  | lines covered by B | lines covered by A       |
   | C  | lines covered by C | lines covered by A and B |
   
   Therefore, if we run `mvn test` with the current configuration (that uses 
`report-aggregate`), the coverage of projects nobody depends on is not included 
in the `jacoco.xml` file.
   
   ## How Codecov works
   
   Codecov Github action is pretty simple. It looks for all the files called 
`jacoco.xml` in the filesystem and merges them.
   
   ## How we run tests in GitHub Actions
   
   In order to run tests faster in GA, we actually split execute four parallel 
test jobs: two to test unit tests and two to test integration tests. This is 
defined in 
[.github/workflows/scripts/.pinot_test.sh](https://github.com/apache/pinot/blob/575398dae0841ba152c31306cb460cfdc776b04b/.github/workflows/scripts/.pinot_test.sh).
 In the case of unit tests, what we do is to split the maven modules in two 
disjoint groups. Given that what we want to achieve by splitting them is to get 
a faster test execution, the criteria we use to decide on which group a project 
should be is simply the time it takes to execute. But by splitting projects we 
are creating more projects _nobody depends on_.
   
   For example, project `pinot-query-runtime` is in the first partition. But 
all the projects that depend on `pinot-query-runtime` (like `pinot-broker` or 
`pinot-server`) are in the second partition. That means when we execute tests 
on the first partition, nobody depends on `pinot-query-runtime` and therefore 
there is no `jacoco.xml` that contains its coverage.
   
   # Possible solutions
   ## To do not parallelize tests
   I've tried that in https://github.com/apache/pinot/pull/11375. Coverage 
seems to improve (and `pinot-query-runtime` looks covered in [codecov 
report](https://app.codecov.io/gh/apache/pinot/pull/11375/tree/pinot-query-runtime))
 but time spend on test increased from 50-60mins to 1h26mins.
   
   ## Use `report` instead of `report-aggregate`
   I've tried that in https://github.com/apache/pinot/pull/11376. Coverage 
seems to be worse (I don't exactly know why) but `pinot-query-runtime` looks 
covered in [codecov 
report](https://app.codecov.io/gh/apache/pinot/pull/11376/tree/pinot-query-runtime).
   
   ## Use `report-aggregate` with the new flag includeCurrentProject
   
   I've tried that in https://github.com/apache/pinot/pull/11381. At the time 
I'm writing these lines, tests are being executed.
   
   This new flag 
([documentation](https://www.eclemma.org/jacoco/trunk/doc/report-aggregate-mojo.html#includeCurrentProject))
 is used to indicate `report-aggregate` to also include the `jacoco.exec` file 
in the same project.
   
   ## Run tests in parallel on a single reactor
   By default testng  (and junit) run tests using a single thread. Given that 
most tests are single thread and should be independent, that is a waste of 
resources. I know it is possible to run tests in parallel in JUnit (I've done 
it in Gradle, but [seems 
possible](https://www.baeldung.com/maven-junit-parallel-tests) in Maven as 
well). I don't have experience with testng.
   
   This seems promising (ideally it would reduce the time spend on tests even 
more than running 2 parallel instances) but given my lack of expertise in 
testng and possible issues we can find (some tests may not be able to run in 
parallel, for example integration tests that reserve some non random ports), I 
didn't try it. We would need to allocate time to do so.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org

Reply via email to