gortiz opened a new issue, #11382: URL: https://github.com/apache/pinot/issues/11382
# The issue There are some maven module (like [pinot-query-runtime](https://app.codecov.io/gh/apache/pinot/tree/master/pinot-query-runtime)) that look uncovered in Codecov while having tests (in fact `pinot-query-runtime` actual coverage is around 90%). # The reason The reason is that we are using jacoco goal `report-aggregate` to create the coverage xml files, which does not count the lines covered by tests in the current maven module but only lines covered by tests in their dependencies. # The actual problem ## How Jacoco works Jacoco actually generates a `jacoco.exec` file. After running surefire tests with the jacoco agent, a `jacoco.exec` file is created on each project (specifically in `target/jacoco.exec`). Given that `jacoco.exec` is a binary file, the goal `report` is offered. This goal reads the `jacoco.exec` file and create reports in different formats (in Pinot we use the default value, which generates html, csv and xml reports). But `report` does only read the `jacoco.exec` file stored in `target/jacoco.exec` of the current project. Therefore the lines covered by other maven modules are not counted. In complex projects with several maven modules (like Pinot) it is normal to have tests in module A actually testing code in module B. That is specially useful when having a module with the integration tests (like we do in Pinot). In order to support that pattern, jacoco provides a `report-aggregate` goal. This goal read all the `jacoco.exec` files of all maven dependencies and merge their coverage. But this task *does not* reads the `jacoco.exec` file of the current project. For example, if we have the following projects: ```mermaid stateDiagram-v2 B --> A : depends on C --> A : depends on C --> B : depends on ``` And run `report` or `report-aggregate`: | | report | report-aggregate | |----|--------------------|--------------------------| | A | lines covered by A | is empty | | B | lines covered by B | lines covered by A | | C | lines covered by C | lines covered by A and B | Therefore, if we run `mvn test` with the current configuration (that uses `report-aggregate`), the coverage of projects nobody depends on is not included in the `jacoco.xml` file. ## How Codecov works Codecov Github action is pretty simple. It looks for all the files called `jacoco.xml` in the filesystem and merges them. ## How we run tests in GitHub Actions In order to run tests faster in GA, we actually split execute four parallel test jobs: two to test unit tests and two to test integration tests. This is defined in [.github/workflows/scripts/.pinot_test.sh](https://github.com/apache/pinot/blob/575398dae0841ba152c31306cb460cfdc776b04b/.github/workflows/scripts/.pinot_test.sh). In the case of unit tests, what we do is to split the maven modules in two disjoint groups. Given that what we want to achieve by splitting them is to get a faster test execution, the criteria we use to decide on which group a project should be is simply the time it takes to execute. But by splitting projects we are creating more projects _nobody depends on_. For example, project `pinot-query-runtime` is in the first partition. But all the projects that depend on `pinot-query-runtime` (like `pinot-broker` or `pinot-server`) are in the second partition. That means when we execute tests on the first partition, nobody depends on `pinot-query-runtime` and therefore there is no `jacoco.xml` that contains its coverage. # Possible solutions ## To do not parallelize tests I've tried that in https://github.com/apache/pinot/pull/11375. Coverage seems to improve (and `pinot-query-runtime` looks covered in [codecov report](https://app.codecov.io/gh/apache/pinot/pull/11375/tree/pinot-query-runtime)) but time spend on test increased from 50-60mins to 1h26mins. ## Use `report` instead of `report-aggregate` I've tried that in https://github.com/apache/pinot/pull/11376. Coverage seems to be worse (I don't exactly know why) but `pinot-query-runtime` looks covered in [codecov report](https://app.codecov.io/gh/apache/pinot/pull/11376/tree/pinot-query-runtime). ## Use `report-aggregate` with the new flag includeCurrentProject I've tried that in https://github.com/apache/pinot/pull/11381. At the time I'm writing these lines, tests are being executed. This new flag ([documentation](https://www.eclemma.org/jacoco/trunk/doc/report-aggregate-mojo.html#includeCurrentProject)) is used to indicate `report-aggregate` to also include the `jacoco.exec` file in the same project. ## Run tests in parallel on a single reactor By default testng (and junit) run tests using a single thread. Given that most tests are single thread and should be independent, that is a waste of resources. I know it is possible to run tests in parallel in JUnit (I've done it in Gradle, but [seems possible](https://www.baeldung.com/maven-junit-parallel-tests) in Maven as well). I don't have experience with testng. This seems promising (ideally it would reduce the time spend on tests even more than running 2 parallel instances) but given my lack of expertise in testng and possible issues we can find (some tests may not be able to run in parallel, for example integration tests that reserve some non random ports), I didn't try it. We would need to allocate time to do so. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org