Feb 09 2019

Context

On the XWiki project, we've been pursuing a strategy of failing our Maven build automatically whenever the test coverage of each Maven module is below a threshold indicated in the pom.xml of that module. We're using Jacoco to measure this local coverage.

We've been doing this for over 6 years now and we've been generally happy about it. This has allowed us to raise the global test coverage of XWiki by a few percent every year.

More recently, I joined the STAMP European Research Project, and one of our KPIs is the global coverage, so I got curious and wanted to look at precisely how much we gain every year.

I realized that, even though we've been generally increasing our global coverage (computed using Clover), there are times when we actually reduce it or increase it very little, even though at the local level all modules increase their local coverage...

Reporting

So I implemented a Jenkins pipeline script that runs every day, uses OpenClover to get the raw Clover data, and generates a report. This report shows how the global coverage evolves, Maven module by Maven module, along with the contribution of each module to the global coverage.
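
To make the notion of "contribution" more concrete, here is a minimal sketch. It assumes that a module's contribution is its covered elements divided by the total number of elements across all modules; that definition, the class names, the module names and the counts are all hypothetical and are not taken from the actual pipeline script.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: class names, module names and counts are illustrative only.
final class ContributionSketch {

    /** Covered and total coverage elements of one module, as hit by ALL tests of the build. */
    static final class ModuleCoverage {
        final long covered;
        final long total;

        ModuleCoverage(long covered, long total) {
            this.covered = covered;
            this.total = total;
        }
    }

    /** Sum of the element counts of every module. */
    static long totalElements(Map<String, ModuleCoverage> modules) {
        long total = 0;
        for (ModuleCoverage m : modules.values()) {
            total += m.total;
        }
        return total;
    }

    /** Assumed definition: covered elements of one module over the elements of all modules. */
    static double contribution(String name, Map<String, ModuleCoverage> modules) {
        long total = totalElements(modules);
        return total == 0 ? 0.0 : (double) modules.get(name).covered / total;
    }

    /** Under that definition, the global coverage is the sum of all contributions. */
    static double globalCoverage(Map<String, ModuleCoverage> modules) {
        double sum = 0.0;
        for (String name : modules.keySet()) {
            sum += contribution(name, modules);
        }
        return sum;
    }

    public static void main(String[] args) {
        Map<String, ModuleCoverage> modules = new LinkedHashMap<>();
        modules.put("module-a", new ModuleCoverage(80, 100));
        modules.put("module-b", new ModuleCoverage(10, 100));
        for (String name : modules.keySet()) {
            System.out.printf("%s contributes %.4f%n", name, contribution(name, modules));
        }
        System.out.printf("global coverage: %.4f%n", globalCoverage(modules));
    }
}
```

With hundreds of modules, each contribution becomes a very small number, so a change in one module only shows up in the later decimals.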

Here's a recent example report comparing global coverage from 2019-01-01 to 2019-01-08, i.e. just 8 days.

report.png

The lines in red are modules that have had changes lowering the global coverage (even though the local coverage for these modules didn't change or even increased!).

Why?

Analyzing a difference

So once we find that a module has lowered the global coverage, how do we analyze where it's coming from?

It's not easy! What I've done is take the 2 Clover reports for both dates, compare all packages in the module, and pinpoint the exact code where the coverage was lowered. Then it's about knowing the code base and the existing tests to find out why those places are no longer executed by the tests. Note that Clover helps a lot since its reports can tell you which tests contribute to coverage for each covered line!
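
The package-by-package comparison can be sketched as follows. This is only an illustration of the idea, assuming that the per-package coverage of each report has already been extracted into a map (the extraction itself is not shown); the class name, method name and package names are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: assumes each report was already reduced to "package -> coverage ratio".
final class CoverageDiffSketch {

    /** Returns, sorted by package name, the (negative) delta of every package whose coverage dropped. */
    static Map<String, Double> drops(Map<String, Double> oldReport, Map<String, Double> newReport) {
        Map<String, Double> result = new TreeMap<>();
        for (Map.Entry<String, Double> entry : oldReport.entrySet()) {
            Double after = newReport.get(entry.getKey());
            if (after == null) {
                // Package removed or moved elsewhere: this needs a separate, manual look.
                continue;
            }
            double delta = after - entry.getValue();
            if (delta < 0) {
                result.put(entry.getKey(), delta);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Double> before = Map.of("org.example.a", 0.80, "org.example.b", 0.50);
        Map<String, Double> after = Map.of("org.example.a", 0.75, "org.example.b", 0.55);
        drops(before, after).forEach((pkg, delta) -> System.out.printf("%s: %+.4f%n", pkg, delta));
    }
}
```

This only narrows down where to look; finding out why the lines are no longer hit still requires reading the code and the tests.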

If you're interested, check for example a real analysis of the xwiki-commons-job coverage drop.

Reasons

Here are some reasons I analyzed that can cause a module to lower the global coverage even though its local coverage is stable or increases:

  1. Some functional tests exercise (directly or indirectly) code lines in this module that are not covered by its unit tests.
    1. Then some of this code is removed because a) it's no longer necessary, b) it's deprecated, or c) it's moved to another module. Since no unit tests cover it in the module, the local coverage doesn't change, but the global coverage for the module is lowered. Note that the full global coverage may not change if the code is moved to another module which is itself covered by unit or functional tests.
    2. It could also happen that the code line was hit because of a bug somewhere. Not a bug that throws an Exception (since that would have failed the test) but a bug that results in some IF path being entered and, for example, a warning being logged. Then the bug is fixed, so the functional tests don't enter this IF anymore and the coverage is lowered... :) (FTR this is what happened for the xwiki-commons-job coverage drop in the report shown above)
  2. Some new module is added and its local coverage is below the average coverage of the other modules.
  3. Some module is removed and it had a higher than average global coverage.
  4. Some tests have failed, and in particular some functional tests are flickering. This will reduce the coverage of all module code lines that are only tested through tests located in other modules. It's thus important to check the test successes before "trusting" the coverage.
  5. The local coverage is computed by Jacoco and we use the instructions ratio, whereas the global coverage is computed using Clover, which uses the TPC formula. There are cases where the covered instructions stay stable but the TPC value decreases. For example, if a method is split into 2 methods, the covered bytecode instructions remain the same but the TPC will decrease, since the number of covered methods will stay fixed but the total number of methods will increase by 1...
  6. Rounding errors. We should ignore differences that are too small, because the local coverage may seem to remain the same (for example we round it to 2 digits) while the global coverage decreases (we round it to 4 digits in the shown report, because the contribution of each module to the global coverage is low).
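
Reason 5 above can be illustrated with Clover's TPC formula, TPC = (BT + BF + SC + MC) / (2*B + S + M), where BT and BF are the branches evaluated to true and to false at least once, SC the covered statements, MC the entered methods, and B, S, M the total branches, statements and methods. The sketch below is only an illustration: the class name and all counts are hypothetical, and the "after" case simply mirrors the situation described in the list (covered methods unchanged, one more method in total).

```java
// Hypothetical sketch: the counts are made up to mirror the method-split case described above.
final class TpcSketch {

    /**
     * Clover's Total Percentage Coverage, returned as a ratio between 0 and 1:
     * TPC = (BT + BF + SC + MC) / (2*B + S + M).
     */
    static double tpc(long bt, long bf, long sc, long mc, long b, long s, long m) {
        long denominator = 2 * b + s + m;
        return denominator == 0 ? 0.0 : (double) (bt + bf + sc + mc) / denominator;
    }

    public static void main(String[] args) {
        // Before the split: 1 covered method out of 2, i.e. 14/18.
        double before = tpc(3, 2, 8, 1, 3, 10, 2);
        // After the split, as described above: still 1 covered method, but 3 in total, i.e. 14/19.
        double after = tpc(3, 2, 8, 1, 3, 10, 3);
        System.out.printf("before=%.4f after=%.4f%n", before, after);
    }
}
```

Note that the result depends on whether the tests enter the extracted method: if they do, MC increases as well and the effect on the TPC is different. The numbers above only cover the case where MC stays fixed.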

Strategy

So what strategy can we apply to ensure that the global coverage doesn't go down?

Here's the strategy that we're currently discussing/trying to set up on the XWiki project:

  • We run the Clover Jenkins pipeline every night (between 11PM and 8AM)
  • The pipeline sends an email whenever the new report has its global TPC going down when compared with the baseline
  • The baseline report is the report generated just after each XWiki release. This means that we keep the same baseline during a whole release
  • We add a step in the Release Plan Template to have the report passing before we can release.
  • The Release Manager is in charge of a release from day 1 to the release day, and is also in charge of making sure that the global coverage job failures get addressed before the release day, so that we’re ready on the release day, i.e. that the global coverage doesn't go down.
  • Implementation detail: don’t send a failure email when there are failing tests in the build, to avoid false positives.
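
The decision logic behind the bullets above can be sketched as follows. The real check lives in the Jenkins pipeline script; this Java version only illustrates the rules, and the class, enum and method names are hypothetical.

```java
// Hypothetical sketch of the notification rules listed above; not the actual pipeline code.
final class CoverageGateSketch {

    enum Decision { OK, SKIP_FAILING_TESTS, NOTIFY_COVERAGE_DROP }

    /**
     * @param baselineTpc global TPC of the report generated just after the last release
     * @param currentTpc global TPC of tonight's report
     * @param failingTests number of failing tests in tonight's build
     */
    static Decision evaluate(double baselineTpc, double currentTpc, int failingTests) {
        if (failingTests > 0) {
            // Failing tests lower the coverage artificially, so don't send a failure email.
            return Decision.SKIP_FAILING_TESTS;
        }
        return currentTpc < baselineTpc ? Decision.NOTIFY_COVERAGE_DROP : Decision.OK;
    }

    public static void main(String[] args) {
        System.out.println(evaluate(0.7500, 0.7480, 0)); // coverage went down, all tests pass
        System.out.println(evaluate(0.7500, 0.7480, 3)); // coverage went down, but tests failed
        System.out.println(evaluate(0.7500, 0.7510, 0)); // coverage went up
    }
}
```

Comparing against a fixed baseline rather than against the previous night means a drop keeps being reported until it is addressed.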

For reference, the various discussions on the XWiki list:

Conclusion

This experimentation has shown that, in the case of XWiki, the global coverage has been increasing consistently over the years, even though, technically, it could go down. It also shows that with a bit more care, and by ensuring that we always grow the global coverage between releases, we could make that global coverage increase a bit faster.

Additional Learnings

  • Clover does a bad job at comparing reports.
  • Don't trust Clover reports at package level either, they don't include all files.
  • Test failures reported by the Clover report are not accurate at all. For example on this report Clover shows 276 failures and 0 errors. I checked the build logs and the reality is 109 failures and 37 errors. Some tests are reported as failing when they're passing.
    • Here's an interesting example where Clover says that MacroBlockSignatureGeneratorTest#testIncompatibleBlockSignature() is failing but in the logs we have:
      [INFO] Running MacroBlockSignatureGeneratorTest
      [INFO] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.023 s - in MacroBlockSignatureGeneratorTest

      What's interesting is that Clover reports:

      testIncompatibleBlockSignature.png

      And the test contains:

      @Test
      public void testIncompatibleBlockSignature() throws Exception
      {
          thrown.expect(IllegalArgumentException.class);
          thrown.expectMessage("Unsupported block [org.xwiki.rendering.block.WordBlock].");

          assertThat(signer.generate(new WordBlock("macro"), CMS_PARAMS), equalTo(BLOCK_SIGNATURE));
      }

      This is a known Clover issue with tests that assert exceptions: https://community.atlassian.com/t5/Questions/JUnit-Rule-ExpectedException-marked-as-failure/qaq-p/76884
Created by Vincent Massol on 2019/02/09 14:32