Blog Archive


Blog - posts for November 2017

Nov 17 2017

Controlling Test Quality

We already know how to control code quality by writing automated tests. We also know how to ensure that code quality doesn't go down, by using a tool to measure the code covered by tests and by failing the build automatically when coverage goes under a given threshold (and it seems to be working).

Wouldn't it be nice to also be able to verify the quality of the tests themselves? emoticon_smile

I'm proposing the following strategy for this:

  • Integrate PIT/Descartes in your Maven build
  • PIT/Descartes generates a Mutation Score metric. The idea is to monitor this metric and ensure that it keeps going in the right direction and doesn't go down, similar to watching the Clover TPC metric and ensuring it always goes up.
  • Thus, for each Maven module, set up a Mutation Score threshold (you'd run it once to get the current value and use that value as the initial threshold) and have the PIT/Descartes Maven plugin fail the build if the computed mutation score falls below this threshold. In effect this would tell us that the latest changes have introduced tests of lower quality than the existing ones (on average) and that the new tests need to be improved to the level of the others (see the sketch right after this list).
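
To make that check concrete, here's a minimal sketch of what it could look like, written as a Groovy script that parses the XML report PIT/Descartes produces. This is not something we have in place: the report location, the 'detected' attribute and the 65% threshold are assumptions for the example.

{{code language="groovy"}}
// Minimal sketch of the threshold check described above (nothing we run yet). Assumptions:
// PIT/Descartes was run with the XML output format, the report is in
// target/pit-reports/mutations.xml (one <mutation> element per mutant, with a 'detected'
// attribute), and 65% is the threshold previously recorded for this module.
def threshold = 65.0

def report = new File('target/pit-reports/mutations.xml')
def root = new XmlSlurper().parse(report)
def total = root.mutation.size()
def killed = root.mutation.findAll { it.@detected.text() == 'true' }.size()
def score = total == 0 ? 0.0 : (100.0 * killed / total).toDouble().trunc(2)

println "Mutation score: ${score}% (threshold: ${threshold}%)"
if (score < threshold) {
    throw new RuntimeException("Mutation score ${score}% is below the threshold of ${threshold}%, failing the build!")
}
{{/code}}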

In order for this strategy to be implementable, we first need PIT/Descartes to implement the following enhancement requests:

I'm eagerly waiting for these issues to be fixed in order to try this strategy on the XWiki project and verify that it can work in practice. There are some reasons why it might not work, such as it being too painful or it not being easy enough to identify test problems and fix them.

WDYT? Do you see this as possibly working?

Nov 14 2017

Comparing Clover Reports

On the XWiki project, we use Clover to compute our global test coverage. We do this over several Git repositories and include functional tests (and more generally the coverage brought by some modules into other modules).

Now I wanted to see the difference between 2 reports that were generated: one from 2016-12-20 (the old report) and one from 2017-11-09 (the new one).

I was surprised to see a drop in the global TPC, from 73.2% down to 71.3%. So I took the time to understand the issue.

It appears that Clover classifies your classes as either Application Code or Test Code (I have no idea what strategy it uses to differentiate them), and even though we used the same version of Clover (4.1.2) for both reports, the test classes were not categorized in the same way. It also seems that the TPC value given in the HTML report is computed from Application Code only.
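
As a side note, here's a small sketch of how that Application-Code-only figure could be cross-checked against the XML report introduced below, assuming clover.xml carries an aggregate <metrics> element directly under <project> (an assumption on my part, since the main script below only relies on the per-package <metrics> elements):

{{code language="groovy"}}
// Sketch: recompute the "Application Code" only TPC (what the HTML report seems to show) from the
// aggregate <metrics> element directly under <project> in clover.xml. The presence of that
// aggregate element is an assumption; the TPC formula is the same one used in the script below.
def url = 'http://maven.xwiki.org/site/clover/20171109/clover-commons+rendering+platform-20171109-1920/clover.xml'
def root = new XmlSlurper().parseText(url.toURL().text)
def m = root.project.metrics
def covered = m.@coveredconditionals.toDouble() + m.@coveredstatements.toDouble() + m.@coveredmethods.toDouble()
def total = m.@conditionals.toDouble() + m.@statements.toDouble() + m.@methods.toDouble()
println "Application Code TPC: ${((covered / total).trunc(4) * 100)}%"
{{/code}}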

Luckily, we asked the Clover Maven plugin to generate not only HTML reports but also XML reports. Thus I was able to write the following Groovy script, which I executed in a wiki page in XWiki. It aggregates Application Code and Test Code together so that the two reports and the global TPC value can be compared.

result.png

{{groovy}}
def saveMetrics(def packageName, def metricsElement, def map) {
    def coveredconditionals = metricsElement.@coveredconditionals.toDouble()
    def coveredstatements = metricsElement.@coveredstatements.toDouble()
    def coveredmethods = metricsElement.@coveredmethods.toDouble()
    def conditionals = metricsElement.@conditionals.toDouble()
    def statements = metricsElement.@statements.toDouble()
    def methods = metricsElement.@methods.toDouble()
    def mapEntry = map.get(packageName)
    if (mapEntry) {
        // The same package can appear under both <project> and <testproject>: aggregate the values.
        coveredconditionals = coveredconditionals + mapEntry.get('coveredconditionals')
        coveredstatements = coveredstatements + mapEntry.get('coveredstatements')
        coveredmethods = coveredmethods + mapEntry.get('coveredmethods')
        conditionals = conditionals + mapEntry.get('conditionals')
        statements = statements + mapEntry.get('statements')
        methods = methods + mapEntry.get('methods')
    }
    def metrics = [:]
    metrics.put('coveredconditionals', coveredconditionals)
    metrics.put('coveredstatements', coveredstatements)
    metrics.put('coveredmethods', coveredmethods)
    metrics.put('conditionals', conditionals)
    metrics.put('statements', statements)
    metrics.put('methods', methods)
    map.put(packageName, metrics)
}
def scrapeData(url) {
    def root = new XmlSlurper().parseText(url.toURL().text)
    def map = [:]
    // Application Code metrics are under <project> and Test Code metrics under <testproject>.
    root.project.package.each() { packageElement ->
        def packageName = packageElement.@name
        saveMetrics(packageName.text(), packageElement.metrics, map)
    }
    root.testproject.package.each() { packageElement ->
        def packageName = packageElement.@name
        saveMetrics(packageName.text(), packageElement.metrics, map)
    }
    return map
}
def computeTPC(def map) {
    def tpcMap = [:]
    def totalcoveredconditionals = 0
    def totalcoveredstatements = 0
    def totalcoveredmethods = 0
    def totalconditionals = 0
    def totalstatements = 0
    def totalmethods = 0
    map.each() { packageName, metrics ->
        def coveredconditionals = metrics.get('coveredconditionals')
        totalcoveredconditionals += coveredconditionals
        def coveredstatements = metrics.get('coveredstatements')
        totalcoveredstatements += coveredstatements
        def coveredmethods = metrics.get('coveredmethods')
        totalcoveredmethods += coveredmethods
        def conditionals = metrics.get('conditionals')
        totalconditionals += conditionals
        def statements = metrics.get('statements')
        totalstatements += statements
        def methods = metrics.get('methods')
        totalmethods += methods
        def elementsCount = conditionals + statements + methods
        def tpc
        if (elementsCount == 0) {
            tpc = 0
        } else {
            // TPC = covered elements / total elements (conditionals + statements + methods), as a percentage.
            tpc = ((coveredconditionals + coveredstatements + coveredmethods)/(conditionals + statements + methods)).trunc(4) * 100
        }
        tpcMap.put(packageName, tpc)
    }
    tpcMap.put("ALL", ((totalcoveredconditionals + totalcoveredstatements + totalcoveredmethods)/
        (totalconditionals + totalstatements + totalmethods)).trunc(4) * 100)
    return tpcMap
}

// map1 = old report (2016-12-20)
def map1 = computeTPC(scrapeData('http://maven.xwiki.org/site/clover/20161220/clover-commons+rendering+platform+enterprise-20161220-2134/clover.xml')).sort()

// map2 = new report (2017-11-09)
def map2 = computeTPC(scrapeData('http://maven.xwiki.org/site/clover/20171109/clover-commons+rendering+platform-20171109-1920/clover.xml')).sort()

println "= Added Packages"
println "|=Package|=TPC New"
map2.each() { packageName, tpc ->
    if (!map1.containsKey(packageName)) {
        println "|${packageName}|${tpc}"
    }
}
println "= Differences"
println "|=Package|=TPC Old|=TPC New"
map2.each() { packageName, tpc ->
    def oldtpc = map1.get(packageName)
    if (oldtpc && tpc != oldtpc) {
        def css = oldtpc > tpc ? '(% style="color:red;" %)' : '(% style="color:green;" %)'
        println "|${packageName}|${oldtpc}|${css}${tpc}"
    }
}
println "= Removed Packages"
println "|=Package|=TPC Old"
map1.each() { packageName, tpc ->
    if (!map2.containsKey(packageName)) {
        println "|${packageName}|${tpc}"
    }
}
{{/groovy}}

And the result was quite different from what the HTML report was giving us!

We went from 74.07% in 2016-12-20 to 76.28% in 2017-11-09 (so quite different from the 73.2% to 71.3% figure given by the HTML report). Much nicer! emoticon_smile

Note that one reason I wanted to compare the TPC values was to check whether our strategy of failing the build when a module's TPC goes below its current threshold is working or not (I had tried to assess it before, but it wasn't very conclusive).

Now I know that we gained about 2.2% of TPC in a bit less than a year, and that looks good emoticon_smile

EDIT: I'm aware of the Historical feature of Clover but:

  • We haven't set it up so it's too late to compare old reports
  • I don't think it would help with the issue we faced, i.e. test code being counted as Application Code, with that classification differing between the generated reports.

Nov 08 2017

Flaky tests handling with Jenkins & JIRA

Flaky tests are a plague because they lower the credibility of your CI strategy by sending false positive notification emails.

In a previous blog post, I detailed a solution we use on the XWiki project to handle false positives caused by the environment on which the CI build is running. However, that solution didn't handle flaky tests. This blog post is about fixing that!

So the strategy I'm proposing for Flaky tests is the following:

  • When a Flaky test is discovered, create a JIRA issue to remember to work on it and fix it (we currently have the following open issues related to Flaky tests)
  • The JIRA issue is marked as containing a flaky test by filling a custom field called "Flickering Test", using the following format: <package name of test class>.<test class name>#<test method name>. There can be several entries separated by commas.

    Example:

    jiraexample.png

  • In our Pipeline script, after the tests have executed, review the failing ones and check if they are in the list of known flaky tests in JIRA. If so, indicate it in the Jenkins test report. If all failing tests are flickers, don't send a notification email.

    Indication in the job history:

    joblist.png

    Indication on the job result page:

    jobpage.png

    Information on the test page itself:

    testpage.png

Note that there's an alternate solution that can also work:

  • When a Flaky test is discovered, create a JIRA issue to remember to work on it and fix it
  • Add an @Ignore annotation to the test with a message pointing to the JIRA issue (something like @Ignore("WebDriver doesn't support uploading multiple files in one input, see http://code.google.com/p/selenium/issues/detail?id=2239")). This will prevent the build from executing this flaky test.

This last solution is certainly low-tech compared to the first one. I prefer the first one though for the following reasons:

  • It allows flaky tests to continue executing on the CI, where they serve as a constant reminder that something needs to be fixed. Adding the @Ignore annotation feels like sweeping the dust under the carpet, and there's little chance you're going to come back to it in the future...
  • Since our script acts as a post-build script on the CI agent, it gives us the possibility of adding some logic to auto-discover flaky tests that have not yet been marked as such.

Also note that there's a Jenkins plugin for flaky tests, but I don't like the strategy it uses, which is to re-run failing tests a number of times to see if they pass. In theory it can work. In practice it means CI jobs take a lot longer to execute, making it impractical for functional UI tests (which is where we have flaky tests in XWiki). In addition, flakiness sometimes only appears when the full test suite is executed (i.e. it depends on what executes before), and some tests require a large number of runs before passing.

So without further ado, here's the Jenkins Pipeline script to implement the strategy we defined above (you can check the full pipeline script):

// Import needed to access the build's test results from the pipeline (Jenkins test result API).
import hudson.tasks.test.AbstractTestResultAction

/**
 * Check for test flickers, and modify test result descriptions for tests that are identified as flickers. A test is
 * a flicker if there's a JIRA issue having the "Flickering Test" custom field containing the FQN of the test in the
 * format {@code <package name of test class>.<test class name>#<test method name>}.
 *
 * @return true if the failing tests only contain flickering tests
 */
def boolean checkForFlickers()
{
    boolean containsOnlyFlickers = false
    AbstractTestResultAction testResultAction = currentBuild.rawBuild.getAction(AbstractTestResultAction.class)
    if (testResultAction != null) {
        // Find all failed tests
        def failedTests = testResultAction.getResult().getFailedTests()
        if (failedTests.size() > 0) {
            // Get all false positives from JIRA (unresolved issues having a non-empty "Flickering Test" field)
            def url = "https://jira.xwiki.org/sr/jira.issueviews:searchrequest-xml/temp/SearchRequest.xml?".concat(
                "jqlQuery=%22Flickering%20Test%22%20is%20not%20empty%20and%20resolution%20=%20Unresolved")
            def root = new XmlSlurper().parseText(url.toURL().text)
            def knownFlickers = []
            root.channel.item.customfields.customfield.each() { customfield ->
                if (customfield.customfieldname == 'Flickering Test') {
                    customfield.customfieldvalues.customfieldvalue.text().split(',').each() {
                        knownFlickers.add(it)
                    }
                }
            }
            // echoXWiki is a logging helper from our pipeline library.
            echoXWiki "Known flickering tests: ${knownFlickers}"

            // For each failed test, check if it's in the known flicker list.
            // If all failed tests are flickers then don't send the notification email.
            def containsAtLeastOneFlicker = false
            containsOnlyFlickers = true
            failedTests.each() { testResult ->
                // Format of a Test Result id is "junit/<package name>/<test class name>/<test method name>"
                def parts = testResult.getId().split('/')
                def testName = "${parts[1]}.${parts[2]}#${parts[3]}"
                if (knownFlickers.contains(testName)) {
                    // Add the information that the test is a flicker to the test's description
                    testResult.setDescription(
                        "<h1 style='color:red'>This is a flickering test</h1>${testResult.getDescription() ?: ''}")
                    echoXWiki "Found flickering test: [${testName}]"
                    containsAtLeastOneFlicker = true
                } else {
                    // This is a real failing test, thus we'll need to send the notification email...
                    containsOnlyFlickers = false
                }
            }

            if (containsAtLeastOneFlicker) {
                manager.addWarningBadge("Contains some flickering tests")
                manager.createSummary("warning.gif").appendText("<h1>Contains some flickering tests</h1>", false,
                    false, false, "red")
            }
        }
    }

    return containsOnlyFlickers
}
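
For reference, here's a hypothetical sketch (not our exact pipeline code) of how the returned value can be used right after the junit step to decide whether to send the notification email; the recipient address and messages are made up for the example:

{{code language="groovy"}}
// Hypothetical usage sketch: call checkForFlickers() after the junit step has archived the test
// results (the junit step marks the build UNSTABLE when there are test failures). The recipient
// address and messages below are illustrative only.
def containsOnlyFlickers = checkForFlickers()
if (currentBuild.result == 'UNSTABLE') {
    if (containsOnlyFlickers) {
        echo 'All failing tests are known flickers: not sending the notification email.'
    } else {
        // At least one genuine test failure: notify as usual.
        mail to: 'notifications@xwiki.org',
            subject: "Failing tests in ${env.JOB_NAME} #${env.BUILD_NUMBER}",
            body: "See ${env.BUILD_URL}testReport/ for details."
    }
}
{{/code}}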

Hope you like it! Let me know in the comments how you're handling flaky tests in your project so that we can compare/discuss.
