Open Source

Jun 09 2019

Scheduled Jenkinsfile

Context

On the XWiki project we use Jenkinsfiles in our GitHub repositories, along with "GitHub Organization" type jobs, so that Jenkins automatically handles creating and destroying jobs based on the git branches in these repositories. This is very convenient and we have a pretty elaborate Jenkinsfile (using shared global libraries we developed) in which we execute about 14 different Maven builds, some in series and others in parallel, to validate different things, including the execution of functional tests.

We recently introduced functional tests that can be executed with different configurations (different databases, different servlet containers, different browsers, etc). Now that represents a lot of combinations and we can't run all of these every time there's a commit in GitHub. So we need to run some of them only once per day, others once per week and the rest once per month.

The problem is that Jenkins doesn't seem to support this feature out of the box when using a Jenkinsfile. In an ideal world, Jenkins would support several Jenkinsfiles to achieve this. Right now the obvious solution is to manually create new jobs to run these configuration tests. However, doing this removes the benefits of the Jenkinsfile, the main one being the automatic creation and destruction of jobs for branches. We started with this and after a few months it became too painful to maintain. So we had to find a better solution...

The Solution

Let me start by saying that I find this solution suboptimal as it's complex and fraught with several problems.

Generally speaking, the solution we implemented is based on the Parameterized Scheduler Plugin, but the devil is in the details.

  • Step 1: Make your job a parameterized job by defining a type variable that will hold the type of job you want to execute. In our case: standard, docker-latest (to be executed daily), docker-all (to be executed weekly) and docker-unsupported (to be executed monthly). All the docker-* job types execute our functional tests on various configurations. Also configure the parameterized scheduler plugin accordingly:
    private def getCustomJobProperties()
    {
        return [
            parameters([string(defaultValue: 'standard', description: 'Job type', name: 'type')]),
            pipelineTriggers([
                parameterizedCron('''@midnight %type=docker-latest
    @weekly %type=docker-all
    @monthly %type=docker-unsupported'''),
                cron("@monthly")
            ])
        ]
    }

    You set this in the job with:

    properties(getCustomJobProperties())

    Important note: The job will need to be triggered once before the scheduler and the new parameter are effective!

  • Step 2: Based on the type parameter value, decide what to execute. For example:
    ...
    if (params.type && params.type == 'docker-latest') {
      buildDocker('docker-latest')
    }
    ...
  • Step 3: You may want to manually trigger your job using the Jenkins UI and decide what type of build to execute (this is useful to debug some test problems for example). You can do it this way:
    def choices = 'Standard\nDocker Latest\nDocker All\nDocker Unsupported'
    def selection = askUser(choices)
    if (selection == 'Standard') {
     ...
    } else if (selection == 'Docker Latest') {
     ...
    } else ...

    In our case askUser is a custom pipeline library step defined like this (a combined sketch tying it back to the type parameter dispatch follows after this list):

    def call(choices)
    {
        def selection

        // If a user is manually triggering this job, then ask what to build
        if (currentBuild.rawBuild.getCauses()[0].toString().contains('UserIdCause')) {
            echo "Build triggered by user, asking question..."
            try {
                timeout(time: 60, unit: 'SECONDS') {
                    selection = input(id: 'selection', message: 'Select what to build', parameters: [
                        choice(choices: choices, description: 'Choose which build to execute', name: 'build')
                    ])
                }
            } catch(err) {
                def user = err.getCauses()[0].getUser()
                if ('SYSTEM' == user.toString()) { // SYSTEM means timeout.
                    selection = 'Standard'
                } else {
                    // Aborted by user
                    throw err
                }
            }
        } else {
            echo "Build triggered automatically, building 'Standard'..."
            selection = 'Standard'
        }

        return selection
    }
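
Putting the pieces together, here's a minimal sketch (not our exact Jenkinsfile: buildStandard() and the choice-to-type map are hypothetical names used for illustration, while buildDocker() and askUser() are the ones shown above) of how the type parameter set by the parameterized cron and the manual selection can drive the same dispatch logic:

def choices = 'Standard\nDocker Latest\nDocker All\nDocker Unsupported'
// Map the human-readable choices offered to the user back to the 'type' values used by
// the parameterized cron triggers.
def typeFromChoice = [
    'Standard': 'standard',
    'Docker Latest': 'docker-latest',
    'Docker All': 'docker-all',
    'Docker Unsupported': 'docker-unsupported'
]

// askUser() only asks a question when the build was triggered manually and returns
// 'Standard' otherwise, so for scheduled builds the cron-provided params.type wins.
def type = params.type ?: 'standard'
def selection = askUser(choices)
if (selection != 'Standard') {
    type = typeFromChoice.get(selection)
}

if (type == 'standard') {
    buildStandard()
} else {
    buildDocker(type)
}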

Limitations

While this may sound like a nice solution, it has a drawback: Jenkins' build history gets messed up, because you're reusing the same job name for running different builds. For example, the test failure age gets reset every time a different type of build is run. Note that at least the individual test history is kept.

Since different types of builds are executed in the same job, we also wanted the job history to visibly show when scheduled jobs are executed vs the standard jobs. Thus we added the following in our pipeline:

import com.cloudbees.groovy.cps.NonCPS
import com.jenkinsci.plugins.badge.action.BadgeAction
...
def badgeText = 'Docker Build'
def badgeFound = isBadgeFound(currentBuild.getRawBuild().getActions(BadgeAction.class), badgeText)
if (!badgeFound) {
    manager.addInfoBadge(badgeText)
    manager.createSummary('green.gif').appendText("<h1>${badgeText}</h1>", false, false, false, 'green')
    currentBuild.rawBuild.save()
}

@NonCPS
private def isBadgeFound(def badgeActionItems, def badgeText)
{
    def badgeFound = false
    badgeActionItems.each() {
        if (it.getText().contains(badgeText)) {
            badgeFound = true
            return
        }
    }
    return badgeFound
}

Visually this gives the following where you can see information icons for the configuration tests (and you can hover over the information icon with the mouse to see the text):

history.png

What's your solution to this problem? I'd be very eager to know if someone has found a better solution to implement this in Jenkins.

Jul 20 2018

Resolving Maven Artifacts with ShrinkWrap... or not

On the XWiki project we wanted to generate custom XWiki WARs directly from our unit tests (to deploy XWiki in Docker containers automatically and directly from the tests).

I was really excited when I discovered the ShrinkWrap Resolver library. It looked to be exactly what I needed. I didn't want to use Aether (now deprecated) or the new Maven Resolver (I wasn't sure what its state was and there was very little documentation on how to use it).

So I coded the whole WAR generation (pom.xml). Here are some extracts showing you how easy it is to use ShrinkWrap.

Example to find all dependencies of an Artifact:

List<MavenResolvedArtifact> artifacts = resolveArtifactWithDependencies(
    String.format("org.xwiki.platform:xwiki-platform-distribution-war-dependencies:pom:%s", version));

...
protected List<MavenResolvedArtifact> resolveArtifactWithDependencies(String gav)
{
   return getConfigurableMavenResolverSystem()
       .resolve(gav)
       .withTransitivity()
       .asList(MavenResolvedArtifact.class);
}

protected ConfigurableMavenResolverSystem getConfigurableMavenResolverSystem()
{
   return Maven.configureResolver()
       .withClassPathResolution(true)
       .withRemoteRepo(
           "mavenXWikiSnapshot", "http://nexus.xwiki.org/nexus/content/groups/public-snapshots", "default")
       .withRemoteRepo(
           "mavenXWiki", "http://nexus.xwiki.org/nexus/content/groups/public", "default");
}

Here's another example showing how to read the version from a resolved pom.xml file (I didn't find how to do that easily with Maven Resolver, BTW):

private String getCurrentPOMVersion()
{
    MavenResolverSystem system = Maven.resolver();
    system.loadPomFromFile("pom.xml");
   // Hack around a bit to get to the internal Maven Model object
   ParsedPomFile parsedPom = ((MavenWorkingSessionContainer) system).getMavenWorkingSession().getParsedPomFile();
   return parsedPom.getVersion();
}

And here's how to resolve a single Artifact:

File configurationJARFile = resolveArtifact(
    String.format("org.xwiki.platform:xwiki-platform-tool-configuration-resources:%s", version));
...
protected File resolveArtifact(String gav)
{
   return getConfigurableMavenResolverSystem()
       .resolve(gav)
       .withoutTransitivity()
       .asSingleFile();
}

Pretty nice, isn't it?

It looked nice till I tried to use the generated WAR... Then all my hopes disappeared... There's one big issue: it seems that ShrinkWrap resolves dependencies using a strategy different from what Maven does:

  • Maven: Maven takes the artifact closest to the top (from the Maven web site: "Dependency mediation - this determines what version of a dependency will be used when multiple versions of an artifact are encountered. Currently, Maven 2.0 only supports using the "nearest definition" which means that it will use the version of the closest dependency to your project in the tree of dependencies.").
  • ShrinkWrap: the first artifact found in the tree (navigating each dependency node down to the deepest). For example, if your project depends on C (which itself depends on B:2.0) and also directly on B:1.0, Maven's nearest-definition rule picks B:1.0, whereas a depth-first "first found" traversal that reaches C before B picks B:2.0.

So this led to a non-functional XWiki WAR with completely different JAR versions from those generated by our Maven build.

To this day, I still don't know if that's a known bug, and since nobody was replying to my thread on the ShrinkWrap forum, I created an issue to track it. So far no answer. I hope someone from the ShrinkWrap project will reply.

Conclusion: Time to use the Maven Resolver library... Spoiler: I've succeeded in doing the same thing with it (and I get the same result as with mvn on the command line) and I'll report that in a blog post soon.

Feb 05 2018

FOSDEM 2018

Once more I was happy to go to FOSDEM. This year XWiki SAS, the company I work for, had 12 employees going there, we had about 8 talks accepted, and we had a stand for the XWiki open source project that we shared with our friends at Tiki and Foswiki.

Here were the highlights for me:

  • I talked about what's next in Java testing and covered test coverage, backward-compatibility enforcement, mutation testing and environment testing. My experience with the last two types of tests comes directly from my participation in the STAMP research project, where we develop and use tools to amplify existing tests.
  • I did another talk about "Addressing the Long Tail of (web) applications", explaining how an advanced structured wiki such as XWiki can be used to quickly create ad-hoc applications in wiki pages.
  • Since we had a stand shared between 3 wiki communities (Tiki, Foswiki and XWiki), I was also interested in comparing our features, and how our communities work.
    • I met the nice folks of Tiki at their TikiHouse and had long discussions about how similarly and differently we do things. Maybe the topic for a future blog post? emoticon_smile
    • Then I had Michael Daum give me a demo of the nice features of Foswiki. I was quite impressed and found a lot of similarities in features.
    • Funnily enough, our 3 wiki solutions are written in 3 different technologies: Tiki in PHP, Foswiki in Perl and XWiki in Java. Nice diversity!
  • I met a lot of people of course (FOSDEM is really THE place to be to meet people from FOSS communities), but I'd like to especially thank Damien Duportal, who took the time to sit with me and go over several questions I had about Jenkins pipelines and Docker. I'll most likely blog about some of those solutions in the near future.

All in all, an excellent FOSDEM again, with lots of talks and waffles emoticon_wink

Dec 15 2017

POSS 2017

My company (XWiki SAS) had a booth at the Paris Open Source Summit 2017 (POSS) and we also organized a track about "One job, one solution!", promoting open source solutions.

I was asked to talk about using XWiki as an alternative to Confluence or SharePoint.

Here are the slides of the talk:

There were about 30 people in the room. I focused on showing what I believe are the major differences, especially with Confluence, which I know better than SharePoint.

If you're interested in more details, you can find a comparison between XWiki and Confluence on xwiki.org.

Nov 17 2017

Controlling Test Quality

We already know how to control code quality by writing automated tests. We also know how to ensure that code quality doesn't go down by using a tool to measure the code covered by tests and automatically fail the build when coverage drops below a given threshold (and it seems to be working).

Wouldn't it be nice to be also able to verify the quality of the tests themselves? emoticon_smile

I'm proposing the following strategy for this:

  • Integrate PIT/Descartes in your Maven build
  • PIT/Descartes generates a Mutation Score metric. So the idea is to monitor this metric and ensure that it keeps going in the right direction and doesn't go down. This is similar to watching the Clover TPC metric and ensuring it always goes up.
  • Thus the idea would be, for each Maven module, to set a Mutation Score threshold (you'd run it once to get the current value and set that as the initial threshold) and have the PIT/Descartes Maven plugin fail the build if the computed mutation score falls below this threshold. In effect this would tell you that the latest changes have introduced tests of lower quality than the existing ones (on average) and that the new tests need to be improved to the level of the others (see the sketch after this list).
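
To make this concrete, here's a rough Groovy sketch of the kind of check we have in mind. It's not an existing PIT/Descartes feature: it assumes PIT is configured to produce its XML report (mutations.xml), and the threshold value and report location are only illustrative:

// Rough sketch of the proposed per-module check: parse PIT's XML report, compute the
// mutation score and fail the build if it drops below the module's recorded threshold.
def threshold = 60.0 // illustrative value, recorded from a first run of the module
def report = new XmlSlurper().parse(new File('target/pit-reports/mutations.xml'))
def mutations = report.mutation
def total = mutations.size()
def killed = mutations.findAll { it.@status.text() == 'KILLED' }.size()
def score = (total == 0) ? 100.0 : killed * 100.0 / total
if (score < threshold) {
    throw new RuntimeException(
        "Mutation score ${score} is below the configured threshold of ${threshold}")
}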

In order for this strategy to be implementable, we need PIT/Descartes to implement the following enhancement requests first:

I'm eagerly waiting for these issues to be fixed in order to try this strategy on the XWiki project and verify that it can work in practice. There are some reasons why it might not work, such as it being too painful or not easy enough to identify test problems and fix them.

WDYT? Do you see this as possibly working?

Nov 14 2017

Comparing Clover Reports

On the XWiki project, we use Clover to compute our global test coverage. We do this over several Git repositories and include functional tests (and more generally the coverage brought by some modules into other modules).

Now I wanted to see the difference between 2 reports that were generated:

I was surprised to see a drop in the global TPC, from 73.2% down to 71.3%. So I took the time to understand the issue.

It appears that Clover classifies your code classes as Application Code and Test Code (I have no idea what strategy it uses to differentiate them) and even though we've used the same version of Clover (4.1.2) for both reports, the test classes were not categorized similarly. It also seems that the TPC value given in the HTML report is from Application Code.

Luckily we asked the Clover Maven plugin to generate not only HTML reports but also XML reports. Thus I was able to write the following Groovy script that I executed in a wiki page in XWiki. I aggregated Application Code and Test code together in order to be able to compare the reports and the global TPC value.
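
For reference, the TPC value that the script below computes for each package (and globally) is simply the ratio of covered elements to total elements, as counted by Clover:

TPC = (coveredconditionals + coveredstatements + coveredmethods) / (conditionals + statements + methods) * 100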

result.png

{{groovy}}
def saveMetrics(def packageName, def metricsElement, def map) {
 def coveredconditionals = metricsElement.@coveredconditionals.toDouble()
 def coveredstatements = metricsElement.@coveredstatements.toDouble()
 def coveredmethods = metricsElement.@coveredmethods.toDouble()
 def conditionals = metricsElement.@conditionals.toDouble()
 def statements = metricsElement.@statements.toDouble()
 def methods = metricsElement.@methods.toDouble()
 def mapEntry = map.get(packageName)
 if (mapEntry) {
    coveredconditionals = coveredconditionals + mapEntry.get('coveredconditionals')
    coveredstatements = coveredstatements + mapEntry.get('coveredstatements')
    coveredmethods = coveredmethods + mapEntry.get('coveredmethods')
    conditionals = conditionals + mapEntry.get('conditionals')
    statements = statements + mapEntry.get('statements')
    methods = methods + mapEntry.get('methods')
 }
 def metrics = [:]
  metrics.put('coveredconditionals', coveredconditionals)
  metrics.put('coveredstatements', coveredstatements)
  metrics.put('coveredmethods', coveredmethods)
  metrics.put('conditionals', conditionals)
  metrics.put('statements', statements)
  metrics.put('methods', methods)
  map.put(packageName, metrics)
}
def scrapeData(url) {
 def root = new XmlSlurper().parseText(url.toURL().text)
 def map = [:]
  root.project.package.each() { packageElement ->
   def packageName = packageElement.@name
    saveMetrics(packageName.text(), packageElement.metrics, map)
 }
  root.testproject.package.each() { packageElement ->
   def packageName = packageElement.@name
    saveMetrics(packageName.text(), packageElement.metrics, map)
 }
 return map
}
def computeTPC(def map) {
 def tpcMap = [:]
 def totalcoveredconditionals = 0
 def totalcoveredstatements = 0
 def totalcoveredmethods = 0
 def totalconditionals = 0
 def totalstatements = 0
 def totalmethods = 0
  map.each() { packageName, metrics ->
   def coveredconditionals = metrics.get('coveredconditionals')
    totalcoveredconditionals += coveredconditionals
   def coveredstatements = metrics.get('coveredstatements')
    totalcoveredstatements += coveredstatements
   def coveredmethods = metrics.get('coveredmethods')
    totalcoveredmethods += coveredmethods
   def conditionals = metrics.get('conditionals')
    totalconditionals += conditionals
   def statements = metrics.get('statements')
    totalstatements += statements
   def methods = metrics.get('methods')
    totalmethods += methods
   def elementsCount = conditionals + statements + methods
   def tpc
   if (elementsCount == 0) {
      tpc = 0
   } else {
      tpc = ((coveredconditionals + coveredstatements + coveredmethods)/(conditionals + statements + methods)).trunc(4) * 100
    }
    tpcMap.put(packageName, tpc)
  }
  tpcMap.put("ALL", ((totalcoveredconditionals + totalcoveredstatements + totalcoveredmethods)/
(totalconditionals + totalstatements + totalmethods)).trunc(4) * 100)
 return tpcMap
}

// map1 = old
def map1 = computeTPC(scrapeData('http://maven.xwiki.org/site/clover/20161220/clover-commons+rendering+platform+enterprise-20161220-2134/clover.xml')).sort()

// map2 = new
def map2 = computeTPC(scrapeData('http://maven.xwiki.org/site/clover/20171109/clover-commons+rendering+platform-20171109-1920/clover.xml')).sort()

  println "= Added Packages"
println "|=Package|=TPC New"
map2.each() { packageName, tpc ->
 if (!map1.containsKey(packageName)) {
    println "|${packageName}|${tpc}"
 }  
}
println "= Differences"
println "|=Package|=TPC Old|=TPC New"
map2.each() { packageName, tpc ->
 def oldtpc = map1.get(packageName)
 if (oldtpc && tpc != oldtpc) {
   def css = oldtpc > tpc ? '(% style="color:red;" %)' : '(% style="color:green;" %)'
    println "|${packageName}|${oldtpc}|${css}${tpc}"
 }
}
println "= Removed Packages"
println "|=Package|=TPC Old"
map1.each() { packageName, tpc ->
 if (!map2.containsKey(packageName)) {
    println "|${packageName}|${tpc}"
 }
}
{{/groovy}}

And the result was quite different from what the HTML report was giving us!

We went from 74.07% in 2016-12-20 to 76.28% in 2017-11-09 (so quite different from the 73.2% to 71.3% figure given by the HTML report). Much nicer! emoticon_smile

Note that one reason I wanted to compare the TPC values was to see if our strategy of failing the build if a module's TPC is below the current threshold was working or not (I had tried to assess it before but it wasn't very conclusive).

Now I know that we gained a bit more than 2% of TPC in a bit less than a year and that looks good emoticon_smile

EDIT: I'm aware of the Historical feature of Clover but:

  • We haven't set it up so it's too late to compare old reports
  • I don't think it would help with the issue we faced with test code being counted as Application Code, and that being done differently depending on the generated reports.

Nov 08 2017

Flaky tests handling with Jenkins & JIRA

Flaky tests are a plague because they lower the credibility in your CI strategy, by sending false positive notification emails.

In a previous blog post, I detailed a solution we use on the XWiki project to handle false positives caused by the environment on which the CI build is running. However this solution wasn't handling flaky tests. This blog post is about fixing this!

So the strategy I'm proposing for Flaky tests is the following:

  • When a Flaky test is discovered, create a JIRA issue to remember to work on it and fix it (we currently have the following open issues related to Flaky tests)
  • The JIRA issue is marked as containing a flaky test by filling a custom field called "Flickering Test", using the following format: <package name of test class>.<test class name>#<test method name>. There can be several entries separated by commas.

    Example:

    jiraexample.png

  • In our Pipeline script, after the tests have executed, review the failing ones and check if they are in the list of known flaky tests in JIRA. If so, indicate it in the Jenkins test report. If all failing tests are flickers, don't send a notification email.

    Indication in the job history:

    joblist.png

    Indication on the job result page:

    jobpage.png

    Information on the test page itself:

    testpage.png

Note that there's an alternate solution that can also work:

  • When a Flaky test is discovered, create a JIRA issue to remember to work on it and fix it
  • Add an @Ignore annotation to the test with a message pointing to the JIRA issue (something like @Ignore("WebDriver doesn't support uploading multiple files in one input, see http://code.google.com/p/selenium/issues/detail?id=2239")). This will prevent the build from executing this flaky test.

This last solution is certainly low-tech compared to the first one. I prefer the first one though for the following reasons:

  • It allows flaky tests to continue executing on the CI and thus serve as a constant reminder that something needs to be fixed. Adding the @Ignore annotation feels like sweeping the dust under the carpet, and there's little chance you're going to come back to it in the future...
  • Since our script acts as a postbuild script on the CI agent, there's the possibility to add some logic to auto-discover flaky tests that have not yet been marked as flaky.

Also note that there's a Jenkins plugin for flaky tests, but I don't like the strategy it uses, which is to re-run failing tests a number of times to see if they pass. In theory it can work. In practice this means CI jobs that take a lot longer to execute, making it impractical for functional UI tests (which is where we have flaky tests in XWiki). In addition, flakiness sometimes only happens when the full test suite is executed (i.e. it depends on what executes before) and sometimes requires a large number of runs before passing.

So without further ado, here's the Jenkins Pipeline script to implement the strategy we defined above (you can check the full pipeline script):

/**
 * Check for test flickers, and modify test result descriptions for tests that are identified as flicker. A test is
 * a flicker if there's a JIRA issue having the "Flickering Test" custom field containing the FQN of the test in the
 * format {@code <java package name>#<test name>}.
 *
 * @return true if the failing tests only contain flickering tests
 */

def boolean checkForFlickers()
{
   boolean containsOnlyFlickers = false
    AbstractTestResultAction testResultAction =  currentBuild.rawBuild.getAction(AbstractTestResultAction.class)
   if (testResultAction != null) {
       // Find all failed tests
       def failedTests = testResultAction.getResult().getFailedTests()
       if (failedTests.size() > 0) {
           // Get all false positives from JIRA
           def url = "https://jira.xwiki.org/sr/jira.issueviews:searchrequest-xml/temp/SearchRequest.xml?".concat(
                   "jqlQuery=%22Flickering%20Test%22%20is%20not%20empty%20and%20resolution%20=%20Unresolved")
           def root = new XmlSlurper().parseText(url.toURL().text)
           def knownFlickers = []
            root.channel.item.customfields.customfield.each() { customfield ->
               if (customfield.customfieldname == 'Flickering Test') {
                    customfield.customfieldvalues.customfieldvalue.text().split(',').each() {
                        knownFlickers.add(it)
                   }
               }
           }
            echoXWiki "Known flickering tests: ${knownFlickers}"

           // For each failed test, check if it's in the known flicker list.
           // If all failed tests are flickers then don't send notification email
           def containsAtLeastOneFlicker = false
            containsOnlyFlickers = true
            failedTests.each() { testResult ->
               // Format of a Test Result id is "junit/<package name>/<test class name>/<test method name>"
               def parts = testResult.getId().split('/')
               def testName = "${parts[1]}.${parts[2]}#${parts[3]}"
               if (knownFlickers.contains(testName)) {
                   // Add the information that the test is a flicker to the test's description
                   testResult.setDescription(
                       "<h1 style='color:red'>This is a flickering test</h1>${testResult.getDescription() ?: ''}")
                    echoXWiki "Found flickering test: [${testName}]"
                    containsAtLeastOneFlicker = true
               } else {
                    // This is a real failing test, thus we'll need to send the notification email...
                   containsOnlyFlickers = false
               }
           }

           if (containsAtLeastOneFlicker) {
                manager.addWarningBadge("Contains some flickering tests")
                manager.createSummary("warning.gif").appendText("<h1>Contains some flickering tests</h1>", false,
                   false, false, "red")
           }
       }
   }

   return containsOnlyFlickers
}
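
For completeness, here's a minimal usage sketch of how the return value can be used to decide whether to send the notification email (notify() is a hypothetical step standing in for your own mail/notification logic, not part of our actual script):

// Only notify when the build is failing for a reason other than known flickering tests.
def containsOnlyFlickers = checkForFlickers()
if (currentBuild.currentResult != 'SUCCESS' && !containsOnlyFlickers) {
    notify()
}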

Hope you like it! Let me know in comments how you're handling Flaky tests in your project so that we can compare/discuss.

Oct 29 2017

Softshake 2017

I had the chance to participate in Softshake (2017 edition) for the first time. It's a small but very nice conference held in Geneva, Switzerland.

From what I gathered, this year there were fewer attendees than in previous years (about 150 vs 300 before). However, the organization was very nice:

  • Located inside the Hepia school with plenty of rooms available
  • 6 tracks in parallel, which is incredible for a small conference
  • Breakfast, lunch and snacks organized with good food
  • Speaker dinner with Fondue and all emoticon_wink

I got to present 2 talks:

I was also very happy to see my friend and ex-OCTO Technology colleague Philippe Kernevez, and to meet new OCTO consultants. Reminded me of the good times at OCTO emoticon_smile

Oct 28 2017

Google Summer of Code Summit 2017

This year the XWiki project had 4 GSOC students (we lost one to FOSSASIA, who clicked faster than us!). This is way cool and we're glad that Google is organizing this every year. XWiki has been participating since the beginning (2005 AFAIR).

Every year the XWiki project sends 2 mentors to participate in the GSOC summit. This year Thomas Mortagne and I got the honor to go.

Quiz: find us in the group picture:

all.jpg

This conference is organized as an unconference. These are the key highlights I got out of it:

  • Google will continue organizing GSOC in the future. Yeah!
  • Lots of discussions about Google Code-In (for students aged 12-17) and how organizations can handle it best. This convinced us to register the XWiki project to participate, and... we just got selected 2 days ago. If you're interested in participating, see the XWiki Code-In page. I'm really curious to see how it'll go.
  • I proposed a session on "Wikis: what's next", which turned into a "What is XWiki and why is it different from other wikis" emoticon_wink Incidentally this got me thinking about how to best express what XWiki is in one sentence. So far I've found the following:
    • View 1: Bring the concepts of wiki (collaboration on same content, edit+save, history+rollback, links) to application development.
    • View 2: A runtime web development platform. The default wiki you get is just one example (Similar to Eclipse IDE vs eclipse platform).
    • View 3: Provide the ability to add semantics to content: from free-form data to structured data. Pages can contain free-form or structured data / metadata, and control how it's displayed.
    • View 4: A web application server for content-related applications i.e. Provides all components / building blocks to easily create web applications.
  • Extremely well organized by Google, with good food and lots of choice (I'm vegetarian). I also love chocolate and this was heaven (all attendees brought chocolate!):

    chocolat.jpg

Thanks Google. Till next time!

Sep 28 2017

Mutation testing with PIT and Descartes

XWiki SAS is part of a European research project named STAMP. As part of this project I've been able to experiment a bit with Descartes, a mutation engine for PIT.

What PIT does is mutate the code under test and check if the existing test suite is able to detect those mutations. In other words, it checks the quality of your test suite.

Descartes plugs into PIT by providing a set of specific mutators. For example one mutator will replace the output of methods by some fixed value (for example a method returning a boolean will always return true). Another will remove the content of void methods. It then generates a report.

Here's an example of running Descartes on a module of XWiki:

report.png

You can see both the test coverage score (computed automatically by PIT using Jacoco) and the Mutation score. 

If we drill down to one class (MacroId.java) we can see for example the following report for the equals() method:

equals.png

What's interesting to note is that the test coverage says that the following code has been tested:

result =
   (getId() == macroId.getId() || (getId() != null && getId().equals(macroId.getId())))
   && (getSyntax() == macroId.getSyntax() || (getSyntax() != null && getSyntax().equals(
    macroId.getSyntax())));

However, the mutation testing is telling us a different story. It says that if you change the equals method code with negative conditions (i.e. testing for inequality), the test still reports success.

If we check the test code:

@Test
public void testEquality()
{
    MacroId id1 = new MacroId("id", Syntax.XWIKI_2_0);
    MacroId id2 = new MacroId("id", Syntax.XWIKI_2_0);
    MacroId id3 = new MacroId("otherid", Syntax.XWIKI_2_0);
    MacroId id4 = new MacroId("id", Syntax.XHTML_1_0);
    MacroId id5 = new MacroId("otherid", Syntax.XHTML_1_0);
    MacroId id6 = new MacroId("id");
    MacroId id7 = new MacroId("id");

    Assert.assertEquals(id2, id1);
   // Equal objects must have equal hashcode
   Assert.assertTrue(id1.hashCode() == id2.hashCode());

    Assert.assertFalse(id3 == id1);
    Assert.assertFalse(id4 == id1);
    Assert.assertFalse(id5 == id3);
    Assert.assertFalse(id6 == id1);

    Assert.assertEquals(id7, id6);
   // Equal objects must have equal hashcode
   Assert.assertTrue(id6.hashCode() == id7.hashCode());
}

We can indeed see that the test doesn't test for inequality: the Assert.assertFalse(id3 == id1) assertions only compare references, not equals(). Thus in practice, if we replace the equals method by return true; the test still passes; adding, for example, Assert.assertFalse(id1.equals(id3)) would kill that mutant.

That's interesting because that's something that test coverage didn't notice!

More generally the report provides a summary of all mutations it has done and whether they were killed or not by the tests. For example on this class:

mutations.png

Here's what I learnt while trying to use Descartes on XWiki:

  • It's being actively developed
  • It's interesting to classify the results in 3 categories:
    • strong pseudo-tested methods: no matter the return values of a method, the tests still pass. This is the worst offender since it means the tests really need to be improved. This was the case in the example above.
    • weak pseudo-tested methods: the tests pass with at least one modified value. Not as bad as strong pseudo-tested methods, but you may still want to check it out.
    • fully tested methods: the tests fail for all mutations and thus can be considered rock-solid!
  • So in the future, the generated report should provide this classification to help analyze the results and focus on important problems.
  • It would be nice if the Maven plugin were improved to be able to fail the build if the mutation score drops below a certain threshold (as we do for test coverage).
  • Performance: it's quite slow compared to Jacoco execution time, for example. In my example above it took 34 seconds to execute all possible mutations (for a project with 14 test classes, 31 tests and 20 classes).
  • It would be nice to have a Sonar integration so that PIT/Descartes could provide some stats on the Sonar dashboard.
  • Big limitation: at the moment PIT (and/or Descartes) doesn't support being executed on a multi-module project. This means that right now you need to compute the full classpath for all modules and run all sources and tests as if they were a single module. This causes problems for all tests that depend on the filesystem and expect a given directory structure. It's also tedious and error-prone since the classpath order can have side effects.

Conclusion:

PIT/Descartes is very nice, but I feel it would need to provide a bit more added value out of the box for the XWiki open source project to use it in an automated manner. The test coverage reports we have already provide a lot of information about the code that is not tested at all, and if we have 5 hours to spend, we would probably spend them on adding tests rather than further improving existing tests. YMMV. If you have a very strong suite of tests and you want to check its quality, then PIT/Descartes is your friend!

If Descartes could provide the build-failure-on-low-threshold feature mentioned above that could be one way we could integrate it in the XWiki build. But for that to be possible PIT/Descartes need to be able to run on multi-module Maven projects.

I'm also currently testing DSpot. DSpot uses PIT and Descartes, but in addition it uses the results to generate new tests automatically. That would be even more interesting (if it can work well enough). I'll post back when I've been able to run DSpot on XWiki and learn more by using it.

Now, the Descartes project could also use the information provided by line coverage to automatically generate tests to cover the spotted issues.

I'd like to thank Oscar Luis Vera Pérez who's actively working on Descartes and who's shown me how to use it and how to analyze the results. Thanks Oscar! I'll also continue to work with Oscar on improving Descartes and executing it on the XWiki code base. 
