Blog - XWiki - posts for September 2017

Sep 28 2017

Mutation testing with PIT and Descartes

XWiki SAS is part of a European research project named STAMP. As part of this project I've been able to experiment a bit with Descartes, a mutation engine for PIT.

What PIT does is mutate the code under test and check if the existing test suite is able to detect those mutations. In other words, it checks the quality of your test suite.
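To make this more concrete, here's a hypothetical example (not from the XWiki code base) of the kind of mutant a traditional PIT mutator produces: a small change to the logic, such as altering a conditional boundary. PIT applies the change to the bytecode at run time; the mutant is written out as source here only for illustration.

public class Price
{
    // Original production code (hypothetical example).
    public boolean isDiscounted(int quantity)
    {
        return quantity >= 10;
    }

    // What a PIT "conditional boundary" mutant of the same method looks
    // like. If no test fails when the suite runs against this mutant, the
    // mutant "survives": no test pins down the >= 10 boundary.
    public boolean isDiscountedMutant(int quantity)
    {
        return quantity > 10;
    }
}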

Descartes plugs into PIT by providing a set of specific mutators. For example, one mutator replaces the return values of methods with a fixed value (a method returning a boolean will always return true, for instance). Another removes the content of void methods. PIT then generates a report.
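By contrast with PIT's fine-grained mutators, Descartes mutates whole method bodies at once. Here's a sketch of what its "extreme" mutants conceptually look like (again hypothetical code; the real mutations are applied to the bytecode):

public class Watchdog
{
    private boolean alive;

    // Original methods (hypothetical example).
    public boolean isAlive()
    {
        return this.alive;
    }

    public void reset()
    {
        this.alive = true;
    }

    // Descartes-style extreme mutants of the same methods: the boolean
    // method is forced to return a fixed value and the void method's
    // body is removed entirely.
    public boolean isAliveMutant()
    {
        return true;
    }

    public void resetMutant()
    {
        // body removed
    }
}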

Here's an example of running Descartes on a module of XWiki:

[image: report.png]

You can see both the test coverage score (computed automatically by PIT using JaCoCo) and the mutation score.

If we drill down to one class (MacroId.java) we can see for example the following report for the equals() method:

[image: equals.png]

What's interesting to note is that the test coverage says that the following code has been tested:

result =
    (getId() == macroId.getId() || (getId() != null && getId().equals(macroId.getId())))
        && (getSyntax() == macroId.getSyntax() || (getSyntax() != null && getSyntax().equals(
            macroId.getSyntax())));

However, the mutation testing is telling us a different story. It says that if you replace the conditions in the equals() method with negated ones (i.e. test for inequality), the tests still pass.

If we check the test code:

@Test
public void testEquality()
{
    MacroId id1 = new MacroId("id", Syntax.XWIKI_2_0);
    MacroId id2 = new MacroId("id", Syntax.XWIKI_2_0);
    MacroId id3 = new MacroId("otherid", Syntax.XWIKI_2_0);
    MacroId id4 = new MacroId("id", Syntax.XHTML_1_0);
    MacroId id5 = new MacroId("otherid", Syntax.XHTML_1_0);
    MacroId id6 = new MacroId("id");
    MacroId id7 = new MacroId("id");

    Assert.assertEquals(id2, id1);
    // Equal objects must have equal hashcode
    Assert.assertTrue(id1.hashCode() == id2.hashCode());

    Assert.assertFalse(id3 == id1);
    Assert.assertFalse(id4 == id1);
    Assert.assertFalse(id5 == id3);
    Assert.assertFalse(id6 == id1);

    Assert.assertEquals(id7, id6);
    // Equal objects must have equal hashcode
    Assert.assertTrue(id6.hashCode() == id7.hashCode());
}

We can indeed see that the test doesn't test for inequality. Thus, in practice, if we replace the equals() method body with return true; the test still passes.

That's interesting because that's something that test coverage didn't notice!
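To kill that mutant, a few extra assertions along these lines (a sketch in the same JUnit 4 style as the test above) would exercise inequality through equals():

// Compare objects with equals() rather than ==, so that a mutated
// equals() that always returns true makes the test fail.
Assert.assertFalse(id1.equals(id3));
Assert.assertFalse(id1.equals(id4));
Assert.assertFalse(id3.equals(id5));
Assert.assertFalse(id1.equals(id6));

// Or, since JUnit 4.11, with the dedicated assertion:
Assert.assertNotEquals(id3, id1);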

More generally, the report provides a summary of all the mutations it has applied and whether or not they were killed by the tests. For example on this class:

[image: mutations.png]

Here's what I learnt while trying to use Descartes on XWiki:

  • It's being actively developed
  • It's interesting to classify the results in 3 categories:
    • strong pseudo-tested methods: no matter the return value of the method, the tests still pass. This is the worst offender since it means the tests really need to be improved. This was the case in the example above.
    • weak pseudo-tested methods: the tests pass with at least one modified value. Not as bad as strong pseudo-tested, but you may still want to check it out.
    • fully tested methods: the tests fail for all mutations and thus can be considered rock-solid!
  • So in the future, the generated report should provide this classification to help analyze the results and focus on important problems.
  • It would be nice if the Maven plugin could be configured to fail the build if the mutation score falls below a certain threshold (as we do for test coverage).
  • Performance: it's quite slow compared to JaCoCo's execution time, for example. In my example above it took 34 seconds to execute with all possible mutations (for a project with 14 test classes, 31 tests and 20 classes).
  • It would be nice to have a Sonar integration so that PIT/Descartes could provide some stats on the Sonar dashboard.
  • Big limitation: at the moment PIT (and/or Descartes) doesn't support being executed on a multi-module project. This means that right now you need to compute the full classpath for all modules and run all sources and tests as if they were a single module. This causes problems for all tests that depend on the filesystem and expect a given directory structure. It's also tedious and error-prone since the classpath order can have side effects.

Conclusion:

PIT/Descartes is very nice, but I feel it would need to provide a bit more added value out of the box for the XWiki open source project to use it in an automated manner. The test coverage reports we already have provide a lot of information about the code that is not tested at all, and if we had 5 hours to spend, we would probably spend them on adding tests rather than on further improving existing tests. YMMV. If you have a very strong suite of tests and you want to check its quality, then PIT/Descartes is your friend!

If Descartes could provide the build-failure-on-low-threshold feature mentioned above, that could be one way to integrate it in the XWiki build. But for that to be possible, PIT/Descartes needs to be able to run on multi-module Maven projects.

I'm also currently testing DSpot. DSpot uses PIT and Descartes, but in addition it uses the results to generate new tests automatically. That would be even more interesting (if it can work well enough). I'll post back when I've been able to run DSpot on XWiki and learn more by using it.

Now, the Descartes project could also use the information provided by line coverage to automatically generate tests to cover the spotted issues.

I'd like to thank Oscar Luis Vera Pérez, who's actively working on Descartes and who's shown me how to use it and how to analyze the results. Thanks Oscar! I'll also continue to work with Oscar on improving Descartes and executing it on the XWiki code base.

Sep 17 2017

Using Docker + Jenkins to test configurations

On the XWiki project, we currently have automated functional tests that use Selenium and Jenkins. However, they exercise only a single configuration: HSQLDB, Jetty and Firefox (all at fixed versions).

XWiki SAS is part of the STAMP research project and one domain of this research is improving configuration testing.

As a first step I've worked on providing official XWiki Docker images, but I've only provided 2 configurations (XWiki on Tomcat + MySQL and on Tomcat + PostgreSQL) and they're not currently exercised by our functional tests.

Thus I'm proposing below an architecture that should allow XWiki to be tested on various configurations:

[image: architecture.png]

Here's what I think it would mean in terms of a Jenkins pipeline (note that at this stage this is pseudo code and should not be taken literally):

pipeline {
  agent {
    docker {
      image 'xwiki-maven-firefox'
      args '-v $HOME/.m2:/root/.m2'
    }
  }
  stages {
    stage('Test') {
      steps {
        // Scripted steps such as docker.image() need to be wrapped in a
        // script block inside a declarative pipeline.
        script {
          // Start the database container.
          docker.image('mysql:5').withRun('-e "MYSQL_ROOT_PASSWORD=my-secret-pw"') { c ->
            // Start the Servlet container, mounting the exploded XWiki WAR
            // and linking it to the database container.
            docker.image('tomcat:8').withRun("-v $XWIKIDIR:/usr/local/tomcat/webapps/xwiki --link ${c.id}:db") {
              [...]
              // Run the tests under a VNC display provided by the Xvnc plugin.
              wrap([$class: 'Xvnc']) {
                withMaven(maven: mavenTool, mavenOpts: mavenOpts) {
                  [...]
                  sh "mvn ..."
                }
              }
            }
          }
        }
      }
    }
  }
}

Some explanations:

  • We would set up a custom Docker Registry so that we can prepare images to be used by the Jenkins pipeline to create containers.
  • Those images could themselves be refreshed regularly by another pipeline that would use the docker.build() construct (see the sketch after this list).
  • We would use a Jenkins Agent dynamically provisioned from an image that would contain: sshd and a Jenkins user (so that the Jenkins Master can communicate with it), Maven, a VNC Server and a browser (Firefox for example). We would have several such images, one per browser we want to test with.
    • Note that since we only want to support the latest browser versions for Firefox/Chrome/Safari, we could use apt to update (and commit) the browser version in the container prior to starting it, from the pipeline script.
  • Then the pipeline would spawn two containers: one for the DB and one for the Servlet container. Importantly for the Servlet container, I think we should mount a volume that points to a local directory on the agent containing the XWiki exploded WAR (produced as a pre-step by the Maven build). This would save time by not having to recreate a new image every time there's a commit on the XWiki codebase!
  • The build that contains the tests would be started by the Agent (and we would mount the Maven local repository as a volume in order to speed up build times across runs).
  • Right now the XWiki build already knows how to run the functional tests by fetching/exploding the XWiki WAR in the target directory and then starting XWiki directly from the tests, so all we would need to do is make sure we map this directory into the Servlet container (e.g. for Tomcat it would be mapped to [TOMCATHOME]/webapps/xwiki).
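For the image-refresh pipeline mentioned in the list above, a minimal sketch could look like this (the repository URL, registry address and directory layout are hypothetical; docker.build() and push() are the actual Docker Pipeline steps):

node {
    // Hypothetical Git repository holding the Dockerfiles for the agent images.
    git 'https://github.com/xwiki/docker-images.git'

    // Rebuild the agent image from its Dockerfile directory; the Dockerfile
    // can run apt to pick up the latest browser version, as described above.
    def agentImage = docker.build('registry.example.org/xwiki-maven-firefox', 'xwiki-maven-firefox')

    // Push the refreshed image to the custom Docker Registry.
    agentImage.push('latest')
}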

This is just architecture at this stage. Now we need to put it into practice and find the gotchas (there always are some ;-)).

WDYT? Could this work? Are you doing this yourself?

Stay tuned, I should be able to report on how it went in the coming weeks/months.
