23 February 2009

FsChecking dnAnalytics

I've been thinking lately what I can do to make FsCheck more widely used. Whenever I write "regular" unit tests, I feel like I'm back in the stone ages. It just feels so clumsy and tedious. Why don't more people see this? Is it because there's a learning curve? Surely that's part of it, but the benefit is so huge that this can't be the whole story. I came across this question on stackoverflow, and read a lot of misunderstandings about random testing (luckily the chosen answer was well-informed, and even mentions FsCheck). I could respond to each of those, but I'll keep that to a later post; instead, I'm resolved to convert the world to FsCheck, even if I have to do it one project at a time ;)

Today's candidate: dnAnalytics. dnAnalytics makes a good candidate for FsChecking because  it is in the mathematical field - finding properties for the functionality to satisfy should be straightforward: there's millennia of mathematical knowledge to choose from. Secondly, dnAnalytics already has some tests that I can trash later. Also, dnAnalytics  has an F# interface which I won't be using in this post, but at least the maintainers are familiar with F#, so I have some hope of "converting" them. Finally, dnAnalytics has quite a few downloads (over a 1000), so hopefully this can raise visibility of FsCheck outside the F# community.

Without further ado, let's go find bugs!  (I'll give it away now to keep you interested, as this has become a long post: I found a bug...read on.)

For my first experiment, I choose to test the Complex class, which represents a complex number a+bi, along with some operations. Fairly straightforward stuff that even a mathematically challenged person like myself can follow.

For education and amusement, I'll give an overview of how the tests I wrote revolved over time - errors and imperfections included.

The complex number generator

To test a type, typically the first step to take when using FsCheck is writing a generator for a type. In this case, we'll just be using a generator for a tuple of two floats, and map that to a complex number:

type Generators =
    static member Complex() =
        { new Arbitrary<Complex>() with
            override x.Arbitrary = two arbitrary |> fmapGen ( fun (a,b) -> new Complex(a,b))
            override x.Shrink (C (r,i)) = 
                shrink (r,i)  
                |> Seq.map (fun (r,i) -> new Complex(r,i))
        }
registerGenerators<Generators>()

Pretty easy. The shrink function also exploits the relation between a complex number and a pair of floats. In case you're wondering, I added an active pattern C to deal with the Complex class, it's just:

let (|C|) (c:Complex) = (c.Real, c.Imaginary)

The Absolute of a complex number

I started out with testing the Complex type's Absolute method. It's supposed to return the Absolute value of the Complex instance it's applied to. Here's the property I wrote:

let prop_Absolute (C (r,i) as c) = 
    let lhs = c.Absolute
    let rhs = Math.Sqrt( r*r + i*i)
    sprintf "lhs=%O, rhs=%O" lhs rhs @| (lhs = rhs)

Basically this just checks that the outcome of the Absolute method is equal to the mathematical definition of the absolute value of a complex number. The sprintf and the label operator @| are there to display the intermediate values should the property fail. And failing it does:

Absolute-Falsifiable, after 6 tests (1 shrink):
Label of failing property: lhs=NaN (Niet-een-getal), rhs=NaN (Niet-een-getal)
NaN

Classic mistake: NaN is a special case; NaN is never equal to NaN. That's easily solved:

let prop_Absolute (C (r,i) as c) = 
    let lhs = c.Absolute
    let rhs = Math.Sqrt( r*r + i*i)
    sprintf "lhs=%O, rhs=%O" lhs rhs @|
    (if Complex.IsNaN(c) then Double.IsNaN(lhs) else lhs = rhs)

produces

Absolute-Falsifiable, after 10 tests (2 shrinks):
Label of failing property: lhs=7,00446286306095, rhs=7,00446286306095
7 + 0,25i

Hmm. Instead of looking up how I could see all  of a float's significant digits, I just assumed a rounding error. I explored dnAnalytics existing tests and found just the thing to deal with that: a method to test equality taking into account a relative error. Using that method in the property results in:

let prop_Absolute (C (r,i) as c) = 
    let lhs = c.Absolute
    let rhs = Math.Sqrt( r*r + i*i)
    sprintf "lhs=%O, rhs=%O" lhs rhs @|
    (   if Complex.IsNaN(c) then Double.IsNaN(lhs) 
        else TestHelper.TestRelativeError(lhs, rhs, 2e-16);true)

And yes:

Absolute-Ok, passed 100 tests.

Notice that FsCheck works nicely with NUnit here; suppose we introduce a "bug" by adding 1 to the right hand side:

let prop_Absolute (C (r,i) as c) = 
    let lhs = c.Absolute
    let rhs = Math.Sqrt( r*r + i*i)
    sprintf "lhs=%O, rhs=%O" lhs rhs @|
    (   if Complex.IsNaN(c) then Double.IsNaN(lhs) 
        else TestHelper.TestRelativeError(lhs, rhs+1.0, 2e-16);true)

produces:

Absolute-Falsifiable, after 1 test (0 shrinks):
0 + 0i
with exception:
NUnit.Framework.AssertionException:   Expected: less than 2E-16.0d
  But was:  1.0d

   at NUnit.Framework.Assert.That(Object actual, Constraint constraint, String message, Object[] args)
   at NUnit.Framework.Assert.Less(Double arg1, Double arg2, String message, Object[] args)
   at NUnit.Framework.Assert.Less(Double arg1, Double arg2)
   at dnAnalytics.Tests.TestHelper.TestRelativeError(Double expected, Double approx, Double acceptableError) in c:\Documents and Settings\Kurt\My Documents\dnAnalytics\0.3\src\dnAnalytics.Tests\TestHelper.cs:line 53
   at Complex.prop_Absolute(Complex _arg1) in C:\Documents and Settings\Kurt\MyDocuments\dnAnalytics\0.3\src\dnAnalytics.FsCheck\Complex.fs:line 32
   at FsCheck.Property.evaluate[T,U](FastFunc`2 body, T a) in C:\Documents and Settings\Kurt\My Documents\FsCheck\FsCheck\Property.fs:line 162

But wait! Why aren't our labels displayed? We've found a bug...in FsCheck :) Hold on, I didn't con you earlier, I really did find a bug in dnAnalytics as well.

People can get a bit nervous now because they're not actually seeing what values FsCheck is generating. Let's find out:

let prop_Absolute (C (r,i) as c) = 
    let lhs = c.Absolute
    let rhs = Math.Sqrt( r*r + i*i)
    sprintf "lhs=%O, rhs=%O" lhs rhs @|
    if Complex.IsNaN(c) then Double.IsNaN(lhs) 
    else TestHelper.TestRelativeError(lhs, rhs, 2e-16);true
    |> classify (Complex.IsNaN(c)) "NaN"
    |> classify (Complex.IsInfinity(c)) "Infinity"
    |> classify (c = Complex.Zero) "Zero"
    |> classify (c = Complex.One) "One"

Absolute-Ok, passed 100 tests.
17% Infinity.
8% NaN.
2% Zero.

As you can see, using the classify combinator you can make FsCheck print out the ratio of test cases that fulfill a certain criterion. Here we learn that One is never generated; and infinity quite a bit. This is due to the fact that the built in generator for floats generates these special values with preference. We can change this behavior by changing the generator. Suppose we'd like to generate the value One also:

override x.Arbitrary = 
  frequency   [ (98,two arbitrary |> fmapGen ( fun (a,b) -> new Complex(a,b)))
              ; (2, constant Complex.One) ]

Absolute-Ok, passed 100 tests.
10% Infinity.
9% NaN.
4% Zero.
2% One.
1% Infinity, NaN.

Easy enough. Our generator now indeed generates One as well.

But hold on: we see also that a Complex number can be both Infinity and NaN. That doens't make sense. Let's write a property to check this:

let prop_NaNInfinity (c:Complex) =
    not ( Complex.IsInfinity(c) &&  Complex.IsNaN(c))
checkName  "Both NaN and Infinity" { quick with MaxTest = 1000} prop_NaNInfinity 

Note that I didn't use the usual quickCheckN function to run the tests, because the Absolute test indicated that only one test in a hundred exhibited the behavior. So I made FsCheck run this test a bit more, 1000 times to be exact. Running this sure enough produces:

Both NaN and Infinity-Falsifiable, after 418 tests (0 shrinks):
NaN

and this find was confirmed as a bug by the dnAnalytics team. A small victory for FsCheck.

The conjugate of a complex number

Let's do one more: finding the conjugate.

let prop_Conjugate (C (r,i) as c) =
    let lhs = c.Conjugate
    let rhs = new Complex(r,-i)
    sprintf "lhs=%O, rhs=%O" lhs rhs @|
    if Complex.IsNaN(c) then Complex.IsNaN(lhs) 
    else lhs = rhs

Since the Absolute property, I've become a bit wiser and factored in the possibility of NaN from the start. Running this gives:

Conjugate-Ok, passed 100 tests.

And all is well. Except one thing: our tests like a bit ugly, because we had to duplicate some code.

Red, green, refactor!

We're going to refactor two things.

First, all the classify's we've added to the Absolute property are actually common to every property where we use our Complex generator. These kinds of "tests" are not uncommon when writing a new FsCheck generator - for example, it ensures that our generator does not throw an exception when generating values (which can happen when certain objects are constructed). I've taken the habit of separating these kinds of tests into a single separate property:

let prop_ComplexGen c = 
    ()
    |> classify (Complex.IsNaN(c)) "NaN"
    |> classify (Complex.IsInfinity(c)) "Infinity"
    |> classify (c = Complex.Zero) "Zero"
    |> classify (c = Complex.One) "One"

And we leave these classify's out of the other properties. (Note that a property that returns unit or true is interpreted as succeeded by FsCheck. An exception or false indicates failure.)

Then, we add the following helper method to abstract out the labeling of left and right hand side; cleaning it up in the process:

let compare expected actual prop = 
      sprintf "expected=%O, actual=%O" expected actual @| (prop expected actual)

Now our two properties can be written:

let prop_Absolute (C (r,i) as c) = 
    compare (Math.Sqrt(r*r + i*i)) c.Absolute (fun expected actual ->
        if Complex.IsNaN(c) then Double.IsNaN(actual) 
        else TestHelper.TestRelativeError(expected, actual, 2e-16);true)
let prop_Conjugate (C (r,i) as c) =
    compare (Complex(r,-i)) c.Conjugate (fun expected actual ->
        if Complex.IsNaN(c) then Complex.IsNaN(actual) 
        else expected = actual)

A successful experiment

In my eyes, the FsCheck based tests are hugely superior to the original tests, for the following reasons.

First, we've replaced 2 x 100 hand-written tests with presumably manually calculated values in dnAnalytics with just a few lines of code.  An excerpt from the original tests:

[Test]
public void Absolute()
{
  TestHelper.TestRelativeError(ComplexMath.Absolute(new Complex(0.0, 1.19209289550780998537e-7)), 1.19209289550780998537e-7, 2e-016);
  TestHelper.TestRelativeError(ComplexMath.Absolute(new Complex(0.0, -1.19209289550780998537e-7)), 1.19209289550780998537e-7, 2e-016);
  TestHelper.TestRelativeError(ComplexMath.Absolute(new Complex(0.0, 5.0e-1)), 5.0e-1, 2e-016);
  TestHelper.TestRelativeError(ComplexMath.Absolute(new Complex(0.0, -5.0e-1)), 5.0e-1, 2e-016);

(Note that these are actually tests for a static method on ComplexMath, but Complex.Absolute calls this method directly without further ado. In any case we could easily rewrite our properties to call this method directly as well.)

These must've been a pain to write. Probably someone generated a little script to apply the definition of Absolute in each of these cases. That should be the work of a computer! Using FsCheck, it is.

Second, the original tests do not reveal the intent of the Absolute or Conjugate methods. Basically you just see a bunch of values going in, and the expected values coming out. In a normal program, you would call these "magic numbers" and call the developer that wrote them names. In unit tests, this is commonly tolerated.

FsCheck's specification on the other hand reveals the intent of the tested methods directly - in fact, I just looked up the mathematical definition of these operators to come up with the properties, and this definition is still readily apparent.

Third, FsCheck forced us to make the specification complete, and factor in NaN values. I could not find any test using NaN in the original dnAnalytics tests. This led directly to the discovery of a previously unknown bug.

In conclusion; FsCheck's tests are shorter, clearer and more complete than the original tests.

To boot, I dare say they are faster to write: I downloaded dnAnalytics, explored the code, choose a type to test, wrote the above properties, reported the bug, and typed in the bulk of this blog post in the course of about 4 hours yesterday. I spent another hour or two today cleaning up the post itself.

2 comments:

  1. There is a reason to do both, random tests AND hard coded magic values tests. Assume that your processor has a bug*, FsCheck prop_ calculates simple functions the same way as the library you want to prof, the result is exactly the same but incorrect*.
    And there are more difficult functions (without inverse functions) where to write a prop_ is not so easy.
    That leads to the idea to compare the results of random testing of different libraries...

    ReplyDelete
  2. Wirbel,

    In some cases example-based unit tests may be your only option. However, I would not call that a reason to have them - rather a problem that still needs to be solved. The first resort should imo be generative tests (random, or pairwise, or whatever), not unit tests.

    I'm not sure what you mean when you say the processor has a bug. That seems a very very specific scenario that does fit into FsCheck's scope (nor unit testing frameworks). Since every software framework is software itself, it is implicitly assumed that all software except the software under test will work as advertised.

    If writing a property is not easy, I would dare say that there is an opportunity for refactoring and re-thinking. Same thing as when a unit test is difficult to write. There are many different kinds of properties to write, a function does not necessarily need an inverse.

    You give an excellent example: comparing the results of various libraries using FsCheck is indeed one source of interesting FsCheck properties.

    ReplyDelete