23 May 2009

Announcing FsCheck 0.6.1

No functional changes, just an update as a result of the May CTP released by the F# team.

Get it here.

Change log:

  • Many name changes as a result of name changes in the F# API
  • Used the new formatting API, which gets rid of a bunch of obsolete warnings
  • Tried to solve issues resulting from changed initialization order
  • Added a standalone version of the FsCheck binary, to accommodate C#/VB users better
  • Added a contributors file. Want your name in there? Then contribute ;)

Enjoy the new CTP!

Share this post : Technet! del.icio.us it! del.iri.ous! digg it! dotnetkicks it! reddit! technorati!

12 May 2009

How to test DSLs (and: FsChecking FsCheck)

There are two causes for this post.

First, I’ve been watching videos from Lang.NET and DSL DevCon, so I’m in language-mode. It occurred to me that (domain specific) languages seem difficult to test using traditional unit tests: after all, the usage possibilities are far greater for a language than for, say, a typical user interface (maybe this means that typical user interfaces suck - you decide). The word “combinator library”, which are internal DSLs avant la lettre, says it all: the usage possibilities are combinatorial. Classic, example based unit testing thus becomes combinatorially expensive as well.

Second, I’ve been saying in the last release announcement of FsCheck that I wanted to write some FsChecks for FsCheck itself. So far, this effort is about 60% complete. As a sort of challenge, I want to check as much as possible from FsCheck without any traditional example-based unit testing – in other words, I want to test FsCheck using only FsCheck (and maybe some Pex and Code Contracts later on).

FsCheck itself consists of basically two internal DSLs: one for constructing random value generators for whatever value it is your tests need, and one for constructing properties: basically assertions augmented with property combinators for adding labels, classifying generated values and such. It’s the latter DSL that I’m going to test, and draw some general conclusions that may apply to testing other DSLs or even general purpose languages as well.

To see what we’re dealing with, here’s an example of using the property DSL:

let prop_InsertCombined (x:int) xs = 
    ordered xs ==> (ordered (insert x xs))
        |> classify (ordered (x::xs)) "at-head"
        |> classify (ordered (xs @ [x])) "at-tail"
        |> collect (List.length xs)
quickCheck prop_InsertCombined

This property asserts that the insert function maintains the invariant that the list is ordered. The classify and collect property combinators are used to gather some information about the generated data. So in this test, there are three property combinators at work: implies (==>), classify and collect. Also, the arguments to the property (x and xs) are implicitly universally quantified using the forAll property combinator. These and more combinators in FsCheck can be combined an arbitrary ways, so testing this using example-based testing can be quite tedious. Can FsCheck do better?

The property DSL’s syntax tree

First, we’re going to need to generate an arbitrary property, i.e. an arbitrary combination of property combinators. To do that, I made a symbolic representation of the property language – a sort of abstract syntax tree of the language. It’s no secret that discriminated unions are great for this, and this case is not different:

type SymProp =  | Unit | Bool of bool | Exception
                | ForAll of int * SymProp
                | Implies of bool * SymProp
                | Classify of bool * string * SymProp
                | Collect of int * SymProp

So, a property can consist of, respectively:

  • A function that returns the unit value;
  • A function that returns a bool value;
  • A function that throws an exception;
  • A function that uses the implies combinator, which needs two arguments: a boolean condition and another property;
  • A function that classifies the generated values, which needs three arguments: a boolean condition, the name to classify with if the condition is true and another property;
  • A function that collects values, which takes two arguments: the value to collect, and another property.

Note that I’ve made some simplifications here: in fact forAll takes a generator that generates values of an arbitrary type – but since we’re not testing generators here, I’ve assumed the forAll always generates a constant int value. Also, collect would normally collect a (function of a) generated value, but it’s here constrained to collecting a constant int value.

We can build an actual property from this symbolic representation:

let rec private toProperty prop =
        match prop with
        | Unit -> property ()
        | Bool b -> property b
        | Exception -> property (lazy (raise <| InvalidOperationException()))
        | ForAll (i,prop) -> forAll (constant i) (fun i -> toProperty prop)
        | Implies (b,prop) -> b ==> (toProperty prop)
        | Classify (b,stamp,prop) -> classify b stamp (toProperty prop)
        | Collect (i,prop) -> collect i (toProperty prop)

The simplifications are plainly visible here.

Defining the semantics of properties

Now, let’s define a specification of what a symbolic property means, in terms of what it should produce as Result. Result is an internal datatype used by FsCheck to track the result of one execution of a property. This is the gist of it:

type Outcome = 
    | Timeout of int
    | Exception of exn
    | False
    | True
    | Rejected

type Result = 
    {   Outcome     : Outcome
        Stamp       : list<string>
        Labels      : Set<string>
        Arguments   : list<obj> }

So, a result tracks the outcome of a property: timed out, raised exception, returned false or true, or value rejected. The latter means a generated value did not pass the condition of an implies combinator. Furthermore, it also tracks the generated arguments, the stamps applied using classify and collect, as well as labels which are not tested here.

And here is a formal, executable specification of a subset of the FsCheck property DSL:

let rec private determineResult prop =
        let addStamp stamp res = { res with Stamp = stamp :: res.Stamp }
        let addArgument arg res = { res with Arguments = arg :: res.Arguments }
        match prop with
        | Unit -> succeeded
        | Bool true -> succeeded
        | Bool false -> failed
        | Exception  -> exc <| InvalidOperationException()
        | ForAll (i,prop) -> determineResult prop |> addArgument i
        | Implies (true,prop) -> determineResult prop
        | Implies (false,_) -> rejected
        | Classify (true,stamp,prop) -> determineResult prop |> addStamp stamp
        | Classify (false,_,prop) -> determineResult prop
        | Collect (i,prop) -> determineResult prop |> addStamp (any_to_string i)

You can check this definition with the FsCheck documentation. For example, it shows that an assertion that returns unit or true is interpreted as succeeded, false as failed and so on. The succeeded function and friends are defined in FsCheck and just construct basic Result instances:

let private result =
  { Outcome     = Rejected
  ; Stamp       = []
  ; Labels       = Set.empty
  ; Arguments   = []
  }

let internal failed = { result with Outcome = False }

The test

Armed with these functions, we can write our test:

let SymProperty (symprop:SymProp) = 
        let expected = determineResult symprop 
        let (MkRose (Common.Lazy actual,_)) = generate 1 (Random.newSeed()) (toProperty symprop) 
        areSame expected actual
        |> label (sprintf "expected = %A - actual = %A" expected actual)
        |> collect symprop

Ignore the specifics of the “actual” symbol definition – it executes an actual property and extracts the Result from it. The gist of the property is the comparison between the actual Result of the execution of the property (that was built from the symbolic property representation), and compare that with our expected semantics. Just to see what kind of properties we’re testing, I’ve also added a collect combinator, as well as a label to see what’s wrong if the test should fail. Thanks to FsCheck’s built-in discriminated union generator, running this gives:

Property.SymProperty-Ok, passed 100 tests.
22% Exception.
12% Unit.
6% Bool true.
6% Bool false.
4% Classify (false,"",Unit).
2% Implies (false,Exception).
2% Classify (true,"",Unit).
2% Classify (true,"",Exception).
2% Classify (false,"",Exception).
1% Implies (true,Exception).
1% Implies (true,Collect (2,Bool true)).
1% Implies (true,Collect (0,Unit)).
1% Implies (true,Collect (0,Collect (0,Bool false))).
(and so on)

Generating more interesting properties

This leaves something to be desired: the depth of combinations isn’t very high, and almost half of the generated properties are the trivial bool, unit or exception cases. One way of increasing this depth is by increasing the size of the generator, but that didn’t work very well: it mostly increased the length of the generated strings, without increasing the depth too much. The reason is that three of the seven options are “leaf” nodes, where generation ends. So I wrote my own generator for the SymProp type:

let rec private symPropGen =
        let rec recGen size =
            match size with
            | 0 -> oneof [constant Unit; liftGen (Bool) arbitrary; constant Exception]
            | n when n>0 ->
                let subProp = recGen (size/2)
                oneof   [ liftGen2 (curry ForAll) arbitrary (subProp)
                        ; liftGen2 (curry Implies) arbitrary (subProp)
                        ; liftGen2 (curry Collect) arbitrary (subProp)
                        ; liftGen3 (fun b s p -> Classify (b,s,p)) arbitrary arbitrary (subProp)]
            | _ -> failwith "symPropGen: size must be positive"
        sized recGen

If you’re somewhat familiar with FsCheck’s generator combinators, this shouldn’t be hard to read. The idea is that we only generate one of the leaf nodes if the size equals zero, and halve the size if we generate a “real” property combinator to ensure the generation ends at some point. Let’s change the property to incorporate this generator:

let Property = 
        forAllShrink symPropGen shrink (fun symprop ->
            let expected = determineResult symprop 
            let (MkRose (Common.Lazy actual,_)) = generate 1 (Random.newSeed()) (toProperty symprop) 
            areSame expected actual
            |> label (sprintf "expected = %A - actual = %A" expected actual)
            |> collect (depth symprop)

I added a depth function that calculates the depth of the generated symbolic property (i.e. depth of Unit, Bool and Exception is zero, and the rest adds one to the depth of it’s subproperty). Running this gives:

Property.get_Property-Ok, passed 100 tests.
26% 5.
26% 4.
18% 3.
12% 2.
7% 6.
6% 0.
4% 1.

which shows that FsCheck is now generating more properties with greater depth (4-5), and trivial properties only 6% of the time.

Conclusion

This post showed an approach for testing DSLs, combinator libraries or “language-oriented” programs, whatever you’d like to call it:

  1. Write a symbolic representation of the abstract syntax tree of the language you want to test. This symbolic representation is used to abstract from details you don’t want to test, to define semantics of the language later on, and it to generate output for counter-examples that FsCheck may find.
  2. Generate the actual representation in your compiler or interpreter from the symbolic representation.
  3. Specify the semantics of the symbolic representation.
  4. Rely on FsCheck’s built-in generators, or make your own generator to generate the symbolic representation.
  5. Write an FsCheck property to compare the actual with the expected semantics.

This approach may even be usable for testing compilers of general purpose programming languages. The effort involved will be much larger, but it’s easy to start small.

Want more of this? Also check out the Checks.fs file in the FsCheck distribution for a complete test of the FsCheck property language.

Share this post : Technet! del.icio.us it! del.iri.ous! digg it! dotnetkicks it! reddit! technorati!

Announcing FsCheck 0.6: Dot is the new pipe

It is about time to release a new version of FsCheck, and here it is.

What’s new in this release?

  • A fluent interface for C# and VB

This is in prototype stage at the moment, and has some limitations with respect to the F# interface, but I would say it’s about 90% functional. An example of a property in C#:

Spec.ForAny<int, int[]>((x, xs) => xs.Insert(x).IsOrdered())
                .When((x, xs) => xs.IsOrdered())
                .Classify((x, xs) => new int[] { x }.Concat(xs).IsOrdered(), "at-head")
                .Classify((x, xs) => xs.Concat(new int[] { x }).IsOrdered(), "at-tail")
                .QuickCheck("InsertClassify");

This property is the equivalent of the following in F#:

let prop_InsertClassify (x:int) xs = 
    ordered xs ==> (ordered (insert x xs))
    |> classify (ordered (x::xs)) "at-head"
    |> classify (ordered (xs @ [x])) "at-tail" 
quickCheck prop_InsertClassify

As you can see, it’s about the same amount of code, but because of the fluent interface I have to add a class for each number of type arguments - currently limited to three. Ideally, I’d like to generate some code for classes up to six or nine, but for the moment it’s all copy-paste and you can only write properties with up to three parameters this way. For the rest, almost everything should be working: you can register generators, provide configuration, add shrinkers, and write generators using some combinators and LINQ:

var gen = from x in Any.OfType<int>()
          from y in Any.IntBetween(5, 10)
          where x > 5
          select new { Fst = x, Snd = y };

Pretty neat.

Incidentally, there is no C# code in FsCheck itself, and the fluent interface is all pure F# code. This was a good test to see how interoperable F# really is. Overall, the experience has been great, with just a few minor hiccups.

I translated most of the F# examples from the documentation to C# as well, you can find those in the F# source distribution.

  • Possibility to replay a test run. FsCheck now displays the seed it used to start a test run when it fails. You can give this seed to a configuration parameter, so that FsCheck will repeat the run with exactly the same generated values:

check { quick with Name="Counter-replay"; Replay = Some <| Random.StdGen (395461793,1) } (asProperty spec)
  • Various bug fixes and updates to xml docs.

  • I’m making headway in testing FsCheck using FsCheck itself. It’s an interesting experience, and I have a blog post lined up about it.

The C#/VB “mainstream” support is something that I hope to continue to improve – I also plan to add reflective object generators and Gallio integration, mainly as a means to get some more attention for FsCheck. It’s also an interesting exercise to compare approaches. So far, if you know F#, I’d stick with it, but you can see that LINQ is occasionally very elegant for defining generators.

Happy FsChecking!

Share this post : Technet! del.icio.us it! del.iri.ous! digg it! dotnetkicks it! reddit! technorati!

 

  

06 May 2009

Tests are not specifications

There seems to be some consensus in the TDD and BDD community that the tests (or behaviors) for a given software system constitute an (executable) specification of the system.

Wrong.

A specification is an unambiguous and complete description of a system or subsystem. Or at least it should be a heroic attempt.

A test, whether written as a traditional  unit test, or as an executable scenario in BDD is neither unambiguous nor complete. Worse, it’s not even trying.

A test is almost always written as one specific example of how a subsystem should behave. An implementation that can fail on other examples will pass the test, and so still fulfill the non-specification. Furthermore, there are many implementations with wildly differing behavior that can pass the test.Doesn't sound like a specification to me, or at least not a good one.

In fact, if TDD/BDD advocates would really follow their advice to program no more than what the tests test, then given the following test:

Assert.Equal (Reverse [1,2,3] , [3,2,1])

they should implement the pseudo function:

Reverse list = [3,2,1]

Compare with what FsCheck's properties look like (again, in pseudo-code):

Reverse [x] = [x]

Reverse (x::xs) = Reverse xs @ [x]

That's what I call a specification: unambiguous, and complete. So TDD talks the talk, but does not walk the walk.

Before you start commenting, foam at the mouth, I’m not saying testing is bad – on the contrary. It’s just that you should stop kidding yourself by calling tests specifications. It’s like saying that “8” is an algorithm for adding two numbers, because it happens to work for 3 and 5. 

Using frameworks and tools such as Code Contracts, Pex and FsCheck (I’m being completely unbiased here, of course), you will take a big step towards writing real specifications, not tests. That’s where their real value is.

Share this post : Technet! del.icio.us it! del.iri.ous! digg it! dotnetkicks it! reddit! technorati!