Forty Six and Two: How to test DSLs (and: FsChecking FsCheck)

There are two causes for this post.

First, I’ve been watching videos from Lang.NET and DSL DevCon, so I’m in language-mode. It occurred to me that (domain specific) languages seem difficult to test using traditional unit tests: after all, the usage possibilities are far greater for a language than for, say, a typical user interface (maybe this means that typical user interfaces suck - you decide). The word “combinator library”, which are internal DSLs avant la lettre, says it all: the usage possibilities are combinatorial. Classic, example based unit testing thus becomes combinatorially expensive as well.

Second, I’ve been saying in the last release announcement of FsCheck that I wanted to write some FsChecks for FsCheck itself. So far, this effort is about 60% complete. As a sort of challenge, I want to check as much as possible from FsCheck without any traditional example-based unit testing – in other words, I want to test FsCheck using only FsCheck (and maybe some Pex and Code Contracts later on).

FsCheck itself consists of basically two internal DSLs: one for constructing random value generators for whatever value it is your tests need, and one for constructing properties: basically assertions augmented with property combinators for adding labels, classifying generated values and such. It’s the latter DSL that I’m going to test, and draw some general conclusions that may apply to testing other DSLs or even general purpose languages as well.

To see what we’re dealing with, here’s an example of using the property DSL:

let prop_InsertCombined (x:int) xs = 
    ordered xs ==> (ordered (insert x xs))
        |> classify (ordered (x::xs)) "at-head"
        |> classify (ordered (xs @ [x])) "at-tail"
        |> collect (List.length xs)
quickCheck prop_InsertCombined

This property asserts that the insert function maintains the invariant that the list is ordered. The classify and collect property combinators are used to gather some information about the generated data. So in this test, there are three property combinators at work: implies (==>), classify and collect. Also, the arguments to the property (x and xs) are implicitly universally quantified using the forAll property combinator. These and more combinators in FsCheck can be combined an arbitrary ways, so testing this using example-based testing can be quite tedious. Can FsCheck do better?

The property DSL’s syntax tree

First, we’re going to need to generate an arbitrary property, i.e. an arbitrary combination of property combinators. To do that, I made a symbolic representation of the property language – a sort of abstract syntax tree of the language. It’s no secret that discriminated unions are great for this, and this case is not different:

type SymProp =  | Unit | Bool of bool | Exception
                | ForAll of int * SymProp
                | Implies of bool * SymProp
                | Classify of bool * string * SymProp
                | Collect of int * SymProp

So, a property can consist of, respectively:

A function that returns the unit value;
A function that returns a bool value;
A function that throws an exception;
A function that uses the implies combinator, which needs two arguments: a boolean condition and another property;
A function that classifies the generated values, which needs three arguments: a boolean condition, the name to classify with if the condition is true and another property;
A function that collects values, which takes two arguments: the value to collect, and another property.

Note that I’ve made some simplifications here: in fact forAll takes a generator that generates values of an arbitrary type – but since we’re not testing generators here, I’ve assumed the forAll always generates a constant int value. Also, collect would normally collect a (function of a) generated value, but it’s here constrained to collecting a constant int value.

We can build an actual property from this symbolic representation:

let rec private toProperty prop =
        match prop with
        | Unit -> property ()
        | Bool b -> property b
        | Exception -> property (lazy (raise <| InvalidOperationException()))
        | ForAll (i,prop) -> forAll (constant i) (fun i -> toProperty prop)
        | Implies (b,prop) -> b ==> (toProperty prop)
        | Classify (b,stamp,prop) -> classify b stamp (toProperty prop)
        | Collect (i,prop) -> collect i (toProperty prop)

The simplifications are plainly visible here.

Defining the semantics of properties

Now, let’s define a specification of what a symbolic property means, in terms of what it should produce as Result. Result is an internal datatype used by FsCheck to track the result of one execution of a property. This is the gist of it:

type Outcome = 
    | Timeout of int
    | Exception of exn
    | False
    | True
    | Rejected

type Result = 
    {   Outcome     : Outcome
        Stamp       : list<string>
        Labels      : Set<string>
        Arguments   : list<obj> }

So, a result tracks the outcome of a property: timed out, raised exception, returned false or true, or value rejected. The latter means a generated value did not pass the condition of an implies combinator. Furthermore, it also tracks the generated arguments, the stamps applied using classify and collect, as well as labels which are not tested here.

And here is a formal, executable specification of a subset of the FsCheck property DSL:

let rec private determineResult prop =
        let addStamp stamp res = { res with Stamp = stamp :: res.Stamp }
        let addArgument arg res = { res with Arguments = arg :: res.Arguments }
        match prop with
        | Unit -> succeeded
        | Bool true -> succeeded
        | Bool false -> failed
        | Exception  -> exc <| InvalidOperationException()
        | ForAll (i,prop) -> determineResult prop |> addArgument i
        | Implies (true,prop) -> determineResult prop
        | Implies (false,_) -> rejected
        | Classify (true,stamp,prop) -> determineResult prop |> addStamp stamp
        | Classify (false,_,prop) -> determineResult prop
        | Collect (i,prop) -> determineResult prop |> addStamp (any_to_string i)

You can check this definition with the FsCheck documentation. For example, it shows that an assertion that returns unit or true is interpreted as succeeded, false as failed and so on. The succeeded function and friends are defined in FsCheck and just construct basic Result instances:

let private result =
  { Outcome     = Rejected
  ; Stamp       = []
  ; Labels       = Set.empty
  ; Arguments   = []
  }

let internal failed = { result with Outcome = False }

The test

Armed with these functions, we can write our test:

let SymProperty (symprop:SymProp) = 
        let expected = determineResult symprop 
        let (MkRose (Common.Lazy actual,_)) = generate 1 (Random.newSeed()) (toProperty symprop) 
        areSame expected actual
        |> label (sprintf "expected = %A - actual = %A" expected actual)
        |> collect symprop

Ignore the specifics of the “actual” symbol definition – it executes an actual property and extracts the Result from it. The gist of the property is the comparison between the actual Result of the execution of the property (that was built from the symbolic property representation), and compare that with our expected semantics. Just to see what kind of properties we’re testing, I’ve also added a collect combinator, as well as a label to see what’s wrong if the test should fail. Thanks to FsCheck’s built-in discriminated union generator, running this gives:

Property.SymProperty-Ok, passed 100 tests.
22% Exception.
12% Unit.
6% Bool true.
6% Bool false.
4% Classify (false,"",Unit).
2% Implies (false,Exception).
2% Classify (true,"",Unit).
2% Classify (true,"",Exception).
2% Classify (false,"",Exception).
1% Implies (true,Exception).
1% Implies (true,Collect (2,Bool true)).
1% Implies (true,Collect (0,Unit)).
1% Implies (true,Collect (0,Collect (0,Bool false))).
(and so on)

Generating more interesting properties

This leaves something to be desired: the depth of combinations isn’t very high, and almost half of the generated properties are the trivial bool, unit or exception cases. One way of increasing this depth is by increasing the size of the generator, but that didn’t work very well: it mostly increased the length of the generated strings, without increasing the depth too much. The reason is that three of the seven options are “leaf” nodes, where generation ends. So I wrote my own generator for the SymProp type:

let rec private symPropGen =
        let rec recGen size =
            match size with
            | 0 -> oneof [constant Unit; liftGen (Bool) arbitrary; constant Exception]
            | n when n>0 ->
                let subProp = recGen (size/2)
                oneof   [ liftGen2 (curry ForAll) arbitrary (subProp)
                        ; liftGen2 (curry Implies) arbitrary (subProp)
                        ; liftGen2 (curry Collect) arbitrary (subProp)
                        ; liftGen3 (fun b s p -> Classify (b,s,p)) arbitrary arbitrary (subProp)]
            | _ -> failwith "symPropGen: size must be positive"
        sized recGen

If you’re somewhat familiar with FsCheck’s generator combinators, this shouldn’t be hard to read. The idea is that we only generate one of the leaf nodes if the size equals zero, and halve the size if we generate a “real” property combinator to ensure the generation ends at some point. Let’s change the property to incorporate this generator:

let Property = 
        forAllShrink symPropGen shrink (fun symprop ->
            let expected = determineResult symprop 
            let (MkRose (Common.Lazy actual,_)) = generate 1 (Random.newSeed()) (toProperty symprop) 
            areSame expected actual
            |> label (sprintf "expected = %A - actual = %A" expected actual)
            |> collect (depth symprop)

I added a depth function that calculates the depth of the generated symbolic property (i.e. depth of Unit, Bool and Exception is zero, and the rest adds one to the depth of it’s subproperty). Running this gives:

Property.get_Property-Ok, passed 100 tests.
26% 5.
26% 4.
18% 3.
12% 2.
7% 6.
6% 0.
4% 1.

which shows that FsCheck is now generating more properties with greater depth (4-5), and trivial properties only 6% of the time.

Conclusion

This post showed an approach for testing DSLs, combinator libraries or “language-oriented” programs, whatever you’d like to call it:

Write a symbolic representation of the abstract syntax tree of the language you want to test. This symbolic representation is used to abstract from details you don’t want to test, to define semantics of the language later on, and it to generate output for counter-examples that FsCheck may find.
Generate the actual representation in your compiler or interpreter from the symbolic representation.
Specify the semantics of the symbolic representation.
Rely on FsCheck’s built-in generators, or make your own generator to generate the symbolic representation.
Write an FsCheck property to compare the actual with the expected semantics.

This approach may even be usable for testing compilers of general purpose programming languages. The effort involved will be much larger, but it’s easy to start small.

Want more of this? Also check out the Checks.fs file in the FsCheck distribution for a complete test of the FsCheck property language.

Share this post :

Technet!