02 May 2008

FsCheck: Random testing for F#

[Updated for version 0.2]

[Update: this documentation is no longer kept up to date as of FsCheck 0.3. Please check out FsCheck on GitHub for up to date releases, information and what not.]

FsCheck is an as-literal-as-possible port of Haskell's QuickCheck 0.1. Using FsCheck, you define properties that should hold for all your functions. FsCheck checks that those properties hold by generating random values for the parameters of the properties. While writing the properties, you are actually writing a testable specification of your program.

This release is still a bit rough around the edges; basic things work, but there's for example no easy way to check all properties in a given file. Also parts of FsCheck itself are still untested (ah, the irony). I wrote FsCheck mainly as an exercise  - learn from the best, or suck like the rest, so Haskell seems like a good place to start.

The next part of this post is  a manual for using FsCheck; most of it is almost literally "adapted" from the QuickCheck manual and paper. Any errors and omissions are entirely my responsibility.

What is FsCheck?

FsCheck is a tool for testing F# programs automatically. The programmer provides a specification of the program, in the form of properties which functions should satisfy, and FsCheck then tests that the properties hold in a large number of randomly generated cases. Specifications are expressed in F#, using combinators defined in the FsCheck library. FsCheck provides combinators to define properties, observe the distribution of test data, and define test data generators.

FsCheck is currently best used to test the functions in your F# programs. I may be experimenting in the future with using it to test classes and methods, in combination with existing .NET unit testing frameworks. The generator combinators can be used already in any testing framework to easily generate a number of random values for many types. Mostly these are F# specific at this point (tuples, option values, function values), but that should be easy to remedy.

A Simple Example

A simple example of a property definition is

let prop_RevRev (xs:list<int>) = List.rev (List.rev xs) = xs  |> propl

This property asserts that the reverse of the reverse of a list of integers is the list itself. (the propl function comes with FsCheck and is explained shortly) To check the property, we load this definition in F# interactive and then invoke

> qcheck (Gen.List(Gen.Int)) prop_RevRev;;
Ok, passed 100 tests.
val it : unit = ()

As the example shows, you indicate to FsCheck the type to generate for the property (in this case, a list of ints).

When a property fails, FsCheck displays a counter-example. For example, if we define

let prop_RevId xs = List.rev xs = xs |> propl

then checking it results in

> qcheck (Gen.List(Gen.Int)) prop_RevId;;
Falsifiable, after 8 tests:
["[-2; -4; -1]"]
val it : unit = ()

Using FsCheck

To use FsCheck, you download the latest FsCheck solution at the very end of this post. Build and reference the assembly in any projects containing specifications or test data generators. You can then test properties by loading the module they are defined in into F# interactive, and calling
qcheck <generator> <property_name>;;
or by running and writing a small console application that calls the qcheck function.

What do I do if a test loops or encounters an error?

In this case we know that the property does not hold, but qcheck does not display the counter-example. There is another testing function provided for this situation. Repeat the test using

vcheck <generator> <property_name>;;

which displays each test case before running the test: the last test case displayed is thus the one in which the loop or error arises.

Properties

Properties are expressed as F# function definitions, with names beginning with prop_. Properties can be defined together with their generators, by universal quantification over their parameters using the function forAll, which takes the form

forAll <generator> (fun arg -> <property>)

The first argument of forAll is a test data generator. The second argument is a function that takes as argument one value generated by the test data generator, and returns the property that needs to be checked. The values generated by the test data generator must be of the correct type for the argument, which is checked at compile time. For example,

let prop_RevRev = 
    forAll (Gen.List(Gen.Int)) (fun xs -> 
        List.rev (List.rev xs) = xs |> propl)

will generate a number of int lists for the argument xs, and evaluate the function using the generated value.

The core of a property is an assertion function that returns a bool. Such a function should be made into a Property type by calling the function propl. This allows you to use property combinators explained below.

`Polymorphic' properties, such as the one above, are restricted to a particular type to be used for testing, which is done by specifying a generator. The anonymous function given to forAll can take only one argument: if a property needs more, they should be tupled using Gen.Tuple.

Note: thanks to Haskell's type classes, QuickCheck does not force you to use forAll or needs you to use prop. I do realize it makes the properties in F# harder to read - at the moment I see no elegant ways around this. But in FsCheck you can separate the generators from the properties using qcheck and vcheck. In the examples you can find examples of both styles. For clarity, I'll use the forAll function in the following.

Conditional Properties

Properties may take the form

<condition> ==> <property>

For example,

let rec ordered xs = match xs with
                     | [] -> true
                     | [x] -> true
                     | x::y::ys ->  (x <= y) && ordered (y::ys)
let rec insert x xs = match xs with
                      | [] -> [x]
                      | c::cs -> if x<=c then x::xs else c::(insert x cs)
let prop_Insert = forAll (Gen.Tuple(Gen.Int, Gen.List(Gen.Int))) (fun (x,xs) -> 
                    ordered xs ==> propl (ordered (insert x xs)))

Such a property holds if the property after ==> holds whenever the condition does.

Testing discards test cases which do not satisfy the condition. Test case generation continues until 100 cases which do satisfy the condition have been found, or until an overall limit on the number of test cases is reached (to avoid looping if the condition never holds). In this case a message such as
Arguments exhausted after 97 tests.
indicates that 97 test cases satisfying the condition were found, and that the property held in those 97 cases.

Since F# has eager evaluation by default, the above property does more work than necessary: it evaluates the property at the right of the condition no matter what the outcome of the condition on the left. While only a performance consideration in the above example, this may limit the expressiveness of properties as well - consider:

let prop_CheckLazy2 = forAll (Gen.Int) (fun a -> a <> 0 ==> propl (1/a = 1/a)))

Will throw an exception. However, FsCheck also provides the function prop which makes a lazy property, and so requires the F# keyword lazy to force the lazy evaluation of the condition:

let prop_CheckLazy2 = forAll (Gen.Int) (fun a -> a <> 0 ==> prop (lazy (1/a = 1/a)))

Works as expected. propl is actually a shorthand for prop (lazy ...)), but beware: the argument of propl is evaluated eagerly!

Generators

There are number of generators for some common types, but by supplying a custom generator, instead of using the default generator for that type, it is possible to control the distribution of test data. In the following example, by supplying a custom generator for ordered lists, rather than filtering out test cases which are not ordered, we guarantee that 100 test cases can be generated without reaching the overall limit on test cases:

let prop_Insert = forAll (Gen.Tuple(Gen.Int, orderedList(Gen.Int))) (fun (x,xs) -> 
                    propl (ordered (insert x xs)))

Combinators for defining generators are described below.

Observing Test Case Distribution

It is important to be aware of the distribution of test cases: if test data is not well distributed then conclusions drawn from the test results may be invalid. In particular, the ==> operator can skew the distribution of test data badly, since only test data which satisfies the given condition is used.

FsCheck provides several ways to observe the distribution of test data. Code for making observations is incorporated into the statement of properties, each time the property is actually tested the observation is made, and the collected observations are then summarized when testing is complete.

Counting Trivial Cases

A property may take the form

trivial <condition> <property>

For example,

let prop_Insert = forAll (Gen.Tuple(Gen.Int, Gen.List(Gen.Int))) 
                    (fun (x,xs) -> 
                    ordered xs ==> propl (ordered (insert x xs))
                    |> trivial (List.length xs = 0)) 

Test cases for which the condition is true are classified as trivial, and the proportion of trivial test cases in the total is reported. In this example, testing produces

> quickCheck prop_Insert;;
Ok, passed 100 tests (46% trivial).
val it : unit = ()
Classifying Test Cases

A property may take the form

classify <condition> <string> <property>

For example,

let prop_Insert2 = 
    forAll (Gen.Tuple(Gen.Int, Gen.List(Gen.Int))) (fun (x,xs) -> 
        ordered xs ==> propl (ordered (insert x xs))
        |> classify (ordered (x::xs)) "at-head"
        |> classify (ordered (xs @ [x])) "at-tail")

Test cases satisfying the condition are assigned the classification given, and the distribution of classifications is reported after testing. In this case the result is

> quickCheck prop_Insert2;;
Ok, passed 100 tests.
48% at-tail, at-head.
25% at-head.
21% at-tail.
val it : unit = ()

Note that a test case may fall into more than one classification.

Collecting Data Values

A property may take the form

collect <expression> <property>

For example,

let prop_Insert3 = 
    forAll (Gen.Tuple(Gen.Int, Gen.List(Gen.Int))) (fun (x,xs) -> 
        ordered xs ==> propl (ordered (insert x xs))
        |> collect (List.length xs))

The argument of collect is evaluated in each test case, and the distribution of values is reported. The type of this argument is printed using any_to_string. In the example above, the output is

> quickCheck prop_Insert3;;
Ok, passed 100 tests.
46% 0.
36% 1.
11% 2.
6% 3.
1% 4.
val it : unit = ()
Combining Observations

The observations described here may be combined in any way. All the observations of each test case are combined, and the distribution of these combinations is reported. For example, testing the property

let prop_Insert3 = 
    forAll (Gen.Tuple(Gen.Int, Gen.List(Gen.Int))) (fun (x,xs) -> 
        ordered xs ==> propl (ordered (insert x xs))
        |> classify (ordered (x::xs)) "at-head"
        |> classify (ordered (xs @ [x])) "at-tail"
        |> collect (List.length xs))   

produces

> quickCheck prop_Insert3;;
Ok, passed 100 tests.
46% 0, at-tail, at-head.
18% 1, at-head.
17% 1, at-tail.
6% 2, at-head.
3% 3.
2% 2.
2% 3, at-tail.
2% 2, at-tail.
1% 4.
1% 3, at-head.
1% 2, at-tail, at-head.
1% 1, at-tail, at-head.
val it : unit = ()

 
from which we see that insertion at the beginning or end of a list has not been tested for lists of four elements.

Test Data Generators: The Type Gen

Test data is produced by test data generators. FsCheck defines default generators for some often used types, but you can use your own, and will need to define your own generators for any new types you introduce.

Generators have types of the form 'a Gen; this is a generator for values of type a. The type Gen is a computation expression called gen, so F#'s computation expression syntax can be used to define generators.

Generators are built up on top of the function

val choose : (int * int -> int Gen)

which makes a random choice of a value from an interval, with a uniform distribution. For example, to make a random choice between the elements of a list, use

let chooseFromList xs = gen { let! i = choose (0, List.length xs) 
                              return (List.nth xs i) }

Note that choose does not include the upper bound of the interval, so you can use List.length without deducting 1.

Choosing Between Alternatives

A generator may take the form

oneof <list of generators>

which chooses among the generators in the list with equal probability. For example,

oneof [ gen { return true }; gen { return false } ]

generates a random boolean which is true with probability one half.

We can control the distribution of results using the function

val frequency: (int * 'a Gen) list -> 'a Gen 

instead. Frequency chooses a generator from the list randomly, but weights the probability of choosing each alternative by the factor given. For example,

frequency [ (2, gen { return true }); (1, gen { return false })]

generates true two thirds of the time.

The Size of Test Data

Test data generators have an implicit size parameter; FsCheck begins by generating small test cases, and gradually increases the size as testing progresses. Different test data generators interpret the size parameter in different ways: some ignore it, while the list generator, for example, interprets it as an upper bound on the length of generated lists. You are free to use it as you wish to control your own test data generators.

You can obtain the value of the size parameter using

val sized : ((int -> 'a Gen) -> 'a Gen)

sized g calls g, passing it the current size as a parameter. For example, to generate natural numbers in the range 0 to size, use

sized <| fun s -> choose (0,s)

The purpose of size control is to ensure that test cases are large enough to reveal errors, while remaining small enough to test fast. Sometimes the default size control does not achieve this. For example, towards the end of a test run arbitrary lists may have up to 50 elements, so arbitrary lists of lists may have up to 2500, which is too large for efficient testing. In such cases it can be useful to modify the size parameter explicitly. You can do so using

val resize : (int -> 'a Gen -> 'a Gen)

resize n g invokes generator g with size parameter n. The size parameter should never be negative. For example, to generate a random matrix it might be appropriate to take the square root of the original size:

let matrix gn = sized <| fun s -> resize (s|>float|>sqrt|>int) gn
Generating Recursive Data Types

Generators for recursive data types are easy to express using oneof or frequency to choose between constructors, and F#'s standard computation expression syntax to form a generator for each case. There are also liftGen functions for arity up to 4 (more are easy to add) to lift constructors and functions into the Gen type. For example, if the type of trees is defined by

type Tree = Leaf of int | Branch of Tree * Tree

then a generator for trees might be defined by

let rec unsafeTree = 
    oneof [ liftGen (Leaf) Gen.Int; 
            liftGen2 (fun x y -> Branch (x,y)) unsafeTree unsafeTree]

However, there is always a risk that a recursive generator like this may fail to terminate, or produce very large results. In any case, the F# compiler generates an error for the above function for exactly this reason: the function calls itself. To avoid this, recursive generators should always use the size control mechanism. For example,

let tree =
    let rec tree' s = 
        match s with
          | 0 -> liftGen (Leaf) Gen.Int
          | n when n>0 -> 
            let subtree = tree' (n/2) in
            oneof [ liftGen (Leaf) Gen.Int; 
                    liftGen2 (fun x y -> Branch (x,y)) subtree subtree]
    sized tree'

Note that

  • We guarantee termination by forcing the result to be a leaf when the size is zero.
  • We halve the size at each recursion, so that the size gives an upper bound on the number of nodes in the tree. We are free to interpret the size as we will.
  • The fact that we share the subtree generator between the two branches of a Branch does not, of course, mean that we generate the same tree in each case.
Useful Generator Combinators

If g is a generator for type t, then

  • two g generates a pair of t's,
  • three g generates a triple of t's,
  • four g generates a quadruple of t's,
  • vector n g generates a list of n t's.
  • If xs is a list, then elements xs generates an arbitrary element of xs.
Default Generators

FsCheck uses static members of the type Gen to define default test data generators for some types. FsCheck defines instances for the types (), bool, int, float, pairs, triples, quadruples, lists, and functions. You can define new ones by making new instances of 'a Gen.

To generate random functions of type 'a->'b, you need to provide an instance of type 'a-> 'b Gen -> 'b Gen, which we'll call the cogenerator. The implementation of ('a->'b) Gen uses a cogenerator for type a. If you only want to generate random values of a type, you need only to define a generator for that type, while if you want to generate random functions over the type also, then you should define a cogenerator as well.

A cogenerator function interprets a value of type a as a generator transformer. It should be defined so that different values are interpreted as independent generator transformers. These can be programmed using the function

val variant : (int -> 'a Gen -> 'a Gen)

For different natural numbers i and j, variant i g and variant j g are independent generator transformers. The argument of variant must be non-negative, and, for efficiency, should be small. Cogenerators can be defined by composing together generator transformers constructed with variant.

For example,for the type Tree defined above, a suitable instance of a cogenerator, allowing you to define functions Tree -> 'a, can be defined as follows

let rec cotree t = 
    match t with
       | (Leaf n) -> variant 0 << Co.Int n
       | (Branch (t1,t2)) -> variant 1 << cotree t1 << cotree t2
Properties of Functions

Since FsCheck can generate random function values, it can check properties of functions. For example, we can check associativity of function composition as follows:

let treegen = (Gen.Tuple(tree,three (Gen.Arrow(cotree,tree))))
let prop_Assoc gen = forAll gen (fun (x,(f,g,h)) -> 
    propl ( ((f >> g) >> h) x = (f >> ( g >> h)) x ))

where we use the generator and cogenerator for trees defined above; we thus generate functions Tree -> Tree. If a counter-example is found, function values will be displayed as "<func>".

FsCheck internals

FsCheck follows the Haskell implementation of QuickCheck almost literally. For some explanation, read the original QuickCheck paper.

In the implementation, there are still a few issues that I'm unhappy with:

  • I ported Haskell's random implementation, but I'm still not 100% satisfied with it.
  • decodeFloat is somewhat amateurish. Might even be wrong. I need to test the generator for floats more thoroughly.
  • It would be cool to have more standard generators, e.g. for strings and more .net oriented types, like Func<>, TimeSpan, DateTime and so on.

Download

Finally, download the latest Visual Studio solution from my Skydrive folder: FsCheck 0.2

Older releases:

Enjoy!

9 comments:

  1. This is great. Thanks for taking the initiative to implement this. Would you consider placing the source in an open repository so that more people could contribute?

    ReplyDelete
  2. Chance,

    Wow, you actually beat me to the announcement.

    Yes, I would like to put it in an open repository. Do you (or anyone) have any experience with any such repositories? Googlecode, sourceforge, codeplex?

    I'd prefer one with subversion support. I've heard that codeplex is quite bad in the sourcecontrol department.

    Anyways, I'll try to get it online by the end of the week.

    ReplyDelete
  3. I notice that fscheck's conditional expressions aren't quite lazy enough. For example, say I have

    let prop_foo = forAll (Gen.Int) (fun a -> false ==> prop (printf "foo\n"; true))

    quickCheck prop_foo will print foo 100 times before saying that the arguments were exhausted. But if the left-hand side of ==> is false, the right-hand side shouldn't be evaluated at all!

    Haskell gets around this problem by its implicit laziness (to say nothing of its referential transparency). QCheck for sml requires the arguments of ==> to be functions, and achieves its laziness that way.

    Actually QCheck for sml has a slightly different design from fscheck, but the authors seem to have gotten around some of the limitations you've run into. Have you looked at it?

    ReplyDelete
  4. Hi Galen,

    Yes, I've looked at QCheck/SML as well as RushCheck (for Ruby). I tried to port QCheck's random generator but couldn't get the kinks out of it. I haven't looked at any of those in sufficient detail for them actually influence the design of FsCheck though.

    I expected problems translating lazy Haskell to eager F#, but all in all there weren't much. The case you're presenting I see as a performance issue, and should be looked at, but presents no functionality problem imo.

    What are the (other) limitations you hint at?

    Thanks for taking the time to leave a comment!

    Kurt

    PS I've disabled comment checking, shouldn't be getting much spam at this point anyway (hopefully).

    ReplyDelete
  5. Hi Kurt.

    What about this property?

    let prop_foo = forAll (Gen.Int) (fun a -> a != 0 ==> prop (1/a == 1/a))

    quickCheck prop_foo stops with "System.DivideByZeroException: Attempted to divide by zero."

    Unless I'm missing a way of catching errors (which is very possible -- I'm quite new to F#), this is a severe limitation.

    One design difference between FsCheck and QCheck is that with QCheck you specify your properties independently of your generators, and then specify the generators when you run the tests. So whereas in FsCheck you write

    let prop_insert = forAll (Gen.Tuple(Gen.Int,Gen.List(Gen.Int))) (fun (x,xs) -> ordered xs ==> prop (ordered (insert x xs))

    quickCheck prop_insert

    in QCheck you would write something like

    val prop_insert = (fun (x,xs) -> ordered xs) ==> (fun (x,xs) -> ordered (insert x xs))

    checkGen (Gen.Tuple (Gen.Int, Gen.List(Gen.Int))) prop_insert

    In one way the QCheck property is easier to read, since it doesn't have all that generator junk tacked onto it. On the other hand, because QCheck uses functions to implement laziness, you have to pass two functions to ==>, which makes things ugly. It seems like there might be a nice middle ground where you use F#'s lazy capacities to make ==>'s right argument lazy, allowing QCheck's two functions to be combined into one. We would then have

    let prop_insert (x,xs) = ordered xs ==> ordered (insert x xs)

    quickCheck (Gen.Tuple(Gen.Int,Gen.List(Gen.Int))) prop_insert

    which I think is much clearer than the previous two examples. Of course, this ignores the prop combinator, but I haven't yet understood why it's necessary. And the fact that QCheck doesn't have it makes me think there might be a way to get rid of it.

    ReplyDelete
  6. Galen,

    >>>
    Unless I'm missing a way of catching errors (which is very possible -- I'm quite new to F#), this is a severe limitation.
    <<<

    You're right, of course. This bug just moved to the top of my todo list ;)

    Also FsCheck should catch exceptions like those and report them as a counterexample instead of crashing.

    >>>
    Of course, this ignores the prop combinator, but I haven't yet understood why it's necessary. And the fact that QCheck doesn't have it makes me think there might be a way to get rid of it.
    <<<

    There's no reason why you can't split the generator and property in FsCheck right now.

    Your last example can be almost achieved with FsCheck, given that it would define the straightforward function

    let quickCheck' gen pr = forAll gen pr |> quickCheck

    You could write:

    let prop_insert (x,xs) = ordered xs ==> prop (ordered (insert x xs))

    quickCheck' (Gen.Tuple(Gen.Int,Gen.List(Gen.Int))) prop_insert

    which is almost your wish save for the prop function. prop is necessary to make a bool a Property type, so that it can be combined using the other property generators (classify etc).

    Note that QCheck in fact does have the pred function, which servers a similar function as FsCheck's prop, except QCheck lets you drop it when you don't use any of the statistics gathering functions. I choose consistency over that (imo, minor) notational convenience. I think however I can get rid of prop alltogether by using member overloading, but haven't worked out all the kinks yet.

    Thanks for your feedback!

    Kurt

    ReplyDelete
  7. Hi Kurt.

    Any progress on lazifying fscheck's conditional expressions?

    galen

    ReplyDelete
  8. Hi,

    Yes, I have almost a new (minor) version ready, but it'll have to wait until august, I'm out of the country till then...

    sorry for the delay!

    Kurt

    ReplyDelete
  9. You might be interested to look at my translation of QuickCheck to SML. You can find it at http://mlton.org/cgi-bin/viewsvn.cgi/mltonlib/trunk/com/ssh/unit-test/unstable/.

    My design intentionally deviates in a number of ways from Haskell's QuickCheck to make it better fit into a ML-style "strict with unrestricted effects" language. In particular, tests, including randomized tests, are just "unit -> unit" functions that indicate various conditions by throwing exceptions.

    It makes use of generic programming techniques that allows convenient default generation of values of arbitrary SML types. It also makes use of a variable arguments fold technique to make test specifications succinct.

    See the readme file, the signature file in the public directory and the examples. If you wish to discuss the design (of testing frameworks in ML-style languages) further, feel free to e-mail me.

    ReplyDelete