22 December 2009

On Understanding Data Abstraction, Revisited

I’m not in the habit of posting links, but this essay by Willam R. Cook on the difference between ADTs and objects made such an excellent and humbling read... Humbling because in my arrogance I had assumed that I knew the difference – after all this is basic stuff. Excellent because despite the humbling experience, it’s a very enjoyable read.

Enjoy.

Share this post :

08 December 2009

How to get hg-git working with Mercurial without installing Python on Windows

Sorry for the horrendous search keyword-driven title, but I’ve been trying to get this to work for ages.

Here’s a step by step guide, along with some potential pitfalls.

1) Download TortoiseHg. I used 0.9.1.1. with Mercurial 1.4.1. Do not use the windows Mercurial command line only distribution – we’ll need to modify a zip file later and for some reason the zip file with the command line distribution cannot be modified with either WinRar, Winzip or 7-zip.

2) Install TortoiseHg as usual.

3) Download dulwich 0.4.0.

4) Add the directory called ‘dulwich’ inside the dulwich download (the one with the file __init__.py in it) to the library.zip file in the TortoiseHg folder (normally just C: \Program Files\TortoiseHg) using your favorite zip utility.

5) Download the tip of an up to date hg-git fork. I used http://bitbucket.org/gwik/hg-git/.

6) Put the hg-git directory in the TortoiseHg directory.

7) Add the following lines to your Mercurial.ini file:

bookmarks =

hggit = C:\Program Files\TortoiseHg\hg-git\hggit

That’s it. I finally managed to clone the ClojureClr repository with the above install:

> hg clone git://github.com/richhickey/clojure-clr.git
destination directory: clojure-clr.git
importing Hg objects into Git
Counting objects: 4001, done.
Compressing objects: 100% (1816/1816), done.
Total 4001 (delta 3153), reused 2774 (delta 2148)
importing Git objects into Hg
at:   0/307
at: 100/307
at: 200/307
at: 300/307
updating to branch default
313 files updated, 0 files merged, 0 files removed, 0 files unresolved

And with this we’re back to functional programming languages ;)

I didn’t test this any further.

Good luck!

Technorati: , , ,

Share this post :

26 October 2009

FsCheck 0.6.2: F# October 2009 CTP/2010 Beta 2

FsCheck is updated for the new F# release! Conversion was fairly straightforward, some name changes and indentation changes mostly. Furthermore, the following smaller changes: 

  • A nice set of new default generators contributed by Howard Mansell. FsCheck can now generate any Enum instance, as well as generators for DateTime, arrays, positive and negative ints and so on. Check out the Arbitrary.fs file for a complete overview.
  • Some more examples
  • registerInstances and overwriteInstances now throws an exception if it cannot find any instances in the given type.

Happy FsChecking!

19 September 2009

Simple and Fair Backtracking Computations

Translated to F# from the fair and terminating backtracking monad and monad transformer by Oleg Kiselyov.

Sequence expressions in F# can be seen as backtracking computations. For example,

let backtrack =
    seq {   for i in 1..10 do
            for j in 1..10 do
            if i*j > 10 then yield (i,j) }

will yield one by one (i.e. lazily) all the tuples i,j that satisfy the guard i*j>10. In Haskell, you would use list comprehensions to much the same effect. Once the values of j are exhausted for a given i, the computation backtracks, takes the next value for i and retries all values for j. Don’t mistake the syntactic sugar for nested loops  – there is really a computation builder behind this that translates to something like

1..10 >>= (fun i –> 1..10 >>= (fun j –> …))

where (>>=) represents monadic bind. This translations shows more clearly that the binds are really doing backtracking. It’s just F#’s syntactic sugar for computation expressions that makes the computation look like a nested loop.

Unfortunately, this simplicity has some drawbacks.

let infinite =
    seq {   for i in 1..10 do
            for j in Seq.initInfinite id do
            if i > 5 then yield (i,j) }

will never return any results, although there is an infinite number of solutions. The reason is that generation gets stuck on j – an infinite stream that will never yield a solution as long as the computation does not backtrack and tries new values for i. The computation is unfair in its choice of values to try – it always chooses the last stream until that is exhausted.

The thing to realize is that, if viewed as a backtracking computation, there are basically two operations that are important: choice and bind.

Choice gives the computation a range of elements to choose from. In the example above, we have an element i that can be chosen from the interval [1,10] and an element j that can be chosen from the interval [0,Infinity]. It can be viewed as logical disjunction: for i, choose 1 or 2 or 3 or 4 or…

Bind gives the computation a number of sub-computations that all need to return a value if the complete computation is to return a value. It can be understood as logical conjunction: in the example above, i should be chosen from [1,10] and j should be chosen from [0,Infinity] and i should be > 5.

Theoretically, it does not matter in which order these kinds of generate-and-test computations are executed. However, for performance and once you start using infinite streams, it becomes important that the computation is fair – in the sense that the choices it makes give every option consideration, even if once choice has not been exhausted yet. The results of all backtracking computations should be equivalent modulo the order in which the results are returned.

So, can we make an alternative computation builder for which this kind of computation returns results?

Fair streams

Let’s start by making a new datatype of streams, that we can use to make fairer choices, and that also can be suspended so that infinite streams can be supported:

type Stream<'a> = 
    | Nil                               //empty
    | One of 'a                         //one element
    | Choice of 'a * Stream<'a>         //one element, and maybe more
    | Incomplete of Lazy<Stream<'a>>    //suspended stream

 

Using this stream, we can code a fair choice function:

let rec choice r r' = 
    match r with
    | Nil -> r' //first empty-> try the second
    | One a -> Choice (a,r') //Put in front of the new stream
    | Choice (a,rs) -> Choice (a,choice r' rs) //interleave r and r' here
    | Incomplete i ->
        match r' with
        | Nil -> r
        | One b -> Choice (b,r)
        | Choice (b,rs') -> Choice (b,choice r rs') 
        | Incomplete j -> Incomplete (lazy (choice i.Value j.Value))

The important thing here is the switch of argument order in the recursive call to choice in the third case. This ensures that, in contrary to normal sequences, a choice will choose alternatively from the first and second stream (and so on, recursively, for more streams). Choice is actually somewhat equivalent to concatenate on sequences, only it interleaves the elements so that the choice is fair.

Bind is more straightforward:

let rec bind m f =
    match m with
        | Nil -> Nil
        | One a -> (f a)
        | Choice (a,r) -> choice (f a) (Incomplete (lazy bind r f))
        | Incomplete i -> Incomplete (lazy bind i.Value f)

This is what you would expect from a normal monadic bind on a list like type: a variant of map – concat on each element of the first stream. Note in the third case that we take care to encode the resulting stream as a number of choices that we can choose from fairly.

We now have the important elements to make a computation builder:

type FairStream() =
    member x.Return a = One a
    member x.Yield a = One a
    member x.Bind (m, f) = bind m f
    member x.Zero() = Nil
    member x.Combine (r,r') = choice r r'
    member x.Delay (f:unit->Stream<_>) = Incomplete (Lazy.Create f)
   
let bt = FairStream()

I called it bt, short for backtracking. Now we need a function to run our computation:

let rec run d st =
    match (d,st) with
    | _,Nil -> Seq.empty
    | _,One a -> seq { yield a }
    | _,Choice (a,r) -> seq { yield a; yield! run d r }
    | Some 0,Incomplete i -> Seq.empty //exhausted depth
    | d,Incomplete (Lazy r) -> let d' = Option.map ((-) 1) d in run d' r

The run function takes an optional parameter to bound the number of backtracking steps, which may be useful to help with termination.

A few examples

First, let’s define a stream of numbers [0,Infinity]:

let rec number() = bt { yield 0
                        let! n = number() in yield n+1 }

Nothing really amazing here – you could do something similar with sequences as well. However, if we use the choice constructor explicitly, we can define number as a left-recursive function:

let rec number() = choice (bt { let! n = number() in return n+1 }) (bt {return 0})

Both number functions yield the same results.

Now for something a bit more spectacular.

let guard assertion = 
    bt { if assertion then return () }
let pythagoreanTriples =
    bt{ let! i = number()
        do! guard (i > 0)
        let! j = number()
        do! guard (j > 0) 
        let! k = number()
        do! guard (k > 0)
        do! guard (i*i + j*j = k*k)
        return (i,j,k) }

This computation will never yield any results in a standard sequence expression, but with the fair streams it works, even if you leave off the guards.

The discerning reader might wonder why we can’t write:

let rec number() = 
    bt { let! n = number() in return n+1
         return 0 }

which is a left recursive version in a computation expression style. I don’t know the answer myself – some details are lost in the translation to the methods on the computation builder, but I’m guessing it has something to do with the bind not being lazy enough, and/or the choice function not getting inserted in the appropriate place. I’ve tried various approaches, but couldn’t get it to work. So I guess there’s some homework for you :) In the mean time, I think I’m going to work through Backtracking, Interleaving, and Terminating Monad Transformers by Kiselyov et al…

Share this post : Technet! del.icio.us it! del.iri.ous! digg it! dotnetkicks it! reddit! technorati!

27 June 2009

More fun with data structures, continuations and call/cc

Last time I showed a few ways of implementing insert into an unbalanced binary tree, as an implementation of a set data structure. This post adds an implementation that is similar in functionality to the example that uses call/cc, but using the more familiar exceptions to avoid allocating elements needlessly:

let insertExc x t =
    try
        let rec insertRec x t =
            match t with
            | Empty ->  cont { return Elem(empty,x,empty) }
            | Elem (left,e,right) when x < e -> 
                cont {  let! t' = insertRec x left
                        return Elem(t',e,right) }
            | Elem (left,e,right) when x > e -> 
                cont {  let! t' = insertRec x right
                        return Elem(left,e,t') }
            | Elem _ -> failwith "alreadyExists"
        insertRec x t id
    with e –> t

Performance shootout

A comment on the last post indicated that exceptions should be much slower (on .NET, at least) than the callCC approach. Furthermore, I said that callCC was necessary to avoid re-allocating elements needlessly on the search path to an already existing element. Let’s put these claims to the test with a simple benchmark:

let longList = 
    let r = new System.Random()
    let l = 100000
    [ for i in 1..l -> r.Next(0,l) ]

let insertALot f nbRuns =
    for i in 1..nbRuns do
        longList |> List.fold (fun t elem -> f elem t) empty

How many duplicates are in the long list (i.e. how much can we expect to gain by using callCC/exceptions):

> longList |> Seq.distinct |> Seq.length;;
val it : int = 63362
So about 35% duplicates. The results:
> insertALot insert 100;;
Real: 00:01:38.717, CPU: 00:01:37.516, GC gen0: 2247, gen1: 520, gen2: 8
val it : unit = ()
> insertALot insert' 100;;
Real: 00:01:44.583, CPU: 00:01:37.251, GC gen0: 4050, gen1: 1444, gen2: 18
val it : unit = ()
> insertALot insertM 100;;
Real: 00:02:35.101, CPU: 00:02:29.792, GC gen0: 12698, gen1: 1875, gen2: 24
val it : unit = ()
> insertALot insertCallCC 100;;
Real: 00:01:09.410, CPU: 00:01:02.540, GC gen0: 12149, gen1: 7, gen2: 0
val it : unit = ()
> insertALot insertExc 100;;

- Interrupt

Observe that:

  • the readability of computation expressions comes with a cost;
  • even with that cost, the callCC code is still significantly faster;
  • Exceptions are indeed horribly slow – I interrupted after about an hour.
Share this post : Technet! del.icio.us it! del.iri.ous! digg it! dotnetkicks it! reddit! technorati!

24 June 2009

Fun with data structures, continuations and call/cc

Prompted by exercise 2.3 in Purely Functional Data Structures, by Chris Okasaki.

type Tree<'a> = Empty | Elem of 'a Tree * 'a * 'a Tree

let empty = Empty
Recursive
let rec insert x t =
    match t with
    | Empty ->  Elem(empty,x,empty)
    | Elem (left,e,right) when x < e -> Elem (insert x left,e,right)
    | Elem (left,e,right) when x > e -> Elem (left,e, insert x right)
    | Elem _ –> t
Tail Recursive
let insert' x t =
    let rec cont x t k =
        match t with
        | Empty -> k (Elem(empty,x,empty))
        | Elem (left,e,right) when x < e -> cont x left (fun t' -> k <| Elem (t',e,right))
        | Elem (left,e,right) when x > e -> cont x right (fun t' -> k <| Elem (left,e,t'))
        | Elem _ -> k t
    cont x t id
Using a continuation monad
type Cont<'a,'r> = ('a -> 'r) -> 'r

type ContBuilder() =
    member x.Return (a):Cont<'a,'r> = fun k -> k a
    member x.Bind (c:Cont<'a,'r>, f:'a -> Cont<'b,_>) =
        (fun k -> c (fun a -> f a k))
    member this.Delay(f) = f()

let cont = ContBuilder()
let insertM x t =
    let rec insertRecM x t =
        match t with
        | Empty ->  cont { return Elem(empty,x,empty) }
        | Elem (left,e,right) when x < e -> 
            cont {  let! t' = insertRecM x left
                    return Elem(t',e,right) }
        | Elem (left,e,right) when x > e -> 
            cont {  let! t' = insertRecM x right
                    return Elem(left,e,t') }
        | Elem _ -> cont { return t }
    insertRecM x t id
Using call with current continuation
let callCC (f:('a -> Cont<'b,'r>) -> Cont<'a,'r>) : Cont<'a,'r> = fun k -> f (fun a _ -> k a) k
let insertCallCC x t =
    callCC (fun alreadyExists ->
        let rec insertRecCallCC x t =
            match t with
            | Empty ->  cont { return Elem(empty,x,empty) }
            | Elem (left,e,right) when x < e -> 
                cont {  let! t' = insertRecCallCC x left
                        return Elem(t',e,right) }
            | Elem (left,e,right) when x > e -> 
                cont {  let! t' = insertRecCallCC x right
                        return Elem(left,e,t') }
            | Elem _ -> alreadyExists t
        insertRecCallCC x t
    ) id

Please note that the complexity of the last implementation is not obfuscation – it avoids recreating nodes on the path to an already existing element (exercise 2.3 actually asks to do it using exceptions, which works just as well).

Share this post : Technet! del.icio.us it! del.iri.ous! digg it! dotnetkicks it! reddit! technorati!

13 June 2009

Traversing and transforming F# quotations: A guided tour

There seems to be some lack of information about quotations – most is either short, a specific example, or is a bit outdated. The most comprehensive example is probably Tomáš Petříček’s Quotations Visualizer in the  F# samples. His site is also one of the better resources about quotations that I’ve found. Another one with some recent activity is Alex Pedenko’s blog. Furthermore, even the otherwise excellent Expert F# book has little to say about it, and predates the major changes that were made in the September CTP.

Here I’ll focus on the important general patterns of using quotations. That is, I’m going to show how to traverse and transform quotations, independent of any specific example. I’m going to focus more on the API and programming aspects, than on introductory concepts, so this assumes some intermediate-level knowledge of F#. Basic knowledge about quotations doesn’t hurt either, as I‘ll make explanations terse.

Either way, this post represents some knowledge that took me a while to figure out on my own, piecing together scattered sources.

I would like to point out that once you get over the learning curve, working with quotations is actually enjoyable. I’ve been refraining from using them because I was put off by some bad experiences with System.Reflection (I think System.Reflection truly sucks as an API) and because I felt that to do anything useful, you’d almost be forced to support (i.e. at the very least provide default implementations of) every single construct in F#. It turns out that the quotations API is very enjoyable to work with and that there are built-in means to start small and focus on the things that are interesting to you.

The basics

In short, a quotation is metadata that represents the code of a particular function or code snippet, that gets compiled into the dll (along with the actual executable code, in some cases) by the F# compiler. To those of you who are familiar with LINQ Expression trees, it’s the same concept only F#’s quotations cover the entire reach of the F# language.

There are three ways to obtain a quotation:

1) Use the <@  @> operation around a code snippet to get a typed quotation of type Expr<’a>, where ‘a is the type of the actual value you put into the snippet. The Expr “wrapper” essentially indicates that we’re not dealing with an actual value of type ‘a, but instead with a reified representation of that value – the quotation:

let typedExpr = <@ 1 + 5 @>

let typedExpr2 = <@ fun i -> i + 5 @>

Sending these to FSI gives:

val typedExpr : Quotations.Expr<int> =
  Call (None, Int32 op_Addition[Int32,Int32,Int32](Int32, Int32),
      [Value (1), Value (5)])
val typedExpr2 : Quotations.Expr<(int -> int)> =
  Lambda (i,
        Call (None, Int32 op_Addition[Int32,Int32,Int32](Int32, Int32),
              [i, Value (5)]))

First notice the types: the first is Expr<int> because 1 + 5 of course is an int value. The second example illustrates a function value, of type Expr<int->int>.

FSI has printed the values of the quotations: a representation of the code! In fact, as we’ll soon see, the Expr type is nothing more than a discriminated union with one case for each possible language construct in F# (it’s not actually a discriminated union under the covers, but that’s how it’s abstracted, and the abstraction is very clean. It doesn’t hurt to think about it that way.)

2) Use the <@@  @@> operator around a code snippet to get an untyped quotation, of type Expr. Expr<’a> is a subtype of Expr, and every Expr<’a> can be converted to an Expr by calling .Raw:

let untypedExpr = <@@ 1 + 5 @@>

let untypedExpr2 = <@@ fun i -> i + 5 @@>

Sending to FSI gives:

val untypedExpr : Quotations.Expr =
  Call (None, Int32 op_Addition[Int32,Int32,Int32](Int32, Int32),
      [Value (1), Value (5)])
val untypedExpr2 : Quotations.Expr =
  Lambda (i,
        Call (None, Int32 op_Addition[Int32,Int32,Int32](Int32, Int32),
              [i, Value (5)]))

Pretty much exactly the same information, except for the type. For library writers, the Expr type does all the heavy lifting of inspecting and transforming quotations: since in a library you almost always want to write some code for any quotation, you cannot write this code in a generic way with typed quotations.

3) Using the ReflectedDefinition attribute on a top level function gives you the best of both worlds: the function can be called and executed as normal, but it can also be inspected as a quotation:

[<ReflectedDefinitionAttribute>]
let quotation i = i + 5

Based on a MethodInfo, the Expression.TryGetReflectedDefinition method allows you to get to the Expr associated with the function. The necessary metadata is actually marshaled into the dll when this function is compiled.

The Microsoft.FSharp.Quotations namespace

This namespace in the FSharp core libraries contains three modules and three types. The types are Expr, Expr<’a> and Var. The first two we’ve discussed already, the third one is just the representation of a variable.

The meat of quotations processing is in the module Quotations.Patterns and the static methods on Expr. If you compare these two, you’ll notice that for every active pattern in the Patterns modules, there is a static method on Expr. For example, you have Patters.Call(…) and Expr.Call(…), with equivalent signatures. This is no coincidence: the Patterns module contains all the patterns you’ll need to deconstruct a quotation expression. Together, these patterns can traverse any F# abstract syntax tree. It contains 32 active patterns (that’s a lot, but don’t despair yet: it gets better. ).

Its dual, the static methods on the Expr type, construct quotation expressions programmatically.

If you know F#, it shouldn’t be too hard to figure out what the arguments mean. For example:

Expr.Call : obj:Expr * methodInfo:MethodInfo * arguments:Expr list –> Expr

constructs an expression that represents a method call (defined by a methodInfo) on a target object (i.e. it’s an instance method), taking the given  arguments. Conversely,

( |Call|_| ) : Expr -> (Expr option * MethodInfo * Expr list) option

is an active pattern that matches a method call, with an (optional) target, methodinfo and a list of arguments.

This duality makes it fairly easy to get started with constructing and deconstructing quotations. Try to make some quotations and print them using fsi – it will give you a good idea of what code constructs the patterns represent. Alternatively, use the quotation visualizer mentioned above.

Another module, Quotations.DerivedPatterns, contains additional active patterns that are not needed in the strict sense, but that represent some common use cases for deconstructing a quotation. An example is

( |SpecificCall|_| ) : Expr -> (Expr -> (Type list * Expr list) option)

which is a variant of Call that takes an expression and matches if the quotation is the given method or function call. The beauty of this particular active pattern is that you can use quotations to match on the given quotation:

let recognizePlus quotation =
    match quotation with
    | SpecificCall <@ (+) @> (types,exprs) -> printfn "Plus! types=%A  exprs=%A" types exprs
    | _ -> printfn "something else"
> recognizePlus <@ 1 + 2 @>;;
Plus! types=[System.Int32; System.Int32; System.Int32]  exprs=[Value (1); Value (2)]

Come on, this is still a lot of work, isn’t it?

That last example has one major shortcoming, which will hit you pretty hard when you’re trying to do anything useful with quotations. Even when you’re only interested in one particular kind of node (say, you’re interested in doing something with all (+) calls), you still have to walk the whole expression tree in order to be sure that you’ve seen all the method calls. Consider:

> recognizePlus <@ 1 * (2 + 3) @>;;
something else

Why? Because the tree looks like:

Call (None, Int32 op_Multiply[Int32,Int32,Int32](Int32, Int32),
      [Value (1),
       Call (None, Int32 op_Addition[Int32,Int32,Int32](Int32, Int32),
             [Value (2), Value (3)])])

so we’re calling multiply first – and our function doesn’t do anything with that. In order to find all plus calls, we need to look through the whole tree recursively – including the multiply call’s children. But doesn’t that mean that we’ll have to rewrite our recognizePlus function to incorporate a match on all of the 32 active patterns in Quotations.Patterns (or at least those that return child expressions, which is pretty much all of them) just to be able to traverse the whole expression tree looking for a plus call?

Luckily, no. This is were the third module in the Quotations namespace comes in.

The secret sauce: Quotations.ExprShape

This module is really the workhorse of pretty much anything you’ll do with quotations. Now, I believe this will come as a surprise to most people – in fact, it took me a bit of luck (when I saw it being used in passing, in a post on hubfs) to really understand what this module was all about. It was one of those moments for which a beautiful German word exists: Aha-Erlebnis.

This module’s mystery is in no small part thanks to its“terse” documentation. In fact, it’s so short I’ll reproduce it completely here:

val RebuildShapeCombination : obj * Expr list -> Expr Re-build combination expressions. The first parameter should be an object returned by the ShapeCombination case of the active pattern in this module.

val ( |ShapeVar|ShapeLambda|ShapeCombination| ) :
  Expr -> Choice<Var,(Var * Expr),(obj * Expr list)>

An active pattern that performs a complete decomposition viewing the expression tree as a binding structure.

First, look at the active pattern. The key thing is that this active pattern is complete: every quotation will match with one of the three cases of the active pattern: either it is a variable (ShapeVar), a lambda (ShapeLambda)  or a “combination” (ShapeCombination). So, we now have a general and complete decomposition of any quotation, and it is short:

let workhorse quotation =
    match quotation with
    | ShapeVar v -> ()
    | ShapeLambda (v,expr) -> ()
    | ShapeCombination (o, exprs) -> ()

Notice if you paste this in an .fs file that the compiler does not complain about any missing match cases, reinforcing my point. So, now we can do this recursively, which gives us:

let rec traverse quotation =
    match quotation with
    | ShapeVar v -> ()
    | ShapeLambda (v,expr) -> traverse expr
    | ShapeCombination (o, exprs) -> List.map traverse exprs |> ignore

Here’s the beauty: this simple function traverses all the expressions and sub-expressions in any quotation. You don’t have to deal with all 32 cases in order to do something with one node. This function should be the basis of all your work with traversing quotations: anything you want to do, as far as inspecting the tree goes, can be built on top of this simple structure. Let’s revisit our earlier example:

let rec recognizePlus' quotation =
    match quotation with
    | SpecificCall <@ (+) @> (types,exprs) -> 
        printfn "Plus! types=%A  exprs=%A" types exprs
        List.map recognizePlus' exprs |> ignore
    | ShapeVar v -> ()
    | ShapeLambda (v,expr) -> recognizePlus' expr
    | ShapeCombination (o, exprs) -> List.map recognizePlus' exprs |> ignore

Let’s run this on our failing example of earlier:

> recognizePlus' <@ 1 * (2 + 3) @>;;
Plus! types=[System.Int32; System.Int32; System.Int32]  exprs=[Value (2); Value (3)]
It sure found the plus now.

Transforming quotations

So far, we’ve just been inspecting the quotation. Now let’s actually do something with it. Again, the ExprShape module will be a big win: the workhorse here is the ShapeCombination and RebuildShapeCombination duality.

You might have noticed the o parameter in the ShapeCombination; this is an opaque marker that stores how the sub-expressions in a shape combination can be put back together , i.e. whether it is a function call, a let binding, or anything else that has child expressions. This is were the function RebuildShapeCombination comes into play: given an o marker and the right number and type of child expressions, it can put an expression deconstructed by ShapeCombination back together . So, whereas above we wrote a general traversal function for quotations, a general transformation function looks like:

let rec transform quotation =
    match quotation with
    | ShapeVar v -> Expr.Var v
    | ShapeLambda (v,expr) -> Expr.Lambda (v,transform expr)
    | ShapeCombination (o, exprs) -> RebuildShapeCombination (o,List.map transform exprs)

This function takes any quotation and transforms it into itself, visiting, deconstructing and reconstructing every node in the expression tree along the way.

If you’re familiar with a zipper data structure, this reminds me of it: the active patterns “open the zipper” and the RebuildShapeCombination function “closes the zipper”. (Actually, I’m not sure that the sub-expressions get reused or are made anew, but it seemed familiar anyway.)

So, let’s do some transformation: let’s replace all calls to (+) with calls to (-). (Maniacal laugh and organ playing):

let rec evilTransform quotation =
    let minus l r = <@@ %%l - %%r @@>
    match quotation with
    | SpecificCall <@ (+) @> (types,l::r::[]) -> 
        minus (evilTransform l) (evilTransform r)
    | ShapeVar v -> Expr.Var v
    | ShapeLambda (v,expr) -> Expr.Lambda (v,evilTransform expr)
    | ShapeCombination (o, exprs) -> RebuildShapeCombination (o,List.map evilTransform exprs)

Ignoring the specifics of the actual transformation, which I’ll get to in a moment, notice that the structure of the transformation is exactly the same as above, except now we’ve added one pattern to deal with the case we’re interested in: a call to addition.  Essentially, this is saying: do something special for addition, and just traverse the tree, leaving it unchanged, for all the rest.

Splicing

Now, I know I said earlier that the Expr.Call static method is the dual of the Call active pattern, so you’d expect that I would have used that in the result of the SpecificCall match above. I could have done that, but instead I chose to demonstrate another useful technique for constructing expressions: splicing, denoted by the % or %% operator, for typed and untyped splicing respectively. All this operator does is insert an expression in a certain place (or places) into a quotation. So the minus function constructs a quotation that represents the subtraction of two given expressions (this doesn’t always make sense – in that case we’ll get a runtime exception with untyped quotations). Splicing is a very readable and versatile way of constructing expressions, because you can actually write your expressions in F# instead of using Expr.Call and friends.

The proof of the pudding

In the F# powerpack, there are some useful (but somewhat experimental) functions to actually compile and evaluate any F# quotation. The process is somewhat convoluted: the quotation trees are first translated to .NET Expressions, and then that API is used to compile them. This comes with a significant performance hit: code compiled using that route can be a factor of 5 slower. So don’t use it for optimization. So far, I haven’t run into any limitations yet, and it’s a nice way to check whether everything works:

let before = <@ 1 + 2 * 3 + 5 @>
let after = evilTransform <@ 1 + 2 * 3 + 5 @>
printfn "%A" <| before.EvalUntyped()
printfn "%A" <| after.EvalUntyped()

Shows:

12
-10

val evilTransform : Expr -> Expr
val before : Expr<int> =
  Call (None, Int32 op_Addition[Int32,Int32,Int32](Int32, Int32),
      [Call (None, Int32 op_Addition[Int32,Int32,Int32](Int32, Int32),
             [Value (1),
              Call (None, Int32 op_Multiply[Int32,Int32,Int32](Int32, Int32),
                    [Value (2), Value (3)])]), Value (5)])
val after : Expr =
  Call (None, Int32 op_Subtraction[Int32,Int32,Int32](Int32, Int32),
      [Call (None, Int32 op_Subtraction[Int32,Int32,Int32](Int32, Int32),
             [Value (1),
              Call (None, Int32 op_Multiply[Int32,Int32,Int32](Int32, Int32),
                    [Value (2), Value (3)])]), Value (5)])

Pretty cool, and it opens up all kinds of interesting possibilities. For example, you can write a lazy transformer, which inserts lazy and force in the appropriate places to simulate call-by-need semantics in F#. You can add side-effects to print the result of each call as it is evaluated. Or you can transform the expression in something completely different – SQL, PHP, FPGA instructions, whatever you want.

Conclusions

The three main things you should know about the quotations API:

  1. Quotations.Expr and Quotations.Patterns are each other’s dual: they are used for constructing and deconstructing Expr types respectively;
  2. Quotations.ExprShape contains the workhorse of any quotations-using implementation: a zipper-like pattern for traversing and transforming any quotation in a straightforward way;
  3. Splicing is used to effectively and readably construct parameterized expressions as an alternative to Expr.Whatever calls.

Like I said before, I think the API strikes an excellent balance between simplicity and abstraction. F#’s features come together nicely to provide a clean development experience. I was somewhat afraid of quotations before, because I had an image in my mind that it would be messy, lots of work, and lots of special cases. Turns out it isn’t, and I hope this post will do its part in getting some more attention to this aspect of F#.

Download the example .fs file. Don’t forget to add references to FSharp.PowerPack and FSharp.PowerPack.Linq, if you want to run the dynamic compilation examples.

Technorati:

Share this post : Technet! del.icio.us it! del.iri.ous! digg it! dotnetkicks it! reddit! technorati!

23 May 2009

Announcing FsCheck 0.6.1

No functional changes, just an update as a result of the May CTP released by the F# team.

Get it here.

Change log:

  • Many name changes as a result of name changes in the F# API
  • Used the new formatting API, which gets rid of a bunch of obsolete warnings
  • Tried to solve issues resulting from changed initialization order
  • Added a standalone version of the FsCheck binary, to accommodate C#/VB users better
  • Added a contributors file. Want your name in there? Then contribute ;)

Enjoy the new CTP!

Share this post : Technet! del.icio.us it! del.iri.ous! digg it! dotnetkicks it! reddit! technorati!

12 May 2009

How to test DSLs (and: FsChecking FsCheck)

There are two causes for this post.

First, I’ve been watching videos from Lang.NET and DSL DevCon, so I’m in language-mode. It occurred to me that (domain specific) languages seem difficult to test using traditional unit tests: after all, the usage possibilities are far greater for a language than for, say, a typical user interface (maybe this means that typical user interfaces suck - you decide). The word “combinator library”, which are internal DSLs avant la lettre, says it all: the usage possibilities are combinatorial. Classic, example based unit testing thus becomes combinatorially expensive as well.

Second, I’ve been saying in the last release announcement of FsCheck that I wanted to write some FsChecks for FsCheck itself. So far, this effort is about 60% complete. As a sort of challenge, I want to check as much as possible from FsCheck without any traditional example-based unit testing – in other words, I want to test FsCheck using only FsCheck (and maybe some Pex and Code Contracts later on).

FsCheck itself consists of basically two internal DSLs: one for constructing random value generators for whatever value it is your tests need, and one for constructing properties: basically assertions augmented with property combinators for adding labels, classifying generated values and such. It’s the latter DSL that I’m going to test, and draw some general conclusions that may apply to testing other DSLs or even general purpose languages as well.

To see what we’re dealing with, here’s an example of using the property DSL:

let prop_InsertCombined (x:int) xs = 
    ordered xs ==> (ordered (insert x xs))
        |> classify (ordered (x::xs)) "at-head"
        |> classify (ordered (xs @ [x])) "at-tail"
        |> collect (List.length xs)
quickCheck prop_InsertCombined

This property asserts that the insert function maintains the invariant that the list is ordered. The classify and collect property combinators are used to gather some information about the generated data. So in this test, there are three property combinators at work: implies (==>), classify and collect. Also, the arguments to the property (x and xs) are implicitly universally quantified using the forAll property combinator. These and more combinators in FsCheck can be combined an arbitrary ways, so testing this using example-based testing can be quite tedious. Can FsCheck do better?

The property DSL’s syntax tree

First, we’re going to need to generate an arbitrary property, i.e. an arbitrary combination of property combinators. To do that, I made a symbolic representation of the property language – a sort of abstract syntax tree of the language. It’s no secret that discriminated unions are great for this, and this case is not different:

type SymProp =  | Unit | Bool of bool | Exception
                | ForAll of int * SymProp
                | Implies of bool * SymProp
                | Classify of bool * string * SymProp
                | Collect of int * SymProp

So, a property can consist of, respectively:

  • A function that returns the unit value;
  • A function that returns a bool value;
  • A function that throws an exception;
  • A function that uses the implies combinator, which needs two arguments: a boolean condition and another property;
  • A function that classifies the generated values, which needs three arguments: a boolean condition, the name to classify with if the condition is true and another property;
  • A function that collects values, which takes two arguments: the value to collect, and another property.

Note that I’ve made some simplifications here: in fact forAll takes a generator that generates values of an arbitrary type – but since we’re not testing generators here, I’ve assumed the forAll always generates a constant int value. Also, collect would normally collect a (function of a) generated value, but it’s here constrained to collecting a constant int value.

We can build an actual property from this symbolic representation:

let rec private toProperty prop =
        match prop with
        | Unit -> property ()
        | Bool b -> property b
        | Exception -> property (lazy (raise <| InvalidOperationException()))
        | ForAll (i,prop) -> forAll (constant i) (fun i -> toProperty prop)
        | Implies (b,prop) -> b ==> (toProperty prop)
        | Classify (b,stamp,prop) -> classify b stamp (toProperty prop)
        | Collect (i,prop) -> collect i (toProperty prop)

The simplifications are plainly visible here.

Defining the semantics of properties

Now, let’s define a specification of what a symbolic property means, in terms of what it should produce as Result. Result is an internal datatype used by FsCheck to track the result of one execution of a property. This is the gist of it:

type Outcome = 
    | Timeout of int
    | Exception of exn
    | False
    | True
    | Rejected

type Result = 
    {   Outcome     : Outcome
        Stamp       : list<string>
        Labels      : Set<string>
        Arguments   : list<obj> }

So, a result tracks the outcome of a property: timed out, raised exception, returned false or true, or value rejected. The latter means a generated value did not pass the condition of an implies combinator. Furthermore, it also tracks the generated arguments, the stamps applied using classify and collect, as well as labels which are not tested here.

And here is a formal, executable specification of a subset of the FsCheck property DSL:

let rec private determineResult prop =
        let addStamp stamp res = { res with Stamp = stamp :: res.Stamp }
        let addArgument arg res = { res with Arguments = arg :: res.Arguments }
        match prop with
        | Unit -> succeeded
        | Bool true -> succeeded
        | Bool false -> failed
        | Exception  -> exc <| InvalidOperationException()
        | ForAll (i,prop) -> determineResult prop |> addArgument i
        | Implies (true,prop) -> determineResult prop
        | Implies (false,_) -> rejected
        | Classify (true,stamp,prop) -> determineResult prop |> addStamp stamp
        | Classify (false,_,prop) -> determineResult prop
        | Collect (i,prop) -> determineResult prop |> addStamp (any_to_string i)

You can check this definition with the FsCheck documentation. For example, it shows that an assertion that returns unit or true is interpreted as succeeded, false as failed and so on. The succeeded function and friends are defined in FsCheck and just construct basic Result instances:

let private result =
  { Outcome     = Rejected
  ; Stamp       = []
  ; Labels       = Set.empty
  ; Arguments   = []
  }

let internal failed = { result with Outcome = False }

The test

Armed with these functions, we can write our test:

let SymProperty (symprop:SymProp) = 
        let expected = determineResult symprop 
        let (MkRose (Common.Lazy actual,_)) = generate 1 (Random.newSeed()) (toProperty symprop) 
        areSame expected actual
        |> label (sprintf "expected = %A - actual = %A" expected actual)
        |> collect symprop

Ignore the specifics of the “actual” symbol definition – it executes an actual property and extracts the Result from it. The gist of the property is the comparison between the actual Result of the execution of the property (that was built from the symbolic property representation), and compare that with our expected semantics. Just to see what kind of properties we’re testing, I’ve also added a collect combinator, as well as a label to see what’s wrong if the test should fail. Thanks to FsCheck’s built-in discriminated union generator, running this gives:

Property.SymProperty-Ok, passed 100 tests.
22% Exception.
12% Unit.
6% Bool true.
6% Bool false.
4% Classify (false,"",Unit).
2% Implies (false,Exception).
2% Classify (true,"",Unit).
2% Classify (true,"",Exception).
2% Classify (false,"",Exception).
1% Implies (true,Exception).
1% Implies (true,Collect (2,Bool true)).
1% Implies (true,Collect (0,Unit)).
1% Implies (true,Collect (0,Collect (0,Bool false))).
(and so on)

Generating more interesting properties

This leaves something to be desired: the depth of combinations isn’t very high, and almost half of the generated properties are the trivial bool, unit or exception cases. One way of increasing this depth is by increasing the size of the generator, but that didn’t work very well: it mostly increased the length of the generated strings, without increasing the depth too much. The reason is that three of the seven options are “leaf” nodes, where generation ends. So I wrote my own generator for the SymProp type:

let rec private symPropGen =
        let rec recGen size =
            match size with
            | 0 -> oneof [constant Unit; liftGen (Bool) arbitrary; constant Exception]
            | n when n>0 ->
                let subProp = recGen (size/2)
                oneof   [ liftGen2 (curry ForAll) arbitrary (subProp)
                        ; liftGen2 (curry Implies) arbitrary (subProp)
                        ; liftGen2 (curry Collect) arbitrary (subProp)
                        ; liftGen3 (fun b s p -> Classify (b,s,p)) arbitrary arbitrary (subProp)]
            | _ -> failwith "symPropGen: size must be positive"
        sized recGen

If you’re somewhat familiar with FsCheck’s generator combinators, this shouldn’t be hard to read. The idea is that we only generate one of the leaf nodes if the size equals zero, and halve the size if we generate a “real” property combinator to ensure the generation ends at some point. Let’s change the property to incorporate this generator:

let Property = 
        forAllShrink symPropGen shrink (fun symprop ->
            let expected = determineResult symprop 
            let (MkRose (Common.Lazy actual,_)) = generate 1 (Random.newSeed()) (toProperty symprop) 
            areSame expected actual
            |> label (sprintf "expected = %A - actual = %A" expected actual)
            |> collect (depth symprop)

I added a depth function that calculates the depth of the generated symbolic property (i.e. depth of Unit, Bool and Exception is zero, and the rest adds one to the depth of it’s subproperty). Running this gives:

Property.get_Property-Ok, passed 100 tests.
26% 5.
26% 4.
18% 3.
12% 2.
7% 6.
6% 0.
4% 1.

which shows that FsCheck is now generating more properties with greater depth (4-5), and trivial properties only 6% of the time.

Conclusion

This post showed an approach for testing DSLs, combinator libraries or “language-oriented” programs, whatever you’d like to call it:

  1. Write a symbolic representation of the abstract syntax tree of the language you want to test. This symbolic representation is used to abstract from details you don’t want to test, to define semantics of the language later on, and it to generate output for counter-examples that FsCheck may find.
  2. Generate the actual representation in your compiler or interpreter from the symbolic representation.
  3. Specify the semantics of the symbolic representation.
  4. Rely on FsCheck’s built-in generators, or make your own generator to generate the symbolic representation.
  5. Write an FsCheck property to compare the actual with the expected semantics.

This approach may even be usable for testing compilers of general purpose programming languages. The effort involved will be much larger, but it’s easy to start small.

Want more of this? Also check out the Checks.fs file in the FsCheck distribution for a complete test of the FsCheck property language.

Share this post : Technet! del.icio.us it! del.iri.ous! digg it! dotnetkicks it! reddit! technorati!