20 December 2008

How to change the accessibility of a constructor using implicit object construction

The recommended way to define a class in F# is by using so-called implicit object construction.  The more traditional (for C# programmers, that is) but typically more tedious explicit object construction syntax feels distinctly less "functional". In short, implicit construction relieves you of writing an explicit constructor for your class, and also allows you to use let and do clauses in the body of your class that take the place of static or instance initializers. Check out Robert Pickering's F# wiki for a nice overview.

The other day I was writing a class that needs only factory methods to construct it, a not so uncommon pattern. In C#, I wouldn't think twice about how to do this: just add a private or internal constructor, and a few public factory methods. This is also straightforward to do using F#'s explicit object construction syntax, but I wondered if it is possible using implicit construction. Turns out it is!

The trick is simple:

type Foo internal()=
static member FactoryMethod = new Foo()

Notice the position of the 'internal' modifier. Modifying the accessibility of the 'Foo' class proper is done in the usual way, by putting the modifier right after the 'type' keyword. You can even mix these up:

type internal Foo private()=

This defines an internal class with a private constructor.

Thanks to Brian and Tomas for helping me out on this one!

03 December 2008

Announcing FsCheck 0.3

I am very pleased to announce a new release of FsCheck, for the first time on codeplex! Here is a short description.

FsCheck is a port of Haskell's QuickCheck: it is a tool for testing F# programs automatically. The programmer provides a specification of the program, in the form of properties which functions, methods or objects should satisfy, and FsCheck then tests that the properties hold in a large number of randomly generated cases. While writing the properties, you are actually writing a testable specification of your program. Specifications are expressed in F#, using combinators defined in the FsCheck library. FsCheck provides combinators to define properties, observe the distribution of test data, and define test data generators.

This release brings FsCheck about up to par with QuickCheck 0.1, and adds some smaller features. An overview:

  • Added feature to derive a generator from the type of arguments. You can add your own generators for custom types. Radically reduces the use of forAll.
  • Added feature to reduce use of prop and propl. For example: "quickCheck (fun a -> a+a = 2*a)" just works.
  • Added feature to group related properties in a class, and check it at once. Can also be used to check all toplevel properties in a module.
  • FsChecks no longer dies when a property throws an exception, but reports a test failure.
  • Bug fixes and suggestions received through this blog, email and hubfs implemented. Thanks Neil, Chance, Mat and Anton, and everyone who emailed me!
  • Some extension points to use FsCheck with whateverUnit frameworks.
  • Updated examples, and documentation, the whole codeplex thing.

It was a lot of work, but I hope it was worth it.

I plan to devote more of my time to other projects from now on, at least for the next coming months. That said, your contributions to FsCheck are more than welcome. Besides, this release should keep you quiet for a while ;).

I will be posting some suggestions for features and improvements soon, should you run out of ideas.

[Update: check out the FsCheck issue tracker for some ideas on how to contribute]

15 November 2008

Future plans for FsCheck

Because of a recent question on HubFS, I made my future plans for FsCheck public. I'm re-iterating them here:

  • smaller bug fixes, some cleaning (mostly done)
  • adding FsCheck.fsi module declaration & comments
  • Feature: derive generators from types of arguments (mostly done)
  • Feature: ability to group properties in classes, and check the class with one command (mostly done)
  • Investigate using FsCheck with unit testing frameworks like MbUnit & co, or generally how to use it in practice (currently active)
  • putting the project on codeplex + clarifying license . I'm checking compatibility with the QuickCheck license, but I'm aiming for a permissive license, like the new BSD license.

I intend to finalize this before the end of the year.

Finally, you can help by answering the following question. How are you using FsCheck? Do you have a separate console app that you run that contains your properties, or do you use F# interactive with the properties inline, or something else?

10 November 2008

How to use named arguments in F#

Named arguments are a little-known feature of F#, that is actually very useful when calling heavily overloaded members. The DateTime constructor is one such example: it has twelve overloads, and I can never remember what (in C#) DateTime d = new DateTime(2008,11,12) means. Is it December 11th or 11 November (as we would say it in Dutch)? Fortunately, in F# there is a nice solution for this: you can always use the names of the formal arguments to specify a named argument. This not only clarifies what overload you're calling, you can also easily change the order of the arguments. For example:

let d = new DateTime(month=12, day=11, year=2008)

makes it very clear that we're talking about December 11th here. If you're from the US, it even looks 'normal'. Europeans may like:

let d' = new DateTime(day=11, month=12, year=2008)

For F#, it's all the same.

You can also mix positional with named arguments, and if the object has setters, you can use these to. For details, read Section 8.4.5 in the F# (draft) language spec.

Technorati: ,

11 October 2008

Fun with units of measure

Since the CTP, F# has a nice feature in its type system, called units of measure. Briefly, it allows you to specify a unit with any float, and the type system checks that your uses of units are consistent. For example, you can't add a float<m> to a float<ft>, which results in, among other things, less crashes of spacecraft on Mars. I won't go into details, if you're interested but don't know what I'm talking about, read Andrew Kennedy's blog before you continue.

But anyway, using units for, well, tracking units is kind of boring - type systems are made to be abused after all. So here's my attempt to use units of measure to track the lengths of vectors, so that the compiler will for example detect that you try to add two vectors of unequal length.

How can this be achieved? The basic idea is to encode the length of the vector as an exponent to a unit, that is kept in the 'safe vector' type as a unit of measure parameter. To do this, we first need a unit of measure and a 'safe vector' type that uses it:

[<Measure>] type length

type SVec<'a,[<Measure>]'l> = 
    SVec of list<'a> * float<'l>

 

So, SVec is a vector containing elements of type 'a, and with a length encoded in 'l. We'll use the float to both encode the length of the string as a value and as a type; the idea is that SVec ([1,5], 2.0) has type SVec<int, length ^2>. Let's define some utility functions to make type safe vectors:

let nil = SVec ([],0.0)
let cons elem (SVec (xs,un:float<'l>)) : SVec<_,'l length> = 
        SVec (elem::xs, (un + 1.0<_>) * 1.0<length>)
let (=+=) = cons
let removeUnits f = f |> box |> unbox<float> 
let length (SVec (_,c)) = removeUnits c |> int 
let from_list xs (l:float<'u>) : SVec<'a,'u> = 
    if List.length xs <> int (removeUnits l) then raise <| new ArgumentException("Actual length of list is not equal to given length")
    else SVec (xs,l)

'nil' constructs an empty vector. It has type SVec<'a>,1>, denoting a list with zero elements. 'cons' adds an element to the front of the vector. To do this, it needs to do three things:

  1. Add the actual element to the front of the list
  2. Update the length  value by adding 1
  3. Update the length type by multiplying with 1.0<length>

To get the length of a vector, we return the unit value, stripping of all static unit types. I tried this first by dividing by 1.0<'u>,  but it appears that you can't use a type argument as a unit of measure with a constant value. I'm not exactly sure why.

'from_list' takes a list and turns it into a safe vector. You need to provide a list, with the length of the list correctly encoded in both value and type. The value is tested, the type is not: as far as I can see, this is impossible to check.

Let's make a safe vector:

let l = 3 =+= (2 =+= (1 =+= nil))

The type of l is SVec<int,length^3>. Now we can define type-safe multiplication and addition:

type SVec<'a,[<Measure>]'l> = 
    SVec of list<'a> * float<'l>
        static member (+) (SVec (xs, l1): SVec<_,'l>, SVec (ys, l2): SVec<_,'l>) = 
            SVec (zip xs ys |> map (fun (x,y) -> x + y), l1)
        static member (*) (SVec (xs, l1) : SVec<_,'l>, SVec (ys, l2): SVec<_,'l>)=
            zip xs ys |> sum_by (fun (x,y) -> x*y)    

The interesting part is of course not the implementation, but the types: they statically enforce that when you add or multiply two vectors, they have the same length. So:

let sum = l + l

will typecheck fine, but

let badsum = l + nil

will give a type error!

That concludes my experiment. Some final thoughts:

  • This cannot be used to make a type safe head and tail operation, as far as I can see. We'd need an additional type to represent the empty list, and somehow make a head function that can transform between the two types - the kind of overloading necessary does not seem to be possible in F#.
  • Try to make a recursive function  with SVec: it's impossible, since generic recursion is impossible.
Technorati: ,

07 October 2008

How to invoke a method with type parameters using reflection (in F#)

Methods with type parameters arise naturally in F# code, for example:

type Example =
    static member Test l = List.rev l

'Test' has one type parameter 'a, the type of the elements of the list that is being reversed.

In F# you can call the Test method with a list of any type, and the compiler infers the type parameters for you:

let result = Example.Test [1;2;3]

The compiler infers that 'a must be an int.

Mainstream languages such as C# and VB.NET only infer the type arguments at the call site. The programmer needs to declare all the type parameters of a method explicitly. Consider the type signature of the following method:

type Example =
    static member Test2 (a,b) = (fst a = snd b)

Test2 has three type arguments: it takes two tuples a and b as its arguments, which is a total of four types. But since Test2 compares the first element of a with the second element of b, these two are inferred to have the same types. In C#, a programmer would have to infer this for herself, and write:

public static bool Test2<T, U, V>(Tuple<T, U> a, Tuple<V, T> b)
{
   return a.Fst == b.Snd;
}

(of course, assuming that a type Tuple would be defined in .NET, which is not the case if you're not using the F# core libraries)

Because it is obvious that such type annotations require much more work by the programmer, in C# and VB.NET type arguments to methods are in my experience much less common, and much less complex, than what is written without a second thought in F#.

However, recently I faced the problem of calling a number of static methods such as the above using reflection. I had the actual arguments to the method available, so for Test2 above I would have the (type-correct, of course) arguments (true,2.0) and ("whatever", false) available.

To call a method using reflection, you can use MethodInfo.Invoke, on a MethodInfo object obtained using typeof<Example>.GetMethod("Test2") or some such. Executing a non-generic method is easy - just call Invoke() with the actual arguments. However, System.Reflection refuses to call Invoke on a method with unspecified generic arguments (T,U and V for Test2) . You need to specify the actual arguments (bool, float and string for Test2 resp.) manually. So, I had to deduce these from the actual arguments given to the method. The following two functions demonstrate how to do this:

let rec resolve (a:Type) (f:Type) (acc:Dictionary<_,_>) =
    if f.IsGenericParameter then
        if not (acc.ContainsKey(f)) then acc.Add(f,a)
    else 
        Array.zip (a.GetGenericArguments()) (f.GetGenericArguments())
        |> Array.iter (fun (act,form) -> resolve act form acc)

let invokeMethod (m:MethodInfo) args =
    let m = if m.ContainsGenericParameters then
                let typeMap = new Dictionary<_,_>()
                Array.zip args (m.GetParameters()) 
                |> Array.iter (fun (a,f) -> 
                    resolve (a.GetType()) f.ParameterType typeMap)  
                let actuals = 
                    m.GetGenericArguments() 
                    |> Array.map (fun formal -> typeMap.[formal])
                m.MakeGenericMethod(actuals)
            else 
                m
    m.Invoke(null, args)

The second function, invokeMethod, can be used as a replacement for MethodInfo.Invoke that also works for methods with generic arguments. The function above only works for static methods, but taking away this restriction should be straightforward.

invokeMethod takes a MethodInfo m(which should be a static method) and the arguments you want to call the method with. First we check if m is a generic method. If not, nothing needs to be done and we can just call Invoke.

If m is a generic method, we build a a typeMap which maps the formal type arguments to their actual types, for which we can use the signature of the method on the one hand (i.e. as given by the MethodInfo), and the types of the actual arguments args. The function 'resolve' does most of the heavy lifting here, building up the typeMap by comparing actual and formal arguments in a pairwise fashion. 'resolve' needs to be called recursively, since type arguments may be nested arbitrarily deeply. For example,   the formal argument list<'a*option<'b>> should resolve with the actual argument list<int*option<bool>> by mapping 'a to int and 'b to bool.

Once we've determined the actual type arguments to the generic method, System.Reflection lets us instantiate an invoke-able method using MethodInfo.MakeGenericMethod(), which takes an array of actual types that fill in the generic type arguments to the method. If we've determined the type arguments correctly, the result of MakeGenericMethod() is another MethodInfo object that can be invoked as usual.

The interested reader can figure out the details.

Some notes:

  • 'resolve' is not tail-recursive. I just didn't bother since you would need to write some extremely convoluted method arguments to blow the stack. list<list<list<list<(repeat x 1000 000)>>>> anyone? Also for the same reason I didn't worry about performance - I don't actually expect to have very deeply nested types. (if type classes were ever added to F#, I guess I would  need to start worrying about that...)
  • I dislike that I used a Dictionary in there, though it seemed to be the most elegant solution. I tried using a Map, but then I had to merge maps when the recursive call to resolve returned. It seemed easier with a (non-functional) Dictionary. If anyone can do better, I'd like to hear about it.
Technorati: ,

15 August 2008

Announcing FsCheck 0.2

[Update: I noticed that the download comes with a preliminary version of random.fs, which contained many mistakes! I've now updated the link. Sorry! I also added the link directly to this post as well.]

[Update 2: I converted the old F# project to a shiny new .fsproj compatible with the F# September CTP which was released yesterday. Had to clean up and change some code as well, mostly due to renames and a different type signature of assert. New download link below.]

Version 0.2 introduces the following changes:

  • Ported the Haskell random implementation, which gives more dependable results when splitting seeds.
  • Introduced lazy evaluation of properties to improve checking of conditional properties. The function prop needs a lazy argument from now on; a shorthand propl can be used if you don't need or want this. (Thanks to galen for bringing this to my attention)
  • Tried lots of approaches to get rid of both prop and lazy when they are not needed. Also tried to get rid of the generators using reflection and quotations. Very few approaches were feasible, none were satisfactory. In the end, nothing changed. I'm hoping for type classes or some other way of overloading and/or implicit parameter passing, but I'll guess that'll be for F# 2.0 at the earliest. If anyone has any suggestions, please let me know.
  • Added some examples in another style, where the property is separated from the actual generator, and added some convenience functions qcheck and vcheck to support this style. This improves readability of the actual properties, and makes them somewhat more reusable.

Breaking change: if you have some properties lying around, the easiest way to get them to work is to replace prop by propl.

Read the updated complete documentation in my previous post. My plan is to keep that post as the up to date and complete documentation of the latest version, and announcing new releases with changes in separate posts like this.

Thanks for your interest, and do keep giving me feedback and suggestions!

Download FsCheck 0.2 source

Download FsCheck 0.2 for the F# September CTP

Technorati: ,

10 May 2008

Parsing Dot with F#: Part 4 - Conclusion

In this very last part, I'll just show how to put everything together: the types, the lexer and the parser. First, we need to generate some F# files from the dotlexer.fsl  and the dotparser.fsy files. This is easily accomplished by running fslex and fsyacc, respectively. This creates dotlexer.fs, dotparser.fs and dotparser.fsi. If, like me, you're working in Visual Studio, you can add these files to your project, but take care that you put everything in the right order or it won't compile (I actually lost quite a bit of time with this): first the definition of the abstract syntax type, then the parser files, then the lexer file. Finally, you can call the parser:

let parseFromString(text) =
    let lexbuf = Lexing.from_string text
    try
        DotParser.start DotLexer.token lexbuf
    with e ->
        let pos = lexbuf.EndPos
        printf "error near line %d, character %d\n%s\n" pos.Line pos.Column (e.ToString())
        Graph (new List<_>())

This is fairly straightforward: the Lexing module defines functions to create a lexing buffer from a string. Feed this buffer into the DotParser.start function generated by fsyacc. This function will return the expected type, as defined in the .fsy file.

That's really all there is to it - the rest is some exception catching and printing an error for debugging purposes. One thing I'd like to warn you about is that debugging this stuff can be quite tedious, due to the fact that first the F# files are generated by the fsyacc and the fslex tools, and afterwards compiled. Although the compiler does a fairly good job of showing where the actual errors are in the fsy and fsl files, it is sometimes hard to track down what exactly is the problem.

Finally, you can download the example project.

Technorati: ,,,

09 May 2008

Parsing Dot with F#: Part 3 - The parser

The parser is produced by fsyacc (yacc stands for Yet Another Compiler Compiler). Basically it turns the tokens produced by the lexer into a nice set of types that you can easily manipulate from within your program. Fsyacc can generate a parser based on a definition that looks pretty much the same as the simplified grammar for Dot that I discussed in part 1.

Our dotparser.fsy file starts off with some preliminaries:

%{

open Ast
open System.Collections.Generic

%}

Same as the lexer, this code will eventually  be transformed into regular F# code. So here we can open some modules for use later, and potentially do other preliminaries.

Then, some parser-related startup:

%start start

%token <string> ID
%token NODE EDGE DIGRAPH GRAPH DIEDGE UNEDGE QUOTE LACC RACC LPAREN RPAREN SEMI COMMA ASSIGN EOF 

%type < Ast.Graph > start

The first and last line indicate to fsyacc where to start parsing (at which rule, to be defined later), and the  type the complete parsing process will construct; if you remember our abstract syntax tree from part 1, we're trying to make a Graph type.

The lines in between are just an enumeration of all the possible terminal tokens (i.e. tokens that needn't be parsed further), along with the type of data that they carry. You should recognize these from the lexer. In our parser, only the ID token carries information of type string.

And finally, the part that makes this all tick:

start: Graph { $1 }

Graph: DIGRAPH ID LACC StmtList SEMI RACC    { ($4:List<_>).Reverse(); Graph $4 }

StmtList: Stmt            { let l = new List<Element>() in l.Add($1); l }
    | StmtList SEMI Stmt  { $1.Add($3); $1 }

Stmt: 
    | ID AttrListOpt             { Node($1,$2) } 
    | ID DIEDGE ID AttrListOpt   { Edge($1,$3,$4) }
    | GRAPH AttrListOpt          { GraphAttributeList($2) }
    | NODE AttrListOpt           { NodeAttributeList($2) }
    | EDGE AttrListOpt           { EdgeAttributeList($2) }

AttrListOpt:
    |                            { new Dictionary<string,string>() }
    | LPAREN AttrList RPAREN     { $2 }

Attr: ID ASSIGN ID               { new KeyValuePair<string,string>($1,$3) }

AttrList: Attr              { let attr = new Dictionary<string,string>() in attr.Add($1.Key,$1.Value); attr }
    | AttrList COMMA Attr   { $1.Add($3.Key, $3.Value); $1 }

Notice that this is very similar to the grammar defined earlier. I guess with some experience one can derive this fsyacc spec from the grammar in a fairly straightforward way. Expect some bumps on the road the first time around though. Notice that with each rule we can again associate the transformation that needs to occur, similar as we did in the lexer. Only this time we're not producing tokens, we're producing the actual F# types that we need. The $n variables in the right part refer to the data that is carried by the nth token on the left.

Some things to be aware of:

  • If you need to parse a list of things, like the StmtList above, you need to do this "backwards", i.e. the second option in StmtList is 'StmtList SEMI Stmt', not 'Stmt SEMI StmtList'. This also means I reverse the list once read in the Graph rule. This is simply a consequence of how the parser produced by fsyacc works under the hood - in this case a minor inconvenience.
  • I chose here to use regular .NET types as much as possible, since I intended to use this code from C#. If you use F# types like list, the code actually becomes a lot shorter. For example, 'let l = new List<Element>() in l.Add($1); l" just becomes '[$1]'.
  • I deliberately reversed the order of the rules Attr and AttrList, so that F# could infer the types of the arguments better. This is a result of the fact that F# parser top to bottom, left to right. If you get a lot of type inference errors, it pays off to experiment with the order of your statements, if you can reverse them. I try to avoid adding type annotations as much as possible, they make the code harder to read and to maintain. (in C# my use of the C# 'var' keyword has increased exponentially)

That's it; next time, I'll show you how to put everything together and call the parser.

02 May 2008

FsCheck: Random testing for F#

[Updated for version 0.2]

[Update: this documentation is no longer kept up to date as of FsCheck 0.3. Please check out FsCheck on GitHub for up to date releases, information and what not.]

FsCheck is an as-literal-as-possible port of Haskell's QuickCheck 0.1. Using FsCheck, you define properties that should hold for all your functions. FsCheck checks that those properties hold by generating random values for the parameters of the properties. While writing the properties, you are actually writing a testable specification of your program.

This release is still a bit rough around the edges; basic things work, but there's for example no easy way to check all properties in a given file. Also parts of FsCheck itself are still untested (ah, the irony). I wrote FsCheck mainly as an exercise  - learn from the best, or suck like the rest, so Haskell seems like a good place to start.

The next part of this post is  a manual for using FsCheck; most of it is almost literally "adapted" from the QuickCheck manual and paper. Any errors and omissions are entirely my responsibility.

What is FsCheck?

FsCheck is a tool for testing F# programs automatically. The programmer provides a specification of the program, in the form of properties which functions should satisfy, and FsCheck then tests that the properties hold in a large number of randomly generated cases. Specifications are expressed in F#, using combinators defined in the FsCheck library. FsCheck provides combinators to define properties, observe the distribution of test data, and define test data generators.

FsCheck is currently best used to test the functions in your F# programs. I may be experimenting in the future with using it to test classes and methods, in combination with existing .NET unit testing frameworks. The generator combinators can be used already in any testing framework to easily generate a number of random values for many types. Mostly these are F# specific at this point (tuples, option values, function values), but that should be easy to remedy.

A Simple Example

A simple example of a property definition is

let prop_RevRev (xs:list<int>) = List.rev (List.rev xs) = xs  |> propl

This property asserts that the reverse of the reverse of a list of integers is the list itself. (the propl function comes with FsCheck and is explained shortly) To check the property, we load this definition in F# interactive and then invoke

> qcheck (Gen.List(Gen.Int)) prop_RevRev;;
Ok, passed 100 tests.
val it : unit = ()

As the example shows, you indicate to FsCheck the type to generate for the property (in this case, a list of ints).

When a property fails, FsCheck displays a counter-example. For example, if we define

let prop_RevId xs = List.rev xs = xs |> propl

then checking it results in

> qcheck (Gen.List(Gen.Int)) prop_RevId;;
Falsifiable, after 8 tests:
["[-2; -4; -1]"]
val it : unit = ()

Using FsCheck

To use FsCheck, you download the latest FsCheck solution at the very end of this post. Build and reference the assembly in any projects containing specifications or test data generators. You can then test properties by loading the module they are defined in into F# interactive, and calling
qcheck <generator> <property_name>;;
or by running and writing a small console application that calls the qcheck function.

What do I do if a test loops or encounters an error?

In this case we know that the property does not hold, but qcheck does not display the counter-example. There is another testing function provided for this situation. Repeat the test using

vcheck <generator> <property_name>;;

which displays each test case before running the test: the last test case displayed is thus the one in which the loop or error arises.

Properties

Properties are expressed as F# function definitions, with names beginning with prop_. Properties can be defined together with their generators, by universal quantification over their parameters using the function forAll, which takes the form

forAll <generator> (fun arg -> <property>)

The first argument of forAll is a test data generator. The second argument is a function that takes as argument one value generated by the test data generator, and returns the property that needs to be checked. The values generated by the test data generator must be of the correct type for the argument, which is checked at compile time. For example,

let prop_RevRev = 
    forAll (Gen.List(Gen.Int)) (fun xs -> 
        List.rev (List.rev xs) = xs |> propl)

will generate a number of int lists for the argument xs, and evaluate the function using the generated value.

The core of a property is an assertion function that returns a bool. Such a function should be made into a Property type by calling the function propl. This allows you to use property combinators explained below.

`Polymorphic' properties, such as the one above, are restricted to a particular type to be used for testing, which is done by specifying a generator. The anonymous function given to forAll can take only one argument: if a property needs more, they should be tupled using Gen.Tuple.

Note: thanks to Haskell's type classes, QuickCheck does not force you to use forAll or needs you to use prop. I do realize it makes the properties in F# harder to read - at the moment I see no elegant ways around this. But in FsCheck you can separate the generators from the properties using qcheck and vcheck. In the examples you can find examples of both styles. For clarity, I'll use the forAll function in the following.

Conditional Properties

Properties may take the form

<condition> ==> <property>

For example,

let rec ordered xs = match xs with
                     | [] -> true
                     | [x] -> true
                     | x::y::ys ->  (x <= y) && ordered (y::ys)
let rec insert x xs = match xs with
                      | [] -> [x]
                      | c::cs -> if x<=c then x::xs else c::(insert x cs)
let prop_Insert = forAll (Gen.Tuple(Gen.Int, Gen.List(Gen.Int))) (fun (x,xs) -> 
                    ordered xs ==> propl (ordered (insert x xs)))

Such a property holds if the property after ==> holds whenever the condition does.

Testing discards test cases which do not satisfy the condition. Test case generation continues until 100 cases which do satisfy the condition have been found, or until an overall limit on the number of test cases is reached (to avoid looping if the condition never holds). In this case a message such as
Arguments exhausted after 97 tests.
indicates that 97 test cases satisfying the condition were found, and that the property held in those 97 cases.

Since F# has eager evaluation by default, the above property does more work than necessary: it evaluates the property at the right of the condition no matter what the outcome of the condition on the left. While only a performance consideration in the above example, this may limit the expressiveness of properties as well - consider:

let prop_CheckLazy2 = forAll (Gen.Int) (fun a -> a <> 0 ==> propl (1/a = 1/a)))

Will throw an exception. However, FsCheck also provides the function prop which makes a lazy property, and so requires the F# keyword lazy to force the lazy evaluation of the condition:

let prop_CheckLazy2 = forAll (Gen.Int) (fun a -> a <> 0 ==> prop (lazy (1/a = 1/a)))

Works as expected. propl is actually a shorthand for prop (lazy ...)), but beware: the argument of propl is evaluated eagerly!

Generators

There are number of generators for some common types, but by supplying a custom generator, instead of using the default generator for that type, it is possible to control the distribution of test data. In the following example, by supplying a custom generator for ordered lists, rather than filtering out test cases which are not ordered, we guarantee that 100 test cases can be generated without reaching the overall limit on test cases:

let prop_Insert = forAll (Gen.Tuple(Gen.Int, orderedList(Gen.Int))) (fun (x,xs) -> 
                    propl (ordered (insert x xs)))

Combinators for defining generators are described below.

Observing Test Case Distribution

It is important to be aware of the distribution of test cases: if test data is not well distributed then conclusions drawn from the test results may be invalid. In particular, the ==> operator can skew the distribution of test data badly, since only test data which satisfies the given condition is used.

FsCheck provides several ways to observe the distribution of test data. Code for making observations is incorporated into the statement of properties, each time the property is actually tested the observation is made, and the collected observations are then summarized when testing is complete.

Counting Trivial Cases

A property may take the form

trivial <condition> <property>

For example,

let prop_Insert = forAll (Gen.Tuple(Gen.Int, Gen.List(Gen.Int))) 
                    (fun (x,xs) -> 
                    ordered xs ==> propl (ordered (insert x xs))
                    |> trivial (List.length xs = 0)) 

Test cases for which the condition is true are classified as trivial, and the proportion of trivial test cases in the total is reported. In this example, testing produces

> quickCheck prop_Insert;;
Ok, passed 100 tests (46% trivial).
val it : unit = ()
Classifying Test Cases

A property may take the form

classify <condition> <string> <property>

For example,

let prop_Insert2 = 
    forAll (Gen.Tuple(Gen.Int, Gen.List(Gen.Int))) (fun (x,xs) -> 
        ordered xs ==> propl (ordered (insert x xs))
        |> classify (ordered (x::xs)) "at-head"
        |> classify (ordered (xs @ [x])) "at-tail")

Test cases satisfying the condition are assigned the classification given, and the distribution of classifications is reported after testing. In this case the result is

> quickCheck prop_Insert2;;
Ok, passed 100 tests.
48% at-tail, at-head.
25% at-head.
21% at-tail.
val it : unit = ()

Note that a test case may fall into more than one classification.

Collecting Data Values

A property may take the form

collect <expression> <property>

For example,

let prop_Insert3 = 
    forAll (Gen.Tuple(Gen.Int, Gen.List(Gen.Int))) (fun (x,xs) -> 
        ordered xs ==> propl (ordered (insert x xs))
        |> collect (List.length xs))

The argument of collect is evaluated in each test case, and the distribution of values is reported. The type of this argument is printed using any_to_string. In the example above, the output is

> quickCheck prop_Insert3;;
Ok, passed 100 tests.
46% 0.
36% 1.
11% 2.
6% 3.
1% 4.
val it : unit = ()
Combining Observations

The observations described here may be combined in any way. All the observations of each test case are combined, and the distribution of these combinations is reported. For example, testing the property

let prop_Insert3 = 
    forAll (Gen.Tuple(Gen.Int, Gen.List(Gen.Int))) (fun (x,xs) -> 
        ordered xs ==> propl (ordered (insert x xs))
        |> classify (ordered (x::xs)) "at-head"
        |> classify (ordered (xs @ [x])) "at-tail"
        |> collect (List.length xs))   

produces

> quickCheck prop_Insert3;;
Ok, passed 100 tests.
46% 0, at-tail, at-head.
18% 1, at-head.
17% 1, at-tail.
6% 2, at-head.
3% 3.
2% 2.
2% 3, at-tail.
2% 2, at-tail.
1% 4.
1% 3, at-head.
1% 2, at-tail, at-head.
1% 1, at-tail, at-head.
val it : unit = ()

 
from which we see that insertion at the beginning or end of a list has not been tested for lists of four elements.

Test Data Generators: The Type Gen

Test data is produced by test data generators. FsCheck defines default generators for some often used types, but you can use your own, and will need to define your own generators for any new types you introduce.

Generators have types of the form 'a Gen; this is a generator for values of type a. The type Gen is a computation expression called gen, so F#'s computation expression syntax can be used to define generators.

Generators are built up on top of the function

val choose : (int * int -> int Gen)

which makes a random choice of a value from an interval, with a uniform distribution. For example, to make a random choice between the elements of a list, use

let chooseFromList xs = gen { let! i = choose (0, List.length xs) 
                              return (List.nth xs i) }

Note that choose does not include the upper bound of the interval, so you can use List.length without deducting 1.

Choosing Between Alternatives

A generator may take the form

oneof <list of generators>

which chooses among the generators in the list with equal probability. For example,

oneof [ gen { return true }; gen { return false } ]

generates a random boolean which is true with probability one half.

We can control the distribution of results using the function

val frequency: (int * 'a Gen) list -> 'a Gen 

instead. Frequency chooses a generator from the list randomly, but weights the probability of choosing each alternative by the factor given. For example,

frequency [ (2, gen { return true }); (1, gen { return false })]

generates true two thirds of the time.

The Size of Test Data

Test data generators have an implicit size parameter; FsCheck begins by generating small test cases, and gradually increases the size as testing progresses. Different test data generators interpret the size parameter in different ways: some ignore it, while the list generator, for example, interprets it as an upper bound on the length of generated lists. You are free to use it as you wish to control your own test data generators.

You can obtain the value of the size parameter using

val sized : ((int -> 'a Gen) -> 'a Gen)

sized g calls g, passing it the current size as a parameter. For example, to generate natural numbers in the range 0 to size, use

sized <| fun s -> choose (0,s)

The purpose of size control is to ensure that test cases are large enough to reveal errors, while remaining small enough to test fast. Sometimes the default size control does not achieve this. For example, towards the end of a test run arbitrary lists may have up to 50 elements, so arbitrary lists of lists may have up to 2500, which is too large for efficient testing. In such cases it can be useful to modify the size parameter explicitly. You can do so using

val resize : (int -> 'a Gen -> 'a Gen)

resize n g invokes generator g with size parameter n. The size parameter should never be negative. For example, to generate a random matrix it might be appropriate to take the square root of the original size:

let matrix gn = sized <| fun s -> resize (s|>float|>sqrt|>int) gn
Generating Recursive Data Types

Generators for recursive data types are easy to express using oneof or frequency to choose between constructors, and F#'s standard computation expression syntax to form a generator for each case. There are also liftGen functions for arity up to 4 (more are easy to add) to lift constructors and functions into the Gen type. For example, if the type of trees is defined by

type Tree = Leaf of int | Branch of Tree * Tree

then a generator for trees might be defined by

let rec unsafeTree = 
    oneof [ liftGen (Leaf) Gen.Int; 
            liftGen2 (fun x y -> Branch (x,y)) unsafeTree unsafeTree]

However, there is always a risk that a recursive generator like this may fail to terminate, or produce very large results. In any case, the F# compiler generates an error for the above function for exactly this reason: the function calls itself. To avoid this, recursive generators should always use the size control mechanism. For example,

let tree =
    let rec tree' s = 
        match s with
          | 0 -> liftGen (Leaf) Gen.Int
          | n when n>0 -> 
            let subtree = tree' (n/2) in
            oneof [ liftGen (Leaf) Gen.Int; 
                    liftGen2 (fun x y -> Branch (x,y)) subtree subtree]
    sized tree'

Note that

  • We guarantee termination by forcing the result to be a leaf when the size is zero.
  • We halve the size at each recursion, so that the size gives an upper bound on the number of nodes in the tree. We are free to interpret the size as we will.
  • The fact that we share the subtree generator between the two branches of a Branch does not, of course, mean that we generate the same tree in each case.
Useful Generator Combinators

If g is a generator for type t, then

  • two g generates a pair of t's,
  • three g generates a triple of t's,
  • four g generates a quadruple of t's,
  • vector n g generates a list of n t's.
  • If xs is a list, then elements xs generates an arbitrary element of xs.
Default Generators

FsCheck uses static members of the type Gen to define default test data generators for some types. FsCheck defines instances for the types (), bool, int, float, pairs, triples, quadruples, lists, and functions. You can define new ones by making new instances of 'a Gen.

To generate random functions of type 'a->'b, you need to provide an instance of type 'a-> 'b Gen -> 'b Gen, which we'll call the cogenerator. The implementation of ('a->'b) Gen uses a cogenerator for type a. If you only want to generate random values of a type, you need only to define a generator for that type, while if you want to generate random functions over the type also, then you should define a cogenerator as well.

A cogenerator function interprets a value of type a as a generator transformer. It should be defined so that different values are interpreted as independent generator transformers. These can be programmed using the function

val variant : (int -> 'a Gen -> 'a Gen)

For different natural numbers i and j, variant i g and variant j g are independent generator transformers. The argument of variant must be non-negative, and, for efficiency, should be small. Cogenerators can be defined by composing together generator transformers constructed with variant.

For example,for the type Tree defined above, a suitable instance of a cogenerator, allowing you to define functions Tree -> 'a, can be defined as follows

let rec cotree t = 
    match t with
       | (Leaf n) -> variant 0 << Co.Int n
       | (Branch (t1,t2)) -> variant 1 << cotree t1 << cotree t2
Properties of Functions

Since FsCheck can generate random function values, it can check properties of functions. For example, we can check associativity of function composition as follows:

let treegen = (Gen.Tuple(tree,three (Gen.Arrow(cotree,tree))))
let prop_Assoc gen = forAll gen (fun (x,(f,g,h)) -> 
    propl ( ((f >> g) >> h) x = (f >> ( g >> h)) x ))

where we use the generator and cogenerator for trees defined above; we thus generate functions Tree -> Tree. If a counter-example is found, function values will be displayed as "<func>".

FsCheck internals

FsCheck follows the Haskell implementation of QuickCheck almost literally. For some explanation, read the original QuickCheck paper.

In the implementation, there are still a few issues that I'm unhappy with:

  • I ported Haskell's random implementation, but I'm still not 100% satisfied with it.
  • decodeFloat is somewhat amateurish. Might even be wrong. I need to test the generator for floats more thoroughly.
  • It would be cool to have more standard generators, e.g. for strings and more .net oriented types, like Func<>, TimeSpan, DateTime and so on.

Download

Finally, download the latest Visual Studio solution from my Skydrive folder: FsCheck 0.2

Older releases:

Enjoy!

11 April 2008

WPF Pointers

I recently went to a session at Microsoft Belgium for ISV's, where Tim Sneath gave a packed session about WPF, answering some question but mainly giving a wealth of pointers of the WPF related information out there. Enjoy!

WPF/Silverlight reference applications

Silverlight in financials

Family.Show

NBC Olympics in Silverlight (Tim showed a demo but it was local on his machine - don't think it is publicly available as the time of this writing)

Hard rock memorabilia (Tim also showed an application for actually making the deep zoom effect that you can see on the site, I guess it's included when with Silverlight 2 somehow?)

WPF Performance

WPF performance blog

WPF Performance white paper

Optimizing WPF performance on MSDN

WPF Performance suite download

WPF controls and dev tools

Kevin's bag-o-tricks

Kaxaml (XamlPad, but better)

Blendables (the only link in this list which is not free, but some nice samples in the blog, so worthwhile to visit even if you're not spending)

Snoop

Mole

Neat Trick

Did you know that an .xps file is actually just a zip file with various files in them, and that one of those files contains a xaml dialect? This is really cool to export clip art from office. Tim just took a random Office piece of clip art, printed it to xps, renamed the file extension to .zip, unpack and pasted a part of one of the files in XamlPad. One or two adjustments for fonts later, and the clip art displayed in XamlPad. Nice to keep in mind!

Technorati:

09 April 2008

Parsing Dot with F#: Part 2 - The lexer

Now that the preliminaries are over, we can get on with using the first part in the fslex/fsyacc tandem: the lexer. The lexer's responsibility is to break up the long input text into tokens that are better manageable for the parser, which in turn will produce the actual abstract syntax tree. So, instead of having to deal with the string "digraph MyGraph {  0 [label="a string_{}"] }", the lexer will turn this into a list of tokens, where each token can be attributed with some information. For the example, the output of our lexer is the following list of tokens: DIGRAPH ID["MyGraph"] LACC ID["0"] LBR ID["label"] ASSIGN ID["a string_{}"] RBR RACC.  As you can see, the only token with an attribute is the ID, in which all kinds of string names are parsed. Notice also that any ID string can contain "control characters" like { and }. The lexer's main job is to filter those kinds of things out so the parser's job is much easier.

You can tell fslex how to tokenize a string by providing it with a list of rules. Each rule is a series of regular expressions-action pairs, each producing an (attributed) token. These rules are specified in a special format, which is pre-processed to an actual F# code file by the fslex tool. I'm not going to explain that in detail, there are other sources for that.

Our parser.fsl file starts with some preliminaries:

{
open System
open DotParser
open Lexing
}

let digit = ['0'-'9']
let whitespace = [' ' '\t' ]
let newline = ('\n' | '\r' '\n')
let letter = [ 'a'-'z' 'A'-'Z' '_']

Notice here that, besides opening the System and Lexing namespaces, the DotParser namespace is opened. That's because the type that defines the tokens (actually a regular discriminated union type) is defined in the fsyacc file. We'll come back to that in the next post. Then, some basic regular expressions that are useful for almost any parser, are defined. If you're at all familiar with regular expressions, this stuff should be easy. Notcie that these are just regular let variable bindings in F#. Fslex just translates this file to actual F# code, so you're free to use F# constructs as you see fit.

Our actual parser consists of 2 rules. This is the first:

rule token = parse
| whitespace     { token lexbuf }
| ('\r' | '\t')  { token lexbuf }
| "digraph"      { DIGRAPH }
| "node"         { NODE }
| "edge"         { EDGE }
| "graph"        { GRAPH }
| "->"           { DIEDGE }
| "--"           { UNEDGE }
| "\""           { ID ( (string lexbuf.StartPos "" lexbuf) ) }
| '{'            { LACC }
| '}'            { RACC }
| '['            { LPAREN }
| ']'            { RPAREN }
| ';'            { SEMI }
| ','            { COMMA }
| '='            { ASSIGN }
| letter(letter|digit)*                { ID (lexeme lexbuf) }
| ['-']?('.'digit+ | digit+('.'digit*)? ) { ID (lexeme lexbuf) }
| newline        { lexbuf.EndPos <- lexbuf.EndPos.NextLine; token lexbuf }
| eof            { EOF }

On the left, written like a pattern match, are the regular expressions that will be matched against the input string. On the right, the tokens that are returned. As you can see in the line for lexing IDs, you can access the current lexeme, i.e. the currently-being-lexed part of the input string, by calling lexeme lexbuf. Just skipping the current lexeme can be achieved by calling token lexbuf. Also, it is useful if you keep track of the line number for debugging later. That's what the action in the newline pattern does.

One gotcha here is that the order of these regular expressions matter not only for lexing (a line higher up will take precedence over a lower line), but also for type inference. The above rule needed no type annotations, but try placing the newline rule at the top; after pre-processing, F# will complain during compilation that it needs some type annotations. It's not a big deal, but if the order does not matter for lexing, it is nicer and more flexible to change later if you don't need to put any type annotations.

One line in the above sample may puzzle you:

| "\""            { ID ( (string lexbuf.StartPos "" lexbuf) ) }

That's because this calls a second rule, called  string, which parses a string,i.e. something between double quotes ("). We go into this second rule when we encounter an open double quotes in the input. Such multiple rules are useful when you have different contexts in the input string in which different lexing rules apply. For dot, a string can contain characters such as brackets, arrows, semicolons and such that have a particular meaning in the main dot file. But when they're between double quotes, the lexer is in another context, where these characters lose their meaning and should just be added to a string that is part of the name of some entity. So once we encounter a " in our main file, we start parsing with the second rule; once the second rule finds a ", we return the string parsed so far and continue in the main rule:

and string pos s = parse
| "\\"    { string pos s lexbuf }
| "\r"    { string pos s lexbuf }
| "\""    { s }
| "\n"    { lexbuf.EndPos <- lexbuf.EndPos.NextLine;
            string pos s lexbuf }
| eof     { failwithf "end of file in string started at or near %A" pos }
| _       { string pos (s + (Lexing.lexeme lexbuf)) lexbuf }

As you can see, the string function takes the pos of the start of the string (for debugging) and accumulates the string in its parameter s. If the third pattern matches, the rule is finished: we've encountered the second ". The last pattern matches with anything that isn't matched by the previous lines, so we add it to the s accumulator.

The other matches, 1st 2nd and 4th match, are control characters that are simply skipped, and in the case of newline we update the line number. I did this because graphviz has the annoying habit of inserting a newline after a certain number of characters on one line. Presumably this is for readability, but why anyone would want to read dot files is beyond me. Anyway, Graphviz inserts a \ followed by \r in such cases, and this is what the two first rules filter out. This solved quite some pains afterwards when interpreting IDs ("230,345" is a much easier string to interpret than "230,3\\r45", especially when you don't know if, when or where graphviz will insert a line break. Yes, it actually did this in the middle of numbers.).

By the way, another typical example where it is useful to have multiple rules is when you would like to parse comments; also there the normal lexing rule does not apply. In fact, I adapted the above string function from a similar function designed to parse comments in the excellent Expert F# book.

That's it for the lexer. Paste all the stuff from this post together, and you should have a working .fsl file. Not compilable yet I'm afraid, you need the token definitions which are produced by the parser (remember the open DotParser?).

You've made it halfway! Next post: the parser.

Technorati: ,,,

02 April 2008

Parsing Dot with F#: Part 1

A while back I decided to write a parser for dot, the language used by Graphviz. This is both my first real project in F#, and in parsing. I learned the basics from the excellent Expert F# book.

I'll try to explain my solution, using fslex and fsyacc, in the order that I tackled the problem.  There are a few more basic examples out there, explaining what parsing is, what a lexer and a parser are etc., However it seems the examples given are always small, typically parsing some expressions. Graphviz' dot definitely has more of a real world flavor, and I'll present it as a real world example of using fslex and fsyacc, without explaining much about those tools per se.

This is how I see this mini-series play out:

  • Simplified dot grammar and abstract syntax tree
  • The lexer
  • The parser
  • Putting it all together

This post is part 1.

Graphviz

Graphviz is a command line tool that takes a description of a graph as input, and outputs a description (or an image) of a layout of the graph. For example, as input you can give:

digraph G {
0 [label="Type1<>", shape=box];
1 [label="Type1<Type2>", shape=box];
2 [label="T", shape=box];
3 [label="Type2", shape=box];
4 [label="Type2[], Type2*, Type2&", shape=box];
5 [label="#Type2", shape=box];
6 [label="Type2.Type3", shape=box];
0 -> 1 [ label="MakeGenericType(Type2)"];
0 -> 2 [ label="GetGenericArguments()"];
1 -> 0 [ label="GetGenericTypeDefinition()"];
1 -> 3 [ label="GetGenericArguments()"];
2 -> 0 [ label="DeclaringType"];
3 -> 4 [ label="MakeArrayType(), MakePointerType(), MakeByRefType()"];
3 -> 6 [ label="GetNestedType(Type3.Name)"];
4 -> 3 [ label="GetElementType()"];
5 -> 3 [ label="BaseType"];
6 -> 3 [ label="DeclaringType"];
}

Basically, just a sequence of nodes and edges annotated with attributes. In the above example, only label and shape are used, but dot supports many, many more. Given such a file, graphviz outputs:

digraph G {
    node [label="\N"];
    graph [bb="0,0,1028,328"];
    0 [label="Type1<>", shape=box, pos="502,302", width="1.03", height="0.50"];
    1 [label="Type1<Type2>", shape=box, pos="380,210", width="1.61", height="0.50"];
    2 [label=T, shape=box, pos="625,210", width="0.75", height="0.50"];
    3 [label=Type2, shape=box, pos="568,118", width="0.78", height="0.50"];
    4 [label="Type2[], Type2*, Type2&", shape=box, pos="249,26", width="2.61", height="0.50"];
    5 [label="#Type2", shape=box, pos="703,210", width="0.92", height="0.50"];
    6 [label="Type2.Type3", shape=box, pos="784,26", width="1.42", height="0.50"];
    0 -> 1 [label="MakeGenericType(Type2)", pos="e,322,216 465,301 381,298 181,288 161,266 156,259 156,252 161,246 170,235 252,223 312,217", lp="263,256"];
    0 -> 2 [label="GetGenericArguments()", pos="e,623,228 539,298 561,293 589,284 607,266 615,258 619,247 621,237", lp="708,256"];
    1 -> 0 [label="GetGenericTypeDefinition()", pos="s,465,293 455,290 430,284 403,275 394,266 385,256 381,240 380,228", lp="504,256"];
    1 -> 3 [label="GetGenericArguments()", pos="e,540,121 394,192 405,179 420,163 437,154 467,137 504,128 531,123", lp="528,164"];
    2 -> 0 [label=DeclaringType, pos="s,539,300 548,300 628,295 785,284 802,266 807,259 807,252 802,246 781,222 691,237 661,228 658,227 655,226 652,225", lp="863,256"];
    3 -> 4 [label="MakeArrayType(), MakePointerType(), MakeByRefType()", pos="e,155,36 540,117 432,114 50,100 32,82 26,75 27,68 32,62 40,53 93,44 145,37", lp="251,72"];
    3 -> 6 [label="GetNestedType(Type3.Name)", pos="e,733,40 596,101 616,89 645,73 671,62 688,54 707,48 725,43", lp="781,72"];
    4 -> 3 [label="GetElementType()", pos="s,540,110 532,107 515,100 495,92 480,82 470,75 472,67 461,62 441,51 390,42 343,36", lp="557,72"];
    5 -> 3 [label=BaseType, pos="e,595,136 676,192 655,178 625,157 602,141", lp="692,164"];
    6 -> 3 [label=DeclaringType, pos="s,596,117 605,117 687,113 875,103 894,82 915,58 873,43 835,34", lp="956,72"];
}

The basic structure of the file is the same: a sequence of nodes and edges, but Graphviz has added position, height, width and other layout info. This is actually the file that was used to draw the graph in my previous post.

The dot grammar

When parsing using fslex and fsyacc, the first thing you should find or make is a grammar of the thing you're trying to parse. Everything else sort of flows from there. Luckily, the complete dot grammar can be found here. I thought it was a bit complicated for my purposes, so I simplified the grammar a bit:

(in the following, terminals are shown in bold. Literal characters are given in single quotes. Parentheses ( and ) indicate grouping when needed. Square brackets [ and ] enclose optional items. Vertical bars | separate alternatives.)

graph    :    digraph [ ID ] '{' stmt_list '}'
stmt_list:    [ stmt [ ';' ] [ stmt_list ] ]
stmt     :    node_stmt
         |    edge_stmt
         |    attr_stmt /*defines a default attribute*/
attr_stmt:    (graph | node | edge) attr_list
attr_list:    '[' attr  [ ',' ] [ attr_list ] ']' 
atrtr    :    ID '=' ID  
edge_stmt:    node_id -> node_id [ attr_list ]
node_stmt:    node_id [ attr_list ]
node_id  :    INT

If you compare with the original dot grammar, I made the following simplifications:

  • No node ports
  • No sub-graphs
  • No short definition of multiple edges (a -> b -> c)
  • no HTML IDs
  • only digraphs

Basically this grammar says that a graph is "digraph name { bunch of node, edge or default statements }". We've already seen node and edge statements; default statements basically just set an attribute on all the nodes and edges that follow. It is overridden by an attribute of the same name on a node or edge itself, or by a new default statement.

The abstract syntax tree

Based on that grammar, I came up with the following abstract syntax tree using F# discriminated unions:

#light

open List
open System
open System.Collections.Generic

type Attributes = Dictionary<string,string>

type Element = 
    | Node of string * Attributes
    | Edge of string * string * Attributes
    | GraphAttributeList of Attributes
    | NodeAttributeList of Attributes
    | EdgeAttributeList of Attributes

type Graph = Graph of List<Element>

It's easiest to read this from bottom to top (unfortunately it needs to be defined the other way round, otherwise F# has difficulty parsing). A graph is a list of elements. An element can either be a node, an edge or a default attribute list for the graph, the subsequent nodes or the subsequent edges. Each of these elements can have a number of attributes. Attributes are simply presented as a Dictionary. The abstract syntax tree was fairly straightforward to build from the grammar; I expect the same for any well-written grammar.

The abstract syntax tree is the parser's interface to the outside, so it's important that you think about how you're going to use the parser when making decisions about the representation of the abstract syntax tree. For example, I took care not to use any F# specific types in the above AST definition, so that client assemblies in other language would not need to reference F# specific assemblies. On that topic, don't forget to compile your F# assemblies with the --standalone flag, otherwise client assemblies will still need some F# specific libraries (e.g. discriminated unions implement IStructuralHash, so clients also need to know this interface).

Another issue you should think about is how 'deep' you want to parse. For example, it would be theoretically possible to define separate cases for each of the different types of attributes that can be generated by Graphviz. This would also allow us to parse some of the arguments (e.g. the list of position coordinates could be parsed into a list of tuples). However, given the large amount and frequent changes in Graphviz dot attributes, I choose not to take this route.

Next episode: the lexer!

16 March 2008

Reflection: Type unraveled

The Type class in .NET is not the easiest to understand, I believe because of the many different "kinds of types" that are modeled by it: generic types, generic type parameters, reference types, pointer types, array types, etc. Furthermore, sometimes you have one kind of type (e.g. a generic type) and want to make another kind (i.e. "instantiate" a generic parameter with another type). The Type class offers all the necessary methods to do the feasible conversions; however it is not always easy to find. The following graph should help.

The boxes show the different kinds of types that exist. An example of Type1<> is typeof(IList<>); Type1<Type2> is typeof(IList<string>) and so on. The arrows denote how you can obtain one kind of type from the other. For example, if you have a generic type definition (e.g. typeof(IList<>)) you can make this into an instantiatable type (typeof(IList<string>))by calling typeof(IList<>).MakeGenericType(typeof(string)). With "#Type2" I mean a subtype of Type2.

Hope this helps someone.

PS the graph was laid out by Graphviz's dot layout. It was rendered using a self-made WPF control. More on that later.

Technorati : , ,
Del.icio.us : , ,

15 March 2008

A graph control for WPF

Recently, I decided to write a WPF control that can display graphs (I mean, the kind with edges and nodes; not charts). This post describes the beginning of that little project. It does not contain many technical details, just a few hints and pointers for anyone who would like to start a similar project.

I have the time nor the inclination to write a graph layout algorithm myself. Graph algorithms are complex and fairly specific, and I couldn't really find a good overview of the algorithms involved on the net. I found many references to papers, but it seems some papers are hard to get if you don't have a subscription to the ACM portal. (When I was still a PhD candidate this stuff was much easier). So much for free science.

Maybe I can reuse a graph layout algorithm from some existing Winforms control? After some searching, it seems that .NET lacks a free graph layout control. A good graph algorithm library is QuickGraph , but as far as I can see it does not offer any algorithm for layout. Another library is GLEE from Microsoft, but that's is not available for commercial usage (something I'd like for later when my imaginary startup will take over the world).

Outside the .NET world then? After all, .NET has excellent interoperability.
Graphviz by AT&T is an excellent graph layout utility. Not only can it do beautiful hierarchical graphs (like GLEE does) it can also do spring layout and radial graphs, is fast and is highly configurable. Graphviz reads graphs in a proprietary format, called dot. It then layouts the graph, and can export to a variety of formats, of which attributed dot is one. Attributed dot is basically the same as the dot input file, but attributed with node, edge and label positions, as well as other layout info.

Another C++ based alternative is Dynagraph which is a daughter of Graphviz but is geared toward dynamical graph layout. Whereas Graphviz takes an input graph, does layout and produces an output graph, Dynagraph can add and delete nodes and edges incrementally, keeping existing positions the same as much as possible.

Graphviz Dynagraph
.NET interoperability
  • P/Invoke
  • (COM)
  • 'incrface' in and out
  • dot
Maintained Actively

Hardly

Incremental No Yes
Algorithms Hierarchical, spring, radial, circular Hierarchical
Speed Fast, even for large graphs Noticeably slower than Graphviz for large graphs

After some tinkering with Dynagraph, which I liked better at first because of its flexibility, and after some back and forth email with Gordon Woodhull, the maintainer of Dynagraph, it turned out that Dynagraph COM support is not completely implemented yet. By the time we came to that conclusion, I had already sort of given up and started on an interface for Graphviz using the Wingraphviz COM interface.

It turns out that a few people before me had thought of that. Quickgraph can actually output a dot file to be consumed by Graphviz. Furthermore, D ot2Wpf is a WPF control that parses output from Graphviz.

You'd think I'm done, right? So did I. However, there were a few more things I expected from my graph control than dot2wpf offered. For example, I'd like to change the color or line style of an edge without having to do a complete layout of the graph. So I figured I'd take the dot2wpf source and add some features. That didn't quite work out as planned.
That's because dot2wpf actually parses the 'plain' output format of Graphviz. It is supposedly easier to parse than the dot output format, but loses some information. It turns out that it contains a flaw: for cyclic graphs, the plain format does not allow you to determine whether an arrow should be at the beginning or the end of an edge. Graphviz' hierarchical layout algorithm cannot handle cycles, so it reverses one edge in a cycle automatically for layout purposes. However, in the plain output format there is no indication of this reversal, and the points of the edge are given in the reverse direction. There's no way for dot2wpf to distinguish these reversed edges from normal edges in the plain output format, so it has no way to determine where to draw the arrow.

That one took me a while to figure out.

So it turns out that to make a faithful representation of a graph layout by Graphviz, I need to parse the (fairly complicated) dot output format of Graphviz. So my idea now is that the WPF graph control takes as input a string containing a dot representation of the graph it should draw (such a dot representation is easy to get from e.g. Quickgraph), passes this string to Graphviz which passes an annotated dot file back. The control parses it, and draws the visuals on the canvas.

Since I've been wanting to learn F# for a while, and F# comes with a bunch of parsing tools (fslex and fsyacc), I thought it would be a good idea to use F# to write a dot parser - it could be useful for other things than the graph control. More on parsing with F# in a later post.

Technorati : , , ,
Del.icio.us : , , ,

14 January 2008

Comparing presentation styles

 I recently watched a recorded presentation by Jim Hugunin and John Lam, entitled "Just Glue It! Ruby and the DLR in Silverlight".

The presentation made me wish that more performers would be like Edsger W. Dijkstra, or at least make an effort towards imitating his style. See and listen for yourself, and I hope you'll share my preference. Unfortunately the "demo-oriented" presentation style, in which contrived demonstrations that require nor provide any insight are exchanged for content, is becoming common.

PS I would like to add that, judging from their blogs, both Jim Hugunin and John Lam are intelligent, creative and well-spoken human beings. Which makes their performance all the more saddening: it may be a cultural phenomenon.