27 June 2009

More fun with data structures, continuations and call/cc

Last time I showed a few ways of implementing insert into an unbalanced binary tree, as an implementation of a set data structure. This post adds an implementation that is similar in functionality to the example that uses call/cc, but using the more familiar exceptions to avoid allocating elements needlessly:

let insertExc x t =
    try
        let rec insertRec x t =
            match t with
            | Empty ->  cont { return Elem(empty,x,empty) }
            | Elem (left,e,right) when x < e -> 
                cont {  let! t' = insertRec x left
                        return Elem(t',e,right) }
            | Elem (left,e,right) when x > e -> 
                cont {  let! t' = insertRec x right
                        return Elem(left,e,t') }
            | Elem _ -> failwith "alreadyExists"
        insertRec x t id
    with e –> t

Performance shootout

A comment on the last post indicated that exceptions should be much slower (on .NET, at least) than the callCC approach. Furthermore, I said that callCC was necessary to avoid re-allocating elements needlessly on the search path to an already existing element. Let’s put these claims to the test with a simple benchmark:

let longList = 
    let r = new System.Random()
    let l = 100000
    [ for i in 1..l -> r.Next(0,l) ]

let insertALot f nbRuns =
    for i in 1..nbRuns do
        longList |> List.fold (fun t elem -> f elem t) empty

How many duplicates are in the long list (i.e. how much can we expect to gain by using callCC/exceptions):

> longList |> Seq.distinct |> Seq.length;;
val it : int = 63362
So about 35% duplicates. The results:
> insertALot insert 100;;
Real: 00:01:38.717, CPU: 00:01:37.516, GC gen0: 2247, gen1: 520, gen2: 8
val it : unit = ()
> insertALot insert' 100;;
Real: 00:01:44.583, CPU: 00:01:37.251, GC gen0: 4050, gen1: 1444, gen2: 18
val it : unit = ()
> insertALot insertM 100;;
Real: 00:02:35.101, CPU: 00:02:29.792, GC gen0: 12698, gen1: 1875, gen2: 24
val it : unit = ()
> insertALot insertCallCC 100;;
Real: 00:01:09.410, CPU: 00:01:02.540, GC gen0: 12149, gen1: 7, gen2: 0
val it : unit = ()
> insertALot insertExc 100;;

- Interrupt

Observe that:

  • the readability of computation expressions comes with a cost;
  • even with that cost, the callCC code is still significantly faster;
  • Exceptions are indeed horribly slow – I interrupted after about an hour.
Share this post : Technet! del.icio.us it! del.iri.ous! digg it! dotnetkicks it! reddit! technorati!

No comments:

Post a Comment