Alexandre Beaulne's Webpage

HOME

Toying with memoization in OCaml

While having a go at the 99 Problems in OCaml, I ran into a problem that may have potentially benefited from memoization. Despite the fancy name^[1], memoization is straightforward. When dealing with a function that is (i) often called, (ii) computationally expensive, (iii) pure (i.e. has no side-effects) and (iv) have a relatively small domain, a table of inputs-outputs (a.k.a. a "cache") is built in memory to prevent the function to be computed repeatedly on the same input(s). When the memoized function is called, the program first checks if the lookup table contains the arguments. If yes, it simply returns the previously computed result. Otherwise, the result is computed as usual, stored in the cache, and result is returned. All this unbeknownst to the caller of the function. Obviously, what we are doing is simply a trade-off between computing resources and memory storage.

Although I was acquainted with memoization in more traditional language^[2], it appeared to be a bit trickier in a functional language. After all, what's more stateful than a lookup table? But here's where the multiparadigm of OCaml enters the picture: it is possible (and easy) to use mutable variables and datastructures in OCaml, and the memoization example below takes advantage of this.

The Helloworld of memoization is the recursive computation of the N^th Fibonacci element, probably because it is the simplest function to satisfy the four conditions mentioned above. Its canonical implementation has a castastrophic (O(2^N)) runtime:

let rec fib x =
    match x with
    | 0 -> 0
    | 1 -> 1
    | x -> (fib (x - 1)) + (fib (x - 2))
;;

The memoized version is a tiny bit more involved, as some accounting needs to be done with respect to which arguments have been called so far:

let cache = Hashtbl.create 100;;
Hashtbl.add cache 0 0;;
Hashtbl.add cache 1 1;;
let rec memoized_fib x =
    match Hashtbl.mem cache x with
    | true -> Hashtbl.find cache x
    | false ->
        let result = (memoized_fib (x - 1)) + (memoized_fib (x - 2)) in
        Hashtbl.add cache x result;
        result
;;

Here's a quick breakdown of what is going on for those of use not too familiar with OCaml's syntax. A hash table named cache is declared in the global namespace (line 1). On line 5, we check if the argument x is one of the key in the cache table. If yes, we simply return the corresponding value (here's when the computational savings occurs). Otherwise, the function is called as usual, but the result is stored in the cache table (line 9) before it is returned.

Before you send me an email telling me using recursion to compute Fibonacci numbers is wrong, yes I do know there exist a closed-form solution ^[3]:

$$F_n={φ^n - ψ^n}/{√{5}}$$ where $$ φ = {1 + √{5}}/{2} $$ and $$ ψ = {1 - √{5}}/{2} $$

Or, in OCaml:

let fib_cf x =
    let phi = 1.618033988749895 in
    let psi = (-0.6180339887498949) in
    int_of_float (((phi ** (float_of_int x)) -. (psi ** (float_of_int x))) /. 2.23606797749979)
;;

Admittedly this example was cherry-picked to yield huge improvements in performance, but occurences do happen "in the wild" where functions greatly benefit from memoization. Here are the result of a simple benchmark running the two functions above:

Computing the 10th, 30th, 40th and 20th Fibonacci elements using the traditional way takes 80334.003 ms
Computing the 10th, 30th, 40th and 20th Fibonacci elements using the closed-form solution takes 0.084 ms
Computing the 10th, 30th, 40th and 20th Fibonacci elements using the memoized way takes 0.256 ms

Worth having in one's toolbox I guess. Complete code used for the benchmark is available here.

- Alex, Mar. 31^st 2013

1. The term memoization was aptly coined by British AI researcher Donald Michie. It comes from the Latin word "memorandum" which means "let it be remembered".
2. check this post for a good illustration of using decorators to do memoization in Python.
3. Note that the closed-form solution starts spitting incorrect results at the 71^st Fibonacci number on my 64-bit machine because of floating point errors.