dhilst

Implementing ADT with macros in Racket using Scott encoding

I was playing with a simple ML like grammar in the past weeks involving algebraic data types at the form data <type> = <ctr> <arg0> ... <argn> | ... I got me asking myself if it would be possible to compile this to Scott encoding so that we have only functions at evaluation time. I will use option type as example here: data option = some x | none and match expressions as eliminators match x with | some x => x | none => 0 end.

To make my life easier I decided to try this with Racket. If can define a macro expanding the data definition to function definition and the match expression to function calls, then for sure I can use the same strategy to do a tree transformation in the AST to achieve the same goal. By doing it with Racket macros I don’t have to deal with the AST and evaluation details, it’s a perfect tree transformation prototyping tool, so let’s go.

Since Racket is a LISP dialect I will use these forms

(data option (some x)(none))

(match x
  ((some x) x)
  ((none) 0))

Starting with (data option (some x) (none)) the first step was to write the expansion by hand and then trying to translate it to a macro. If you don’t know Scott encoding you can Google for it as there are plenty of materials on this topic. Regarding this post, you just need to know that Scott encoding is a way to encode data as functions. The insight is that data will be a high order function that will receive one callback for each construtor (in the option data type we have two constructors, some and none) and it will call the callback correspoding do the variant that it is. It’s easier to understand with an example, so here is it for option:

;; (data option (some x) (none)) should exapands to:
(define (some x) (lambda (some none) (some x)))
(define (none) (lambda (some none) (none)))

With some and none functions defined we can use it like this:

((some 1)
 (lambda (x) (format "is some ~a" x))
 (lambda () "is none"))
;; outputs "is some 1"

((none)
 (lambda (x) (format "is some ~a" x))
 (lambda () "is none"))
;; outputs "is none"

So you can see that the data knows what it is and it calls the corresponding callback. This already works like a match expression.

It took some time to me to understand how ... works in the Racket macros and to be honest I’m not 100% confident of how it works but I found the macro that I was looking for after lots of trial and error, here is it.

(define-syntax data
  (syntax-rules ()
    [(data _ (ctr args ...) ...)
     (begin 
       (define (ctr args ...) (lambda (ctr ...) (ctr args ...)))
       ...
       )]))

I’m not a Racket expert, it took a time to understand that I need begin

Anyway, this works as expected, here are some examples of it in use

(data option (some x) (none))

((some 1)
 (lambda (x) (format "is some ~a" x))
 (lambda () "is none")) ;; outputs "is some 1"
  
((none)
 (lambda (x) (format "is some ~a" x))
 (lambda () "is none")) ;; outputs "is none"

(data bool (true) (false))

((true) 
 (lambda () "is true")
 (lambda () "is false")) ;; outputs "is true"

((false) 
 (lambda () "is true")
 (lambda () "is false")) ;; outputs "is false"

(data list (cons x tail) (nil))

((cons 1 (cons 2 (cons 3 (nil))))
 (lambda (x l) (format "is cons with ~v" x))
 (lambda () "is nil")) ;; outputs "is cons with 1"

((nil)
 (lambda (x l) (format "is cons with ~v" x))
 (lambda () "is nil")) ;; outputs "is nil"

To see the macro expansion we can use this

(pretty-print
 (syntax->datum
  (expand #'(data option (some x) (none)))))
  
;; '(begin
;;   (define-values (some) (lambda (x) (lambda (some none) (#%app some x))))
;;   (define-values (none) (lambda () (lambda (some none) (#%app none)))))

#%app is some specificity of Racket that I don’t fully understand, I know that it stands for application, but not more than that

Okay, so with data working we only need to implement the match macro and this one happens to be simpler.

We want to do this transformation

;; (match (some 1) ((some x) x) ((none) 0)) should expand to
((some 1) (lambda (x) x) (lambda () 0))

And here is it with some examples:

(define-syntax match
  (syntax-rules ()
    [(match scrutinee ((ctr args ...) body) ...)
     (scrutinee (lambda (args ...) body) ...)]))
     
(match (some 1)
  [(some x) (format "is some ~a" x)]
  [(none) "is none"]) ;; outputs "is some 1"

(match (none)
  [(some x) (format "is some ~a" x)]
  [(none) "is none"]) ;; outputs "is none"

You can find the full code here

So to conclude: It’s very feasible to have algebraic data types as syntax sugar for functions.